Re: [PATCH v3] memory: Optimize replay of guest mapping

2023-04-18 Thread Peter Xu
On Tue, Apr 18, 2023 at 11:13:57AM +0100, Peter Maydell wrote:
> On Thu, 13 Apr 2023 at 12:12, Zhenzhong Duan  wrote:
> >
> > On x86, there are two notifiers registered due to vtd-ir memory
> > region splitting the entire address space. During replay of the
> > address space for each notifier, the whole address space is
> > > scanned, which is unnecessary. We only need to scan the part of
> > > the space that the notifier monitors.
> > >
> > > On x86 the IOMMU memory region spans the entire address space,
> > > but on some other platforms (e.g. arm mps3-an547) the IOMMU memory
> > > region is only a window in the whole address space. A user could
> > > register a notifier with an arbitrary scope beyond the IOMMU memory
> > > region. In the current implementation replay is only triggered
> > > by VFIO and dirty page sync, with notifiers derived from the memory
> > > region section, but this isn't guaranteed in the future.
> >
> > So, we replay the intersection part of IOMMU memory region and
> > IOMMU notifier in memory_region_iommu_replay().
> >
> > Signed-off-by: Zhenzhong Duan 
> > ---
> > v3: Fix assert failure on mps3-an547
> > v2: Add an assert per Peter
> > Tested on x86 with a net card passed to guest(kvm/tcg), ping/ssh pass.
> > Also did simple bootup test with mps3-an547
> >
> >  hw/i386/intel_iommu.c | 2 +-
> >  softmmu/memory.c  | 5 +++--
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index a62896759c78..faade7def867 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -3850,7 +3850,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
> > *iommu_mr, IOMMUNotifier *n)
> >  .domain_id = vtd_get_domain_id(s, , vtd_as->pasid),
> >  };
> >
> > -vtd_page_walk(s, , 0, ~0ULL, , vtd_as->pasid);
> > +vtd_page_walk(s, , n->start, n->end, , vtd_as->pasid);
> >  }
> >  } else {
> >  trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
> > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > index b1a6cae6f583..f7af691991de 100644
> > --- a/softmmu/memory.c
> > +++ b/softmmu/memory.c
> > @@ -1925,7 +1925,7 @@ void memory_region_iommu_replay(IOMMUMemoryRegion 
> > *iommu_mr, IOMMUNotifier *n)
> >  {
> >  MemoryRegion *mr = MEMORY_REGION(iommu_mr);
> >  IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
> > -hwaddr addr, granularity;
> > +hwaddr addr, end, granularity;
> >  IOMMUTLBEntry iotlb;
> >
> >  /* If the IOMMU has its own replay callback, override */
> > @@ -1935,8 +1935,9 @@ void memory_region_iommu_replay(IOMMUMemoryRegion 
> > *iommu_mr, IOMMUNotifier *n)
> >  }
> >
> >  granularity = memory_region_iommu_get_min_page_size(iommu_mr);
> > +end = MIN(n->end, memory_region_size(mr));
> >
> > -for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> > +for (addr = n->start; addr < end; addr += granularity) {
> >  iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx);
> >  if (iotlb.perm != IOMMU_NONE) {
> > >  n->notify(n, &iotlb);
> 
> 
> The documentation for the replay method of IOMMUMemoryRegionClass
> says:
>  * The default implementation of memory_region_iommu_replay() is to
>  * call the IOMMU translate method for every page in the address space
>  * with flag == IOMMU_NONE and then call the notifier if translate
>  * returns a valid mapping. If this method is implemented then it
>  * overrides the default behaviour, and must provide the full semantics
>  * of memory_region_iommu_replay(), by calling @notifier for every
>  * translation present in the IOMMU.
> 
> This commit changes the default implementation so it's no longer
> doing this for every page in the address space. If the change is
> correct, we should update the doc comment too.
> 
> Oddly, the doc comment for memory_region_iommu_replay() itself
> doesn't very clearly state what its semantics are; it could
> probably be improved.
> 
> Anyway, this change is OK for the TCG use of iommu notifiers,
> because that doesn't care about replay.

Since the notifier contains the range information, I'd say the change
shouldn't affect any caller; it is purely a performance difference.  Indeed,
it would be nicer if the documentation were updated too.  Thanks,

-- 
Peter Xu
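
For readers following the thread, the clamped replay loop discussed above can
be sketched in isolation. This is only an illustration: the Range type and
replay_pages() are hypothetical stand-ins, not QEMU's IOMMUNotifier or
memory_region_iommu_replay(), and the range is treated as half-open here,
which is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not QEMU's real types. */
typedef uint64_t hwaddr;

typedef struct {
    hwaddr start;   /* first address the notifier cares about */
    hwaddr end;     /* one past the last address (half-open in this sketch) */
} Range;

/*
 * Walk only the intersection of the notifier range and the region
 * [0, region_size), one granule at a time, and return how many
 * granules were visited. Overflow of addr near the top of the 64-bit
 * space is not handled in this sketch.
 */
static size_t replay_pages(Range n, hwaddr region_size, hwaddr granularity)
{
    hwaddr end = n.end < region_size ? n.end : region_size;
    size_t visited = 0;

    for (hwaddr addr = n.start; addr < end; addr += granularity) {
        visited++;  /* a real replay would translate and notify here */
    }
    return visited;
}
```

A notifier covering only part of the region visits only that part, and a
notifier reaching past the region is clamped to the region size.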




Re: [PATCH 0/5] Support both Ethernet interfaces on i.MX6UL and i.MX7

2023-04-18 Thread Guenter Roeck

On 4/18/23 05:10, Peter Maydell wrote:

On Wed, 15 Mar 2023 at 14:52, Guenter Roeck  wrote:


The SOC on i.MX6UL and i.MX7 has 2 Ethernet interfaces. The PHY on each may
be connected to separate MDIO busses, or both may be connected on the same
MDIO bus using different PHY addresses. Commit 461c51ad4275 ("Add a phy-num
property to the i.MX FEC emulator") added support for specifying PHY
addresses, but it did not provide support for linking the second PHY on
a given MDIO bus to the other Ethernet interface.

To be able to support two PHY instances on a single MDIO bus, two properties
are needed: First, there needs to be a flag indicating if the MDIO bus on
a given Ethernet interface is connected. If not, attempts to read from this
bus must always return 0xffff. Implement this property as phy-connected.
Second, if the MDIO bus on an interface is active, it needs a link to the
consumer interface to be able to provide PHY access for it. Implement this
property as phy-consumer.


So I was having a look at this to see if it was reasonably easy to
split out the PHY into its own device object, and I'm a bit confused.
I know basically 0 about MDIO, but wikipedia says that MDIO buses
have one master (the ethernet MAC) and potentially multiple PHYs.
However it looks like this patchset has configurations where
multiple MACs talk to the same MDIO bus. Am I confused about the
patchset, about the hardware, or about what MDIO supports?



It is quite similar to I2C, a serial interface with one master/controller
and a number of devices (PHYs) connected to it. There is a nice graphic
example at https://prodigytechno.com/mdio-management-data-input-output/.
Not sure I understand what is confusing about it. Can you explain ?

Thanks,
Guenter
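
The bus model described above (one controller, several PHYs selected by
address) can be sketched as follows. This is a minimal illustration, not the
i.MX FEC emulation: MdioBus, Phy and mdio_read() are hypothetical names, and
the all-ones value for a read that no PHY answers is an assumption of the
sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define MDIO_NUM_ADDRS 32        /* 5-bit PHY address */
#define MDIO_NUM_REGS  32        /* 5-bit register address */
#define MDIO_NO_DEVICE 0xffffu   /* assumed value when nothing responds */

/* Hypothetical PHY model: just a register file. */
typedef struct {
    uint16_t regs[MDIO_NUM_REGS];
} Phy;

/* Hypothetical bus model: one slot per PHY address, NULL if empty. */
typedef struct {
    Phy *phys[MDIO_NUM_ADDRS];
} MdioBus;

/*
 * Read a PHY register through the bus. A NULL bus models an MDIO
 * interface with nothing connected to it.
 */
static uint16_t mdio_read(const MdioBus *bus, unsigned addr, unsigned reg)
{
    if (bus == NULL || addr >= MDIO_NUM_ADDRS || reg >= MDIO_NUM_REGS ||
        bus->phys[addr] == NULL) {
        return MDIO_NO_DEVICE;
    }
    return bus->phys[addr]->regs[reg];
}
```

With two PHYs registered at different addresses on one MdioBus, both are
reached through the same controller, while a second, unconnected interface is
modelled by passing NULL.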




[PATCH v2 00/13] virtio: add vhost-user-generic and reduce copy and paste

2023-04-18 Thread Alex Bennée
A lot of our vhost-user stubs are large chunks of boilerplate that do
(mostly) the same thing. This series attempts to fix that by defining
a new base class (vhost-user-base) which is used by a generic
vhost-user-device implementation. Then the rng, gpio and i2c
vhost-user devices become simple specialisations of the common base
defining the ID, number of queues and potentially the config handling.

In theory we could convert the rest of the vhost-user stubs but there
are complications caused by the config being split between the daemon
and QEMU. For example:

 -device vhost-user-device-pci,chardev=vus,virtio-id=8,num_vqs=3,config_size=36

works with the WIP vhost-user-scsi backend:

  https://github.com/rust-vmm/vhost-device/pull/301

but the concrete vhost-user-scsi-pci device fails because it expects
to handle config via the command line. You will see the report:

  qemu-system-aarch64: -device vhost-user-scsi-pci,chardev=vus:
  warning: vhost-user backend supports VHOST_USER_PROTOCOL_F_CONFIG
  but QEMU does not.

if you try. We could make the device a bit smarter but then we would
need to untangle the vhost_scsi_common_() logic which is shared with
the pure in kernel vhost implementation. The vhost-user-vsock stub
might be another one worth re-factoring although that has a similar
split architecture.

The overall diffstat shows a net deletion of code as well as
introducing some more documentation and moving the stubs into the
common build, further reducing redundancy.

Next Steps
----------

From Stefan's last email to the v1 posting we need:

vhost-user needs:
- A GET_DEVICE_ID message.
- A GET_CONFIG_SIZE message. Today it is assumed that the vhost-user
  frontend already knows the configuration space size.
- A protocol feature bit indicating that the device is a full VIRTIO
  device. These devices also need to implement the SET_STATUS message,
  which is rarely implemented today.

and implementing the VHOST_USER_GET_QUEUE_NUM and SET_STATUS messages
to make the generic device "self configuring".

Alex.

Alex Bennée (13):
  include: attempt to document device_class_set_props
  include/hw: document the device_class_set_parent_* fns
  hw/virtio: fix typo in VIRTIO_CONFIG_IRQ_IDX comments
  include/hw/virtio: document virtio_notify_config
  include/hw/virtio: add kerneldoc for virtio_init
  include/hw/virtio: document some more usage of notifiers
  virtio: add vhost-user-base and a generic vhost-user-device
  virtio: add PCI stub for vhost-user-device
  hw/virtio: derive vhost-user-rng from vhost-user-device
  hw/virtio: add config support to vhost-user-device
  hw/virtio: derive vhost-user-gpio from vhost-user-device
  hw/virtio: derive vhost-user-i2c from vhost-user-base
  docs/system: add a basic enumeration of vhost-user devices

 docs/system/devices/vhost-user-rng.rst |   2 +
 docs/system/devices/vhost-user.rst |  41 +++
 include/hw/qdev-core.h |  36 +++
 include/hw/virtio/vhost-user-device.h  |  46 +++
 include/hw/virtio/vhost-user-gpio.h|  23 +-
 include/hw/virtio/vhost-user-i2c.h |  18 +-
 include/hw/virtio/vhost-user-rng.h |  11 +-
 include/hw/virtio/virtio.h |  21 ++
 hw/display/vhost-user-gpu.c|   4 +-
 hw/net/virtio-net.c|   4 +-
 hw/virtio/vhost-user-device-pci.c  |  71 +
 hw/virtio/vhost-user-device.c  | 380 +++
 hw/virtio/vhost-user-fs.c  |   4 +-
 hw/virtio/vhost-user-gpio.c| 400 ++---
 hw/virtio/vhost-user-i2c.c | 255 +---
 hw/virtio/vhost-user-rng.c | 277 ++---
 hw/virtio/vhost-vsock-common.c |   4 +-
 hw/virtio/virtio-crypto.c  |   4 +-
 hw/virtio/meson.build  |  20 +-
 19 files changed, 686 insertions(+), 935 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user-device.h
 create mode 100644 hw/virtio/vhost-user-device-pci.c
 create mode 100644 hw/virtio/vhost-user-device.c

-- 
2.39.2




[PATCH v2 04/13] include/hw/virtio: document virtio_notify_config

2023-04-18 Thread Alex Bennée
Signed-off-by: Alex Bennée 
---
 include/hw/virtio/virtio.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index f236e94ca6..22ec098462 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -274,6 +274,13 @@ extern const VMStateInfo virtio_vmstate_info;
 
 int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id);
 
+/**
+ * virtio_notify_config() - signal a change to device config
+ * @vdev: the virtio device
+ *
+ * Assuming the virtio device is up (VIRTIO_CONFIG_S_DRIVER_OK) this
+ * will trigger a guest interrupt and update the config version.
+ */
 void virtio_notify_config(VirtIODevice *vdev);
 
 bool virtio_queue_get_notification(VirtQueue *vq);
-- 
2.39.2




Re: [PATCH 2/2] hw/acpi: i386: bump MADT to revision 5

2023-04-18 Thread Eric DeVolder




On 4/12/23 02:58, Igor Mammedov wrote:

On Tue, 11 Apr 2023 18:00:49 +0200
Igor Mammedov  wrote:


On Tue, 28 Mar 2023 11:59:26 -0400
Eric DeVolder  wrote:


Currently i386 QEMU generates MADT revision 3, and reports
MADT revision 1. ACPI 6.3 introduces MADT revision 5.

For MADT revision 4, that introduces ARM GIC structures, which do
not apply to i386.

For MADT revision 5, the Local APIC flags introduces the Online
Capable bitfield.

Making MADT generate and report revision 5 will solve problems with
CPU hotplug (the Online Capable flag indicates hotpluggable CPUs).


So spec mandates 3 possible states
   00t - not present and can't ever be added later
   01t - present
   10t - not present but might be added later
and outlawed 11t combination

00t - doesn't make much sense (i.e. why put such entry in MADT in the 1st place)

but looking at kernel commit aa06e20f1be, it looks like
ACPI_MADT_ONLINE_CAPABLE was introduced to accommodate
firmware/hw folks who would stuff MADT with LAPIC entries
for all possible CPU models, and then patch it depending on
actually used CPU model instead of dynamically creating LAPIC
entries. (insane)


on second thought, QEMU doesn't need rev 5 MADT with this flag complications.
Also I see that kernel side fix ended up in checking ACPI spec version instead
of dealing with MADT revisions mess.

So for x86 let's bump revision to 3 or 4 to be in sync with
what QEMU actually uses.


If bumping to only 3 or 4, then there is no need for this patch series.

   


  

Signed-off-by: Eric DeVolder 
---
  hw/i386/acpi-common.c | 13 ++---
  1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-common.c b/hw/i386/acpi-common.c
index 52e5c1439a..1e3a13a36c 100644
--- a/hw/i386/acpi-common.c
+++ b/hw/i386/acpi-common.c
@@ -38,8 +38,15 @@ void pc_madt_cpu_entry(int uid, const CPUArchIdList *apic_ids,
  {
  uint32_t apic_id = apic_ids->cpus[uid].arch_id;
  /* Flags – Local APIC Flags */
-uint32_t flags = apic_ids->cpus[uid].cpu != NULL || force_enabled ?
- 1 /* Enabled */ : 0;
+bool enabled = apic_ids->cpus[uid].cpu != NULL || force_enabled ?
+ true /* Enabled */ : false;
+/*
+ * ACPI 6.3 5.2.12.2 Local APIC Flags: OnlineCapable must be 0
+ * if Enabled is set.
+ */
+bool onlinecapable = enabled ? false : true; /* Online Capable */



+uint32_t flags = onlinecapable ? 0x2 : 0x0 |
+enabled ? 0x1 : 0x0;

align the last line with onlinecapable '

move /* Enabled */ and /* Online Capable */ comments right to magic values
i.e. onlinecapable ? 0x2 : 0x0 | /* Online Capable */ ...



Done.

I've gone ahead and posted a v2 with these changes; keeping MADT.revision at 5.
eric


  
  /* ACPI spec says that LAPIC entry for non present

   * CPU may be omitted from MADT or it must be marked
@@ -102,7 +109,7 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
  MachineClass *mc = MACHINE_GET_CLASS(x86ms);
  const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(MACHINE(x86ms));
  AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(adev);
-AcpiTable table = { .sig = "APIC", .rev = 1, .oem_id = oem_id,
+AcpiTable table = { .sig = "APIC", .rev = 5, .oem_id = oem_id,
  .oem_table_id = oem_table_id };
  
  acpi_table_begin(&table, table_data);
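
A side note on the flags expression reviewed above: in C, `|` binds tighter
than `?:`, so two unparenthesized conditionals joined by `|` do not OR their
results. The sketch below is an illustration only; the macro and function
names are hypothetical, and the bit values 0x1 and 0x2 are taken from the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit values follow the patch (0x1, 0x2). */
#define LAPIC_ENABLED        0x1u
#define LAPIC_ONLINE_CAPABLE 0x2u

/* Each conditional is parenthesized, so the two bits are really ORed. */
static uint32_t lapic_flags(bool enabled, bool online_capable)
{
    return (online_capable ? LAPIC_ONLINE_CAPABLE : 0u) |
           (enabled ? LAPIC_ENABLED : 0u);
}

/*
 * How C parses "oc ? 0x2 : 0x0 | en ? 0x1 : 0x0": the "0x0 | en" part
 * becomes the condition of a nested conditional, written out here with
 * explicit parentheses.
 */
static uint32_t lapic_flags_as_parsed(bool enabled, bool online_capable)
{
    return online_capable ? 0x2u : ((0x0 | enabled) ? 0x1u : 0x0u);
}
```

The two forms agree whenever at most one of the inputs is true, which is the
case in the patch because onlinecapable is defined as the negation of
enabled. They differ only when both are true.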








Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-04-18 Thread Daniel P . Berrangé
On Fri, Mar 31, 2023 at 12:27:48PM -0400, Peter Xu wrote:
> On Fri, Mar 31, 2023 at 05:10:16PM +0100, Daniel P. Berrangé wrote:
> > On Fri, Mar 31, 2023 at 11:55:03AM -0400, Peter Xu wrote:
> > > On Fri, Mar 31, 2023 at 12:30:45PM -0300, Fabiano Rosas wrote:
> > > > Peter Xu  writes:
> > > > 
> > > > > On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
> > > > >> >> Outgoing migration to file. NVMe disk. XFS filesystem.
> > > > >> >> 
> > > > > >> >> - Single migration runs of stopped 32G guest with ~90% RAM usage. Guest
> > > > > >> >>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify -t
> > > > > >> >>   10m -v`:
> > > > > >> >> 
> > > > > >> >> migration type  | MB/s | pages/s |  ms
> > > > > >> >> ----------------+------+---------+-------
> > > > > >> >> savevm io_uring |  434 |  102294 | 71473
> > > > > >> >
> > > > > >> > So I assume this is the non-live migration scenario.  Could you explain
> > > > > >> > what does io_uring mean here?
> > > > > >> >
> > > > > >> 
> > > > > >> This table is all non-live migration. This particular line is a snapshot
> > > > > >> (hmp_savevm->save_snapshot). I thought it could be relevant because it
> > > > > >> is another way by which we write RAM into disk.
> > > > > >
> > > > > > I see, so if all non-live that explains, because I was curious what's the
> > > > > > relationship between this feature and the live snapshot that QEMU also
> > > > > > supports.
> > > > > >
> > > > > > I also don't immediately see why savevm will be much slower, do you have an
> > > > > > answer?  Maybe it's somewhere but I just overlooked..
> > > > >
> > > > 
> > > > I don't have a concrete answer. I could take a jab and maybe blame the
> > > > extra memcpy for the buffer in QEMUFile? Or perhaps an unintended effect
> > > > of bandwidth limits?
> > > 
> > > IMHO it would be great if this can be investigated and reasons provided in
> > > the next cover letter.
> > > 
> > > > 
> > > > > > IIUC this is "vm suspend" case, so there's an extra benefit knowledge of
> > > > > > "we can stop the VM".  It smells slightly weird to build this on top of
> > > > > > "migrate" from that pov, rather than "savevm", though.  Any thoughts on
> > > > > > this aspect (on why not building this on top of "savevm")?
> > > > >
> > > > 
> > > > I share the same perception. I have done initial experiments with
> > > > savevm, but I decided to carry on the work that was already started by
> > > > others because my understanding of the problem was yet incomplete.
> > > > 
> > > > One point that has been raised is that the fixed-ram format alone does
> > > > not bring that many performance improvements. So we'll need
> > > > multi-threading and direct-io on top of it. Re-using multifd
> > > > infrastructure seems like it could be a good idea.
> > > 
> > > The thing is IMHO concurrency is not as hard if VM stopped, and when we're
> > > 100% sure locally on where the page will go.
> > 
> > We shouldn't assume the VM is stopped though. When saving to the file
> > the VM may still be active. The fixed-ram format lets us re-write the
> > same memory location on disk multiple times in this case, thus avoiding
> > growth of the file size.
> 
> Before discussing reusing multifd below, I now have a major confusion about
> the use case of the feature..
> 
> The question is whether we would like to stop the VM after fixed-ram
> migration completes.  I'm asking because:
> 
>   1. If it will stop, then it looks like a "VM suspend" to me. If so, could
>  anyone help explain why we don't stop the VM first then migrate?
>  Because it avoids copying single pages multiple times, no fiddling
>  with dirty tracking at all - we just don't ever track anything.  In
>  short, we'll stop the VM anyway, then why not stop it slightly
>  earlier?
> 
>   2. If it will not stop, then it's "VM live snapshot" to me.  We have
>  that, don't we?  That's more efficient because it'll wr-protect all
>  guest pages, any write triggers a CoW and we only copy the guest pages
>  once and for all.
> 
> Either way to go, there's no need to copy any page more than once.  Did I
> miss anything perhaps very important?
> 
> I would guess it's option (1) above, because it seems we don't snapshot the
> disk alongside.  But I am really not sure now..

It is both options above.

Libvirt has multiple APIs where it currently uses its migrate-to-file
approach

  * virDomainManagedSave()

This saves VM state to a libvirt-managed file, stops the VM, and the
file state is auto-restored on next request to start the VM, and the
file deleted. The VM CPUs are stopped during both save + restore
phase

  * virDomainSave/virDomainRestore

The former saves VM state to a file specified by the mgmt app/user.
A later call to virDomainRestore starts the VM using that saved
state. The mgmt app / user can delete the file 

Re: [PATCH 0/2] Migration time prediction using calc-dirty-rate

2023-04-18 Thread Daniel P . Berrangé
On Tue, Feb 28, 2023 at 04:16:01PM +0300, Andrei Gudkov via wrote:
> Summary of calc-dirty-rate changes:
> 
> 1. The most important change is that now calc-dirty-rate produces
>a *vector* of dirty page measurements for progressively increasing time
>periods: 125ms, 250, 500, 750, 1000, 1500, .., up to specified calc-time.
>The motivation behind such change is that number of dirtied pages as
>a function of time starting from "clean state" (new migration iteration)
>is far from linear. Shape of this function depends on the workload type
>and intensity. Measuring number of dirty pages at progressively
>increasing periods allows to reconstruct this function using piece-wise
>interpolation.
> 
> 2. New metric added -- number of all-zero pages.
>Predictor needs to distinguish between number of zero and non-zero pages
>because during migration only 8 byte header is placed on the wire for
>all-zero page.
> 
> 3. Hashing function was changed from CRC32 to xxHash.
>This reduces overhead of sampling by ~10 times, which is important since
>now some of the measurement periods are sub-second.

Very good !

> 
> 4. Other trivial metrics were added for convenience: total number
>of VM pages, number of sampled pages, page size.
> 
> 
> After these changes output from calc-dirty-rate looks like this:
> 
> {
>   "page-size": 4096,
>   "periods": [125, 250, 375, 500, 750, 1000, 1500,
>   2000, 3000, 4001, 6000, 8000, 10000,
>   15000, 20000, 25000, 30000, 35000,
>   40000, 45000, 50000, 60000],
>   "status": "measured",
>   "sample-pages": 512,
>   "dirty-rate": 98,
>   "mode": "page-sampling",
>   "n-dirty-pages": [33, 78, 119, 151, 217, 236, 293, 336,
> 425, 505, 620, 756, 898, 1204, 1457,
> 1723, 1934, 2141, 2328, 2522, 2675, 2958],
>   "n-sampled-pages": 16392,
>   "n-zero-pages": 10060,
>   "n-total-pages": 8392704,
>   "start-time": 2916750,
>   "calc-time": 60
> }

Ok, so the "periods" and "n-dirty-pages" arrays correlate with
each other.
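
As an illustration of how the two arrays could be combined, the sketch below
does piece-wise linear interpolation of the dirty page count at an arbitrary
time. It is not the prediction script from the series: interp_dirty_pages()
is a hypothetical helper, and the assumption that the count is zero at t = 0
(the "clean state") and flat after the last period belongs to this sketch.

```c
#include <stddef.h>

/*
 * Estimate the number of dirty pages at time t_ms from n samples, where
 * dirty[i] pages were found dirty after periods_ms[i] milliseconds.
 * periods_ms must be strictly increasing and positive.
 */
static double interp_dirty_pages(const double *periods_ms,
                                 const double *dirty,
                                 size_t n, double t_ms)
{
    if (n == 0 || t_ms <= 0.0) {
        return 0.0;
    }
    if (t_ms <= periods_ms[0]) {
        /* first segment starts at the assumed origin (0 ms, 0 pages) */
        return dirty[0] * t_ms / periods_ms[0];
    }
    for (size_t i = 1; i < n; i++) {
        if (t_ms <= periods_ms[i]) {
            double span = periods_ms[i] - periods_ms[i - 1];
            double frac = (t_ms - periods_ms[i - 1]) / span;
            return dirty[i - 1] + frac * (dirty[i] - dirty[i - 1]);
        }
    }
    return dirty[n - 1];   /* beyond the last sample: hold the last value */
}
```

Using the first samples from the output above (125 ms: 33 pages, 250 ms: 78
pages), a query halfway between them lands halfway between the two counts.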

> 
> Passing this data into prediction script, we get the following estimations:
> 
> Downtime> |  125ms |  250ms |  500ms | 1000ms | 5000ms |  unlim
> ----------+--------+--------+--------+--------+--------+--------
>  100 Mbps |      - |      - |      - |      - |      - | 16m59s
>    1 Gbps |      - |      - |      - |      - |      - |  1m40s
>    2 Gbps |      - |      - |      - |      - |  1m41s |    50s
>  2.5 Gbps |      - |      - |      - |      - |  1m07s |    40s
>    5 Gbps |    48s |    46s |    31s |    28s |    25s |    20s
>   10 Gbps |    13s |    12s |    12s |    12s |    12s |    10s
>   25 Gbps |     5s |     5s |     5s |     5s |     4s |     4s
>   40 Gbps |     3s |     3s |     3s |     3s |     3s |     3s

This is fascinating and really helpful as an idea. It so nicely
shows when it is not even worth bothering to try to start the
migration unless you're willing to put up with large (5 sec) downtime
or use autoconverge/post-copy.

I wonder if the calc-dirty-rate measurements also give enough info
to predict the likely number/duration of async page fetches needed
during the post-copy phase? Or does this give enough info to predict
how far down auto-converge should throttle the guest to enable
convergence?

> Quality of prediction was tested with YCSB benchmark. Memcached instance
> was installed into 32GiB VM, and a client generated a stream of requests.
> Between experiments we varied request size distribution, number of threads,
> and location of the client (inside or outside the VM).
> After short preheat phase, we measured calc-dirty-rate:
> 1. {"execute": "calc-dirty-rate", "arguments":{"calc-time":60}}
> 2. Wait 60 seconds
> 3. Collect results with {"execute": "query-dirty-rate"}
> 
> Afterwards we tried to migrate VM after randomly selecting max downtime
> and bandwidth limit. Typical prediction error is 6-7%, with only 180 out
> of 5779 experiments failing badly: prediction error >=25% or incorrectly
> predicting migration success when in fact it didn't converge.

Nice results


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] ui/sdl2: disable SDL_HINT_GRAB_KEYBOARD on Windows

2023-04-18 Thread Bernhard Beschow



Am 18. April 2023 06:28:23 UTC schrieb "Volker Rümelin" :
>Windows sends an extra left control key up/down input event for
>every right alt key up/down input event for keyboards with
>international layout. Since commit 830473455f ("ui/sdl2: fix
>handling of AltGr key on Windows") QEMU uses a Windows low level
>keyboard hook procedure to reliably filter out the special left
>control key and to grab the keyboard on Windows.
>
>The SDL2 version 2.0.16 introduced its own Windows low level
>keyboard hook procedure to grab the keyboard. Windows calls this
>callback before the QEMU keyboard hook procedure. This disables
>the special left control key filter when the keyboard is grabbed.
>
>To fix the problem, disable the SDL2 Windows low level keyboard
>hook procedure.
>
>Reported-by: Bernhard Beschow 
>Signed-off-by: Volker Rümelin 

Tested-by: Bernhard Beschow 

>---
> ui/sdl2.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/ui/sdl2.c b/ui/sdl2.c
>index 00aadfae37..9d703200bf 100644
>--- a/ui/sdl2.c
>+++ b/ui/sdl2.c
>@@ -855,7 +855,10 @@ static void sdl2_display_init(DisplayState *ds, DisplayOptions *o)
> #ifdef SDL_HINT_VIDEO_X11_NET_WM_BYPASS_COMPOSITOR /* only available since SDL 2.0.8 */
> SDL_SetHint(SDL_HINT_VIDEO_X11_NET_WM_BYPASS_COMPOSITOR, "0");
> #endif
>+#ifndef CONFIG_WIN32
>+/* QEMU uses its own low level keyboard hook procedure on Windows */
> SDL_SetHint(SDL_HINT_GRAB_KEYBOARD, "1");
>+#endif
> #ifdef SDL_HINT_ALLOW_ALT_TAB_WHILE_GRABBED
> SDL_SetHint(SDL_HINT_ALLOW_ALT_TAB_WHILE_GRABBED, "0");
> #endif



Status of "block: Mark drain related functions GRAPH_RDLOCK"?

2023-04-18 Thread Stefan Hajnoczi
Hi Emanuele and Kevin,
The following commit is not in qemu.git/master or Kevin's block-next
tree:
https://repo.or.cz/qemu/kevin.git/commitdiff/b4959a8028f417a269168e1570b5e502123e64ed

Do you know what the status of that patch is?

Multi-queue block layer code I'm working on depends on this change to
bdrv_co_yield_to_drain():

-replay_bh_schedule_oneshot_event(ctx, bdrv_co_drain_bh_cb, &data);
+replay_bh_schedule_oneshot_event(qemu_get_aio_context(),
+ bdrv_co_drain_bh_cb, &data);

I want to ensure that .drained_begin/end/poll() callbacks always run in
the main loop thread under the BQL.

Thanks,
Stefan




Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-04-18 Thread Stefan Hajnoczi
On Tue, 18 Apr 2023 at 14:31, Eugenio Perez Martin  wrote:
>
> On Tue, Apr 18, 2023 at 7:59 PM Stefan Hajnoczi  wrote:
> >
> > On Tue, Apr 18, 2023 at 10:09:30AM +0200, Eugenio Perez Martin wrote:
> > > > On Mon, Apr 17, 2023 at 9:33 PM Stefan Hajnoczi wrote:
> > > > >
> > > > > On Mon, 17 Apr 2023 at 15:10, Eugenio Perez Martin wrote:
> > > > > >
> > > > > > On Mon, Apr 17, 2023 at 5:38 PM Stefan Hajnoczi wrote:
> > > > > > >
> > > > > > > On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin wrote:
> > > > > > > > On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi wrote:
> > > > > > > >
> > > > > > > > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek wrote:
> > > > > > > > > > > So-called "internal" virtio-fs migration refers to transporting the
> > > > > > > > > > > back-end's (virtiofsd's) state through qemu's migration stream.  To do
> > > > > > > > > > > this, we need to be able to transfer virtiofsd's internal state to and
> > > > > > > > > > > from virtiofsd.
> > > > > > > > > > >
> > > > > > > > > > > Because virtiofsd's internal state will not be too large, we believe it
> > > > > > > > > > > is best to transfer it as a single binary blob after the streaming
> > > > > > > > > > > phase.  Because this method should be useful to other vhost-user
> > > > > > > > > > > implementations, too, it is introduced as a general-purpose addition to
> > > > > > > > > > > the protocol, not limited to vhost-user-fs.
> > > > > > > > > > >
> > > > > > > > > > > These are the additions to the protocol:
> > > > > > > > > > > - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
> > > > > > > > > > >   This feature signals support for transferring state, and is added so
> > > > > > > > > > >   that migration can fail early when the back-end has no support.
> > > > > > > > > > >
> > > > > > > > > > > - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
> > > > > > > > > > >   over which to transfer the state.  The front-end sends an FD to the
> > > > > > > > > > >   back-end into/from which it can write/read its state, and the back-end
> > > > > > > > > > >   can decide to either use it, or reply with a different FD for the
> > > > > > > > > > >   front-end to override the front-end's choice.
> > > > > > > > > > >   The front-end creates a simple pipe to transfer the state, but maybe
> > > > > > > > > > >   the back-end already has an FD into/from which it has to write/read
> > > > > > > > > > >   its state, in which case it will want to override the simple pipe.
> > > > > > > > > > >   Conversely, maybe in the future we find a way to have the front-end
> > > > > > > > > > >   get an immediate FD for the migration stream (in some cases), in which
> > > > > > > > > > >   case we will want to send this to the back-end instead of creating a
> > > > > > > > > > >   pipe.
> > > > > > > > > > >   Hence the negotiation: If one side has a better idea than a plain
> > > > > > > > > > >   pipe, we will want to use that.
> > > > > > > > > > >
> > > > > > > > > > > - CHECK_DEVICE_STATE: After the state has been transferred through the
> > > > > > > > > > >   pipe (the end indicated by EOF), the front-end invokes this function
> > > > > > > > > > >   to verify success.  There is no in-band way (through the pipe) to
> > > > > > > > > > >   indicate failure, so we need to check explicitly.
> > > > > > > > > > >
> > > > > > > > > > > Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> > > > > > > > > > > (which includes establishing the direction of transfer and migration
> > > > > > > > > > > phase), the sending side writes its data into the pipe, and the reading
> > > > > > > > > > > side reads it until it sees an EOF.  Then, the front-end will check for
> > > > > > > > > > > success via CHECK_DEVICE_STATE, which on the destination side includes
> > > > > > > > > > > checking for integrity (i.e. errors during deserialization).
> > > > > > > > > Suggested-by: Stefan Hajnoczi 
> > > > > > > > > Signed-off-by: Hanna Czenczek 
> > > > > > > > > ---
> > > > > > > > > > >  include/hw/virtio/vhost-backend.h |  24 +
> > > > > > > > > > >  include/hw/virtio/vhost.h         |  79
> > > > > > > > > > >  hw/virtio/vhost-user.c            | 147 ++
> > > > > > > > > > >  hw/virtio/vhost.c                 |  37
> > > > > > > > > > >  4 files changed, 287 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/include/hw/virtio/vhost-backend.h 
> > > > > > > > > 

Re: [PATCH V4] tracing: install trace events file only if necessary

2023-04-18 Thread Stefan Hajnoczi
On Fri, 7 Apr 2023 at 21:05,  wrote:
>
> From: Carlos Santos 
>
> It is not useful when configuring with --enable-trace-backends=nop.
>
> Signed-off-by: Carlos Santos 
> ---
> Changes v1->v2:
>   Install based on chosen trace backend, not on chosen emulators.
> Changes v2->v3:
>   Add missing comma
> Changes v3->v4:
>   Fix array comparison:
> get_option('trace_backends') != [ 'nop' ]
>   not
> get_option('trace_backends') != 'nop'
> ---
>  trace/meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, applied to my block-next tree:
https://gitlab.com/stefanha/qemu/-/commit/d0380c64b4002e6da3b6e205468030d9e76dcc4a

Stefan



Re: [PATCH 2/2] tests/qtest: make more migration pre-copy scenarios run non-live

2023-04-18 Thread Fabiano Rosas
Daniel P. Berrangé  writes:

> There are 27 pre-copy live migration scenarios being tested. In all of
> these we force non-convergence and run for one iteration, then let it
> converge and wait for completion during the second (or following)
> iterations. At 3 mbps bandwidth limit the first iteration takes a very
> long time (~30 seconds).
>
> While it is important to test the migration passes and convergence
> logic, it is overkill to do this for all 27 pre-copy scenarios. The
> TLS migration scenarios in particular are merely exercising different
> code paths during connection establishment.
>
> To optimize time taken, switch most of the test scenarios to run
> non-live (i.e. guest CPUs paused) with no bandwidth limits. This gives
> a massive speed up for most of the test scenarios.
>
> For test coverage the following scenarios are unchanged
>
>  * Precopy with UNIX sockets
>  * Precopy with UNIX sockets and dirty ring tracking
>  * Precopy with XBZRLE
>  * Precopy with multifd
>
> Signed-off-by: Daniel P. Berrangé 

...

> -qtest_qmp_eventwait(to, "RESUME");
> +if (!args->live) {
> +qtest_qmp_discard_response(to, "{ 'execute' : 'cont'}");
> +}
> +if (!got_resume) {
> +qtest_qmp_eventwait(to, "RESUME");
> +}

Hi Daniel,

On an aarch64 host I'm sometimes (~30%) seeing a hang here on a TLS test:

../configure --target-list=aarch64-softmmu --enable-gnutls

... ./tests/qtest/migration-test --tap -k -p /aarch64/migration/precopy/tcp/tls/psk/match

(gdb) bt
#0  0xf7b33f8c in recv () from /lib64/libpthread.so.0
#1  0xaaac8bf4 in recv (__flags=0, __n=1, __buf=0xe477, __fd=5) at /usr/include/bits/socket2.h:44
#2  qmp_fd_receive (fd=5) at ../tests/qtest/libqmp.c:73
#3  0xaaac6dbc in qtest_qmp_receive_dict (s=0xaaca7d10) at ../tests/qtest/libqtest.c:713
#4  qtest_qmp_eventwait_ref (s=0xaaca7d10, event=0xaab26ce8 "RESUME") at ../tests/qtest/libqtest.c:837
#5  0xaaac6e34 in qtest_qmp_eventwait (s=, event=) at ../tests/qtest/libqtest.c:850
#6  0xaaabbd90 in test_precopy_common (args=0xe590, args@entry=0xe5a0) at ../tests/qtest/migration-test.c:1393
#7  0xaaabc804 in test_precopy_tcp_tls_psk_match () at ../tests/qtest/migration-test.c:1564
#8  0xf7c89630 in ?? () from //usr/lib64/libglib-2.0.so.0
...
#15 0xf7c89a70 in g_test_run_suite () from //usr/lib64/libglib-2.0.so.0
#16 0xf7c89ae4 in g_test_run () from //usr/lib64/libglib-2.0.so.0
#17 0xaaab7fdc in main (argc=, argv=) at ../tests/qtest/migration-test.c:2642



Re: [PATCH] .gitlab-ci.d/cirrus: Drop the CI job for compiling with FreeBSD 12

2023-04-18 Thread Warner Losh
On Tue, Apr 18, 2023 at 10:02 AM Thomas Huth  wrote:

> FreeBSD 13.0 has been released in April 2021:
>
>  https://www.freebsd.org/releases/13.0R/announce/
>
> According to QEMU's support policy, we stop supporting the previous
> major release two years after the new major release has been
> published. So we can stop testing FreeBSD 12 in our CI now.
>

13.2 was just released this week, and the FreeBSD project will be
dropping support for 12 by the end of the year. 14.0 is up in late
spring / early summer.


> Signed-off-by: Thomas Huth 
>

Reviewed-by: Warner Losh 


> ---
>  We should likely also update tests/vm/freebsd ... however, FreeBSD 13
>  seems not to use the serial console by default anymore, so I've got
>  no clue how we could use their images now... Does anybody have any
>  suggestions?
>

I should look at this... It should still be using serial console by
default...

Warner


>  .gitlab-ci.d/cirrus.yml | 13 -
>  .gitlab-ci.d/cirrus/freebsd-12.vars | 16 
>  tests/lcitool/refresh   |  1 -
>  3 files changed, 30 deletions(-)
>  delete mode 100644 .gitlab-ci.d/cirrus/freebsd-12.vars
>
> diff --git a/.gitlab-ci.d/cirrus.yml b/.gitlab-ci.d/cirrus.yml
> index 502dfd612c..1507c928e5 100644
> --- a/.gitlab-ci.d/cirrus.yml
> +++ b/.gitlab-ci.d/cirrus.yml
> @@ -44,19 +44,6 @@
>variables:
>  QEMU_JOB_CIRRUS: 1
>
> -x64-freebsd-12-build:
> -  extends: .cirrus_build_job
> -  variables:
> -NAME: freebsd-12
> -CIRRUS_VM_INSTANCE_TYPE: freebsd_instance
> -CIRRUS_VM_IMAGE_SELECTOR: image_family
> -CIRRUS_VM_IMAGE_NAME: freebsd-12-4
> -CIRRUS_VM_CPUS: 8
> -CIRRUS_VM_RAM: 8G
> -UPDATE_COMMAND: pkg update; pkg upgrade -y
> -INSTALL_COMMAND: pkg install -y
> -TEST_TARGETS: check
> -
>  x64-freebsd-13-build:
>extends: .cirrus_build_job
>variables:
> diff --git a/.gitlab-ci.d/cirrus/freebsd-12.vars
> b/.gitlab-ci.d/cirrus/freebsd-12.vars
> deleted file mode 100644
> index 44d8a2a511..00
> --- a/.gitlab-ci.d/cirrus/freebsd-12.vars
> +++ /dev/null
> @@ -1,16 +0,0 @@
> -# THIS FILE WAS AUTO-GENERATED
> -#
> -#  $ lcitool variables freebsd-12 qemu
> -#
> -# https://gitlab.com/libvirt/libvirt-ci
> -
> -CCACHE='/usr/local/bin/ccache'
> -CPAN_PKGS=''
> -CROSS_PKGS=''
> -MAKE='/usr/local/bin/gmake'
> -NINJA='/usr/local/bin/ninja'
> -PACKAGING_COMMAND='pkg'
> -PIP3='/usr/local/bin/pip-3.8'
> -PKGS='alsa-lib bash bison bzip2 ca_root_nss capstone4 ccache
> cdrkit-genisoimage cmocka ctags curl cyrus-sasl dbus diffutils dtc flex
> fusefs-libs3 gettext git glib gmake gnutls gsed gtk3 json-c libepoxy libffi
> libgcrypt libjpeg-turbo libnfs libslirp libspice-server libssh libtasn1
> llvm lzo2 meson ncurses nettle ninja opencv pixman pkgconf png py39-numpy
> py39-pillow py39-pip py39-sphinx py39-sphinx_rtd_theme py39-yaml python3
> rpm2cpio sdl2 sdl2_image snappy sndio socat spice-protocol tesseract
> usbredir virglrenderer vte3 zstd'
> -PYPI_PKGS=''
> -PYTHON='/usr/local/bin/python3'
> diff --git a/tests/lcitool/refresh b/tests/lcitool/refresh
> index c0d7ad5516..4c568242d2 100755
> --- a/tests/lcitool/refresh
> +++ b/tests/lcitool/refresh
> @@ -182,7 +182,6 @@ try:
>  #
>  # Cirrus packages lists for GitLab
>  #
> -generate_cirrus("freebsd-12")
>  generate_cirrus("freebsd-13")
>  generate_cirrus("macos-12")
>
> --
> 2.31.1
>
>


[PATCH 3/3] migration/postcopy: Detect file system on dest host

2023-04-18 Thread Peter Xu
Postcopy requires the memory to support userfaultfd in order to work.  Right
now we check for that, but a bit too late (when switching to postcopy
migration).

Do that check early, right when postcopy is enabled.

Note that this is still only a best effort because ramblocks can be
dynamically created.  We could add a check at hostmem creation and fail if
postcopy is enabled, but maybe that's too aggressive.

Still, we have a chance to fail early in the most obvious case, where we
know there's an existing unsupported ramblock.

Signed-off-by: Peter Xu 
---
 migration/postcopy-ram.c | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 93f39f8e06..560530b758 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -336,11 +336,12 @@ static bool ufd_check_and_apply(int ufd, 
MigrationIncomingState *mis)
 
 /* Callback from postcopy_ram_supported_by_host block iterator.
  */
-static int test_ramblock_postcopiable(RAMBlock *rb, void *opaque)
+static int test_ramblock_postcopiable(RAMBlock *rb)
 {
 const char *block_name = qemu_ram_get_idstr(rb);
 ram_addr_t length = qemu_ram_get_used_length(rb);
 size_t pagesize = qemu_ram_pagesize(rb);
+const char *fs;
 
 if (length % pagesize) {
 error_report("Postcopy requires RAM blocks to be a page size multiple,"
@@ -348,6 +349,15 @@ static int test_ramblock_postcopiable(RAMBlock *rb, void 
*opaque)
  "page size of 0x%zx", block_name, length, pagesize);
 return 1;
 }
+
+if (rb->fd >= 0) {
+fs = file_memory_backend_get_fs_type(rb->mr->owner);
+if (strcmp(fs, "tmpfs") && strcmp(fs, "hugetlbfs")) {
+error_report("Host backend files need to be TMPFS or HUGETLBFS only");
+return 1;
+}
+}
+
 return 0;
 }
 
@@ -366,6 +376,7 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState 
*mis)
 struct uffdio_range range_struct;
 uint64_t feature_mask;
 Error *local_err = NULL;
+RAMBlock *block;
 
 if (qemu_target_page_size() > pagesize) {
 error_report("Target page size bigger than host page size");
@@ -390,9 +401,18 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState 
*mis)
 goto out;
 }
 
-/* We don't support postcopy with shared RAM yet */
-if (foreach_not_ignored_block(test_ramblock_postcopiable, NULL)) {
-goto out;
+/*
+ * We don't support postcopy with some type of ramblocks.
+ *
+ * NOTE: we explicitly ignored ramblock_is_ignored() instead we checked
+ * all possible ramblocks.  This is because this function can be called
+ * when creating the migration object, during the phase RAM_MIGRATABLE
+ * is not even properly set for all the ramblocks.
+ */
+RAMBLOCK_FOREACH(block) {
+if (test_ramblock_postcopiable(block)) {
+goto out;
+}
 }
 
 /*
-- 
2.39.1




[PATCH 2/3] vl.c: Create late backends before migration object

2023-04-18 Thread Peter Xu
The migration object may want to check against different types of memory
when initialized.  Delay the creation to be after late backends.

Signed-off-by: Peter Xu 
---
 softmmu/vl.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index ea20b23e4c..ad394b402f 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -3583,14 +3583,19 @@ void qemu_init(int argc, char **argv)
  machine_class->name, machine_class->deprecation_reason);
 }
 
+/*
+ * Create backends before creating migration objects, so that it can
+ * check against compatibilities on the backend memories (e.g. postcopy
+ * over memory-backend-file objects).
+ */
+qemu_create_late_backends();
+
 /*
  * Note: creates a QOM object, must run only after global and
  * compat properties have been set up.
  */
 migration_object_init();
 
-qemu_create_late_backends();
-
 /* parse features once if machine provides default cpu_type */
 current_machine->cpu_type = machine_class->default_cpu_type;
 if (cpu_option) {
-- 
2.39.1




Re: [PATCH v10 9/9] KVM: Enable and expose KVM_MEM_PRIVATE

2023-04-18 Thread Ackerley Tng

Sean Christopherson  writes:


On Tue, Mar 28, 2023, Chao Peng wrote:

On Fri, Mar 24, 2023 at 10:29:25AM +0800, Xiaoyao Li wrote:
> On 3/24/2023 10:10 AM, Chao Peng wrote:
> > On Wed, Mar 22, 2023 at 05:41:31PM -0700, Isaku Yamahata wrote:
> > > On Wed, Mar 08, 2023 at 03:40:26PM +0800,
> > > Chao Peng  wrote:
> > >
> > > > On Wed, Mar 08, 2023 at 12:13:24AM +, Ackerley Tng wrote:
> > > > > Chao Peng  writes:
> > > > >
> > > > > > > On Sat, Jan 14, 2023 at 12:01:01AM +, Sean Christopherson wrote:
> > > > > > > > On Fri, Dec 02, 2022, Chao Peng wrote:
> > > > > > > +static bool kvm_check_rmem_offset_alignment(u64 offset, u64 gpa)
> > > > > > > +{
> > > > > > > + if (!offset)
> > > > > > > + return true;
> > > > > > > + if (!gpa)
> > > > > > > + return false;
> > > > > > > +
> > > > > > > + return !!(count_trailing_zeros(offset) >= count_trailing_zeros(gpa));

> > >
> > > This check doesn't work as expected. For example, with offset = 2GB and
> > > gpa = 4GB this check fails.
> >
> > This case is expected to fail as Sean initially suggested[*]:
> >    I would rather reject memslot if the gfn has lesser alignment than
> >    the offset. I'm totally ok with this approach _if_ there's a use case.
> >    Until such a use case presents itself, I would rather be conservative
> >    from a uAPI perspective.
> >
> > I understand that we put tighter restriction on this but if you see such
> > restriction is really a big issue for real usage, instead of a
> > theoretical problem, then we can loosen the check here. But at that time
> > below code is kind of x86 specific and may need improve.
> >
> > BTW, in latest code, I replaced count_trailing_zeros() with fls64():
> >return !!(fls64(offset) >= fls64(gpa));
>
> wouldn't it be !!(ffs64(offset) <= ffs64(gpa)) ?



As the function documentation explains, here we want to return true when
ALIGNMENT(offset) >= ALIGNMENT(gpa), so '>=' is what we need.

It's worth clarifying that in Sean's original suggestion he actually
mentioned the opposite. He said 'reject memslot if the gfn has lesser
alignment than the offset', but I wonder whether that was his intent, since
if ALIGNMENT(offset) < ALIGNMENT(gpa), it wouldn't be possible to map
the page as a large page. Consider the config below:

   gpa=2M, offset=1M

In this case KVM tries to map the gpa at 2M as a 2M hugepage, but the
physical page at the offset (1M) in private_fd cannot provide the 2M page
due to misalignment.

But as we discussed in the off-list thread, we did find a real use
case indicating this check is too strict, i.e. QEMU immediately fails
when launching a guest with > 2G memory. For this case QEMU splits the
guest memory space into two slots:

   Slot#1(ram_below_4G): gpa=0x0, offset=0x0, size=2G
   Slot#2(ram_above_4G): gpa=4G,  offset=2G,  size=totalsize-2G

This strict alignment check fails for slot#2 because offset(2G) has less
alignment than gpa(4G). To allow this, one solution is to revert to my
previous change in kvm_alloc_memslot_metadata(), which disallows hugepages
only when the offset/gpa are not aligned to the related page size.

Sean, what do you think?
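
The failure described above can be reproduced outside the kernel. What follows is only a minimal userspace sketch: `ctz64()` and `strict_offset_alignment_ok()` are hypothetical stand-ins for `count_trailing_zeros()` and the quoted `kvm_check_rmem_offset_alignment()`, not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of trailing zero bits in v; v must be non-zero. */
static unsigned ctz64(uint64_t v)
{
    unsigned n = 0;

    assert(v != 0);
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Userspace mirror of the strict check quoted earlier in the thread:
 * accept only if the offset is at least as aligned as the gpa.
 */
static bool strict_offset_alignment_ok(uint64_t offset, uint64_t gpa)
{
    if (!offset) {
        return true;
    }
    if (!gpa) {
        return false;
    }
    return ctz64(offset) >= ctz64(gpa);
}
```

With these definitions, slot #1 (gpa=0, offset=0) passes, while slot #2 (gpa=4G, offset=2G) is rejected because 2G has 31 trailing zero bits and 4G has 32. The gpa=2M, offset=1M case is rejected for the same reason.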


I agree, a pure alignment check is too restrictive, and not really what I
intended despite past me literally saying that's what I wanted :-)  I think
I may have also inverted the "less alignment" statement, but luckily I
believe that ends up being a moot point.

The goal is to avoid having to juggle scenarios where KVM wants to create a
hugepage, but restrictedmem can't provide one because of a misaligned file
offset.  I think the rule we want is that the offset must be aligned to the
largest page size allowed by the memslot _size_.  E.g. on x86, if the
memslot size is >=1GiB then the offset must be 1GiB aligned or better,
ditto for >=2MiB and >=4KiB (ignoring that 4KiB is already a requirement).

We could loosen that to say the largest size allowed by the memslot, but I
don't think that's worth the effort unless it's trivially easy to implement
in code, e.g. KVM could technically allow a 4KiB aligned offset if the
memslot is 2MiB sized but only 4KiB aligned on the GPA.  I doubt there's a
real use case for such a memslot, so I want to disallow that unless it's
super easy to implement.
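
One possible reading of the size-based rule above, as a userspace sketch rather than kernel code: the helper names and the `SZ_*` macros are assumptions made for illustration, and only the three x86 page sizes mentioned in the mail are considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Largest x86 page size that a memslot of this size could be mapped with. */
static uint64_t largest_page_size_for(uint64_t memslot_size)
{
    /* Assumption: a memslot covers at least one 4KiB page. */
    assert(memslot_size >= SZ_4K);

    if (memslot_size >= SZ_1G) {
        return SZ_1G;
    }
    if (memslot_size >= SZ_2M) {
        return SZ_2M;
    }
    return SZ_4K;
}

/* The file offset must be aligned to the page size implied by the size. */
static bool offset_ok_for_memslot_size(uint64_t offset, uint64_t memslot_size)
{
    uint64_t align = largest_page_size_for(memslot_size);

    return (offset & (align - 1)) == 0;
}
```

Under this rule the QEMU layout discussed earlier is accepted: a slot with offset=2G and size >= 1GiB is 1GiB aligned, regardless of the gpa being 4G. A 1M offset on a 2MiB-sized slot is still rejected.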


Checking my understanding here about why we need this alignment check:

When KVM requests a page from restrictedmem, KVM will provide an offset
into the file in terms of 4K pages.

When shmem is configured to use hugepages, shmem_get_folio() will round
the requested offset down to the nearest hugepage-aligned boundary in
shmem_alloc_hugefolio().

Example of problematic configuration provided to
KVM_SET_USER_MEMORY_REGION2:

+ shmem configured to use 1GB pages
+ restrictedmem_offset provided to KVM_SET_USER_MEMORY_REGION2: 0x4000
+ memory_size provided in KVM_SET_USER_MEMORY_REGION2: 1GB
+ KVM requests offset (pgoff_t) 0x8, which translates to offset 0x8000

restrictedmem_get_page() and shmem_get_folio() returns the page for
offset 0x0 in the file, 

Re: [PATCH v2 6/8] accel/tcg: Uncache the host address for instruction fetch when tlb size < 1

2023-04-18 Thread LIU Zhiwei



On 2023/4/18 22:06, Weiwei Li wrote:

When a PMP entry overlaps part of a page, we set the tlb_size to 1, which
causes the address in the TLB entry to be marked with TLB_INVALID_MASK, so
the next access goes through tlb_fill again. However, this does not work in
tb_gen_code() => get_page_addr_code_hostp(): the TLB host address will be
cached, and the following instructions can use this host address directly,
which may bypass the PMP-related checks.


We can add a link to the issue in the commit message,

https://gitlab.com/qemu-project/qemu/-/issues/1542

Reviewed-by: LIU Zhiwei 

Zhiwei



Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  accel/tcg/cputlb.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index e984a98dc4..efa0cb67c9 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1696,6 +1696,11 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState 
*env, target_ulong addr,
  if (p == NULL) {
  return -1;
  }
+
+if (full->lg_page_size < TARGET_PAGE_BITS) {
+return -1;
+}
+
  if (hostp) {
  *hostp = p;
  }




Re: Move vhost-user SET_STATUS 0 after get vring base?

2023-04-18 Thread Yajun Wu



On 4/18/2023 11:34 PM, Michael S. Tsirkin wrote:

On Tue, Apr 18, 2023 at 11:18:11AM -0400, Stefan Hajnoczi wrote:

Hi,
Cindy's commit ca71db438bdc ("vhost: implement vhost_dev_start method")
added SET_STATUS calls to vhost_dev_start() and vhost_dev_stop() for all
vhost backends.

Eugenio's commit c3716f260bff ("vdpa: move vhost reset after get vring
base") deferred the SET_STATUS 0 call in vhost_dev_stop() until after
GET_VRING_BASE for vDPA only. In that commit Eugenio said, "A patch to
make vhost_user_dev_start more similar to vdpa is desirable, but it can
be added on top".

I agree and think it's a good idea to keep the vhost backends in sync
where possible.

vhost-user still has the old behavior where QEMU sends SET_STATUS 0
before GET_VRING_BASE. Most existing vhost-user backends don't implement
the SET_STATUS message, so I think no one has tripped over this yet.

Any thoughts on making vhost-user behave like vDPA here?

Stefan

Wow. Well, SET_STATUS 0 resets the device so yes, I think doing that
before GET_VRING_BASE will lose state. Dunno how it does not trip
people up; indeed the only explanation is that people ignore SET_STATUS.


--
MST


For DPDK vhost-user backend SET_STATUS 0 (reset) is ignored.

Yajun





Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-04-18 Thread Peter Xu
On Tue, Apr 18, 2023 at 05:58:44PM +0100, Daniel P. Berrangé wrote:
> On Fri, Mar 31, 2023 at 12:27:48PM -0400, Peter Xu wrote:
> > On Fri, Mar 31, 2023 at 05:10:16PM +0100, Daniel P. Berrangé wrote:
> > > On Fri, Mar 31, 2023 at 11:55:03AM -0400, Peter Xu wrote:
> > > > On Fri, Mar 31, 2023 at 12:30:45PM -0300, Fabiano Rosas wrote:
> > > > > Peter Xu  writes:
> > > > > 
> > > > > > On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
> > > > > >> >> Outgoing migration to file. NVMe disk. XFS filesystem.
> > > > > >> >> 
> > > > > > >> >> - Single migration runs of stopped 32G guest with ~90% RAM usage. Guest
> > > > > > >> >>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify -t
> > > > > > >> >>   10m -v`:
> > > > > > >> >> 
> > > > > > >> >> migration type  | MB/s | pages/s |  ms
> > > > > > >> >> ----------------+------+---------+------
> > > > > > >> >> savevm io_uring |  434 |  102294 | 71473
> > > > > >> >
> > > > > > >> > So I assume this is the non-live migration scenario.  Could you explain
> > > > > > >> > what does io_uring mean here?
> > > > > > >> >
> > > > > > >> 
> > > > > > >> This table is all non-live migration. This particular line is a snapshot
> > > > > > >> (hmp_savevm->save_snapshot). I thought it could be relevant because it
> > > > > > >> is another way by which we write RAM into disk.
> > > > > > >
> > > > > > > I see, so if all non-live that explains, because I was curious what's the
> > > > > > > relationship between this feature and the live snapshot that QEMU also
> > > > > > > supports.
> > > > > > >
> > > > > > > I also don't immediately see why savevm will be much slower, do you have an
> > > > > > > answer?  Maybe it's somewhere but I just overlooked..
> > > > > > >
> > > > > > 
> > > > > > I don't have a concrete answer. I could take a jab and maybe blame the
> > > > > > extra memcpy for the buffer in QEMUFile? Or perhaps an unintended effect
> > > > > > of bandwidth limits?
> > > > > 
> > > > > IMHO it would be great if this can be investigated and reasons provided in
> > > > > the next cover letter.
> > > > > 
> > > > > > 
> > > > > > > IIUC this is "vm suspend" case, so there's an extra benefit knowledge of
> > > > > > > "we can stop the VM".  It smells slightly weird to build this on top of
> > > > > > > "migrate" from that pov, rather than "savevm", though.  Any thoughts on
> > > > > > > this aspect (on why not building this on top of "savevm")?
> > > > > >
> > > > > 
> > > > > I share the same perception. I have done initial experiments with
> > > > > savevm, but I decided to carry on the work that was already started by
> > > > > others because my understanding of the problem was yet incomplete.
> > > > > 
> > > > > One point that has been raised is that the fixed-ram format alone does
> > > > > not bring that many performance improvements. So we'll need
> > > > > multi-threading and direct-io on top of it. Re-using multifd
> > > > > infrastructure seems like it could be a good idea.
> > > > 
> > > > > The thing is IMHO concurrency is not as hard if VM stopped, and when we're
> > > > > 100% sure locally on where the page will go.
> > > 
> > > We shouldn't assume the VM is stopped though. When saving to the file
> > > the VM may still be active. The fixed-ram format lets us re-write the
> > > same memory location on disk multiple times in this case, thus avoiding
> > > growth of the file size.
> > 
> > Before discussing on reusing multifd below, now I have a major confusing on
> > the use case of the feature..
> > 
> > The question is whether we would like to stop the VM after fixed-ram
> > migration completes.  I'm asking because:
> > 
> >   1. If it will stop, then it looks like a "VM suspend" to me. If so, could
> >  anyone help explain why we don't stop the VM first then migrate?
> >  Because it avoids copying single pages multiple times, no fiddling
> >  with dirty tracking at all - we just don't ever track anything.  In
> >  short, we'll stop the VM anyway, then why not stop it slightly
> >  earlier?
> > 
> >   2. If it will not stop, then it's "VM live snapshot" to me.  We have
> >  that, aren't we?  That's more efficient because it'll wr-protect all
> >  guest pages, any write triggers a CoW and we only copy the guest pages
> >  once and for all.
> > 
> > Either way to go, there's no need to copy any page more than once.  Did I
> > miss anything perhaps very important?
> > 
> > I would guess it's option (1) above, because it seems we don't snapshot the
> > disk alongside.  But I am really not sure now..
> 
> It is both options above.
> 
> Libvirt has multiple APIs where it currently uses its migrate-to-file
> approach
> 
>   * virDomainManagedSave()
> 
> This saves VM state to an libvirt managed file, stops the VM, and the
> file state is auto-restored on next request to start the 

[RFC PATCH 3/3] acpi: add generic port device object

2023-04-18 Thread Dave Jiang
Signed-off-by: Dave Jiang 
---
 hw/acpi/genport.c   |   61 +++
 hw/acpi/meson.build |1 +
 hw/i386/acpi-build.c|   32 ++-
 include/hw/acpi/aml-build.h |4 +--
 softmmu/vl.c|   26 ++
 5 files changed, 115 insertions(+), 9 deletions(-)
 create mode 100644 hw/acpi/genport.c

diff --git a/hw/acpi/genport.c b/hw/acpi/genport.c
new file mode 100644
index ..5738730323c2
--- /dev/null
+++ b/hw/acpi/genport.c
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Generic Port device implementation
+ *
+ * Copyright (C) 2023 Intel Corporation
+ */
+#include "qemu/osdep.h"
+#include "hw/qdev-properties.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "qom/object_interfaces.h"
+#include "hw/qdev-core.h"
+
+#define TYPE_GENERIC_PORT_DEVICE "genport"
+
+#define GENPORT_NUMA_NODE_PROP "node"
+#define GENPORT_DEV_PROP   "genport"
+
+typedef struct GenericPortDevice {
+/* private */
+DeviceState parent_obj;
+
+/* public */
+uint32_t node;
+} GenericPortDevice;
+
+typedef struct GenericPortDeviceClass {
+DeviceClass parent_class;
+} GenericPortDeviceClass;
+
+static Property genport_properties[] = {
+DEFINE_PROP_UINT32(GENPORT_NUMA_NODE_PROP, GenericPortDevice, node, 0),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(GenericPortDevice, genport_device,
+   GENERIC_PORT_DEVICE, DEVICE,
+   { TYPE_USER_CREATABLE },
+   { NULL })
+
+static void genport_device_init(Object *obj)
+{
+}
+
+static void genport_device_finalize(Object *obj)
+{
+}
+
+static void genport_device_realize(DeviceState *dev, Error **errp)
+{
+}
+
+static void genport_device_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+dc->realize = genport_device_realize;
+dc->desc = "Generic Port";
+device_class_set_props(dc, genport_properties);
+}
+
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index e0bf39bf4cd6..5247554998b0 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -5,6 +5,7 @@ acpi_ss.add(files(
   'bios-linker-loader.c',
   'core.c',
   'utils.c',
+  'genport.c',
 ))
 acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu.c', 
'cpu_hotplug.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_false: 
files('acpi-cpu-hotplug-stub.c'))
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 0d9e610af12b..db850bfd170d 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1514,12 +1514,22 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
 if (pci_bus_is_cxl(bus)) {
-CxlHBDev *hb_entry;
+CxlHBDev *hb_entry, *match;
+bool found = false;
 struct Aml *pkg = aml_package(2);
 
-hb_entry = g_malloc0(sizeof(*hb_entry));
-hb_entry->uid = bus_num;
-QSLIST_INSERT_HEAD(&cxl_hb_list_head, hb_entry, entry);
+QSLIST_FOREACH(match, &cxl_hb_list_head, entry)
+{
+if (match->uid == bus_num) {
+found = true;
+break;
+}
+}
+if (!found) {
+hb_entry = g_malloc0(sizeof(*hb_entry));
+hb_entry->uid = bus_num;
+QSLIST_INSERT_HEAD(&cxl_hb_list_head, hb_entry, entry);
+}
 
 aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0016")));
 aml_append(pkg, aml_eisaid("PNP0A08"));
@@ -1892,6 +1902,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 NULL);
 AcpiTable table = { .sig = "SRAT", .rev = 1, .oem_id = x86ms->oem_id,
 .oem_table_id = x86ms->oem_table_id };
+int pxm_domain;
 
 acpi_table_begin(&table, table_data);
 build_append_int_noprefix(table_data, 1, 4); /* Reserved */
@@ -1986,16 +1997,23 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 
 sgx_epc_build_srat(table_data);
 
+/* FIXME: this is a hack, need a node property for genport */
+pxm_domain = 6;
 QSLIST_FOREACH(hb_entry, &cxl_hb_list_head, entry)
 {
 ACPIDeviceHandle handle = {
 .hid = "ACPI0016",
-.uid = hb_entry->uid,
+.reserved = { 0 },
 };
+char uid_str[5];
 uint32_t flags = GEN_AFFINITY_ENABLED;
 
-build_srat_generic_port_affinity(table_data, 0, nb_numa_nodes,
- &handle, flags);
+snprintf(uid_str, 4, "%u", hb_entry->uid);
+

[RFC PATCH 0/3] QEMU ACPI generic port support

2023-04-18 Thread Dave Jiang
This small RFC patch series is really a hack on what I need from qemu rather
than a proper implementation. I'm hoping to get some guidance from the list on
how to implement this correctly for qemu upstream. Thank you!

The patch series provides support for the ACPI Generic Port that's
defined by ACPI spec 6.5 5.2.16.7 (Generic Port Affinity Structure). The
series also adds a genport object that allows locality data to be injected via
the qemu command line into the HMAT tables. The generic port support allows a
hot plugged CXL memory device to calculate the locality data from the CPU to
the CXL device. The generic port related data provides the locality data from
the CPU to the CXL host bridge (latency and bandwidth). This data, in
addition to the PCIe link data, the CDAT from the device, and the CXL switch
CDAT if a switch exists, provides the locality data for the entire path.

Patch1: Adds Generic Port Affinity Structure sub-tables to the SRAT. For
each CXL Host Bridge (HB) a GPAS entry is created with a unique proximity
domain. For example, if the system is created with 4 proximity domains (PXM) for
system memory, then the next GPAS will get PXM 4 and so on.

Patch2: Add the json support for generic port. Split out because
clang-format really clobbers the json files.

Patch3: Add a generic port object. The intention here is to allow setup of
numa nodes, add hmat-lb data and node distance for the generic targets. I had to
add a hack in qemu_create_cli_devices() to realize the genport objects. I need
guidance on where and how to do this properly so the genport objects
realize at the correct place and time.
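
The proximity domain numbering described for Patch1 (generic ports numbered right after the system memory domains) could be sketched as below. This is an illustration only: `assign_genport_pxm()` is a hypothetical helper and does not exist in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: give each CXL host bridge the next free proximity
 * domain after the nb_numa_nodes domains used for system memory.
 */
static void assign_genport_pxm(uint32_t nb_numa_nodes, uint32_t *pxm,
                               size_t nb_host_bridges)
{
    assert(pxm != NULL || nb_host_bridges == 0);

    for (size_t i = 0; i < nb_host_bridges; i++) {
        pxm[i] = nb_numa_nodes + (uint32_t)i;
    }
}
```

With 4 system memory domains and 2 host bridges, the generic ports get PXM 4 and PXM 5, matching the example given for Patch1.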

Example of genport setup:
-object genport,id=$X -numa node,genport=genport$X,nodeid=$Y,initiator=$Z
-numa hmat-lb,initiator=$Z,target=$X,hierarchy=memory,data-type=access-latency,latency=$latency
-numa hmat-lb,initiator=$Z,target=$X,hierarchy=memory,data-type=access-bandwidth,bandwidth=$bandwidthM
for ((i = 0; i < total_nodes; i++)); do
    for ((j = 0; j < cxl_hbs; j++ )); do    # 2 CXL HBs
        -numa dist,src=$i,dst=$X,val=$dist
    done
done
Linux kernel support:
https://lore.kernel.org/linux-cxl/168088732996.1441063.10107817505475386072.stgit@djiang5-mobl3/T/#t

---

Dave Jiang (3):
  hw/acpi: Add support for Generic Port Affinity Structure to SRAT
  genport: Add json support for generic port
  acpi: add generic port device object


 hw/acpi/aml-build.c | 21 +
 hw/acpi/genport.c   | 61 +
 hw/acpi/meson.build |  1 +
 hw/i386/acpi-build.c| 45 +++
 include/hw/acpi/aml-build.h | 27 
 qapi/machine.json   |  3 +-
 qapi/qom.json   | 12 
 softmmu/vl.c| 26 
 8 files changed, 195 insertions(+), 1 deletion(-)
 create mode 100644 hw/acpi/genport.c

--




[RFC PATCH 2/3] genport: Add json support for generic port

2023-04-18 Thread Dave Jiang
Add QOM json update for ACPI generic port object to support HMAT
enumeration.

Signed-off-by: Dave Jiang 
---
 qapi/machine.json |3 ++-
 qapi/qom.json |   12 
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index 068427b8feb8..39cb5bd713f6 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -479,7 +479,8 @@
'*cpus':   ['uint16'],
'*mem':'size',
'*memdev': 'str',
-   '*initiator': 'uint16' }}
+   '*initiator': 'uint16',
+   '*genport': 'str' }}
 
 ##
 # @NumaDistOptions:
diff --git a/qapi/qom.json b/qapi/qom.json
index 30e76653ad28..8f5faff49114 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -444,6 +444,16 @@
   'base': 'NetfilterProperties',
   'data': { '*vnet_hdr_support': 'bool' } }
 
+##
+# @GenericPortDeviceProperties:
+#
+# Properties for generic port devices.
+#
+# Since: 7.2
+##
+{ 'struct': 'GenericPortDeviceProperties',
+  'data': {} }
+
 ##
 # @InputBarrierProperties:
 #
@@ -886,6 +896,7 @@
 'filter-redirector',
 'filter-replay',
 'filter-rewriter',
+'genport',
 'input-barrier',
 { 'name': 'input-linux',
   'if': 'CONFIG_LINUX' },
@@ -955,6 +966,7 @@
   'filter-redirector':  'FilterRedirectorProperties',
   'filter-replay':  'NetfilterProperties',
   'filter-rewriter':'FilterRewriterProperties',
+  'genport':'GenericPortDeviceProperties',
   'input-barrier':  'InputBarrierProperties',
   'input-linux':{ 'type': 'InputLinuxProperties',
   'if': 'CONFIG_LINUX' },





[RFC PATCH 1/3] hw/acpi: Add support for Generic Port Affinity Structure to SRAT

2023-04-18 Thread Dave Jiang
The Generic Port Affinity Structure is added for the System Resource
Affinity Table in ACPI r6.5. It provides information on the proximity
domain that's associated with a device handle. This information in
combination with HMAT can be used by the CXL driver to calculate the
bandwidth and latency information between the CPU and the CXL Host Bridge
(HB).

Add a list to account for the ACPI0016 (CXL HB ACPI devices) being
created. Create GPAS entries equivalent to the number of HB devices
constructed by qemu using the list and inject the relevant device handle.

The proximity domain will be set to 0 for simplicity to enable Linux kernel
side debugging and usage of the new SRAT sub-tables.
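
For reference, the 32-byte layout that this patch emits field by field can be sketched as a standalone encoder. The field order and sizes follow the `build_append_int_noprefix()` calls in the diff below; `put_le()` and `encode_gpas()` are hypothetical names, and the 16-byte handle is passed as opaque bytes rather than through the patch's `ACPIDeviceHandle` type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPAS_LEN 32

/* Store the low nbytes of val at buf[pos] in little-endian order. */
static size_t put_le(uint8_t *buf, size_t pos, uint64_t val, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++) {
        buf[pos + i] = (uint8_t)(val >> (8 * i));
    }
    return pos + nbytes;
}

/* Encode one Generic Port Affinity Structure; returns bytes written. */
static size_t encode_gpas(uint8_t out[GPAS_LEN], uint8_t handle_type,
                          uint32_t pxm, const uint8_t handle[16],
                          uint32_t flags)
{
    size_t pos = 0;

    pos = put_le(out, pos, 6, 1);           /* Type */
    pos = put_le(out, pos, GPAS_LEN, 1);    /* Length */
    pos = put_le(out, pos, 0, 1);           /* Reserved */
    pos = put_le(out, pos, handle_type, 1); /* Device Handle Type */
    pos = put_le(out, pos, pxm, 4);         /* Proximity Domain */
    memcpy(out + pos, handle, 16);          /* Device Handle */
    pos += 16;
    pos = put_le(out, pos, flags, 4);       /* Flags */
    pos = put_le(out, pos, 0, 4);           /* Reserved */

    assert(pos == GPAS_LEN);
    return pos;
}
```

The field sizes add up to 1 + 1 + 1 + 1 + 4 + 16 + 4 + 4 = 32 bytes, which is the value written into the Length field.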

Signed-off-by: Dave Jiang 
---
 hw/acpi/aml-build.c |   21 +
 hw/i386/acpi-build.c|   27 +++
 include/hw/acpi/aml-build.h |   27 +++
 3 files changed, 75 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index ea331a20d131..949759efc0a7 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1938,6 +1938,27 @@ void build_srat_memory(GArray *table_data, uint64_t base,
 build_append_int_noprefix(table_data, 0, 8); /* Reserved */
 }
 
+/*
+ * ACPI spec, Revision 6.5
+ * 5.2.16.7 Generic Port Affinity Structure
+ */
+void build_srat_generic_port_affinity(GArray *table_data, uint8_t htype,
+  int node, ACPIDeviceHandle *handle,
+  GenericAffinityFlags flags)
+{
+build_append_int_noprefix(table_data, 6, 1); /* Type */
+build_append_int_noprefix(table_data, 32, 1);/* Length */
+build_append_int_noprefix(table_data, 0, 1); /* Reserved */
+build_append_int_noprefix(table_data, htype, 1); /* Device Handle Type */
+build_append_int_noprefix(table_data, node, 4);  /* Proximity Domain */
+build_append_int_noprefix(table_data, handle->raw[0],
+  8); /* Device Handle part 1 */
+build_append_int_noprefix(table_data, handle->raw[1],
+  8);/* Device Handle part 2 */
+build_append_int_noprefix(table_data, flags, 4); /* Flags */
+build_append_int_noprefix(table_data, 0, 4); /* Reserved */
+}
+
 /*
  * ACPI spec 5.2.17 System Locality Distance Information Table
  * (Revision 2.0 or later)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index d449e5b76f30..0d9e610af12b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -132,6 +132,13 @@ const struct AcpiGenericAddress x86_nvdimm_acpi_dsmio = {
 .bit_width = NVDIMM_ACPI_IO_LEN << 3
 };
 
+typedef struct CxlHBDev {
+uint32_t uid;
+QSLIST_ENTRY(CxlHBDev) entry;
+} CxlHBDev;
+
+static QSLIST_HEAD(, CxlHBDev) cxl_hb_list_head;
+
 static void init_common_fadt_data(MachineState *ms, Object *o,
   AcpiFadtData *data)
 {
@@ -1507,8 +1514,13 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
 if (pci_bus_is_cxl(bus)) {
+CxlHBDev *hb_entry;
 struct Aml *pkg = aml_package(2);
 
+hb_entry = g_malloc0(sizeof(*hb_entry));
+hb_entry->uid = bus_num;
+QSLIST_INSERT_HEAD(&cxl_hb_list_head, hb_entry, entry);
+
 aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0016")));
 aml_append(pkg, aml_eisaid("PNP0A08"));
 aml_append(pkg, aml_eisaid("PNP0A03"));
@@ -1866,6 +1878,7 @@ static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
 int i;
+CxlHBDev *hb_entry;
 int numa_mem_start, slots;
 uint64_t mem_len, mem_base, next_base;
 MachineClass *mc = MACHINE_GET_CLASS(machine);
@@ -1973,6 +1986,18 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 
 sgx_epc_build_srat(table_data);
 
+QSLIST_FOREACH(hb_entry, &cxl_hb_list_head, entry)
+{
+ACPIDeviceHandle handle = {
+.hid = "ACPI0016",
+.uid = hb_entry->uid,
+};
+uint32_t flags = GEN_AFFINITY_ENABLED;
+
+build_srat_generic_port_affinity(table_data, 0, nb_numa_nodes,
+ &handle, flags);
+}
+
 /*
  * TODO: this part is not in ACPI spec and current linux kernel boots fine
  * without these entries. But I recall there were issues the last time I
@@ -2728,6 +2753,8 @@ void acpi_setup(void)
 return;
 }
 
+QSLIST_INIT(&cxl_hb_list_head);
+
 build_state = g_malloc0(sizeof *build_state);
 
 acpi_build_tables_init();
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index d1fb08514bfa..32a4f574abaa 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -204,6 +204,10 @@ typedef 

[PATCH 0/3] migration/hostmem: Allow to fail early for postcopy on specific fs type

2023-04-18 Thread Peter Xu
Postcopy can fail in a weird way when guest mem is put onto a random file:

https://bugzilla.redhat.com/show_bug.cgi?id=2057267

It's because we only check the userfault privilege on the dest QEMU but don't
check memory types.  The memory type only gets checked at UFFDIO_REGISTER,
right after we switch from precopy to postcopy live migration, which can be
too late.

This series tries to make it fail early by checking ramblock fs type if
backed by a memory-backend-file.

Now when it happens it'll fail the dest QEMU from the start:

./qemu-system-x86_64 \
-global migration.x-postcopy-ram=on \
-incoming defer \
-object memory-backend-file,id=mem,size=128M,mem-path=$memfile \
-machine memory-backend=mem

qemu-system-x86_64: Host backend files need to be TMPFS or HUGETLBFS only
qemu-system-x86_64: Postcopy is not supported

It will also fail e.g. QMP migrate-set-capabilities properly.

Please have a look, thanks.

Peter Xu (3):
  hostmem: Detect and cache fs type for file hostmem
  vl.c: Create late backends before migration object
  migration/postcopy: Detect file system on dest host

 backends/hostmem-file.c  | 37 -
 include/sysemu/hostmem.h |  1 +
 migration/postcopy-ram.c | 28 
 softmmu/vl.c |  9 +++--
 4 files changed, 68 insertions(+), 7 deletions(-)

-- 
2.39.1




[PATCH 1/3] hostmem: Detect and cache fs type for file hostmem

2023-04-18 Thread Peter Xu
Detect the file system for a memory-backend-file object and cache it within
the object if possible when CONFIG_LINUX (using statfs).

Only support the two important types of memory (tmpfs, hugetlbfs) and keep
the rest as "unknown" for now.
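
As a standalone illustration of the detection described above (a sketch, not the patch itself), the following maps a statfs() f_type value to the three names the series uses. The helper names fs_name_from_magic() and fs_name_for_path() are hypothetical. The magic numbers are the values published in linux/magic.h, repeated here as literals so the sketch does not depend on kernel headers. It is Linux-only.

```c
#include <assert.h>
#include <string.h>
#include <sys/vfs.h>   /* statfs(); Linux-specific */

/* Hypothetical helper: translate a filesystem magic number to a name. */
static const char *fs_name_from_magic(unsigned long magic)
{
    switch (magic & 0xffffffffUL) {
    case 0x01021994UL:          /* TMPFS_MAGIC */
        return "tmpfs";
    case 0x958458f6UL:          /* HUGETLBFS_MAGIC */
        return "hugetlbfs";
    default:
        return "unknown";
    }
}

/* Hypothetical helper: report the filesystem type backing 'path'. */
static const char *fs_name_for_path(const char *path)
{
    struct statfs fs;

    if (statfs(path, &fs) != 0) {
        /* Treat any lookup failure the same as an unrecognised type. */
        return "unknown";
    }
    return fs_name_from_magic((unsigned long)fs.f_type);
}
```

On a typical Linux host, fs_name_for_path("/dev/shm") would be expected to report "tmpfs", though that depends on how the host is configured.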

Signed-off-by: Peter Xu 
---
 backends/hostmem-file.c  | 37 -
 include/sysemu/hostmem.h |  1 +
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..2484e45a11 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -18,13 +18,17 @@
 #include "sysemu/hostmem.h"
 #include "qom/object_interfaces.h"
 #include "qom/object.h"
+#ifdef CONFIG_LINUX
+#include 
+#include 
+#endif
 
 OBJECT_DECLARE_SIMPLE_TYPE(HostMemoryBackendFile, MEMORY_BACKEND_FILE)
 
 
 struct HostMemoryBackendFile {
 HostMemoryBackend parent_obj;
-
+__fsword_t fs_type;
 char *mem_path;
 uint64_t align;
 bool discard_data;
@@ -52,6 +56,15 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 return;
 }
 
+#ifdef CONFIG_LINUX
+struct statfs fs;
+if (!statfs(fb->mem_path, &fs)) {
+fb->fs_type = fs.f_type;
+} else {
+fb->fs_type = 0;
+}
+#endif
+
 name = host_memory_backend_get_name(backend);
 ram_flags = backend->share ? RAM_SHARED : 0;
 ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
@@ -181,6 +194,28 @@ static void file_backend_unparent(Object *obj)
 }
 }
 
+const char *file_memory_backend_get_fs_type(Object *obj)
+{
+#ifdef CONFIG_LINUX
+HostMemoryBackendFile *fb = (HostMemoryBackendFile *)
+object_dynamic_cast(obj, TYPE_MEMORY_BACKEND_FILE);
+
+if (!fb) {
+goto out;
+}
+
+switch (fb->fs_type) {
+case TMPFS_MAGIC:
+return "tmpfs";
+case HUGETLBFS_MAGIC:
+return "hugetlbfs";
+}
+
+out:
+#endif
+return "unknown";
+}
+
 static void
 file_backend_class_init(ObjectClass *oc, void *data)
 {
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 39326f1d4f..0354cffa6b 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -81,5 +81,6 @@ void host_memory_backend_set_mapped(HostMemoryBackend 
*backend, bool mapped);
 bool host_memory_backend_is_mapped(HostMemoryBackend *backend);
 size_t host_memory_backend_pagesize(HostMemoryBackend *memdev);
 char *host_memory_backend_get_name(HostMemoryBackend *backend);
+const char *file_memory_backend_get_fs_type(Object *obj);
 
 #endif
-- 
2.39.1




Re: [PATCH] .gitlab-ci.d/cirrus: Drop the CI job for compiling with FreeBSD 12

2023-04-18 Thread Alex Bennée


Thomas Huth  writes:

> FreeBSD 13.0 has been released in April 2021:
>
>  https://www.freebsd.org/releases/13.0R/announce/
>
> According to QEMU's support policy, we stop supporting the previous
> major release two years after the new major release has been
> published. So we can stop testing FreeBSD 12 in our CI now.
>
> Signed-off-by: Thomas Huth 

Queued to testing/next, thanks.

> ---
>  We should likely also update tests/vm/freebsd ... however, FreeBSD 13
>  seems not to use the serial console by default anymore, so I've got
>  no clue how we could use their images now... Does anybody have any
>  suggestions?

Don't we have ssh support for all the tests/vm images?

>
>  .gitlab-ci.d/cirrus.yml | 13 -
>  .gitlab-ci.d/cirrus/freebsd-12.vars | 16 
>  tests/lcitool/refresh   |  1 -
>  3 files changed, 30 deletions(-)
>  delete mode 100644 .gitlab-ci.d/cirrus/freebsd-12.vars
>
> diff --git a/.gitlab-ci.d/cirrus.yml b/.gitlab-ci.d/cirrus.yml
> index 502dfd612c..1507c928e5 100644
> --- a/.gitlab-ci.d/cirrus.yml
> +++ b/.gitlab-ci.d/cirrus.yml
> @@ -44,19 +44,6 @@
>variables:
>  QEMU_JOB_CIRRUS: 1
>  
> -x64-freebsd-12-build:
> -  extends: .cirrus_build_job
> -  variables:
> -NAME: freebsd-12
> -CIRRUS_VM_INSTANCE_TYPE: freebsd_instance
> -CIRRUS_VM_IMAGE_SELECTOR: image_family
> -CIRRUS_VM_IMAGE_NAME: freebsd-12-4
> -CIRRUS_VM_CPUS: 8
> -CIRRUS_VM_RAM: 8G
> -UPDATE_COMMAND: pkg update; pkg upgrade -y
> -INSTALL_COMMAND: pkg install -y
> -TEST_TARGETS: check
> -
>  x64-freebsd-13-build:
>extends: .cirrus_build_job
>variables:
> diff --git a/.gitlab-ci.d/cirrus/freebsd-12.vars 
> b/.gitlab-ci.d/cirrus/freebsd-12.vars
> deleted file mode 100644
> index 44d8a2a511..00
> --- a/.gitlab-ci.d/cirrus/freebsd-12.vars
> +++ /dev/null
> @@ -1,16 +0,0 @@
> -# THIS FILE WAS AUTO-GENERATED
> -#
> -#  $ lcitool variables freebsd-12 qemu
> -#
> -# https://gitlab.com/libvirt/libvirt-ci
> -
> -CCACHE='/usr/local/bin/ccache'
> -CPAN_PKGS=''
> -CROSS_PKGS=''
> -MAKE='/usr/local/bin/gmake'
> -NINJA='/usr/local/bin/ninja'
> -PACKAGING_COMMAND='pkg'
> -PIP3='/usr/local/bin/pip-3.8'
> -PKGS='alsa-lib bash bison bzip2 ca_root_nss capstone4 ccache 
> cdrkit-genisoimage cmocka ctags curl cyrus-sasl dbus diffutils dtc flex 
> fusefs-libs3 gettext git glib gmake gnutls gsed gtk3 json-c libepoxy libffi 
> libgcrypt libjpeg-turbo libnfs libslirp libspice-server libssh libtasn1 llvm 
> lzo2 meson ncurses nettle ninja opencv pixman pkgconf png py39-numpy 
> py39-pillow py39-pip py39-sphinx py39-sphinx_rtd_theme py39-yaml python3 
> rpm2cpio sdl2 sdl2_image snappy sndio socat spice-protocol tesseract usbredir 
> virglrenderer vte3 zstd'
> -PYPI_PKGS=''
> -PYTHON='/usr/local/bin/python3'
> diff --git a/tests/lcitool/refresh b/tests/lcitool/refresh
> index c0d7ad5516..4c568242d2 100755
> --- a/tests/lcitool/refresh
> +++ b/tests/lcitool/refresh
> @@ -182,7 +182,6 @@ try:
>  #
>  # Cirrus packages lists for GitLab
>  #
> -generate_cirrus("freebsd-12")
>  generate_cirrus("freebsd-13")
>  generate_cirrus("macos-12")


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PATCH 1/2] Reduce vdpa initialization / startup overhead

2023-04-18 Thread peili . dev
From: Pei Li 

Currently, part of the vdpa initialization / startup process
needs to trigger many ioctls per vq, which is very inefficient
and causes unnecessary context switches between user mode and
kernel mode.

This patch creates an additional ioctl() command, namely
VHOST_VDPA_SET_VRING_ENABLE_BATCH, that batches
VHOST_VDPA_SET_VRING_ENABLE commands into a single
ioctl() call.
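
To make the batching convention easier to review, here is a minimal sketch of the array layout that the VHOST_VDPA_SET_VRING_ENABLE_BATCH path in the diff below builds: slot 0 carries the number of entries in its num field, and slots 1..nvqs each carry one enable request. struct vring_state is a local stand-in for the kernel's struct vhost_vring_state, and fill_enable_batch() is a hypothetical name. Unlike the patch, the sketch zeroes the unused index field of slot 0.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's struct vhost_vring_state. */
struct vring_state {
    unsigned int index;
    unsigned int num;
};

/*
 * Fill a batch request: slot 0 is a header holding the entry count,
 * slots 1..nvqs hold one enable request each.
 * 'states' must have room for nvqs + 1 entries.
 */
static void fill_enable_batch(struct vring_state *states,
                              unsigned int first_index, unsigned int nvqs)
{
    assert(states != NULL);

    states[0].index = 0;    /* unused in the header slot */
    states[0].num = nvqs;   /* number of requests that follow */

    for (unsigned int i = 1; i <= nvqs; i++) {
        states[i].index = first_index + i - 1;
        states[i].num = 1;  /* 1 = enable the ring */
    }
}
```

A caller would then hand the whole array to a single ioctl() instead of issuing nvqs separate ones.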

Signed-off-by: Pei Li 
---
 hw/virtio/vhost-vdpa.c   | 31 +++-
 include/standard-headers/linux/vhost_types.h |  3 ++
 linux-headers/linux/vhost.h  |  7 +
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index bc6bad23d5..6d45ff8539 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -679,7 +679,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
 uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
 0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
 0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
-0x1ULL << VHOST_BACKEND_F_SUSPEND;
+0x1ULL << VHOST_BACKEND_F_SUSPEND |
+0x1ULL << VHOST_BACKEND_F_IOCTL_BATCH;
 int r;
 
 if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, )) {
@@ -731,14 +732,28 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, 
int idx)
 
 static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
 {
-int i;
+int i, nvqs = dev->nvqs;
+uint64_t backend_features = dev->backend_cap;
+
 trace_vhost_vdpa_set_vring_ready(dev);
-for (i = 0; i < dev->nvqs; ++i) {
-struct vhost_vring_state state = {
-.index = dev->vq_index + i,
-.num = 1,
-};
-vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
+
+if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_IOCTL_BATCH))) {
+for (i = 0; i < nvqs; ++i) {
+struct vhost_vring_state state = {
+.index = dev->vq_index + i,
+.num = 1,
+};
+vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
+}
+} else {
+struct vhost_vring_state states[nvqs + 1];
+states[0].num = nvqs;
+for (i = 1; i <= nvqs; ++i) {
+states[i].index = dev->vq_index + i - 1;
+states[i].num = 1;
+}
+
+vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE_BATCH, &states[0]);
 }
 return 0;
 }
diff --git a/include/standard-headers/linux/vhost_types.h 
b/include/standard-headers/linux/vhost_types.h
index c41a73fe36..068d0e1ceb 100644
--- a/include/standard-headers/linux/vhost_types.h
+++ b/include/standard-headers/linux/vhost_types.h
@@ -164,4 +164,7 @@ struct vhost_vdpa_iova_range {
 /* Device can be suspended */
 #define VHOST_BACKEND_F_SUSPEND  0x4
 
+/* IOCTL requests can be batched */
+#define VHOST_BACKEND_F_IOCTL_BATCH 0x6
+
 #endif
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index f9f115a7c7..4c9ddd0a0e 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -180,4 +180,11 @@
  */
 #define VHOST_VDPA_SUSPEND _IO(VHOST_VIRTIO, 0x7D)
 
+/* Batch version of VHOST_VDPA_SET_VRING_ENABLE
+ *
+ * Enable/disable the ring while batching the commands.
+ */
+#define VHOST_VDPA_SET_VRING_ENABLE_BATCH  _IOW(VHOST_VIRTIO, 0x7F, \
+struct vhost_vring_state)
+
 #endif
-- 
2.25.1




[PATCH 2/2] Reduce vdpa initialization / startup overhead

2023-04-18 Thread peili . dev
From: Pei Li 

Currently, part of the vdpa initialization / startup process
needs to trigger many ioctls per vq, which is very inefficient
and causes unnecessary context switches between user mode and
kernel mode.

This patch creates an additional ioctl() command, namely
VHOST_VDPA_GET_VRING_GROUP_BATCH, that batches
VHOST_VDPA_GET_VRING_GROUP commands into a single
ioctl() call.
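
Once the groups for all virtqueues have been fetched in one call, the remaining work is a pure comparison. The following sketch shows that check in isolation, assuming the groups have already been copied into a plain array with the control virtqueue last. cvq_group_is_isolated() is a hypothetical name and does not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * groups[i] is the group reported for virtqueue i; the control
 * virtqueue is the last entry (index nvqs - 1).
 * Returns true when no data virtqueue shares the CVQ's group.
 */
static bool cvq_group_is_isolated(const int64_t *groups, size_t nvqs)
{
    assert(groups != NULL && nvqs > 0);

    int64_t cvq_group = groups[nvqs - 1];

    for (size_t i = 0; i + 1 < nvqs; i++) {
        if (groups[i] == cvq_group) {
            return false;
        }
    }
    return true;
}
```

Keeping the comparison separate from the ioctl() makes it straightforward to check that the batched and unbatched paths reach the same answer.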

Signed-off-by: Pei Li 
---
 linux-headers/linux/vhost.h | 10 ++
 net/vhost-vdpa.c| 70 +++--
 2 files changed, 70 insertions(+), 10 deletions(-)

diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index 4c9ddd0a0e..f7cfa324c4 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -187,4 +187,14 @@
 #define VHOST_VDPA_SET_VRING_ENABLE_BATCH  _IOW(VHOST_VIRTIO, 0x7F, \
 struct vhost_vring_state)
 
+/* Batch version of VHOST_VDPA_GET_VRING_GROUP
+ *
+ * Get the group for a virtqueue: read index, write group in num,
+ * The virtqueue index is stored in the index field of
+ * vhost_vring_state. The group for this specific virtqueue is
+ * returned via num field of vhost_vring_state while batching commands.
+ */
+#define VHOST_VDPA_GET_VRING_GROUP_BATCH   _IOWR(VHOST_VIRTIO, 0x82, \
+ struct vhost_vring_state)
+
 #endif
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 99904a0da7..ed4f2d5c49 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -377,6 +377,47 @@ static int64_t vhost_vdpa_get_vring_group(int device_fd, 
unsigned vq_index)
 return state.num;
 }
 
+static int64_t vhost_vdpa_get_vring_group_batch(int device_fd, unsigned 
vq_index)
+{
+int r;
+struct vhost_vring_state states[vq_index + 1];
+int64_t cvq_group;
+
+states[0].num = vq_index;
+
+for (int i = 1; i <= vq_index; ++i) {
+states[i].index = i - 1;
+}
+
+r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP_BATCH, &states[0]);
+
+if (unlikely(r < 0)) {
+error_report("Cannot get VQ %d group: %s", vq_index - 1,
+ g_strerror(errno));
+return r;
+}
+
+cvq_group = states[vq_index].num;
+
+if (unlikely(cvq_group < 0)) {
+return cvq_group;
+}
+
+for (int i = 1; i < vq_index; ++i) {
+int64_t group = states[i].num;
+
+if (unlikely(group < 0)) {
+return group;
+}
+
+if (group == cvq_group) {
+return 0;
+}
+}
+
+return vq_index;
+}
+
 static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
unsigned vq_group,
unsigned asid_num)
@@ -512,19 +553,28 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
  * than the last vq. VQ group of last group passed in cvq_group.
  */
 cvq_index = v->dev->vq_index_end - 1;
-cvq_group = vhost_vdpa_get_vring_group(v->device_fd, cvq_index);
-if (unlikely(cvq_group < 0)) {
-return cvq_group;
-}
-for (int i = 0; i < cvq_index; ++i) {
-int64_t group = vhost_vdpa_get_vring_group(v->device_fd, i);
 
-if (unlikely(group < 0)) {
-return group;
+if (! (backend_features & BIT_ULL(VHOST_BACKEND_F_IOCTL_BATCH))) {
+cvq_group = vhost_vdpa_get_vring_group(v->device_fd, cvq_index);
+if (unlikely(cvq_group < 0)) {
+return cvq_group;
 }
+for (int i = 0; i < cvq_index; ++i) {
+int64_t group = vhost_vdpa_get_vring_group(v->device_fd, i);
 
-if (group == cvq_group) {
-return 0;
+if (unlikely(group < 0)) {
+return group;
+}
+
+if (group == cvq_group) {
+return 0;
+}
+}
+} else {
+cvq_group = vhost_vdpa_get_vring_group_batch(v->device_fd, cvq_index + 
1);
+
+if (unlikely(cvq_group <= 0)) {
+return cvq_group;
 }
 }
 
-- 
2.25.1




Move vhost-user SET_STATUS 0 after get vring base?

2023-04-18 Thread Stefan Hajnoczi
Hi,
Cindy's commit ca71db438bdc ("vhost: implement vhost_dev_start method")
added SET_STATUS calls to vhost_dev_start() and vhost_dev_stop() for all
vhost backends.

Eugenio's commit c3716f260bff ("vdpa: move vhost reset after get vring
base") deferred the SET_STATUS 0 call in vhost_dev_stop() until after
GET_VRING_BASE for vDPA only. In that commit Eugenio said, "A patch to
make vhost_user_dev_start more similar to vdpa is desirable, but it can
be added on top".

I agree and think it's a good idea to keep the vhost backends in sync
where possible.

vhost-user still has the old behavior where QEMU sends SET_STATUS 0
before GET_VRING_BASE. Most existing vhost-user backends don't implement
the SET_STATUS message, so I think no one has tripped over this yet.

Any thoughts on making vhost-user behave like vDPA here?

Stefan




[PATCH v2 13/13] docs/system: add a basic enumeration of vhost-user devices

2023-04-18 Thread Alex Bennée
Make it clear the vhost-user-device is intended for expert use only.

Signed-off-by: Alex Bennée 

---
v2
  - make clear vhost-user-device for expert use
---
 docs/system/devices/vhost-user-rng.rst |  2 ++
 docs/system/devices/vhost-user.rst | 41 ++
 2 files changed, 43 insertions(+)

diff --git a/docs/system/devices/vhost-user-rng.rst 
b/docs/system/devices/vhost-user-rng.rst
index a145d4105c..ead1405326 100644
--- a/docs/system/devices/vhost-user-rng.rst
+++ b/docs/system/devices/vhost-user-rng.rst
@@ -1,3 +1,5 @@
+.. _vhost_user_rng:
+
 QEMU vhost-user-rng - RNG emulation
 ===
 
diff --git a/docs/system/devices/vhost-user.rst 
b/docs/system/devices/vhost-user.rst
index 86128114fa..7038cece3e 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -15,6 +15,47 @@ to the guest. The code is mostly boilerplate although each 
device has
 a ``chardev`` option which specifies the ID of the ``--chardev``
 device that connects via a socket to the vhost-user *daemon*.
 
+Each device will have a virtio-mmio and virtio-pci variant. See your
+platform details for what sort of virtio bus to use.
+
+.. list-table:: vhost-user devices
+  :widths: 20 20 60
+  :header-rows: 1
+
+  * - Device
+- Type
+- Notes
+  * - vhost-user-device
+- Generic Development Device
+- You must manually specify ``virtio-id`` and the correct ``num_vqs``. 
Intended for expert use.
+  * - vhost-user-blk
+- Block storage
+-
+  * - vhost-user-fs
+- File based storage driver
+- See https://gitlab.com/virtio-fs/virtiofsd
+  * - vhost-user-scsi
+- SCSI based storage
+- See contrib/vhost-user/scsi
+  * - vhost-user-gpio
+- Proxy gpio pins to host
+- See https://github.com/rust-vmm/vhost-device
+  * - vhost-user-i2c
+- Proxy i2c devices to host
+- See https://github.com/rust-vmm/vhost-device
+  * - vhost-user-input
+- Generic input driver
+- See contrib/vhost-user-input
+  * - vhost-user-rng
+- Entropy driver
+- :ref:`vhost_user_rng`
+  * - vhost-user-gpu
+- GPU driver
+-
+  * - vhost-user-vsock
+- Socket based communication
+- See https://github.com/rust-vmm/vhost-device
+
 vhost-user daemon
 =
 
-- 
2.39.2




Re: [PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests

2023-04-18 Thread Andrey Ryabinin



On 4/18/23 08:17, Michael S. Tsirkin wrote:
> On Tue, Apr 18, 2023 at 05:13:11AM +, Raphael Norwitz wrote:
>> Hey Andrey - apologies for the late reply here.
>>
>> It sounds like you are dealing with a buggy guest, rather than a QEMU issue.
>>
>>> On Apr 10, 2023, at 11:39 AM, Andrey Ryabinin  wrote:
>>>
>>>
>>>
>>> On 4/10/23 10:35, Andrey Ryabinin wrote:
>>>> Some guests hang on boot when using the vhost-user-blk-pci device,
>>>> but boot normally when using the virtio-blk device. The problem occurs
>>>> because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
>>>> before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to
>>
>> Virtio 1.1 Section 3.1.1, says during setup “[t]he driver MUST NOT notify 
>> the device before setting DRIVER_OK.”
>>
>> Therefore what you are describing is buggy guest behavior. Sounds like the 
>> driver should be made to either
>> - not advertise VIRTIO_F_VERSION_1
>> - not kick before setting VIRTIO_CONFIG_S_DRIVER_OK
>>
>> If anything, the virtio-blk virtio_blk_handle_output() function should 
>> probably check start_on_kick?
> 
> Question is, how easy is this guest to fix.
> 

I wouldn't count on that.

In this case the guest is a FortiGate firewall, apparently from these guys:
https://www.fortinet.com/
The kernel they use identifies itself as Linux 3.2.16, but it does not look
like a vanilla kernel; it seems to be modified with some backports. I'm
guessing that they backported the patches introducing VIRTIO_F_VERSION_1,
but didn't add this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7a11370e5e6c26566904bb7f08281093a3002ff2

I tried to find the sources of the kernel they use but failed to find any.
I found only some news about a GPL violation from 2005 )
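
For reference, the ordering rule quoted earlier in this thread (a VERSION_1 driver must not notify the device before setting DRIVER_OK) can be written as a small predicate. This is only a sketch: kick_violates_spec() is a hypothetical name, not a QEMU function, and it covers only the modern (VERSION_1) case discussed here. The two constants are redefined locally with the values from the virtio headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 4   /* device status bit */
#define VIRTIO_F_VERSION_1        32  /* feature bit number */

/*
 * Returns true when a queue notification arriving in this state
 * breaks the rule that a VERSION_1 driver must set DRIVER_OK
 * before it kicks.
 */
static bool kick_violates_spec(uint64_t guest_features, uint8_t status)
{
    bool modern = (guest_features & (1ULL << VIRTIO_F_VERSION_1)) != 0;
    bool driver_ok = (status & VIRTIO_CONFIG_S_DRIVER_OK) != 0;

    return modern && !driver_ok;
}
```

The guest described above would hit the first case: VERSION_1 negotiated, DRIVER_OK not yet set, and a kick already sent.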




Re: [PATCH 0/2] Migration time prediction using calc-dirty-rate

2023-04-18 Thread Daniel P . Berrangé
Juan,

This series could use some feedback from the migration maintainer
POV. I think it looks like a valuable idea to take which could
significantly help mgmt apps plan migration.

Daniel

On Tue, Apr 18, 2023 at 01:25:08PM +, Gudkov Andrei via wrote:
> ping5
> 
> https://patchew.org/QEMU/cover.1677589218.git.gudkov.and...@huawei.com/
> 
> -Original Message-
> From: Gudkov Andrei 
> Sent: Monday, April 10, 2023 18:19
> To: 'qemu-devel@nongnu.org' 
> Cc: 'quint...@redhat.com' ; 'dgilb...@redhat.com' 
> ; 'js...@redhat.com' ; 
> 'ebl...@redhat.com' 
> Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
> 
> ping4
> 
> https://patchew.org/QEMU/cover.1677589218.git.gudkov.and...@huawei.com/
> 
> -Original Message-
> From: Gudkov Andrei 
> Sent: Monday, April 3, 2023 17:42
> To: 'qemu-devel@nongnu.org' 
> Cc: 'quint...@redhat.com' ; 'dgilb...@redhat.com' 
> 
> Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
> 
> ping3
> 
> https://patchew.org/QEMU/cover.1677589218.git.gudkov.and...@huawei.com/
> 
> -Original Message-
> From: Gudkov Andrei 
> Sent: Monday, March 27, 2023 17:09
> To: 'qemu-devel@nongnu.org' 
> Cc: 'quint...@redhat.com' ; 'dgilb...@redhat.com' 
> 
> Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
> 
> ping2
> 
> https://patchew.org/QEMU/cover.1677589218.git.gudkov.and...@huawei.com/
> 
> -Original Message-
> From: Gudkov Andrei 
> Sent: Friday, March 17, 2023 16:29
> To: qemu-devel@nongnu.org
> Cc: quint...@redhat.com; dgilb...@redhat.com
> Subject: RE: [PATCH 0/2] Migration time prediction using calc-dirty-rate
> 
> ping
> 
> https://patchew.org/QEMU/cover.1677589218.git.gudkov.and...@huawei.com/
> 
> -Original Message-
> From: Gudkov Andrei 
> Sent: Tuesday, February 28, 2023 16:16
> To: qemu-devel@nongnu.org
> Cc: quint...@redhat.com; dgilb...@redhat.com; Gudkov Andrei 
> 
> Subject: [PATCH 0/2] Migration time prediction using calc-dirty-rate
> 
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v19 01/21] s390x/cpu topology: add s390 specifics to CPU topology

2023-04-18 Thread Pierre Morel



On 4/18/23 14:38, Nina Schoetterl-Glausch wrote:

On Tue, 2023-04-18 at 12:01 +0200, Pierre Morel wrote:

On 4/18/23 10:53, Nina Schoetterl-Glausch wrote:

On Mon, 2023-04-03 at 18:28 +0200, Pierre Morel wrote:

S390 adds two new SMP levels, drawers and books, to the CPU
topology.
S390 CPUs have specific topology features, dedication
and entitlement, which give the guest indications about the host
vCPU scheduling and help the guest make the best decisions
when scheduling threads on the vCPUs.

Let us provide the SMP properties with book and drawer levels
and the S390 CPU with dedication and entitlement.

Signed-off-by: Pierre Morel 
Reviewed-by: Thomas Huth 
---

[...]

diff --git a/qapi/machine-common.json b/qapi/machine-common.json
new file mode 100644
index 00..73ea38d976
--- /dev/null
+++ b/qapi/machine-common.json
@@ -0,0 +1,22 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or later.
+# See the COPYING file in the top-level directory.
+
+##
+# = Machines S390 data types
+##
+
+##
+# @CpuS390Entitlement:
+#
+# An enumeration of cpu entitlements that can be assumed by a virtual
+# S390 CPU
+#
+# Since: 8.1
+##
+{ 'enum': 'CpuS390Entitlement',
+  'prefix': 'S390_CPU_ENTITLEMENT',
+  'data': [ 'horizontal', 'low', 'medium', 'high' ] }

You can get rid of the horizontal value now that the entitlement is ignored if 
the
polarization is vertical.


Right, horizontal is not used, but what would you like?

- replace horizontal with 'none' ?

- add or subtract 1 when we do the conversion between enum string and
value ?

Yeah, I would completely drop it because it is a meaningless value
and adjust the conversion to the cpu value accordingly.

frankly I prefer to keep horizontal here which is exactly what is given
in the documentation for entitlement = 0

Not sure what you mean with this.


I mean: Extract from the PoP:



The following values are used:
PP Meaning
0 The one or more CPUs represented by the TLE are
horizontally polarized.
1 The one or more CPUs represented by the TLE are
vertically polarized. Entitlement is low.
2 The one or more CPUs represented by the TLE are
vertically polarized. Entitlement is medium.
3 The one or more CPUs represented by the TLE are
vertically polarized. Entitlement is high.



Also, I find it weird to use an enum and then systematically add/subtract
a value.


So I really prefer to keep "horizontal", "low", "medium", "high", even if
"horizontal" will never appear.


A matter of taste; it does not change anything in the functionality or
the API.
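
To make the trade-off concrete, here is a minimal sketch of the variant that keeps "horizontal": the enum values then coincide with the PP values from the PoP extract above, so the conversion is the identity and needs no +1/-1. All names here are hypothetical and only mirror the QAPI enum under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values chosen to match the PP field described in the PoP extract. */
enum cpu_entitlement {
    ENTITLEMENT_HORIZONTAL = 0,
    ENTITLEMENT_LOW = 1,
    ENTITLEMENT_MEDIUM = 2,
    ENTITLEMENT_HIGH = 3,
    ENTITLEMENT__MAX
};

static const char *const entitlement_names[ENTITLEMENT__MAX] = {
    "horizontal", "low", "medium", "high",
};

/* With "horizontal" kept, the enum value is the PP value itself. */
static unsigned int entitlement_to_pp(enum cpu_entitlement e)
{
    assert(e < ENTITLEMENT__MAX);
    return (unsigned int)e;
}

/* Returns the name for a PP value, or NULL when it is out of range. */
static const char *pp_to_name(unsigned int pp)
{
    return pp < ENTITLEMENT__MAX ? entitlement_names[pp] : NULL;
}
```

Dropping "horizontal" would shift every remaining value down by one, which is where the extra add/subtract in the conversion would come from.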







[...]


diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
index b10a8541ff..57165fa3a0 100644
--- a/target/s390x/cpu.c
+++ b/target/s390x/cpu.c
@@ -37,6 +37,7 @@
   #ifndef CONFIG_USER_ONLY
   #include "sysemu/reset.h"
   #endif
+#include "hw/s390x/cpu-topology.h"
   
   #define CR0_RESET   0xE0UL

   #define CR14_RESET  0xC200UL;
@@ -259,6 +260,12 @@ static gchar *s390_gdb_arch_name(CPUState *cs)
   static Property s390x_cpu_properties[] = {
   #if !defined(CONFIG_USER_ONLY)
   DEFINE_PROP_UINT32("core-id", S390CPU, env.core_id, 0),
+DEFINE_PROP_INT32("socket-id", S390CPU, env.socket_id, -1),
+DEFINE_PROP_INT32("book-id", S390CPU, env.book_id, -1),
+DEFINE_PROP_INT32("drawer-id", S390CPU, env.drawer_id, -1),
+DEFINE_PROP_BOOL("dedicated", S390CPU, env.dedicated, false),
+DEFINE_PROP_UINT8("entitlement", S390CPU, env.entitlement,
+  S390_CPU_ENTITLEMENT__MAX),

I would define an entitlement PropertyInfo in qdev-properties-system.[ch],
then one can use e.g.

-device z14-s390x-cpu,core-id=11,entitlement=high


Don't you think it is an enhancement we can do later?

It's a user visible change, so no.



We could have kept both string and integer.



But it's not complicated, should be just:

const PropertyInfo qdev_prop_cpus390entitlement = {
 .name = "CpuS390Entitlement",
 .enum_table = &CpuS390Entitlement_lookup,
 .get   = qdev_propinfo_get_enum,
 .set   = qdev_propinfo_set_enum,
 .set_default_value = qdev_propinfo_set_default_value_enum,
};

Plus a comment & build bug in qdev-properties-system.c

and

extern const PropertyInfo qdev_prop_cpus390entitlement;
#define DEFINE_PROP_CPUS390ENTITLEMENT(_n, _s, _f, _d) \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_cpus390entitlement, \
CpuS390Entitlement)

in qdev-properties-system.h

You need to change the type of env.entitlement and set the default to 1 for 
medium
and that should be it.



OK, it does not change anything in the functionality but is a little
prettier.






on the command line and cpu hotplug.

I think setting the default entitlement to medium here should be fine.

[...]

right, I had medium before and should not have changed it.

Anyway, whatever the default is, it must be changed later depending on
dedication.

No, you can just set it to medium and get rid of the 

Re: [PATCH] coverity: unify Fedora dockerfiles

2023-04-18 Thread Daniel P . Berrangé
On Fri, Mar 31, 2023 at 01:48:44PM -0400, Paolo Bonzini wrote:
> The Fedora CI and coverity runs are using a slightly different set of
> packages.  Copy most of the content over from tests/docker while
> keeping the commands at the end that unpack the tools.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  scripts/coverity-scan/coverity-scan.docker | 250 -
>  1 file changed, 145 insertions(+), 105 deletions(-)
> 
> diff --git a/scripts/coverity-scan/coverity-scan.docker 
> b/scripts/coverity-scan/coverity-scan.docker
> index 6f60a52d23..a349578526 100644
> --- a/scripts/coverity-scan/coverity-scan.docker
> +++ b/scripts/coverity-scan/coverity-scan.docker
> @@ -15,112 +15,152 @@
>  # The work of actually doing the build is handled by the
>  # run-coverity-scan script.

snip

> +   zstd && \
> +nosync dnf autoremove -y && \
> +nosync dnf clean all -y && \
> +rpm -qa | sort > /packages.txt && \
> +mkdir -p /usr/libexec/ccache-wrappers && \
> +ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/c++ && \
> +ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/cc && \
> +ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/clang && \
> +ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/g++ && \
> +ln -s /usr/bin/ccache /usr/libexec/ccache-wrappers/gcc
> +
> +ENV CCACHE_WRAPPERSDIR "/usr/libexec/ccache-wrappers"
> +ENV LANG "en_US.UTF-8"
> +ENV MAKE "/usr/bin/make"
> +ENV NINJA "/usr/bin/ninja"
> +ENV PYTHON "/usr/bin/python3"
> +ENV QEMU_CONFIGURE_OPTS --meson=internal
> +
> +RUN dnf install -y curl wget

Note this leaves the dnf cache behind since it doesn't run 'dnf clean all',
and thus bloats the container layer.

>  ENV COVERITY_TOOL_BASE=/coverity-tools
>  COPY coverity_tool.tgz coverity_tool.tgz
>  RUN mkdir -p /coverity-tools/coverity_tool && cd 
> /coverity-tools/coverity_tool && tar xf /coverity_tool.tgz

We could actually make this entire thing be generated by the
tests/lcitool/refresh script

Create  tests/lcitool/projects/coverity.yml with

--
packages:
  - curl
  - wget

And then pass *both*  'qemu' and 'coverity' as project names when
generating the container, so it'll create a dockerfile that installs
both sets of packages in one command.

The ENV/COPY/RUN commands can be put in the refresh script


coverity_extras = [
 "ENV COVERITY_TOOL_BASE=/coverity-tools",
 "COPY coverity_tool.tgz coverity_tool.tgz",
 "RUN mkdir -p /coverity-tools/coverity_tool && cd /coverity-tools/coverity_tool && tar xf /coverity_tool.tgz",
]

and adding  trailer="\n".join(coverity_extras)

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 2/3] target/arm: Set ptw->out_secure correctly for stage 2 translations

2023-04-18 Thread Richard Henderson

On 4/18/23 13:30, Peter Maydell wrote:

On Tue, 18 Apr 2023 at 12:01, Richard Henderson
 wrote:


On 4/14/23 18:04, Peter Maydell wrote:

+/* Check if page table walk is to secure or non-secure PA space. */
+ptw->out_secure = (is_secure
+   && !(pte_secure
+? env->cp15.vstcr_el2 & VSTCR_SW
+: env->cp15.vtcr_el2 & VTCR_NSW));
+} else {
+/* Regime is physical */
+ptw->out_secure = pte_secure;


Is that last comment really correct?  I think it could still be stage1 of 2.


I borrowed the comment from earlier in the function, in the ptw->in_debug
branch of the code, which has the same

if (regime_is_stage2(s2_mmu_idx)) {
   ...stuff...
} else {
   /* Regime is physical */
}

structure as this one does after this patch. If s2_mmu_idx isn't
a stage 2 index and it's not one of the Phys indexes, what is it ?


Oh, right.  Nevermind.

r~
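
For readers following the thread, here is a small Python model of the
quoted expression. It is only a sketch: the bit values are placeholders
chosen for illustration rather than copied from the QEMU headers, and the
function name is hypothetical.

```python
# Placeholder bit masks (assumption, not taken from QEMU's headers).
VSTCR_SW = 1 << 29
VTCR_NSW = 1 << 29


def ptw_out_secure(is_stage2, is_secure, pte_secure, vstcr_el2=0, vtcr_el2=0):
    """Model of how ptw->out_secure is chosen in the quoted hunk."""
    if is_stage2:
        # Pick the control bit that applies to this kind of walk.
        ctrl = (vstcr_el2 & VSTCR_SW) if pte_secure else (vtcr_el2 & VTCR_NSW)
        return bool(is_secure and not ctrl)
    # Regime is physical
    return bool(pte_secure)


print(ptw_out_secure(True, True, True))
print(ptw_out_secure(True, True, True, vstcr_el2=VSTCR_SW))
print(ptw_out_secure(False, False, True))
```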




Re: [PATCH v19 01/21] s390x/cpu topology: add s390 specifics to CPU topology

2023-04-18 Thread Daniel P . Berrangé
On Tue, Apr 04, 2023 at 02:26:05PM +0200, Pierre Morel wrote:
> 
> On 4/4/23 09:03, Cédric Le Goater wrote:
> > On 4/3/23 18:28, Pierre Morel wrote:
> > > diff --git a/include/hw/s390x/cpu-topology.h
> > > b/include/hw/s390x/cpu-topology.h
> > > new file mode 100644
> > > index 00..83f31604cc
> > > --- /dev/null
> > > +++ b/include/hw/s390x/cpu-topology.h
> > > @@ -0,0 +1,15 @@
> > > +/*
> > > + * CPU Topology
> > > + *
> > > + * Copyright IBM Corp. 2022
> > 
> > Shouldn't we have some range : 2022-2023 ?
> 
> There was a discussion on this in the first spins, I think to remember that
> Nina wanted 22 and Thomas 23,
> 
> now we have a third opinion :) .
> 
> I must say that all three have their reasons and I take what the majority
> wants.
> 
> A vote?

Whether or not to include a single year, or range of years in
the copyright statement is ultimately a policy decision for the
copyright holder to take (IBM in this case I presume), and not
subject to community vote/preferences.

I will note that some (possibly even many) organizations consider
the year to be largely redundant and devoid of legal benefit, so
are happy with basically any usage of dates (first year, most recent
year, a range of years, or none at all). With this in mind, QEMU is
willing to accept any usage wrt dates in the copyright statement.

It is possible that IBM have a specific policy their employees are
expected to follow. If so, follow that.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 0/5] Support both Ethernet interfaces on i.MX6UL and i.MX7

2023-04-18 Thread Guenter Roeck

On 4/18/23 08:32, Peter Maydell wrote:

On Tue, 18 Apr 2023 at 16:18, Guenter Roeck  wrote:

On 4/18/23 07:46, Peter Maydell wrote:

I guess I don't understand what the topology is for these specific
SoCs, then. If there's only one master that might be connected
to multiple PHYs, why does one ethernet device in QEMU need to
know about the other one? Are the PHYs connected to just that
first ethernet device, or to both? This bit in your cover letter
makes it sound like "both ethernet interfaces connect to the same
MDIO bus which has both PHYs on it":



Yes, that is exactly how it is, similar to the configuration in the picture
at prodigytechno.com. I don't recall what I wrote in the cover letter, but
"Both Ethernet PHYs connect to the same MDIO bus which is connected to one
of the Ethernet MACs" would be the most accurate description I can think of.



Each MAC (Ethernet interface, instance of TYPE_IMX_FEC in qemu) has its own
MDIO bus. Currently QEMU assumes that each PHY is connected to the MDIO bus
on its associated MAC interface. That is not the case on the emulated boards,
where all PHYs are connected to a single MDIO bus.


So looking again at that diagram on that website, I think I understand
now: for data transfer to/from the outside world, MAC1 talks only through
PHY1 and MAC2 only through PHY2 (over the links marked "MII/GMII/XGMII"),
but the "control" connection is via MDIO, and on these boards you have to
configure PHY2 by doing the MDIO reads and writes via MAC1, even though
MAC1 has nothing otherwise to do with PHY2 ? (And MAC2 has no devices on
its MDIO bus at all.)



Correct.
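
The topology agreed on above can be modelled roughly as follows. This is
an illustrative Python sketch, not QEMU code: the class names, the PHY
addresses 0 and 1, and the register value are all made up.

```python
class Phy:
    def __init__(self, name):
        self.name = name
        self.regs = {}


class MdioBus:
    """Control bus: PHYs are addressed by a small integer."""

    def __init__(self):
        self.phys = {}

    def attach(self, addr, phy):
        self.phys[addr] = phy

    def write(self, addr, reg, value):
        self.phys[addr].regs[reg] = value

    def read(self, addr, reg):
        return self.phys[addr].regs.get(reg, 0)


class Mac:
    """Each MAC owns an MDIO bus and has one PHY on its data path."""

    def __init__(self, name):
        self.name = name
        self.mdio = MdioBus()
        self.data_phy = None


mac1, mac2 = Mac("MAC1"), Mac("MAC2")
phy1, phy2 = Phy("PHY1"), Phy("PHY2")

# Data path: MAC1 <-> PHY1 and MAC2 <-> PHY2.
mac1.data_phy, mac2.data_phy = phy1, phy2

# Control path: both PHYs sit on MAC1's MDIO bus, MAC2's bus stays empty.
mac1.mdio.attach(0, phy1)
mac1.mdio.attach(1, phy2)

# Configuring PHY2 therefore goes through MAC1.
mac1.mdio.write(1, 0x00, 0x1200)
print(hex(mac2.data_phy.regs[0x00]))
```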

Thanks,
Guenter




Re: QEMU developers fortnightly conference call for agenda for 2023-04-18

2023-04-18 Thread Alex Bennée


Juan Quintela  writes:

> Hi
>
> Please, send any topic that you are interested in covering.
>

>
>  Call details:

Please find the recording at:

  https://fileserver.linaro.org/s/nJTSCLyQBfo6GLJ

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH 0/2] tests/migration: Fix migration-test slowdown

2023-04-18 Thread Daniel P . Berrangé
On Tue, Apr 18, 2023 at 03:19:33PM +0200, Juan Quintela wrote:
> Thomas Huth  wrote:
> > On 18/04/2023 13.42, Juan Quintela wrote:
> >> Thomas Huth  wrote:
> >>> On 12/04/2023 16.19, Juan Quintela wrote:
>  Since commit:
>  commit 1bfc8dde505f1e6a92697c52aa9b09e81b54c78f
>  Author: Dr. David Alan Gilbert 
>  Date:   Mon Mar 6 15:26:12 2023 +
>    tests/migration: Tweek auto converge limits check
>    Thomas found an autoconverge test failure where the
>    migration completed before the autoconverge had kicked in.
>    [...]
>  migration-test has become very slow.
>  On my laptop, before that commit migration-test takes 2min10seconds
>  After that commit, it takes around 11minutes
>  We can't revert it because it fixes a real problem when the host
>  machine is overloaded.  See the comment on test_migrate_auto_converge().
> >>>
> >>> Thanks, your patches decrease the time to run the migration-test from
> >>> 16 minutes down to 5 minutes on my system, that's a great improvement,
> >>> indeed!
> >>>
> >>> Tested-by: Thomas Huth 
> >> Thanks
> >> 
> >>> (though 5 minutes are still quite a lot for qtests ... maybe some
> >>> other parts could be moved to only run with g_test_slow() ?)
> >> Hi
> >> Could you gime the output of:
> >> time for i in $(./tests/qtest/migration-test -l | grep "^/"); do
> >> echo $i; time ./tests/qtest/migration-test -p $i; done
> >> To see what tests are taking so long on your system?
> >> On my system (i9900K processor, i.e. not the latest) and
> >> auto_converge
> >> moved to slow the total of the tests take a bit more than 1 minute.
> >
> > This is with both of your patches applied:
> 
> 
> > /x86_64/migration/postcopy/plain
> > /x86_64/migration/postcopy/plain: OK
> >
> > real0m35,446s
> > user0m47,208s
> > sys 0m11,828s
> 
> This is quite slower than on mine, basically almost all the code that
> does migration.

This is expected AFAIK.  The migrate_postcopy_prepare method
waits for 1 complete pre-copy pass to run at 3mbps, before
switching to post-copy mode.

> 
> $ time ./tests/qtest/migration-test -p /x86_64/migration/postcopy/plain
> # random seed: R02S42809b71f513e8524bd24df5facd5768
> # Start of x86_64 tests
> # Start of migration tests
> # Start of postcopy tests
> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-246853.sock 
> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-246853.qmp,id=char0 -mon 
> chardev=char0,mode=control -display none -accel kvm -accel tcg -name 
> source,debug-threads=on -m 150M -serial 
> file:/tmp/migration-test-1MGL31/src_serial -drive 
> file=/tmp/migration-test-1MGL31/bootsect,format=raw-accel qtest
> # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-246853.sock 
> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-246853.qmp,id=char0 -mon 
> chardev=char0,mode=control -display none -accel kvm -accel tcg -name 
> target,debug-threads=on -m 150M -serial 
> file:/tmp/migration-test-1MGL31/dest_serial -incoming 
> unix:/tmp/migration-test-1MGL31/migsocket -drive 
> file=/tmp/migration-test-1MGL31/bootsect,format=raw-accel qtest
> ok 1 /x86_64/migration/postcopy/plain
> # End of postcopy tests
> # End of migration tests
> # End of x86_64 tests
> 1..1
> 
> real  0m1.104s
> user  0m0.697s
> sys   0m0.414s

That is surprisingly fast - it is like it is not doing the pre-copy
pass at all.


> > real5m32,733s
> > user7m24,380s
> > sys 1m50,801s
> 
> Ouch.
> 
> Can I ask:
> - what is your machine?  It is specially slow?
>   Otherwise I want to know why it is happening.

This matches what I see on my laptop: any test which runs a full
pre-copy pass gets about 30 seconds added for this phase.
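
A rough back-of-envelope check of that figure, assuming "3mbps" means
about 3 MiB/s and taking the 150M guest size from the quoted command line
as an upper bound on what one pass must send. The function names are
illustrative only.

```python
MIB = 1024 * 1024


def pass_seconds(bytes_to_send, bytes_per_second):
    """Time for one pre-copy pass at a fixed bandwidth cap."""
    return bytes_to_send / bytes_per_second


def bytes_sent(seconds, bytes_per_second):
    """Amount of data that fits into a pass of the given duration."""
    return seconds * bytes_per_second


upper_bound = pass_seconds(150 * MIB, 3 * MIB)  # whole guest RAM
observed_mib = bytes_sent(30, 3 * MIB) / MIB    # data behind a 30 s pass
print(f"upper bound: {upper_bound:.0f} s, 30 s pass: {observed_mib:.0f} MiB")
```

Under these assumptions a 30 second pass corresponds to roughly 90 MiB
sent, which is below the 150M ceiling.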


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: Move vhost-user SET_STATUS 0 after get vring base?

2023-04-18 Thread Eugenio Perez Martin
On Tue, Apr 18, 2023 at 5:18 PM Stefan Hajnoczi  wrote:
>
> Hi,
> Cindy's commit ca71db438bdc ("vhost: implement vhost_dev_start method")
> added SET_STATUS calls to vhost_dev_start() and vhost_dev_stop() for all
> vhost backends.
>
> Eugenio's commit c3716f260bff ("vdpa: move vhost reset after get vring
> base") deferred the SET_STATUS 0 call in vhost_dev_stop() until after
> GET_VRING_BASE for vDPA only. In that commit Eugenio said, "A patch to
> make vhost_user_dev_start more similar to vdpa is desirable, but it can
> be added on top".
>
> I agree and think it's a good idea to keep the vhost backends in sync
> where possible.
>
> vhost-user still has the old behavior where QEMU sends SET_STATUS 0
> before GET_VRING_BASE. Most existing vhost-user backends don't implement
> the SET_STATUS message, so I think no one has tripped over this yet.
>

My bet is that those backends simply do not migrate so they don't hit
it. But maybe those backends return -1 for GET_VRING_BASE and use
split vq, so it can be fetched from guest's used idx?

> Any thoughts on making vhost-user behave like vDPA here?
>

I guess the first step should be to gather a list of backends that use
SET_STATUS and are interested in migration. But in my opinion the
current behavior can be considered a bug and it is unlikely that it is
implemented properly there.

* If they ignore SET_STATUS, we can freely reorder the calls and the
result will be the same.
* If they always return an error for GET_VRING_BASE then they will
keep returning it, so no harm here either.
* If they use more complicated logic like "return -1 for
GET_VRING_BASE as long as the device is not DRIVER_OK", then the
reordering improves the situation.
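
To make the ordering question concrete, here is a toy Python model. It is
a sketch under stated assumptions: the backend classes are hypothetical,
and "reset clears the ring index" is an assumed behaviour of a backend
that implements SET_STATUS, not something taken from a real device.

```python
class ResettingBackend:
    """Hypothetical backend that clears its ring state on SET_STATUS 0."""

    def __init__(self, last_avail_idx):
        self.last_avail_idx = last_avail_idx

    def set_status(self, status):
        if status == 0:
            self.last_avail_idx = 0  # device reset

    def get_vring_base(self):
        return self.last_avail_idx


class IgnoringBackend(ResettingBackend):
    """Hypothetical backend that does not implement SET_STATUS."""

    def set_status(self, status):
        pass


def stop_old_order(backend):
    # Current vhost-user behaviour: reset first, then fetch the base.
    backend.set_status(0)
    return backend.get_vring_base()


def stop_new_order(backend):
    # vDPA-like behaviour: fetch the base, then reset.
    base = backend.get_vring_base()
    backend.set_status(0)
    return base


print(stop_old_order(ResettingBackend(42)), stop_new_order(ResettingBackend(42)))
print(stop_old_order(IgnoringBackend(42)), stop_new_order(IgnoringBackend(42)))
```

In this model only the backend that honours the reset loses its index
under the old order; a backend that ignores SET_STATUS sees no difference.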

Thanks!




[PATCH] .gitlab-ci.d/cirrus: Drop the CI job for compiling with FreeBSD 12

2023-04-18 Thread Thomas Huth
FreeBSD 13.0 has been released in April 2021:

 https://www.freebsd.org/releases/13.0R/announce/

According to QEMU's support policy, we stop supporting the previous
major release two years after the new major release has been
published. So we can stop testing FreeBSD 12 in our CI now.

Signed-off-by: Thomas Huth 
---
 We should likely also update tests/vm/freebsd ... however, FreeBSD 13
 seems not to use the serial console by default anymore, so I've got
 no clue how we could use their images now... Does anybody have any
 suggestions?

 .gitlab-ci.d/cirrus.yml | 13 -
 .gitlab-ci.d/cirrus/freebsd-12.vars | 16 
 tests/lcitool/refresh   |  1 -
 3 files changed, 30 deletions(-)
 delete mode 100644 .gitlab-ci.d/cirrus/freebsd-12.vars

diff --git a/.gitlab-ci.d/cirrus.yml b/.gitlab-ci.d/cirrus.yml
index 502dfd612c..1507c928e5 100644
--- a/.gitlab-ci.d/cirrus.yml
+++ b/.gitlab-ci.d/cirrus.yml
@@ -44,19 +44,6 @@
   variables:
 QEMU_JOB_CIRRUS: 1
 
-x64-freebsd-12-build:
-  extends: .cirrus_build_job
-  variables:
-NAME: freebsd-12
-CIRRUS_VM_INSTANCE_TYPE: freebsd_instance
-CIRRUS_VM_IMAGE_SELECTOR: image_family
-CIRRUS_VM_IMAGE_NAME: freebsd-12-4
-CIRRUS_VM_CPUS: 8
-CIRRUS_VM_RAM: 8G
-UPDATE_COMMAND: pkg update; pkg upgrade -y
-INSTALL_COMMAND: pkg install -y
-TEST_TARGETS: check
-
 x64-freebsd-13-build:
   extends: .cirrus_build_job
   variables:
diff --git a/.gitlab-ci.d/cirrus/freebsd-12.vars 
b/.gitlab-ci.d/cirrus/freebsd-12.vars
deleted file mode 100644
index 44d8a2a511..00
--- a/.gitlab-ci.d/cirrus/freebsd-12.vars
+++ /dev/null
@@ -1,16 +0,0 @@
-# THIS FILE WAS AUTO-GENERATED
-#
-#  $ lcitool variables freebsd-12 qemu
-#
-# https://gitlab.com/libvirt/libvirt-ci
-
-CCACHE='/usr/local/bin/ccache'
-CPAN_PKGS=''
-CROSS_PKGS=''
-MAKE='/usr/local/bin/gmake'
-NINJA='/usr/local/bin/ninja'
-PACKAGING_COMMAND='pkg'
-PIP3='/usr/local/bin/pip-3.8'
-PKGS='alsa-lib bash bison bzip2 ca_root_nss capstone4 ccache 
cdrkit-genisoimage cmocka ctags curl cyrus-sasl dbus diffutils dtc flex 
fusefs-libs3 gettext git glib gmake gnutls gsed gtk3 json-c libepoxy libffi 
libgcrypt libjpeg-turbo libnfs libslirp libspice-server libssh libtasn1 llvm 
lzo2 meson ncurses nettle ninja opencv pixman pkgconf png py39-numpy 
py39-pillow py39-pip py39-sphinx py39-sphinx_rtd_theme py39-yaml python3 
rpm2cpio sdl2 sdl2_image snappy sndio socat spice-protocol tesseract usbredir 
virglrenderer vte3 zstd'
-PYPI_PKGS=''
-PYTHON='/usr/local/bin/python3'
diff --git a/tests/lcitool/refresh b/tests/lcitool/refresh
index c0d7ad5516..4c568242d2 100755
--- a/tests/lcitool/refresh
+++ b/tests/lcitool/refresh
@@ -182,7 +182,6 @@ try:
 #
 # Cirrus packages lists for GitLab
 #
-generate_cirrus("freebsd-12")
 generate_cirrus("freebsd-13")
 generate_cirrus("macos-12")
 
-- 
2.31.1




[PATCH v2 06/13] include/hw/virtio: document some more usage of notifiers

2023-04-18 Thread Alex Bennée
Let's document some more of the core VirtIODevice structure.

Signed-off-by: Alex Bennée 
---
 include/hw/virtio/virtio.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 1ba7a9dd74..ef77e9ef0e 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -150,10 +150,18 @@ struct VirtIODevice
 VMChangeStateEntry *vmstate;
 char *bus_name;
 uint8_t device_endian;
+/**
+ * @use_guest_notifier_mask: gate usage of ->guest_notifier_mask() callback.
+ * This is used to suppress the masking of guest updates for
+ * vhost-user devices which are asynchronous by design.
+ */
 bool use_guest_notifier_mask;
 AddressSpace *dma_as;
 QLIST_HEAD(, VirtQueue) *vector_queues;
 QTAILQ_ENTRY(VirtIODevice) next;
+/**
+ * @config_notifier: the event notifier that handles config events
+ */
 EventNotifier config_notifier;
 };
 
-- 
2.39.2




[PATCH v2 09/13] hw/virtio: derive vhost-user-rng from vhost-user-device

2023-04-18 Thread Alex Bennée
Now we can take advantage of our new base class and make
vhost-user-rng a much simpler boilerplate wrapper. Also as this
doesn't require any target specific hacks we only need to build the
stubs once.

Signed-off-by: Alex Bennée 

---
v2
  - new derivation layout
  - move directly to softmmu_virtio_ss
---
 include/hw/virtio/vhost-user-rng.h |  11 +-
 hw/virtio/vhost-user-rng.c | 277 +++--
 hw/virtio/meson.build  |   7 +-
 3 files changed, 28 insertions(+), 267 deletions(-)

diff --git a/include/hw/virtio/vhost-user-rng.h 
b/include/hw/virtio/vhost-user-rng.h
index ddd9f01eea..13139c0d9d 100644
--- a/include/hw/virtio/vhost-user-rng.h
+++ b/include/hw/virtio/vhost-user-rng.h
@@ -12,21 +12,14 @@
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
-#include "chardev/char-fe.h"
+#include "hw/virtio/vhost-user-device.h"
 
 #define TYPE_VHOST_USER_RNG "vhost-user-rng"
 OBJECT_DECLARE_SIMPLE_TYPE(VHostUserRNG, VHOST_USER_RNG)
 
 struct VHostUserRNG {
 /*< private >*/
-VirtIODevice parent;
-CharBackend chardev;
-struct vhost_virtqueue *vhost_vq;
-struct vhost_dev vhost_dev;
-VhostUserState vhost_user;
-VirtQueue *req_vq;
-bool connected;
-
+VHostUserBase parent;
 /*< public >*/
 };
 
diff --git a/hw/virtio/vhost-user-rng.c b/hw/virtio/vhost-user-rng.c
index efc54cd3fb..71d3991f93 100644
--- a/hw/virtio/vhost-user-rng.c
+++ b/hw/virtio/vhost-user-rng.c
@@ -3,7 +3,7 @@
  *
  * Copyright (c) 2021 Mathieu Poirier 
  *
- * Implementation seriously tailored on vhost-user-i2c.c
+ * Simple wrapper of the generic vhost-user-device.
  *
  * SPDX-License-Identifier: GPL-2.0-or-later
  */
@@ -13,281 +13,46 @@
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/vhost-user-rng.h"
-#include "qemu/error-report.h"
 #include "standard-headers/linux/virtio_ids.h"
 
-static const int feature_bits[] = {
-VIRTIO_F_RING_RESET,
-VHOST_INVALID_FEATURE_BIT
-};
-
-static void vu_rng_start(VirtIODevice *vdev)
-{
-VHostUserRNG *rng = VHOST_USER_RNG(vdev);
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-int ret;
-int i;
-
-if (!k->set_guest_notifiers) {
-error_report("binding does not support guest notifiers");
-return;
-}
-
-ret = vhost_dev_enable_notifiers(>vhost_dev, vdev);
-if (ret < 0) {
-error_report("Error enabling host notifiers: %d", -ret);
-return;
-}
-
-ret = k->set_guest_notifiers(qbus->parent, rng->vhost_dev.nvqs, true);
-if (ret < 0) {
-error_report("Error binding guest notifier: %d", -ret);
-goto err_host_notifiers;
-}
-
-rng->vhost_dev.acked_features = vdev->guest_features;
-ret = vhost_dev_start(>vhost_dev, vdev, true);
-if (ret < 0) {
-error_report("Error starting vhost-user-rng: %d", -ret);
-goto err_guest_notifiers;
-}
-
-/*
- * guest_notifier_mask/pending not used yet, so just unmask
- * everything here. virtio-pci will do the right thing by
- * enabling/disabling irqfd.
- */
-for (i = 0; i < rng->vhost_dev.nvqs; i++) {
-vhost_virtqueue_mask(>vhost_dev, vdev, i, false);
-}
-
-return;
-
-err_guest_notifiers:
-k->set_guest_notifiers(qbus->parent, rng->vhost_dev.nvqs, false);
-err_host_notifiers:
-vhost_dev_disable_notifiers(>vhost_dev, vdev);
-}
-
-static void vu_rng_stop(VirtIODevice *vdev)
-{
-VHostUserRNG *rng = VHOST_USER_RNG(vdev);
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-int ret;
-
-if (!k->set_guest_notifiers) {
-return;
-}
-
-vhost_dev_stop(>vhost_dev, vdev, true);
-
-ret = k->set_guest_notifiers(qbus->parent, rng->vhost_dev.nvqs, false);
-if (ret < 0) {
-error_report("vhost guest notifier cleanup failed: %d", ret);
-return;
-}
-
-vhost_dev_disable_notifiers(>vhost_dev, vdev);
-}
-
-static void vu_rng_set_status(VirtIODevice *vdev, uint8_t status)
-{
-VHostUserRNG *rng = VHOST_USER_RNG(vdev);
-bool should_start = virtio_device_should_start(vdev, status);
-
-if (vhost_dev_is_started(>vhost_dev) == should_start) {
-return;
-}
-
-if (should_start) {
-vu_rng_start(vdev);
-} else {
-vu_rng_stop(vdev);
-}
-}
-
-static uint64_t vu_rng_get_features(VirtIODevice *vdev,
-uint64_t requested_features, Error **errp)
-{
-VHostUserRNG *rng = VHOST_USER_RNG(vdev);
-
-return vhost_get_features(>vhost_dev, feature_bits,
-  requested_features);
-}
-
-static void vu_rng_handle_output(VirtIODevice *vdev, VirtQueue *vq)
-{
-/*
- * Not normally called; it's the daemon that handles the queue;
- * however virtio's cleanup path can call this.
- */
-}
-
-static void 

Re: [PATCH v3] test: Fix test-crypto-secret when compiling without keyring support

2023-04-18 Thread Daniel P . Berrangé
On Fri, Apr 14, 2023 at 01:42:52PM +0200, Juan Quintela wrote:
> Linux keyring support is protected by CONFIG_KEYUTILS.
> We also need CONFIG_SECRET_KEYRING.
> 
> Signed-off-by: Juan Quintela 
> 
> ---
> 
> - Previous version of this patch changed the meson build rules.
>   Daniel told me that the proper fix was to change the #ifdef test.
> 
> - Change rule again.  We need both defines.
> - Put both defines in #endif (thomas)
> ---
>  tests/unit/test-crypto-secret.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests

2023-04-18 Thread Andrey Ryabinin
On 4/18/23 07:13, Raphael Norwitz wrote:
> Hey Andrey - apologies for the late reply here.
> 
> It sounds like you are dealing with a buggy guest, rather than a QEMU issue.

No arguing here, the guest is buggy.
However, the issue with QEMU is that virtio-blk tolerates such buggy guests
while vhost-user-blk does not.
We've been using virtio-blk in our cloud for a while and recently started 
switching to vhost-user-blk
which led us to discover this problem.

>> On Apr 10, 2023, at 11:39 AM, Andrey Ryabinin  wrote:
>>
>>
>>
>> On 4/10/23 10:35, Andrey Ryabinin wrote:
>>> Some guests hang on boot when using the vhost-user-blk-pci device,
>>> but boot normally when using the virtio-blk device. The problem occurs
>>> because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
>>> before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to
> 
> Virtio 1.1 Section 3.1.1, says during setup “[t]he driver MUST NOT notify the 
> device before setting DRIVER_OK.”
> 
> Therefore what you are describing is buggy guest behavior. Sounds like the 
> driver should be made to either
> - not advertise VIRTIO_F_VERSION_1
> - not kick before setting VIRTIO_CONFIG_S_DRIVER_OK
> 
> If anything, the virtio-blk virtio_blk_handle_output() function should 
> probably check start_on_kick?
> 

Ideally this should have been done from the start. But if we do it now we'll 
just break these guests.




[PATCH v2 2/4] hw/acpi: arm: bump MADT to revision 5

2023-04-18 Thread Eric DeVolder
Currently ARM QEMU generates, and reports, MADT revision 4. ACPI 6.3
introduces MADT revision 5.

For MADT revision 5, the GICC structure adds an SPE Overflow Interrupt
field. This new 2-byte field is created from the existing 3-byte
Reserved field. The spec says to zero the field if the SPE overflow
interrupt is not supported.

Signed-off-by: Eric DeVolder 
---
 hw/arm/virt-acpi-build.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 4156111d49..23268dd981 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -705,7 +705,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 int i;
 VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
 const MemMapEntry *memmap = vms->memmap;
-AcpiTable table = { .sig = "APIC", .rev = 4, .oem_id = vms->oem_id,
+AcpiTable table = { .sig = "APIC", .rev = 5, .oem_id = vms->oem_id,
 .oem_table_id = vms->oem_table_id };
 
 acpi_table_begin(, table_data);
@@ -763,7 +763,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 /* Processor Power Efficiency Class */
 build_append_int_noprefix(table_data, 0, 1);
 /* Reserved */
-build_append_int_noprefix(table_data, 0, 3);
+build_append_int_noprefix(table_data, 0, 1);
+/* SPE overflow Interrupt */
+build_append_int_noprefix(table_data, 0, 2);
 }
 
 if (vms->gic_version != VIRT_GIC_VERSION_2) {
-- 
2.31.1




[PATCH v2 4/4] ACPI: bios-tables-test.c step 5 (updated expected table binaries)

2023-04-18 Thread Eric DeVolder
Following the guidelines in tests/qtest/bios-tables-test.c, this
is step 6.

For the cpuhp test case, it is started with:
 -smp 2,cores=3,sockets=2,maxcpus=6

So two of six CPUs are present, leaving 4 hot-pluggable CPUs. This
is what the disassembly diff below shows (two entries with Enabled=1
and the new Online Capable bit 0, and four entries with Enabled=0 and
Online Capable bit 1).
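
The expected flag values can be reproduced with a short Python sketch.
The bit values 0x1 (Enabled) and 0x2 (Online Capable) follow the i386
patch in this series; the helper names are hypothetical, and the sketch
assumes the present CPUs occupy the first entries, as in the diff below.

```python
ENABLED = 0x1         # Local APIC flags bit 0
ONLINE_CAPABLE = 0x2  # Local APIC flags bit 1, new in MADT revision 5


def lapic_flags(present):
    """Online Capable must be 0 when Enabled is set, so the two are exclusive."""
    return ENABLED if present else ONLINE_CAPABLE


def madt_lapic_flags(present_cpus, max_cpus):
    """Flags for each Local APIC entry, present CPUs listed first."""
    return [lapic_flags(i < present_cpus) for i in range(max_cpus)]


flags = madt_lapic_flags(2, 6)
print(flags)  # [1, 1, 2, 2, 2, 2]
```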

 --- /tmp/asl-NP2E31.dsl2023-04-18 10:46:26.483612104 -0400
 +++ /tmp/asl-C03E31.dsl2023-04-18 10:46:26.481612093 -0400
 @@ -1,89 +1,89 @@
  /*
   * Intel ACPI Component Architecture
   * AML/ASL+ Disassembler version 20230331 (64-bit version)
   * Copyright (c) 2000 - 2023 Intel Corporation
   *
 - * Disassembly of tests/data/acpi/pc/APIC.cphp, Tue Apr 18 10:46:26 2023
 + * Disassembly of /tmp/aml-6A5E31, Tue Apr 18 10:46:26 2023
   *
   * ACPI Data Table [APIC]
   *
   * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue (in 
hex)
   */

  [000h  004h]   Signature : "APIC"[Multiple APIC 
Description Table (MADT)]
  [004h 0004 004h]Table Length : 00A0
 -[008h 0008 001h]Revision : 01
 -[009h 0009 001h]Checksum : 18
 +[008h 0008 001h]Revision : 05
 +[009h 0009 001h]Checksum : 0C
  [00Ah 0010 006h]  Oem ID : "BOCHS "
  [010h 0016 008h]Oem Table ID : "BXPC"
  [018h 0024 004h]Oem Revision : 0001
  [01Ch 0028 004h] Asl Compiler ID : "BXPC"
  [020h 0032 004h]   Asl Compiler Revision : 0001

  [024h 0036 004h]  Local Apic Address : FEE0
  [028h 0040 004h]   Flags (decoded below) : 0001
   PC-AT Compatibility : 1

  [02Ch 0044 001h]   Subtable Type : 00 [Processor Local APIC]
  [02Dh 0045 001h]  Length : 08
  [02Eh 0046 001h]Processor ID : 00
  [02Fh 0047 001h]   Local Apic ID : 00
  [030h 0048 004h]   Flags (decoded below) : 0001
 Processor Enabled : 1
Runtime Online Capable : 0

  [034h 0052 001h]   Subtable Type : 00 [Processor Local APIC]
  [035h 0053 001h]  Length : 08
  [036h 0054 001h]Processor ID : 01
  [037h 0055 001h]   Local Apic ID : 01
  [038h 0056 004h]   Flags (decoded below) : 0001
 Processor Enabled : 1
Runtime Online Capable : 0

  [03Ch 0060 001h]   Subtable Type : 00 [Processor Local APIC]
  [03Dh 0061 001h]  Length : 08
  [03Eh 0062 001h]Processor ID : 02
  [03Fh 0063 001h]   Local Apic ID : 02
 -[040h 0064 004h]   Flags (decoded below) : 
 +[040h 0064 004h]   Flags (decoded below) : 0002
 Processor Enabled : 0
 -  Runtime Online Capable : 0
 +  Runtime Online Capable : 1

  [044h 0068 001h]   Subtable Type : 00 [Processor Local APIC]
  [045h 0069 001h]  Length : 08
  [046h 0070 001h]Processor ID : 03
  [047h 0071 001h]   Local Apic ID : 04
 -[048h 0072 004h]   Flags (decoded below) : 
 +[048h 0072 004h]   Flags (decoded below) : 0002
 Processor Enabled : 0
 -  Runtime Online Capable : 0
 +  Runtime Online Capable : 1

  [04Ch 0076 001h]   Subtable Type : 00 [Processor Local APIC]
  [04Dh 0077 001h]  Length : 08
  [04Eh 0078 001h]Processor ID : 04
  [04Fh 0079 001h]   Local Apic ID : 05
 -[050h 0080 004h]   Flags (decoded below) : 
 +[050h 0080 004h]   Flags (decoded below) : 0002
 Processor Enabled : 0
 -  Runtime Online Capable : 0
 +  Runtime Online Capable : 1

  [054h 0084 001h]   Subtable Type : 00 [Processor Local APIC]
  [055h 0085 001h]  Length : 08
  [056h 0086 001h]Processor ID : 05
  [057h 0087 001h]   Local Apic ID : 06
 -[058h 0088 004h]   Flags (decoded below) : 
 +[058h 0088 004h]   Flags (decoded below) : 0002
 Processor Enabled : 0
 -  Runtime Online Capable : 0
 +  Runtime Online Capable : 1

  [05Ch 0092 001h]   Subtable Type : 01 [I/O APIC]
  [05Dh 0093 001h]  Length : 0C
  [05Eh 0094 001h] I/O Apic ID : 00
  [05Fh 0095 001h]Reserved : 00
  [060h 0096 004h] Address : FEC0
  [064h 0100 004h]   Interrupt : 

  [068h 0104 001h]   Subtable Type : 02 [Interrupt Source Override]

[PATCH v2 3/4] hw/acpi: i386: bump MADT to revision 5

2023-04-18 Thread Eric DeVolder
Currently i386 QEMU generates MADT revision 3, and reports
MADT revision 1. ACPI 6.3 introduces MADT revision 5.

MADT revision 4 introduces ARM GIC structures, which do not apply
to i386.

MADT revision 5 adds the Online Capable bit to the Local APIC
flags.

Making MADT generate and report revision 5 will solve problems with
CPU hotplug (the Online Capable flag indicates hotpluggable CPUs).

Link: 
https://lore.kernel.org/linux-acpi/20230327191026.3454-1-eric.devol...@oracle.com/T/#t
Signed-off-by: Eric DeVolder 
---
 hw/i386/acpi-common.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-common.c b/hw/i386/acpi-common.c
index 52e5c1439a..286c1c5c32 100644
--- a/hw/i386/acpi-common.c
+++ b/hw/i386/acpi-common.c
@@ -38,8 +38,15 @@ void pc_madt_cpu_entry(int uid, const CPUArchIdList 
*apic_ids,
 {
 uint32_t apic_id = apic_ids->cpus[uid].arch_id;
 /* Flags – Local APIC Flags */
-uint32_t flags = apic_ids->cpus[uid].cpu != NULL || force_enabled ?
- 1 /* Enabled */ : 0;
+bool enabled = apic_ids->cpus[uid].cpu != NULL || force_enabled ?
+ true : false;
+/*
+ * ACPI 6.3 5.2.12.2 Local APIC Flags: OnlineCapable must be 0
+ * if Enabled is set.
+ */
+bool onlinecapable = enabled ? false : true;
+uint32_t flags = (onlinecapable ? 0x2 : 0x0) | /* Online Capable */
+ (enabled ? 0x1 : 0x0); /* Enabled */
 
 /* ACPI spec says that LAPIC entry for non present
  * CPU may be omitted from MADT or it must be marked
@@ -102,7 +109,7 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
 MachineClass *mc = MACHINE_GET_CLASS(x86ms);
 const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(MACHINE(x86ms));
 AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(adev);
-AcpiTable table = { .sig = "APIC", .rev = 1, .oem_id = oem_id,
+AcpiTable table = { .sig = "APIC", .rev = 5, .oem_id = oem_id,
 .oem_table_id = oem_table_id };
 
 acpi_table_begin(, table_data);
-- 
2.31.1




[PATCH v2 0/4] hw/acpi: bump MADT to revision 5

2023-04-18 Thread Eric DeVolder
The following Linux kernel change broke CPU hotplug for MADT revision
less than 5.

 e2869bd7af60 ("x86/acpi/boot: Do not register processors that cannot be 
onlined for x2APIC")

Discussion on this topic can be located here:

 
https://lore.kernel.org/linux-acpi/20230327191026.3454-1-eric.devol...@oracle.com/T/#t

which resulted in the following fixes in Linux 6.3-rc5:

 a74fabfbd1b7: ("x86/ACPI/boot: Use FADT version to check support for online 
capable")
 fed8d8773b8e: ("x86/acpi/boot: Correct acpi_is_processor_usable() check")

However, as part of the investigation into resolving this breakage, I
learned that i386 QEMU reports revision 1, while technically it
generates revision 3. Aarch64 generates and reports revision 4.

ACPI 6.3 bumps MADT revision to 5 as it introduces an Online Capable
flag that the above Linux patch utilizes to denote hot pluggable CPUs.

So in order to bump MADT to the current revision of 5, we need to
validate that all MADT table changes between 1 and 5 are present
in QEMU.

Below is a table summarizing the changes to the MADT. This information
was gleaned from the ACPI specs on uefi.org.

ACPI     MADT     What
Version  Version
1.0               MADT not present
2.0      1        Section 5.2.10.4
3.0      2        Section 5.2.11.4
                   5.2.11.13 Local SAPIC Structure added two new fields:
                    ACPI Processor UID Value
                    ACPI Processor UID String
                   5.2.10.14 Platform Interrupt Sources Structure:
                    Reserved changed to Platform Interrupt Sources Flags
3.0b     2        Section 5.2.11.4
                   Added a section describing guidelines for the ordering of
                   processors in the MADT to support proper boot processor
                   and multi-threaded logical processor operation.
4.0      3        Section 5.2.12
                   Adds Processor Local x2APIC structure type 9
                   Adds Local x2APIC NMI structure type 0xA
5.0      3        Section 5.2.12
6.0      3        Section 5.2.12
6.0a     4        Section 5.2.12
                   Adds ARM GIC structure types 0xB-0xF
6.2a     45       Section 5.2.12   <--- version 45, is indeed accurate!
6.2b     5        Section 5.2.12
                   GIC ITS last Reserved offset changed to 16 from 20 (typo)
6.3      5        Section 5.2.12
                   Adds Local APIC Flags Online Capable!
                   Adds GICC SPE Overflow Interrupt field
6.4      5        Section 5.2.12
                   Adds Multiprocessor Wakeup Structure type 0x10
                   (change notes says structure previously misplaced?)
6.5      5        Section 5.2.12

For the MADT revision change 1 -> 2, the spec has a change to the
SAPIC structure. In general, QEMU does not generate/support SAPIC.
So the QEMU i386 MADT revision can safely be moved to 2.

For the MADT revision change 2 -> 3, the spec adds Local x2APIC
structures. QEMU has long supported x2apic ACPI structures. A simple
search of x2apic within QEMU source and hw/i386/acpi-common.c
specifically reveals this. So the QEMU i386 MADT revision can safely
be moved to 3.

For the MADT revision change 3 -> 4, the spec adds support for the ARM
GIC structures. QEMU ARM does in fact generate and report revision 4.
As these will not be used by i386 QEMU, the QEMU i386 MADT revision
can safely be moved to 4 as well.

Now for the MADT revision change 4 -> 5, the spec adds the Online
Capable flag to the Local APIC structure, and the ARM GICC SPE
Overflow Interrupt field.

For the ARM SPE, an existing 3-byte Reserved field is broken into a
1-byte Reserved field and a 2-byte SPE field.  The spec says that if SPE
Overflow is not supported, it should be zero.
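
As a rough illustration of that field split, here is a minimal C sketch.
The function name and the 3-byte "tail" buffer are hypothetical and are
not QEMU code; the only parts taken from the text above are the 1-byte
Reserved plus 2-byte SPE split and the rule that an unsupported SPE
overflow interrupt is reported as zero. ACPI table fields are stored
little-endian.

```c
#include <stdint.h>

/*
 * Hypothetical helper: fill the three bytes that used to be a single
 * 3-byte Reserved field.  Byte 0 stays reserved (zero); bytes 1..2 hold
 * the SPE overflow interrupt as a little-endian 16-bit value, or zero
 * when SPE overflow is not supported.
 */
static void fill_gicc_spe_tail(uint8_t tail[3], int spe_supported,
                               uint16_t spe_irq)
{
    uint16_t v = spe_supported ? spe_irq : 0;

    tail[0] = 0;                    /* 1-byte Reserved */
    tail[1] = (uint8_t)(v & 0xff);  /* SPE overflow interrupt, low byte */
    tail[2] = (uint8_t)(v >> 8);    /* SPE overflow interrupt, high byte */
}
```

A caller that does not model SPE would pass spe_supported = 0 and get
three zero bytes, which matches the pre-revision-5 layout byte for byte.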

For the i386 Local APIC flag Online Capable, the spec has certain rules
about this value. In particular, setting this value now explicitly
indicates a hotpluggable CPU.
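
A minimal sketch of how those flag rules could be expressed, assuming the
Local APIC Flags layout of bit 0 = Enabled and bit 1 = Online Capable,
with Online Capable left at zero whenever Enabled is set. The macro and
function names below are made up for illustration and are not the names
used in QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

#define LAPIC_FLAG_ENABLED        (1u << 0)  /* CPU usable at boot */
#define LAPIC_FLAG_ONLINE_CAPABLE (1u << 1)  /* CPU may be enabled later */

/*
 * Hypothetical helper: compute the Local APIC Flags value for one MADT
 * entry.  A CPU present at boot is Enabled and must leave Online
 * Capable clear; a CPU slot that is empty at boot but can be hotplugged
 * is reported as Online Capable only.
 */
static uint32_t lapic_flags(bool present_at_boot, bool hotpluggable)
{
    if (present_at_boot) {
        return LAPIC_FLAG_ENABLED;
    }
    return hotpluggable ? LAPIC_FLAG_ONLINE_CAPABLE : 0;
}
```

Under this model, a guest started with 30 CPUs and room for 32 would see
30 entries with Enabled set and 2 entries with only Online Capable set.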

So this patch makes the needed changes to move both i386 MADT
to revision 5.

Without these changes, the information below shows "how" CPU hotplug
breaks with the current upstream Linux kernel 6.3.  For example, a Linux
guest started with:

 qemu-system-x86_64 -smp 30,maxcpus=32 ...

and then attempting to hotplug a CPU:

  (QEMU) device_add id=cpu30 driver=host-x86_64-cpu socket-id=0 core-id=30 thread-id=0

fails with the following:

  APIC: NR_CPUS/possible_cpus limit of 30 reached. Processor 30/0x.
  ACPI: Unable to map lapic to logical cpu number
  acpi LNXCPU:1e: Enumeration failure

  # dmesg | grep smpboot
  smpboot: Allowing 30 CPUs, 0 hotplug CPUs
  smpboot: CPU0: Intel(R) Xeon(R) CPU D-1533 @ 2.10GHz (family: 0x)
  smpboot: Max logical packages: 1
  smpboot: Total of 30 processors activated (125708.76 BogoMIPS)

  # iasl -d /sys/firmware/tables/acpi/APIC
  [000h    4]Signature : "APIC"[Multiple APIC 
Descript
  [004h 0004   4] Table Length : 0170
  [008h 0008   1] Revision : 01  <=
  [009h 0009   1] Checksum : 9C
  

[PATCH v2 1/4] ACPI: bios-tables-test.c step 2 (allowed-diff entries)

2023-04-18 Thread Eric DeVolder
Following the guidelines in tests/qtest/bios-tables-test.c, this
change sets-up bios-tables-test-allowed-diff.h to exclude the
imminent changes to the APIC tables, per step 2.

Signed-off-by: Eric DeVolder 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..1e5e354ecf 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,5 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/APIC",
+"tests/data/acpi/q35/APIC",
+"tests/data/acpi/microvm/APIC",
+"tests/data/acpi/virt/APIC",
-- 
2.31.1




Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-04-18 Thread Eugenio Perez Martin
On Tue, Apr 18, 2023 at 7:59 PM Stefan Hajnoczi  wrote:
>
> On Tue, Apr 18, 2023 at 10:09:30AM +0200, Eugenio Perez Martin wrote:
> > On Mon, Apr 17, 2023 at 9:33 PM Stefan Hajnoczi  wrote:
> > >
> > > On Mon, 17 Apr 2023 at 15:10, Eugenio Perez Martin  
> > > wrote:
> > > >
> > > > On Mon, Apr 17, 2023 at 5:38 PM Stefan Hajnoczi  
> > > > wrote:
> > > > >
> > > > > On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin wrote:
> > > > > > On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek wrote:
> > > > > > > > So-called "internal" virtio-fs migration refers to transporting 
> > > > > > > > the
> > > > > > > > back-end's (virtiofsd's) state through qemu's migration stream. 
> > > > > > > >  To do
> > > > > > > > this, we need to be able to transfer virtiofsd's internal state 
> > > > > > > > to and
> > > > > > > > from virtiofsd.
> > > > > > > >
> > > > > > > > Because virtiofsd's internal state will not be too large, we 
> > > > > > > > believe it
> > > > > > > > is best to transfer it as a single binary blob after the 
> > > > > > > > streaming
> > > > > > > > phase.  Because this method should be useful to other vhost-user
> > > > > > > > implementations, too, it is introduced as a general-purpose 
> > > > > > > > addition to
> > > > > > > > the protocol, not limited to vhost-user-fs.
> > > > > > > >
> > > > > > > > These are the additions to the protocol:
> > > > > > > > - New vhost-user protocol feature 
> > > > > > > > VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
> > > > > > > >   This feature signals support for transferring state, and is 
> > > > > > > > added so
> > > > > > > >   that migration can fail early when the back-end has no 
> > > > > > > > support.
> > > > > > > >
> > > > > > > > - SET_DEVICE_STATE_FD function: Front-end and back-end 
> > > > > > > > negotiate a pipe
> > > > > > > >   over which to transfer the state.  The front-end sends an FD 
> > > > > > > > to the
> > > > > > > >   back-end into/from which it can write/read its state, and the 
> > > > > > > > back-end
> > > > > > > >   can decide to either use it, or reply with a different FD for 
> > > > > > > > the
> > > > > > > >   front-end to override the front-end's choice.
> > > > > > > >   The front-end creates a simple pipe to transfer the state, 
> > > > > > > > but maybe
> > > > > > > >   the back-end already has an FD into/from which it has to 
> > > > > > > > write/read
> > > > > > > >   its state, in which case it will want to override the simple 
> > > > > > > > pipe.
> > > > > > > >   Conversely, maybe in the future we find a way to have the 
> > > > > > > > front-end
> > > > > > > >   get an immediate FD for the migration stream (in some cases), 
> > > > > > > > in which
> > > > > > > >   case we will want to send this to the back-end instead of 
> > > > > > > > creating a
> > > > > > > >   pipe.
> > > > > > > >   Hence the negotiation: If one side has a better idea than a 
> > > > > > > > plain
> > > > > > > >   pipe, we will want to use that.
> > > > > > > >
> > > > > > > > - CHECK_DEVICE_STATE: After the state has been transferred 
> > > > > > > > through the
> > > > > > > >   pipe (the end indicated by EOF), the front-end invokes this 
> > > > > > > > function
> > > > > > > >   to verify success.  There is no in-band way (through the 
> > > > > > > > pipe) to
> > > > > > > >   indicate failure, so we need to check explicitly.
> > > > > > > >
> > > > > > > > Once the transfer pipe has been established via 
> > > > > > > > SET_DEVICE_STATE_FD
> > > > > > > > (which includes establishing the direction of transfer and 
> > > > > > > > migration
> > > > > > > > phase), the sending side writes its data into the pipe, and the 
> > > > > > > > reading
> > > > > > > > side reads it until it sees an EOF.  Then, the front-end will 
> > > > > > > > check for
> > > > > > > > success via CHECK_DEVICE_STATE, which on the destination side 
> > > > > > > > includes
> > > > > > > > checking for integrity (i.e. errors during deserialization).
> > > > > > > >
> > > > > > > > Suggested-by: Stefan Hajnoczi 
> > > > > > > > Signed-off-by: Hanna Czenczek 
> > > > > > > > ---
> > > > > > > >  include/hw/virtio/vhost-backend.h |  24 +
> > > > > > > >  include/hw/virtio/vhost.h |  79 
> > > > > > > >  hw/virtio/vhost-user.c| 147 
> > > > > > > > ++
> > > > > > > >  hw/virtio/vhost.c |  37 
> > > > > > > >  4 files changed, 287 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/include/hw/virtio/vhost-backend.h 
> > > > > > > > b/include/hw/virtio/vhost-backend.h
> > > > > > > > index ec3fbae58d..5935b32fe3 100644
> > > > > > > > --- a/include/hw/virtio/vhost-backend.h
> > > > > > > > +++ b/include/hw/virtio/vhost-backend.h
> > > > > > > > @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
> > > > > > > >  

[PATCH v2 4/8] target/riscv: Flush TLB only when pmpcfg/pmpaddr really changes

2023-04-18 Thread Weiwei Li
The TLB needn't be flushed when pmpcfg/pmpaddr don't change.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Alistair Francis 
---
 target/riscv/pmp.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 8645b1e1c1..ec86fccd2e 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -26,7 +26,7 @@
 #include "trace.h"
 #include "exec/exec-all.h"
 
-static void pmp_write_cfg(CPURISCVState *env, uint32_t addr_index,
+static bool pmp_write_cfg(CPURISCVState *env, uint32_t addr_index,
   uint8_t val);
 static uint8_t pmp_read_cfg(CPURISCVState *env, uint32_t addr_index);
 static void pmp_update_rule(CPURISCVState *env, uint32_t pmp_index);
@@ -83,7 +83,7 @@ static inline uint8_t pmp_read_cfg(CPURISCVState *env, 
uint32_t pmp_index)
  * Accessor to set the cfg reg for a specific PMP/HART
  * Bounds checks and relevant lock bit.
  */
-static void pmp_write_cfg(CPURISCVState *env, uint32_t pmp_index, uint8_t val)
+static bool pmp_write_cfg(CPURISCVState *env, uint32_t pmp_index, uint8_t val)
 {
 if (pmp_index < MAX_RISCV_PMPS) {
 bool locked = true;
@@ -119,14 +119,17 @@ static void pmp_write_cfg(CPURISCVState *env, uint32_t 
pmp_index, uint8_t val)
 
 if (locked) {
 qemu_log_mask(LOG_GUEST_ERROR, "ignoring pmpcfg write - locked\n");
-} else {
+} else if (env->pmp_state.pmp[pmp_index].cfg_reg != val) {
 env->pmp_state.pmp[pmp_index].cfg_reg = val;
 pmp_update_rule(env, pmp_index);
+return true;
 }
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
   "ignoring pmpcfg write - out of bounds\n");
 }
+
+return false;
 }
 
 static void pmp_decode_napot(target_ulong a, target_ulong *sa,
@@ -477,16 +480,19 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t 
reg_index,
 int i;
 uint8_t cfg_val;
 int pmpcfg_nums = 2 << riscv_cpu_mxl(env);
+bool modified = false;
 
 trace_pmpcfg_csr_write(env->mhartid, reg_index, val);
 
 for (i = 0; i < pmpcfg_nums; i++) {
 cfg_val = (val >> 8 * i)  & 0xff;
-pmp_write_cfg(env, (reg_index * 4) + i, cfg_val);
+modified |= pmp_write_cfg(env, (reg_index * 4) + i, cfg_val);
 }
 
 /* If PMP permission of any addr has been changed, flush TLB pages. */
-tlb_flush(env_cpu(env));
+if (modified) {
+tlb_flush(env_cpu(env));
+}
 }
 
 
@@ -535,9 +541,11 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
addr_index,
 }
 
 if (!pmp_is_locked(env, addr_index)) {
-env->pmp_state.pmp[addr_index].addr_reg = val;
-pmp_update_rule(env, addr_index);
-tlb_flush(env_cpu(env));
+if (env->pmp_state.pmp[addr_index].addr_reg != val) {
+env->pmp_state.pmp[addr_index].addr_reg = val;
+pmp_update_rule(env, addr_index);
+tlb_flush(env_cpu(env));
+}
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
   "ignoring pmpaddr write - locked\n");
-- 
2.25.1
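
The change-detection pattern in the patch above can be sketched in
isolation as follows. This is a simplified standalone model, not the QEMU
code: struct pmp_model, model_write_cfg() and model_csr_write() are
hypothetical names, the lock bit is ignored, each CSR is assumed to pack
four 8-bit cfg fields, and the "flush" is just a counter.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_ENTRIES 16

/* Toy state: one cfg byte per entry plus a count of simulated flushes. */
struct pmp_model {
    uint8_t cfg[N_ENTRIES];
    unsigned flushes;
};

/* Returns true only if the stored value actually changed. */
static bool model_write_cfg(struct pmp_model *m, unsigned idx, uint8_t val)
{
    if (idx >= N_ENTRIES || m->cfg[idx] == val) {
        return false;
    }
    m->cfg[idx] = val;
    return true;
}

/* Write four packed cfg bytes; flush once, and only if something changed. */
static void model_csr_write(struct pmp_model *m, unsigned reg, uint32_t val)
{
    bool modified = false;

    for (unsigned i = 0; i < 4; i++) {
        modified |= model_write_cfg(m, reg * 4 + i, (val >> (8 * i)) & 0xff);
    }
    if (modified) {
        m->flushes++;   /* stands in for tlb_flush() */
    }
}
```

Writing the same value twice bumps the counter only on the first write,
which is the behaviour the patch is after.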




[PATCH v2 5/8] target/riscv: flush tb when PMP entry changes

2023-04-18 Thread Weiwei Li
The translation block may also be affected when a PMP entry changes.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/pmp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index ec86fccd2e..37bc76c474 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -25,6 +25,7 @@
 #include "cpu.h"
 #include "trace.h"
 #include "exec/exec-all.h"
+#include "exec/tb-flush.h"
 
 static bool pmp_write_cfg(CPURISCVState *env, uint32_t addr_index,
   uint8_t val);
@@ -492,6 +493,7 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t 
reg_index,
 /* If PMP permission of any addr has been changed, flush TLB pages. */
 if (modified) {
 tlb_flush(env_cpu(env));
+tb_flush(env_cpu(env));
 }
 }
 
@@ -545,6 +547,7 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
addr_index,
 env->pmp_state.pmp[addr_index].addr_reg = val;
 pmp_update_rule(env, addr_index);
 tlb_flush(env_cpu(env));
+tb_flush(env_cpu(env));
 }
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
-- 
2.25.1




[PATCH v2 0/8] target/riscv: Fix PMP related problem

2023-04-18 Thread Weiwei Li
This patchset tries to fix the PMP bypass problem reported in issue 
https://gitlab.com/qemu-project/qemu/-/issues/1542:

- A TLB entry is cached if the matched PMP entry covers the whole page.  However, 
PMP entries with higher priority may cover part of the page (but not match the 
access address), which means different regions in this page may have different 
permission rights. So the TLB entry cannot be cached in this case either (patch 1).
- Writing to pmpaddr didn't trigger a TLB flush (patch 3). 
- The tb isn't flushed when PMP permissions change, so an instruction fetch may 
hit the stale tb and bypass the changed PMP check (patch 5). 
- We set the tlb_size to 1 to make the TLB_INVALID_MASK set, so that the next 
access will again go through tlb_fill. However, this does not work in 
tb_gen_code() => get_page_addr_code_hostp(): the TLB host address will be 
cached, and the following instructions can use this host address directly, which 
may lead to the bypass of PMP related checks (patch 6).
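
The reasoning in the first bullet can be sketched as a standalone check.
This is only an illustration of the argument, not the logic of
pmp_get_tlb_size(): struct region and page_cacheable() are hypothetical
names, regions are given as inclusive [start, end] ranges, and the array
is assumed to be ordered from highest to lowest priority.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One PMP-like region with inclusive bounds. */
struct region {
    uint64_t start;
    uint64_t end;
};

/*
 * Decide whether one permission result can be cached for a whole page.
 * Walk the regions in priority order: a region that overlaps only part
 * of the page means addresses within the page can match different
 * entries, so the page must not be cached as a unit.  The first region
 * that covers the whole page decides for every address in it, so lower
 * priority regions no longer matter.
 */
static bool page_cacheable(const struct region *r, size_t n,
                           uint64_t page_start, uint64_t page_size)
{
    uint64_t page_end = page_start + page_size - 1;

    for (size_t i = 0; i < n; i++) {
        bool overlaps = r[i].start <= page_end && r[i].end >= page_start;
        bool covers = r[i].start <= page_start && r[i].end >= page_end;

        if (overlaps && !covers) {
            return false;
        }
        if (covers) {
            return true;
        }
    }
    return true;    /* no region touches the page at all */
}
```

With a high-priority region covering only the first half of a page and a
lower-priority region covering the whole page, the function reports the
page as not cacheable, which is the case patch 1 addresses.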

The port is available here:
https://github.com/plctlab/plct-qemu/tree/plct-pmp-fix-v2

v2:

- Update commit message for patch 1
- Add default tlb_size when pmp is disabled or there are no rules and only get 
the tlb size when translation succeeds in patch 2
- Update get_page_addr_code_hostp instead of probe_access_internal to fix the 
cached host address for instruction fetch in patch 6
- Add patch 7 to make the short cut really work in pmp_hart_has_privs
- Add patch 8 to use pmp_update_rule_addr() and pmp_update_rule_nums() 
separately

Weiwei Li (8):
  target/riscv: Update pmp_get_tlb_size()
  target/riscv: Move pmp_get_tlb_size apart from
get_physical_address_pmp
  target/riscv: flush tlb when pmpaddr is updated
  target/riscv: Flush TLB only when pmpcfg/pmpaddr really changes
  target/riscv: flush tb when PMP entry changes
  accel/tcg: Uncache the host address for instruction fetch when tlb
size < 1
  target/riscv: Make the short cut really work in pmp_hart_has_privs
  target/riscv: Separate pmp_update_rule() in pmpcfg_csr_write Use
pmp_update_rule_addr() and pmp_update_rule_nums() separately to
update rule nums only once for each pmpcfg_csr_write. Then we can
also move tlb_flush and tb_flush into pmp_update_rule_nums().

 accel/tcg/cputlb.c|   5 +
 target/riscv/cpu_helper.c |  24 +--
 target/riscv/pmp.c| 316 --
 target/riscv/pmp.h|   3 +-
 4 files changed, 181 insertions(+), 167 deletions(-)

-- 
2.25.1




[PATCH v2 8/8] target/riscv: Separate pmp_update_rule() in pmpcfg_csr_write Use pmp_update_rule_addr() and pmp_update_rule_nums() separately to update rule nums only once for each pmpcfg_csr_write. Th

2023-04-18 Thread Weiwei Li
Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/pmp.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 67347c5887..1cce3f0ce4 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -122,7 +122,7 @@ static bool pmp_write_cfg(CPURISCVState *env, uint32_t 
pmp_index, uint8_t val)
 qemu_log_mask(LOG_GUEST_ERROR, "ignoring pmpcfg write - locked\n");
 } else if (env->pmp_state.pmp[pmp_index].cfg_reg != val) {
 env->pmp_state.pmp[pmp_index].cfg_reg = val;
-pmp_update_rule(env, pmp_index);
+pmp_update_rule_addr(env, pmp_index);
 return true;
 }
 } else {
@@ -208,6 +208,9 @@ void pmp_update_rule_nums(CPURISCVState *env)
 env->pmp_state.num_rules++;
 }
 }
+
+tlb_flush(env_cpu(env));
+tb_flush(env_cpu(env));
 }
 
 /*
@@ -487,8 +490,7 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t 
reg_index,
 
 /* If PMP permission of any addr has been changed, flush TLB pages. */
 if (modified) {
-tlb_flush(env_cpu(env));
-tb_flush(env_cpu(env));
+pmp_update_rule_nums(env);
 }
 }
 
@@ -541,8 +543,6 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
addr_index,
 if (env->pmp_state.pmp[addr_index].addr_reg != val) {
 env->pmp_state.pmp[addr_index].addr_reg = val;
 pmp_update_rule(env, addr_index);
-tlb_flush(env_cpu(env));
-tb_flush(env_cpu(env));
 }
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
-- 
2.25.1




[PATCH v2 6/8] accel/tcg: Uncache the host address for instruction fetch when tlb size < 1

2023-04-18 Thread Weiwei Li
When a PMP entry overlaps part of the page, we'll set the tlb_size to 1, which
will make the address in the tlb entry set with TLB_INVALID_MASK, and the next
access will again go through tlb_fill. However, this way will not work in
tb_gen_code() => get_page_addr_code_hostp(): the TLB host address will be
cached, and the following instructions can use this host address directly
which may lead to the bypass of PMP related check.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 accel/tcg/cputlb.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index e984a98dc4..efa0cb67c9 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1696,6 +1696,11 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState 
*env, target_ulong addr,
 if (p == NULL) {
 return -1;
 }
+
+if (full->lg_page_size < TARGET_PAGE_BITS) {
+return -1;
+}
+
 if (hostp) {
 *hostp = p;
 }
-- 
2.25.1




[PATCH v2 3/8] target/riscv: flush tlb when pmpaddr is updated

2023-04-18 Thread Weiwei Li
TLB should be flushed not only for pmpcfg csr changes, but also for
pmpaddr csr changes.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Alistair Francis 
---
 target/riscv/pmp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 643388dc23..8645b1e1c1 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -537,6 +537,7 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
addr_index,
 if (!pmp_is_locked(env, addr_index)) {
 env->pmp_state.pmp[addr_index].addr_reg = val;
 pmp_update_rule(env, addr_index);
+tlb_flush(env_cpu(env));
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
   "ignoring pmpaddr write - locked\n");
-- 
2.25.1




Re: [PATCH 0/5] Support both Ethernet interfaces on i.MX6UL and i.MX7

2023-04-18 Thread Peter Maydell
On Tue, 18 Apr 2023 at 15:42, Guenter Roeck  wrote:
>
> On 4/18/23 05:10, Peter Maydell wrote:
> > On Wed, 15 Mar 2023 at 14:52, Guenter Roeck  wrote:
> > So I was having a look at this to see if it was reasonably easy to
> > split out the PHY into its own device object, and I'm a bit confused.
> > I know basically 0 about MDIO, but wikipedia says that MDIO buses
> > have one master (the ethernet MAC) and potentially multiple PHYs.
> > However it looks like this patchset has configurations where
> > multiple MACs talk to the same MDIO bus. Am I confused about the
> > patchset, about the hardware, or about what MDIO supports?
> >
>
> It is quite similar to I2C, a serial interface with one master/controller
> and a number of devices (PHYs) connected to it. There is a nice graphic
> example at https://prodigytechno.com/mdio-management-data-input-output/.
> Not sure I understand what is confusing about it. Can you explain ?

I guess I don't understand what the topology is for these specific
SoCs, then. If there's only one master that might be connected
to multiple PHYs, why does one ethernet device in QEMU need to
know about the other one? Are the PHYs connected to just that
first ethernet device, or to both? This bit in your cover letter
makes it sound like "both ethernet interfaces connect to the same
MDIO bus which has both PHYs on it":

>> The SOC on i.MX6UL and i.MX7 has 2 Ethernet interfaces. The PHY on each may
>> be connected to separate MDIO busses, or both may be connected on the same
>> MDIO bus using different PHY addresses.

but maybe I'm misreading it.
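
For reference, the "one MDIO bus, two PHYs at different addresses"
reading can be modelled roughly as below. This is a toy sketch, not
QEMU's device model: struct phy, struct mdio_bus and mdio_read() are
hypothetical names, the limit of 32 PHY addresses and 32 registers
assumes 5-bit address fields, and returning 0xffff for an absent PHY is
an assumption of this model.

```c
#include <stddef.h>
#include <stdint.h>

#define MDIO_MAX_PHYS 32
#define MDIO_MAX_REGS 32

/* A PHY is just a bank of 16-bit registers in this model. */
struct phy {
    uint16_t regs[MDIO_MAX_REGS];
};

/* One bus with up to 32 PHY slots, indexed by PHY address. */
struct mdio_bus {
    struct phy *phys[MDIO_MAX_PHYS];
};

/* Read a register from the PHY at phy_addr; absent PHYs read as 0xffff. */
static uint16_t mdio_read(const struct mdio_bus *bus, unsigned phy_addr,
                          unsigned reg)
{
    if (phy_addr >= MDIO_MAX_PHYS || reg >= MDIO_MAX_REGS ||
        bus->phys[phy_addr] == NULL) {
        return 0xffff;
    }
    return bus->phys[phy_addr]->regs[reg];
}
```

In this picture both Ethernet interfaces would reach their PHY through
the same struct mdio_bus and differ only in the PHY address they use.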

thanks
-- PMM



Re: [PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests

2023-04-18 Thread Michael S. Tsirkin
On Tue, Apr 18, 2023 at 06:37:04PM +0200, Andrey Ryabinin wrote:
> On 4/18/23 07:13, Raphael Norwitz wrote:
> > Hey Andrey - apologies for the late reply here.
> > 
> > It sounds like you are dealing with a buggy guest, rather than a QEMU issue.
> 
> No arguing here, the guest is buggy.
> However, the issue with QEMU is that virtio-blk tolerates such buggy guests
> while vhost-user-blk does not.
> We've been using virtio-blk in our cloud for a while and recently started 
> switching to vhost-user-blk
> which led us to discover this problem.
> 
> >> On Apr 10, 2023, at 11:39 AM, Andrey Ryabinin  wrote:
> >>
> >>
> >>
> >> On 4/10/23 10:35, Andrey Ryabinin wrote:
> >>> Some guests hang on boot when using the vhost-user-blk-pci device,
> >>> but boot normally when using the virtio-blk device. The problem occurs
> >>> because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
> >>> before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to
> > 
> > Virtio 1.1 Section 3.1.1, says during setup “[t]he driver MUST NOT notify 
> > the device before setting DRIVER_OK.”
> > 
> > Therefore what you are describing is buggy guest behavior. Sounds like the 
> > driver should be made to either
> > - not advertise VIRTIO_F_VERSION_1
> > - not kick before setting VIRTIO_CONFIG_S_DRIVER_OK
> > 
> > If anything, the virtio-blk virtio_blk_handle_output() function should 
> > probably check start_on_kick?
> > 
> 
> Ideally this should have been done from the start. But if we do it now we'll 
> just break these guests.

The problem with hacks like this is the problem proliferates.  What are
those guests and how hard are they to fix?

-- 
MST




Re: [PATCH 0/6] Adding the Android Emulator hypervisor driver accelerator

2023-04-18 Thread Haitao Shan
On Tue, Apr 4, 2023 at 4:55 AM Paolo Bonzini  wrote:
>
> On 3/3/23 18:39, Haitao Shan wrote:
> >> No, we're always open to new proposals. It merely means that it
> >> might be harder to justify why the new hypervisor is a net benefit
> >> for QEMU, when there is a competing solution supported by the OS
> >> vendor.
> >
> > Thanks for the clarification. It is great that the door is not shut 
> > completely.
>
> Hi,
>
> sorry for not answering before.
Thanks for your reply. I was taking a long vacation and did not see
your reply earlier.

>
> I think in general QEMU should be open to merging work from the Android
> Emulator.  If AEHD is useful to the Android emulator, I would consider
> it interesting for QEMU as well.
Thanks for being open to us. For patchset V1, the most important
feedback we can have is that our work can be useful to the community
(not just the android emulator).

>
> However, I would rather have it as an extension to KVM if possible
> rather than a completely new emulator.  One possibility is to introduce
> a new file that encapsulates all KVM ioctls, with a struct that
> encapsulates the Unix file descriptor/Windows HANDLE.  For example
>
> int kvm_ioctl_get_supported_cpuid(KVMState *s, struct kvm_cpuid *cpuid,
>int max)
> {
>  cpuid->nent = max;
> #ifdef CONFIG_POSIX
>  return ioctl(s, KVM_GET_SUPPORTED_VCPUID, cpuid);
> #else
>  size_t size = sizeof(*cpuid) + max * sizeof(*cpuid->entries);
>  return aehd_ioctl(s, AEHD_GET_SUPPORTED_CPUID, cpuid, size, cpuid,
> size);
> #endif
> }
>
> int kvm_ioctl_create_vcpu(KVMState *s, int vcpu_id, CPUState *out)
> {
> #ifdef CONFIG_POSIX
>  out.kvm_fd = kvm_vm_ioctl(KVM_CREATE_VCPU, vcpu_id);
>  return out.kvm_fd;
> #else
>  return aehd_vm_ioctl(s, AEHD_CREATE_VCPU, &vcpu_id, sizeof(vcpu_id),
>   &out.kvm_fd, sizeof(out.kvm_fd));
> #endif
> }
>
> etc.
>
> These are just general examples, the actual level of abstraction is up
> to you.
I will work on the new patchset. And most likely it will take some time.

>
> Paolo
>


-- 
Haitao @Google



[PATCH v4 2/4] Add MEN Chameleon Bus via PCI carrier

2023-04-18 Thread Johannes Thumshirn
Add PCI based MEN Chameleon Bus carrier emulation.

Acked-by: Alistair Francis 
Signed-off-by: Johannes Thumshirn 
---
 hw/mcb/Kconfig  |   6 +
 hw/mcb/mcb-pci.c| 298 
 hw/mcb/meson.build  |   1 +
 hw/mcb/trace-events |   4 +
 hw/mcb/trace.h  |   1 +
 meson.build |   1 +
 6 files changed, 311 insertions(+)
 create mode 100644 hw/mcb/mcb-pci.c
 create mode 100644 hw/mcb/trace-events
 create mode 100644 hw/mcb/trace.h

diff --git a/hw/mcb/Kconfig b/hw/mcb/Kconfig
index 36a7a583a8..7deb96c2fe 100644
--- a/hw/mcb/Kconfig
+++ b/hw/mcb/Kconfig
@@ -1,2 +1,8 @@
 config MCB
 bool
+
+config MCB_PCI
+bool
+default y if PCI_DEVICES
+depends on PCI
+select MCB
diff --git a/hw/mcb/mcb-pci.c b/hw/mcb/mcb-pci.c
new file mode 100644
index 00..905b9adb3b
--- /dev/null
+++ b/hw/mcb/mcb-pci.c
@@ -0,0 +1,298 @@
+/*
+ * QEMU MEN Chameleon Bus emulation
+ *
+ * Copyright (C) 2023 Johannes Thumshirn 
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/mcb/mcb.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pci_device.h"
+#include "hw/qdev-properties.h"
+#include "migration/vmstate.h"
+#include "trace.h"
+
+typedef struct {
+uint8_t revision;
+char model;
+uint8_t minor;
+uint8_t bus_type;
+uint16_t magic;
+uint16_t reserved;
+/* This one has no '\0' at the end!!! */
+char filename[12];
+} ChameleonFPGAHeader;
+#define CHAMELEON_BUS_TYPE_WISHBONE 0
+#define CHAMELEONV2_MAGIC 0xabce
+
+typedef struct {
+PCIDevice dev;
+MCBus bus;
+MemoryRegion ctbl;
+uint16_t status;
+uint8_t int_set;
+ChameleonFPGAHeader *header;
+
+uint8_t minor;
+uint8_t rev;
+uint8_t model;
+} MPCIState;
+
+#define TYPE_MCB_PCI "mcb-pci"
+
+#define MPCI(obj)   \
+OBJECT_CHECK(MPCIState, (obj), TYPE_MCB_PCI)
+
+#define CHAMELEON_TABLE_SIZE 0x200
+#define N_MODULES 32
+
+#define PCI_VENDOR_ID_MEN 0x1a88
+#define PCI_DEVICE_ID_MEN_MCBPCI 0x4d45
+
+static uint32_t read_header(MPCIState *s, hwaddr addr)
+{
+uint32_t ret = 0;
+ChameleonFPGAHeader *header = s->header;
+
+switch (addr >> 2) {
+case 0:
+ret |= header->revision;
+ret |= header->model << 8;
+ret |= header->minor << 16;
+ret |= header->bus_type << 24;
+break;
+case 1:
+ret |= header->magic;
+ret |= header->reserved << 16;
+break;
+case 2:
+memcpy(&ret, header->filename, sizeof(uint32_t));
+break;
+case 3:
+memcpy(&ret, header->filename + sizeof(uint32_t),
+   sizeof(uint32_t));
+break;
+case 4:
+memcpy(&ret, header->filename + 2 * sizeof(uint32_t),
+   sizeof(uint32_t));
+}
+
+return ret;
+}
+
+static uint32_t read_gdd(MCBDevice *mdev, int reg)
+{
+ChameleonDeviceDescriptor *gdd;
+uint32_t ret = 0;
+
+gdd = mdev->gdd;
+
+switch (reg) {
+case 0:
+ret = gdd->reg1;
+break;
+case 1:
+ret = gdd->reg2;
+break;
+case 2:
+ret = gdd->offset;
+break;
+case 3:
+ret = gdd->size;
+break;
+}
+
+return ret;
+}
+
+static uint64_t mpci_chamtbl_read(void *opaque, hwaddr addr, unsigned size)
+{
+MPCIState *s = opaque;
+MCBus *bus = &s->bus;
+MCBDevice *mdev;
+
+trace_mpci_chamtbl_read(addr, size);
+
+if (addr < sizeof(ChameleonFPGAHeader)) {
+return le32_to_cpu(read_header(s, addr));
+} else if (addr >= sizeof(ChameleonFPGAHeader) &&
+   addr < CHAMELEON_TABLE_SIZE) {
+/* Handle read on chameleon table */
+BusChild *kid;
+DeviceState *qdev;
+int slot;
+int offset;
+int i;
+
+offset = addr - sizeof(ChameleonFPGAHeader);
+slot = offset / sizeof(ChameleonDeviceDescriptor);
+
+kid = QTAILQ_FIRST(&BUS(bus)->children);
+for (i = 0; i < slot; i++) {
+kid = QTAILQ_NEXT(kid, sibling);
+if (!kid) { /* Last element */
+return ~0U;
+}
+}
+qdev = kid->child;
+mdev = MCB_DEVICE(qdev);
+offset -= slot * 16;
+
+return le32_to_cpu(read_gdd(mdev, offset / 4));
+}
+
+return 0;
+}
+
+static void mpci_chamtbl_write(void *opaque, hwaddr addr, uint64_t val,
+   unsigned size)
+{
+
+if (addr < CHAMELEON_TABLE_SIZE) {
+trace_mpci_chamtbl_write(addr, val);
+}
+
+return;
+}
+
+static const MemoryRegionOps mpci_chamtbl_ops = {
+.read = mpci_chamtbl_read,
+.write = mpci_chamtbl_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 4
+},
+};
+

[PATCH v4 1/4] Add MEN Chameleon Bus emulation

2023-04-18 Thread Johannes Thumshirn
The MEN Chameleon Bus (MCB) is an on-chip bus system exposing IP Cores of an
FPGA to an outside bus system like PCIe.

Acked-by: Alistair Francis 
Signed-off-by: Johannes Thumshirn 
---
 MAINTAINERS  |   6 ++
 hw/Kconfig   |   1 +
 hw/mcb/Kconfig   |   2 +
 hw/mcb/mcb.c | 180 +++
 hw/mcb/meson.build   |   1 +
 hw/meson.build   |   1 +
 include/hw/mcb/mcb.h | 106 +
 7 files changed, 297 insertions(+)
 create mode 100644 hw/mcb/Kconfig
 create mode 100644 hw/mcb/mcb.c
 create mode 100644 hw/mcb/meson.build
 create mode 100644 include/hw/mcb/mcb.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 2c2068ea5c..1fa5909a97 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1962,6 +1962,12 @@ R: Paolo Bonzini 
 S: Odd Fixes
 F: hw/char/
 
+MEN Chameleon Bus
+M: Johannes Thumshirn 
+S: Maintained
+F: hw/mcb/
+F: include/hw/mcb/
+
 Network devices
 M: Jason Wang 
 S: Odd Fixes
diff --git a/hw/Kconfig b/hw/Kconfig
index ba62ff6417..f5ef84b10b 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -18,6 +18,7 @@ source intc/Kconfig
 source ipack/Kconfig
 source ipmi/Kconfig
 source isa/Kconfig
+source mcb/Kconfig
 source mem/Kconfig
 source misc/Kconfig
 source net/Kconfig
diff --git a/hw/mcb/Kconfig b/hw/mcb/Kconfig
new file mode 100644
index 00..36a7a583a8
--- /dev/null
+++ b/hw/mcb/Kconfig
@@ -0,0 +1,2 @@
+config MCB
+bool
diff --git a/hw/mcb/mcb.c b/hw/mcb/mcb.c
new file mode 100644
index 00..1c4f693a73
--- /dev/null
+++ b/hw/mcb/mcb.c
@@ -0,0 +1,180 @@
+/*
+ * QEMU MEN Chameleon Bus emulation
+ *
+ * Copyright (C) 2023 Johannes Thumshirn 
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "hw/mcb/mcb.h"
+#include "hw/irq.h"
+#include "hw/qdev-properties.h"
+#include "migration/vmstate.h"
+
+ChameleonDeviceDescriptor *mcb_new_chameleon_descriptor(MCBus *bus, uint8_t id,
+uint8_t rev,
+uint8_t var,
+uint32_t size)
+{
+BusChild *kid;
+ChameleonDeviceDescriptor *gdd;
+uint32_t reg1 = 0;
+uint32_t end = 0x200;
+
+gdd =  g_new0(ChameleonDeviceDescriptor, 1);
+if (!gdd) {
+return NULL;
+}
+
+reg1 |= GDD_DEV(id);
+reg1 |= GDD_DTY(CHAMELEON_DTYPE_GENERAL);
+reg1 |= GDD_REV(rev);
+reg1 |= GDD_VAR(var);
+gdd->reg1 = cpu_to_le32(reg1);
+
+QTAILQ_FOREACH(kid, &BUS(bus)->children, sibling) {
+DeviceState *qdev = kid->child;
+MCBDevice *mdev = MCB_DEVICE(qdev);
+
+if (mdev->gdd) {
+end += mdev->gdd->size;
+}
+}
+
+gdd->offset = end;
+gdd->size = size;
+
+return gdd;
+}
+
+static void mcb_irq_handler(void *opaque, int irq_num, int level)
+{
+MCBDevice *dev = opaque;
+MCBus *bus = MCB_BUS(qdev_get_parent_bus(DEVICE(dev)));
+
+if (bus->set_irq) {
+bus->set_irq(dev, irq_num, level);
+}
+}
+
+qemu_irq mcb_allocate_irq(MCBDevice *dev)
+{
+int irq = 0;
+return qemu_allocate_irq(mcb_irq_handler, dev, irq);
+}
+
+MCBDevice *mcb_device_find(MCBus *bus, hwaddr addr)
+{
+BusChild *kid;
+uint32_t start;
+uint32_t end;
+
+QTAILQ_FOREACH(kid, &BUS(bus)->children, sibling) {
+DeviceState *qdev = kid->child;
+MCBDevice *mdev = MCB_DEVICE(qdev);
+
+start = mdev->gdd->offset;
+end = start + mdev->gdd->size;
+
+if (addr >= start && addr <= end) {
+return mdev;
+}
+}
+return NULL;
+}
+
+void mcb_bus_init(MCBus *bus, size_t bus_size,
+  DeviceState *parent,
+  uint8_t n_slots,
+  qemu_irq_handler handler)
+{
+qbus_init(bus, bus_size, TYPE_MCB_BUS, parent, NULL);
+bus->n_slots = n_slots;
+bus->set_irq = handler;
+}
+
+static void mcb_device_realize(DeviceState *dev, Error **errp)
+{
+MCBDevice *mdev = MCB_DEVICE(dev);
+MCBus *bus = MCB_BUS(qdev_get_parent_bus(dev));
+MCBDeviceClass *k = MCB_DEVICE_GET_CLASS(dev);
+
+if (mdev->slot < 0) {
+mdev->slot = bus->free_slot;
+}
+
+if (mdev->slot >= bus->n_slots) {
+error_setg(errp, "Only %" PRIu8 " slots available.", bus->n_slots);
+return;
+}
+bus->free_slot = mdev->slot + 1;
+
+mdev->irq = qemu_allocate_irqs(bus->set_irq, mdev, 1);
+
+k->realize(dev, errp);
+}
+
+static void mcb_device_unrealize(DeviceState *dev)
+{
+MCBDevice *mdev = MCB_DEVICE(dev);
+MCBDeviceClass *k = MCB_DEVICE_GET_CLASS(dev);
+
+if (k->unrealize) {
+k->unrealize(dev);
+return;
+}
+
+qemu_free_irqs(mdev->irq, 1);
+}
+
+static Property mcb_device_props[] = {
+DEFINE_PROP_INT32("slot", MCBDevice, slot, -1),
+

[PATCH v4 4/4] wdt_z069: Add support for MEN 16z069 Watchdog

2023-04-18 Thread Johannes Thumshirn
Add 16z069 Watchdog over MEN Chameleon BUS emulation.

Signed-off-by: Johannes Thumshirn 
---
 hw/watchdog/Kconfig  |   5 +
 hw/watchdog/meson.build  |   1 +
 hw/watchdog/trace-events |   6 ++
 hw/watchdog/wdt_z069.c   | 207 +++
 4 files changed, 219 insertions(+)
 create mode 100644 hw/watchdog/wdt_z069.c

diff --git a/hw/watchdog/Kconfig b/hw/watchdog/Kconfig
index 66e1d029e3..a3f1196f66 100644
--- a/hw/watchdog/Kconfig
+++ b/hw/watchdog/Kconfig
@@ -20,3 +20,8 @@ config WDT_IMX2
 
 config WDT_SBSA
 bool
+
+config WDT_Z069
+bool
+default y if MCB
+depends on MCB
diff --git a/hw/watchdog/meson.build b/hw/watchdog/meson.build
index 8974b5cf4c..7bc353774e 100644
--- a/hw/watchdog/meson.build
+++ b/hw/watchdog/meson.build
@@ -6,4 +6,5 @@ softmmu_ss.add(when: 'CONFIG_WDT_DIAG288', if_true: 
files('wdt_diag288.c'))
 softmmu_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files('wdt_aspeed.c'))
 softmmu_ss.add(when: 'CONFIG_WDT_IMX2', if_true: files('wdt_imx2.c'))
 softmmu_ss.add(when: 'CONFIG_WDT_SBSA', if_true: files('sbsa_gwdt.c'))
+softmmu_ss.add(when: 'CONFIG_WDT_Z069', if_true: files('wdt_z069.c'))
 specific_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr_watchdog.c'))
diff --git a/hw/watchdog/trace-events b/hw/watchdog/trace-events
index 54371ae075..854fa3f0c8 100644
--- a/hw/watchdog/trace-events
+++ b/hw/watchdog/trace-events
@@ -17,6 +17,12 @@ spapr_watchdog_query(uint64_t caps) "caps=0x%" PRIx64
 spapr_watchdog_query_lpm(uint64_t caps) "caps=0x%" PRIx64
 spapr_watchdog_expired(uint64_t num, unsigned action) "num=%" PRIu64 " 
action=%u"
 
+# wdt_z069.c
+men_z069_wdt_enable(unsigned int timeout) "next timeout will fire in +%dms"
+men_z069_wdt_write_wtr(uint16_t tout, uint16_t t, unsigned int timeout) "new 
timeout: %u (0x%x) %u"
+men_z069_wdt_write_wvr(uint16_t wvr, unsigned int timeout) "watchdog triggered 
(wvr=0x%x), next timeout will fire in +%dms"
+men_z069_wdt_read(unsigned long addr, int size, uint64_t ret) "addr=0x%lx, 
size=%d, ret=0x%" PRIx64
+
 # watchdog.c
 watchdog_perform_action(unsigned int action) "action=%u"
 watchdog_set_action(unsigned int action) "action=%u"
diff --git a/hw/watchdog/wdt_z069.c b/hw/watchdog/wdt_z069.c
new file mode 100644
index 00..eb58a44697
--- /dev/null
+++ b/hw/watchdog/wdt_z069.c
@@ -0,0 +1,207 @@
+/*
+ * QEMU MEN 16z069 Watchdog over MCB emulation
+ *
+ * Copyright (C) 2023 Johannes Thumshirn 
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu/module.h"
+#include "qemu/timer.h"
+#include "sysemu/watchdog.h"
+#include "hw/mcb/mcb.h"
+#include "migration/vmstate.h"
+#include "hw/qdev-properties.h"
+#include "trace.h"
+
+#define MEN_Z069_WTR 0x10
+#define MEN_Z069_WTR_WDEN BIT(15)
+#define MEN_Z069_WTR_WDET_MASK  0x7fff
+#define MEN_Z069_WVR 0x14
+
+#define CLK_500(x) ((x) * 2) /* 500Hz in ms */
+
+typedef struct {
+/*< private >*/
+MCBDevice dev;
+
+/*< public >*/
+QEMUTimer *timer;
+
+bool enabled;
+unsigned int timeout;
+
+MemoryRegion mmio;
+
+/* Registers */
+uint16_t wtr;
+uint16_t wvr;
+} MENZ069State;
+
+static void men_z069_wdt_enable(MENZ069State *s)
+{
+trace_men_z069_wdt_enable(s->timeout);
+timer_mod(s->timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + s->timeout);
+}
+
+static void men_z069_wdt_disable(MENZ069State *s)
+{
+timer_del(s->timer);
+}
+
+static uint64_t men_z069_wdt_read(void *opaque, hwaddr addr, unsigned size)
+{
+MENZ069State *s = opaque;
+uint64_t ret;
+
+switch (addr) {
+case MEN_Z069_WTR:
+ret = s->wtr;
+break;
+case MEN_Z069_WVR:
+ret = s->wvr;
+break;
+default:
+ret = 0UL;
+break;
+}
+
+trace_men_z069_wdt_read(addr, size, ret);
+return ret;
+}
+
+static void men_z069_wdt_write(void *opaque, hwaddr addr, uint64_t v,
+   unsigned size)
+{
+MENZ069State *s = opaque;
+bool old_ena = s->enabled;
+uint16_t val = v & 0x;
+uint16_t tout;
+
+switch (addr) {
+case MEN_Z069_WTR:
+s->wtr = val;
+tout = val & MEN_Z069_WTR_WDET_MASK;
+s->timeout = CLK_500(tout);
+s->enabled = val & MEN_Z069_WTR_WDEN;
+trace_men_z069_wdt_write_wtr(tout, tout, s->timeout);
+
+if (old_ena && !s->enabled) {
+men_z069_wdt_disable(s);
+} else if (!old_ena && s->enabled) {
+men_z069_wdt_enable(s);
+}
+
+break;
+case MEN_Z069_WVR:
+/* The watchdog trigger value toggles between 0x and 0x */
+if (val == (s->wvr ^ 0x)) {
+s->wvr = val;
+trace_men_z069_wdt_write_wvr(s->wvr, s->timeout);
+timer_mod(s->timer,
+  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + s->timeout);
+}
+break;
+default:
+break;
+}
+
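The WTR handling in the write path above can be illustrated in isolation. The following is a minimal standalone sketch, not part of the patch: `struct wtr_fields` and `wtr_decode()` are hypothetical names, and only the bit layout (enable in bit 15, a 15-bit timeout field counted in 500Hz ticks, i.e. 2ms each) is taken from the defines in wdt_z069.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit layout copied from the MEN_Z069_WTR_* defines in the patch. */
#define WTR_WDEN      (1u << 15)   /* watchdog enable */
#define WTR_WDET_MASK 0x7fffu      /* timeout field, in 500Hz ticks */

/* Hypothetical helper type, used here for illustration only. */
struct wtr_fields {
    bool enabled;
    unsigned int timeout_ms;
};

/* Split a raw WTR value into the enable flag and a timeout in ms. */
static void wtr_decode(uint16_t wtr, struct wtr_fields *out)
{
    assert(out != NULL);
    out->enabled = (wtr & WTR_WDEN) != 0;
    /* One 500Hz tick lasts 2ms, which is what CLK_500() computes. */
    out->timeout_ms = (wtr & WTR_WDET_MASK) * 2u;
}
```

With this layout, a guest write of `0x8000 | 500` would enable the watchdog with a 1000ms timeout, and writing `500` alone would keep the same timeout but leave the watchdog disabled.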

[PATCH v4 0/4] Add emulation of MEN Chameleon Hardware

2023-04-18 Thread Johannes Thumshirn
Add emulation of MEN Chameleon Hardware to QEMU.
This emulation is specifically designed to test the upstream Linux kernel
drivers when one has no access to the hardware.

The emulation consists of the bus itself, a PCI hardware target creating the
bus, MEN Micro Electronic's 8250 based UART via MCB and a watchdog timer.

Changes since v3:
- Converted DPRINTF() to tracing infrastructure again (Alistair)

Changes since v2:
- Adjusted license to GPL 2 or later (Peter)

Changes since v1:
- Converted DPRINTF() to tracing infrastructure (Alistair)
- Fixed style issues (Alistair)

Johannes Thumshirn (4):
  Add MEN Chameleon Bus emulation
  Add MEN Chameleon Bus via PCI carrier
  serial-mcb: Add serial via MEN chameleon bus
  wdt_z069: Add support for MEN 16z069 Watchdog

 MAINTAINERS  |   6 +
 hw/Kconfig   |   1 +
 hw/char/Kconfig  |   6 +
 hw/char/meson.build  |   1 +
 hw/char/serial-mcb.c | 115 +++
 hw/mcb/Kconfig   |   8 ++
 hw/mcb/mcb-pci.c | 298 +++
 hw/mcb/mcb.c | 180 +++
 hw/mcb/meson.build   |   2 +
 hw/mcb/trace-events  |   4 +
 hw/mcb/trace.h   |   1 +
 hw/meson.build   |   1 +
 hw/watchdog/Kconfig  |   5 +
 hw/watchdog/meson.build  |   1 +
 hw/watchdog/trace-events |   6 +
 hw/watchdog/wdt_z069.c   | 207 +++
 include/hw/mcb/mcb.h | 106 ++
 meson.build  |   1 +
 18 files changed, 949 insertions(+)
 create mode 100644 hw/char/serial-mcb.c
 create mode 100644 hw/mcb/Kconfig
 create mode 100644 hw/mcb/mcb-pci.c
 create mode 100644 hw/mcb/mcb.c
 create mode 100644 hw/mcb/meson.build
 create mode 100644 hw/mcb/trace-events
 create mode 100644 hw/mcb/trace.h
 create mode 100644 hw/watchdog/wdt_z069.c
 create mode 100644 include/hw/mcb/mcb.h

-- 
2.39.2




Re: [PATCH 0/5] Support both Ethernet interfaces on i.MX6UL and i.MX7

2023-04-18 Thread Guenter Roeck

On 4/18/23 07:46, Peter Maydell wrote:

On Tue, 18 Apr 2023 at 15:42, Guenter Roeck  wrote:


On 4/18/23 05:10, Peter Maydell wrote:

On Wed, 15 Mar 2023 at 14:52, Guenter Roeck  wrote:
So I was having a look at this to see if it was reasonably easy to
split out the PHY into its own device object, and I'm a bit confused.
I know basically 0 about MDIO, but wikipedia says that MDIO buses
have one master (the ethernet MAC) and potentially multiple PHYs.
However it looks like this patchset has configurations where
multiple MACs talk to the same MDIO bus. Am I confused about the
patchset, about the hardware, or about what MDIO supports?



It is quite similar to I2C, a serial interface with one master/controller
and a number of devices (PHYs) connected to it. There is a nice graphic
example at https://prodigytechno.com/mdio-management-data-input-output/.
Not sure I understand what is confusing about it. Can you explain ?


I guess I don't understand what the topology is for these specific
SoCs, then. If there's only one master that might be connected
to multiple PHYs, why does one ethernet device in QEMU need to
know about the other one? Are the PHYs connected to just that
first ethernet device, or to both? This bit in your cover letter
makes it sound like "both ethernet interfaces connect to the same
MDIO bus which has both PHYs on it":



Yes, that is exactly how it is, similar to the configuration in the picture
at prodigytechno.com. I don't recall what I wrote in the cover letter, but
"Both Ethernet PHYs connect to the same MDIO bus which is connected to one
of the Ethernet MACs" would be the most accurate description I can think of.


The SOC on i.MX6UL and i.MX7 has 2 Ethernet interfaces. The PHY on each may
be connected to separate MDIO busses, or both may be connected on the same
MDIO bus using different PHY addresses.




Each MAC (Ethernet interface, instance of TYPE_IMX_FEC in qemu) has its own
MDIO bus. Currently QEMU assumes that each PHY is connected to the MDIO bus
on its associated MAC interface. That is not the case on the emulated boards,
where all PHYs are connected to a single MDIO bus.

Userspace, when talking to the Ethernet controllers, knows that the PHY
of the second Ethernet controller is connected to the MDIO bus on the first
Ethernet controller. QEMU has to be told about that and otherwise misses that
MDIO commands sent to the second PHY (on the first Ethernet controller)
influence the second MAC interface.
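
To make the topology concrete, here is a small standalone model of it. This is only a sketch under assumptions: none of these types or functions exist in QEMU, and the names (`struct mdio_bus`, `mdio_write()`, `phy_writes_seen`) are made up for illustration. It shows one MDIO bus, owned by the first MAC, carrying both PHYs at different addresses, so that a write issued on that bus can change state belonging to the second MAC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MDIO_MAX_PHYS 32   /* MDIO PHY addresses are 5 bits wide */

/* Hypothetical model types, not QEMU code. */
struct mac {
    const char *name;
    unsigned int phy_writes_seen;  /* PHY register writes affecting this MAC */
};

struct phy {
    struct mac *link_owner;        /* MAC whose link this PHY drives */
    uint16_t last_value;           /* last register value written */
};

struct mdio_bus {
    struct mac *owner;             /* MAC whose MDIO pins drive this bus */
    struct phy *phys[MDIO_MAX_PHYS];
};

static void mdio_attach(struct mdio_bus *bus, unsigned int addr,
                        struct phy *phy)
{
    assert(addr < MDIO_MAX_PHYS);
    bus->phys[addr] = phy;
}

/* Returns 0 on success, -1 if no PHY answers at that address. */
static int mdio_write(struct mdio_bus *bus, unsigned int addr, uint16_t val)
{
    assert(addr < MDIO_MAX_PHYS);
    struct phy *phy = bus->phys[addr];

    if (phy == NULL) {
        return -1;
    }
    phy->last_value = val;
    /* The affected MAC is the PHY's link owner, not the bus owner. */
    phy->link_owner->phy_writes_seen++;
    return 0;
}
```

In this model, attaching the second PHY at address 1 on the first MAC's bus and writing to address 1 updates the second MAC's state, which is the cross-reference between the two Ethernet devices that the emulation has to be told about.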

From this exchange I can only assume that my implementation is unacceptable.
All I can say is that it works.

Thanks,
Guenter




Re: [PATCH 00/12] virtio: add vhost-user-generic and reduce copy and paste

2023-04-18 Thread Stefan Hajnoczi
On Mon, Apr 17, 2023 at 05:14:59PM +0100, Alex Bennée wrote:
> 
> Stefan Hajnoczi  writes:
> 
> > On Fri, 14 Apr 2023 at 12:06, Alex Bennée  wrote:
> >>
> >> A lot of our vhost-user stubs are large chunks of boilerplate that do
> >> (mostly) the same thing. This series attempts to fix that by defining
> >> a new base class for vhost-user devices and then converting the rng
> >> and gpio devices to be based off them. You can even use
> >> vhost-user-device directly if you supply it with the right magic
> >> numbers (which is helpful for development).
> >>
> >> However the final patch runs into the weeds because I don't yet have a
> >> clean way to represent in QOM the fixing of certain properties for the
> >> specialised classes.
> >>
> >> The series is a net reduction in code and an increase in
> >> documentation but obviously needs to iron out a few more warts. I'm
> >> open to suggestions on the best way to tweak the QOM stuff.
> >
> > --device vhost-user-device is not really possible because vhost-user
> > devices are not full VIRTIO devices. vhost-user devices depend on
> > device-specific code in the VMM by design.
> 
> What device specific code? You certainly need to instantiate stuff in
> the DTB/ACPI tables for -M virt but everything else can be handed off to
> the vhost-user daemon.

There are vhost-user device types that lack functionality entirely, like
vhost-user-net's lack of the controlq virtqueue and configuration space.
It is not even possible to query the MAC address from a vhost-user-net
device. It's not a full virtio-net device and the VMM has to fill in the
gaps to emulate the missing virtqueues and configuration space.

There are device type-specific vhost-user messages like
VHOST_USER_SEND_RARP and VHOST_USER_CREATE_CRYPTO_SESSION that can't
really be supported by a generic --device vhost-user-device.

Live migration is another device-specific aspect that is handled by the
vhost-user frontend.

> Indeed the split brain is a bit silly in some places. For example is
> QEMU really the best arbiter of a block device config when the actual
> backend is a separate process. We have config passing in the vhost-user
> spec.

Optional vhost-user messages like configuration space access may be in
the spec, but existing device types cannot rely on them for backwards
compatibility reasons. Therefore most existing device types don't use
the configuration space. Those that do may only access parts of the
configuration space from the backend and emulate the rest.

The blog post I linked mentions a new protocol feature bit
(VHOST_USER_PROTOCOL_F_VDPA) to distinguish new vhost-user backends that
are full VIRTIO devices. This doesn't exist today, but I think something
like that is necessary to detect devices that will work with --device
vhost-user-device.

> > The "subset of a VIRTIO device" design made sense for vhost_net.
> > Nowadays there are other device types that are close to full VIRTIO
> > devices, although the vhost-user protocol doesn't support the full
> > VIRTIO device lifecycle.
> 
> What are we missing?

vhost-user needs:
- A GET_DEVICE_ID message.
- A GET_CONFIG_SIZE message. Today it is assumed that the vhost-user
  frontend already knows the configuration space size.
- A protocol feature bit indicating that the device is a full VIRTIO
  device. These devices also need to implement the SET_STATUS message,
  which is rarely implemented today.

> > I think a user-creatable --device vhost-user-device is not a good idea
> > today. It creates confusion. Many people aren't aware of the
> > architectural difference between vhost-user and VIRTIO devices. The
> > result is that VMMs and vhost-user backends implement increasingly
> > brittle VIRTIO configuration space and feature bit logic as they
> > knowingly or unknowingly try to paper over the fact that a traditional
> > vhost-user device isn't a full VIRTIO device.
> 
> I've always found the device feature gating in QEMU confusing. Surely we
> can rely on the daemon to properly enumerate the features it supports?

The backend's features cannot be passed through to the guest because
there is also an emulated VIRTIO transport (virtio-pci, virtio-mmio,
virtio-ccw) involved. The transport may support a different feature set
from the vhost-user backend. For example, the transport may support
Packed Virtqueues but the backend may not. So some filtering is
necessary in the VMM.
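
As a rough illustration of that filtering, consider the sketch below. It is a deliberate simplification, not QEMU's actual logic: `struct feature_sets` and `guest_visible_features()` are hypothetical names, and a plain intersection is assumed. Only the two feature bit numbers come from the VIRTIO specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FEATURE_BIT(n)  (1ULL << (n))
#define F_VERSION_1     32   /* VIRTIO_F_VERSION_1 */
#define F_RING_PACKED   34   /* VIRTIO_F_RING_PACKED */

/* Hypothetical container for the two independently negotiated sets. */
struct feature_sets {
    uint64_t transport;  /* what the emulated transport can handle */
    uint64_t backend;    /* what the vhost-user backend advertises */
};

/*
 * Simplified filter: only offer the guest what both sides support.
 * Assumption: the real VMM logic also has to account for features
 * that the VMM emulates on the backend's behalf.
 */
static uint64_t guest_visible_features(const struct feature_sets *fs)
{
    assert(fs != NULL);
    return fs->transport & fs->backend;
}
```

With a transport that supports Packed Virtqueues and a backend that does not, the packed bit is masked out before the guest ever sees it, while VERSION_1 survives because both sides advertise it.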

Since some of the device-specific functionality may be handled by the
VMM, then this extends beyond just the transport feature bits.

But I agree with you that it's ugly and complex. I have to re-read the
code every time because it works in a strange way.

> > It is possible to resolve this difference and make --device
> > vhost-user-device work properly for devices that want to be full
> > VIRTIO devices. See "Making VMM device shims optional" here:
> > https://blog.vmsplice.net/2020/09/on-unifying-vhost-user-and-virtio.html
> >
> > Even after extending the vhost-user 

[PATCH v2 03/13] hw/virtio: fix typo in VIRTIO_CONFIG_IRQ_IDX comments

2023-04-18 Thread Alex Bennée
Fixes: 544f0278af (virtio: introduce macro VIRTIO_CONFIG_IRQ_IDX)
Signed-off-by: Alex Bennée 
---
 hw/display/vhost-user-gpu.c| 4 ++--
 hw/net/virtio-net.c| 4 ++--
 hw/virtio/vhost-user-fs.c  | 4 ++--
 hw/virtio/vhost-user-gpio.c| 2 +-
 hw/virtio/vhost-vsock-common.c | 4 ++--
 hw/virtio/virtio-crypto.c  | 4 ++--
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
index 71dfd956b8..7c61a7c3ac 100644
--- a/hw/display/vhost-user-gpu.c
+++ b/hw/display/vhost-user-gpu.c
@@ -489,7 +489,7 @@ vhost_user_gpu_guest_notifier_pending(VirtIODevice *vdev, 
int idx)
 
 /*
  * Add the check for configure interrupt, Use VIRTIO_CONFIG_IRQ_IDX -1
- * as the Marco of configure interrupt's IDX, If this driver does not
+ * as the macro of configure interrupt's IDX, If this driver does not
  * support, the function will return
  */
 
@@ -506,7 +506,7 @@ vhost_user_gpu_guest_notifier_mask(VirtIODevice *vdev, int 
idx, bool mask)
 
 /*
  * Add the check for configure interrupt, Use VIRTIO_CONFIG_IRQ_IDX -1
- * as the Marco of configure interrupt's IDX, If this driver does not
+ * as the macro of configure interrupt's IDX, If this driver does not
  * support, the function will return
  */
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 53e1c32643..c53616a080 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3359,7 +3359,7 @@ static bool 
virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
 }
 /*
  * Add the check for configure interrupt, Use VIRTIO_CONFIG_IRQ_IDX -1
- * as the Marco of configure interrupt's IDX, If this driver does not
+ * as the macro of configure interrupt's IDX, If this driver does not
  * support, the function will return false
  */
 
@@ -3391,7 +3391,7 @@ static void virtio_net_guest_notifier_mask(VirtIODevice 
*vdev, int idx,
 }
 /*
  *Add the check for configure interrupt, Use VIRTIO_CONFIG_IRQ_IDX -1
- * as the Marco of configure interrupt's IDX, If this driver does not
+ * as the macro of configure interrupt's IDX, If this driver does not
  * support, the function will return
  */
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 83fc20e49e..49d699ffc2 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -161,7 +161,7 @@ static void vuf_guest_notifier_mask(VirtIODevice *vdev, int 
idx,
 
 /*
  * Add the check for configure interrupt, Use VIRTIO_CONFIG_IRQ_IDX -1
- * as the Marco of configure interrupt's IDX, If this driver does not
+ * as the macro of configure interrupt's IDX, If this driver does not
  * support, the function will return
  */
 
@@ -177,7 +177,7 @@ static bool vuf_guest_notifier_pending(VirtIODevice *vdev, 
int idx)
 
 /*
  * Add the check for configure interrupt, Use VIRTIO_CONFIG_IRQ_IDX -1
- * as the Marco of configure interrupt's IDX, If this driver does not
+ * as the macro of configure interrupt's IDX, If this driver does not
  * support, the function will return
  */
 
diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
index d6927b610a..3b013f2d0f 100644
--- a/hw/virtio/vhost-user-gpio.c
+++ b/hw/virtio/vhost-user-gpio.c
@@ -194,7 +194,7 @@ static void vu_gpio_guest_notifier_mask(VirtIODevice *vdev, 
int idx, bool mask)
 
 /*
  * Add the check for configure interrupt, Use VIRTIO_CONFIG_IRQ_IDX -1
- * as the Marco of configure interrupt's IDX, If this driver does not
+ * as the macro of configure interrupt's IDX, If this driver does not
  * support, the function will return
  */
 
diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-common.c
index d2b5519d5a..623bdf91cc 100644
--- a/hw/virtio/vhost-vsock-common.c
+++ b/hw/virtio/vhost-vsock-common.c
@@ -129,7 +129,7 @@ static void 
vhost_vsock_common_guest_notifier_mask(VirtIODevice *vdev, int idx,
 
 /*
  * Add the check for configure interrupt, Use VIRTIO_CONFIG_IRQ_IDX -1
- * as the Marco of configure interrupt's IDX, If this driver does not
+ * as the macro of configure interrupt's IDX, If this driver does not
  * support, the function will return
  */
 
@@ -146,7 +146,7 @@ static bool 
vhost_vsock_common_guest_notifier_pending(VirtIODevice *vdev,
 
 /*
  * Add the check for configure interrupt, Use VIRTIO_CONFIG_IRQ_IDX -1
- * as the Marco of configure interrupt's IDX, If this driver does not
+ * as the macro of configure interrupt's IDX, If this driver does not
  * support, the function will return
  */
 
diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
index 802e1b9659..6b3e607329 100644
--- a/hw/virtio/virtio-crypto.c
+++ b/hw/virtio/virtio-crypto.c
@@ -1208,7 +1208,7 @@ static void 
virtio_crypto_guest_notifier_mask(VirtIODevice *vdev, int idx,
 
 /*
  * Add the 

[PATCH v2 10/13] hw/virtio: add config support to vhost-user-device

2023-04-18 Thread Alex Bennée
To use the generic device the user will need to provide the config
region size via the command line. We also add a notifier so the guest
can be pinged if the remote daemon updates the config.

With these changes:

  -device vhost-user-device-pci,virtio-id=41,num_vqs=2,config_size=8

is equivalent to:

  -device vhost-user-gpio-pci

Signed-off-by: Alex Bennée 
---
 include/hw/virtio/vhost-user-device.h |  1 +
 hw/virtio/vhost-user-device.c | 58 ++-
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/vhost-user-device.h 
b/include/hw/virtio/vhost-user-device.h
index 9105011e25..3ddf88a146 100644
--- a/include/hw/virtio/vhost-user-device.h
+++ b/include/hw/virtio/vhost-user-device.h
@@ -22,6 +22,7 @@ struct VHostUserBase {
 CharBackend chardev;
 uint16_t virtio_id;
 uint32_t num_vqs;
+uint32_t config_size;
 /* State tracking */
 VhostUserState vhost_user;
 struct vhost_virtqueue *vhost_vq;
diff --git a/hw/virtio/vhost-user-device.c b/hw/virtio/vhost-user-device.c
index b0239fa033..2b028cae08 100644
--- a/hw/virtio/vhost-user-device.c
+++ b/hw/virtio/vhost-user-device.c
@@ -117,6 +117,42 @@ static uint64_t vub_get_features(VirtIODevice *vdev,
 return vub->vhost_dev.features & ~(1ULL << VHOST_USER_F_PROTOCOL_FEATURES);
 }
 
+/*
+ * To handle VirtIO config we need to know the size of the config
+ * space. We don't cache the config but re-fetch it from the backend
+ * every time in case something has changed.
+ */
+static void vub_get_config(VirtIODevice *vdev, uint8_t *config)
+{
+VHostUserBase *vub = VHOST_USER_BASE(vdev);
+Error *local_err = NULL;
+
+/*
+ * There will have been a warning during vhost_dev_init, but let's
+ * assert here as nothing will go right now.
+ */
+g_assert(vub->config_size && vub->vhost_user.supports_config == true);
+
+if (vhost_dev_get_config(>vhost_dev, config,
+ vub->config_size, _err)) {
+error_report_err(local_err);
+}
+}
+
+/*
+ * When the daemon signals an update to the config we just need to
+ * signal the guest as we re-read the config on demand above.
+ */
+static int vub_config_notifier(struct vhost_dev *dev)
+{
+virtio_notify_config(dev->vdev);
+return 0;
+}
+
+const VhostDevConfigOps vub_config_ops = {
+.vhost_dev_config_notifier = vub_config_notifier,
+};
+
 static void vub_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 /*
@@ -141,12 +177,21 @@ static int vub_connect(DeviceState *dev)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserBase *vub = VHOST_USER_BASE(vdev);
+struct vhost_dev *vhost_dev = >vhost_dev;
 
 if (vub->connected) {
 return 0;
 }
 vub->connected = true;
 
+/*
+ * If we support VHOST_USER_GET_CONFIG we must enable the notifier
+ * so we can ping the guest when it updates.
+ */
+if (vub->vhost_user.supports_config) {
+vhost_dev_set_config_notifier(vhost_dev, _config_ops);
+}
+
 /* restore vhost state */
 if (virtio_device_started(vdev, vdev->status)) {
 vub_start(vdev);
@@ -214,11 +259,20 @@ static void vub_device_realize(DeviceState *dev, Error 
**errp)
 vub->num_vqs = 1; /* reasonable default? */
 }
 
+/*
+ * We can't handle config requests unless we know the size of the
+ * config region, specialisations of the vhost-user-device will be
+ * able to set this.
+ */
+if (vub->config_size) {
+vub->vhost_user.supports_config = true;
+}
+
 if (!vhost_user_init(>vhost_user, >chardev, errp)) {
 return;
 }
 
-virtio_init(vdev, vub->virtio_id, 0);
+virtio_init(vdev, vub->virtio_id, vub->config_size);
 
 /*
  * Disable guest notifiers, by default all notifications will be via the
@@ -268,6 +322,7 @@ static void vub_class_init(ObjectClass *klass, void *data)
 vdc->realize = vub_device_realize;
 vdc->unrealize = vub_device_unrealize;
 vdc->get_features = vub_get_features;
+vdc->get_config = vub_get_config;
 vdc->set_status = vub_set_status;
 }
 
@@ -295,6 +350,7 @@ static Property vud_properties[] = {
 DEFINE_PROP_CHR("chardev", VHostUserBase, chardev),
 DEFINE_PROP_UINT16("virtio-id", VHostUserBase, virtio_id, 0),
 DEFINE_PROP_UINT32("num_vqs", VHostUserBase, num_vqs, 1),
+DEFINE_PROP_UINT32("config_size", VHostUserBase, config_size, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.39.2




[PATCH v2 01/13] include: attempt to document device_class_set_props

2023-04-18 Thread Alex Bennée
I'm still not sure how I achieve my use case of the parent class
defining the following properties:

  static Property vud_properties[] = {
  DEFINE_PROP_CHR("chardev", VHostUserDevice, chardev),
  DEFINE_PROP_UINT16("id", VHostUserDevice, id, 0),
  DEFINE_PROP_UINT32("num_vqs", VHostUserDevice, num_vqs, 1),
  DEFINE_PROP_END_OF_LIST(),
  };

But for the specialisation of the class I want the id to default to
the actual device id, e.g.:

  static Property vu_rng_properties[] = {
  DEFINE_PROP_UINT16("id", VHostUserDevice, id, VIRTIO_ID_RNG),
  DEFINE_PROP_UINT32("num_vqs", VHostUserDevice, num_vqs, 1),
  DEFINE_PROP_END_OF_LIST(),
  };

And so far the API for doing that isn't super clear.

Signed-off-by: Alex Bennée 
---
 include/hw/qdev-core.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index bd50ad5ee1..d4bbc30c92 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -776,6 +776,15 @@ BusState *sysbus_get_default(void);
 char *qdev_get_fw_dev_path(DeviceState *dev);
 char *qdev_get_own_fw_dev_path_from_handler(BusState *bus, DeviceState *dev);
 
+/**
+ * device_class_set_props(): add a set of properties to a device
+ * @dc: the parent DeviceClass all devices inherit
+ * @props: an array of properties, terminated by DEFINE_PROP_END_OF_LIST()
+ *
+ * This will add a set of properties to the object. It will fault if
+ * you attempt to add an existing property defined by a parent class.
+ * To modify an inherited property you need to use
+ */
 void device_class_set_props(DeviceClass *dc, Property *props);
 
 /**
-- 
2.39.2




[PATCH v2 12/13] hw/virtio: derive vhost-user-i2c from vhost-user-base

2023-04-18 Thread Alex Bennée
Now we can take advantage of the new base class and make
vhost-user-i2c a much simpler boilerplate wrapper. Also as this
doesn't require any target specific hacks we only need to build the
stubs once.

Signed-off-by: Alex Bennée 

---
v2
  - update to new inheritance scheme
  - move build to common code
---
 include/hw/virtio/vhost-user-i2c.h |  18 +-
 hw/virtio/vhost-user-i2c.c | 255 ++---
 hw/virtio/meson.build  |   5 +-
 3 files changed, 26 insertions(+), 252 deletions(-)

diff --git a/include/hw/virtio/vhost-user-i2c.h 
b/include/hw/virtio/vhost-user-i2c.h
index 0f7acd40e3..47153782d1 100644
--- a/include/hw/virtio/vhost-user-i2c.h
+++ b/include/hw/virtio/vhost-user-i2c.h
@@ -12,20 +12,18 @@
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
 
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
+#include "hw/virtio/vhost-user-device.h"
+
 #define TYPE_VHOST_USER_I2C "vhost-user-i2c-device"
 OBJECT_DECLARE_SIMPLE_TYPE(VHostUserI2C, VHOST_USER_I2C)
 
 struct VHostUserI2C {
-VirtIODevice parent;
-CharBackend chardev;
-struct vhost_virtqueue *vhost_vq;
-struct vhost_dev vhost_dev;
-VhostUserState vhost_user;
-VirtQueue *vq;
-bool connected;
+/*< private >*/
+VHostUserBase parent;
+/*< public >*/
 };
 
-/* Virtio Feature bits */
-#define VIRTIO_I2C_F_ZERO_LENGTH_REQUEST   0
-
 #endif /* QEMU_VHOST_USER_I2C_H */
diff --git a/hw/virtio/vhost-user-i2c.c b/hw/virtio/vhost-user-i2c.c
index 60eaf0d95b..4a1f644a87 100644
--- a/hw/virtio/vhost-user-i2c.c
+++ b/hw/virtio/vhost-user-i2c.c
@@ -14,237 +14,21 @@
 #include "qemu/error-report.h"
 #include "standard-headers/linux/virtio_ids.h"
 
-static const int feature_bits[] = {
-VIRTIO_I2C_F_ZERO_LENGTH_REQUEST,
-VIRTIO_F_RING_RESET,
-VHOST_INVALID_FEATURE_BIT
+static Property vi2c_properties[] = {
+DEFINE_PROP_CHR("chardev", VHostUserBase, chardev),
+DEFINE_PROP_END_OF_LIST(),
 };
 
-static void vu_i2c_start(VirtIODevice *vdev)
-{
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-VHostUserI2C *i2c = VHOST_USER_I2C(vdev);
-int ret, i;
-
-if (!k->set_guest_notifiers) {
-error_report("binding does not support guest notifiers");
-return;
-}
-
-ret = vhost_dev_enable_notifiers(>vhost_dev, vdev);
-if (ret < 0) {
-error_report("Error enabling host notifiers: %d", -ret);
-return;
-}
-
-ret = k->set_guest_notifiers(qbus->parent, i2c->vhost_dev.nvqs, true);
-if (ret < 0) {
-error_report("Error binding guest notifier: %d", -ret);
-goto err_host_notifiers;
-}
-
-i2c->vhost_dev.acked_features = vdev->guest_features;
-
-ret = vhost_dev_start(>vhost_dev, vdev, true);
-if (ret < 0) {
-error_report("Error starting vhost-user-i2c: %d", -ret);
-goto err_guest_notifiers;
-}
-
-/*
- * guest_notifier_mask/pending not used yet, so just unmask
- * everything here. virtio-pci will do the right thing by
- * enabling/disabling irqfd.
- */
-for (i = 0; i < i2c->vhost_dev.nvqs; i++) {
-vhost_virtqueue_mask(>vhost_dev, vdev, i, false);
-}
-
-return;
-
-err_guest_notifiers:
-k->set_guest_notifiers(qbus->parent, i2c->vhost_dev.nvqs, false);
-err_host_notifiers:
-vhost_dev_disable_notifiers(>vhost_dev, vdev);
-}
-
-static void vu_i2c_stop(VirtIODevice *vdev)
-{
-VHostUserI2C *i2c = VHOST_USER_I2C(vdev);
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-int ret;
-
-if (!k->set_guest_notifiers) {
-return;
-}
-
-vhost_dev_stop(>vhost_dev, vdev, true);
-
-ret = k->set_guest_notifiers(qbus->parent, i2c->vhost_dev.nvqs, false);
-if (ret < 0) {
-error_report("vhost guest notifier cleanup failed: %d", ret);
-return;
-}
-
-vhost_dev_disable_notifiers(>vhost_dev, vdev);
-}
-
-static void vu_i2c_set_status(VirtIODevice *vdev, uint8_t status)
-{
-VHostUserI2C *i2c = VHOST_USER_I2C(vdev);
-bool should_start = virtio_device_should_start(vdev, status);
-
-if (vhost_dev_is_started(>vhost_dev) == should_start) {
-return;
-}
-
-if (should_start) {
-vu_i2c_start(vdev);
-} else {
-vu_i2c_stop(vdev);
-}
-}
-
-static uint64_t vu_i2c_get_features(VirtIODevice *vdev,
-uint64_t requested_features, Error **errp)
-{
-VHostUserI2C *i2c = VHOST_USER_I2C(vdev);
-
-virtio_add_feature(_features, VIRTIO_I2C_F_ZERO_LENGTH_REQUEST);
-return vhost_get_features(>vhost_dev, feature_bits, 
requested_features);
-}
-
-static void vu_i2c_handle_output(VirtIODevice *vdev, VirtQueue *vq)
-{
-/*
- * Not normally called; it's the daemon that handles the queue;
- * however virtio's cleanup path can call this.
-  

[PATCH v2 11/13] hw/virtio: derive vhost-user-gpio from vhost-user-device

2023-04-18 Thread Alex Bennée
Now the new base class supports config handling we can take advantage
and make vhost-user-gpio a much simpler boilerplate wrapper. Also as
this doesn't require any target specific hacks we only need to build
the stubs once.

Signed-off-by: Alex Bennée 

---
v2
  - use new vhost-user-base
  - move build to common code
---
 include/hw/virtio/vhost-user-gpio.h |  23 +-
 hw/virtio/vhost-user-gpio.c | 400 ++--
 hw/virtio/meson.build   |   5 +-
 3 files changed, 22 insertions(+), 406 deletions(-)

diff --git a/include/hw/virtio/vhost-user-gpio.h 
b/include/hw/virtio/vhost-user-gpio.h
index a9d3f9b049..0948654dec 100644
--- a/include/hw/virtio/vhost-user-gpio.h
+++ b/include/hw/virtio/vhost-user-gpio.h
@@ -12,33 +12,14 @@
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
-#include "standard-headers/linux/virtio_gpio.h"
-#include "chardev/char-fe.h"
+#include "hw/virtio/vhost-user-device.h"
 
 #define TYPE_VHOST_USER_GPIO "vhost-user-gpio-device"
 OBJECT_DECLARE_SIMPLE_TYPE(VHostUserGPIO, VHOST_USER_GPIO);
 
 struct VHostUserGPIO {
 /*< private >*/
-VirtIODevice parent_obj;
-CharBackend chardev;
-struct virtio_gpio_config config;
-struct vhost_virtqueue *vhost_vqs;
-struct vhost_dev vhost_dev;
-VhostUserState vhost_user;
-VirtQueue *command_vq;
-VirtQueue *interrupt_vq;
-/**
- * There are at least two steps of initialization of the
- * vhost-user device. The first is a "connect" step and
- * second is a "start" step. Make a separation between
- * those initialization phases by using two fields.
- *
- * @connected: see vu_gpio_connect()/vu_gpio_disconnect()
- * @started_vu: see vu_gpio_start()/vu_gpio_stop()
- */
-bool connected;
-bool started_vu;
+VHostUserBase parent;
 /*< public >*/
 };
 
diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
index 3b013f2d0f..9f37c25415 100644
--- a/hw/virtio/vhost-user-gpio.c
+++ b/hw/virtio/vhost-user-gpio.c
@@ -11,382 +11,25 @@
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/vhost-user-gpio.h"
-#include "qemu/error-report.h"
 #include "standard-headers/linux/virtio_ids.h"
-#include "trace.h"
+#include "standard-headers/linux/virtio_gpio.h"
 
-#define REALIZE_CONNECTION_RETRIES 3
-#define VHOST_NVQS 2
-
-/* Features required from VirtIO */
-static const int feature_bits[] = {
-VIRTIO_F_VERSION_1,
-VIRTIO_F_NOTIFY_ON_EMPTY,
-VIRTIO_RING_F_INDIRECT_DESC,
-VIRTIO_RING_F_EVENT_IDX,
-VIRTIO_GPIO_F_IRQ,
-VIRTIO_F_RING_RESET,
-VHOST_INVALID_FEATURE_BIT
-};
-
-static void vu_gpio_get_config(VirtIODevice *vdev, uint8_t *config)
-{
-VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
-
-memcpy(config, >config, sizeof(gpio->config));
-}
-
-static int vu_gpio_config_notifier(struct vhost_dev *dev)
-{
-VHostUserGPIO *gpio = VHOST_USER_GPIO(dev->vdev);
-
-memcpy(dev->vdev->config, >config, sizeof(gpio->config));
-virtio_notify_config(dev->vdev);
-
-return 0;
-}
-
-const VhostDevConfigOps gpio_ops = {
-.vhost_dev_config_notifier = vu_gpio_config_notifier,
+static Property vgpio_properties[] = {
+DEFINE_PROP_CHR("chardev", VHostUserBase, chardev),
+DEFINE_PROP_END_OF_LIST(),
 };
 
-static int vu_gpio_start(VirtIODevice *vdev)
+static void vgpio_realize(DeviceState *dev, Error **errp)
 {
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
-struct vhost_dev *vhost_dev = &gpio->vhost_dev;
-int ret, i;
+VHostUserBase *vub = VHOST_USER_BASE(dev);
+VHostUserBaseClass *vubc = VHOST_USER_BASE_GET_CLASS(dev);
 
-if (!k->set_guest_notifiers) {
-error_report("binding does not support guest notifiers");
-return -ENOSYS;
-}
+/* Fixed for GPIO */
+vub->virtio_id = VIRTIO_ID_GPIO;
+vub->num_vqs = 2;
+vub->config_size = sizeof(struct virtio_gpio_config);
 
-ret = vhost_dev_enable_notifiers(vhost_dev, vdev);
-if (ret < 0) {
-error_report("Error enabling host notifiers: %d", ret);
-return ret;
-}
-
-ret = k->set_guest_notifiers(qbus->parent, vhost_dev->nvqs, true);
-if (ret < 0) {
-error_report("Error binding guest notifier: %d", ret);
-goto err_host_notifiers;
-}
-
-/*
- * Before we start up we need to ensure we have the final feature
- * set needed for the vhost configuration. The backend may also
- * apply backend_features when the feature set is sent.
- */
-vhost_ack_features(&gpio->vhost_dev, feature_bits, vdev->guest_features);
-
-ret = vhost_dev_start(&gpio->vhost_dev, vdev, false);
-if (ret < 0) {
-error_report("Error starting vhost-user-gpio: %d", ret);
-goto err_guest_notifiers;
-}
-gpio->started_vu = true;
-
-/*
- * guest_notifier_mask/pending 

[PATCH v2 07/13] virtio: add vhost-user-base and a generic vhost-user-device

2023-04-18 Thread Alex Bennée
In theory we shouldn't need to repeat so much boilerplate to support
vhost-user backends. This provides a generic vhost-user-base QOM
object and a derived vhost-user-device for which the user needs to
provide the few bits of information that aren't currently provided by
the vhost-user protocol. This should provide a baseline implementation
from which the other vhost-user stubs can specialise.

Signed-off-by: Alex Bennée 

---
v2
  - split into vub and vud
---
 include/hw/virtio/vhost-user-device.h |  45 
 hw/virtio/vhost-user-device.c | 324 ++
 hw/virtio/meson.build |   2 +
 3 files changed, 371 insertions(+)
 create mode 100644 include/hw/virtio/vhost-user-device.h
 create mode 100644 hw/virtio/vhost-user-device.c

diff --git a/include/hw/virtio/vhost-user-device.h 
b/include/hw/virtio/vhost-user-device.h
new file mode 100644
index 00..9105011e25
--- /dev/null
+++ b/include/hw/virtio/vhost-user-device.h
@@ -0,0 +1,45 @@
+/*
+ * Vhost-user generic virtio device
+ *
+ * Copyright (c) 2023 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef QEMU_VHOST_USER_DEVICE_H
+#define QEMU_VHOST_USER_DEVICE_H
+
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
+
+#define TYPE_VHOST_USER_BASE "vhost-user-base"
+
+OBJECT_DECLARE_TYPE(VHostUserBase, VHostUserBaseClass, VHOST_USER_BASE)
+
+struct VHostUserBase {
+VirtIODevice parent;
+/* Properties */
+CharBackend chardev;
+uint16_t virtio_id;
+uint32_t num_vqs;
+/* State tracking */
+VhostUserState vhost_user;
+struct vhost_virtqueue *vhost_vq;
+struct vhost_dev vhost_dev;
+GPtrArray *vqs;
+bool connected;
+};
+
+/* needed so we can use the base realize after specialisation
+   tweaks */
+struct VHostUserBaseClass {
+/*< private >*/
+VirtioDeviceClass parent_class;
+/*< public >*/
+DeviceRealize parent_realize;
+};
+
+/* shared for the benefit of the derived pci class */
+#define TYPE_VHOST_USER_DEVICE "vhost-user-device"
+
+#endif /* QEMU_VHOST_USER_DEVICE_H */
diff --git a/hw/virtio/vhost-user-device.c b/hw/virtio/vhost-user-device.c
new file mode 100644
index 00..b0239fa033
--- /dev/null
+++ b/hw/virtio/vhost-user-device.c
@@ -0,0 +1,324 @@
+/*
+ * Generic vhost-user stub. This can be used to connect to any
+ * vhost-user backend. All configuration details must be handled by
+ * the vhost-user daemon itself
+ *
+ * Copyright (c) 2023 Linaro Ltd
+ * Author: Alex Bennée 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/vhost-user-device.h"
+#include "qemu/error-report.h"
+
+static void vub_start(VirtIODevice *vdev)
+{
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+VHostUserBase *vub = VHOST_USER_BASE(vdev);
+int ret, i;
+
+if (!k->set_guest_notifiers) {
+error_report("binding does not support guest notifiers");
+return;
+}
+
+ret = vhost_dev_enable_notifiers(&vub->vhost_dev, vdev);
+if (ret < 0) {
+error_report("Error enabling host notifiers: %d", -ret);
+return;
+}
+
+ret = k->set_guest_notifiers(qbus->parent, vub->vhost_dev.nvqs, true);
+if (ret < 0) {
+error_report("Error binding guest notifier: %d", -ret);
+goto err_host_notifiers;
+}
+
+vub->vhost_dev.acked_features = vdev->guest_features;
+
+ret = vhost_dev_start(&vub->vhost_dev, vdev, true);
+if (ret < 0) {
+error_report("Error starting vhost-user-device: %d", -ret);
+goto err_guest_notifiers;
+}
+
+/*
+ * guest_notifier_mask/pending not used yet, so just unmask
+ * everything here. virtio-pci will do the right thing by
+ * enabling/disabling irqfd.
+ */
+for (i = 0; i < vub->vhost_dev.nvqs; i++) {
+vhost_virtqueue_mask(&vub->vhost_dev, vdev, i, false);
+}
+
+return;
+
+err_guest_notifiers:
+k->set_guest_notifiers(qbus->parent, vub->vhost_dev.nvqs, false);
+err_host_notifiers:
+vhost_dev_disable_notifiers(&vub->vhost_dev, vdev);
+}
+
+static void vub_stop(VirtIODevice *vdev)
+{
+VHostUserBase *vub = VHOST_USER_BASE(vdev);
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+int ret;
+
+if (!k->set_guest_notifiers) {
+return;
+}
+
+vhost_dev_stop(&vub->vhost_dev, vdev, true);
+
+ret = k->set_guest_notifiers(qbus->parent, vub->vhost_dev.nvqs, false);
+if (ret < 0) {
+error_report("vhost guest notifier cleanup failed: %d", ret);
+return;
+}
+
+vhost_dev_disable_notifiers(&vub->vhost_dev, vdev);
+}
+
+static void vub_set_status(VirtIODevice *vdev, uint8_t status)
+{
+VHostUserBase *vub = VHOST_USER_BASE(vdev);
+bool should_start = virtio_device_should_start(vdev, 

[PATCH v2 02/13] include/hw: document the device_class_set_parent_* fns

2023-04-18 Thread Alex Bennée
These are useful functions for when you want proper inheritance of
functionality across realize/unrealize calls.

Signed-off-by: Alex Bennée 
---
 include/hw/qdev-core.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index d4bbc30c92..b1d194b561 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -795,9 +795,36 @@ void device_class_set_props(DeviceClass *dc, Property 
*props);
 void device_class_set_parent_reset(DeviceClass *dc,
DeviceReset dev_reset,
DeviceReset *parent_reset);
+
+/**
+ * device_class_set_parent_realize(): set up for chaining realize fns
+ * @dc: The device class
+ * @dev_realize: the device realize function
+ * @parent_realize: somewhere to save the parent's realize function
+ *
+ * This is intended to be used when the new realize function will
+ * eventually call its parent realization function during creation.
+ * This requires storing the function call somewhere (usually in the
+ * instance structure) so you can eventually call:
+ *   my_dev->parent_realize(dev, errp);
+ */
 void device_class_set_parent_realize(DeviceClass *dc,
  DeviceRealize dev_realize,
  DeviceRealize *parent_realize);
+
+
+/**
+ * device_class_set_parent_unrealize(): set up for chaining unrealize fns
+ * @dc: The device class
+ * @dev_unrealize: the device unrealize function
+ * @parent_unrealize: somewhere to save the parent's unrealize function
+ *
+ * This is intended to be used when the new unrealize function will
+ * eventually call its parent unrealization function during the
+ * unrealize phase. This requires storing the function call somewhere
+ * (usually in the instance structure) so you can eventually call:
+ *   my_dev->parent_unrealize(dev);
+ */
 void device_class_set_parent_unrealize(DeviceClass *dc,
DeviceUnrealize dev_unrealize,
DeviceUnrealize *parent_unrealize);
-- 
2.39.2
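
For readers unfamiliar with the chaining pattern the kerneldoc above describes, here is a minimal standalone sketch. It does not use QEMU headers: Device, DeviceClassSketch, ChildClassSketch and set_parent_realize() are hypothetical stand-ins that only mirror the shape of device_class_set_parent_realize() (save the hook that is currently installed, then install the new one). The saved pointer is kept in the class structure here, as VHostUserBaseClass does elsewhere in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device instance; not QEMU's DeviceState. */
typedef struct Device {
    int base_ready;
    int child_ready;
} Device;

typedef int (*RealizeFn)(Device *dev);

/* Hypothetical stand-in for DeviceClass: just the realize hook. */
typedef struct DeviceClassSketch {
    RealizeFn realize;
} DeviceClassSketch;

/* Derived class: keeps a copy of the parent's hook for later chaining. */
typedef struct ChildClassSketch {
    DeviceClassSketch parent_class;
    RealizeFn parent_realize;
} ChildClassSketch;

/* Same shape as device_class_set_parent_realize(): save, then override. */
static void set_parent_realize(DeviceClassSketch *dc,
                               RealizeFn dev_realize,
                               RealizeFn *parent_realize)
{
    *parent_realize = dc->realize;
    dc->realize = dev_realize;
}

static int base_realize(Device *dev)
{
    dev->base_ready = 1;
    return 0;
}

static ChildClassSketch child_class;

static int child_realize(Device *dev)
{
    /* Child-specific setup first, then hand over to the saved parent hook. */
    dev->child_ready = 1;
    assert(child_class.parent_realize != NULL);
    return child_class.parent_realize(dev);
}

/* Sets up the class the way a class_init would, then realizes one device. */
static int realize_child_device(Device *dev)
{
    child_class.parent_class.realize = base_realize;
    set_parent_realize(&child_class.parent_class, child_realize,
                       &child_class.parent_realize);
    return child_class.parent_class.realize(dev);
}
```

Calling realize_child_device() on a zeroed Device runs child_realize() first and base_realize() second, which is the ordering the comment calls "eventually call its parent realization function".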




[PATCH v2 05/13] include/hw/virtio: add kerneldoc for virtio_init

2023-04-18 Thread Alex Bennée
Signed-off-by: Alex Bennée 
---
 include/hw/virtio/virtio.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 22ec098462..1ba7a9dd74 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -217,6 +217,12 @@ struct VirtioDeviceClass {
 void virtio_instance_init_common(Object *proxy_obj, void *data,
  size_t vdev_size, const char *vdev_name);
 
+/**
+ * virtio_init() - initialise the common VirtIODevice structure
+ * @vdev: pointer to VirtIODevice
+ * @device_id: the VirtIO device ID (see virtio_ids.h)
+ * @config_size: size of the config space
+ */
 void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size);
 
 void virtio_cleanup(VirtIODevice *vdev);
-- 
2.39.2




[PATCH v2 08/13] virtio: add PCI stub for vhost-user-device

2023-04-18 Thread Alex Bennée
This is all pretty much boilerplate.

Signed-off-by: Alex Bennée 
Tested-by: Erik Schilling 
---
 hw/virtio/vhost-user-device-pci.c | 71 +++
 hw/virtio/meson.build |  1 +
 2 files changed, 72 insertions(+)
 create mode 100644 hw/virtio/vhost-user-device-pci.c

diff --git a/hw/virtio/vhost-user-device-pci.c 
b/hw/virtio/vhost-user-device-pci.c
new file mode 100644
index 00..41f9b7905b
--- /dev/null
+++ b/hw/virtio/vhost-user-device-pci.c
@@ -0,0 +1,71 @@
+/*
+ * Vhost-user generic virtio device PCI glue
+ *
+ * Copyright (c) 2023 Linaro Ltd
+ * Author: Alex Bennée 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/qdev-properties.h"
+#include "hw/virtio/vhost-user-device.h"
+#include "hw/virtio/virtio-pci.h"
+
+struct VHostUserDevicePCI {
+VirtIOPCIProxy parent_obj;
+VHostUserBase vub;
+};
+
+typedef struct VHostUserDevicePCI VHostUserDevicePCI;
+
+#define TYPE_VHOST_USER_DEVICE_PCI "vhost-user-device-pci-base"
+
+DECLARE_INSTANCE_CHECKER(VHostUserDevicePCI,
+ VHOST_USER_DEVICE_PCI,
+ TYPE_VHOST_USER_DEVICE_PCI)
+
+static void vhost_user_device_pci_realize(VirtIOPCIProxy *vpci_dev, Error 
**errp)
+{
+VHostUserDevicePCI *dev = VHOST_USER_DEVICE_PCI(vpci_dev);
+DeviceState *vdev = DEVICE(&dev->vub);
+
+vpci_dev->nvectors = 1;
+qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
+}
+
+static void vhost_user_device_pci_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+k->realize = vhost_user_device_pci_realize;
+set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
+pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+pcidev_k->device_id = 0; /* Set by virtio-pci based on virtio id */
+pcidev_k->revision = 0x00;
+pcidev_k->class_id = PCI_CLASS_COMMUNICATION_OTHER;
+}
+
+static void vhost_user_device_pci_instance_init(Object *obj)
+{
+VHostUserDevicePCI *dev = VHOST_USER_DEVICE_PCI(obj);
+
+virtio_instance_init_common(obj, &dev->vub, sizeof(dev->vub),
+TYPE_VHOST_USER_DEVICE);
+}
+
+static const VirtioPCIDeviceTypeInfo vhost_user_device_pci_info = {
+.base_name = TYPE_VHOST_USER_DEVICE_PCI,
+.non_transitional_name = "vhost-user-device-pci",
+.instance_size = sizeof(VHostUserDevicePCI),
+.instance_init = vhost_user_device_pci_instance_init,
+.class_init = vhost_user_device_pci_class_init,
+};
+
+static void vhost_user_device_pci_register(void)
+{
+virtio_pci_types_register(&vhost_user_device_pci_info);
+}
+
+type_init(vhost_user_device_pci_register);
diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 43e5fa3f7d..c0a86b94ae 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -13,6 +13,7 @@ if have_vhost
 # fixme - this really should be generic
 specific_virtio_ss.add(files('vhost-user.c'))
 softmmu_virtio_ss.add(files('vhost-user-device.c'))
+softmmu_virtio_ss.add(when: 'CONFIG_VIRTIO_PCI', if_true: 
files('vhost-user-device-pci.c'))
   endif
   if have_vhost_vdpa
 specific_virtio_ss.add(files('vhost-vdpa.c', 'vhost-shadow-virtqueue.c'))
-- 
2.39.2




[PATCH v2 2/8] target/riscv: Move pmp_get_tlb_size apart from get_physical_address_pmp

2023-04-18 Thread Weiwei Li
pmp_get_tlb_size can be separated from get_physical_address_pmp and is only
needed when ret == TRANSLATE_SUCCESS.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/cpu_helper.c | 21 +++--
 target/riscv/pmp.c|  4 
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 075fc0538a..ea08ca9fbb 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -676,14 +676,11 @@ void riscv_cpu_set_mode(CPURISCVState *env, target_ulong 
newpriv)
  *
  * @env: CPURISCVState
  * @prot: The returned protection attributes
- * @tlb_size: TLB page size containing addr. It could be modified after PMP
- *permission checking. NULL if not set TLB page for addr.
  * @addr: The physical address to be checked permission
  * @access_type: The type of MMU access
  * @mode: Indicates current privilege level.
  */
-static int get_physical_address_pmp(CPURISCVState *env, int *prot,
-target_ulong *tlb_size, hwaddr addr,
+static int get_physical_address_pmp(CPURISCVState *env, int *prot, hwaddr addr,
 int size, MMUAccessType access_type,
 int mode)
 {
@@ -703,9 +700,6 @@ static int get_physical_address_pmp(CPURISCVState *env, int 
*prot,
 }
 
 *prot = pmp_priv_to_page_prot(pmp_priv);
-if (tlb_size != NULL) {
-*tlb_size = pmp_get_tlb_size(env, addr);
-}
 
 return TRANSLATE_SUCCESS;
 }
@@ -905,7 +899,7 @@ restart:
 }
 
 int pmp_prot;
-int pmp_ret = get_physical_address_pmp(env, &pmp_prot, NULL, pte_addr,
+int pmp_ret = get_physical_address_pmp(env, &pmp_prot, pte_addr,
sizeof(target_ulong),
MMU_DATA_LOAD, PRV_S);
 if (pmp_ret != TRANSLATE_SUCCESS) {
@@ -1300,13 +1294,12 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, 
int size,
 prot &= prot2;
 
 if (ret == TRANSLATE_SUCCESS) {
-ret = get_physical_address_pmp(env, &prot_pmp, &tlb_size, pa,
+ret = get_physical_address_pmp(env, &prot_pmp, pa,
size, access_type, mode);
 
 qemu_log_mask(CPU_LOG_MMU,
   "%s PMP address=" HWADDR_FMT_plx " ret %d prot"
-  " %d tlb_size " TARGET_FMT_lu "\n",
-  __func__, pa, ret, prot_pmp, tlb_size);
+  " %d\n", __func__, pa, ret, prot_pmp);
 
 prot &= prot_pmp;
 }
@@ -1333,13 +1326,12 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, 
int size,
   __func__, address, ret, pa, prot);
 
 if (ret == TRANSLATE_SUCCESS) {
-ret = get_physical_address_pmp(env, &prot_pmp, &tlb_size, pa,
+ret = get_physical_address_pmp(env, &prot_pmp, pa,
size, access_type, mode);
 
 qemu_log_mask(CPU_LOG_MMU,
   "%s PMP address=" HWADDR_FMT_plx " ret %d prot"
-  " %d tlb_size " TARGET_FMT_lu "\n",
-  __func__, pa, ret, prot_pmp, tlb_size);
+  " %d\n", __func__, pa, ret, prot_pmp);
 
 prot &= prot_pmp;
 }
@@ -1350,6 +1342,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 }
 
 if (ret == TRANSLATE_SUCCESS) {
+tlb_size = pmp_get_tlb_size(env, pa);
 tlb_set_page(cs, address & ~(tlb_size - 1), pa & ~(tlb_size - 1),
  prot, mmu_idx, tlb_size);
 return true;
diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 78bcd969ec..643388dc23 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -612,6 +612,10 @@ target_ulong pmp_get_tlb_size(CPURISCVState *env, 
target_ulong addr)
 target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
 int i;
 
+if (!riscv_cpu_cfg(env)->pmp || !pmp_get_num_rules(env)) {
+return TARGET_PAGE_SIZE;
+}
+
 for (i = 0; i < MAX_RISCV_PMPS; i++) {
 pmp_sa = env->pmp_state.addr[i].sa;
 pmp_ea = env->pmp_state.addr[i].ea;
-- 
2.25.1




[PATCH v2 7/8] target/riscv: Make the short cut really work in pmp_hart_has_privs

2023-04-18 Thread Weiwei Li
We needn't check the PMP entries if there are no PMP rules.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/pmp.c | 251 ++---
 1 file changed, 123 insertions(+), 128 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 37bc76c474..67347c5887 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -315,149 +315,144 @@ int pmp_hart_has_privs(CPURISCVState *env, target_ulong 
addr,
 target_ulong e = 0;
 
 /* Short cut if no rules */
-if (0 == pmp_get_num_rules(env)) {
-if (pmp_hart_has_privs_default(env, addr, size, privs,
-   allowed_privs, mode)) {
-ret = MAX_RISCV_PMPS;
-}
-}
-
-if (size == 0) {
-if (riscv_cpu_cfg(env)->mmu) {
-/*
- * If size is unknown (0), assume that all bytes
- * from addr to the end of the page will be accessed.
- */
-pmp_size = -(addr | TARGET_PAGE_MASK);
+if (pmp_get_num_rules(env) != 0) {
+if (size == 0) {
+if (riscv_cpu_cfg(env)->mmu) {
+/*
+ * If size is unknown (0), assume that all bytes
+ * from addr to the end of the page will be accessed.
+ */
+pmp_size = -(addr | TARGET_PAGE_MASK);
+} else {
+pmp_size = sizeof(target_ulong);
+}
 } else {
-pmp_size = sizeof(target_ulong);
-}
-} else {
-pmp_size = size;
-}
-
-/*
- * 1.10 draft priv spec states there is an implicit order
- * from low to high
- */
-for (i = 0; i < MAX_RISCV_PMPS; i++) {
-s = pmp_is_in_range(env, i, addr);
-e = pmp_is_in_range(env, i, addr + pmp_size - 1);
-
-/* partially inside */
-if ((s + e) == 1) {
-qemu_log_mask(LOG_GUEST_ERROR,
-  "pmp violation - access is partially inside\n");
-ret = -1;
-break;
+pmp_size = size;
 }
 
-/* fully inside */
-const uint8_t a_field =
-pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg);
-
 /*
- * Convert the PMP permissions to match the truth table in the
- * ePMP spec.
+ * 1.10 draft priv spec states there is an implicit order
+ * from low to high
  */
-const uint8_t epmp_operation =
-((env->pmp_state.pmp[i].cfg_reg & PMP_LOCK) >> 4) |
-((env->pmp_state.pmp[i].cfg_reg & PMP_READ) << 2) |
-(env->pmp_state.pmp[i].cfg_reg & PMP_WRITE) |
-((env->pmp_state.pmp[i].cfg_reg & PMP_EXEC) >> 2);
+for (i = 0; i < MAX_RISCV_PMPS; i++) {
+s = pmp_is_in_range(env, i, addr);
+e = pmp_is_in_range(env, i, addr + pmp_size - 1);
+
+/* partially inside */
+if ((s + e) == 1) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "pmp violation - access is partially inside\n");
+ret = -1;
+break;
+}
+
+/* fully inside */
+const uint8_t a_field =
+pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg);
 
-if (((s + e) == 2) && (PMP_AMATCH_OFF != a_field)) {
 /*
- * If the PMP entry is not off and the address is in range,
- * do the priv check
+ * Convert the PMP permissions to match the truth table in the
+ * ePMP spec.
  */
-if (!MSECCFG_MML_ISSET(env)) {
-/*
- * If mseccfg.MML Bit is not set, do pmp priv check
- * This will always apply to regular PMP.
- */
-*allowed_privs = PMP_READ | PMP_WRITE | PMP_EXEC;
-if ((mode != PRV_M) || pmp_is_locked(env, i)) {
-*allowed_privs &= env->pmp_state.pmp[i].cfg_reg;
-}
-} else {
+const uint8_t epmp_operation =
+((env->pmp_state.pmp[i].cfg_reg & PMP_LOCK) >> 4) |
+((env->pmp_state.pmp[i].cfg_reg & PMP_READ) << 2) |
+(env->pmp_state.pmp[i].cfg_reg & PMP_WRITE) |
+((env->pmp_state.pmp[i].cfg_reg & PMP_EXEC) >> 2);
+
+if (((s + e) == 2) && (PMP_AMATCH_OFF != a_field)) {
 /*
- * If mseccfg.MML Bit set, do the enhanced pmp priv check
+ * If the PMP entry is not off and the address is in range,
+ * do the priv check
  */
-if (mode == PRV_M) {
-switch (epmp_operation) {
-case 0:
-case 1:
-case 4:
-case 5:
-case 6:
-case 7:
-case 8:
-

[PATCH] configure: Honour cross-prefix when finding ObjC compiler

2023-04-18 Thread Peter Maydell
Currently when configure picks an Objective-C compiler it doesn't pay
attention to the cross-prefix.  This isn't a big deal in practice,
because we only use ObjC on macOS and you can't cross-compile to
macOS.  But it's a bit inconsistent.

Rearrange the handling of objcc in configure so that we do the
same thing that we do with cc and cxx. This means that the logic
for picking the ObjC compiler goes from:
 if --objcc is specified, use that
 otherwise if clang is available, use that
 otherwise use $cc
to:
 if --objcc is specified, use that
 otherwise if --cross-prefix is specified, use ${cross_prefix}clang
 otherwise if clang is available, use that
 otherwise use $cc

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1185
Signed-off-by: Peter Maydell 
---
 configure | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/configure b/configure
index 800b5850f40..646048db706 100755
--- a/configure
+++ b/configure
@@ -316,6 +316,8 @@ for opt do
   ;;
   --cxx=*) CXX="$optarg"
   ;;
+  --objcc=*) objcc="$optarg"
+  ;;
   --cpu=*) cpu="$optarg"
   ;;
   --extra-cflags=*)
@@ -361,6 +363,21 @@ else
   cxx="${CXX-${cross_prefix}g++}"
 fi
 
+# Preferred ObjC compiler:
+# $objcc (if set, i.e. via --objcc option)
+# ${cross_prefix}clang (if cross-prefix specified)
+# clang (if available)
+# $cc
+if test -z "${objcc}${cross_prefix}"; then
+  if has clang; then
+objcc=clang
+  else
+objcc="$cc"
+  fi
+else
+  objcc="${objcc-${cross_prefix}clang}"
+fi
+
 ar="${AR-${cross_prefix}ar}"
 as="${AS-${cross_prefix}as}"
 ccas="${CCAS-$cc}"
@@ -647,13 +664,6 @@ do
 fi
 done
 
-# Default objcc to clang if available, otherwise use CC
-if has clang; then
-  objcc=clang
-else
-  objcc="$cc"
-fi
-
 if test "$mingw32" = "yes" ; then
   EXESUF=".exe"
   # MinGW needs -mthreads for TLS and macro _MT.
@@ -713,7 +723,7 @@ for opt do
   ;;
   --cxx=*)
   ;;
-  --objcc=*) objcc="$optarg"
+  --objcc=*)
   ;;
   --make=*) make="$optarg"
   ;;
-- 
2.34.1




Re: [PATCH v19 01/21] s390x/cpu topology: add s390 specifics to CPU topology

2023-04-18 Thread Nina Schoetterl-Glausch
> On 4/18/23 14:38, Nina Schoetterl-Glausch wrote:
> > On Tue, 2023-04-18 at 12:01 +0200, Pierre Morel wrote:
> > > On 4/18/23 10:53, Nina Schoetterl-Glausch wrote:
> > > > On Mon, 2023-04-03 at 18:28 +0200, Pierre Morel wrote:
> > > > > S390 adds two new SMP levels, drawers and books to the CPU
> > > > > topology.
> > > > > The S390 CPUs have specific topology features, like dedication
> > > > > and entitlement, that give the guest indications about the host
> > > > > vCPU scheduling and help the guest make the best decisions
> > > > > when scheduling threads on the vCPUs.
> > > > > 
> > > > > Let us provide the SMP properties with books and drawers levels
> > > > > and S390 CPU with dedication and entitlement.
> > > > > 
> > > > > Signed-off-by: Pierre Morel 
> > > > > Reviewed-by: Thomas Huth 
> > > > > ---
> > [...]
> > > > > diff --git a/qapi/machine-common.json b/qapi/machine-common.json
> > > > > new file mode 100644
> > > > > index 00..73ea38d976
> > > > > --- /dev/null
> > > > > +++ b/qapi/machine-common.json
> > > > > @@ -0,0 +1,22 @@
> > > > > +# -*- Mode: Python -*-
> > > > > +# vim: filetype=python
> > > > > +#
> > > > > +# This work is licensed under the terms of the GNU GPL, version 2 or 
> > > > > later.
> > > > > +# See the COPYING file in the top-level directory.
> > > > > +
> > > > > +##
> > > > > +# = Machines S390 data types
> > > > > +##
> > > > > +
> > > > > +##
> > > > > +# @CpuS390Entitlement:
> > > > > +#
> > > > > +# An enumeration of cpu entitlements that can be assumed by a virtual
> > > > > +# S390 CPU
> > > > > +#
> > > > > +# Since: 8.1
> > > > > +##
> > > > > +{ 'enum': 'CpuS390Entitlement',
> > > > > +  'prefix': 'S390_CPU_ENTITLEMENT',
> > > > > +  'data': [ 'horizontal', 'low', 'medium', 'high' ] }
> > > > You can get rid of the horizontal value now that the entitlement is 
> > > > ignored if the
> > > > polarization is vertical.
> > > 
> > > Right, horizontal is not used, but what would you like?
> > > 
> > > - replace horizontal with 'none' ?
> > > 
> > > - add or subtract 1 when we do the conversion between enum string and
> > > value ?
> > Yeah, I would completely drop it because it is a meaningless value
> > and adjust the conversion to the cpu value accordingly.
> > > frankly I prefer to keep horizontal here which is exactly what is given
> > > in the documentation for entitlement = 0
> > Not sure what you mean with this.
> 
> I mean: Extract from the PoP:
> 
> 
> 
> The following values are used:
> PP Meaning
> 0 The one or more CPUs represented by the TLE are
> horizontally polarized.
> 1 The one or more CPUs represented by the TLE are
> vertically polarized. Entitlement is low.
> 2 The one or more CPUs represented by the TLE are
> vertically polarized. Entitlement is medium.
> 3 The one or more CPUs represented by the TLE are
> vertically polarized. Entitlement is high.
> 
> 
> 
> Also I find that using an enum to systematically add/subtract a value is 
> for me weird.

It is, I'd do:

+static s390_topology_id s390_topology_from_cpu(S390CPU *cpu)
+{
+struct S390CcwMachineState *s390ms = S390_CCW_MACHINE(current_machine);
+s390_topology_id topology_id = {0};
+
+topology_id.drawer = cpu->env.drawer_id;
+topology_id.book = cpu->env.book_id;
+topology_id.socket = cpu->env.socket_id;
+topology_id.origin = cpu->env.core_id / 64;
+topology_id.type = S390_TOPOLOGY_CPU_IFL;
+topology_id.dedicated = cpu->env.dedicated;
+
+if (s390ms->vertical_polarization) {
+uint8_t to_polarization[] = {
+[S390_CPU_ENTITLEMENT_LOW] = 1,
+[S390_CPU_ENTITLEMENT_MEDIUM] = 2,
+[S390_CPU_ENTITLEMENT_HIGH] = 3,
+};
+topology_id.entitlement = to_polarization[cpu->env.entitlement];
+}
+
+return topology_id;
+}

You can also use a switch of course.
I'd also rename s390_topology_id.entitlement to polarization.
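
For completeness, the switch variant could look roughly like the standalone sketch below. It is only an illustration: EntitlementSketch and polarization_from_entitlement() are made-up names, the enum assumes 'horizontal' has already been dropped from CpuS390Entitlement, and the returned numbers are the PP values from the PoP extract quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enum, assuming 'horizontal' was removed from the QAPI enum. */
typedef enum {
    ENTITLEMENT_LOW,
    ENTITLEMENT_MEDIUM,
    ENTITLEMENT_HIGH,
} EntitlementSketch;

/*
 * Map (polarization mode, entitlement) to the PP field value:
 * 0 = horizontal, 1/2/3 = vertical with low/medium/high entitlement.
 */
static uint8_t polarization_from_entitlement(bool vertical,
                                             EntitlementSketch entitlement)
{
    if (!vertical) {
        /* Entitlement is ignored while horizontally polarized. */
        return 0;
    }

    switch (entitlement) {
    case ENTITLEMENT_LOW:
        return 1;
    case ENTITLEMENT_MEDIUM:
        return 2;
    case ENTITLEMENT_HIGH:
        return 3;
    }

    /* Not reachable with a valid enum value. */
    assert(!"unknown entitlement");
    return 0;
}
```

With this shape there is no enum value that callers have to be told not to use; horizontal polarization is expressed only through the machine-wide mode.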

> 
> so I really prefer to keep "horizontal", "low", "medium", "high" even if 
> "horizontal" will never appear.
> 
> A matter of taste, it does not change anything to the functionality or 
> the API.

Well, it does change the API a bit, namely which values mean what:
currently there is a value 0 that you're not supposed to use, and that
would go away.
It also shows up in some meta command to print qapi interfaces.
And dropping it simplifies the implementation IMO --- you don't need
to think about and prevent usage of a nonexistent state.
> 
> 
> > > 
> > > 
> > > > [...]
> > > > 
> > > > > diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
> > > > > index b10a8541ff..57165fa3a0 100644
> > > > > --- a/target/s390x/cpu.c
> > > > > +++ b/target/s390x/cpu.c
> > > > > @@ -37,6 +37,7 @@
> > > > >#ifndef CONFIG_USER_ONLY
> > > > >#include "sysemu/reset.h"
> > > > >#endif
> > > > > +#include "hw/s390x/cpu-topology.h"
> > > > >
> > > > >#define CR0_RESET   0xE0UL
> > > > >#define CR14_RESET  0xC200UL;
> > > > > @@ -259,6 +260,12 @@ static gchar 

Re: [PATCH 0/5] Support both Ethernet interfaces on i.MX6UL and i.MX7

2023-04-18 Thread Peter Maydell
On Tue, 18 Apr 2023 at 16:18, Guenter Roeck  wrote:
> On 4/18/23 07:46, Peter Maydell wrote:
> > I guess I don't understand what the topology is for these specific
> > SoCs, then. If there's only one master that might be connected
> > to multiple PHYs, why does one ethernet device in QEMU need to
> > know about the other one? Are the PHYs connected to just that
> > first ethernet device, or to both? This bit in your cover letter
> > makes it sound like "both ethernet interfaces connect to the same
> > MDIO bus which has both PHYs on it":
> >
>
> Yes, that is exactly how it is, similar to the configuration in the picture
> at prodigytechno.com. I don't recall what I wrote in the cover letter, but
> "Both Ethernet PHYs connect to the same MDIO bus which is connected to one
> of the Ethernet MACs" would be the most accurate description I can think of.

> Each MAC (Ethernet interface, instance of TYPE_IMX_FEC in qemu) has its own
> MDIO bus. Currently QEMU assumes that each PHY is connected to the MDIO bus
> on its associated MAC interface. That is not the case on the emulated boards,
> where all PHYs are connected to a single MDIO bus.

So looking again at that diagram on that website, I think I understand
now: for data transfer to/from the outside world, MAC1 talks only through
PHY1 and MAC2 only through PHY2 (over the links marked "MII/GMII/XGMII"),
but the "control" connection is via MDIO, and on these boards you have to
configure PHY2 by doing the MDIO reads and writes via MAC1, even though
MAC1 has nothing otherwise to do with PHY2 ? (And MAC2 has no devices on
its MDIO bus at all.)
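
To make that reading of the topology concrete, here is a small standalone sketch. It is not QEMU code and not the proposed patch: MacSketch, PhySketch and mdio_write() are hypothetical names, and the model only captures the one point under discussion, namely that a PHY's control path (the MDIO bus it sits on) and its data path (the MAC it carries packets for) can belong to different MACs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MDIO_MAX_PHYS 32

struct MacSketch;

/* Hypothetical PHY model: remembers which MAC it carries data for. */
typedef struct PhySketch {
    struct MacSketch *data_mac;   /* data path (MII/GMII side) */
    uint16_t control_reg;         /* last value written over MDIO */
} PhySketch;

/* Hypothetical MAC model: owns an MDIO bus that may hold any PHYs. */
typedef struct MacSketch {
    PhySketch *mdio_bus[MDIO_MAX_PHYS]; /* control path, indexed by PHY address */
    int phy_config_changes;             /* reconfigurations of this MAC's own PHY */
} MacSketch;

/*
 * Write a PHY register through bus_owner's MDIO bus. The MAC that is
 * affected is the PHY's data-path MAC, which need not be bus_owner.
 * Returns 0 on success, -1 if no PHY answers at that address.
 */
static int mdio_write(MacSketch *bus_owner, unsigned phy_addr, uint16_t value)
{
    PhySketch *phy;

    assert(phy_addr < MDIO_MAX_PHYS);
    phy = bus_owner->mdio_bus[phy_addr];
    if (phy == NULL) {
        return -1;
    }

    phy->control_reg = value;
    phy->data_mac->phy_config_changes++;
    return 0;
}
```

In this model, putting both PHYs on the first MAC's bus and leaving the second MAC's bus empty reproduces the situation described: a write issued through MAC1 to PHY2's address changes state that belongs to MAC2, while the same write issued through MAC2 finds no device.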

> Userspace, when talking to the Ethernet controllers, knows that the PHY
> of the second Ethernet controller is connected to the MDIO bus on the first
> Ethernet controller. QEMU has to be told about that and otherwise misses that
> MDIO commands sent to the second PHY (on the first Ethernet controller)
> influence the second MAC interface.
>
>  From this exchange I can only assume that my implementation is unacceptable.

Not at all -- I'm just trying to understand what the hardware we're
modelling is doing, so I can figure out what we "ought" in theory
to be doing and whether that's too much pain to do right now...

thanks
-- PMM



Re: [PATCH] io: mark mixed functions that can suspend

2023-04-18 Thread Daniel P . Berrangé
On Thu, Apr 06, 2023 at 12:28:00PM +0200, Paolo Bonzini wrote:
> There should be no paths from a coroutine_fn to aio_poll, however in
> practice coroutine_mixed_fn will call aio_poll in the !qemu_in_coroutine()
> path.  By marking mixed functions, we can track accurately the call paths
> that execute entirely in coroutine context, and find more missing
> coroutine_fn markers.  This results in more accurate checks that
> coroutine code does not end up blocking.
> 
> If the marking were extended transitively to all functions that call
> these ones, static analysis could be done much more efficiently.
> However, this is a start and makes it possible to use vrc's path-based
> searches to find potential bugs where coroutine_fns call blocking functions.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  include/io/channel.h | 78 ++--
>  io/channel.c | 78 ++--
>  2 files changed, 78 insertions(+), 78 deletions(-)

Reviewed-by: Daniel P. Berrangé 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 1/2] migration/calc-dirty-rate: new metrics in sampling mode

2023-04-18 Thread Daniel P . Berrangé
On Tue, Feb 28, 2023 at 04:16:02PM +0300, Andrei Gudkov via wrote:
> * Collect number of all-zero pages
> * Collect vector of number of dirty pages for different time periods
> * Report total number of pages, number of sampled pages and page size
> * Replaced CRC32 with xxHash for performance reasons

I'd suggest that the CRC32 -> xxHash change should be a separate
commit from the newly reported statistics, since they're independent
functional changes.

> 
> Signed-off-by: Andrei Gudkov 
> ---
>  migration/dirtyrate.c | 219 +-
>  migration/dirtyrate.h |  26 -
>  qapi/migration.json   |  25 +
>  3 files changed, 218 insertions(+), 52 deletions(-)

> diff --git a/qapi/migration.json b/qapi/migration.json
> index c84fa10e86..1a1d7cb30a 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1830,6 +1830,25 @@
>  # @mode: mode containing method of calculate dirtyrate includes
>  #'page-sampling' and 'dirty-ring' (Since 6.2)
>  #
> +# @page-size: page size in bytes
> +#
> +# @n-total-pages: total number of VM pages
> +#
> +# @n-sampled-pages: number of sampled pages
> +#
> +# @n-zero-pages: number of observed zero pages among all sampled pages.
> +#Normally all pages are zero when VM starts, but
> +#their number progressively goes down as VM fills more
> +#and more memory with useful data.
> +#Migration of zero pages is optimized: only their headers
> +#are copied but not the (zero) data.
> +#
> +# @periods: array of time periods expressed in milliseconds for which
> +#   dirty-sample measurements are collected
> +#
> +# @n-dirty-pages: number of pages among all sampled pages that were observed
> +# as changed after respective time period
> +#

Each field addition needs a "(Since )" tag with QEMU version

The docs probably ought to be explicit that the @periods array has
the same size as the @n-dirty-pages array.
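
To illustrate why the matching size matters: a consumer would walk both
arrays with a single index. A minimal sketch in C, assuming a hypothetical
DirtyRateSample struct that mirrors the proposed fields (the struct and the
function name are illustrative only, not QEMU API):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the proposed QAPI fields; not a QEMU type. */
typedef struct DirtyRateSample {
    int64_t n_sampled_pages;      /* @n-sampled-pages */
    size_t n_periods;             /* shared length of both arrays below */
    const int64_t *periods;       /* @periods, in milliseconds */
    const int64_t *n_dirty_pages; /* @n-dirty-pages, one entry per period */
} DirtyRateSample;

/*
 * Return the fraction of sampled pages seen dirty after periods[i],
 * or -1.0 if the index is out of range or no pages were sampled.
 */
static double dirty_fraction(const DirtyRateSample *s, size_t i)
{
    if (i >= s->n_periods || s->n_sampled_pages <= 0) {
        return -1.0;
    }
    return (double)s->n_dirty_pages[i] / (double)s->n_sampled_pages;
}
```

With a single shared length there is no way to index one array past the
end of the other, which is the property the docs should spell out.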

>  # @vcpu-dirty-rate: dirtyrate for each vcpu if dirty-ring
>  #   mode specified (Since 6.2)
>  #
> @@ -1842,6 +1861,12 @@
> 'calc-time': 'int64',
> 'sample-pages': 'uint64',
> 'mode': 'DirtyRateMeasureMode',
> +   'page-size': 'int64',
> +   '*n-total-pages': 'int64',
> +   '*n-sampled-pages': 'int64',
> +   '*n-zero-pages': 'int64',
> +   '*periods': ['int64'],
> +   '*n-dirty-pages': ['int64'],
> '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
>  
>  ##
> -- 
> 2.30.2
> 
> 

With regards,
Daniel




[PATCH v2 1/8] target/riscv: Update pmp_get_tlb_size()

2023-04-18 Thread Weiwei Li
PMP entries before the matched PMP entry (including the matched entry itself)
may cover only part of the TLB page, so that different regions of that page
have different permissions. For example, with
PMP0(0x8008~0x800F, R) and PMP1(0x80001000~0x80001FFF, RWX),
a write access to 0x8000 will match PMP1. However, we cannot cache the TLB
entry for it, since that would let a write access to 0x8008 bypass the check
of PMP0. So we should check all of the entries and set the TLB size to 1 in
this case.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
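
The check described in the commit message can be sketched outside of QEMU
as follows. This is a simplified model, not the patch itself: PmpRange,
TOY_PAGE_SIZE and tlb_size_for() are hypothetical names, the ranges are
inclusive, and disabled PMP entries are not modelled.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* stand-in for TARGET_PAGE_SIZE */

/* Hypothetical, simplified PMP range: inclusive start and end address. */
typedef struct PmpRange {
    uint64_t sa;
    uint64_t ea;
} PmpRange;

/*
 * Walk the entries in priority order. The first entry that covers the
 * whole page allows caching; the first entry that has a boundary inside
 * the page forces a size of 1 so the translation is not cached.
 */
static uint64_t tlb_size_for(const PmpRange *pmp, int n, uint64_t addr)
{
    uint64_t tlb_sa = addr & ~(uint64_t)(TOY_PAGE_SIZE - 1);
    uint64_t tlb_ea = tlb_sa + TOY_PAGE_SIZE - 1;

    for (int i = 0; i < n; i++) {
        if (pmp[i].sa <= tlb_sa && pmp[i].ea >= tlb_ea) {
            return TOY_PAGE_SIZE;   /* entry covers the whole page */
        }
        if ((pmp[i].sa >= tlb_sa && pmp[i].sa <= tlb_ea) ||
            (pmp[i].ea >= tlb_sa && pmp[i].ea <= tlb_ea)) {
            return 1;               /* entry splits the page */
        }
    }
    return TOY_PAGE_SIZE;           /* no entry touches the page */
}
```

For instance, with an 8-byte read-only entry placed in front of a
page-sized RWX entry (addresses chosen for illustration), any access to
that page yields a size of 1, while a page touched by neither entry keeps
the full page size.
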
 target/riscv/cpu_helper.c |  7 ++-
 target/riscv/pmp.c| 35 ++-
 target/riscv/pmp.h|  3 +--
 3 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 433ea529b0..075fc0538a 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -703,11 +703,8 @@ static int get_physical_address_pmp(CPURISCVState *env, 
int *prot,
 }
 
 *prot = pmp_priv_to_page_prot(pmp_priv);
-if ((tlb_size != NULL) && pmp_index != MAX_RISCV_PMPS) {
-target_ulong tlb_sa = addr & ~(TARGET_PAGE_SIZE - 1);
-target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
-
-*tlb_size = pmp_get_tlb_size(env, pmp_index, tlb_sa, tlb_ea);
+if (tlb_size != NULL) {
+*tlb_size = pmp_get_tlb_size(env, addr);
 }
 
 return TRANSLATE_SUCCESS;
diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 1f5aca42e8..78bcd969ec 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -601,28 +601,37 @@ target_ulong mseccfg_csr_read(CPURISCVState *env)
 }
 
 /*
- * Calculate the TLB size if the start address or the end address of
+ * Calculate the TLB size if any start address or the end address of
  * PMP entry is presented in the TLB page.
  */
-target_ulong pmp_get_tlb_size(CPURISCVState *env, int pmp_index,
-  target_ulong tlb_sa, target_ulong tlb_ea)
+target_ulong pmp_get_tlb_size(CPURISCVState *env, target_ulong addr)
 {
-target_ulong pmp_sa = env->pmp_state.addr[pmp_index].sa;
-target_ulong pmp_ea = env->pmp_state.addr[pmp_index].ea;
+target_ulong pmp_sa;
+target_ulong pmp_ea;
+target_ulong tlb_sa = addr & ~(TARGET_PAGE_SIZE - 1);
+target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
+int i;
+
+for (i = 0; i < MAX_RISCV_PMPS; i++) {
+pmp_sa = env->pmp_state.addr[i].sa;
+pmp_ea = env->pmp_state.addr[i].ea;
 
-if (pmp_sa <= tlb_sa && pmp_ea >= tlb_ea) {
-return TARGET_PAGE_SIZE;
-} else {
 /*
- * At this point we have a tlb_size that is the smallest possible size
- * That fits within a TARGET_PAGE_SIZE and the PMP region.
- *
- * If the size is less then TARGET_PAGE_SIZE we drop the size to 1.
+ * If any start address or the end address of PMP entry is presented
+ * in the TLB page and cannot override the whole TLB page we drop the
+ * size to 1.
  * This means the result isn't cached in the TLB and is only used for
  * a single translation.
  */
-return 1;
+if (pmp_sa <= tlb_sa && pmp_ea >= tlb_ea) {
+return TARGET_PAGE_SIZE;
+} else if ((pmp_sa >= tlb_sa && pmp_sa <= tlb_ea) ||
+   (pmp_ea >= tlb_sa && pmp_ea <= tlb_ea)) {
+return 1;
+}
 }
+
+return TARGET_PAGE_SIZE;
 }
 
 /*
diff --git a/target/riscv/pmp.h b/target/riscv/pmp.h
index b296ea1fc6..0a7e24750b 100644
--- a/target/riscv/pmp.h
+++ b/target/riscv/pmp.h
@@ -76,8 +76,7 @@ int pmp_hart_has_privs(CPURISCVState *env, target_ulong addr,
target_ulong size, pmp_priv_t privs,
pmp_priv_t *allowed_privs,
target_ulong mode);
-target_ulong pmp_get_tlb_size(CPURISCVState *env, int pmp_index,
-  target_ulong tlb_sa, target_ulong tlb_ea);
+target_ulong pmp_get_tlb_size(CPURISCVState *env, target_ulong addr);
 void pmp_update_rule_addr(CPURISCVState *env, uint32_t pmp_index);
 void pmp_update_rule_nums(CPURISCVState *env);
 uint32_t pmp_get_num_rules(CPURISCVState *env);
-- 
2.25.1




Re: Move vhost-user SET_STATUS 0 after get vring base?

2023-04-18 Thread Michael S. Tsirkin
On Tue, Apr 18, 2023 at 11:18:11AM -0400, Stefan Hajnoczi wrote:
> Hi,
> Cindy's commit ca71db438bdc ("vhost: implement vhost_dev_start method")
> added SET_STATUS calls to vhost_dev_start() and vhost_dev_stop() for all
> vhost backends.
> 
> Eugenio's commit c3716f260bff ("vdpa: move vhost reset after get vring
> base") deferred the SET_STATUS 0 call in vhost_dev_stop() until after
> GET_VRING_BASE for vDPA only. In that commit Eugenio said, "A patch to
> make vhost_user_dev_start more similar to vdpa is desirable, but it can
> be added on top".
> 
> I agree and think it's a good idea to keep the vhost backends in sync
> where possible.
> 
> vhost-user still has the old behavior where QEMU sends SET_STATUS 0
> before GET_VRING_BASE. Most existing vhost-user backends don't implement
> the SET_STATUS message, so I think no one has tripped over this yet.
> 
> Any thoughts on making vhost-user behave like vDPA here?
> 
> Stefan

Wow. Well, SET_STATUS 0 resets the device so yes, I think doing that
before GET_VRING_BASE will lose state. Dunno how it does not trip
people up; indeed the only explanation is that people ignore SET_STATUS.
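
A toy model of the ordering problem, assuming a back-end that implements
SET_STATUS 0 as a full reset which also clears the ring position (ToyBackend
and the toy_* / stop_* functions are made up for illustration and are not
vhost-user API):

```c
#include <stdint.h>

/* Toy back-end: just the ring position that GET_VRING_BASE reports. */
typedef struct ToyBackend {
    uint16_t last_avail_idx;
} ToyBackend;

/* Models SET_STATUS 0: a device reset discards the ring state. */
static void toy_set_status_zero(ToyBackend *b)
{
    b->last_avail_idx = 0;
}

/* Models GET_VRING_BASE: report the current ring position. */
static uint16_t toy_get_vring_base(const ToyBackend *b)
{
    return b->last_avail_idx;
}

/* Current vhost-user order: reset first, then read the base. */
static uint16_t stop_reset_first(ToyBackend *b)
{
    toy_set_status_zero(b);
    return toy_get_vring_base(b);
}

/* vDPA order: read the base first, then reset. */
static uint16_t stop_base_first(ToyBackend *b)
{
    uint16_t base = toy_get_vring_base(b);
    toy_set_status_zero(b);
    return base;
}
```

Starting from the same ring position, stop_reset_first() returns 0 and
stop_base_first() returns the real index, which is the state that would
be lost.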


-- 
MST




[PATCH v4 3/4] serial-mcb: Add serial via MEN chameleon bus

2023-04-18 Thread Johannes Thumshirn
Add MEN z125 UART over MEN Chameleon Bus emulation.

Acked-by: Alistair Francis 
Signed-off-by: Johannes Thumshirn 
---
 hw/char/Kconfig  |   6 +++
 hw/char/meson.build  |   1 +
 hw/char/serial-mcb.c | 115 +++
 3 files changed, 122 insertions(+)
 create mode 100644 hw/char/serial-mcb.c

diff --git a/hw/char/Kconfig b/hw/char/Kconfig
index 6b6cf2fc1d..9e8ebf1d3d 100644
--- a/hw/char/Kconfig
+++ b/hw/char/Kconfig
@@ -71,3 +71,9 @@ config GOLDFISH_TTY
 
 config SHAKTI_UART
 bool
+
+config SERIAL_MCB
+bool
+default y if MCB
+depends on MCB
+select SERIAL
diff --git a/hw/char/meson.build b/hw/char/meson.build
index e02c60dd54..d5893a142d 100644
--- a/hw/char/meson.build
+++ b/hw/char/meson.build
@@ -20,6 +20,7 @@ softmmu_ss.add(when: 'CONFIG_SHAKTI_UART', if_true: 
files('shakti_uart.c'))
 softmmu_ss.add(when: 'CONFIG_VIRTIO_SERIAL', if_true: 
files('virtio-console.c'))
 softmmu_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen_console.c'))
 softmmu_ss.add(when: 'CONFIG_XILINX', if_true: files('xilinx_uartlite.c'))
+softmmu_ss.add(when: 'CONFIG_SERIAL_MCB', if_true: files('serial-mcb.c'))
 
 softmmu_ss.add(when: 'CONFIG_AVR_USART', if_true: files('avr_usart.c'))
 softmmu_ss.add(when: 'CONFIG_COLDFIRE', if_true: files('mcf_uart.c'))
diff --git a/hw/char/serial-mcb.c b/hw/char/serial-mcb.c
new file mode 100644
index 00..09f8fec11e
--- /dev/null
+++ b/hw/char/serial-mcb.c
@@ -0,0 +1,115 @@
+/*
+ * QEMU MEN 16z125 UART over MCB emulation
+ *
+ * Copyright (C) 2023 Johannes Thumshirn 
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "hw/char/serial.h"
+#include "hw/mcb/mcb.h"
+#include "hw/irq.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
+#include "migration/vmstate.h"
+
+struct MCBSerialState {
+MCBDevice dev;
+SerialState state;
+};
+
+#define TYPE_MCB_SERIAL "mcb-serial"
+OBJECT_DECLARE_SIMPLE_TYPE(MCBSerialState, MCB_SERIAL)
+
+static void serial_mcb_realize(DeviceState *dev, Error **errp)
+{
+MCBDevice *mdev = MCB_DEVICE(dev);
+MCBSerialState *mss = DO_UPCAST(MCBSerialState, dev, mdev);
+MCBus *bus = MCB_BUS(qdev_get_parent_bus(DEVICE(dev)));
+SerialState *s = &mss->state;
+
+mdev->gdd = mcb_new_chameleon_descriptor(bus, 125, mdev->rev,
+ mdev->var, 0x10);
+if (!mdev->gdd) {
+return;
+}
+
+s->baudbase = 115200;
+if (!qdev_realize(DEVICE(s), NULL, errp)) {
+return;
+}
+
+s->irq = mcb_allocate_irq(&mss->dev);
+memory_region_init_io(&s->io, OBJECT(mss), &serial_io_ops, s, "serial", 8);
+memory_region_add_subregion(&bus->mmio_region, mdev->gdd->offset, &s->io);
+}
+
+static void serial_mcb_unrealize(DeviceState *dev)
+{
+MCBDevice *mdev = MCB_DEVICE(dev);
+MCBSerialState *mss = DO_UPCAST(MCBSerialState, dev, mdev);
+SerialState *s = &mss->state;
+
+qdev_unrealize(DEVICE(s));
+qemu_free_irq(s->irq);
+g_free(>gdd);
+}
+
+static const VMStateDescription vmstate_mcb_serial = {
+.name = "mcb-serial",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_MCB_DEVICE(dev, MCBSerialState),
+VMSTATE_STRUCT(state, MCBSerialState, 0, vmstate_serial, SerialState),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static Property serial_mcb_properties[] = {
+DEFINE_PROP_UINT8("rev", MCBSerialState, dev.rev, 0),
+DEFINE_PROP_UINT8("var", MCBSerialState, dev.var, 0),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void serial_mcb_class_initfn(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+MCBDeviceClass *mc = MCB_DEVICE_CLASS(klass);
+
+mc->realize = serial_mcb_realize;
+mc->unrealize = serial_mcb_unrealize;
+
+set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
+dc->desc = "MEN 16z125 UART over MCB";
+dc->vmsd = &vmstate_mcb_serial;
+device_class_set_props(dc, serial_mcb_properties);
+}
+
+static void serial_mcb_init(Object *o)
+{
+MCBSerialState *mss = MCB_SERIAL(o);
+
+object_initialize_child(o, "serial", &mss->state, TYPE_SERIAL);
+
+qdev_alias_all_properties(DEVICE(&mss->state), o);
+}
+
+static const TypeInfo serial_mcb_info = {
+.name = "mcb-serial",
+.parent = TYPE_MCB_DEVICE,
+.instance_size = sizeof(MCBSerialState),
+.instance_init = serial_mcb_init,
+.class_init = serial_mcb_class_initfn,
+};
+
+static void serial_mcb_register_types(void)
+{
+type_register_static(&serial_mcb_info);
+}
+
+type_init(serial_mcb_register_types);
-- 
2.39.2




Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-04-18 Thread Stefan Hajnoczi
On Tue, Apr 18, 2023 at 10:09:30AM +0200, Eugenio Perez Martin wrote:
> On Mon, Apr 17, 2023 at 9:33 PM Stefan Hajnoczi  wrote:
> >
> > On Mon, 17 Apr 2023 at 15:10, Eugenio Perez Martin  
> > wrote:
> > >
> > > On Mon, Apr 17, 2023 at 5:38 PM Stefan Hajnoczi  
> > > wrote:
> > > >
> > > > On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin wrote:
> > > > > On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek wrote:
> > > > > > > So-called "internal" virtio-fs migration refers to transporting 
> > > > > > > the
> > > > > > > back-end's (virtiofsd's) state through qemu's migration stream.  
> > > > > > > To do
> > > > > > > this, we need to be able to transfer virtiofsd's internal state 
> > > > > > > to and
> > > > > > > from virtiofsd.
> > > > > > >
> > > > > > > Because virtiofsd's internal state will not be too large, we 
> > > > > > > believe it
> > > > > > > is best to transfer it as a single binary blob after the streaming
> > > > > > > phase.  Because this method should be useful to other vhost-user
> > > > > > > implementations, too, it is introduced as a general-purpose 
> > > > > > > addition to
> > > > > > > the protocol, not limited to vhost-user-fs.
> > > > > > >
> > > > > > > These are the additions to the protocol:
> > > > > > > - New vhost-user protocol feature 
> > > > > > > VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
> > > > > > >   This feature signals support for transferring state, and is 
> > > > > > > added so
> > > > > > >   that migration can fail early when the back-end has no support.
> > > > > > >
> > > > > > > - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate 
> > > > > > > a pipe
> > > > > > >   over which to transfer the state.  The front-end sends an FD to 
> > > > > > > the
> > > > > > >   back-end into/from which it can write/read its state, and the 
> > > > > > > back-end
> > > > > > >   can decide to either use it, or reply with a different FD for 
> > > > > > > the
> > > > > > >   front-end to override the front-end's choice.
> > > > > > >   The front-end creates a simple pipe to transfer the state, but 
> > > > > > > maybe
> > > > > > >   the back-end already has an FD into/from which it has to 
> > > > > > > write/read
> > > > > > >   its state, in which case it will want to override the simple 
> > > > > > > pipe.
> > > > > > >   Conversely, maybe in the future we find a way to have the 
> > > > > > > front-end
> > > > > > >   get an immediate FD for the migration stream (in some cases), 
> > > > > > > in which
> > > > > > >   case we will want to send this to the back-end instead of 
> > > > > > > creating a
> > > > > > >   pipe.
> > > > > > >   Hence the negotiation: If one side has a better idea than a 
> > > > > > > plain
> > > > > > >   pipe, we will want to use that.
> > > > > > >
> > > > > > > - CHECK_DEVICE_STATE: After the state has been transferred 
> > > > > > > through the
> > > > > > >   pipe (the end indicated by EOF), the front-end invokes this 
> > > > > > > function
> > > > > > >   to verify success.  There is no in-band way (through the pipe) 
> > > > > > > to
> > > > > > >   indicate failure, so we need to check explicitly.
> > > > > > >
> > > > > > > Once the transfer pipe has been established via 
> > > > > > > SET_DEVICE_STATE_FD
> > > > > > > (which includes establishing the direction of transfer and 
> > > > > > > migration
> > > > > > > phase), the sending side writes its data into the pipe, and the 
> > > > > > > reading
> > > > > > > side reads it until it sees an EOF.  Then, the front-end will 
> > > > > > > check for
> > > > > > > success via CHECK_DEVICE_STATE, which on the destination side 
> > > > > > > includes
> > > > > > > checking for integrity (i.e. errors during deserialization).
> > > > > > >
> > > > > > > Suggested-by: Stefan Hajnoczi 
> > > > > > > Signed-off-by: Hanna Czenczek 
> > > > > > > ---
> > > > > > >  include/hw/virtio/vhost-backend.h |  24 +
> > > > > > >  include/hw/virtio/vhost.h |  79 
> > > > > > >  hw/virtio/vhost-user.c| 147 
> > > > > > > ++
> > > > > > >  hw/virtio/vhost.c |  37 
> > > > > > >  4 files changed, 287 insertions(+)
> > > > > > >
> > > > > > > diff --git a/include/hw/virtio/vhost-backend.h 
> > > > > > > b/include/hw/virtio/vhost-backend.h
> > > > > > > index ec3fbae58d..5935b32fe3 100644
> > > > > > > --- a/include/hw/virtio/vhost-backend.h
> > > > > > > +++ b/include/hw/virtio/vhost-backend.h
> > > > > > > @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
> > > > > > >  VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
> > > > > > >  } VhostSetConfigType;
> > > > > > >
> > > > > > > +typedef enum VhostDeviceStateDirection {
> > > > > > > +/* Transfer state from back-end (device) to front-end */
> > > > > > > +VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
> > > > > > > +/* Transfer state 

Re: [PATCH v2 1/1] ui/sdl2: disable SDL_HINT_GRAB_KEYBOARD on Windows

2023-04-18 Thread Bernhard Beschow



On 18 April 2023 06:56:52 UTC, "Volker Rümelin" wrote:
>Windows sends an extra left control key up/down input event for
>every right alt key up/down input event for keyboards with
>international layout. Since commit 830473455f ("ui/sdl2: fix
>handling of AltGr key on Windows") QEMU uses a Windows low level
>keyboard hook procedure to reliably filter out the special left
>control key and to grab the keyboard on Windows.
>
>The SDL2 version 2.0.16 introduced its own Windows low level
>keyboard hook procedure to grab the keyboard. Windows calls this
>callback before the QEMU keyboard hook procedure. This disables
>the special left control key filter when the keyboard is grabbed.
>
>To fix the problem, disable the SDL2 Windows low level keyboard
>hook procedure.
>
>Reported-by: Bernhard Beschow 
>Signed-off-by: Volker Rümelin 

FWIW:

Tested-by: Bernhard Beschow 

>---
> ui/sdl2.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/ui/sdl2.c b/ui/sdl2.c
>index 00aadfae37..9d703200bf 100644
>--- a/ui/sdl2.c
>+++ b/ui/sdl2.c
>@@ -855,7 +855,10 @@ static void sdl2_display_init(DisplayState *ds, 
>DisplayOptions *o)
> #ifdef SDL_HINT_VIDEO_X11_NET_WM_BYPASS_COMPOSITOR /* only available since 
> SDL 2.0.8 */
> SDL_SetHint(SDL_HINT_VIDEO_X11_NET_WM_BYPASS_COMPOSITOR, "0");
> #endif
>+#ifndef CONFIG_WIN32
>+/* QEMU uses its own low level keyboard hook procedure on Windows */
> SDL_SetHint(SDL_HINT_GRAB_KEYBOARD, "1");
>+#endif
> #ifdef SDL_HINT_ALLOW_ALT_TAB_WHILE_GRABBED
> SDL_SetHint(SDL_HINT_ALLOW_ALT_TAB_WHILE_GRABBED, "0");
> #endif



RE: [PATCH v3] memory: Optimize replay of guest mapping

2023-04-18 Thread Duan, Zhenzhong

>-Original Message-
>From: Peter Xu 
>Sent: Tuesday, April 18, 2023 9:56 PM
>To: Peter Maydell 
>Cc: Duan, Zhenzhong ; qemu-
>de...@nongnu.org; m...@redhat.com; jasow...@redhat.com;
>marcel.apfelb...@gmail.com; pbonz...@redhat.com;
>richard.hender...@linaro.org; edua...@habkost.net; da...@redhat.com;
>phi...@linaro.org
>Subject: Re: [PATCH v3] memory: Optimize replay of guest mapping
>
>On Tue, Apr 18, 2023 at 11:13:57AM +0100, Peter Maydell wrote:
>> On Thu, 13 Apr 2023 at 12:12, Zhenzhong Duan
> wrote:
>> >
>> > On x86, there are two notifiers registered due to vtd-ir memory
>> > region splitting the entire address space. During replay of the
>> > address space for each notifier, the whole address space is scanned
>> > which is unnecessary. We only need to scan the space belong to
>> > notifier monitored space.
>> >
>> > While on x86 IOMMU memory region spans over entire address space,
>> > but on some other platforms(e.g. arm mps3-an547), IOMMU memory
>> > region is only a window in the whole address space. user could
>> > register a notifier with arbitrary scope beyond IOMMU memory region.
>> > Though in current implementation replay is only triggered by VFIO
>> > and dirty page sync with notifiers derived from memory region
>> > section, but this isn't guaranteed in the future.
>> >
>> > So, we replay the intersection part of IOMMU memory region and IOMMU
>> > notifier in memory_region_iommu_replay().
>> >
>> > Signed-off-by: Zhenzhong Duan 
>> > ---
>> > v3: Fix assert failure on mps3-an547
>> > v2: Add an assert per Peter
>> > Tested on x86 with a net card passed to guest(kvm/tcg), ping/ssh pass.
>> > Also did simple bootup test with mps3-an547
>> >
>> >  hw/i386/intel_iommu.c | 2 +-
>> >  softmmu/memory.c  | 5 +++--
>> >  2 files changed, 4 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
>> > a62896759c78..faade7def867 100644
>> > --- a/hw/i386/intel_iommu.c
>> > +++ b/hw/i386/intel_iommu.c
>> > @@ -3850,7 +3850,7 @@ static void
>vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>> >  .domain_id = vtd_get_domain_id(s, , vtd_as->pasid),
>> >  };
>> >
>> > -vtd_page_walk(s, , 0, ~0ULL, , vtd_as->pasid);
>> > +vtd_page_walk(s, , n->start, n->end, ,
>> > + vtd_as->pasid);
>> >  }
>> >  } else {
>> >  trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
>> > diff --git a/softmmu/memory.c b/softmmu/memory.c index
>> > b1a6cae6f583..f7af691991de 100644
>> > --- a/softmmu/memory.c
>> > +++ b/softmmu/memory.c
>> > @@ -1925,7 +1925,7 @@ void
>> > memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr,
>IOMMUNotifier *n)  {
>> >  MemoryRegion *mr = MEMORY_REGION(iommu_mr);
>> >  IOMMUMemoryRegionClass *imrc =
>IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
>> > -hwaddr addr, granularity;
>> > +hwaddr addr, end, granularity;
>> >  IOMMUTLBEntry iotlb;
>> >
>> >  /* If the IOMMU has its own replay callback, override */ @@
>> > -1935,8 +1935,9 @@ void
>memory_region_iommu_replay(IOMMUMemoryRegion *iommu_mr,
>IOMMUNotifier *n)
>> >  }
>> >
>> >  granularity = memory_region_iommu_get_min_page_size(iommu_mr);
>> > +end = MIN(n->end, memory_region_size(mr));
>> >
>> > -for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
>> > +for (addr = n->start; addr < end; addr += granularity) {
>> >  iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n-
>>iommu_idx);
>> >  if (iotlb.perm != IOMMU_NONE) {
>> >  n->notify(n, &iotlb);
>>
>>
>> The documentation for the replay method of IOMMUMemoryRegionClass
>> says:
>>  * The default implementation of memory_region_iommu_replay() is to
>>  * call the IOMMU translate method for every page in the address space
>>  * with flag == IOMMU_NONE and then call the notifier if translate
>>  * returns a valid mapping. If this method is implemented then it
>>  * overrides the default behaviour, and must provide the full semantics
>>  * of memory_region_iommu_replay(), by calling @notifier for every
>>  * translation present in the IOMMU.
>>
>> This commit changes the default implementation so it's no longer doing
>> this for every page in the address space. If the change is correct, we
>> should update the doc comment too.
>>
>> Oddly, the doc comment for memory_region_iommu_replay() itself doesn't
>> very clearly state what its semantics are; it could probably be
>> improved.
>>
>> Anyway, this change is OK for the TCG use of iommu notifiers, because
>> that doesn't care about replay.
>
>Since the notifier contains the range information I'd say the change shouldn't
>affect any caller but only a pure performance difference.  Indeed it'll be 
>nicer
>the documentation can be updated too.  Thanks,
>
>--
>Peter Xu
Thanks to Peter Maydell and Peter Xu for the comments; I will add a doc update.
May I ask if it's preferred to add doc update to 

[PATCH] hw/misc/ivshmem: Use 32-bit addressing for the memory BAR

2023-04-18 Thread Geoffrey McRae
Since OVMF 202211, the BIOS maps BAR2 to an upper address, which has the
undesirable effect of making it impossible to map the memory under Linux
because it exceeds the maximum permissible range for hotplug memory
(see `mhp_get_pluggable_range` in `mm/memory_hotplug.c`). This patch
resolves this by configuring the BAR as 32-bit.

Signed-off-by: Geoffrey McRae 
---
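
For reference, the BAR type bits involved can be sketched as below. The
constant values are the ones found in Linux's
include/uapi/linux/pci_regs.h, redefined locally so the sketch stands
alone; bar_is_64bit() and ivshmem_bar2_type() are hypothetical helpers
used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* BAR flag values as found in Linux's include/uapi/linux/pci_regs.h. */
#define PCI_BASE_ADDRESS_SPACE_MEMORY  0x00
#define PCI_BASE_ADDRESS_MEM_TYPE_MASK 0x06
#define PCI_BASE_ADDRESS_MEM_TYPE_32   0x00
#define PCI_BASE_ADDRESS_MEM_TYPE_64   0x04
#define PCI_BASE_ADDRESS_MEM_PREFETCH  0x08

/* True if the BAR type field requests a 64-bit memory BAR. */
static bool bar_is_64bit(uint8_t type)
{
    return (type & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
           PCI_BASE_ADDRESS_MEM_TYPE_64;
}

/* Hypothetical helper: the flag combination before and after the patch. */
static uint8_t ivshmem_bar2_type(bool use_64bit)
{
    return PCI_BASE_ADDRESS_SPACE_MEMORY |
           PCI_BASE_ADDRESS_MEM_PREFETCH |
           (use_64bit ? PCI_BASE_ADDRESS_MEM_TYPE_64
                      : PCI_BASE_ADDRESS_MEM_TYPE_32);
}
```

The only bit that differs between the two variants is the memory type
field; the BAR stays a prefetchable memory BAR either way.
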
 hw/misc/ivshmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index d66d912172..2f8f7e2030 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -913,7 +913,7 @@ static void ivshmem_common_realize(PCIDevice *dev, Error 
**errp)
 pci_register_bar(PCI_DEVICE(s), 2,
  PCI_BASE_ADDRESS_SPACE_MEMORY |
  PCI_BASE_ADDRESS_MEM_PREFETCH |
- PCI_BASE_ADDRESS_MEM_TYPE_64,
+ PCI_BASE_ADDRESS_MEM_TYPE_32,
  s->ivshmem_bar2);
 }
 
-- 
2.39.2




Re: [PATCH v2 6/8] accel/tcg: Uncache the host address for instruction fetch when tlb size < 1

2023-04-18 Thread Richard Henderson

On 4/18/23 16:06, Weiwei Li wrote:

When a PMP entry overlaps part of the page, we'll set the tlb_size to 1, which
will make the address in the TLB entry set with TLB_INVALID_MASK, and the next
access will again go through tlb_fill. However, this will not work in
tb_gen_code() => get_page_addr_code_hostp(): the TLB host address will be
cached, and the following instructions can use this host address directly,
which may lead to the bypass of PMP-related checks.

Signed-off-by: Weiwei Li
Signed-off-by: Junqiang Wang
---
  accel/tcg/cputlb.c | 5 +
  1 file changed, 5 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 1/2] i386/acpi: fix inconsistent QEMU/OVMF device paths

2023-04-18 Thread zhangying (AZ)

> On Tue, 18 Apr 2023 09:06:30 +
> "zhangying (AZ)" via  wrote:
> 
> > > On 30.07.20 17:58, Michael S. Tsirkin wrote:
> > > > macOS uses ACPI UIDs to build the DevicePath for NVRAM boot
> > > > options, while OVMF firmware gets them via an internal channel through
> QEMU.
> > > > Due to a bug in QEMU ACPI currently UEFI firmware and ACPI have
> > > > different values, and this makes the underlying operating system
> > > > unable to report its boot option.
> > > >
> > > > The particular node in question is the primary PciRoot (PCI0 in
> > > > ACPI), which for some reason gets assigned 1 in ACPI UID and 0 in
> > > > the DevicePath. This is due to the _UID assigned to it by
> > > > build_dsdt in hw/i386/acpi-build.c Which does not correspond to
> > > > the primary PCI identifier given by pcibus_num in hw/pci/pci.c
> > > >
> > > > Reference with the device paths, OVMF startup logs, and ACPI table
> > > > dumps (SysReport):
> > > > https://github.com/acidanthera/bugtracker/issues/1050
> > > >
> > > > In UEFI v2.8, section "10.4.2 Rules with ACPI _HID and _UID" ends
> > > > with the paragraph,
> > > >
> > > > Root PCI bridges will use the plug and play ID of PNP0A03, This will
> > > > be stored in the ACPI Device Path _HID field, or in the Expanded
> > > > ACPI Device Path _CID field to match the ACPI name space. The _UID
> > > > in the ACPI Device Path structure must match the _UID in the ACPI
> > > > name space.
> > > >
> > > > (See especially the last sentence.)
> > > >
> > > > Considering *extra* root bridges / root buses (with bus number >
> > > > 0), QEMU's ACPI generator actually does the right thing; since
> > > > QEMU commit
> > > > c96d9286a6d7 ("i386/acpi-build: more traditional _UID and _HID for
> > > > PXB root buses", 2015-06-11).
> > > >
> > > > However, the _UID values for root bridge zero (on both i440fx and
> > > > q35) have always been "wrong" (from UEFI perspective), going back
> > > > in QEMU to commit 74523b850189 ("i386: add ACPI table files from
> > > > seabios", 2013-10-14).
> > > >
> > > > Even in SeaBIOS, these _UID values have always been 1; see commit
> > > > a4d357638c57 ("Port rombios32 code from bochs-bios.", 2008-03-08)
> > > > for i440fx, and commit ecbe3fd61511 ("seabios: q35: add dsdt",
> > > > 2012-12-01) for q35.
> > > >
> > > > Suggested-by: Laszlo Ersek 
> > > > Tested-by: vit9696 
> > > > Signed-off-by: Michael S. Tsirkin 
> > > > ---
> > > >  hw/i386/acpi-build.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
> > > > b7bc2a..7a5a8b3521 100644
> > > > --- a/hw/i386/acpi-build.c
> > > > +++ b/hw/i386/acpi-build.c
> > > > @@ -1497,7 +1497,7 @@ build_dsdt(GArray *table_data, BIOSLinker
> *linker,
> > > >  dev = aml_device("PCI0");
> > > >  aml_append(dev, aml_name_decl("_HID",
> > > aml_eisaid("PNP0A03")));
> > > >  aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> > > > -aml_append(dev, aml_name_decl("_UID", aml_int(1)));
> > > > +aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> > > >  aml_append(sb_scope, dev);
> > > >  aml_append(dsdt, sb_scope);
> > > >
> > > > @@ -1512,7 +1512,7 @@ build_dsdt(GArray *table_data, BIOSLinker
> *linker,
> > > >  aml_append(dev, aml_name_decl("_HID",
> > > aml_eisaid("PNP0A08")));
> > > >  aml_append(dev, aml_name_decl("_CID",
> > > aml_eisaid("PNP0A03")));
> > > >  aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> > > > -aml_append(dev, aml_name_decl("_UID", aml_int(1)));
> > > > +aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> > > >  aml_append(dev, build_q35_osc_method());
> > > >  aml_append(sb_scope, dev);
> > > >  aml_append(dsdt, sb_scope);
> > > >
> > >
> > > This "breaks" Windows guests created/installed before this change in
> > > the sense of Windows gets confused and declares that most of the
> > > devices changed and thus it has new entries for them in the device
> > > manager where settings of the old one do not apply anymore.
> > >
> > > We were made aware of this by our users when making QEMU 5.2.0
> > > available on a more used repository of us. Users complained that
> > > their static network configuration got thrown out in Windows 2016 or
> > > 2019 server VMs, and Windows tried to use DHCP (which was not
> > > available in their environments) and thus their Windows VMs had no network
> connectivity at all anymore.
> > >
> > > It's currently not yet quite 100% clear to me with what QEMU version
> > > the Windows VM must be installed with, from reading the patch I have
> > > to believe it must be before that, but we got mixed reports and a
> > > colleague could not replicate it from upgrade of 4.0 to 5.2 (I did
> > > /not/ confirm that one). Anyway, just writing this all to avoid people 
> > > seeing
> different results and brushing this off.
> > >
> > > So 

[PATCH v3 0/7] target/riscv: Fix PMP related problem

2023-04-18 Thread Weiwei Li
This patchset tries to fix the PMP bypass problem reported in issue
https://gitlab.com/qemu-project/qemu/-/issues/1542:

The TLB entry will be cached if the matched PMP entry covers the whole page.
However, PMP entries with higher priority may cover part of the page (but not
match the access address), which means different regions in this page may have
different permissions. So in that case it also cannot be cached (patch 1).

Writing to pmpaddr didn't trigger a TLB flush (patch 3).

We set the tlb_size to 1 to make TLB_INVALID_MASK set, so that the next
access will again go through tlb_fill. However, this will not work in
tb_gen_code() => get_page_addr_code_hostp(): the TLB host address will be
cached, and the following instructions can use this host address directly, which
may lead to the bypass of PMP-related checks (patch 6).
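
The mechanism in the last paragraph can be modelled roughly as follows.
This is a toy sketch under stated assumptions, not the accel/tcg code:
ToyTlbEntry, TOY_INVALID_MASK and the toy_* functions are hypothetical
stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE    4096u
#define TOY_INVALID_MASK 1u    /* stand-in for TLB_INVALID_MASK */

typedef struct ToyTlbEntry {
    uint64_t page;     /* page-aligned guest address */
    uint32_t flags;
} ToyTlbEntry;

/* Install an entry; a size below the page size marks it invalid. */
static void toy_tlb_set(ToyTlbEntry *e, uint64_t addr, uint64_t size)
{
    e->page = addr & ~(uint64_t)(TOY_PAGE_SIZE - 1);
    e->flags = size < TOY_PAGE_SIZE ? TOY_INVALID_MASK : 0;
}

/* Data access fast path: an invalid entry never hits, forcing a refill. */
static bool toy_tlb_hit(const ToyTlbEntry *e, uint64_t addr)
{
    return !(e->flags & TOY_INVALID_MASK) &&
           e->page == (addr & ~(uint64_t)(TOY_PAGE_SIZE - 1));
}

/* Instruction fetch: the host address may only be reused for valid entries. */
static bool toy_can_cache_host_addr(const ToyTlbEntry *e)
{
    return !(e->flags & TOY_INVALID_MASK);
}
```

In this model the data path is already safe because toy_tlb_hit() fails
for a size-1 entry; the instruction fetch path needs the equivalent of
toy_can_cache_host_addr() so that later fetches are checked again.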

The port is available here:
https://github.com/plctlab/plct-qemu/tree/plct-pmp-fix-v3

v2:

Update commit message for patch 1

Add a default tlb_size when PMP is disabled or there are no rules, and only get
the tlb size when translation succeeds in patch 2

Update get_page_addr_code_hostp instead of probe_access_internal to fix the 
cached host address for instruction fetch in patch 6

Add patch 7 to make the short cut really work in pmp_hart_has_privs

Add patch 8 to use pmp_update_rule_addr() and pmp_update_rule_nums() separately

v3:

Ignore disabled PMP entries in pmp_get_tlb_size() in Patch 1

Drop Patch 5: since the tb jmp cache has already been flushed in tlb_flush,
flushing tb seems unnecessary.

Fix commit message problems in Patch 8 (Patch 7 in new patchset)

Weiwei Li (7):
  target/riscv: Update pmp_get_tlb_size()
  target/riscv: Move pmp_get_tlb_size apart from
get_physical_address_pmp
  target/riscv: Flush TLB when pmpaddr is updated
  target/riscv: Flush TLB only when pmpcfg/pmpaddr really changes
  accel/tcg: Uncache the host address for instruction fetch when tlb
size < 1
  target/riscv: Make the short cut really work in pmp_hart_has_privs
  target/riscv: Separate pmp_update_rule() in pmpcfg_csr_write

 accel/tcg/cputlb.c|   5 +
 target/riscv/cpu_helper.c |  24 +--
 target/riscv/pmp.c| 318 --
 target/riscv/pmp.h|   3 +-
 4 files changed, 183 insertions(+), 167 deletions(-)

-- 
2.25.1




[PATCH v3 2/7] target/riscv: Move pmp_get_tlb_size apart from get_physical_address_pmp

2023-04-18 Thread Weiwei Li
pmp_get_tlb_size can be separated from get_physical_address_pmp and is only
needed when ret == TRANSLATE_SUCCESS.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/cpu_helper.c | 21 +++--
 target/riscv/pmp.c|  4 
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 075fc0538a..ea08ca9fbb 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -676,14 +676,11 @@ void riscv_cpu_set_mode(CPURISCVState *env, target_ulong 
newpriv)
  *
  * @env: CPURISCVState
  * @prot: The returned protection attributes
- * @tlb_size: TLB page size containing addr. It could be modified after PMP
- *permission checking. NULL if not set TLB page for addr.
  * @addr: The physical address to be checked permission
  * @access_type: The type of MMU access
  * @mode: Indicates current privilege level.
  */
-static int get_physical_address_pmp(CPURISCVState *env, int *prot,
-target_ulong *tlb_size, hwaddr addr,
+static int get_physical_address_pmp(CPURISCVState *env, int *prot, hwaddr addr,
 int size, MMUAccessType access_type,
 int mode)
 {
@@ -703,9 +700,6 @@ static int get_physical_address_pmp(CPURISCVState *env, int 
*prot,
 }
 
 *prot = pmp_priv_to_page_prot(pmp_priv);
-if (tlb_size != NULL) {
-*tlb_size = pmp_get_tlb_size(env, addr);
-}
 
 return TRANSLATE_SUCCESS;
 }
@@ -905,7 +899,7 @@ restart:
 }
 
 int pmp_prot;
-int pmp_ret = get_physical_address_pmp(env, &pmp_prot, NULL, pte_addr,
+int pmp_ret = get_physical_address_pmp(env, &pmp_prot, pte_addr,
sizeof(target_ulong),
MMU_DATA_LOAD, PRV_S);
 if (pmp_ret != TRANSLATE_SUCCESS) {
@@ -1300,13 +1294,12 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, 
int size,
 prot &= prot2;
 
 if (ret == TRANSLATE_SUCCESS) {
-ret = get_physical_address_pmp(env, &prot_pmp, &tlb_size, pa,
+ret = get_physical_address_pmp(env, &prot_pmp, pa,
size, access_type, mode);
 
 qemu_log_mask(CPU_LOG_MMU,
   "%s PMP address=" HWADDR_FMT_plx " ret %d prot"
-  " %d tlb_size " TARGET_FMT_lu "\n",
-  __func__, pa, ret, prot_pmp, tlb_size);
+  " %d\n", __func__, pa, ret, prot_pmp);
 
 prot &= prot_pmp;
 }
@@ -1333,13 +1326,12 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, 
int size,
   __func__, address, ret, pa, prot);
 
 if (ret == TRANSLATE_SUCCESS) {
-ret = get_physical_address_pmp(env, &prot_pmp, &tlb_size, pa,
+ret = get_physical_address_pmp(env, &prot_pmp, pa,
size, access_type, mode);
 
 qemu_log_mask(CPU_LOG_MMU,
   "%s PMP address=" HWADDR_FMT_plx " ret %d prot"
-  " %d tlb_size " TARGET_FMT_lu "\n",
-  __func__, pa, ret, prot_pmp, tlb_size);
+  " %d\n", __func__, pa, ret, prot_pmp);
 
 prot &= prot_pmp;
 }
@@ -1350,6 +1342,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 }
 
 if (ret == TRANSLATE_SUCCESS) {
+tlb_size = pmp_get_tlb_size(env, pa);
 tlb_set_page(cs, address & ~(tlb_size - 1), pa & ~(tlb_size - 1),
  prot, mmu_idx, tlb_size);
 return true;
diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 22f3b3f217..d1ef9457ea 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -612,6 +612,10 @@ target_ulong pmp_get_tlb_size(CPURISCVState *env, 
target_ulong addr)
 target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
 int i;
 
+if (!riscv_cpu_cfg(env)->pmp || !pmp_get_num_rules(env)) {
+return TARGET_PAGE_SIZE;
+}
+
 for (i = 0; i < MAX_RISCV_PMPS; i++) {
 if (pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg) == PMP_AMATCH_OFF) {
 continue;
-- 
2.25.1




[PATCH v3 7/7] target/riscv: Separate pmp_update_rule() in pmpcfg_csr_write

2023-04-18 Thread Weiwei Li
Use pmp_update_rule_addr() and pmp_update_rule_nums() separately to
update rule nums only once for each pmpcfg_csr_write. Then we can also
move tlb_flush into pmp_update_rule_nums().

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/pmp.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 755ed2b963..7d825c1746 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -121,7 +121,7 @@ static bool pmp_write_cfg(CPURISCVState *env, uint32_t 
pmp_index, uint8_t val)
 qemu_log_mask(LOG_GUEST_ERROR, "ignoring pmpcfg write - locked\n");
 } else if (env->pmp_state.pmp[pmp_index].cfg_reg != val) {
 env->pmp_state.pmp[pmp_index].cfg_reg = val;
-pmp_update_rule(env, pmp_index);
+pmp_update_rule_addr(env, pmp_index);
 return true;
 }
 } else {
@@ -207,6 +207,8 @@ void pmp_update_rule_nums(CPURISCVState *env)
 env->pmp_state.num_rules++;
 }
 }
+
+tlb_flush(env_cpu(env));
 }
 
 /*
@@ -486,7 +488,7 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t 
reg_index,
 
 /* If PMP permission of any addr has been changed, flush TLB pages. */
 if (modified) {
-tlb_flush(env_cpu(env));
+pmp_update_rule_nums(env);
 }
 }
 
@@ -539,7 +541,6 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
addr_index,
 if (env->pmp_state.pmp[addr_index].addr_reg != val) {
 env->pmp_state.pmp[addr_index].addr_reg = val;
 pmp_update_rule(env, addr_index);
-tlb_flush(env_cpu(env));
 }
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
-- 
2.25.1
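
The batching described in the 7/7 commit message can be sketched outside
QEMU roughly as follows. This is a minimal illustration, not the patch
itself: struct pmp_sketch, write_cfg(), update_rule_nums() and
cfg_csr_write() are hypothetical stand-ins, four cfg bytes per register is
an assumption (RV32-style packing), and a counter stands in for tlb_flush().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PMPS 16          /* stand-in for MAX_RISCV_PMPS */

/* Hypothetical, simplified PMP state. */
struct pmp_sketch {
    uint8_t cfg[SKETCH_MAX_PMPS];   /* one pmpNcfg byte per entry */
    unsigned num_rules;             /* entries whose A field is not OFF */
    unsigned flush_count;           /* counts would-be tlb_flush() calls */
};

/* Store one cfg byte; report whether the stored value changed. */
static bool write_cfg(struct pmp_sketch *s, unsigned idx, uint8_t val)
{
    assert(idx < SKETCH_MAX_PMPS);
    if (s->cfg[idx] == val) {
        return false;
    }
    s->cfg[idx] = val;
    return true;
}

/* Recount active rules and flush once, after all bytes were written. */
static void update_rule_nums(struct pmp_sketch *s)
{
    s->num_rules = 0;
    for (unsigned i = 0; i < SKETCH_MAX_PMPS; i++) {
        /* The A (address matching) field is bits 4:3; 0 means OFF. */
        if (((s->cfg[i] >> 3) & 0x3) != 0) {
            s->num_rules++;
        }
    }
    s->flush_count++;
}

/* Write a whole cfg register: per-byte updates, one recount, one flush. */
static void cfg_csr_write(struct pmp_sketch *s, unsigned reg_index,
                          uint32_t val)
{
    bool modified = false;

    for (unsigned i = 0; i < 4; i++) {
        modified |= write_cfg(s, reg_index * 4 + i,
                              (uint8_t)(val >> (8 * i)));
    }
    if (modified) {
        update_rule_nums(s);
    }
}
```

Writing the same register value twice leaves flush_count unchanged on the
second write, which is the "flush only when something really changed"
behaviour the series aims for.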




[PATCH v3 6/7] target/riscv: Make the short cut really work in pmp_hart_has_privs

2023-04-18 Thread Weiwei Li
We needn't check the PMP entries if there are no PMP rules.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/pmp.c | 251 ++---
 1 file changed, 123 insertions(+), 128 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 7feaddd7eb..755ed2b963 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -314,149 +314,144 @@ int pmp_hart_has_privs(CPURISCVState *env, target_ulong 
addr,
 target_ulong e = 0;
 
 /* Short cut if no rules */
-if (0 == pmp_get_num_rules(env)) {
-if (pmp_hart_has_privs_default(env, addr, size, privs,
-   allowed_privs, mode)) {
-ret = MAX_RISCV_PMPS;
-}
-}
-
-if (size == 0) {
-if (riscv_cpu_cfg(env)->mmu) {
-/*
- * If size is unknown (0), assume that all bytes
- * from addr to the end of the page will be accessed.
- */
-pmp_size = -(addr | TARGET_PAGE_MASK);
+if (pmp_get_num_rules(env) != 0) {
+if (size == 0) {
+if (riscv_cpu_cfg(env)->mmu) {
+/*
+ * If size is unknown (0), assume that all bytes
+ * from addr to the end of the page will be accessed.
+ */
+pmp_size = -(addr | TARGET_PAGE_MASK);
+} else {
+pmp_size = sizeof(target_ulong);
+}
 } else {
-pmp_size = sizeof(target_ulong);
-}
-} else {
-pmp_size = size;
-}
-
-/*
- * 1.10 draft priv spec states there is an implicit order
- * from low to high
- */
-for (i = 0; i < MAX_RISCV_PMPS; i++) {
-s = pmp_is_in_range(env, i, addr);
-e = pmp_is_in_range(env, i, addr + pmp_size - 1);
-
-/* partially inside */
-if ((s + e) == 1) {
-qemu_log_mask(LOG_GUEST_ERROR,
-  "pmp violation - access is partially inside\n");
-ret = -1;
-break;
+pmp_size = size;
 }
 
-/* fully inside */
-const uint8_t a_field =
-pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg);
-
 /*
- * Convert the PMP permissions to match the truth table in the
- * ePMP spec.
+ * 1.10 draft priv spec states there is an implicit order
+ * from low to high
  */
-const uint8_t epmp_operation =
-((env->pmp_state.pmp[i].cfg_reg & PMP_LOCK) >> 4) |
-((env->pmp_state.pmp[i].cfg_reg & PMP_READ) << 2) |
-(env->pmp_state.pmp[i].cfg_reg & PMP_WRITE) |
-((env->pmp_state.pmp[i].cfg_reg & PMP_EXEC) >> 2);
+for (i = 0; i < MAX_RISCV_PMPS; i++) {
+s = pmp_is_in_range(env, i, addr);
+e = pmp_is_in_range(env, i, addr + pmp_size - 1);
+
+/* partially inside */
+if ((s + e) == 1) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "pmp violation - access is partially inside\n");
+ret = -1;
+break;
+}
+
+/* fully inside */
+const uint8_t a_field =
+pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg);
 
-if (((s + e) == 2) && (PMP_AMATCH_OFF != a_field)) {
 /*
- * If the PMP entry is not off and the address is in range,
- * do the priv check
+ * Convert the PMP permissions to match the truth table in the
+ * ePMP spec.
  */
-if (!MSECCFG_MML_ISSET(env)) {
-/*
- * If mseccfg.MML Bit is not set, do pmp priv check
- * This will always apply to regular PMP.
- */
-*allowed_privs = PMP_READ | PMP_WRITE | PMP_EXEC;
-if ((mode != PRV_M) || pmp_is_locked(env, i)) {
-*allowed_privs &= env->pmp_state.pmp[i].cfg_reg;
-}
-} else {
+const uint8_t epmp_operation =
+((env->pmp_state.pmp[i].cfg_reg & PMP_LOCK) >> 4) |
+((env->pmp_state.pmp[i].cfg_reg & PMP_READ) << 2) |
+(env->pmp_state.pmp[i].cfg_reg & PMP_WRITE) |
+((env->pmp_state.pmp[i].cfg_reg & PMP_EXEC) >> 2);
+
+if (((s + e) == 2) && (PMP_AMATCH_OFF != a_field)) {
 /*
- * If mseccfg.MML Bit set, do the enhanced pmp priv check
+ * If the PMP entry is not off and the address is in range,
+ * do the priv check
  */
-if (mode == PRV_M) {
-switch (epmp_operation) {
-case 0:
-case 1:
-case 4:
-case 5:
-case 6:
-case 7:
-case 8:
-

[PATCH v3 5/7] accel/tcg: Uncache the host address for instruction fetch when tlb size < 1

2023-04-18 Thread Weiwei Li
When a PMP entry overlaps only part of a page, we set tlb_size to 1, which
marks the address in the TLB entry with TLB_INVALID_MASK so that the next
access goes through tlb_fill again. However, this does not work in
tb_gen_code() => get_page_addr_code_hostp(): the TLB host address is
cached, and the following instructions can use this host address directly,
which may bypass the PMP-related checks.
Fix https://gitlab.com/qemu-project/qemu/-/issues/1542.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: LIU Zhiwei 
---
 accel/tcg/cputlb.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index e984a98dc4..efa0cb67c9 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1696,6 +1696,11 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState 
*env, target_ulong addr,
 if (p == NULL) {
 return -1;
 }
+
+if (full->lg_page_size < TARGET_PAGE_BITS) {
+return -1;
+}
+
 if (hostp) {
 *hostp = p;
 }
-- 
2.25.1
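
The guard added in 5/7 can be illustrated in isolation. The sketch below is
not QEMU code: struct tlb_entry_sketch and code_host_ptr() are hypothetical,
SKETCH_PAGE_BITS stands in for TARGET_PAGE_BITS, and it assumes a tlb_size
of 1 is recorded as a log2 page size of 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12         /* stand-in for TARGET_PAGE_BITS */

/* Hypothetical, simplified view of a filled TLB entry. */
struct tlb_entry_sketch {
    void *host;                     /* host address backing the page, or NULL */
    uint8_t lg_page_size;           /* log2 of the size the entry covers */
};

/*
 * Return a host pointer that is safe to cache for instruction fetch,
 * or NULL when the caller has to go through the slow path every time.
 */
static void *code_host_ptr(const struct tlb_entry_sketch *e)
{
    assert(e != NULL);

    if (e->host == NULL) {
        return NULL;                /* not backed by host RAM */
    }
    if (e->lg_page_size < SKETCH_PAGE_BITS) {
        /*
         * The entry covers less than a full page, so permissions may
         * differ within the page; do not hand out a cacheable pointer.
         */
        return NULL;
    }
    return e->host;
}
```

The point of refusing the pointer is that a cached host address would let
later fetches in the same page skip the per-access permission check.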




Re: [PATCH v2 2/4] hw/acpi: arm: bump MADT to revision 5

2023-04-18 Thread Michael S. Tsirkin
On Tue, Apr 18, 2023 at 12:52:17PM -0400, Eric DeVolder wrote:
> Currently ARM QEMU generates, and reports, MADT revision 4. ACPI 6.3
> introduces MADT revision 5.
> 
> For MADT revision 5, the GICC structure adds an SPE Overflow Interrupt
> field. This new 2-byte field is created from the existing 3-byte
> Reserved field. The spec indicates if the SPE overflow interrupt is
> not supported, to zero the field.
> 
> Signed-off-by: Eric DeVolder 

So why do we bother changing this? I'd rather defer until
we actually intend to fill this field.

> ---
>  hw/arm/virt-acpi-build.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 4156111d49..23268dd981 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -705,7 +705,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
> VirtMachineState *vms)
>  int i;
>  VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
>  const MemMapEntry *memmap = vms->memmap;
> -AcpiTable table = { .sig = "APIC", .rev = 4, .oem_id = vms->oem_id,
> +AcpiTable table = { .sig = "APIC", .rev = 5, .oem_id = vms->oem_id,
>  .oem_table_id = vms->oem_table_id };
>  
>  acpi_table_begin(&table, table_data);
> @@ -763,7 +763,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
> VirtMachineState *vms)
>  /* Processor Power Efficiency Class */
>  build_append_int_noprefix(table_data, 0, 1);
>  /* Reserved */
> -build_append_int_noprefix(table_data, 0, 3);
> +build_append_int_noprefix(table_data, 0, 1);
> +/* SPE overflow Interrupt */
> +build_append_int_noprefix(table_data, 0, 2);
>  }
>  
>  if (vms->gic_version != VIRT_GIC_VERSION_2) {
> -- 
> 2.31.1




[Bug 1769053] Re: Ability to control phys-bits through libvirt

2023-04-18 Thread Christian Ehrhardt 
Although the bugzilla case wasn't updated, this landed in v8.7.0 via a series around
https://gitlab.com/libvirt/libvirt/-/commit/e6c29f09e5b75d7a8d79ae670407060446282c78

libvirt v9.0.0 is in Ubuntu Lunar, so from now on one can control the
physical bit settings in a defined way through libvirt.

See maxphysaddr in [1] for how to use that.

In the mid term, Ubuntu will consider no longer adding further variants of
the workaround, which provided machine types with the -hpb suffix to
allow larger guests.

[1]: https://libvirt.org/formatdomain.html#cpu-model-and-topology

** Changed in: libvirt (Ubuntu)
   Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1769053

Title:
  Ability to control phys-bits through libvirt

Status in libvirt:
  Confirmed
Status in QEMU:
  Invalid
Status in libvirt package in Ubuntu:
  Fix Released
Status in qemu package in Ubuntu:
  Invalid

Bug description:
  Attempting to start a KVM guest with more than 1TB of RAM fails.

  It looks like we might need some extra patches:
  https://lists.gnu.org/archive/html/qemu-discuss/2017-12/msg5.html

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: qemu-system-x86 1:2.11+dfsg-1ubuntu7
  ProcVersionSignature: Ubuntu 4.15.0-20.21-generic 4.15.17
  Uname: Linux 4.15.0-20-generic x86_64
  ApportVersion: 2.20.9-0ubuntu7
  Architecture: amd64
  CurrentDesktop: Unity:Unity7:ubuntu
  Date: Fri May  4 16:21:14 2018
  InstallationDate: Installed on 2017-04-05 (393 days ago)
  InstallationMedia: Ubuntu 16.10 "Yakkety Yak" - Release amd64 (20161012.2)
  MachineType: Dell Inc. XPS 13 9360
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-20-generic 
root=/dev/mapper/ubuntu--vg-root ro quiet splash transparent_hugepage=madvise 
vt.handoff=1
  SourcePackage: qemu
  UpgradeStatus: Upgraded to bionic on 2018-04-30 (3 days ago)
  dmi.bios.date: 02/26/2018
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 2.6.2
  dmi.board.name: 0PF86Y
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A00
  dmi.chassis.type: 9
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: 
dmi:bvnDellInc.:bvr2.6.2:bd02/26/2018:svnDellInc.:pnXPS139360:pvr:rvnDellInc.:rn0PF86Y:rvrA00:cvnDellInc.:ct9:cvr:
  dmi.product.family: XPS
  dmi.product.name: XPS 13 9360
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/libvirt/+bug/1769053/+subscriptions




[PATCH v3 3/7] target/riscv: Flush TLB when pmpaddr is updated

2023-04-18 Thread Weiwei Li
TLB should be flushed not only for pmpcfg csr changes, but also for
pmpaddr csr changes.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Alistair Francis 
---
 target/riscv/pmp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index d1ef9457ea..bcd190d3a3 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -537,6 +537,7 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t 
addr_index,
 if (!pmp_is_locked(env, addr_index)) {
 env->pmp_state.pmp[addr_index].addr_reg = val;
 pmp_update_rule(env, addr_index);
+tlb_flush(env_cpu(env));
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
   "ignoring pmpaddr write - locked\n");
-- 
2.25.1




[PATCH v3 1/7] target/riscv: Update pmp_get_tlb_size()

2023-04-18 Thread Weiwei Li
PMP entries before the matched PMP entry (including the matched entry itself)
may overlap only part of the TLB page, which can give different regions of
that page different permissions. For example, with
PMP0 (0x8008~0x800F, R) and PMP1 (0x80001000~0x80001FFF, RWX), a
write access to 0x8000 will match PMP1. However, we cannot cache the TLB
entry for it, since that would let a write access to 0x8008 bypass the
PMP0 check. So we should check all of the entries and set the TLB size to 1
in this case.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/cpu_helper.c |  7 ++-
 target/riscv/pmp.c| 39 ++-
 target/riscv/pmp.h|  3 +--
 3 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 433ea529b0..075fc0538a 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -703,11 +703,8 @@ static int get_physical_address_pmp(CPURISCVState *env, 
int *prot,
 }
 
 *prot = pmp_priv_to_page_prot(pmp_priv);
-if ((tlb_size != NULL) && pmp_index != MAX_RISCV_PMPS) {
-target_ulong tlb_sa = addr & ~(TARGET_PAGE_SIZE - 1);
-target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
-
-*tlb_size = pmp_get_tlb_size(env, pmp_index, tlb_sa, tlb_ea);
+if (tlb_size != NULL) {
+*tlb_size = pmp_get_tlb_size(env, addr);
 }
 
 return TRANSLATE_SUCCESS;
diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 1f5aca42e8..22f3b3f217 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -601,28 +601,41 @@ target_ulong mseccfg_csr_read(CPURISCVState *env)
 }
 
 /*
- * Calculate the TLB size if the start address or the end address of
+ * Calculate the TLB size if any start address or the end address of
  * PMP entry is presented in the TLB page.
  */
-target_ulong pmp_get_tlb_size(CPURISCVState *env, int pmp_index,
-  target_ulong tlb_sa, target_ulong tlb_ea)
+target_ulong pmp_get_tlb_size(CPURISCVState *env, target_ulong addr)
 {
-target_ulong pmp_sa = env->pmp_state.addr[pmp_index].sa;
-target_ulong pmp_ea = env->pmp_state.addr[pmp_index].ea;
+target_ulong pmp_sa;
+target_ulong pmp_ea;
+target_ulong tlb_sa = addr & ~(TARGET_PAGE_SIZE - 1);
+target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
+int i;
+
+for (i = 0; i < MAX_RISCV_PMPS; i++) {
+if (pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg) == PMP_AMATCH_OFF) {
+continue;
+}
+
+pmp_sa = env->pmp_state.addr[i].sa;
+pmp_ea = env->pmp_state.addr[i].ea;
 
-if (pmp_sa <= tlb_sa && pmp_ea >= tlb_ea) {
-return TARGET_PAGE_SIZE;
-} else {
 /*
- * At this point we have a tlb_size that is the smallest possible size
- * That fits within a TARGET_PAGE_SIZE and the PMP region.
- *
- * If the size is less then TARGET_PAGE_SIZE we drop the size to 1.
+ * If any start address or the end address of PMP entry is presented
+ * in the TLB page and cannot override the whole TLB page we drop the
+ * size to 1.
  * This means the result isn't cached in the TLB and is only used for
  * a single translation.
  */
-return 1;
+if (pmp_sa <= tlb_sa && pmp_ea >= tlb_ea) {
+return TARGET_PAGE_SIZE;
+} else if ((pmp_sa >= tlb_sa && pmp_sa <= tlb_ea) ||
+   (pmp_ea >= tlb_sa && pmp_ea <= tlb_ea)) {
+return 1;
+}
 }
+
+return TARGET_PAGE_SIZE;
 }
 
 /*
diff --git a/target/riscv/pmp.h b/target/riscv/pmp.h
index b296ea1fc6..0a7e24750b 100644
--- a/target/riscv/pmp.h
+++ b/target/riscv/pmp.h
@@ -76,8 +76,7 @@ int pmp_hart_has_privs(CPURISCVState *env, target_ulong addr,
target_ulong size, pmp_priv_t privs,
pmp_priv_t *allowed_privs,
target_ulong mode);
-target_ulong pmp_get_tlb_size(CPURISCVState *env, int pmp_index,
-  target_ulong tlb_sa, target_ulong tlb_ea);
+target_ulong pmp_get_tlb_size(CPURISCVState *env, target_ulong addr);
 void pmp_update_rule_addr(CPURISCVState *env, uint32_t pmp_index);
 void pmp_update_rule_nums(CPURISCVState *env);
 uint32_t pmp_get_num_rules(CPURISCVState *env);
-- 
2.25.1
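
The scan introduced in 1/7 can be tried out standalone. The following is a
rough sketch under stated assumptions, not the QEMU function: struct
pmp_region_sketch and tlb_size_for() are hypothetical, regions are inclusive
[sa, ea] ranges, `enabled == 0` models PMP_AMATCH_OFF, and
SKETCH_PAGE_SIZE stands in for TARGET_PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u      /* stand-in for TARGET_PAGE_SIZE */

/* Hypothetical, simplified PMP entry. */
struct pmp_region_sketch {
    uint64_t sa;                    /* first address covered */
    uint64_t ea;                    /* last address covered (inclusive) */
    int enabled;                    /* 0 models PMP_AMATCH_OFF */
};

/*
 * Walk the entries in priority order. The first enabled entry that covers
 * the whole page makes the page cacheable; an enabled entry that starts or
 * ends inside the page before that forces a size of 1 (no caching).
 */
static uint64_t tlb_size_for(const struct pmp_region_sketch *r, size_t n,
                             uint64_t addr)
{
    assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);

    uint64_t tlb_sa = addr & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
    uint64_t tlb_ea = tlb_sa + SKETCH_PAGE_SIZE - 1;

    for (size_t i = 0; i < n; i++) {
        if (!r[i].enabled) {
            continue;
        }
        if (r[i].sa <= tlb_sa && r[i].ea >= tlb_ea) {
            return SKETCH_PAGE_SIZE;
        }
        if ((r[i].sa >= tlb_sa && r[i].sa <= tlb_ea) ||
            (r[i].ea >= tlb_sa && r[i].ea <= tlb_ea)) {
            return 1;
        }
    }
    return SKETCH_PAGE_SIZE;
}
```

With a small read-only region listed ahead of a region that covers the
whole page, the function returns 1 for any address in that page; disabling
the small region makes the page cacheable again. The order of the entries
matters, mirroring the low-to-high priority of PMP entries.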



