Re: [PATCH v2 2/3] hw/ssi: Add Nuvoton PSPI Module

2023-02-07 Thread Philippe Mathieu-Daudé

On 7/2/23 20:45, Hao Wu wrote:

Nuvoton's PSPI is a general purpose SPI module which enables
connections to SPI-based peripheral devices.

Signed-off-by: Hao Wu 
Reviewed-by: Chris Rauer 
Reviewed-by: Philippe Mathieu-Daude 
---
  MAINTAINERS|   6 +-
  hw/ssi/meson.build |   2 +-
  hw/ssi/npcm_pspi.c | 221 +
  hw/ssi/trace-events|   5 +
  include/hw/ssi/npcm_pspi.h |  53 +
  5 files changed, 283 insertions(+), 4 deletions(-)
  create mode 100644 hw/ssi/npcm_pspi.c
  create mode 100644 include/hw/ssi/npcm_pspi.h




+static const MemoryRegionOps npcm_pspi_ctrl_ops = {
+.read = npcm_pspi_ctrl_read,
+.write = npcm_pspi_ctrl_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 4,


You said in v1 "The datasheet suggests it's either 8-bit or
16-bit accesses.", so we want max_access_size = 2 here, right?


+.unaligned = false,
+},
+.impl = {
+.min_access_size = 2,
+.max_access_size = 2,
+.unaligned = false,
+},
+};






Re: [PATCH 2/3] hw/ssi: Add Nuvoton PSPI Module

2023-02-07 Thread Philippe Mathieu-Daudé

On 7/2/23 20:21, Hao Wu wrote:
On Tue, Feb 7, 2023 at 10:46 AM Hao Wu > wrote:


Thanks for your review!

On Mon, Feb 6, 2023 at 11:13 PM Philippe Mathieu-Daudé
mailto:phi...@linaro.org>> wrote:

On 7/2/23 00:34, Hao Wu wrote:
 > Nuvoton's PSPI is a general purpose SPI module which enables
 > connections to SPI-based peripheral devices.
 >
 > Signed-off-by: Hao Wu mailto:wuhao...@google.com>>
 > Reviewed-by: Chris Rauer mailto:cra...@google.com>>
 > ---
 >   MAINTAINERS                |   6 +-
 >   hw/ssi/meson.build         |   2 +-
 >   hw/ssi/npcm_pspi.c         | 216
+
 >   hw/ssi/trace-events        |   5 +
 >   include/hw/ssi/npcm_pspi.h |  53 +
 >   5 files changed, 278 insertions(+), 4 deletions(-)
 >   create mode 100644 hw/ssi/npcm_pspi.c
 >   create mode 100644 include/hw/ssi/npcm_pspi.h




 > +static void npcm_pspi_realize(DeviceState *dev, Error **errp)
 > +{
 > +    NPCMPSPIState *s = NPCM_PSPI(dev);
 > +    SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 > +    Object *obj = OBJECT(dev);
 > +
 > +    s->spi = ssi_create_bus(dev, "pspi");

FYI there is an ongoing discussion about how to model QOM tree. If
this bus isn't shared with another controller, the "embed QOM child
in parent" style could be preferred. If so, the bus would be created
as:

        object_initialize_child(obj, "pspi", >spi, TYPE_SSI_BUS);

I was just following some existing code here. I think I can use the
new style. 


I've tried to use this and got the following error:
**
ERROR:../qom/object.c:511:object_initialize_with_type: assertion failed: 
(size >= type->instance_size)
Bail out! ERROR:../qom/object.c:511:object_initialize_with_type: 
assertion failed: (size >= type->instance_size)


I think the problem is that we define s->spi as SSIBus* instead of 
SSIBus. But if we define it as SSIBus, we'll
get an incomplete type error. Fixing it will require refactoring 
hw/ssi/ssi.c which I'm not sure if we want to do
it right now. This code is consistent with other code in hw/ssi so I 
guess we can leave it here for now and wait

for a future refactor.

Sorry, I was just mumbling alone thinking about what would need to be
done, not asking you to change it yet :/ I should have been
more explicit.



[PATCH] qmp: Add error proofing for query-acpi-ospm-status

2023-02-07 Thread Kunkun Jiang via
The ACPI device may not implement the ospm_status callback. Executing
qmp "query-acpi-ospm-status" will cause segmentation fault. Add error
proofing add log to avoid such serious consequences.

Signed-off-by: Kunkun Jiang 
---
 monitor/qmp-cmds.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index bf22a8c5a6..7652613635 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -229,7 +229,12 @@ ACPIOSTInfoList *qmp_query_acpi_ospm_status(Error **errp)
 AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(obj);
 AcpiDeviceIf *adev = ACPI_DEVICE_IF(obj);
 
-adevc->ospm_status(adev, );
+if (adevc->ospm_status) {
+adevc->ospm_status(adev, );
+} else {
+error_setg(errp, "command is not supported, "
+ "ACPI device missing ospm_status callback");
+}
 } else {
 error_setg(errp, "command is not supported, missing ACPI device");
 }
-- 
2.33.0




Re: [PATCH] MAINTAINERS: add myself to ui/ and audio/

2023-02-07 Thread Gerd Hoffmann
On Tue, Feb 07, 2023 at 12:56:10PM +0400, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Helping out with patch review & queue handling.
> 
> Signed-off-by: Marc-André Lureau 

Reviewed-by: Gerd Hoffmann 




Re: Porting the QEMU build architecture to Visual Studio

2023-02-07 Thread Yan Vugenfirer
Hi Andrew,

I created a Visual Studio project a long time ago for qemu-ga. It is an
ugly unsustainable hack that was done in the time before QEMU moved to
meson and I had to deal with GCC extension that MS compiler couldn't handle.
Today I would experiment with meson that should be able to create VS
projects: https://mesonbuild.com/Using-with-Visual-Studio.html and use
clang on Windows (info in the same link).

Best regards,
Yan.


On Wed, Feb 8, 2023 at 8:57 AM Philippe Mathieu-Daudé 
wrote:

> Hi Andrew,
>
> On 8/2/23 05:56, Andrew Numrich wrote:
> > Hello, I’m looking to experiment with QEMU in a Windows specific
> > environment. For that I’ll need to build QEMU’s source code in Visual
> > Studio 2017.
> >
> > I’m seeing that QEMU’s sources calls for a `config-host.h` file
> > generated by a `create_config` script. I don’t see said script anywhere
> > in the source tree.
> >
> > By googling I can see various creations of both these files in random
> > forks of QEMU around the internet, but I couldn’t be sure to use any of
> > them.
> >
> > I’m guessing these are somehow created by meson or ninja, but I don’t
> > have those easily on hand, being on Windows.
> >
> > Is there anyone that can point me in the right direction
>
> Paolo posted a patch to adapt QEMU build system to VSCode few
> years ago:
>
> https://lore.kernel.org/qemu-devel/20210512100906.621504-1-pbonz...@redhat.com/
>
> > or explain how
> > this is generated?
> >
> > Thanks
> >
> >   * Andrew Numrich
> >
> > http://github.com/toastmod
> >
>
>
>


Re: [PATCH v3 08/10] qmp: teach 'getfd' to import sockets on win32

2023-02-07 Thread Marc-André Lureau
Hi

On Tue, Feb 7, 2023 at 6:54 PM Daniel P. Berrangé 
wrote:

> On Tue, Feb 07, 2023 at 06:25:33PM +0400, marcandre.lur...@redhat.com
> wrote:
> > From: Marc-André Lureau 
> >
> > A process with enough capabilities can duplicate a socket to QEMU.
> > Modify 'getfd' to import it and add it to the monitor fd list, so it can
> > be later used by other commands.
> >
> > Note that we actually store the SOCKET in the FD list, appropriate care
> > must now be taken to use the correct socket functions (similar approach
> > is taken by our io/ code and in glib, this is internal and shouldn't
> > affect the QEMU/QMP users)
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  qapi/misc.json | 16 --
> >  monitor/fds.c  | 79 --
> >  monitor/hmp-cmds.c |  6 +++-
> >  3 files changed, 81 insertions(+), 20 deletions(-)
> >
> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index 27ef5a2b20..cd36d8befb 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -249,10 +249,18 @@
> >  ##
> >  # @getfd:
> >  #
> > -# Receive a file descriptor via SCM rights and assign it a name
> > +# On UNIX, receive a file descriptor via SCM rights and assign it a
> name.
> > +#
> > +# On Windows, (where ancillary socket fd-passing isn't an option yet),
> add a
> > +# socket that was duplicated to QEMU process with WSADuplicateSocketW()
> via
> > +# WSASocket() & WSAPROTOCOL_INFOW structure and assign it a name. A
> SOCKET is
> > +# considered as a kind of "file descriptor" in QMP context, for
> historical
> > +# reasons and simplicity. QEMU takes care to use socket functions
> appropriately.
> >  #
> >  # @fdname: file descriptor name
> >  #
> > +# @wsa-info: a WSAPROTOCOL_INFOW structure (encoded in base64). Since
> 8.0.
>
> This is a clever trick, but it also feels pretty gross from
> POV of QMP design normal practice, which would be to define
> a struct in QAPI to represent the WSAPROTOCOL_INFOW contents.
>
> The main downside would be that its more verbose to convert
> between the windows and QAPI structs.


WSAPROTOCOL_INFOW is a fairly big structure, with private/reserved files,
it contains another structure (WSAPROTOCOLCHAIN), has fixed-length arrays,
GUID, and utf16 string. QAPI-fying is going to be painful for no real gain.
It is opaque and simply given back to WSASocketW.

Markus, did you have a chance to look at the series? Can you review/comment
before I do further work?

thanks


> > @@ -270,7 +278,11 @@
> >  # <- { "return": {} }
> >  #
> >  ##
> > -{ 'command': 'getfd', 'data': {'fdname': 'str'} }
> > +{ 'command': 'getfd', 'data': {
> > +'fdname': 'str',
> > +'*wsa-info': {'type': 'str', 'if': 'CONFIG_WIN32'}
> > +  }
> > +}
>
> snip
>
> > +void qmp_getfd(const char *fdname,
> > +#ifdef WIN32
> > +   const char *wsa_info,
> > +#endif
> > +   Error **errp)
> > +{
> > +Monitor *cur_mon = monitor_cur();
> > +int fd;
> > +
> > +#ifdef WIN32
> > +if (wsa_info) {
> > +g_autofree WSAPROTOCOL_INFOW *info = NULL;
> > +gsize len;
> > +SOCKET sk;
> > +
> > +info = (void *)g_base64_decode(wsa_info, );
> > +if (len != sizeof(*info)) {
> > +error_setg(errp, "Invalid WSAPROTOCOL_INFOW value");
> > +return;
> > +}
>
>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|
>
>


Re: [PATCH v2 07/10] hw/ide/piix: Require an ISABus only for user-created instances

2023-02-07 Thread Philippe Mathieu-Daudé

On 8/2/23 01:43, BALATON Zoltan wrote:

On Wed, 8 Feb 2023, Bernhard Beschow wrote:
Am 7. Februar 2023 20:52:02 UTC schrieb Mark Cave-Ayland 
:

On 06/02/2023 23:40, Bernhard Beschow wrote:
Am 5. Februar 2023 22:32:03 UTC schrieb Mark Cave-Ayland 
:

On 05/02/2023 22:21, BALATON Zoltan wrote:

On Sun, 5 Feb 2023, Mark Cave-Ayland wrote:

On 26/01/2023 21:17, Bernhard Beschow wrote:

Internal instances now defer interrupt wiring to the caller which
decouples them from the ISABus. User-created devices still fish 
out the
ISABus from the QOM tree and the interrupt wiring remains in 
PIIX IDE.

The latter mechanism is considered a workaround and intended to be
removed once a deprecation period for user-created PIIX IDE 
devices is

over.

Signed-off-by: Bernhard Beschow 
---
   include/hw/ide/pci.h |  1 +
   hw/ide/piix.c    | 64 
++--

   hw/isa/piix.c    |  5 
   3 files changed, 56 insertions(+), 14 deletions(-)




   diff --git a/hw/isa/piix.c b/hw/isa/piix.c
index 54a1246a9d..f9974c2a77 100644
--- a/hw/isa/piix.c
+++ b/hw/isa/piix.c
@@ -345,9 +345,14 @@ static void pci_piix_realize(PCIDevice 
*dev, const char *uhci_type,

 /* IDE */
   qdev_prop_set_int32(DEVICE(>ide), "addr", dev->devfn + 
1);

+    qdev_prop_set_bit(DEVICE(>ide), "user-created", false);
   if (!qdev_realize(DEVICE(>ide), BUS(pci_bus), errp)) {
   return;
   }
+    qdev_connect_gpio_out(DEVICE(>ide), 0,
+  qdev_get_gpio_in(DEVICE(>pic), 14));
+    qdev_connect_gpio_out(DEVICE(>ide), 1,
+  qdev_get_gpio_in(DEVICE(>pic), 15));
 /* USB */
   if (d->has_usb) {


I haven't checked the datasheet, but I suspect this will be 
similar to the cmd646/via PCI-IDE interfaces in that there will 
be a PCI configuration register that will switch between ISA 
compatibility mode (and ISA irqs) and PCI mode (with PCI IRQs). 
So it would be the device configuration that would specify PCI or 
ISA mode, rather than the presence of an ISABus.


I forgot about this topic already and haven't follwed this series 
either so what I say may not fully make sense but I think CMD646 
and via-ide are different. CMD646 is a PCI device and should use 
PCI interrupts while via-ide is part of a southbridge/superio 
complex and connected to the ISA PICs within that southbride, so I 
think via-ide always uses ISA IRQs and the ISA btidge within the 
same chip may convert that to PCI IRQs or not (that part is where 
I'm lost also because we may not actually model it that way). 
After a long debate we managed to find a solution back then that 
works for every guest we use it for now so I think we don't want 
to touch it now until some real need arises. It does not worth the 
trouble and added complexity to model something that is not used 
just for the sake of correctness. By the time we find a use for 
that, the ISA emulation may evolve so it's easier to implement the 
missing switching between isa and native mode or we may want to do 
it differently
  (such as we do things differently now compared to what we did years 
ago). So I think it does not worth keeping the ISA model from being 
simplified for some theoretical uses in the future which we may not 
actually do any time soon.


Indeed we don't want (and have the time) to model features we don't need
or will never use. However taking the time to correctly understand these
devices help us to correctly model them. In particular when design flaws
have been identified in some models.

But I don't want to get into this again so 
just shared my thoughts and feel free to ignore it. I don't care where 
these patches go as long as the VIA model keeps working for me.


We certainly want to keep what we currently have working :)

I have a vague memory that ISA compatibility mode was part of the 
original PCI-BMDMA specification, but it has been a while since I 
last looked.


Bernhard, is there any mention of this in the PIIX datasheet(s)? 
For reference the cmd646 datasheet specifies that ISA mode or PCI 
mode is determined by register PROG_IF (0x9) in PCI configuration 
space.


I've found the following:

   "Only PCI masters have access to the IDE port. ISA Bus masters 
cannot access the IDE I/O port addresses. Memory targeted by the IDE 
interface acting as a PCI Bus master on behalf of IDE DMA slaves 
must reside on PCI, usually main memory implemented by the 
host-to-PCI bridge."


And:

   "PIIX4 can act as a PCI Bus master on behalf of an IDE slave 
device."


Does this perhaps mean that piix-ide does indeed have no ISA bus?


I'd be amazed if that were the case: certainly when the first 
motherboards came out with PCI and ISA slots, I'd expect the IDE 
legacy mode to be enabled by default since BIOSes and OSs such as DOS 
wouldn't have been PCI aware and would access the ISA ioports 
directly. From memory the OF PCI specification has mention of 
workarounds such as mapping the old VGA 

Re: Porting the QEMU build architecture to Visual Studio

2023-02-07 Thread Philippe Mathieu-Daudé

Hi Andrew,

On 8/2/23 05:56, Andrew Numrich wrote:
Hello, I’m looking to experiment with QEMU in a Windows specific 
environment. For that I’ll need to build QEMU’s source code in Visual 
Studio 2017.


I’m seeing that QEMU’s sources calls for a `config-host.h` file 
generated by a `create_config` script. I don’t see said script anywhere 
in the source tree.


By googling I can see various creations of both these files in random 
forks of QEMU around the internet, but I couldn’t be sure to use any of 
them.


I’m guessing these are somehow created by meson or ninja, but I don’t 
have those easily on hand, being on Windows.


Is there anyone that can point me in the right direction


Paolo posted a patch to adapt QEMU build system to VSCode few
years ago:
https://lore.kernel.org/qemu-devel/20210512100906.621504-1-pbonz...@redhat.com/

or explain how 
this is generated?


Thanks

  * Andrew Numrich

http://github.com/toastmod






[PULL v2 11/19] migration: Clean up includes

2023-02-07 Thread Markus Armbruster
This commit was created with scripts/clean-includes.

All .c should include qemu/osdep.h first.  The script performs three
related cleanups:

* Ensure .c files include qemu/osdep.h first.
* Including it in a .h is redundant, since the .c  already includes
  it.  Drop such inclusions.
* Likewise, including headers qemu/osdep.h includes is redundant.
  Drop these, too.

Signed-off-by: Markus Armbruster 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Juan Quintela 
Message-Id: <20230202133830.2152150-12-arm...@redhat.com>
[Straightforward conflict with commit d5890ea0722 resolved]
---
 include/qemu/userfaultfd.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/qemu/userfaultfd.h b/include/qemu/userfaultfd.h
index d764496f0b..18a4314212 100644
--- a/include/qemu/userfaultfd.h
+++ b/include/qemu/userfaultfd.h
@@ -15,7 +15,6 @@
 
 #ifdef CONFIG_LINUX
 
-#include "qemu/osdep.h"
 #include "exec/hwaddr.h"
 #include 
 
-- 
2.39.0




[PULL v2 00/19] Header cleanup patches for 2023-02-06

2023-02-07 Thread Markus Armbruster
The following changes since commit 969d09c3a6186c0a4bc8a41db0c1aba1c76081fc:

  Merge tag 'pull-aspeed-20230207' of https://github.com/legoater/qemu into 
staging (2023-02-07 20:13:38 +)

are available in the Git repository at:

  https://repo.or.cz/qemu/armbru.git tags/pull-include-2023-02-06-v2

for you to fetch changes up to a67dfa660b0dd944c8fedfac02806de75b0c08b1:

  Drop duplicate #include (2023-02-08 07:28:05 +0100)


Header cleanup patches for 2023-02-06


Markus Armbruster (19):
  scripts/clean-includes: Fully skip / ignore files
  scripts/clean-includes: Don't claim duplicate headers found when not
  scripts/clean-includes: Skip symbolic links
  scripts/clean-includes: Improve --git commit message
  bsd-user: Clean up includes
  crypto: Clean up includes
  hw/cxl: Clean up includes
  hw/input: Clean up includes
  hw/tricore: Clean up includes
  qga: Clean up includes
  migration: Clean up includes
  net: Clean up includes
  target/hexagon: Clean up includes
  riscv: Clean up includes
  block: Clean up includes
  accel: Clean up includes
  Fix non-first inclusions of qemu/osdep.h
  Don't include headers already included by qemu/osdep.h
  Drop duplicate #include

 backends/tpm/tpm_ioctl.h  |  2 --
 bsd-user/bsd-proc.h   |  4 
 bsd-user/qemu.h   |  1 -
 crypto/block-luks-priv.h  |  1 -
 fsdev/p9array.h   |  2 --
 include/block/graph-lock.h|  1 -
 include/block/write-threshold.h   |  2 --
 include/hw/arm/fsl-imx6ul.h   |  1 -
 include/hw/arm/fsl-imx7.h |  1 -
 include/hw/cxl/cxl_component.h|  2 --
 include/hw/cxl/cxl_host.h |  1 -
 include/hw/cxl/cxl_pci.h  |  1 -
 include/hw/input/pl050.h  |  1 -
 include/hw/misc/aspeed_lpc.h  |  2 --
 include/hw/pci/pcie_doe.h |  1 -
 include/hw/tricore/triboard.h |  1 -
 include/qemu/async-teardown.h |  2 --
 include/qemu/dbus.h   |  1 -
 include/qemu/host-utils.h |  1 -
 include/qemu/userfaultfd.h|  1 -
 include/sysemu/accel-blocker.h|  1 -
 include/sysemu/event-loop-base.h  |  1 -
 net/vmnet_int.h   |  1 -
 qga/cutils.h  |  2 --
 target/hexagon/hex_arch_types.h   |  1 -
 target/hexagon/mmvec/macros.h |  1 -
 target/riscv/pmu.h|  1 -
 accel/tcg/cpu-exec.c  |  1 -
 audio/sndioaudio.c|  2 +-
 backends/hostmem-epc.c|  2 +-
 backends/tpm/tpm_emulator.c   |  1 -
 block/export/vduse-blk.c  |  2 +-
 block/qapi.c  |  1 -
 bsd-user/arm/signal.c |  1 +
 bsd-user/arm/target_arch_cpu.c|  2 ++
 bsd-user/freebsd/os-sys.c |  1 +
 bsd-user/i386/signal.c|  1 +
 bsd-user/i386/target_arch_cpu.c   |  3 +--
 bsd-user/main.c   |  4 +---
 bsd-user/strace.c |  1 -
 bsd-user/x86_64/signal.c  |  1 +
 bsd-user/x86_64/target_arch_cpu.c |  3 +--
 hw/9pfs/9p.c  |  2 --
 hw/acpi/piix4.c   |  1 -
 hw/alpha/dp264.c  |  1 -
 hw/arm/virt.c |  1 -
 hw/arm/xlnx-versal.c  |  1 -
 hw/block/pflash_cfi01.c   |  1 -
 hw/core/machine.c |  1 -
 hw/display/virtio-gpu-udmabuf.c   |  1 -
 hw/hppa/machine.c |  1 -
 hw/hyperv/syndbg.c|  2 +-
 hw/i2c/pmbus_device.c |  1 -
 hw/i386/acpi-build.c  |  1 -
 hw/input/tsc210x.c|  1 -
 hw/loongarch/acpi-build.c |  1 -
 hw/misc/macio/cuda.c  |  1 -
 hw/misc/macio/pmu.c   |  1 -
 hw/net/xilinx_axienet.c   |  1 -
 hw/ppc/ppc405_uc.c|  2 --
 hw/ppc/ppc440_bamboo.c|  1 -
 hw/ppc/spapr_drc.c|  1 -
 hw/rdma/vmw/pvrdma_dev_ring.c |  1 -
 hw/remote/machine.c   |  1 -
 hw/remote/proxy-memory-listener.c |  1 -
 hw/remote/remote-obj.c|  1 -
 hw/rtc/mc146818rtc.c  |  1 -
 hw/s390x/virtio-ccw-serial.c  |  1 -
 hw/sensor/adm1272.c   |  1 -
 hw/usb/dev-storage-bot.c  |  1 -
 hw/usb/dev-storage-classic.c  |  1 -
 migration/postcopy-ram.c  |  2 --
 qga/commands-posix.c  |  1 -
 qga/cutils.c  |  3 ++-
 softmmu/dirtylimit.c  |  1 -
 softmmu/runstate.c|  1 -
 softmmu/vl.c  |  3 ---
 target/loongarch/translate.c  |  1 -
 target/mips/tcg/translate.c   |  1 -
 target/nios2/translate.c  |  2 --
 tcg/tci.c |  1 -
 tests/unit/test-cutils.c  |  1 -
 tests/unit/test-seccomp.c |  1 -
 ui/gtk.c  |  1 -
 ui/udmabuf.c  |  1

Re: [PATCH V2 1/4] qapi: strList_from_string

2023-02-07 Thread Marc-André Lureau
Hi

On Tue, Feb 7, 2023 at 10:50 PM Steve Sistare  wrote:
>
> Generalize hmp_split_at_comma() to take any delimiter character, rename
> as strList_from_string(), and move it to qapi/util.c.
>
> No functional change.

The g_strsplit() version was a bit simpler, but if you want to
optimize it a bit for 1 char delimiter only, ok.

Reviewed-by: Marc-André Lureau 

>
> Signed-off-by: Steve Sistare 
> ---
>  include/monitor/hmp.h  |  1 -
>  include/qapi/util.h|  9 +
>  monitor/hmp-cmds.c | 19 ---
>  net/net-hmp-cmds.c |  2 +-
>  qapi/qapi-util.c   | 23 +++
>  stats/stats-hmp-cmds.c |  2 +-
>  6 files changed, 34 insertions(+), 22 deletions(-)
>
> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
> index 2220f14..e80848f 100644
> --- a/include/monitor/hmp.h
> +++ b/include/monitor/hmp.h
> @@ -19,7 +19,6 @@
>
>  bool hmp_handle_error(Monitor *mon, Error *err);
>  void hmp_help_cmd(Monitor *mon, const char *name);
> -strList *hmp_split_at_comma(const char *str);
>
>  void hmp_info_name(Monitor *mon, const QDict *qdict);
>  void hmp_info_version(Monitor *mon, const QDict *qdict);
> diff --git a/include/qapi/util.h b/include/qapi/util.h
> index 81a2b13..7d88b09 100644
> --- a/include/qapi/util.h
> +++ b/include/qapi/util.h
> @@ -22,6 +22,8 @@ typedef struct QEnumLookup {
>  const int size;
>  } QEnumLookup;
>
> +struct strList;
> +
>  const char *qapi_enum_lookup(const QEnumLookup *lookup, int val);
>  int qapi_enum_parse(const QEnumLookup *lookup, const char *buf,
>  int def, Error **errp);
> @@ -31,6 +33,13 @@ bool qapi_bool_parse(const char *name, const char *value, 
> bool *obj,
>  int parse_qapi_name(const char *name, bool complete);
>
>  /*
> + * Produce a strList from the character delimited string @in.
> + * All strings are g_strdup'd.
> + * A NULL or empty input string returns NULL.
> + */
> +struct strList *strList_from_string(const char *in, char delim);
> +
> +/*
>   * For any GenericList @list, insert @element at the front.
>   *
>   * Note that this macro evaluates @element exactly once, so it is safe
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 34bd8c6..9665e6e 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -39,25 +39,6 @@ bool hmp_handle_error(Monitor *mon, Error *err)
>  return false;
>  }
>
> -/*
> - * Split @str at comma.
> - * A null @str defaults to "".
> - */
> -strList *hmp_split_at_comma(const char *str)
> -{
> -char **split = g_strsplit(str ?: "", ",", -1);
> -strList *res = NULL;
> -strList **tail = 
> -int i;
> -
> -for (i = 0; split[i]; i++) {
> -QAPI_LIST_APPEND(tail, split[i]);
> -}
> -
> -g_free(split);
> -return res;
> -}
> -
>  void hmp_info_name(Monitor *mon, const QDict *qdict)
>  {
>  NameInfo *info;
> diff --git a/net/net-hmp-cmds.c b/net/net-hmp-cmds.c
> index 41d326b..30422d9 100644
> --- a/net/net-hmp-cmds.c
> +++ b/net/net-hmp-cmds.c
> @@ -72,7 +72,7 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict)
>  migrate_announce_params());
>
>  qapi_free_strList(params->interfaces);
> -params->interfaces = hmp_split_at_comma(interfaces_str);
> +params->interfaces = strList_from_string(interfaces_str, ',');
>  params->has_interfaces = params->interfaces != NULL;
>  params->id = g_strdup(id);
>  qmp_announce_self(params, NULL);
> diff --git a/qapi/qapi-util.c b/qapi/qapi-util.c
> index 63596e1..b61c73c 100644
> --- a/qapi/qapi-util.c
> +++ b/qapi/qapi-util.c
> @@ -15,6 +15,7 @@
>  #include "qapi/error.h"
>  #include "qemu/ctype.h"
>  #include "qapi/qmp/qerror.h"
> +#include "qapi/qapi-builtin-types.h"
>
>  CompatPolicy compat_policy;
>
> @@ -152,3 +153,25 @@ int parse_qapi_name(const char *str, bool complete)
>  }
>  return p - str;
>  }
> +
> +strList *strList_from_string(const char *in, char delim)
> +{
> +strList *res = NULL;
> +strList **tail = 
> +
> +while (in && in[0]) {
> +char *next = strchr(in, delim);
> +char *value;
> +
> +if (next) {
> +value = g_strndup(in, next - in);
> +in = next + 1; /* skip the delim */
> +} else {
> +value = g_strdup(in);
> +in = NULL;
> +}
> +QAPI_LIST_APPEND(tail, value);
> +}
> +
> +return res;
> +}
> diff --git a/stats/stats-hmp-cmds.c b/stats/stats-hmp-cmds.c
> index 531e35d..4a2adf8 100644
> --- a/stats/stats-hmp-cmds.c
> +++ b/stats/stats-hmp-cmds.c
> @@ -174,7 +174,7 @@ static StatsFilter *stats_filter(StatsTarget target, 
> const char *names,
>  request->provider = provider_idx;
>  if (names && !g_str_equal(names, "*")) {
>  request->has_names = true;
> -request->names = hmp_split_at_comma(names);
> +request->names = strList_from_string(names, ',');
>  }
>  

Re: [PATCH v2 12/23] vfio-user: region read/write

2023-02-07 Thread John Johnson


> On Feb 6, 2023, at 11:07 AM, Alex Williamson  
> wrote:
> 
> On Wed,  1 Feb 2023 21:55:48 -0800
> John Johnson  wrote:
> 
>> Add support for posted writes on remote devices
>> 
>> Signed-off-by: Elena Ufimtseva 
>> Signed-off-by: John G Johnson 
>> Signed-off-by: Jagannathan Raman 
>> ---
>> hw/vfio/user-protocol.h   |  12 +
>> hw/vfio/user.h|   1 +
>> include/hw/vfio/vfio-common.h |   3 +-
>> hw/vfio/common.c  |  23 ++---
>> hw/vfio/pci.c |   5 +-
>> hw/vfio/user-pci.c|   5 ++
>> hw/vfio/user.c| 112 
>> ++
>> hw/vfio/trace-events  |   1 +
>> 8 files changed, 154 insertions(+), 8 deletions(-)
>> 
>> diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
>> index 6f70a48..6987435 100644
>> --- a/hw/vfio/user-protocol.h
>> +++ b/hw/vfio/user-protocol.h
>> @@ -139,4 +139,16 @@ typedef struct {
>> uint64_t offset;
>> } VFIOUserRegionInfo;
>> 
>> +/*
>> + * VFIO_USER_REGION_READ
>> + * VFIO_USER_REGION_WRITE
>> + */
>> +typedef struct {
>> +VFIOUserHdr hdr;
>> +uint64_t offset;
>> +uint32_t region;
>> +uint32_t count;
>> +char data[];
>> +} VFIOUserRegionRW;
>> +
>> #endif /* VFIO_USER_PROTOCOL_H */
>> diff --git a/hw/vfio/user.h b/hw/vfio/user.h
>> index e6485dc..3012a86 100644
>> --- a/hw/vfio/user.h
>> +++ b/hw/vfio/user.h
>> @@ -84,6 +84,7 @@ typedef struct VFIOUserProxy {
>> /* VFIOProxy flags */
>> #define VFIO_PROXY_CLIENT0x1
>> #define VFIO_PROXY_FORCE_QUEUED  0x4
>> +#define VFIO_PROXY_NO_POST   0x8
>> 
>> VFIOUserProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp);
>> void vfio_user_disconnect(VFIOUserProxy *proxy);
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 9fb4c80..bbc4b15 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -57,6 +57,7 @@ typedef struct VFIORegion {
>> VFIOMmap *mmaps;
>> uint8_t nr; /* cache the region number for debug */
>> int fd; /* fd to mmap() region */
>> +bool post_wr; /* writes can be posted */
>> } VFIORegion;
>> 
>> typedef struct VFIOMigration {
>> @@ -180,7 +181,7 @@ struct VFIODeviceIO {
>> int (*region_read)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t 
>> size,
>>void *data);
>> int (*region_write)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t 
>> size,
>> -void *data);
>> +void *data, bool post);
>> };
>> 
>> struct VFIOContainerIO {
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index d26b325..de64e53 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -215,6 +215,7 @@ void vfio_region_write(void *opaque, hwaddr addr,
>> uint32_t dword;
>> uint64_t qword;
>> } buf;
>> +bool post = region->post_wr;
>> int ret;
>> 
>> switch (size) {
>> @@ -235,12 +236,19 @@ void vfio_region_write(void *opaque, hwaddr addr,
>> break;
>> }
>> 
>> -ret = vbasedev->io->region_write(vbasedev, region->nr, addr, size, 
>> );
>> +/* read-after-write hazard if guest can directly access region */
>> +if (region->nr_mmaps) {
>> +post = false;
>> +}
>> +ret = vbasedev->io->region_write(vbasedev, region->nr, addr, size, ,
>> + post);
>> if (ret != size) {
>> +const char *err = ret < 0 ? strerror(-ret) : "short write";
>> +
>> error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
>> - ",%d) failed: %m",
>> + ",%d) failed: %s",
>>  __func__, vbasedev->name, region->nr,
>> - addr, data, size);
>> + addr, data, size, err);
>> }
>> trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
>> 
>> @@ -271,9 +279,11 @@ uint64_t vfio_region_read(void *opaque,
>> 
>> ret = vbasedev->io->region_read(vbasedev, region->nr, addr, size, );
>> if (ret != size) {
>> -error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
>> +const char *err = ret < 0 ? strerror(-ret) : "short read";
>> +
>> +error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %s",
>>  __func__, vbasedev->name, region->nr,
>> - addr, size);
>> + addr, size, err);
>> return (uint64_t)-1;
>> }
>> 
>> @@ -1584,6 +1594,7 @@ int vfio_region_setup(Object *obj, VFIODevice 
>> *vbasedev, VFIORegion *region,
>> region->size = info->size;
>> region->fd_offset = info->offset;
>> region->nr = index;
>> +region->post_wr = false;
>> if (vbasedev->regfds != NULL) {
>> region->fd = vbasedev->regfds[index];
>> } else {
>> @@ -2711,7 +2722,7 @@ static int vfio_io_region_read(VFIODevice *vbasedev, 
>> uint8_t index, off_t off,
>> }
>> 
>> static int 

Re: [PATCH v2 15/23] vfio-user: forward msix BAR accesses to server

2023-02-07 Thread John Johnson


> On Feb 6, 2023, at 12:33 PM, Alex Williamson  
> wrote:
> 
> On Wed,  1 Feb 2023 21:55:51 -0800
> John Johnson  wrote:
> 
>> Server holds device current device pending state
>> Use irq masking commands in socket case
>> 
>> Signed-off-by: John G Johnson 
>> Signed-off-by: Elena Ufimtseva 
>> Signed-off-by: Jagannathan Raman 
>> ---
>> hw/vfio/pci.h |  1 +
>> include/hw/vfio/vfio-common.h |  3 ++
>> hw/vfio/ccw.c |  1 +
>> hw/vfio/common.c  | 26 ++
>> hw/vfio/pci.c | 23 +++-
>> hw/vfio/platform.c|  1 +
>> hw/vfio/user-pci.c| 64 
>> +++
>> 7 files changed, 118 insertions(+), 1 deletion(-)
>> 
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index 4f70664..d3e5d5f 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -113,6 +113,7 @@ typedef struct VFIOMSIXInfo {
>> uint32_t table_offset;
>> uint32_t pba_offset;
>> unsigned long *pending;
>> +MemoryRegion *pba_region;
>> } VFIOMSIXInfo;
>> 
>> /*
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index bbc4b15..2c58d7d 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -143,6 +143,7 @@ typedef struct VFIODevice {
>> bool ram_block_discard_allowed;
>> bool enable_migration;
>> bool use_regfds;
>> +bool can_mask_irq;
> 
> How can this be a device level property?  vfio-pci (kernel) supports
> masking of INTx.  It seems like there needs to be a per-index info or
> probe here to to support this.
> 

It is only used for MSIX masking.  MSI masking is done with
config space stores, and vfio-kernel and vfio-user handle them the
same by forwarding the config space writes to the server so it can
mask interrupts.

MSIX is trickier because the mask bits are in a memory region.
vfio-kernel doesn’t support SET_IRQS on MSIX vectors, so if the host
doesn’t allow client mapping of the MSIX table to do the masking, vfio
switches a masked vector’s event FD destination from KVM to QEMU, then
uses the QEMU PCI emulation to mask the vector.

vfio-user has to use messages to mask MSIX vectors, since there
is no host config space to map.  I originally forwarded the MSIX table
writes to the server to do the masking, but the feedback on the vfio-user
server changes was to add SET_IRQS support.

I can make this a PCI MSIX-specific bool if that helps.


> 
>> VFIODeviceOps *ops;
>> VFIODeviceIO *io;
>> unsigned int num_irqs;
>> @@ -239,6 +240,8 @@ void vfio_put_base_device(VFIODevice *vbasedev);
>> void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
>> void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
>> void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index);
>> +void vfio_unmask_single_irq(VFIODevice *vbasedev, int index, int irq);
>> +void vfio_mask_single_irq(VFIODevice *vbasedev, int index, int irq);
>> int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
>>int action, int fd, Error **errp);
>> void vfio_region_write(void *opaque, hwaddr addr,
>> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
>> index 00605bd..bf67670 100644
>> --- a/hw/vfio/ccw.c
>> +++ b/hw/vfio/ccw.c
>> @@ -616,6 +616,7 @@ static void vfio_ccw_get_device(VFIOGroup *group, 
>> VFIOCCWDevice *vcdev,
>> vcdev->vdev.dev = >cdev.parent_obj.parent_obj;
>> vcdev->vdev.io = _dev_io_ioctl;
>> vcdev->vdev.use_regfds = false;
>> +vcdev->vdev.can_mask_irq = false;
>> 
>> return;
>> 
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index de64e53..0c1cb21 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -102,6 +102,32 @@ void vfio_mask_single_irqindex(VFIODevice *vbasedev, 
>> int index)
>> vbasedev->io->set_irqs(vbasedev, _set);
>> }
>> 
>> +void vfio_mask_single_irq(VFIODevice *vbasedev, int index, int irq)
>> +{
>> +struct vfio_irq_set irq_set = {
>> +.argsz = sizeof(irq_set),
>> +.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
>> +.index = index,
>> +.start = irq,
>> +.count = 1,
>> +};
>> +
>> +vbasedev->io->set_irqs(vbasedev, _set);
>> +}
>> +
>> +void vfio_unmask_single_irq(VFIODevice *vbasedev, int index, int irq)
>> +{
>> +struct vfio_irq_set irq_set = {
>> +.argsz = sizeof(irq_set),
>> +.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
>> +.index = index,
>> +.start = irq,
>> +.count = 1,
>> +};
>> +
>> +vbasedev->io->set_irqs(vbasedev, _set);
>> +}
>> +
>> static inline const char *action_to_str(int action)
>> {
>> switch (action) {
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 42e7c82..7b16f8f 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -477,6 +477,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
>> unsigned int nr,
>> {
>> 

[PATCH v2] target/riscv: Remove privileged spec version restriction for RVV

2023-02-07 Thread frank . chang
From: Frank Chang 

The RVV specification does not require that the core needs to support
the privileged specification v1.12.0 to support RVV, and there is no
dependency from ISA level.

This commit removes the restriction from both RVV CSRs and extension CPU
ISA string.

Signed-off-by: Frank Chang 
Reviewed-by: Bin Meng 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/cpu.c |  2 +-
 target/riscv/csr.c | 21 +++--
 2 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 0dd2f0c753..93b52b826c 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -73,7 +73,7 @@ struct isa_ext_data {
  */
 static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(h, false, PRIV_VERSION_1_12_0, ext_h),
-ISA_EXT_DATA_ENTRY(v, false, PRIV_VERSION_1_12_0, ext_v),
+ISA_EXT_DATA_ENTRY(v, false, PRIV_VERSION_1_10_0, ext_v),
 ISA_EXT_DATA_ENTRY(zicsr, true, PRIV_VERSION_1_10_0, ext_icsr),
 ISA_EXT_DATA_ENTRY(zifencei, true, PRIV_VERSION_1_10_0, ext_ifencei),
 ISA_EXT_DATA_ENTRY(zihintpause, true, PRIV_VERSION_1_10_0, 
ext_zihintpause),
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index fa17d7770c..1b0a0c1693 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3980,20 +3980,13 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
 [CSR_FRM]  = { "frm",  fs, read_frm, write_frm},
 [CSR_FCSR] = { "fcsr", fs, read_fcsr,write_fcsr   },
 /* Vector CSRs */
-[CSR_VSTART]   = { "vstart",   vs, read_vstart,  write_vstart,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VXSAT]= { "vxsat",vs, read_vxsat,   write_vxsat,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VXRM] = { "vxrm", vs, read_vxrm,write_vxrm,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VCSR] = { "vcsr", vs, read_vcsr,write_vcsr,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VL]   = { "vl",   vs, read_vl,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VTYPE]= { "vtype",vs, read_vtype,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VLENB]= { "vlenb",vs, read_vlenb,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
+[CSR_VSTART]   = { "vstart",   vs, read_vstart,  write_vstart },
+[CSR_VXSAT]= { "vxsat",vs, read_vxsat,   write_vxsat  },
+[CSR_VXRM] = { "vxrm", vs, read_vxrm,write_vxrm   },
+[CSR_VCSR] = { "vcsr", vs, read_vcsr,write_vcsr   },
+[CSR_VL]   = { "vl",   vs, read_vl},
+[CSR_VTYPE]= { "vtype",vs, read_vtype },
+[CSR_VLENB]= { "vlenb",vs, read_vlenb },
 /* User Timers and Counters */
 [CSR_CYCLE]= { "cycle",ctr,read_hpmcounter  },
 [CSR_INSTRET]  = { "instret",  ctr,read_hpmcounter  },
-- 
2.36.1




Re: [PATCH] target/riscv: Remove .min_priv_ver restriction from RVV CSRs

2023-02-07 Thread Frank Chang
I realized that I should also remove the privileged version check
in isa_edata_arr[], too.
I will send out v2 patch to fix it.

Regards,
Frank Chang

On Tue, Feb 7, 2023 at 4:43 PM  wrote:

> From: Frank Chang 
>
> The RVV specification does not require that the core needs to support
> the privileged specification v1.12.0 to support RVV, and there is no
> dependency from ISA level. This commit removes the restriction.
>
> Signed-off-by: Frank Chang 
> ---
>  target/riscv/csr.c | 21 +++--
>  1 file changed, 7 insertions(+), 14 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index fa17d7770c4..1b0a0c1693c 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -3980,20 +3980,13 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
>  [CSR_FRM]  = { "frm",  fs, read_frm, write_frm},
>  [CSR_FCSR] = { "fcsr", fs, read_fcsr,write_fcsr   },
>  /* Vector CSRs */
> -[CSR_VSTART]   = { "vstart",   vs, read_vstart,  write_vstart,
> -   .min_priv_ver = PRIV_VERSION_1_12_0},
> -[CSR_VXSAT]= { "vxsat",vs, read_vxsat,   write_vxsat,
> -   .min_priv_ver = PRIV_VERSION_1_12_0},
> -[CSR_VXRM] = { "vxrm", vs, read_vxrm,write_vxrm,
> -   .min_priv_ver = PRIV_VERSION_1_12_0},
> -[CSR_VCSR] = { "vcsr", vs, read_vcsr,write_vcsr,
> -   .min_priv_ver = PRIV_VERSION_1_12_0},
> -[CSR_VL]   = { "vl",   vs, read_vl,
> -   .min_priv_ver = PRIV_VERSION_1_12_0},
> -[CSR_VTYPE]= { "vtype",vs, read_vtype,
> -   .min_priv_ver = PRIV_VERSION_1_12_0},
> -[CSR_VLENB]= { "vlenb",vs, read_vlenb,
> -   .min_priv_ver = PRIV_VERSION_1_12_0},
> +[CSR_VSTART]   = { "vstart",   vs, read_vstart,  write_vstart },
> +[CSR_VXSAT]= { "vxsat",vs, read_vxsat,   write_vxsat  },
> +[CSR_VXRM] = { "vxrm", vs, read_vxrm,write_vxrm   },
> +[CSR_VCSR] = { "vcsr", vs, read_vcsr,write_vcsr   },
> +[CSR_VL]   = { "vl",   vs, read_vl},
> +[CSR_VTYPE]= { "vtype",vs, read_vtype },
> +[CSR_VLENB]= { "vlenb",vs, read_vlenb },
>  /* User Timers and Counters */
>  [CSR_CYCLE]= { "cycle",ctr,read_hpmcounter  },
>  [CSR_INSTRET]  = { "instret",  ctr,read_hpmcounter  },
> --
> 2.25.1
>
>


Re: [PATCH 04/10] hw/riscv/virt: virt-acpi-build.c: Add basic ACPI tables

2023-02-07 Thread Sunil V L
On Wed, Feb 08, 2023 at 09:06:48AM +0800, Bin Meng wrote:
> On Wed, Feb 8, 2023 at 2:15 AM Sunil V L  wrote:
> >
> > On Wed, Feb 08, 2023 at 12:10:55AM +0800, Bin Meng wrote:
> > > On Mon, Feb 6, 2023 at 9:24 PM Sunil V L  wrote:
> > > >
> > > > On Mon, Feb 06, 2023 at 06:17:50PM +0800, Bin Meng wrote:
> > > > > On Thu, Feb 2, 2023 at 12:54 PM Sunil V L  
> > > > > wrote:
> > > > > >
> > > > > > Add few basic ACPI tables and DSDT with few devices in a
> > > > > > new file virt-acpi-build.c.
> > > > > >
> > > > > > These are mostly leveraged from arm64.
> > > > >
> > > > > There are lots of same ACPI codes existing in x86/arm/riscv. I believe
> > > > > some refactoring work is needed before ACPI support fully lands on
> > > > > RISC-V.
> > > > > For example, we can extract the common part among x86/arm/riscv into a
> > > > > separate file, like hw/acpi/acpi-build.c?
> > > > >
> > > >
> > > > While it appears like there is same code in multiple places, those
> > > > functions take architecture specific MachineState parameter. Some tables
> > > > like MADT though with same name, will have different contents for
> > > > different architectures.
> > > >
> > > > Only one function which Daniel also pointed is the wrapper
> > > > acpi_align_size() which can be made common. I am not
> > > > sure whether it is worth of refactoring.
> > > >
> > >
> > > It's more than that. For example,
> > >
> > > With acpi_dsdt_add_cpus(), the only meaningful parameter is the number
> > > of cpus, so there is no need to pass the architecture specific
> > > MachineState as the parameter.
> > >
> > I would not make this function common. The reason is, an architecture may
> > choose to attach different ACPI objects under the CPU node.
> >
> 
> Is it possible to insert architect specific CPU nodes after this
> common API builds the basic CPU node in DSDT?
> 
No. They need to be added in the same loop. Otherwise, it will cause
issues to the AML interpreter.

> > > Likewise, acpi_dsdt_add_fw_cfg() can be made generic as well.
> > >
> > The issue is, these things are not exactly common across all architectures.
> > x86 has bit different data in these objects. While today it appears they
> > are same for riscv and arm, in future things may change for an architecture.
> > It doesn't look like it is a standard practice to build files under
> > hw/acpi for specific architectures. Hence, IMO, it is better to keep these
> > things in architecture specific folders to allow them to do differently in
> > future.
> >
> 
> It looks like hw/acpi/ghes.c is for Arm from MAINTAINERS.
> 
APEI is a standard feature but it is up to the architecture to use it or
not. Probably, it is maintained by ARM since they adopted first. But if
you look at hw/acpi/meson.build, it is not architecture specific.

> > But I look forward for the feedback from other architecture maintainers on
> > this topic. My experience in qemu is very limited. So, I need help from
> > experts.
> 
> + Michael and Igor
> 
Thanks!
Sunil



[PATCH v13 2/2] vhost-vdpa: add support for vIOMMU

2023-02-07 Thread Cindy Lu
1.Add support for vIOMMU.
Add the new function to deal with iommu MR.
- during iommu_region_add register a specific IOMMU notifier,
  and store all notifiers in a list.
- during iommu_region_del, compare and delete the IOMMU notifier from the list
- since the SVQ not support iommu yet, add the check for IOMMU
  in vhost_vdpa_dev_start, if the SVQ and IOMMU enable at the same time
  function will return fail.

2.Skip the check in vhost_vdpa_listener_skipped_section() while
MR is IOMMU, Move this check to  vhost_vdpa_iommu_map_notify()

Verified in vp_vdpa and vdpa_sim_net driver

Signed-off-by: Cindy Lu 
---
 hw/virtio/vhost-vdpa.c | 173 ++---
 include/hw/virtio/vhost-vdpa.h |  11 +++
 2 files changed, 173 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 542e003101..46f676ab71 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -26,6 +26,7 @@
 #include "cpu.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "hw/virtio/virtio-access.h"
 
 /*
  * Return one past the end of the end of section. Be careful with uint64_t
@@ -60,15 +61,22 @@ static bool 
vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
  iova_min, section->offset_within_address_space);
 return true;
 }
+/*
+ * While using vIOMMU, Sometimes the section will be larger than iova_max
+ * but the memory that  actually mapping is smaller, So skip the check
+ * here. Will add the check in vhost_vdpa_iommu_map_notify,
+ *There is the real size that maps to the kernel
+ */
 
-llend = vhost_vdpa_section_end(section);
-if (int128_gt(llend, int128_make64(iova_max))) {
-error_report("RAM section out of device range (max=0x%" PRIx64
- ", end addr=0x%" PRIx64 ")",
- iova_max, int128_get64(llend));
-return true;
+if (!memory_region_is_iommu(section->mr)) {
+llend = vhost_vdpa_section_end(section);
+if (int128_gt(llend, int128_make64(iova_max))) {
+error_report("RAM section out of device range (max=0x%" PRIx64
+ ", end addr=0x%" PRIx64 ")",
+ iova_max, int128_get64(llend));
+return true;
+}
 }
-
 return false;
 }
 
@@ -185,6 +193,118 @@ static void vhost_vdpa_listener_commit(MemoryListener 
*listener)
 v->iotlb_batch_begin_sent = false;
 }
 
+static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
+
+hwaddr iova = iotlb->iova + iommu->iommu_offset;
+struct vhost_vdpa *v = iommu->dev;
+void *vaddr;
+int ret;
+Int128 llend;
+
+if (iotlb->target_as != _space_memory) {
+error_report("Wrong target AS \"%s\", only system memory is allowed",
+ iotlb->target_as->name ? iotlb->target_as->name : "none");
+return;
+}
+RCU_READ_LOCK_GUARD();
+/* check if RAM section out of device range */
+llend = int128_add(int128_makes64(iotlb->addr_mask), int128_makes64(iova));
+if (int128_gt(llend, int128_make64(v->iova_range.last))) {
+error_report("RAM section out of device range (max=0x%" PRIx64
+ ", end addr=0x%" PRIx64 ")",
+ v->iova_range.last, int128_get64(llend));
+return;
+}
+
+vhost_vdpa_iotlb_batch_begin_once(v);
+
+if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+bool read_only;
+
+if (!memory_get_xlat_addr(iotlb, , NULL, _only, NULL)) {
+return;
+}
+
+ret = vhost_vdpa_dma_map(v, VHOST_VDPA_GUEST_PA_ASID, iova,
+ iotlb->addr_mask + 1, vaddr, read_only);
+if (ret) {
+error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
+ "0x%" HWADDR_PRIx ", %p) = %d (%m)",
+ v, iova, iotlb->addr_mask + 1, vaddr, ret);
+}
+} else {
+ret = vhost_vdpa_dma_unmap(v, VHOST_VDPA_GUEST_PA_ASID, iova,
+   iotlb->addr_mask + 1);
+if (ret) {
+error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
+ "0x%" HWADDR_PRIx ") = %d (%m)",
+ v, iova, iotlb->addr_mask + 1, ret);
+}
+}
+}
+
+static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
+MemoryRegionSection *section)
+{
+struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+
+struct vdpa_iommu *iommu;
+Int128 end;
+int iommu_idx;
+IOMMUMemoryRegion *iommu_mr;
+int ret;
+
+iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+
+iommu = g_malloc0(sizeof(*iommu));
+end = int128_add(int128_make64(section->offset_within_region),
+ section->size);
+end = 

[PATCH v13 0/2] vhost-vdpa: add support for vIOMMU

2023-02-07 Thread Cindy Lu
These patches are to support vIOMMU in vdpa device

changes in V3
1. Move function vfio_get_xlat_addr to memory.c
2. Use the existing memory listener, while the MR is
iommu MR then call the function iommu_region_add/
iommu_region_del

changes in V4
1.make the comments in vfio_get_xlat_addr more general

changes in V5
1. Address the comments in the last version
2. Add a new arg in the function vfio_get_xlat_addr, which shows whether
the memory is backed by a discard manager. So the device can have its
own warning.

changes in V6
move the error_report for the unpopulated discard back to
memeory_get_xlat_addr

changes in V7
organize the error massage to avoid the duplicate information

changes in V8
Organize the code follow the comments in the last version

changes in V9
Organize the code follow the comments

changes in V10
Address the comments

changes in V11
Address the comments
fix the crash found in test

changes in V12
Address the comments, squash patch 1 into the next patch
improve the code style issue

changes in V13
fail to start if IOMMU and svq enable at same time
improve the code style issue

Cindy Lu (2):
  vhost: expose function vhost_dev_has_iommu()
  vhost-vdpa: add support for vIOMMU

 hw/virtio/vhost-vdpa.c | 173 ++---
 hw/virtio/vhost.c  |   2 +-
 include/hw/virtio/vhost-vdpa.h |  11 +++
 include/hw/virtio/vhost.h  |   1 +
 4 files changed, 175 insertions(+), 12 deletions(-)

-- 
2.34.3




[PATCH v13 1/2] vhost: expose function vhost_dev_has_iommu()

2023-02-07 Thread Cindy Lu
To support vIOMMU in vdpa, need to exposed the function
vhost_dev_has_iommu, vdpa will use this function to check
if vIOMMU enable.

Signed-off-by: Cindy Lu 
---
 hw/virtio/vhost.c | 2 +-
 include/hw/virtio/vhost.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index eb8c4c378c..9ff5516655 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -107,7 +107,7 @@ static void vhost_dev_sync_region(struct vhost_dev *dev,
 }
 }
 
-static bool vhost_dev_has_iommu(struct vhost_dev *dev)
+bool vhost_dev_has_iommu(struct vhost_dev *dev)
 {
 VirtIODevice *vdev = dev->vdev;
 
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a52f273347..f7f10c8fb7 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -336,4 +336,5 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
struct vhost_inflight *inflight);
 int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
struct vhost_inflight *inflight);
+bool vhost_dev_has_iommu(struct vhost_dev *dev);
 #endif
-- 
2.34.3




[PATCH v2 4/7] qapi/expr: add typing workaround for AbstractSet

2023-02-07 Thread John Snow
As part of attempting to unify the JSON types, I discovered that mypy
believes that `Mapping[str, ...].keys() & Set[str]` produces an
`AbstractSet[str]` and not a `Set[str]`.

As a result, mypy is unsure if the .pop() is safe.

Eh, fine, just wrap the expression in a set() constructor to force it to
be a mutable type.

Signed-off-by: John Snow 
---
 scripts/qapi/expr.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 338c9ea4131..95a25758fed 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -610,8 +610,8 @@ def check_expr(pexpr: ParsedExpression) -> None:
 if 'include' in expr:
 return
 
-metas = expr.keys() & {'enum', 'struct', 'union', 'alternate',
-   'command', 'event'}
+metas = set(expr.keys() & {
+'enum', 'struct', 'union', 'alternate', 'command', 'event'})
 if len(metas) != 1:
 raise QAPISemError(
 info,
-- 
2.39.0




[PATCH v2 1/7] qapi/expr: Split check_expr out from check_exprs

2023-02-07 Thread John Snow
Primarily, this reduces a nesting level of a particularly long
block. It's mostly code movement, but a new docstring is created.

It also has the effect of creating a fairly convenient "catch point" in
check_exprs for exception handling without making the nesting level even
worse.

Signed-off-by: John Snow 

---

This patch was originally written as part of my effort to factor out
QAPISourceInfo from this file by having expr.py raise a simple
exception, then catch and wrap it at the higher level.

This series doesn't do that anymore, but reducing the nesting level
still seemed subjectively nice. It's not crucial.

Signed-off-by: John Snow 
---
 scripts/qapi/expr.py | 179 +++
 1 file changed, 95 insertions(+), 84 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 5a1782b57ea..d01543006d8 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -595,6 +595,99 @@ def check_event(expr: _JSONObject, info: QAPISourceInfo) 
-> None:
 check_type(args, info, "'data'", allow_dict=not boxed)
 
 
+def check_expr(expr_elem: _JSONObject) -> None:
+"""
+Validate and normalize a parsed QAPI schema expression.
+
+:param expr_elem: The parsed expression to normalize and validate.
+
+:raise QAPISemError: When this expression fails validation.
+:return: None, ``expr`` is normalized in-place as needed.
+"""
+# Expression
+assert isinstance(expr_elem['expr'], dict)
+for key in expr_elem['expr'].keys():
+assert isinstance(key, str)
+expr: _JSONObject = expr_elem['expr']
+
+# QAPISourceInfo
+assert isinstance(expr_elem['info'], QAPISourceInfo)
+info: QAPISourceInfo = expr_elem['info']
+
+# Optional[QAPIDoc]
+tmp = expr_elem.get('doc')
+assert tmp is None or isinstance(tmp, QAPIDoc)
+doc: Optional[QAPIDoc] = tmp
+
+if 'include' in expr:
+return
+
+metas = expr.keys() & {'enum', 'struct', 'union', 'alternate',
+   'command', 'event'}
+if len(metas) != 1:
+raise QAPISemError(
+info,
+"expression must have exactly one key"
+" 'enum', 'struct', 'union', 'alternate',"
+" 'command', 'event'")
+meta = metas.pop()
+
+check_name_is_str(expr[meta], info, "'%s'" % meta)
+name = cast(str, expr[meta])
+info.set_defn(meta, name)
+check_defn_name_str(name, info, meta)
+
+if doc:
+if doc.symbol != name:
+raise QAPISemError(
+info, "documentation comment is for '%s'" % doc.symbol)
+doc.check_expr(expr)
+elif info.pragma.doc_required:
+raise QAPISemError(info,
+   "documentation comment required")
+
+if meta == 'enum':
+check_keys(expr, info, meta,
+   ['enum', 'data'], ['if', 'features', 'prefix'])
+check_enum(expr, info)
+elif meta == 'union':
+check_keys(expr, info, meta,
+   ['union', 'base', 'discriminator', 'data'],
+   ['if', 'features'])
+normalize_members(expr.get('base'))
+normalize_members(expr['data'])
+check_union(expr, info)
+elif meta == 'alternate':
+check_keys(expr, info, meta,
+   ['alternate', 'data'], ['if', 'features'])
+normalize_members(expr['data'])
+check_alternate(expr, info)
+elif meta == 'struct':
+check_keys(expr, info, meta,
+   ['struct', 'data'], ['base', 'if', 'features'])
+normalize_members(expr['data'])
+check_struct(expr, info)
+elif meta == 'command':
+check_keys(expr, info, meta,
+   ['command'],
+   ['data', 'returns', 'boxed', 'if', 'features',
+'gen', 'success-response', 'allow-oob',
+'allow-preconfig', 'coroutine'])
+normalize_members(expr.get('data'))
+check_command(expr, info)
+elif meta == 'event':
+check_keys(expr, info, meta,
+   ['event'], ['data', 'boxed', 'if', 'features'])
+normalize_members(expr.get('data'))
+check_event(expr, info)
+else:
+assert False, 'unexpected meta type'
+
+check_if(expr, info, meta)
+check_features(expr.get('features'), info)
+check_flags(expr, info)
+
+
 def check_exprs(exprs: List[_JSONObject]) -> List[_JSONObject]:
 """
 Validate and normalize a list of parsed QAPI schema expressions.
@@ -607,88 +700,6 @@ def check_exprs(exprs: List[_JSONObject]) -> 
List[_JSONObject]:
 :raise QAPISemError: When any expression fails validation.
 :return: The same list of expressions (now modified).
 """
-for expr_elem in exprs:
-# Expression
-assert isinstance(expr_elem['expr'], dict)
-for key in expr_elem['expr'].keys():
-assert isinstance(key, str)
-expr: _JSONObject = expr_elem['expr']
-
-# QAPISourceInfo

[PATCH v2 3/7] qapi/expr: Use TopLevelExpr where appropriate

2023-02-07 Thread John Snow
Remove most usages of _JSONObject with a more semantically meaningful
alias. Note that this is only a semantic alias; the distinction is not
enforced by the type system. This is merely a benefit for the human:
instead of check_xyz functions operating on a representation of some
"JSON Object", we can document them as operating on QAPI's Top Level
Expressions directly; it's more semantically meaningful.

Signed-off-by: John Snow 
---
 scripts/qapi/expr.py | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 293f830fe9d..338c9ea4131 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -44,7 +44,7 @@
 
 from .common import c_name
 from .error import QAPISemError
-from .parser import ParsedExpression
+from .parser import ParsedExpression, TopLevelExpr
 from .source import QAPISourceInfo
 
 
@@ -229,11 +229,11 @@ def pprint(elems: Iterable[str]) -> str:
pprint(unknown), pprint(allowed)))
 
 
-def check_flags(expr: _JSONObject, info: QAPISourceInfo) -> None:
+def check_flags(expr: TopLevelExpr, info: QAPISourceInfo) -> None:
 """
 Ensure flag members (if present) have valid values.
 
-:param expr: The expression to validate.
+:param expr: The `TopLevelExpr` to validate.
 :param info: QAPI schema source file information.
 
 :raise QAPISemError:
@@ -447,9 +447,9 @@ def check_features(features: Optional[object],
 check_if(feat, info, source)
 
 
-def check_enum(expr: _JSONObject, info: QAPISourceInfo) -> None:
+def check_enum(expr: TopLevelExpr, info: QAPISourceInfo) -> None:
 """
-Normalize and validate this expression as an ``enum`` definition.
+Normalize and validate this `TopLevelExpr` as an ``enum`` definition.
 
 :param expr: The expression to validate.
 :param info: QAPI schema source file information.
@@ -486,9 +486,9 @@ def check_enum(expr: _JSONObject, info: QAPISourceInfo) -> 
None:
 check_features(member.get('features'), info)
 
 
-def check_struct(expr: _JSONObject, info: QAPISourceInfo) -> None:
+def check_struct(expr: TopLevelExpr, info: QAPISourceInfo) -> None:
 """
-Normalize and validate this expression as a ``struct`` definition.
+Normalize and validate this `TopLevelExpr` as a ``struct`` definition.
 
 :param expr: The expression to validate.
 :param info: QAPI schema source file information.
@@ -503,9 +503,9 @@ def check_struct(expr: _JSONObject, info: QAPISourceInfo) 
-> None:
 check_type(expr.get('base'), info, "'base'")
 
 
-def check_union(expr: _JSONObject, info: QAPISourceInfo) -> None:
+def check_union(expr: TopLevelExpr, info: QAPISourceInfo) -> None:
 """
-Normalize and validate this expression as a ``union`` definition.
+Normalize and validate this `TopLevelExpr` as a ``union`` definition.
 
 :param expr: The expression to validate.
 :param info: QAPI schema source file information.
@@ -531,9 +531,9 @@ def check_union(expr: _JSONObject, info: QAPISourceInfo) -> 
None:
 check_type(value['type'], info, source, allow_array=not base)
 
 
-def check_alternate(expr: _JSONObject, info: QAPISourceInfo) -> None:
+def check_alternate(expr: TopLevelExpr, info: QAPISourceInfo) -> None:
 """
-Normalize and validate this expression as an ``alternate`` definition.
+Normalize and validate this `TopLevelExpr` as an ``alternate`` definition.
 
 :param expr: The expression to validate.
 :param info: QAPI schema source file information.
@@ -557,9 +557,9 @@ def check_alternate(expr: _JSONObject, info: 
QAPISourceInfo) -> None:
 check_type(value['type'], info, source, allow_array=True)
 
 
-def check_command(expr: _JSONObject, info: QAPISourceInfo) -> None:
+def check_command(expr: TopLevelExpr, info: QAPISourceInfo) -> None:
 """
-Normalize and validate this expression as a ``command`` definition.
+Normalize and validate this `TopLevelExpr` as a ``command`` definition.
 
 :param expr: The expression to validate.
 :param info: QAPI schema source file information.
@@ -577,9 +577,9 @@ def check_command(expr: _JSONObject, info: QAPISourceInfo) 
-> None:
 check_type(rets, info, "'returns'", allow_array=True)
 
 
-def check_event(expr: _JSONObject, info: QAPISourceInfo) -> None:
+def check_event(expr: TopLevelExpr, info: QAPISourceInfo) -> None:
 """
-Normalize and validate this expression as an ``event`` definition.
+Normalize and validate this `TopLevelExpr` as an ``event`` definition.
 
 :param expr: The expression to validate.
 :param info: QAPI schema source file information.
-- 
2.39.0




[PATCH v2 0/7] qapi: static typing conversion, pt5c

2023-02-07 Thread John Snow
This is part five (c), and focuses on sharing strict types between
parser.py and expr.py.

gitlab: https://gitlab.com/jsnow/qemu/-/commits/python-qapi-cleanup-pt5c

Every commit should pass with:
 - `isort -c qapi/`
 - `flake8 qapi/`
 - `pylint --rcfile=qapi/pylintrc qapi/`
 - `mypy --config-file=qapi/mypy.ini qapi/`

Some notes on this series:

Patches 2 and 3 are almost entirely superseded by patch 5, but I wasn't
as confident that Markus would like patch 5, so these patches aren't
squashed quite as tightly as they could be -- I recommend peeking ahead
at the cover letters before reviewing the actual patch diffs.

By the end of this series, the only JSON-y types we have left are:

(A) QAPIExpression,
(B) JSONValue,
(C) _ExprValue.

The argument I'm making here is that QAPIExpression and JSONValue are
distinct enough to warrant having both types (for now, at least); and
that _ExprValue is specialized enough to also warrant its inclusion.

(Brutal honesty: my attempts at unifying this even further had even more
hacks and unsatisfying conclusions, and fully unifying these types
should probably wait until we're allowed to rely on some fairly modern
Python versions.)

John Snow (7):
  qapi/expr: Split check_expr out from check_exprs
  qapi/parser.py: add ParsedExpression type
  qapi/expr: Use TopLevelExpr where appropriate
  qapi/expr: add typing workaround for AbstractSet
  qapi/parser: [RFC] add QAPIExpression
  qapi: remove _JSONObject
  qapi: remove JSON value FIXME

 scripts/qapi/expr.py   | 282 +++--
 scripts/qapi/parser.py |  51 +---
 scripts/qapi/schema.py | 105 +++
 3 files changed, 218 insertions(+), 220 deletions(-)

-- 
2.39.0





[PATCH v2 2/7] qapi/parser.py: add ParsedExpression type

2023-02-07 Thread John Snow
This is an immutable, named, typed tuple. It's arguably nicer than
arbitrary dicts for passing data around when using strict typing.

This patch turns parser.exprs into a list of ParsedExpressions instead,
and adjusts expr.py to match.

This allows the types we specify in parser.py to be "remembered" all the
way through expr.py and into schema.py. Several assertions around
packing and unpacking this data can be removed as a result.

Signed-off-by: John Snow 
---
 scripts/qapi/expr.py   | 29 +
 scripts/qapi/parser.py | 29 ++---
 scripts/qapi/schema.py |  6 +++---
 3 files changed, 30 insertions(+), 34 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index d01543006d8..293f830fe9d 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -44,7 +44,7 @@
 
 from .common import c_name
 from .error import QAPISemError
-from .parser import QAPIDoc
+from .parser import ParsedExpression
 from .source import QAPISourceInfo
 
 
@@ -595,29 +595,17 @@ def check_event(expr: _JSONObject, info: QAPISourceInfo) 
-> None:
 check_type(args, info, "'data'", allow_dict=not boxed)
 
 
-def check_expr(expr_elem: _JSONObject) -> None:
+def check_expr(pexpr: ParsedExpression) -> None:
 """
-Validate and normalize a parsed QAPI schema expression.
+Validate and normalize a `ParsedExpression`.
 
-:param expr_elem: The parsed expression to normalize and validate.
+:param pexpr: The parsed expression to normalize and validate.
 
 :raise QAPISemError: When this expression fails validation.
-:return: None, ``expr`` is normalized in-place as needed.
+:return: None, ``pexpr`` is normalized in-place as needed.
 """
-# Expression
-assert isinstance(expr_elem['expr'], dict)
-for key in expr_elem['expr'].keys():
-assert isinstance(key, str)
-expr: _JSONObject = expr_elem['expr']
-
-# QAPISourceInfo
-assert isinstance(expr_elem['info'], QAPISourceInfo)
-info: QAPISourceInfo = expr_elem['info']
-
-# Optional[QAPIDoc]
-tmp = expr_elem.get('doc')
-assert tmp is None or isinstance(tmp, QAPIDoc)
-doc: Optional[QAPIDoc] = tmp
+expr = pexpr.expr
+info = pexpr.info
 
 if 'include' in expr:
 return
@@ -637,6 +625,7 @@ def check_expr(expr_elem: _JSONObject) -> None:
 info.set_defn(meta, name)
 check_defn_name_str(name, info, meta)
 
+doc = pexpr.doc
 if doc:
 if doc.symbol != name:
 raise QAPISemError(
@@ -688,7 +677,7 @@ def check_expr(expr_elem: _JSONObject) -> None:
 check_flags(expr, info)
 
 
-def check_exprs(exprs: List[_JSONObject]) -> List[_JSONObject]:
+def check_exprs(exprs: List[ParsedExpression]) -> List[ParsedExpression]:
 """
 Validate and normalize a list of parsed QAPI schema expressions.
 
diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index 1b006cdc133..f897dffbfd4 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -21,6 +21,7 @@
 TYPE_CHECKING,
 Dict,
 List,
+NamedTuple,
 Optional,
 Set,
 Union,
@@ -48,6 +49,12 @@
 # several modules.
 
 
+class ParsedExpression(NamedTuple):
+expr: TopLevelExpr
+info: QAPISourceInfo
+doc: Optional['QAPIDoc']
+
+
 class QAPIParseError(QAPISourceError):
 """Error class for all QAPI schema parsing errors."""
 def __init__(self, parser: 'QAPISchemaParser', msg: str):
@@ -100,7 +107,7 @@ def __init__(self,
 self.line_pos = 0
 
 # Parser output:
-self.exprs: List[Dict[str, object]] = []
+self.exprs: List[ParsedExpression] = []
 self.docs: List[QAPIDoc] = []
 
 # Showtime!
@@ -147,8 +154,7 @@ def _parse(self) -> None:
"value of 'include' must be a string")
 incl_fname = os.path.join(os.path.dirname(self._fname),
   include)
-self.exprs.append({'expr': {'include': incl_fname},
-   'info': info})
+self._add_expr(OrderedDict({'include': incl_fname}), info)
 exprs_include = self._include(include, info, incl_fname,
   self._included)
 if exprs_include:
@@ -165,17 +171,18 @@ def _parse(self) -> None:
 for name, value in pragma.items():
 self._pragma(name, value, info)
 else:
-expr_elem = {'expr': expr,
- 'info': info}
-if cur_doc:
-if not cur_doc.symbol:
-raise QAPISemError(
-cur_doc.info, "definition documentation required")
-expr_elem['doc'] = cur_doc
-self.exprs.append(expr_elem)
+if cur_doc and not cur_doc.symbol:
+raise QAPISemError(
+cur_doc.info, 

[PATCH v2 7/7] qapi: remove JSON value FIXME

2023-02-07 Thread John Snow
With the two major JSON-ish type hierarchies clarified for distinct
purposes; QAPIExpression for parsed expressions and JSONValue for
introspection data, remove this FIXME as no longer an action item.

In theory, it may be possible to define a completely agnostic
one-size-fits-all JSON type hierarchy that any other user could borrow -
in practice, it's tough to wrangle the differences between invariant,
covariant and contravariant types: input and output parameters demand
different properties of such a structure. As such, it's simply more
trouble than it's worth.

So, declare this "done enough for now".

Signed-off-by: John Snow 
---
 scripts/qapi/parser.py | 4 
 1 file changed, 4 deletions(-)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index 315660e8671..556092f37b1 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -42,10 +42,6 @@
 _ExprValue = Union[List[object], Dict[str, object], str, bool]
 
 
-# FIXME: Consolidate and centralize definitions for _ExprValue and
-# JSONValue; currently scattered across several modules.
-
-
 # 3.6 workaround: can be removed when Python 3.7+ is our required version.
 if TYPE_CHECKING:
 _UserDict = UserDict[str, object]
-- 
2.39.0




[PATCH] xen/pt: fix igd passthrough for pc machine with xen accelerator

2023-02-07 Thread Chuck Zmudzinski
Commit 998250e97661 ("xen, gfx passthrough: register host bridge specific
to passthrough") uses the igd-passthrough-i440FX pci host device with
the xenfv machine type and igd-passthru=on, but using it for the pc
machine type, xen accelerator, and igd-passtru=on was omitted from that
commit.

The igd-passthru-i440FX pci host device is also needed for guests
configured with the pc machine type, the xen accelerator, and
igd-passthru=on. Specifically, tests show that not using the igd-specific
pci host device with the Intel igd passed through to the guest results
in slower startup performance and reduced resolution of the display
during startup. This patch fixes this issue.

To simplify the logic that is needed to support both the --enable-xen
and the --disable-xen configure options, introduce the boolean symbol
pc_xen_igd_gfx_pt_enabled() whose value is set appropriately in the
sysemu/xen.h header file as the test to determine whether or not
to use the igd-passthrough-i440FX pci host device instead of the
normal i440FX pci host device.

Fixes: 998250e97661 ("xen, gfx passthrough: register host bridge specific to 
passthrough")
Signed-off-by: Chuck Zmudzinski 
---
This patch is intended to replace or complement a recently proposed
patch that modifies slot_reserved_mask for the xenfv machine with
igd-passthru=on in order to fix the problem of Qemu not reserving slot 2
for the Intel IGD for the xenfv machine type. This patch provides a
simple way to improve Qemu support for the Intel IGD with the xen
accelerator without needing to change how slot_reserved_mask functions.

For reference, the latest version of the patch to fix the xenfv machine
using slot_reserved_mask is at:

https://lore.kernel.org/qemu-devel/b1b4a21fe9a600b1322742dda55a40e9961daa57.1674346505.git.brchu...@aol.com/

Reason for introducing the new boolean pc_xen_igd_gfx_pt_enabled():

It is also possible to use xen_igd_gfx_pt_enabled() directly to check
if the igd-passthru-i440FX pci host device is needed in this patch,
but in that case it would be necessary to implement it in
accel/stubs/xen-stub.c like this:

bool xen_igd_gfx_pt_enabled(void)
{
return false;
}

to cover the case when Qemu is configured with --disable-xen. I thought
it was simpler to introduce the same boolean prefixed with pc_ and
set it to 0 when --disable-xen is the configure option, and that explains
why the proposed patch introduces pc_xen_igd_gfx_pt_enabled() instead of
using xen_igd_gfx_pt_enabled() directly.

Another reason to use pc_xen_igd_gfx_pt_enabled() is to distinguish it
from xen_igd_gfx_pt_enabled() in hw/i386/pc_piix.c, because the use of
xen_igd_gfx_pt_enabled() is guarded by CONFIG_XEN but this patch needs
to place the boolean in a position that is not guarded by CONFIG_XEN.
This approach will simplify any future effort to move the code in
pc_piix.c that is not guarded by CONFIG_XEN to a xen-specific file.

 hw/i386/pc_piix.c| 2 ++
 include/sysemu/xen.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index df64dd8dcc..fd5b9ae1eb 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -433,6 +433,8 @@ static void pc_xen_hvm_init(MachineState *machine)
 compat(machine); \
 } \
 pc_init1(machine, TYPE_I440FX_PCI_HOST_BRIDGE, \
+ pc_xen_igd_gfx_pt_enabled() ? \
+ TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE : \
  TYPE_I440FX_PCI_DEVICE); \
 } \
 DEFINE_PC_MACHINE(suffix, name, pc_init_##suffix, optionfn)
diff --git a/include/sysemu/xen.h b/include/sysemu/xen.h
index 0ca25697e4..99ae41e619 100644
--- a/include/sysemu/xen.h
+++ b/include/sysemu/xen.h
@@ -23,6 +23,7 @@
 extern bool xen_allowed;
 
 #define xen_enabled()   (xen_allowed)
+#define pc_xen_igd_gfx_pt_enabled()xen_igd_gfx_pt_enabled()
 
 #ifndef CONFIG_USER_ONLY
 void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length);
@@ -33,6 +34,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size,
 #else /* !CONFIG_XEN_IS_POSSIBLE */
 
 #define xen_enabled() 0
+#define pc_xen_igd_gfx_pt_enabled() 0
 #ifndef CONFIG_USER_ONLY
 static inline void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length)
 {
-- 
2.39.1




Re: [PATCH] target/riscv: Remove .min_priv_ver restriction from RVV CSRs

2023-02-07 Thread LIU Zhiwei



On 2023/2/7 16:43, frank.ch...@sifive.com wrote:

From: Frank Chang 

The RVV specification does not require that the core needs to support
the privileged specification v1.12.0 to support RVV, and there is no
dependency from ISA level. This commit removes the restriction.

Signed-off-by: Frank Chang 
---
  target/riscv/csr.c | 21 +++--
  1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index fa17d7770c4..1b0a0c1693c 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3980,20 +3980,13 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
  [CSR_FRM]  = { "frm",  fs, read_frm, write_frm},
  [CSR_FCSR] = { "fcsr", fs, read_fcsr,write_fcsr   },
  /* Vector CSRs */
-[CSR_VSTART]   = { "vstart",   vs, read_vstart,  write_vstart,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VXSAT]= { "vxsat",vs, read_vxsat,   write_vxsat,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VXRM] = { "vxrm", vs, read_vxrm,write_vxrm,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VCSR] = { "vcsr", vs, read_vcsr,write_vcsr,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VL]   = { "vl",   vs, read_vl,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VTYPE]= { "vtype",vs, read_vtype,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_VLENB]= { "vlenb",vs, read_vlenb,
-   .min_priv_ver = PRIV_VERSION_1_12_0},
+[CSR_VSTART]   = { "vstart",   vs, read_vstart,  write_vstart },
+[CSR_VXSAT]= { "vxsat",vs, read_vxsat,   write_vxsat  },
+[CSR_VXRM] = { "vxrm", vs, read_vxrm,write_vxrm   },
+[CSR_VCSR] = { "vcsr", vs, read_vcsr,write_vcsr   },
+[CSR_VL]   = { "vl",   vs, read_vl},
+[CSR_VTYPE]= { "vtype",vs, read_vtype },
+[CSR_VLENB]= { "vlenb",vs, read_vlenb },
  /* User Timers and Counters */
  [CSR_CYCLE]= { "cycle",ctr,read_hpmcounter  },
  [CSR_INSTRET]  = { "instret",  ctr,read_hpmcounter  },


Reviewed-by: LIU Zhiwei 

Zhiwei




Re: [PATCH 04/10] hw/riscv/virt: virt-acpi-build.c: Add basic ACPI tables

2023-02-07 Thread Bin Meng
On Wed, Feb 8, 2023 at 2:15 AM Sunil V L  wrote:
>
> On Wed, Feb 08, 2023 at 12:10:55AM +0800, Bin Meng wrote:
> > On Mon, Feb 6, 2023 at 9:24 PM Sunil V L  wrote:
> > >
> > > On Mon, Feb 06, 2023 at 06:17:50PM +0800, Bin Meng wrote:
> > > > On Thu, Feb 2, 2023 at 12:54 PM Sunil V L  
> > > > wrote:
> > > > >
> > > > > Add few basic ACPI tables and DSDT with few devices in a
> > > > > new file virt-acpi-build.c.
> > > > >
> > > > > These are mostly leveraged from arm64.
> > > >
> > > > There are lots of same ACPI codes existing in x86/arm/riscv. I believe
> > > > some refactoring work is needed before ACPI support fully lands on
> > > > RISC-V.
> > > > For example, we can extract the common part among x86/arm/riscv into a
> > > > separate file, like hw/acpi/acpi-build.c?
> > > >
> > >
> > > While it appears like there is same code in multiple places, those
> > > functions take architecture specific MachineState parameter. Some tables
> > > like MADT though with same name, will have different contents for
> > > different architectures.
> > >
> > > Only one function which Daniel also pointed is the wrapper
> > > acpi_align_size() which can be made common. I am not
> > > sure whether it is worth of refactoring.
> > >
> >
> > It's more than that. For example,
> >
> > With acpi_dsdt_add_cpus(), the only meaningful parameter is the number
> > of cpus, so there is no need to pass the architecture specific
> > MachineState as the parameter.
> >
> I would not make this function common. The reason is, an architecture may
> choose to attach different ACPI objects under the CPU node.
>

Is it possible to insert architect specific CPU nodes after this
common API builds the basic CPU node in DSDT?

> > Likewise, acpi_dsdt_add_fw_cfg() can be made generic as well.
> >
> The issue is, these things are not exactly common across all architectures.
> x86 has bit different data in these objects. While today it appears they
> are same for riscv and arm, in future things may change for an architecture.
> It doesn't look like it is a standard practice to build files under
> hw/acpi for specific architectures. Hence, IMO, it is better to keep these
> things in architecture specific folders to allow them to do differently in
> future.
>

It looks like hw/acpi/ghes.c is for Arm from MAINTAINERS.

> But I look forward for the feedback from other architecture maintainers on
> this topic. My experience in qemu is very limited. So, I need help from
> experts.

+ Michael and Igor

Regards,
Bin



Re: [PATCH v2 07/10] hw/ide/piix: Require an ISABus only for user-created instances

2023-02-07 Thread BALATON Zoltan

On Wed, 8 Feb 2023, Bernhard Beschow wrote:

Am 7. Februar 2023 20:52:02 UTC schrieb Mark Cave-Ayland 
:

On 06/02/2023 23:40, Bernhard Beschow wrote:

Am 5. Februar 2023 22:32:03 UTC schrieb Mark Cave-Ayland 
:

On 05/02/2023 22:21, BALATON Zoltan wrote:


On Sun, 5 Feb 2023, Mark Cave-Ayland wrote:

On 26/01/2023 21:17, Bernhard Beschow wrote:

Internal instances now defer interrupt wiring to the caller which
decouples them from the ISABus. User-created devices still fish out the
ISABus from the QOM tree and the interrupt wiring remains in PIIX IDE.
The latter mechanism is considered a workaround and intended to be
removed once a deprecation period for user-created PIIX IDE devices is
over.

Signed-off-by: Bernhard Beschow 
---
   include/hw/ide/pci.h |  1 +
   hw/ide/piix.c    | 64 ++--
   hw/isa/piix.c    |  5 
   3 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 24c0b7a2dd..ee2c8781b7 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -54,6 +54,7 @@ struct PCIIDEState {
   MemoryRegion bmdma_bar;
   MemoryRegion cmd_bar[2];
   MemoryRegion data_bar[2];
+    bool user_created;
   };
     static inline IDEState *bmdma_active_if(BMDMAState *bmdma)
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 5980045db0..f0d95761ac 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -108,6 +108,13 @@ static void bmdma_setup_bar(PCIIDEState *d)
   }
   }
   +static void piix_ide_set_irq(void *opaque, int n, int level)
+{
+    PCIIDEState *d = opaque;
+
+    qemu_set_irq(d->isa_irqs[n], level);
+}
+
   static void piix_ide_reset(DeviceState *dev)
   {
   PCIIDEState *d = PCI_IDE(dev);
@@ -138,11 +145,18 @@ static void pci_piix_init_ports(PCIIDEState *d, ISABus 
*isa_bus)
   };
   int i;
   +    if (isa_bus) {
+    d->isa_irqs[0] = isa_bus->irqs[port_info[0].isairq];
+    d->isa_irqs[1] = isa_bus->irqs[port_info[1].isairq];
+    } else {
+    qdev_init_gpio_out(DEVICE(d), d->isa_irqs, 2);
+    }
+
   for (i = 0; i < 2; i++) {
   ide_bus_init(>bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
   ide_init_ioport(>bus[i], NULL, port_info[i].iobase,
   port_info[i].iobase2);
-    ide_init2(>bus[i], isa_bus->irqs[port_info[i].isairq]);
+    ide_init2(>bus[i], qdev_get_gpio_in(DEVICE(d), i));
     bmdma_init(>bus[i], >bmdma[i], d);
   d->bmdma[i].bus = >bus[i];
@@ -154,8 +168,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
   {
   PCIIDEState *d = PCI_IDE(dev);
   uint8_t *pci_conf = dev->config;
-    ISABus *isa_bus;
-    bool ambiguous;
+    ISABus *isa_bus = NULL;
     pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
   @@ -164,22 +177,36 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
     vmstate_register(VMSTATE_IF(dev), 0, _ide_pci, d);
   -    isa_bus = ISA_BUS(object_resolve_path_type("", TYPE_ISA_BUS, 
));
-    if (ambiguous) {
-    error_setg(errp,
-   "More than one ISA bus found while %s supports only one",
-   object_get_typename(OBJECT(dev)));
-    return;
-    }
-    if (!isa_bus) {
-    error_setg(errp, "No ISA bus found while %s requires one",
-   object_get_typename(OBJECT(dev)));
-    return;
+    if (d->user_created) {
+    bool ambiguous;
+
+    isa_bus = ISA_BUS(object_resolve_path_type("", TYPE_ISA_BUS,
+   ));
+
+    if (ambiguous) {
+    error_setg(errp,
+   "More than one ISA bus found while %s supports only 
one",
+   object_get_typename(OBJECT(dev)));
+    return;
+    }
+
+    if (!isa_bus) {
+    error_setg(errp, "No ISA bus found while %s requires one",
+   object_get_typename(OBJECT(dev)));
+    return;
+    }
   }
     pci_piix_init_ports(d, isa_bus);
   }
   +static void pci_piix_ide_init(Object *obj)
+{
+    DeviceState *dev = DEVICE(obj);
+
+    qdev_init_gpio_in(dev, piix_ide_set_irq, 2);
+}
+
   static void pci_piix_ide_exitfn(PCIDevice *dev)
   {
   PCIIDEState *d = PCI_IDE(dev);
@@ -191,6 +218,11 @@ static void pci_piix_ide_exitfn(PCIDevice *dev)
   }
   }
   +static Property piix_ide_properties[] = {
+    DEFINE_PROP_BOOL("user-created", PCIIDEState, user_created, true),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
   /* NOTE: for the PIIX3, the IRQs and IOports are hardcoded */
   static void piix3_ide_class_init(ObjectClass *klass, void *data)
   {
@@ -205,11 +237,13 @@ static void piix3_ide_class_init(ObjectClass *klass, void 
*data)
   k->class_id = PCI_CLASS_STORAGE_IDE;
   set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
   dc->hotpluggable = false;
+    device_class_set_props(dc, piix_ide_properties);
   }
     static const TypeInfo piix3_ide_info 

Re: [PATCH v2 07/10] hw/ide/piix: Require an ISABus only for user-created instances

2023-02-07 Thread Bernhard Beschow



Am 7. Februar 2023 20:52:02 UTC schrieb Mark Cave-Ayland 
:
>On 06/02/2023 23:40, Bernhard Beschow wrote:
>
>> Am 5. Februar 2023 22:32:03 UTC schrieb Mark Cave-Ayland 
>> :
>>> On 05/02/2023 22:21, BALATON Zoltan wrote:
>>> 
 On Sun, 5 Feb 2023, Mark Cave-Ayland wrote:
> On 26/01/2023 21:17, Bernhard Beschow wrote:
>> Internal instances now defer interrupt wiring to the caller which
>> decouples them from the ISABus. User-created devices still fish out the
>> ISABus from the QOM tree and the interrupt wiring remains in PIIX IDE.
>> The latter mechanism is considered a workaround and intended to be
>> removed once a deprecation period for user-created PIIX IDE devices is
>> over.
>> 
>> Signed-off-by: Bernhard Beschow 
>> ---
>>    include/hw/ide/pci.h |  1 +
>>    hw/ide/piix.c    | 64 ++--
>>    hw/isa/piix.c    |  5 
>>    3 files changed, 56 insertions(+), 14 deletions(-)
>> 
>> diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
>> index 24c0b7a2dd..ee2c8781b7 100644
>> --- a/include/hw/ide/pci.h
>> +++ b/include/hw/ide/pci.h
>> @@ -54,6 +54,7 @@ struct PCIIDEState {
>>    MemoryRegion bmdma_bar;
>>    MemoryRegion cmd_bar[2];
>>    MemoryRegion data_bar[2];
>> +    bool user_created;
>>    };
>>      static inline IDEState *bmdma_active_if(BMDMAState *bmdma)
>> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
>> index 5980045db0..f0d95761ac 100644
>> --- a/hw/ide/piix.c
>> +++ b/hw/ide/piix.c
>> @@ -108,6 +108,13 @@ static void bmdma_setup_bar(PCIIDEState *d)
>>    }
>>    }
>>    +static void piix_ide_set_irq(void *opaque, int n, int level)
>> +{
>> +    PCIIDEState *d = opaque;
>> +
>> +    qemu_set_irq(d->isa_irqs[n], level);
>> +}
>> +
>>    static void piix_ide_reset(DeviceState *dev)
>>    {
>>    PCIIDEState *d = PCI_IDE(dev);
>> @@ -138,11 +145,18 @@ static void pci_piix_init_ports(PCIIDEState *d, 
>> ISABus *isa_bus)
>>    };
>>    int i;
>>    +    if (isa_bus) {
>> +    d->isa_irqs[0] = isa_bus->irqs[port_info[0].isairq];
>> +    d->isa_irqs[1] = isa_bus->irqs[port_info[1].isairq];
>> +    } else {
>> +    qdev_init_gpio_out(DEVICE(d), d->isa_irqs, 2);
>> +    }
>> +
>>    for (i = 0; i < 2; i++) {
>>    ide_bus_init(>bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
>>    ide_init_ioport(>bus[i], NULL, port_info[i].iobase,
>>    port_info[i].iobase2);
>> -    ide_init2(>bus[i], isa_bus->irqs[port_info[i].isairq]);
>> +    ide_init2(>bus[i], qdev_get_gpio_in(DEVICE(d), i));
>>      bmdma_init(>bus[i], >bmdma[i], d);
>>    d->bmdma[i].bus = >bus[i];
>> @@ -154,8 +168,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, 
>> Error **errp)
>>    {
>>    PCIIDEState *d = PCI_IDE(dev);
>>    uint8_t *pci_conf = dev->config;
>> -    ISABus *isa_bus;
>> -    bool ambiguous;
>> +    ISABus *isa_bus = NULL;
>>      pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
>>    @@ -164,22 +177,36 @@ static void pci_piix_ide_realize(PCIDevice 
>> *dev, Error **errp)
>>      vmstate_register(VMSTATE_IF(dev), 0, _ide_pci, d);
>>    -    isa_bus = ISA_BUS(object_resolve_path_type("", TYPE_ISA_BUS, 
>> ));
>> -    if (ambiguous) {
>> -    error_setg(errp,
>> -   "More than one ISA bus found while %s supports only 
>> one",
>> -   object_get_typename(OBJECT(dev)));
>> -    return;
>> -    }
>> -    if (!isa_bus) {
>> -    error_setg(errp, "No ISA bus found while %s requires one",
>> -   object_get_typename(OBJECT(dev)));
>> -    return;
>> +    if (d->user_created) {
>> +    bool ambiguous;
>> +
>> +    isa_bus = ISA_BUS(object_resolve_path_type("", TYPE_ISA_BUS,
>> +   ));
>> +
>> +    if (ambiguous) {
>> +    error_setg(errp,
>> +   "More than one ISA bus found while %s supports 
>> only one",
>> +   object_get_typename(OBJECT(dev)));
>> +    return;
>> +    }
>> +
>> +    if (!isa_bus) {
>> +    error_setg(errp, "No ISA bus found while %s requires one",
>> +   object_get_typename(OBJECT(dev)));
>> +    return;
>> +    }
>>    }
>>      pci_piix_init_ports(d, isa_bus);
>>    }
>>    +static void pci_piix_ide_init(Object *obj)
>> +{
>> +    DeviceState *dev = DEVICE(obj);
>> +
>> +    qdev_init_gpio_in(dev, piix_ide_set_irq, 2);
>> +}
>> +
>>   

Re: [PATCH 0/7] hw/ide: Untangle ISA/PCI abuses of ide_init_ioport()

2023-02-07 Thread Philippe Mathieu-Daudé

On 8/2/23 01:07, Philippe Mathieu-Daudé wrote:

Background thread:
https://lore.kernel.org/qemu-devel/5095dffc-309b-6c72-d255-8cdaa6fd3...@ilande.co.uk/

The ide_init_ioport() method expect an ISA device, but is
massaged to accept NULL device (IOW, non-ISA devices...).

A plausible explanation is QOM objects can only inherit one
parent, and ide_init_ioport() was first used with ISA children.
Later it was adapted to accept PCI children in a convoluted
way. The PIIX IDE function abuse it.

This series rename the current ide_init_ioport() as
ide_init_ioport_isa(), then add a generic ide_init_ioport()
which works with PCI devices.

This is required to proceed with more PIIX cleanups.

Philippe Mathieu-Daudé (7):
   hw/isa: Un-inline isa_bus_from_device()
   hw/isa: Use isa_address_space_io() to reduce access on global 'isabus'
   hw/ide: Rename ISA specific ide_init_ioport() as ide_init_ioport_isa()
   hw/ide: Introduce generic ide_init_ioport()
   hw/ide/piix: Use generic ide_init_ioport()
   hw/isa: Assert isa_register_portio_list() gets non-NULL ISA device
   hw/ide/piix: Remove dead code in pci_piix_init_ports()


Forgot:
Based-on: <20230207234615.77300-1-phi...@linaro.org>
"exec/ioport: Factor portio_list_register[flush_coalesced]() out"
https://lore.kernel.org/qemu-devel/20230207234615.77300-1-phi...@linaro.org/





[PATCH 6/7] hw/isa: Assert isa_register_portio_list() gets non-NULL ISA device

2023-02-07 Thread Philippe Mathieu-Daudé
The previous commit removed the single call to
isa_register_portio_list() with dev=NULL. To be
sure we won't reintroduce such weird (ab)use,
add an assertion.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/isa/isa-bus.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c
index 95fc1ba5f7..3d1996c115 100644
--- a/hw/isa/isa-bus.c
+++ b/hw/isa/isa-bus.c
@@ -107,7 +107,7 @@ IsaDma *isa_get_dma(ISABus *bus, int nchan)
 
 static inline void isa_init_ioport(ISADevice *dev, uint16_t ioport)
 {
-if (dev && (dev->ioport_id == 0 || ioport < dev->ioport_id)) {
+if (dev->ioport_id == 0 || ioport < dev->ioport_id) {
 dev->ioport_id = ioport;
 }
 }
@@ -123,6 +123,8 @@ int isa_register_portio_list(ISADevice *dev,
  const MemoryRegionPortio *pio_start,
  void *opaque, const char *name)
 {
+assert(dev);
+
 if (!isabus) {
 return -ENODEV;
 }
-- 
2.38.1




[PATCH 7/7] hw/ide/piix: Remove dead code in pci_piix_init_ports()

2023-02-07 Thread Philippe Mathieu-Daudé
pci_piix_init_ports() always return '0' so can't fail.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/ide/piix.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 1cd4389611..54d545ce3a 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -126,7 +126,7 @@ static void piix_ide_reset(DeviceState *dev)
 pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
 }
 
-static int pci_piix_init_ports(PCIIDEState *d)
+static void pci_piix_init_ports(PCIIDEState *d)
 {
 static const struct {
 int iobase;
@@ -149,15 +149,12 @@ static int pci_piix_init_ports(PCIIDEState *d)
 d->bmdma[i].bus = >bus[i];
 ide_register_restart_cb(>bus[i]);
 }
-
-return 0;
 }
 
 static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
 {
 PCIIDEState *d = PCI_IDE(dev);
 uint8_t *pci_conf = dev->config;
-int rc;
 
 pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
 
@@ -166,11 +163,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
 
 vmstate_register(VMSTATE_IF(dev), 0, _ide_pci, d);
 
-rc = pci_piix_init_ports(d);
-if (rc) {
-error_setg_errno(errp, -rc, "Failed to realize %s",
- object_get_typename(OBJECT(dev)));
-}
+pci_piix_init_ports(d);
 }
 
 static void pci_piix_ide_exitfn(PCIDevice *dev)
-- 
2.38.1




[PATCH 4/7] hw/ide: Introduce generic ide_init_ioport()

2023-02-07 Thread Philippe Mathieu-Daudé
Add ide_init_ioport() which is not restricted to the ISA bus.
(Next commit will use it for a PCI device).

Inspired-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/ide/ioport.c   | 11 +--
 include/hw/ide/internal.h |  2 ++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/ide/ioport.c b/hw/ide/ioport.c
index ac804a89e8..494fa88368 100644
--- a/hw/ide/ioport.c
+++ b/hw/ide/ioport.c
@@ -54,8 +54,6 @@ int ide_init_ioport_isa(IDEBus *bus, ISADevice *dev, int 
iobase, int iobase2)
 {
 int ret;
 
-/* ??? Assume only ISA and PCI configurations, and that the PCI-ISA
-   bridge has been setup properly to always register with ISA.  */
 ret = isa_register_portio_list(dev, >portio_list,
iobase, ide_portio_list, bus, "ide");
 
@@ -66,3 +64,12 @@ int ide_init_ioport_isa(IDEBus *bus, ISADevice *dev, int 
iobase, int iobase2)
 
 return ret;
 }
+
+void ide_init_ioport(IDEBus *bus, Object *owner, MemoryRegion *io,
+ int iobase, int iobase2)
+{
+portio_list_register(>portio_list, owner, ide_portio_list,
+ bus, "ide", io, iobase);
+portio_list_register(>portio2_list, owner, ide_portio2_list,
+ bus, "ide", io, iobase2);
+}
diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
index 88a096f9df..79db902505 100644
--- a/include/hw/ide/internal.h
+++ b/include/hw/ide/internal.h
@@ -629,6 +629,8 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, 
IDEDriveKind kind,
 void ide_init2(IDEBus *bus, qemu_irq irq);
 void ide_exit(IDEState *s);
 int ide_init_ioport_isa(IDEBus *bus, ISADevice *isa, int iobase, int iobase2);
+void ide_init_ioport(IDEBus *bus, Object *owner, MemoryRegion *io,
+ int iobase, int iobase2);
 void ide_register_restart_cb(IDEBus *bus);
 
 void ide_exec_cmd(IDEBus *bus, uint32_t val);
-- 
2.38.1




[PATCH 5/7] hw/ide/piix: Use generic ide_init_ioport()

2023-02-07 Thread Philippe Mathieu-Daudé
TYPE_PIIX3_IDE is a PCI function inheriting from QOM
TYPE_PCI_DEVICE. To be able to call the ISA specific
ide_init_ioport_isa(), we call this function passing
a NULL ISADevice argument. Remove this hack by calling
the recently added generic ide_init_ioport(), which
doesn't expect any ISADevice.

Inspired-by: Bernhard Beschow 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/ide/piix.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index a587541bb2..1cd4389611 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -136,15 +136,13 @@ static int pci_piix_init_ports(PCIIDEState *d)
 {0x1f0, 0x3f6, 14},
 {0x170, 0x376, 15},
 };
-int i, ret;
+int i;
 
 for (i = 0; i < 2; i++) {
 ide_bus_init(>bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
-ret = ide_init_ioport_isa(>bus[i], NULL,
-  port_info[i].iobase, port_info[i].iobase2);
-if (ret) {
-return ret;
-}
+ide_init_ioport(>bus[i], OBJECT(d),
+pci_address_space_io(PCI_DEVICE(d)),
+port_info[i].iobase, port_info[i].iobase2);
 ide_init2(>bus[i], isa_get_irq(NULL, port_info[i].isairq));
 
 bmdma_init(>bus[i], >bmdma[i], d);
-- 
2.38.1




[PATCH 0/7] hw/ide: Untangle ISA/PCI abuses of ide_init_ioport()

2023-02-07 Thread Philippe Mathieu-Daudé
Background thread:
https://lore.kernel.org/qemu-devel/5095dffc-309b-6c72-d255-8cdaa6fd3...@ilande.co.uk/

The ide_init_ioport() method expect an ISA device, but is
massaged to accept NULL device (IOW, non-ISA devices...).

A plausible explanation is QOM objects can only inherit one
parent, and ide_init_ioport() was first used with ISA children.
Later it was adapted to accept PCI children in a convoluted
way. The PIIX IDE function abuse it.

This series rename the current ide_init_ioport() as
ide_init_ioport_isa(), then add a generic ide_init_ioport()
which works with PCI devices.

This is required to proceed with more PIIX cleanups.

Philippe Mathieu-Daudé (7):
  hw/isa: Un-inline isa_bus_from_device()
  hw/isa: Use isa_address_space_io() to reduce access on global 'isabus'
  hw/ide: Rename ISA specific ide_init_ioport() as ide_init_ioport_isa()
  hw/ide: Introduce generic ide_init_ioport()
  hw/ide/piix: Use generic ide_init_ioport()
  hw/isa: Assert isa_register_portio_list() gets non-NULL ISA device
  hw/ide/piix: Remove dead code in pci_piix_init_ports()

 hw/ide/ioport.c   | 13 ++---
 hw/ide/isa.c  |  2 +-
 hw/ide/piix.c | 21 ++---
 hw/isa/isa-bus.c  | 13 ++---
 include/hw/ide/internal.h |  4 +++-
 include/hw/isa/isa.h  |  5 +
 6 files changed, 31 insertions(+), 27 deletions(-)

-- 
2.38.1




[PATCH 2/7] hw/isa: Use isa_address_space_io() to reduce access on global 'isabus'

2023-02-07 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/isa/isa-bus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c
index 5bd99379e9..95fc1ba5f7 100644
--- a/hw/isa/isa-bus.c
+++ b/hw/isa/isa-bus.c
@@ -114,7 +114,7 @@ static inline void isa_init_ioport(ISADevice *dev, uint16_t 
ioport)
 
 void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start)
 {
-memory_region_add_subregion(isabus->address_space_io, start, io);
+memory_region_add_subregion(isa_address_space_io(dev), start, io);
 isa_init_ioport(dev, start);
 }
 
@@ -133,7 +133,7 @@ int isa_register_portio_list(ISADevice *dev,
 isa_init_ioport(dev, start);
 
 portio_list_register(piolist, OBJECT(dev), pio_start, opaque, name,
- isabus->address_space_io, start);
+ isa_address_space_io(dev), start);
 
 return 0;
 }
-- 
2.38.1




[PATCH 3/7] hw/ide: Rename ISA specific ide_init_ioport() as ide_init_ioport_isa()

2023-02-07 Thread Philippe Mathieu-Daudé
Rename ide_init_ioport() as ide_init_ioport_isa() to make
explicit it expects an ISA device.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/ide/ioport.c   | 2 +-
 hw/ide/isa.c  | 2 +-
 hw/ide/piix.c | 4 ++--
 include/hw/ide/internal.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/ide/ioport.c b/hw/ide/ioport.c
index e6caa537fa..ac804a89e8 100644
--- a/hw/ide/ioport.c
+++ b/hw/ide/ioport.c
@@ -50,7 +50,7 @@ static const MemoryRegionPortio ide_portio2_list[] = {
 PORTIO_END_OF_LIST(),
 };
 
-int ide_init_ioport(IDEBus *bus, ISADevice *dev, int iobase, int iobase2)
+int ide_init_ioport_isa(IDEBus *bus, ISADevice *dev, int iobase, int iobase2)
 {
 int ret;
 
diff --git a/hw/ide/isa.c b/hw/ide/isa.c
index 8bedbd13f1..4dbd1e48b8 100644
--- a/hw/ide/isa.c
+++ b/hw/ide/isa.c
@@ -74,7 +74,7 @@ static void isa_ide_realizefn(DeviceState *dev, Error **errp)
 ISAIDEState *s = ISA_IDE(dev);
 
 ide_bus_init(>bus, sizeof(s->bus), dev, 0, 2);
-ide_init_ioport(>bus, isadev, s->iobase, s->iobase2);
+ide_init_ioport_isa(>bus, isadev, s->iobase, s->iobase2);
 s->irq = isa_get_irq(isadev, s->isairq);
 ide_init2(>bus, s->irq);
 vmstate_register(VMSTATE_IF(dev), 0, _ide_isa, s);
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 267dbf37db..a587541bb2 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -140,8 +140,8 @@ static int pci_piix_init_ports(PCIIDEState *d)
 
 for (i = 0; i < 2; i++) {
 ide_bus_init(>bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
-ret = ide_init_ioport(>bus[i], NULL, port_info[i].iobase,
-  port_info[i].iobase2);
+ret = ide_init_ioport_isa(>bus[i], NULL,
+  port_info[i].iobase, port_info[i].iobase2);
 if (ret) {
 return ret;
 }
diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
index fc0aa81a88..88a096f9df 100644
--- a/include/hw/ide/internal.h
+++ b/include/hw/ide/internal.h
@@ -628,7 +628,7 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, 
IDEDriveKind kind,
int chs_trans, Error **errp);
 void ide_init2(IDEBus *bus, qemu_irq irq);
 void ide_exit(IDEState *s);
-int ide_init_ioport(IDEBus *bus, ISADevice *isa, int iobase, int iobase2);
+int ide_init_ioport_isa(IDEBus *bus, ISADevice *isa, int iobase, int iobase2);
 void ide_register_restart_cb(IDEBus *bus);
 
 void ide_exec_cmd(IDEBus *bus, uint32_t val);
-- 
2.38.1




[PATCH 1/7] hw/isa: Un-inline isa_bus_from_device()

2023-02-07 Thread Philippe Mathieu-Daudé
No point in inlining isa_bus_from_device() which is only
used at device realization time.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/isa/isa-bus.c | 5 +
 include/hw/isa/isa.h | 5 +
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c
index 4fe61d6dfe..5bd99379e9 100644
--- a/hw/isa/isa-bus.c
+++ b/hw/isa/isa-bus.c
@@ -162,6 +162,11 @@ bool isa_realize_and_unref(ISADevice *dev, ISABus *bus, 
Error **errp)
 return qdev_realize_and_unref(>parent_obj, >parent_obj, errp);
 }
 
+ISABus *isa_bus_from_device(ISADevice *dev)
+{
+return ISA_BUS(qdev_get_parent_bus(DEVICE(dev)));
+}
+
 ISADevice *isa_vga_init(ISABus *bus)
 {
 vga_interface_created = true;
diff --git a/include/hw/isa/isa.h b/include/hw/isa/isa.h
index 25acd5c34c..ad8bdd941f 100644
--- a/include/hw/isa/isa.h
+++ b/include/hw/isa/isa.h
@@ -123,9 +123,6 @@ int isa_register_portio_list(ISADevice *dev,
  const MemoryRegionPortio *portio,
  void *opaque, const char *name);
 
-static inline ISABus *isa_bus_from_device(ISADevice *d)
-{
-return ISA_BUS(qdev_get_parent_bus(DEVICE(d)));
-}
+ISABus *isa_bus_from_device(ISADevice *dev);
 
 #endif
-- 
2.38.1




Re: [PATCH v9 14/14] docs/devel: Align VFIO migration docs to v2 protocol

2023-02-07 Thread Alex Williamson
On Mon, 6 Feb 2023 14:31:37 +0200
Avihai Horon  wrote:

> Now that VFIO migration protocol v2 has been implemented and v1 protocol
> has been removed, update the documentation according to v2 protocol.
> 
> Signed-off-by: Avihai Horon 
> Reviewed-by: Cédric Le Goater 
> ---
>  docs/devel/vfio-migration.rst | 68 ---
>  1 file changed, 30 insertions(+), 38 deletions(-)

A note about the single device limitation should be added here.  Thanks,

Alex




[PATCH 1/2] exec/ioport: Factor portio_list_register_flush_coalesced() out

2023-02-07 Thread Philippe Mathieu-Daudé
We always follow the same pattern when registering
coalesced portio:

  - portio_list_init()
  - portio_list_set_flush_coalesced()
  - portio_list_add()

Factor these 3 operations in a single helper named
portio_list_register_flush_coalesced().

Drop portio_list_set_flush_coalesced() which is now
inlined.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/display/qxl.c  |  7 +++
 hw/display/vga.c  |  5 ++---
 include/exec/ioport.h |  5 -
 softmmu/ioport.c  | 27 ++-
 4 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index ec712d3ca2..2ecaa0643f 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2224,10 +2224,9 @@ static void qxl_realize_primary(PCIDevice *dev, Error 
**errp)
 }
 vga_init(vga, OBJECT(dev),
  pci_address_space(dev), pci_address_space_io(dev), false);
-portio_list_init(>vga_port_list, OBJECT(dev), qxl_vga_portio_list,
- vga, "vga");
-portio_list_set_flush_coalesced(>vga_port_list);
-portio_list_add(>vga_port_list, pci_address_space_io(dev), 0x3b0);
+portio_list_register_flush_coalesced(>vga_port_list, OBJECT(dev),
+ qxl_vga_portio_list, vga, "vga",
+ pci_address_space_io(dev), 0x3b0);
 qxl->have_vga = true;
 
 vga->con = graphic_console_init(DEVICE(dev), 0, _ops, qxl);
diff --git a/hw/display/vga.c b/hw/display/vga.c
index 7a5fdff649..98d644922e 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -2309,9 +2309,8 @@ void vga_init(VGACommonState *s, Object *obj, 
MemoryRegion *address_space,
 1);
 memory_region_set_coalescing(vga_io_memory);
 if (init_vga_ports) {
-portio_list_init(>vga_port_list, obj, vga_ports, s, "vga");
-portio_list_set_flush_coalesced(>vga_port_list);
-portio_list_add(>vga_port_list, address_space_io, 0x3b0);
+portio_list_register_flush_coalesced(>vga_port_list, obj, vga_ports,
+ s, "vga", address_space_io, 
0x3b0);
 }
 if (vbe_ports) {
 portio_list_init(>vbe_port_list, obj, vbe_ports, s, "vbe");
diff --git a/include/exec/ioport.h b/include/exec/ioport.h
index e34f668998..eb9882a3ee 100644
--- a/include/exec/ioport.h
+++ b/include/exec/ioport.h
@@ -65,7 +65,10 @@ typedef struct PortioList {
 void portio_list_init(PortioList *piolist, Object *owner,
   const struct MemoryRegionPortio *callbacks,
   void *opaque, const char *name);
-void portio_list_set_flush_coalesced(PortioList *piolist);
+void portio_list_register_flush_coalesced(PortioList *piolist, Object *owner,
+  const MemoryRegionPortio *callbacks,
+  void *opaque, const char *name,
+  MemoryRegion *mr, uint32_t offset);
 void portio_list_destroy(PortioList *piolist);
 void portio_list_add(PortioList *piolist,
  struct MemoryRegion *address_space,
diff --git a/softmmu/ioport.c b/softmmu/ioport.c
index cb8adb0b93..be0c920c5c 100644
--- a/softmmu/ioport.c
+++ b/softmmu/ioport.c
@@ -124,6 +124,7 @@ void portio_list_init(PortioList *piolist,
 ++n;
 }
 
+assert(owner);
 piolist->ports = callbacks;
 piolist->nr = 0;
 piolist->regions = g_new0(MemoryRegion *, n);
@@ -134,11 +135,6 @@ void portio_list_init(PortioList *piolist,
 piolist->flush_coalesced_mmio = false;
 }
 
-void portio_list_set_flush_coalesced(PortioList *piolist)
-{
-piolist->flush_coalesced_mmio = true;
-}
-
 void portio_list_destroy(PortioList *piolist)
 {
 MemoryRegionPortioList *mrpio;
@@ -297,3 +293,24 @@ void portio_list_del(PortioList *piolist)
 memory_region_del_subregion(piolist->address_space, >mr);
 }
 }
+
+static void do_portio_list_register(PortioList *piolist, Object *owner,
+const MemoryRegionPortio *callbacks,
+void *opaque, const char *name,
+MemoryRegion *mr, uint32_t offset,
+bool flush_coalesced_mmio)
+{
+assert(piolist && !piolist->owner);
+portio_list_init(piolist, owner, callbacks, opaque, name);
+piolist->flush_coalesced_mmio = flush_coalesced_mmio;
+portio_list_add(piolist, mr, offset);
+}
+
+void portio_list_register_flush_coalesced(PortioList *piolist, Object *owner,
+  const MemoryRegionPortio *callbacks,
+  void *opaque, const char *name,
+  MemoryRegion *mr, uint32_t offset)
+{
+do_portio_list_register(piolist, owner, callbacks,
+opaque, name, mr, offset, true);
+}
-- 
2.38.1




[PATCH 0/2] exec/ioport: Factor portio_list_register[flush_coalesced]() out

2023-02-07 Thread Philippe Mathieu-Daudé
Preliminary to further ISA API cleanups...

Convert this sequence:

- portio_list_init()
- portio_list_set_flush_coalesced()
- portio_list_add()

to portio_list_register_flush_coalesced(),
and the non-coalescing equivalent:

- portio_list_init()
- portio_list_add()

to portio_list_register().

Philippe Mathieu-Daudé (2):
  exec/ioport: Factor portio_list_register_flush_coalesced() out
  exec/ioport: Factor portio_list_register() out

 hw/audio/adlib.c|  4 ++--
 hw/display/qxl.c|  7 +++---
 hw/display/vga.c|  9 
 hw/dma/i82374.c |  7 +++---
 hw/isa/isa-bus.c|  6 ++
 hw/watchdog/wdt_ib700.c |  4 ++--
 include/exec/ioport.h   | 15 +++--
 softmmu/ioport.c| 48 ++---
 8 files changed, 60 insertions(+), 40 deletions(-)

-- 
2.38.1




[PATCH 2/2] exec/ioport: Factor portio_list_register() out

2023-02-07 Thread Philippe Mathieu-Daudé
We always follow the same pattern when registering
non-coalesced portio:

  - portio_list_init()
  - portio_list_add()

Factor these 2 operations in a single helper named
portio_list_register(). Since both calls become local
to ioport.c, reduce their scope by declaring them static.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/audio/adlib.c|  4 ++--
 hw/display/vga.c|  4 ++--
 hw/dma/i82374.c |  7 +++
 hw/isa/isa-bus.c|  6 ++
 hw/watchdog/wdt_ib700.c |  4 ++--
 include/exec/ioport.h   | 10 --
 softmmu/ioport.c| 21 ++---
 7 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/hw/audio/adlib.c b/hw/audio/adlib.c
index 5f979b1487..cc03c99306 100644
--- a/hw/audio/adlib.c
+++ b/hw/audio/adlib.c
@@ -291,8 +291,8 @@ static void adlib_realizefn (DeviceState *dev, Error **errp)
 
 adlib_portio_list[0].offset = s->port;
 adlib_portio_list[1].offset = s->port + 8;
-portio_list_init (>port_list, OBJECT(s), adlib_portio_list, s, "adlib");
-portio_list_add (>port_list, isa_address_space_io(>parent_obj), 0);
+portio_list_register(>port_list, OBJECT(s), adlib_portio_list, s,
+ "adlib", isa_address_space_io(>parent_obj), 0);
 }
 
 static Property adlib_properties[] = {
diff --git a/hw/display/vga.c b/hw/display/vga.c
index 98d644922e..aa899fddc3 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -2313,7 +2313,7 @@ void vga_init(VGACommonState *s, Object *obj, 
MemoryRegion *address_space,
  s, "vga", address_space_io, 
0x3b0);
 }
 if (vbe_ports) {
-portio_list_init(>vbe_port_list, obj, vbe_ports, s, "vbe");
-portio_list_add(>vbe_port_list, address_space_io, 0x1ce);
+portio_list_register(>vbe_port_list, obj, vbe_ports, s,
+ "vbe", address_space_io, 0x1ce);
 }
 }
diff --git a/hw/dma/i82374.c b/hw/dma/i82374.c
index 34c3aaf7d3..cb68eee192 100644
--- a/hw/dma/i82374.c
+++ b/hw/dma/i82374.c
@@ -131,10 +131,9 @@ static void i82374_realize(DeviceState *dev, Error **errp)
 }
 i8257_dma_init(isa_bus, true);
 
-portio_list_init(>port_list, OBJECT(s), i82374_portio_list, s,
- "i82374");
-portio_list_add(>port_list, isa_address_space_io(>parent_obj),
-s->iobase);
+portio_list_register(>port_list, OBJECT(s), i82374_portio_list, s,
+ "i82374", isa_address_space_io(>parent_obj),
+ s->iobase);
 
 memset(s->commands, 0, sizeof(s->commands));
 }
diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c
index f155b80010..4fe61d6dfe 100644
--- a/hw/isa/isa-bus.c
+++ b/hw/isa/isa-bus.c
@@ -123,8 +123,6 @@ int isa_register_portio_list(ISADevice *dev,
  const MemoryRegionPortio *pio_start,
  void *opaque, const char *name)
 {
-assert(piolist && !piolist->owner);
-
 if (!isabus) {
 return -ENODEV;
 }
@@ -134,8 +132,8 @@ int isa_register_portio_list(ISADevice *dev,
actually handled e.g. the FDC device.  */
 isa_init_ioport(dev, start);
 
-portio_list_init(piolist, OBJECT(dev), pio_start, opaque, name);
-portio_list_add(piolist, isabus->address_space_io, start);
+portio_list_register(piolist, OBJECT(dev), pio_start, opaque, name,
+ isabus->address_space_io, start);
 
 return 0;
 }
diff --git a/hw/watchdog/wdt_ib700.c b/hw/watchdog/wdt_ib700.c
index b116c3a3aa..ac4f0be7d8 100644
--- a/hw/watchdog/wdt_ib700.c
+++ b/hw/watchdog/wdt_ib700.c
@@ -115,8 +115,8 @@ static void wdt_ib700_realize(DeviceState *dev, Error 
**errp)
 
 s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ib700_timer_expired, s);
 
-portio_list_init(>port_list, OBJECT(s), wdt_portio_list, s, "ib700");
-portio_list_add(>port_list, isa_address_space_io(>parent_obj), 0);
+portio_list_register(>port_list, OBJECT(s), wdt_portio_list, s,
+ "ib700", isa_address_space_io(>parent_obj), 0);
 }
 
 static void wdt_ib700_reset(DeviceState *dev)
diff --git a/include/exec/ioport.h b/include/exec/ioport.h
index eb9882a3ee..ca44f269ea 100644
--- a/include/exec/ioport.h
+++ b/include/exec/ioport.h
@@ -62,17 +62,15 @@ typedef struct PortioList {
 bool flush_coalesced_mmio;
 } PortioList;
 
-void portio_list_init(PortioList *piolist, Object *owner,
-  const struct MemoryRegionPortio *callbacks,
-  void *opaque, const char *name);
+void portio_list_register(PortioList *piolist, Object *owner,
+  const MemoryRegionPortio *callbacks,
+  void *opaque, const char *name,
+  MemoryRegion *mr, uint32_t offset);
 void portio_list_register_flush_coalesced(PortioList *piolist, Object *owner,
   const MemoryRegionPortio *callbacks,
 

Re: [PATCH v9 10/14] vfio/migration: Implement VFIO migration protocol v2

2023-02-07 Thread Alex Williamson
On Mon, 6 Feb 2023 14:31:33 +0200
Avihai Horon  wrote:
> @@ -523,6 +745,41 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
>  return 0;
>  }
>  
> +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
> +{
> +VFIODevice *vbasedev = opaque;
> +enum vfio_device_mig_state recover_state;
> +int ret;
> +
> +/* We reach here with device state STOP only */
> +recover_state = VFIO_DEVICE_STATE_STOP;

Why do we need to put this in a local variable?

> +ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
> +   recover_state);
> +if (ret) {
> +return ret;
> +}
> +
> +do {
> +ret = vfio_save_block(f, vbasedev->migration);
> +if (ret < 0) {
> +return ret;
> +}
> +} while (!ret);
> +
> +qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> +ret = qemu_file_get_error(f);
> +if (ret) {
> +return ret;
> +}
> +
> +recover_state = VFIO_DEVICE_STATE_ERROR;

IIRC, the ERROR state is not reachable as a user directed state.  I
suppose passing it as the recovery state guarantees a device reset when
it fails, but if that's the intention it should be documented with a
comment to explain so (and vfio_migration_set_state() should not bother
trying to use it as a recovery state).

> +ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> +   recover_state);
> +trace_vfio_save_complete_precopy(vbasedev->name, ret);
> +
> +return ret;
> +}
> +
>  static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
>  {
>  VFIODevice *vbasedev = opaque;
...
> @@ -769,12 +1087,17 @@ static void vfio_migration_state_notifier(Notifier 
> *notifier, void *data)
>  case MIGRATION_STATUS_CANCELLED:
>  case MIGRATION_STATUS_FAILED:
>  bytes_transferred = 0;
> -ret = vfio_migration_v1_set_state(vbasedev,
> -  ~(VFIO_DEVICE_STATE_V1_SAVING |
> -VFIO_DEVICE_STATE_V1_RESUMING),
> -  VFIO_DEVICE_STATE_V1_RUNNING);
> -if (ret) {
> -error_report("%s: Failed to set state RUNNING", vbasedev->name);
> +if (migration->v2) {
> +vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
> + VFIO_DEVICE_STATE_ERROR);

Same here.  Thanks,

Alex

> +} else {
> +ret = vfio_migration_v1_set_state(vbasedev,
> +  ~(VFIO_DEVICE_STATE_V1_SAVING |
> +
> VFIO_DEVICE_STATE_V1_RESUMING),
> +  VFIO_DEVICE_STATE_V1_RUNNING);
> +if (ret) {
> +error_report("%s: Failed to set state RUNNING", 
> vbasedev->name);
> +}
>  }
>  }
>  }




Re: [PATCH 3/3] hw/isa/vt82c686: Implement ACPI powerdown

2023-02-07 Thread Bernhard Beschow



Am 31. Januar 2023 14:58:01 UTC schrieb BALATON Zoltan :
>On Sun, 29 Jan 2023, Bernhard Beschow wrote:
>> Signed-off-by: Bernhard Beschow 
>> ---
>> hw/isa/vt82c686.c | 15 +++
>> 1 file changed, 15 insertions(+)
>> 
>> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
>> index b0765d4ed8..2db54d1649 100644
>> --- a/hw/isa/vt82c686.c
>> +++ b/hw/isa/vt82c686.c
>> @@ -33,8 +33,10 @@
>> #include "qapi/error.h"
>> #include "qemu/log.h"
>> #include "qemu/module.h"
>> +#include "qemu/notify.h"
>> #include "qemu/range.h"
>> #include "qemu/timer.h"
>> +#include "sysemu/runstate.h"
>> #include "trace.h"
>> 
>> #define ACPI_ENABLE 0xf1
>> @@ -50,6 +52,8 @@ struct ViaPMState {
>> APMState apm;
>> PMSMBus smb;
>> 
>> +Notifier powerdown_notifier;
>> +
>> qemu_irq irq;
>
>Is there a reason to leave blank lines here? Do they separate any logical 
>blocks? If not please just drop them to allow mire lines to fit in one screen.

I could stick closer to piix4 here indeed.

>> };
>> 
>> @@ -198,6 +202,14 @@ static void via_pm_reset(DeviceState *d)
>> smb_io_space_update(s);
>> }
>> 
>> +static void via_pm_powerdown_req(Notifier *n, void *opaque)
>> +{
>> +ViaPMState *s = container_of(n, ViaPMState, powerdown_notifier);
>> +
>> +assert(s != NULL);
>
>Only piix4 seems to assert in this callback, all others assume this will work 
>and indeed you register it from the realize method of the same object with its 
>already type checked state struct so this should not need to check again so I 
>think asserting here is overcautiousness.

Yes, I took a lot of inspitation from piix4. Piix4 also sets up the notifier in 
its realize method, so the situation here and there should be the same. I'd 
therefore tend to keep the code consistent. Also, an assert does communicate 
intention.

>
>As said in the cover all these are just small things I came across, sorry I 
>can't give a better review of this.

Thanks still!

Best regards,
Bernhard

P.S.: Any news from the via-ac97 side? Mind rebasing onto 7.2, even if it's 
still work in progress?

>
>Regards,
>BALATON Zoltan
>
>> +acpi_pm1_evt_power_down(>ar);
>> +}
>> +
>> static void via_pm_realize(PCIDevice *dev, Error **errp)
>> {
>> ViaPMState *s = VIA_PM(dev);
>> @@ -218,6 +230,9 @@ static void via_pm_realize(PCIDevice *dev, Error **errp)
>> acpi_pm_tmr_init(>ar, pm_tmr_timer, >io);
>> acpi_pm1_evt_init(>ar, pm_tmr_timer, >io);
>> acpi_pm1_cnt_init(>ar, >io, false, false, 2, false);
>> +
>> +s->powerdown_notifier.notify = via_pm_powerdown_req;
>> +qemu_register_powerdown_notifier(>powerdown_notifier);
>> }
>> 
>> static void via_pm_init(Object *obj)
>> 



Re: [PATCH] x86: Don't add RNG seed to Linux cmdline for SEV guests

2023-02-07 Thread Tom Lendacky

On 2/7/23 15:45, Michael S. Tsirkin wrote:

On Tue, Feb 07, 2023 at 08:41:16AM +, Dov Murik wrote:

Recent feature to supply RNG seed to the guest kernel modifies the
kernel command-line by adding extra data at its end; this breaks
measured boot with SEV and OVMF, and possibly signed boot.

Specifically SEV doesn't miss this feature because it uses UEFI/OVMF
which has its own way of getting random seed (not to mention that
getting the random seed from the untrusted host breaks the confidential
computing trust model).


Nope - getting a random seed from an untrusted source should not break
anything assuming you also have some other randomness source.
If you don't then you have other problems.


Disable the RNG seed feature in SEV guests.

Fixes: eac7a7791bb6 ("x86: don't let decompressed kernel image clobber 
setup_data")
Reported-by: Tom Lendacky 
Signed-off-by: Dov Murik 

---

There might be a need for a wider change to the ways setup_data entries
are handled in x86_load_linux(); here I just try to restore the
situation for SEV guests prior to the addition of the SETUP_RNG_SEED
entry.

Recent discussions on other (safer?) ways to pass this setup_data entry:
[1] 
https://lore.kernel.org/qemu-devel/da39abab9785aea2a2e7652ed6403b6268aeb31f.ca...@linux.ibm.com/

Note that in qemu 7.2.0 this is broken as well -- there the
SETUP_RNG_SEED entry is appended to the Linux kernel data (and therefore
modifies and breaks the measurement of the kernel in SEV measured boot).
A similar fix will be needed there (but I fear this patch cannot be
applied as-is).


So it's not a regression, is it?


SEV kernel hash comparison succeeded with Qemu v7.1.0, but fails with 
v7.2.0, so I think that is a regression.


Thanks,
Tom




---
  hw/i386/x86.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index eaff4227bd..e65a83f8df 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1103,7 +1103,7 @@ void x86_load_linux(X86MachineState *x86ms,
  load_image_size(dtb_filename, setup_data->data, dtb_size);
  }
  
-if (!legacy_no_rng_seed && protocol >= 0x209) {

+if (!legacy_no_rng_seed && protocol >= 0x209 && !sev_enabled()) {
  setup_data_offset = cmdline_size;
  cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
  kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);

base-commit: 6661b8c7fe3f8b5687d2d90f7b4f3f23d70e3e8b


I am beginning to think we have been hasty here. no rng seed
should have been then default and requested with a flag.
Then we'd avoid all this heartburn - and SEV might not be the
only workload broken.
Maybe not too late. Jason - objections?


--
2.25.1






Re: [PATCH] x86: Don't add RNG seed to Linux cmdline for SEV guests

2023-02-07 Thread Jason A. Donenfeld
Hi Tom,

On Tue, Feb 7, 2023 at 8:21 PM Tom Lendacky  wrote:
>
> On 2/7/23 15:45, Michael S. Tsirkin wrote:
> > On Tue, Feb 07, 2023 at 08:41:16AM +, Dov Murik wrote:
> >> Recent feature to supply RNG seed to the guest kernel modifies the
> >> kernel command-line by adding extra data at its end; this breaks
> >> measured boot with SEV and OVMF, and possibly signed boot.
> >>
> >> Specifically SEV doesn't miss this feature because it uses UEFI/OVMF
> >> which has its own way of getting random seed (not to mention that
> >> getting the random seed from the untrusted host breaks the confidential
> >> computing trust model).
> >
> > Nope - getting a random seed from an untrusted source should not break
> > anything assuming you also have some other randomness source.
> > If you don't then you have other problems.
> >
> >> Disable the RNG seed feature in SEV guests.
> >>
> >> Fixes: eac7a7791bb6 ("x86: don't let decompressed kernel image clobber 
> >> setup_data")
> >> Reported-by: Tom Lendacky 
> >> Signed-off-by: Dov Murik 
> >>
> >> ---
> >>
> >> There might be a need for a wider change to the ways setup_data entries
> >> are handled in x86_load_linux(); here I just try to restore the
> >> situation for SEV guests prior to the addition of the SETUP_RNG_SEED
> >> entry.
> >>
> >> Recent discussions on other (safer?) ways to pass this setup_data entry:
> >> [1] 
> >> https://lore.kernel.org/qemu-devel/da39abab9785aea2a2e7652ed6403b6268aeb31f.ca...@linux.ibm.com/
> >>
> >> Note that in qemu 7.2.0 this is broken as well -- there the
> >> SETUP_RNG_SEED entry is appended to the Linux kernel data (and therefore
> >> modifies and breaks the measurement of the kernel in SEV measured boot).
> >> A similar fix will be needed there (but I fear this patch cannot be
> >> applied as-is).
> >
> > So it's not a regression, is it?
>
> SEV kernel hash comparison succeeded with Qemu v7.1.0, but fails with
> v7.2.0, so I think that is a regression.

Please let me know if this series fixes it:
https://lore.kernel.org/all/20230207224847.94429-1-ja...@zx2c4.com/

Jason



Re: [PATCH 1/3] hw/isa/vt82c686: Fix SCI routing

2023-02-07 Thread Bernhard Beschow



Am 31. Januar 2023 14:42:29 UTC schrieb BALATON Zoltan :
>On Sun, 29 Jan 2023, Bernhard Beschow wrote:
>> According to the PCI specification, the hardware is not supposed to use
>> PCI_INTERRUPT_PIN for interrupt routing. Use the dedicated ACPI
>> Interrupt Select register for SCI routing instead.
>> 
>> Signed-off-by: Bernhard Beschow 
>> ---
>> hw/isa/vt82c686.c | 42 ++
>> 1 file changed, 30 insertions(+), 12 deletions(-)
>> 
>> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
>> index 3f9bd0c04d..2189be6f20 100644
>> --- a/hw/isa/vt82c686.c
>> +++ b/hw/isa/vt82c686.c
>> @@ -46,6 +46,8 @@ struct ViaPMState {
>> ACPIREGS ar;
>> APMState apm;
>> PMSMBus smb;
>> +
>> +qemu_irq irq;
>> };
>> 
>> static void pm_io_space_update(ViaPMState *s)
>> @@ -148,18 +150,7 @@ static void pm_update_sci(ViaPMState *s)
>>ACPI_BITMASK_POWER_BUTTON_ENABLE |
>>ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
>>ACPI_BITMASK_TIMER_ENABLE)) != 0);
>> -if (pci_get_byte(s->dev.config + PCI_INTERRUPT_PIN)) {
>> -/*
>> - * FIXME:
>> - * Fix device model that realizes this PM device and remove
>> - * this work around.
>> - * The device model should wire SCI and setup
>> - * PCI_INTERRUPT_PIN properly.
>> - * If PIN# = 0(interrupt pin isn't used), don't raise SCI as
>> - * work around.
>> - */
>> -pci_set_irq(>dev, sci_level);
>> -}
>> +qemu_set_irq(s->irq, sci_level);
>> /* schedule a timer interruption if needed */
>> acpi_pm_tmr_update(>ar, (s->ar.pm1.evt.en & 
>> ACPI_BITMASK_TIMER_ENABLE) &&
>>!(pmsts & ACPI_BITMASK_TIMER_STATUS));
>> @@ -213,6 +204,13 @@ static void via_pm_realize(PCIDevice *dev, Error **errp)
>> acpi_pm1_cnt_init(>ar, >io, false, false, 2, false);
>> }
>> 
>> +static void via_pm_init(Object *obj)
>> +{
>> +ViaPMState *s = VIA_PM(obj);
>> +
>> +qdev_init_gpio_out(DEVICE(obj), >irq, 1);
>> +}
>> +
>> typedef struct via_pm_init_info {
>> uint16_t device_id;
>> } ViaPMInitInfo;
>> @@ -238,6 +236,7 @@ static void via_pm_class_init(ObjectClass *klass, void 
>> *data)
>> static const TypeInfo via_pm_info = {
>> .name  = TYPE_VIA_PM,
>> .parent= TYPE_PCI_DEVICE,
>> +.instance_init = via_pm_init,
>> .instance_size = sizeof(ViaPMState),
>> .abstract  = true,
>> .interfaces = (InterfaceInfo[]) {
>> @@ -568,6 +567,21 @@ static const VMStateDescription vmstate_via = {
>> }
>> };
>> 
>> +static void via_isa_set_pm_irq(void *opaque, int n, int level)
>> +{
>> +ViaISAState *s = opaque;
>> +uint8_t irq = pci_get_byte(s->pm.dev.config + 0x42) & 0xf;
>> +
>> +if (irq == 2) {
>
>unlikely(irq == 2) but do we really need to log this? It really should not 
>happen but if it does the guest is really broken and this will just flood the 
>log so I think you can just test for it in the if below and drop the log.

I'd prefer to log such guest errors precisely for above reason: To detect 
really broken guests. If it's really too much noise one can still filter the 
log with external tools such as grep.

Best regards,
Bernhard

>
>Regards,
>BALATON Zoltan
>
>> +qemu_log_mask(LOG_GUEST_ERROR, "IRQ 2 for PM controller is 
>> reserved");
>> +return;
>> +}
>> +
>> +if (irq != 0) {
>> +qemu_set_irq(s->isa_irqs[irq], level);
>> +}
>> +}
>> +
>> static void via_isa_init(Object *obj)
>> {
>> ViaISAState *s = VIA_ISA(obj);
>> @@ -578,6 +592,8 @@ static void via_isa_init(Object *obj)
>> object_initialize_child(obj, "uhci2", >uhci[1], 
>> TYPE_VT82C686B_USB_UHCI);
>> object_initialize_child(obj, "ac97", >ac97, TYPE_VIA_AC97);
>> object_initialize_child(obj, "mc97", >mc97, TYPE_VIA_MC97);
>> +
>> +qdev_init_gpio_in_named(DEVICE(obj), via_isa_set_pm_irq, "sci", 1);
>> }
>> 
>> static const TypeInfo via_isa_info = {
>> @@ -664,6 +680,8 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
>> if (!qdev_realize(DEVICE(>pm), BUS(pci_bus), errp)) {
>> return;
>> }
>> +qdev_connect_gpio_out(DEVICE(>pm), 0,
>> +  qdev_get_gpio_in_named(DEVICE(d), "sci", 0));
>> 
>> /* Function 5: AC97 Audio */
>> qdev_prop_set_int32(DEVICE(>ac97), "addr", d->devfn + 5);
>> 



Re: [PATCH v2 10/11] target/arm: Store CPUARMState::nvic as NVICState*

2023-02-07 Thread Richard Henderson

On 2/6/23 12:35, Philippe Mathieu-Daudé wrote:

There is no point in using a void pointer to access the NVIC.
Use the real type to avoid casting it while debugging.

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/intc/armv7m_nvic.c | 38 ---
  target/arm/cpu.c  |  1 +
  target/arm/cpu.h  | 46 ++-
  target/arm/m_helper.c |  2 +-
  4 files changed, 39 insertions(+), 48 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 09/11] target/arm: Restrict CPUARMState::nvic to sysemu

2023-02-07 Thread Richard Henderson

On 2/6/23 12:35, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé 
---
  target/arm/cpu.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 05/11] target/arm: Convert CPUARMState::eabi to boolean

2023-02-07 Thread Richard Henderson

On 2/6/23 12:34, Philippe Mathieu-Daudé wrote:

Suggested-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
  linux-user/arm/cpu_loop.c   | 4 ++--
  linux-user/user-internals.h | 2 +-
  target/arm/cpu.h| 2 +-
  3 files changed, 4 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 03/11] target/arm: Reduce arm_v7m_mmu_idx_[all/for_secstate_and_priv]() scope

2023-02-07 Thread Richard Henderson

On 2/6/23 12:34, Philippe Mathieu-Daudé wrote:

arm_v7m_mmu_idx_all() and arm_v7m_mmu_idx_for_secstate_and_priv()
are only used for system emulation in m_helper.c.
Move the definitions to avoid prototype forward declarations.

Signed-off-by: Philippe Mathieu-Daudé
---
  target/arm/internals.h | 14 
  target/arm/m_helper.c  | 74 +-
  2 files changed, 37 insertions(+), 51 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 02/11] target/arm: Simplify arm_v7m_mmu_idx_for_secstate() for user emulation

2023-02-07 Thread Richard Henderson

On 2/6/23 12:34, Philippe Mathieu-Daudé wrote:

Suggested-by: Richard Henderson
Signed-off-by: Philippe Mathieu-Daudé
---
  target/arm/m_helper.c | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 01/11] hw/intc/armv7m_nvic: Use OBJECT_DECLARE_SIMPLE_TYPE() macro

2023-02-07 Thread Richard Henderson

On 2/6/23 12:34, Philippe Mathieu-Daudé wrote:

Manually convert to OBJECT_DECLARE_SIMPLE_TYPE() macro,
similarly to automatic conversion from commit 8063396bf3
("Use OBJECT_DECLARE_SIMPLE_TYPE when possible").

Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/intc/armv7m_nvic.h | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)


Reviewed-by: Richard Henderson 

r~



diff --git a/include/hw/intc/armv7m_nvic.h b/include/hw/intc/armv7m_nvic.h
index 0180c7b0ca..07f9c21a5f 100644
--- a/include/hw/intc/armv7m_nvic.h
+++ b/include/hw/intc/armv7m_nvic.h
@@ -16,10 +16,7 @@
  #include "qom/object.h"
  
  #define TYPE_NVIC "armv7m_nvic"

-
-typedef struct NVICState NVICState;
-DECLARE_INSTANCE_CHECKER(NVICState, NVIC,
- TYPE_NVIC)
+OBJECT_DECLARE_SIMPLE_TYPE(NVICState, NVIC)
  
  /* Highest permitted number of exceptions (architectural limit) */

  #define NVIC_MAX_VECTORS 512





Re: [PATCH] x86: Don't add RNG seed to Linux cmdline for SEV guests

2023-02-07 Thread Jason A. Donenfeld
On Tue, Feb 07, 2023 at 07:33:09PM -0300, Jason A. Donenfeld wrote:
> On Tue, Feb 7, 2023 at 7:31 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Feb 07, 2023 at 07:17:58PM -0300, Jason A. Donenfeld wrote:
> > > On Tue, Feb 07, 2023 at 04:45:19PM -0500, Michael S. Tsirkin wrote:
> > > > On Tue, Feb 07, 2023 at 08:41:16AM +, Dov Murik wrote:
> > > > > Recent feature to supply RNG seed to the guest kernel modifies the
> > > > > kernel command-line by adding extra data at its end; this breaks
> > > > > measured boot with SEV and OVMF, and possibly signed boot.
> > > > >
> > > > > Specifically SEV doesn't miss this feature because it uses UEFI/OVMF
> > > > > which has its own way of getting random seed (not to mention that
> > > > > getting the random seed from the untrusted host breaks the 
> > > > > confidential
> > > > > computing trust model).
> > > >
> > > > Nope - getting a random seed from an untrusted source should not break
> > > > anything assuming you also have some other randomness source.
> > > > If you don't then you have other problems.
> > > >
> > > > > Disable the RNG seed feature in SEV guests.
> > > > >
> > > > > Fixes: eac7a7791bb6 ("x86: don't let decompressed kernel image 
> > > > > clobber setup_data")
> > > > > Reported-by: Tom Lendacky 
> > > > > Signed-off-by: Dov Murik 
> > > > >
> > > > > ---
> > > > >
> > > > > There might be a need for a wider change to the ways setup_data 
> > > > > entries
> > > > > are handled in x86_load_linux(); here I just try to restore the
> > > > > situation for SEV guests prior to the addition of the SETUP_RNG_SEED
> > > > > entry.
> > > > >
> > > > > Recent discussions on other (safer?) ways to pass this setup_data 
> > > > > entry:
> > > > > [1] 
> > > > > https://lore.kernel.org/qemu-devel/da39abab9785aea2a2e7652ed6403b6268aeb31f.ca...@linux.ibm.com/
> > > > >
> > > > > Note that in qemu 7.2.0 this is broken as well -- there the
> > > > > SETUP_RNG_SEED entry is appended to the Linux kernel data (and 
> > > > > therefore
> > > > > modifies and breaks the measurement of the kernel in SEV measured 
> > > > > boot).
> > > > > A similar fix will be needed there (but I fear this patch cannot be
> > > > > applied as-is).
> > > >
> > > > So it's not a regression, is it?
> > >
> > > I think that note is actually wrong. There prior was the sev_enabled()
> > > check elsewhere, which should have worked. I remember we originally had
> > > that problem with 7.1 and fixed it. So this is a new issue. I'll take
> > > care of it.
> > >
> > > >
> > > > > ---
> > > > >  hw/i386/x86.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > > > > index eaff4227bd..e65a83f8df 100644
> > > > > --- a/hw/i386/x86.c
> > > > > +++ b/hw/i386/x86.c
> > > > > @@ -1103,7 +1103,7 @@ void x86_load_linux(X86MachineState *x86ms,
> > > > >  load_image_size(dtb_filename, setup_data->data, dtb_size);
> > > > >  }
> > > > >
> > > > > -if (!legacy_no_rng_seed && protocol >= 0x209) {
> > > > > +if (!legacy_no_rng_seed && protocol >= 0x209 && !sev_enabled()) {
> > > > >  setup_data_offset = cmdline_size;
> > > > >  cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
> > > > >  kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
> > > > >
> > > > > base-commit: 6661b8c7fe3f8b5687d2d90f7b4f3f23d70e3e8b
> > > >
> > > > I am beginning to think we have been hasty here. no rng seed
> > > > should have been then default and requested with a flag.
> > > > Then we'd avoid all this heartburn - and SEV might not be the
> > > > only workload broken.
> > > > Maybe not too late. Jason - objections?
> > >
> > > Yes, highly object. If it's not here by default, it's completely useless
> > > from my perspective and I'll just stop working on this feature. There's
> > > no reason we can't make this work. It's turned out to have a lot of
> > > technical landmines, but that doesn't mean it's infeasible. I'll keep
> > > hammering away at it.
> > >
> > > Anyway, I'll send a v2 of this patch, and also address another thing
> > > left out of the previous fix.
> > >
> > > (And meanwhile, James and hpa@ seem to be having some discussion about
> > > introducing an even better mechanism; we'll see if that materializes.)
> > >
> > > Jason
> >
> >
> > OK I guess ... objections to a reverse flag disabling this?
> > Will at least allow a work-around for sev and friends ...
> 
> I think we should generally try to make this work right as-is, without
> needing to introduce knobs. The SEV stuff seems really simple to fix.
> I'll have a 2 patch series for you in the next 20 minutes if all goes
> well.

Done: https://lore.kernel.org/all/20230207224847.94429-1-ja...@zx2c4.com/



[PATCH 1/2] x86: reset rng seed when reading cmdline, not kernel image

2023-02-07 Thread Jason A. Donenfeld
With eac7a7791bb6 ("x86: don't let decompressed kernel image clobber
setup_data"), the rng seed setup_data is now appended to the cmdline
file rather than the kernel image file. But in the process, the code to
re-randomize the seed when selecting the image was left out. So, change
the re-randomization over to trigger when selecting the cmdline, rather
than the kernel image.

Fixes: eac7a7791bb6 ("x86: don't let decompressed kernel image clobber 
setup_data")
Signed-off-by: Jason A. Donenfeld 
---
 hw/i386/x86.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index eaff4227bd..c6d7bf6db2 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1114,15 +1114,15 @@ void x86_load_linux(X86MachineState *x86ms,
 setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
 qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
 qemu_register_reset_nosnapshotload(reset_rng_seed, setup_data);
-fw_cfg_add_bytes_callback(fw_cfg, FW_CFG_KERNEL_DATA, reset_rng_seed, 
NULL,
-  setup_data, kernel, kernel_size, true);
+fw_cfg_add_bytes_callback(fw_cfg, FW_CFG_CMDLINE_DATA, reset_rng_seed, 
NULL,
+  setup_data, kernel_cmdline, cmdline_size, 
true);
 } else {
-fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
+fw_cfg_add_bytes(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline, 
cmdline_size);
 }
 
+fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
 fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
 fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, cmdline_size);
-fw_cfg_add_bytes(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline, 
cmdline_size);
 sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
 sev_load_ctx.cmdline_size = cmdline_size;
 
-- 
2.39.1




[PATCH 0/2] x86: fix fallout from switching setup_data from kernel image to cmdline

2023-02-07 Thread Jason A. Donenfeld
With eac7a7791bb6 ("x86: don't let decompressed kernel image clobber
setup_data"), the rng seed setup_data is now appended to the cmdline
file rather than the kernel image file. In the process of doing that,
two things were left out: the check for sev_enabled(), and resetting the
RNG seed when selecting the cmdline. This short series fixes those up.

Cc: "Michael S. Tsirkin" 
Cc: Dov Murik 
Cc: Tom Lendacky 
Cc: Gerd Hoffmann 
Cc: "Daniel P. Berrangé" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 

Dov Murik (1):
  x86: don't append setup_data to cmdline for SEV guests

Jason A. Donenfeld (1):
  x86: reset rng seed when reading cmdline, not kernel image

 hw/i386/x86.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

-- 
2.39.1




[PATCH 2/2] x86: don't append setup_data to cmdline for SEV guests

2023-02-07 Thread Jason A. Donenfeld
From: Dov Murik 

Modifying the cmdline by appending setup_data breaks measured boot with
SEV and OVMF, and possibly signed boot. Previously this was disabled
when appending to the kernel image, but with eac7a7791bb6 ("x86: don't
let decompressed kernel image clobber setup_data"), this was changed to
the cmdline file instead, with the sev_enabled() check left out.

Fixes: eac7a7791bb6 ("x86: don't let decompressed kernel image clobber 
setup_data")
Reported-by: Tom Lendacky 
Signed-off-by: Dov Murik 
Signed-off-by: Jason A. Donenfeld 
---
 hw/i386/x86.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index c6d7bf6db2..80a1678acd 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1079,7 +1079,7 @@ void x86_load_linux(X86MachineState *x86ms,
 fclose(f);
 
 /* append dtb to kernel */
-if (dtb_filename) {
+if (dtb_filename && !sev_enabled()) {
 if (protocol < 0x209) {
 fprintf(stderr, "qemu: Linux kernel too old to load a dtb\n");
 exit(1);
@@ -1103,7 +1103,7 @@ void x86_load_linux(X86MachineState *x86ms,
 load_image_size(dtb_filename, setup_data->data, dtb_size);
 }
 
-if (!legacy_no_rng_seed && protocol >= 0x209) {
+if (!legacy_no_rng_seed && protocol >= 0x209 && !sev_enabled()) {
 setup_data_offset = cmdline_size;
 cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
 kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
-- 
2.39.1




Re: [PATCH v3 9/9] target/i386/tcg/sysemu/tcg-cpu: Avoid own opinion about smram size

2023-02-07 Thread Bernhard Beschow



Am 5. Februar 2023 11:26:10 UTC schrieb "Philippe Mathieu-Daudé" 
:
>On 4/2/23 16:10, Bernhard Beschow wrote:
>> When setting up the CPU's smram memory region alias, the code currently
>> assumes that the smram size is 4 GiB. While this is true, it repeats a
>> decision made elsewhere which seems redundant and prone to
>> inconsistencies. Avoid this by reusing whatever size the smram region
>> was set to.
>> 
>> Signed-off-by: Bernhard Beschow 
>> ---
>>   target/i386/tcg/sysemu/tcg-cpu.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> diff --git a/target/i386/tcg/sysemu/tcg-cpu.c 
>> b/target/i386/tcg/sysemu/tcg-cpu.c
>> index c223c0fe9b..8f5ce6093c 100644
>> --- a/target/i386/tcg/sysemu/tcg-cpu.c
>> +++ b/target/i386/tcg/sysemu/tcg-cpu.c
>> @@ -22,7 +22,6 @@
>>   #include "tcg/helper-tcg.h"
>> #include "sysemu/sysemu.h"
>> -#include "qemu/units.h"
>>   #include "exec/address-spaces.h"
>> #include "tcg/tcg-cpu.h"
>> @@ -36,7 +35,7 @@ static void tcg_cpu_machine_done(Notifier *n, void *unused)
>>   if (smram) {
>>   cpu->smram = g_new(MemoryRegion, 1);
>>   memory_region_init_alias(cpu->smram, OBJECT(cpu), "smram",
>> - smram, 0, 4 * GiB);
>> + smram, 0, memory_region_size(smram));
>
>Or define SMRAM_REGION_SIZE and use it?

This could still lead to inconsistencies if the size was changed in one place 
only, no?

>
>(subject header can be shortened to "target/i386:").

Okay.



Re: [PATCH v3 8/9] hw/i386/x86: Make TYPE_X86_MACHINE the owner of smram

2023-02-07 Thread Bernhard Beschow



Am 7. Februar 2023 18:34:40 UTC schrieb Juan Quintela :
>Bernhard Beschow  wrote:
>v> On Mon, Feb 6, 2023 at 11:06 AM Juan Quintela  wrote:
>>
>>> Philippe Mathieu-Daudé  wrote:
>>> > On 4/2/23 16:10, Bernhard Beschow wrote:
>>> >> Treat the smram MemoryRegion analoguous to other memory regions such as
>>> >> ram, pci, io, ... , making the used memory regions more explicit when
>>> >> instantiating q35 or i440fx.
>>> >> Note that the q35 device uses these memory regions only during the
>>> >> realize phase which suggests that it shouldn't be the owner of smram.
>>> >
>>> > Few years ago I tried something similar and it wasn't accepted because
>>> > the MR owner name is used in the migration stream, so this was breaking
>>> > migrating from older machines.
>>>
>>> I don't remember the details O:-)
>>>
>>> Migration code, really depends on RAMBlocks names, not memory region
>>> names.  But as far as I remember, that don't matter too much because the
>>> memory region names ends tangled quite a bit with the RAMBlock name, right?
>>>
>>> > Adding David/Juan for double-checking that.
>>>
>>> trace_vmstate_save(se->idstr, se->vmsd ? se->vmsd->name : "(old)");
>>>
>>> You can try to enable this trace and see that every section has the same
>>> name with and without your change (i.e. that memory region name is not
>>> seen by the migration stream).
>>>
>>> But that is the only help that I can came with.
>>>
>>> The code that you are changing (smram) is something that I don't know
>>> about to give you more help.
>>>
>>> Looking at the patch, it looks that the name was before and now the
>>> "sram", so perhaps it could help.  But I don't know.
>>>
>>> In the i440fx you say that you only use it until realize, so you should
>>> be safe.
>>>
>>> For q35, it is not clear to me.
>>>
>>> If the trace don't show new names, I will just try:
>>> - migrate a i440fx machine from binary without your patch to one with
>>>   your patch
>>> - the same for q35.
>>>
>>> And depending on the result, we can go from there.
>>>
>>
>> Thanks for the pointers, Juan!
>>
>> I took some inspiration and created four migration files,
>> {pc,q35}-{before,after}.mig by running `qemu-system-x86_64 -M {pc,q35} -S`
>> with qemu built from master and from my branch. Then I basically ran
>>  `./scripts/analyze-migration.py -d desc -f *.mig > *.json` on the four
>> files and compared the diffs. Both diffs were empty. AFAIU this proves that
>> there is no binary change, right?
>
>We have two options here:
>- you are right (my opinion)
>- you got a bug in analyze-migration.py script and you have a new job.

I take option one ;)

>But I think you can send this patch.

The patches still need reviews.

Best regards,
Bernhard
>
>Later, Juan.
>
>> Best regards,
>> Bernhard
>>>
>>>
>>> Later, Juan.
>>>
>>>
>



Re: [PATCH v9 07/14] vfio/migration: Block multiple devices migration

2023-02-07 Thread Alex Williamson
On Mon, 6 Feb 2023 14:31:30 +0200
Avihai Horon  wrote:

> Currently VFIO migration doesn't implement some kind of intermediate
> quiescent state in which P2P DMAs are quiesced before stopping or
> running the device. This can cause problems in multi-device migration
> where the devices are doing P2P DMAs, since the devices are not stopped
> together at the same time.
> 
> Until such support is added, block migration of multiple devices.
> 
> Signed-off-by: Avihai Horon 
> ---
>  include/hw/vfio/vfio-common.h |  2 ++
>  hw/vfio/common.c  | 51 +++
>  hw/vfio/migration.c   |  6 +
>  3 files changed, 59 insertions(+)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index e573f5a9f1..56b1683824 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -218,6 +218,8 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) 
> VFIOGroupList;
>  extern VFIOGroupList vfio_group_list;
>  
>  bool vfio_mig_active(void);
> +int vfio_block_multiple_devices_migration(Error **errp);
> +void vfio_unblock_multiple_devices_migration(void);
>  int64_t vfio_mig_bytes_transferred(void);
>  
>  #ifdef CONFIG_LINUX
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 3a35f4afad..01db41b735 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -41,6 +41,7 @@
>  #include "qapi/error.h"
>  #include "migration/migration.h"
>  #include "migration/misc.h"
> +#include "migration/blocker.h"
>  #include "sysemu/tpm.h"
>  
>  VFIOGroupList vfio_group_list =
> @@ -337,6 +338,56 @@ bool vfio_mig_active(void)
>  return true;
>  }
>  
> +Error *multiple_devices_migration_blocker;
> +
> +static unsigned int vfio_migratable_device_num(void)
> +{
> +VFIOGroup *group;
> +VFIODevice *vbasedev;
> +unsigned int device_num = 0;
> +
> +QLIST_FOREACH(group, _group_list, next) {
> +QLIST_FOREACH(vbasedev, >device_list, next) {
> +if (vbasedev->migration) {
> +device_num++;
> +}
> +}
> +}
> +
> +return device_num;
> +}
> +
> +int vfio_block_multiple_devices_migration(Error **errp)
> +{
> +int ret;
> +
> +if (vfio_migratable_device_num() != 2) {
> +return 0;
> +}
> +
> +error_setg(_devices_migration_blocker,
> +   "Migration is currently not supported with multiple "
> +   "VFIO devices");
> +ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
> +if (ret < 0) {
> +error_free(multiple_devices_migration_blocker);
> +multiple_devices_migration_blocker = NULL;
> +}
> +
> +return ret;
> +}
> +
> +void vfio_unblock_multiple_devices_migration(void)
> +{
> +if (vfio_migratable_device_num() != 2) {
> +return;
> +}
> +
> +migrate_del_blocker(multiple_devices_migration_blocker);
> +error_free(multiple_devices_migration_blocker);
> +multiple_devices_migration_blocker = NULL;
> +}

A couple awkward things here.  First I wish we could do something
cleaner or more intuitive than the != 2 test.  I get that we're trying
to do this on the addition of the 2nd device supporting migration, or
the removal of the next to last device independent of all other devices,
but I wonder if it wouldn't be better to remove the multiple-device
blocker after migration is torn down for the device so we can test
device >1 or ==1 in combination with whether
multiple_devices_migration_blocker is NULL.

Which comes to the second awkwardness, if we fail to add the blocker we
free and clear the blocker, but when we tear down the device due to that
failure we'll remove the blocker that doesn't exist, free NULL, and
clear it again.  Thanks to the glib slist the migration blocker is
using, I think that all works, but I'd rather not be dependent on that
implementation to avoid a segfault here.  Incorporating a test of
multiple_devices_migration_blocker as above would avoid this too.  Thanks,

Alex

> +
>  static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>  {
>  VFIOGroup *group;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 552c2313b2..8085a4ff44 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -880,6 +880,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error 
> **errp)
>  goto add_blocker;
>  }
>  
> +ret = vfio_block_multiple_devices_migration(errp);
> +if (ret) {
> +return ret;
> +}
> +
>  trace_vfio_migration_probe(vbasedev->name, info->index);
>  g_free(info);
>  return 0;
> @@ -902,6 +907,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
>  if (vbasedev->migration) {
>  VFIOMigration *migration = vbasedev->migration;
>  
> +vfio_unblock_multiple_devices_migration();
>  remove_migration_state_change_notifier(>migration_state);
>  qemu_del_vm_change_state_handler(migration->vm_state);
>  

Re: [PATCH] x86: Don't add RNG seed to Linux cmdline for SEV guests

2023-02-07 Thread Jason A. Donenfeld
On Tue, Feb 7, 2023 at 7:31 PM Michael S. Tsirkin  wrote:
>
> On Tue, Feb 07, 2023 at 07:17:58PM -0300, Jason A. Donenfeld wrote:
> > On Tue, Feb 07, 2023 at 04:45:19PM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Feb 07, 2023 at 08:41:16AM +, Dov Murik wrote:
> > > > Recent feature to supply RNG seed to the guest kernel modifies the
> > > > kernel command-line by adding extra data at its end; this breaks
> > > > measured boot with SEV and OVMF, and possibly signed boot.
> > > >
> > > > Specifically SEV doesn't miss this feature because it uses UEFI/OVMF
> > > > which has its own way of getting random seed (not to mention that
> > > > getting the random seed from the untrusted host breaks the confidential
> > > > computing trust model).
> > >
> > > Nope - getting a random seed from an untrusted source should not break
> > > anything assuming you also have some other randomness source.
> > > If you don't then you have other problems.
> > >
> > > > Disable the RNG seed feature in SEV guests.
> > > >
> > > > Fixes: eac7a7791bb6 ("x86: don't let decompressed kernel image clobber 
> > > > setup_data")
> > > > Reported-by: Tom Lendacky 
> > > > Signed-off-by: Dov Murik 
> > > >
> > > > ---
> > > >
> > > > There might be a need for a wider change to the ways setup_data entries
> > > > are handled in x86_load_linux(); here I just try to restore the
> > > > situation for SEV guests prior to the addition of the SETUP_RNG_SEED
> > > > entry.
> > > >
> > > > Recent discussions on other (safer?) ways to pass this setup_data entry:
> > > > [1] 
> > > > https://lore.kernel.org/qemu-devel/da39abab9785aea2a2e7652ed6403b6268aeb31f.ca...@linux.ibm.com/
> > > >
> > > > Note that in qemu 7.2.0 this is broken as well -- there the
> > > > SETUP_RNG_SEED entry is appended to the Linux kernel data (and therefore
> > > > modifies and breaks the measurement of the kernel in SEV measured boot).
> > > > A similar fix will be needed there (but I fear this patch cannot be
> > > > applied as-is).
> > >
> > > So it's not a regression, is it?
> >
> > I think that note is actually wrong. There prior was the sev_enabled()
> > check elsewhere, which should have worked. I remember we originally had
> > that problem with 7.1 and fixed it. So this is a new issue. I'll take
> > care of it.
> >
> > >
> > > > ---
> > > >  hw/i386/x86.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > > > index eaff4227bd..e65a83f8df 100644
> > > > --- a/hw/i386/x86.c
> > > > +++ b/hw/i386/x86.c
> > > > @@ -1103,7 +1103,7 @@ void x86_load_linux(X86MachineState *x86ms,
> > > >  load_image_size(dtb_filename, setup_data->data, dtb_size);
> > > >  }
> > > >
> > > > -if (!legacy_no_rng_seed && protocol >= 0x209) {
> > > > +if (!legacy_no_rng_seed && protocol >= 0x209 && !sev_enabled()) {
> > > >  setup_data_offset = cmdline_size;
> > > >  cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
> > > >  kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
> > > >
> > > > base-commit: 6661b8c7fe3f8b5687d2d90f7b4f3f23d70e3e8b
> > >
> > > I am beginning to think we have been hasty here. no rng seed
> > > should have been then default and requested with a flag.
> > > Then we'd avoid all this heartburn - and SEV might not be the
> > > only workload broken.
> > > Maybe not too late. Jason - objections?
> >
> > Yes, highly object. If it's not here by default, it's completely useless
> > from my perspective and I'll just stop working on this feature. There's
> > no reason we can't make this work. It's turned out to have a lot of
> > technical landmines, but that doesn't mean it's infeasible. I'll keep
> > hammering away at it.
> >
> > Anyway, I'll send a v2 of this patch, and also address another thing
> > left out of the previous fix.
> >
> > (And meanwhile, James and hpa@ seem to be having some discussion about
> > introducing an even better mechanism; we'll see if that materializes.)
> >
> > Jason
>
>
> OK I guess ... objections to a reverse flag disabling this?
> Will at least allow a work-around for sev and friends ...

I think we should generally try to make this work right as-is, without
needing to introduce knobs. The SEV stuff seems really simple to fix.
I'll have a 2 patch series for you in the next 20 minutes if all goes
well.



Re: [PATCH] x86: Don't add RNG seed to Linux cmdline for SEV guests

2023-02-07 Thread Michael S. Tsirkin
On Tue, Feb 07, 2023 at 07:17:58PM -0300, Jason A. Donenfeld wrote:
> On Tue, Feb 07, 2023 at 04:45:19PM -0500, Michael S. Tsirkin wrote:
> > On Tue, Feb 07, 2023 at 08:41:16AM +, Dov Murik wrote:
> > > Recent feature to supply RNG seed to the guest kernel modifies the
> > > kernel command-line by adding extra data at its end; this breaks
> > > measured boot with SEV and OVMF, and possibly signed boot.
> > > 
> > > Specifically SEV doesn't miss this feature because it uses UEFI/OVMF
> > > which has its own way of getting random seed (not to mention that
> > > getting the random seed from the untrusted host breaks the confidential
> > > computing trust model).
> > 
> > Nope - getting a random seed from an untrusted source should not break
> > anything assuming you also have some other randomness source.
> > If you don't then you have other problems.
> > 
> > > Disable the RNG seed feature in SEV guests.
> > > 
> > > Fixes: eac7a7791bb6 ("x86: don't let decompressed kernel image clobber 
> > > setup_data")
> > > Reported-by: Tom Lendacky 
> > > Signed-off-by: Dov Murik 
> > > 
> > > ---
> > > 
> > > There might be a need for a wider change to the ways setup_data entries
> > > are handled in x86_load_linux(); here I just try to restore the
> > > situation for SEV guests prior to the addition of the SETUP_RNG_SEED
> > > entry.
> > > 
> > > Recent discussions on other (safer?) ways to pass this setup_data entry:
> > > [1] 
> > > https://lore.kernel.org/qemu-devel/da39abab9785aea2a2e7652ed6403b6268aeb31f.ca...@linux.ibm.com/
> > > 
> > > Note that in qemu 7.2.0 this is broken as well -- there the
> > > SETUP_RNG_SEED entry is appended to the Linux kernel data (and therefore
> > > modifies and breaks the measurement of the kernel in SEV measured boot).
> > > A similar fix will be needed there (but I fear this patch cannot be
> > > applied as-is).
> > 
> > So it's not a regression, is it?
> 
> I think that note is actually wrong. There prior was the sev_enabled()
> check elsewhere, which should have worked. I remember we originally had
> that problem with 7.1 and fixed it. So this is a new issue. I'll take
> care of it.
> 
> > 
> > > ---
> > >  hw/i386/x86.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > > index eaff4227bd..e65a83f8df 100644
> > > --- a/hw/i386/x86.c
> > > +++ b/hw/i386/x86.c
> > > @@ -1103,7 +1103,7 @@ void x86_load_linux(X86MachineState *x86ms,
> > >  load_image_size(dtb_filename, setup_data->data, dtb_size);
> > >  }
> > >  
> > > -if (!legacy_no_rng_seed && protocol >= 0x209) {
> > > +if (!legacy_no_rng_seed && protocol >= 0x209 && !sev_enabled()) {
> > >  setup_data_offset = cmdline_size;
> > >  cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
> > >  kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
> > > 
> > > base-commit: 6661b8c7fe3f8b5687d2d90f7b4f3f23d70e3e8b
> > 
> > I am beginning to think we have been hasty here. no rng seed
> > should have been then default and requested with a flag.
> > Then we'd avoid all this heartburn - and SEV might not be the
> > only workload broken.
> > Maybe not too late. Jason - objections?
> 
> Yes, highly object. If it's not here by default, it's completely useless
> from my perspective and I'll just stop working on this feature. There's
> no reason we can't make this work. It's turned out to have a lot of
> technical landmines, but that doesn't mean it's infeasible. I'll keep
> hammering away at it.
> 
> Anyway, I'll send a v2 of this patch, and also address another thing
> left out of the previous fix.
> 
> (And meanwhile, James and hpa@ seem to be having some discussion about
> introducing an even better mechanism; we'll see if that materializes.)
> 
> Jason


OK I guess ... objections to a reverse flag disabling this?
Will at least allow a work-around for sev and friends ...

-- 
MST




Re: [PATCH] x86: Don't add RNG seed to Linux cmdline for SEV guests

2023-02-07 Thread Jason A. Donenfeld
On Tue, Feb 07, 2023 at 04:45:19PM -0500, Michael S. Tsirkin wrote:
> On Tue, Feb 07, 2023 at 08:41:16AM +, Dov Murik wrote:
> > Recent feature to supply RNG seed to the guest kernel modifies the
> > kernel command-line by adding extra data at its end; this breaks
> > measured boot with SEV and OVMF, and possibly signed boot.
> > 
> > Specifically SEV doesn't miss this feature because it uses UEFI/OVMF
> > which has its own way of getting random seed (not to mention that
> > getting the random seed from the untrusted host breaks the confidential
> > computing trust model).
> 
> Nope - getting a random seed from an untrusted source should not break
> anything assuming you also have some other randomness source.
> If you don't then you have other problems.
> 
> > Disable the RNG seed feature in SEV guests.
> > 
> > Fixes: eac7a7791bb6 ("x86: don't let decompressed kernel image clobber 
> > setup_data")
> > Reported-by: Tom Lendacky 
> > Signed-off-by: Dov Murik 
> > 
> > ---
> > 
> > There might be a need for a wider change to the ways setup_data entries
> > are handled in x86_load_linux(); here I just try to restore the
> > situation for SEV guests prior to the addition of the SETUP_RNG_SEED
> > entry.
> > 
> > Recent discussions on other (safer?) ways to pass this setup_data entry:
> > [1] 
> > https://lore.kernel.org/qemu-devel/da39abab9785aea2a2e7652ed6403b6268aeb31f.ca...@linux.ibm.com/
> > 
> > Note that in qemu 7.2.0 this is broken as well -- there the
> > SETUP_RNG_SEED entry is appended to the Linux kernel data (and therefore
> > modifies and breaks the measurement of the kernel in SEV measured boot).
> > A similar fix will be needed there (but I fear this patch cannot be
> > applied as-is).
> 
> So it's not a regression, is it?

I think that note is actually wrong. There prior was the sev_enabled()
check elsewhere, which should have worked. I remember we originally had
that problem with 7.1 and fixed it. So this is a new issue. I'll take
care of it.

> 
> > ---
> >  hw/i386/x86.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > index eaff4227bd..e65a83f8df 100644
> > --- a/hw/i386/x86.c
> > +++ b/hw/i386/x86.c
> > @@ -1103,7 +1103,7 @@ void x86_load_linux(X86MachineState *x86ms,
> >  load_image_size(dtb_filename, setup_data->data, dtb_size);
> >  }
> >  
> > -if (!legacy_no_rng_seed && protocol >= 0x209) {
> > +if (!legacy_no_rng_seed && protocol >= 0x209 && !sev_enabled()) {
> >  setup_data_offset = cmdline_size;
> >  cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
> >  kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
> > 
> > base-commit: 6661b8c7fe3f8b5687d2d90f7b4f3f23d70e3e8b
> 
> I am beginning to think we have been hasty here. no rng seed
> should have been then default and requested with a flag.
> Then we'd avoid all this heartburn - and SEV might not be the
> only workload broken.
> Maybe not too late. Jason - objections?

Yes, highly object. If it's not here by default, it's completely useless
from my perspective and I'll just stop working on this feature. There's
no reason we can't make this work. It's turned out to have a lot of
technical landmines, but that doesn't mean it's infeasible. I'll keep
hammering away at it.

Anyway, I'll send a v2 of this patch, and also address another thing
left out of the previous fix.

(And meanwhile, James and hpa@ seem to be having some discussion about
introducing an even better mechanism; we'll see if that materializes.)

Jason



Re: [PULL 00/25] aspeed queue

2023-02-07 Thread Peter Maydell
On Tue, 7 Feb 2023 at 10:07, Cédric Le Goater  wrote:
>
> The following changes since commit 6661b8c7fe3f8b5687d2d90f7b4f3f23d70e3e8b:
>
>   Merge tag 'pull-ppc-20230205' of https://gitlab.com/danielhb/qemu into 
> staging (2023-02-05 16:49:09 +)
>
> are available in the Git repository at:
>
>   https://github.com/legoater/qemu/ tags/pull-aspeed-20230207
>
> for you to fetch changes up to bf81b8f8acda4f1f774adc5f8e76225d472c6ae5:
>
>   aspeed/sdmc: Drop unnecessary scu include (2023-02-07 09:02:05 +0100)
>
> 
> aspeed queue:
>
> * various small cleanups and fixes
> * new variant of the supermicrox11-bmc machine using an ast2500-a1 SoC
> * at24c_eeprom extension to define eeprom contents with static arrays
> * ast10x0 model and test improvements
> * avocado update of images to use the latest


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM



Re: [PATCH] virtio-rng-pci: fix transitional migration compat for vectors

2023-02-07 Thread Michael S. Tsirkin
On Tue, Feb 07, 2023 at 06:31:53PM +, Dr. David Alan Gilbert wrote:
> * Philippe Mathieu-Daudé (phi...@linaro.org) wrote:
> > On 7/2/23 18:49, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > In bad9c5a5166fd5e3a892b7b0477cf2f4bd3a959a I fixed the virito-rng-pci
> > 
> > Typo "virtio-rng-pci".
> 
> I've made that typo SO many times...



Add this in .vimrc

ab virito virtio

and never make this mistake again ... as long as you use vim ;)


> > > migration compatibility, but it was discovered that we also need to fix
> > > the other aliases of the device for the transitional cases.
> > > 
> > > Fixes: 9ea02e8f1 ('virtio-rng-pci: Allow setting nvectors, so we can use 
> > > MSI-X')
> > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=2162569
> > > Signed-off-by: Dr. David Alan Gilbert 
> > > ---
> > >   hw/core/machine.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index b5cd42cd8c..4627b274d9 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -49,6 +49,8 @@ const size_t hw_compat_7_2_len = 
> > > G_N_ELEMENTS(hw_compat_7_2);
> > >   GlobalProperty hw_compat_7_1[] = {
> > >   { "virtio-device", "queue_reset", "false" },
> > >   { "virtio-rng-pci", "vectors", "0" },
> > > +{ "virtio-rng-pci-transitional", "vectors", "0" },
> > > +{ "virtio-rng-pci-non-transitional", "vectors", "0" },
> > 
> > Ouch :(
> > 
> > Reviewed-by: Philippe Mathieu-Daudé 
> 
> Thanks!
> 
> Dave
> 
> > 
> > >   };
> > >   const size_t hw_compat_7_1_len = G_N_ELEMENTS(hw_compat_7_1);
> > 
> -- 
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH] virtio-rng-pci: fix transitional migration compat for vectors

2023-02-07 Thread Michael S. Tsirkin
On Tue, Feb 07, 2023 at 05:49:44PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> In bad9c5a5166fd5e3a892b7b0477cf2f4bd3a959a I fixed the virito-rng-pci

virtio?

> migration compatibility, but it was discovered that we also need to fix
> the other aliases of the device for the transitional cases.
> 
> Fixes: 9ea02e8f1 ('virtio-rng-pci: Allow setting nvectors, so we can use 
> MSI-X')
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=2162569
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  hw/core/machine.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index b5cd42cd8c..4627b274d9 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -49,6 +49,8 @@ const size_t hw_compat_7_2_len = 
> G_N_ELEMENTS(hw_compat_7_2);
>  GlobalProperty hw_compat_7_1[] = {
>  { "virtio-device", "queue_reset", "false" },
>  { "virtio-rng-pci", "vectors", "0" },
> +{ "virtio-rng-pci-transitional", "vectors", "0" },
> +{ "virtio-rng-pci-non-transitional", "vectors", "0" },
>  };
>  const size_t hw_compat_7_1_len = G_N_ELEMENTS(hw_compat_7_1);
>  
> -- 
> 2.39.1




Re: [PATCH V2] memory: flat section iterator

2023-02-07 Thread Peter Xu
On Tue, Feb 07, 2023 at 04:28:49PM -0500, Steven Sistare wrote:
> On 2/7/2023 3:10 PM, Peter Xu wrote:
> > On Tue, Feb 07, 2023 at 11:03:29AM -0800, Steve Sistare wrote:
> >> Add an iterator over the sections of a flattened address space.
> >> This will be needed by cpr to issue vfio ioctl's on the same memory
> >> ranges that are already programmed.
> > 
> > Should this better be proposed with the context of using it?  Or I don't
> > know how to justify this new interface is needed.
> > 
> > For example, any explanations on why memory region listeners cannot work?
> 
> For context, the new interfaces is used in the patch
>   "vfio-pci: recover from unmap-all-vaddr failure"
> in the original live update series:
>   
> https://lore.kernel.org/qemu-devel/1658851843-236870-1-git-send-email-steven.sist...@oracle.com/
> 
> More succinctly, the memory region listeners already ran, and the vfio 
> callbacks created vfio memory regions.  Now we want to perform live update,
> and are in steady state, so no listeners are being called.  We need the
> flat section iterator to reproduce the same addresses and extents that were
> produced by the listeners, to make a state change on each distinct vfio
> memory region.

Okay it makes sense, thanks.

I think it'll be the same as one memory_listener_register() followed by an
unregister with region_add() hooked only, but maybe we should avoid
fiddling with the global list of listeners when possible.

If you plan to keep posting it separately, would you add some information
into the commit message?  With that enhanced, please feel free to add:

Acked-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH] x86: Don't add RNG seed to Linux cmdline for SEV guests

2023-02-07 Thread Michael S. Tsirkin
On Tue, Feb 07, 2023 at 08:41:16AM +, Dov Murik wrote:
> Recent feature to supply RNG seed to the guest kernel modifies the
> kernel command-line by adding extra data at its end; this breaks
> measured boot with SEV and OVMF, and possibly signed boot.
> 
> Specifically SEV doesn't miss this feature because it uses UEFI/OVMF
> which has its own way of getting random seed (not to mention that
> getting the random seed from the untrusted host breaks the confidential
> computing trust model).

Nope - getting a random seed from an untrusted source should not break
anything assuming you also have some other randomness source.
If you don't then you have other problems.

> Disable the RNG seed feature in SEV guests.
> 
> Fixes: eac7a7791bb6 ("x86: don't let decompressed kernel image clobber 
> setup_data")
> Reported-by: Tom Lendacky 
> Signed-off-by: Dov Murik 
> 
> ---
> 
> There might be a need for a wider change to the ways setup_data entries
> are handled in x86_load_linux(); here I just try to restore the
> situation for SEV guests prior to the addition of the SETUP_RNG_SEED
> entry.
> 
> Recent discussions on other (safer?) ways to pass this setup_data entry:
> [1] 
> https://lore.kernel.org/qemu-devel/da39abab9785aea2a2e7652ed6403b6268aeb31f.ca...@linux.ibm.com/
> 
> Note that in qemu 7.2.0 this is broken as well -- there the
> SETUP_RNG_SEED entry is appended to the Linux kernel data (and therefore
> modifies and breaks the measurement of the kernel in SEV measured boot).
> A similar fix will be needed there (but I fear this patch cannot be
> applied as-is).

So it's not a regression, is it?

> ---
>  hw/i386/x86.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index eaff4227bd..e65a83f8df 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -1103,7 +1103,7 @@ void x86_load_linux(X86MachineState *x86ms,
>  load_image_size(dtb_filename, setup_data->data, dtb_size);
>  }
>  
> -if (!legacy_no_rng_seed && protocol >= 0x209) {
> +if (!legacy_no_rng_seed && protocol >= 0x209 && !sev_enabled()) {
>  setup_data_offset = cmdline_size;
>  cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
>  kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
> 
> base-commit: 6661b8c7fe3f8b5687d2d90f7b4f3f23d70e3e8b

I am beginning to think we have been hasty here. no rng seed
should have been then default and requested with a flag.
Then we'd avoid all this heartburn - and SEV might not be the
only workload broken.
Maybe not too late. Jason - objections?

> -- 
> 2.25.1




Re: [PATCH V2] python/machine: QEMUMachine reopen_qmp_connection

2023-02-07 Thread Steven Sistare
On 2/7/2023 4:23 PM, John Snow wrote:
> On Tue, Feb 7, 2023 at 4:04 PM Steven Sistare  
> wrote:
>>
>> On 2/7/2023 3:28 PM, John Snow wrote:
>>> On Tue, Feb 7, 2023 at 2:03 PM Steve Sistare  
>>> wrote:

 Provide reopen_qmp_connection() to reopen a closed monitor connection.
 This will be needed by cpr, because qemu exec closes the monitor socket.

 Signed-off-by: Steve Sistare 
 Reviewed-by: John Snow 
 ---
  python/qemu/machine/machine.py | 10 ++
  1 file changed, 10 insertions(+)

 diff --git a/python/qemu/machine/machine.py 
 b/python/qemu/machine/machine.py
 index ef94dcf..557209a 100644
 --- a/python/qemu/machine/machine.py
 +++ b/python/qemu/machine/machine.py
 @@ -501,6 +501,16 @@ def _close_qmp_connection(self) -> None:
  finally:
  self._qmp_connection = None

 +def reopen_qmp_connection(self) -> None:
 +"""Close and re-open the QMP connection."""
 +self._close_qmp_connection()
 +self._qmp_connection = QEMUMonitorProtocol(
 +self._monitor_address,
 +server=True,
 +nickname=self._name
 +)
 +self._qmp.accept(self._qmp_timer)
 +
  def _early_cleanup(self) -> None:
  """
  Perform any cleanup that needs to happen before the VM exits.
 --
 1.8.3.1

>>>
>>> This code is still mechanically fine as far as I can tell, but I lost
>>> the plot on why it's needed - Can you please elaborate for me on what
>>> you mean by "qemu exec will close the socket"?
>>>
>>> (R-B still stands, but since I am rewriting machine.py to be natively
>>> async, I should be aware of your new use case.)
>>
>> Sure, thanks for asking.
>>
>> During cpr (aka live update), the old qemu process directly execv's the new
>> qemu binary.  The monitor socket fd is marked cloexec, so the kernel closes 
>> it
>> during execv.
>>
>> - Steve
> 
> Oho, I see.
> 
> I wonder if you are helped at all by some of Marc-Andre's recent
> changes to use socketpair() for QMP connections with machine.py.
> Depending on how those sockets are managed, it might be possible to
> hold onto one of them without having it closed alongside the old QEMU
> process. Or, if that doesn't help, double-check that it doesn't *hurt*
> you either. It should be the default as of about a week ago now --
> instead of using a local unix socket, we use socketpair and pass the
> FD to help alleviate some race conditions.
> 
> (Again, I'm fine with this patch either way, just a heads up.)

Thanks.  Preserving the monitor connection and state across exec requires more 
work though, because of buffering, partially processed messages, and other 
state.  I tried and abandoned that a while ago.  Re-opening has been robust.

- Steve



Re: [PATCH V2] migration: simplify blockers

2023-02-07 Thread Peter Xu
On Tue, Feb 07, 2023 at 04:17:34PM -0500, Steven Sistare wrote:
> >> + * *@reasonp is freed and set to NULL if failure is returned.
> >> + * On success, the caller no longer owns *@reasonp and must not free it.
> > 
> > This statement reads weird.  IMHO the caller still owns @reasonp, but if it
> > succeeded it should only free it with a migrate_del_blocker() later.
> 
> How about:
>   On success, the caller must not free @reasonp, except by calling 
> migrate_del_blocker.

LGTM, thanks.

-- 
Peter Xu




Re: [PATCH V2] memory: flat section iterator

2023-02-07 Thread Steven Sistare
On 2/7/2023 3:10 PM, Peter Xu wrote:
> On Tue, Feb 07, 2023 at 11:03:29AM -0800, Steve Sistare wrote:
>> Add an iterator over the sections of a flattened address space.
>> This will be needed by cpr to issue vfio ioctl's on the same memory
>> ranges that are already programmed.
> 
> Should this better be proposed with the context of using it?  Or I don't
> know how to justify this new interface is needed.
> 
> For example, any explanations on why memory region listeners cannot work?

For context, the new interfaces is used in the patch
  "vfio-pci: recover from unmap-all-vaddr failure"
in the original live update series:
  
https://lore.kernel.org/qemu-devel/1658851843-236870-1-git-send-email-steven.sist...@oracle.com/

More succinctly, the memory region listeners already ran, and the vfio 
callbacks created vfio memory regions.  Now we want to perform live update,
and are in steady state, so no listeners are being called.  We need the
flat section iterator to reproduce the same addresses and extents that were
produced by the listeners, to make a state change on each distinct vfio
memory region.

- Steve



Re: [PATCH V2] python/machine: QEMUMachine reopen_qmp_connection

2023-02-07 Thread John Snow
On Tue, Feb 7, 2023 at 4:04 PM Steven Sistare  wrote:
>
> On 2/7/2023 3:28 PM, John Snow wrote:
> > On Tue, Feb 7, 2023 at 2:03 PM Steve Sistare  
> > wrote:
> >>
> >> Provide reopen_qmp_connection() to reopen a closed monitor connection.
> >> This will be needed by cpr, because qemu exec closes the monitor socket.
> >>
> >> Signed-off-by: Steve Sistare 
> >> Reviewed-by: John Snow 
> >> ---
> >>  python/qemu/machine/machine.py | 10 ++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/python/qemu/machine/machine.py 
> >> b/python/qemu/machine/machine.py
> >> index ef94dcf..557209a 100644
> >> --- a/python/qemu/machine/machine.py
> >> +++ b/python/qemu/machine/machine.py
> >> @@ -501,6 +501,16 @@ def _close_qmp_connection(self) -> None:
> >>  finally:
> >>  self._qmp_connection = None
> >>
> >> +def reopen_qmp_connection(self) -> None:
> >> +"""Close and re-open the QMP connection."""
> >> +self._close_qmp_connection()
> >> +self._qmp_connection = QEMUMonitorProtocol(
> >> +self._monitor_address,
> >> +server=True,
> >> +nickname=self._name
> >> +)
> >> +self._qmp.accept(self._qmp_timer)
> >> +
> >>  def _early_cleanup(self) -> None:
> >>  """
> >>  Perform any cleanup that needs to happen before the VM exits.
> >> --
> >> 1.8.3.1
> >>
> >
> > This code is still mechanically fine as far as I can tell, but I lost
> > the plot on why it's needed - Can you please elaborate for me on what
> > you mean by "qemu exec will close the socket"?
> >
> > (R-B still stands, but since I am rewriting machine.py to be natively
> > async, I should be aware of your new use case.)
>
> Sure, thanks for asking.
>
> During cpr (aka live update), the old qemu process directly execv's the new
> qemu binary.  The monitor socket fd is marked cloexec, so the kernel closes it
> during execv.
>
> - Steve

Oho, I see.

I wonder if you are helped at all by some of Marc-Andre's recent
changes to use socketpair() for QMP connections with machine.py.
Depending on how those sockets are managed, it might be possible to
hold onto one of them without having it closed alongside the old QEMU
process. Or, if that doesn't help, double-check that it doesn't *hurt*
you either. It should be the default as of about a week ago now --
instead of using a local unix socket, we use socketpair and pass the
FD to help alleviate some race conditions.

(Again, I'm fine with this patch either way, just a heads up.)

--js




Re: [PATCH V2] migration: simplify blockers

2023-02-07 Thread Steven Sistare
On 2/7/2023 3:05 PM, Peter Xu wrote:
> On Tue, Feb 07, 2023 at 11:03:13AM -0800, Steve Sistare wrote:
>> Modify migrate_add_blocker and migrate_del_blocker to take an Error **
>> reason.  This allows migration to own the Error object, so that if
>> an error occurs, migration code can free the Error and clear the client
>> handle, simplifying client code.  This is a pre-requisite for a future
>> patch that will allow one Error blocker to be registered for multiple
>> migration modes.
>>
>> No functional change.
>>
>> Signed-off-by: Steve Sistare 
> 
> Acked-by: Peter Xu 
> 
> One trivial comment below.
> 
> [...]
> 
>> diff --git a/include/migration/blocker.h b/include/migration/blocker.h
>> index 9cebe2b..7c8d326 100644
>> --- a/include/migration/blocker.h
>> +++ b/include/migration/blocker.h
>> @@ -17,19 +17,22 @@
>>  /**
>>   * @migrate_add_blocker - prevent migration from proceeding
>>   *
>> - * @reason - an error to be returned whenever migration is attempted
>> + * @reasonp - address of an error to be returned whenever migration is 
>> attempted
>>   *
>>   * @errp - [out] The reason (if any) we cannot block migration right now.
>>   *
>>   * @returns - 0 on success, -EBUSY/-EACCES on failure, with errp set.
>> + *
>> + * *@reasonp is freed and set to NULL if failure is returned.
>> + * On success, the caller no longer owns *@reasonp and must not free it.
> 
> This statement reads weird.  IMHO the caller still owns @reasonp, but if it
> succeeded it should only free it with a migrate_del_blocker() later.

How about:
  On success, the caller must not free @reasonp, except by calling 
migrate_del_blocker.

Same below.

- Steve

>>   */
>> -int migrate_add_blocker(Error *reason, Error **errp);
>> +int migrate_add_blocker(Error **reasonp, Error **errp);
>>  
>>  /**
>>   * @migrate_add_blocker_internal - prevent migration from proceeding without
>>   * only-migrate implications
>>   *
>> - * @reason - an error to be returned whenever migration is attempted
>> + * @reasonp - address of an error to be returned whenever migration is 
>> attempted
>>   *
>>   * @errp - [out] The reason (if any) we cannot block migration right now.
>>   *
>> @@ -38,14 +41,19 @@ int migrate_add_blocker(Error *reason, Error **errp);
>>   * Some of the migration blockers can be temporary (e.g., for a few 
>> seconds),
>>   * so it shouldn't need to conflict with "-only-migratable".  For those 
>> cases,
>>   * we can call this function rather than @migrate_add_blocker().
>> + *
>> + * *@reasonp is freed and set to NULL if failure is returned.
>> + * On success, the caller no longer owns *@reasonp and must not free it.
> 
> Same here?
> 
>>   */
> 



Re: [RFC PATCH] tests: add LKFT baseline test to avocado

2023-02-07 Thread Fabiano Rosas
Alex Bennée  writes:

> The Linux Kernel Function Test (LKFT) project uses QEMU to test a wide
> variety of kernel configurations on wide range of our emulated
> platforms. They publish a known good set of images at:
>
>   https://storage.tuxboot.com/
>
> to help with bisecting regressions in either the kernel, firmware or
> QEMU itself. The tests are pretty lightweight as they contain just a
> kernel with a minimal rootfs which boots a lot faster than most of the
> distros. In time they might be persuaded to version there known good
> baselines and we can then enable proper checksums.
>
> Total run time: 140s
>
> Overall coverage rate:
>   lines..: 8.7% (96412 of 1106284 lines)
>   functions..: 10.8% (11515 of 106651 functions)
>   branches...: 8.3% (30685 of 370255 branches)

Cool!

We could maybe use some accel:tcg tags.

Reviewed-by: Fabiano Rosas 




Re: [PATCH V2] python/machine: QEMUMachine reopen_qmp_connection

2023-02-07 Thread Steven Sistare
On 2/7/2023 3:28 PM, John Snow wrote:
> On Tue, Feb 7, 2023 at 2:03 PM Steve Sistare  
> wrote:
>>
>> Provide reopen_qmp_connection() to reopen a closed monitor connection.
>> This will be needed by cpr, because qemu exec closes the monitor socket.
>>
>> Signed-off-by: Steve Sistare 
>> Reviewed-by: John Snow 
>> ---
>>  python/qemu/machine/machine.py | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
>> index ef94dcf..557209a 100644
>> --- a/python/qemu/machine/machine.py
>> +++ b/python/qemu/machine/machine.py
>> @@ -501,6 +501,16 @@ def _close_qmp_connection(self) -> None:
>>  finally:
>>  self._qmp_connection = None
>>
>> +def reopen_qmp_connection(self) -> None:
>> +"""Close and re-open the QMP connection."""
>> +self._close_qmp_connection()
>> +self._qmp_connection = QEMUMonitorProtocol(
>> +self._monitor_address,
>> +server=True,
>> +nickname=self._name
>> +)
>> +self._qmp.accept(self._qmp_timer)
>> +
>>  def _early_cleanup(self) -> None:
>>  """
>>  Perform any cleanup that needs to happen before the VM exits.
>> --
>> 1.8.3.1
>>
> 
> This code is still mechanically fine as far as I can tell, but I lost
> the plot on why it's needed - Can you please elaborate for me on what
> you mean by "qemu exec will close the socket"?
> 
> (R-B still stands, but since I am rewriting machine.py to be natively
> async, I should be aware of your new use case.)

Sure, thanks for asking.

During cpr (aka live update), the old qemu process directly execv's the new
qemu binary.  The monitor socket fd is marked cloexec, so the kernel closes it
during execv.

- Steve



[PATCH v3 1/2] linux-headers: Update to v6.1

2023-02-07 Thread Peter Xu
Signed-off-by: Peter Xu 
---
 include/standard-headers/drm/drm_fourcc.h |  34 -
 include/standard-headers/linux/ethtool.h  |  63 +++-
 include/standard-headers/linux/fuse.h |   6 +-
 .../linux/input-event-codes.h |   1 +
 include/standard-headers/linux/virtio_blk.h   |  19 +++
 linux-headers/asm-generic/hugetlb_encode.h|  26 ++--
 linux-headers/asm-generic/mman-common.h   |   2 +
 linux-headers/asm-mips/mman.h |   2 +
 linux-headers/asm-riscv/kvm.h |   4 +
 linux-headers/linux/kvm.h |   1 +
 linux-headers/linux/psci.h|  14 ++
 linux-headers/linux/userfaultfd.h |   4 +
 linux-headers/linux/vfio.h| 142 ++
 13 files changed, 298 insertions(+), 20 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index 48b620cbef..b868488f93 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -98,18 +98,42 @@ extern "C" {
 #define DRM_FORMAT_INVALID 0
 
 /* color index */
+#define DRM_FORMAT_C1  fourcc_code('C', '1', ' ', ' ') /* [7:0] 
C0:C1:C2:C3:C4:C5:C6:C7 1:1:1:1:1:1:1:1 eight pixels/byte */
+#define DRM_FORMAT_C2  fourcc_code('C', '2', ' ', ' ') /* [7:0] 
C0:C1:C2:C3 2:2:2:2 four pixels/byte */
+#define DRM_FORMAT_C4  fourcc_code('C', '4', ' ', ' ') /* [7:0] C0:C1 
4:4 two pixels/byte */
 #define DRM_FORMAT_C8  fourcc_code('C', '8', ' ', ' ') /* [7:0] C */
 
-/* 8 bpp Red */
+/* 1 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D1  fourcc_code('D', '1', ' ', ' ') /* [7:0] 
D0:D1:D2:D3:D4:D5:D6:D7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D2  fourcc_code('D', '2', ' ', ' ') /* [7:0] 
D0:D1:D2:D3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D4  fourcc_code('D', '4', ' ', ' ') /* [7:0] D0:D1 
4:4 two pixels/byte */
+
+/* 8 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D8  fourcc_code('D', '8', ' ', ' ') /* [7:0] D */
+
+/* 1 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R1  fourcc_code('R', '1', ' ', ' ') /* [7:0] 
R0:R1:R2:R3:R4:R5:R6:R7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R2  fourcc_code('R', '2', ' ', ' ') /* [7:0] 
R0:R1:R2:R3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R4  fourcc_code('R', '4', ' ', ' ') /* [7:0] R0:R1 
4:4 two pixels/byte */
+
+/* 8 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R8  fourcc_code('R', '8', ' ', ' ') /* [7:0] R */
 
-/* 10 bpp Red */
+/* 10 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R10 fourcc_code('R', '1', '0', ' ') /* [15:0] x:R 
6:10 little endian */
 
-/* 12 bpp Red */
+/* 12 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R12 fourcc_code('R', '1', '2', ' ') /* [15:0] x:R 
4:12 little endian */
 
-/* 16 bpp Red */
+/* 16 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R16 fourcc_code('R', '1', '6', ' ') /* [15:0] R 
little endian */
 
 /* 16 bpp RG */
@@ -204,7 +228,9 @@ extern "C" {
 #define DRM_FORMAT_VYUYfourcc_code('V', 'Y', 'U', 'Y') /* 
[31:0] Y1:Cb0:Y0:Cr0 8:8:8:8 little endian */
 
 #define DRM_FORMAT_AYUVfourcc_code('A', 'Y', 'U', 'V') /* 
[31:0] A:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_AVUYfourcc_code('A', 'V', 'U', 'Y') /* [31:0] 
A:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_XYUVfourcc_code('X', 'Y', 'U', 'V') /* [31:0] 
X:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_XVUYfourcc_code('X', 'V', 'U', 'Y') /* [31:0] 
X:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_VUY888  fourcc_code('V', 'U', '2', '4') /* [23:0] 
Cr:Cb:Y 8:8:8 little endian */
 #define DRM_FORMAT_VUY101010   fourcc_code('V', 'U', '3', '0') /* Y followed 
by U then V, 10:10:10. Non-linear modifier only */
 
diff --git a/include/standard-headers/linux/ethtool.h 
b/include/standard-headers/linux/ethtool.h
index 4537da20cc..1dc56cdc0a 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -736,6 +736,51 @@ enum ethtool_module_power_mode {
ETHTOOL_MODULE_POWER_MODE_HIGH,
 };
 
+/**
+ * enum ethtool_podl_pse_admin_state - operational state of the PoDL PSE
+ * functions. IEEE 

[PATCH v3 2/2] util/userfaultfd: Support /dev/userfaultfd

2023-02-07 Thread Peter Xu
Teach QEMU to use /dev/userfaultfd when it existed and fallback to the
system call if either it's not there or doesn't have enough permission.

Firstly, as long as the app has permission to access /dev/userfaultfd, it
always have the ability to trap kernel faults which QEMU mostly wants.
Meanwhile, in some context (e.g. containers) the userfaultfd syscall can be
forbidden, so it can be the major way to use postcopy in a restricted
environment with strict seccomp setup.

Signed-off-by: Peter Xu 
---
 util/trace-events  |  1 +
 util/userfaultfd.c | 32 
 2 files changed, 33 insertions(+)

diff --git a/util/trace-events b/util/trace-events
index c8f53d7d9f..16f78d8fe5 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -93,6 +93,7 @@ qemu_vfio_region_info(const char *desc, uint64_t region_ofs, 
uint64_t region_siz
 qemu_vfio_pci_map_bar(int index, uint64_t region_ofs, uint64_t region_size, 
int ofs, void *host) "map region bar#%d addr 0x%"PRIx64" size 0x%"PRIx64" ofs 
0x%x host %p"
 
 #userfaultfd.c
+uffd_detect_open_mode(int mode) "%d"
 uffd_query_features_nosys(int err) "errno: %i"
 uffd_query_features_api_failed(int err) "errno: %i"
 uffd_create_fd_nosys(int err) "errno: %i"
diff --git a/util/userfaultfd.c b/util/userfaultfd.c
index 4953b3137d..fdff4867e8 100644
--- a/util/userfaultfd.c
+++ b/util/userfaultfd.c
@@ -18,10 +18,42 @@
 #include 
 #include 
 #include 
+#include 
+
+typedef enum {
+UFFD_UNINITIALIZED = 0,
+UFFD_USE_DEV_PATH,
+UFFD_USE_SYSCALL,
+} uffd_open_mode;
 
 int uffd_open(int flags)
 {
 #if defined(__NR_userfaultfd)
+static uffd_open_mode open_mode;
+static int uffd_dev;
+
+/* Detect how to generate uffd desc when run the 1st time */
+if (open_mode == UFFD_UNINITIALIZED) {
+/*
+ * Make /dev/userfaultfd the default approach because it has better
+ * permission controls, meanwhile allows kernel faults without any
+ * privilege requirement (e.g. SYS_CAP_PTRACE).
+ */
+uffd_dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
+if (uffd_dev >= 0) {
+open_mode = UFFD_USE_DEV_PATH;
+} else {
+/* Fallback to the system call */
+open_mode = UFFD_USE_SYSCALL;
+}
+trace_uffd_detect_open_mode(open_mode);
+}
+
+if (open_mode == UFFD_USE_DEV_PATH) {
+assert(uffd_dev >= 0);
+return ioctl(uffd_dev, USERFAULTFD_IOC_NEW, flags);
+}
+
 return syscall(__NR_userfaultfd, flags);
 #else
 return -EINVAL;
-- 
2.37.3




[PATCH v3 0/2] util/userfaultfd: Support /dev/userfaultfd

2023-02-07 Thread Peter Xu
To Juan: from what I observe on how to update the linux headers, I think it
can simply be in a pull just like a normal patch.

The most recent change and its pull for reference:

https://lore.kernel.org/all/20220915091035.3897-1-chenyi.qi...@intel.com/
https://lore.kernel.org/all/20220926170804.453855-1-th...@redhat.com/

Changelog:

v3:
- Remove __linux__ macro check in util/userfaultfd.c [Juan]
- Avoid using globals, merge functions [Juan]

v2:
- Added R-bs for Phil
- Move open_mode into uffd_detect_open_mode() [Phil]
- Document uffd_open() in the header file [Phil]
- [Discussed with Daniel/Michal, decided to leave fd support for later]

The new /dev/userfaultfd handle is superior to the system call with a
better permission control and also works for a restricted seccomp
environment.

The new device was only introduced in v6.1 so we need a header update.

Please have a look, thanks.

Peter Xu (2):
  linux-headers: Update to v6.1
  util/userfaultfd: Support /dev/userfaultfd

 include/standard-headers/drm/drm_fourcc.h |  34 -
 include/standard-headers/linux/ethtool.h  |  63 +++-
 include/standard-headers/linux/fuse.h |   6 +-
 .../linux/input-event-codes.h |   1 +
 include/standard-headers/linux/virtio_blk.h   |  19 +++
 linux-headers/asm-generic/hugetlb_encode.h|  26 ++--
 linux-headers/asm-generic/mman-common.h   |   2 +
 linux-headers/asm-mips/mman.h |   2 +
 linux-headers/asm-riscv/kvm.h |   4 +
 linux-headers/linux/kvm.h |   1 +
 linux-headers/linux/psci.h|  14 ++
 linux-headers/linux/userfaultfd.h |   4 +
 linux-headers/linux/vfio.h| 142 ++
 util/trace-events |   1 +
 util/userfaultfd.c|  32 
 15 files changed, 331 insertions(+), 20 deletions(-)

-- 
2.37.3




Re: Can we unpoison CONFIG_FOO macros?

2023-02-07 Thread Thomas Huth

On 07/02/2023 16.39, Markus Armbruster wrote:

We have a boatload of CONFIG_FOO macros that may only be used in
target-dependent code.  We use generated config-poison.h to enforce.

This is a bit annoying in the QAPI schema.  Let me demonstrate with an
example: QMP commands query-rocker, query-rocker-ports, and so forth.
These commands are useful only with "rocker" devices.  They are
compile-time optional.  hw/net/Kconfig:

 config ROCKER
 bool
 default y if PCI_DEVICES
 depends on PCI && MSI_NONBROKEN

The rocker device and QMP code is actually target-independent:
hw/net/meson.build puts it into softmmu_ss.

Disabling the "rocker" device type ideally disables the rocker QMP
commands, too.  Should be easy enough: 'if': 'CONFIG_FOO' in the QAPI
schema.

Except that makes the entire code QAPI generates for rocker.json
device-dependent: it now contains #if defined(CONFIG_ROCKER), and
CONFIG_ROCKER is poisoned.  The rocker code implementing monitor
commands also becomes device-dependent, because it includes generated
headers.  We compile all that per target for no sane reason at all.


Well, CONFIG_ROCKER depends on the target config, so that is a sane reason 
in my eyes. Depending on which target you currently compile, it's either set 
or not set - there is no reasonable way to compile generic code and still 
test the CONFIG_ROCKER macro.


So adding such a switch to rocker.json itself just cannot work.

But at the "make" level, we distinguish between config-all-devices.mak and 
*-softmmu-config-devices.mak , so if you want to disable the rocker QMP code 
depending on whether CONFIG_ROCKER is set in *any* target binary or not, you 
would need a similar mechanism to config-all-devices.mak here, e.g. 
something that would switch the inclusion of rocker.json on or off depending 
on the switch in config-all-devices.mak ...

Would it be possible to add such a switch to qapi/meson.build ?
... and to qapi/qapi-schema.json, I guess, since the rocker.json gets 
included from there? Something like that:


diff --git a/qapi/meson.build b/qapi/meson.build
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -59,9 +59,11 @@ if have_system
 'qdev',
 'pci',
 'rdma',
-'rocker',
 'tpm',
   ]
+  if config_all_devices.has_key('CONFIG_ROCKER')
+qapi_all_modules += [ 'rocker', ]
+  endif
 endif
 if have_system or have_tools
   qapi_all_modules += [
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -72,7 +72,7 @@
 { 'include': 'job.json' }
 { 'include': 'net.json' }
 { 'include': 'rdma.json' }
-{ 'include': 'rocker.json' }
+{ 'include': 'rocker.json', 'if': 'CONFIG_ROCKER' }
 { 'include': 'tpm.json' }
 { 'include': 'ui.json' }
 { 'include': 'authz.json' }

No clue whether the qapi parser could be extended like that, though...

 Thomas




Re: [PATCH v2 07/10] hw/ide/piix: Require an ISABus only for user-created instances

2023-02-07 Thread Mark Cave-Ayland

On 06/02/2023 23:40, Bernhard Beschow wrote:


Am 5. Februar 2023 22:32:03 UTC schrieb Mark Cave-Ayland 
:

On 05/02/2023 22:21, BALATON Zoltan wrote:


On Sun, 5 Feb 2023, Mark Cave-Ayland wrote:

On 26/01/2023 21:17, Bernhard Beschow wrote:

Internal instances now defer interrupt wiring to the caller which
decouples them from the ISABus. User-created devices still fish out the
ISABus from the QOM tree and the interrupt wiring remains in PIIX IDE.
The latter mechanism is considered a workaround and intended to be
removed once a deprecation period for user-created PIIX IDE devices is
over.

Signed-off-by: Bernhard Beschow 
---
   include/hw/ide/pci.h |  1 +
   hw/ide/piix.c    | 64 ++--
   hw/isa/piix.c    |  5 
   3 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index 24c0b7a2dd..ee2c8781b7 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -54,6 +54,7 @@ struct PCIIDEState {
   MemoryRegion bmdma_bar;
   MemoryRegion cmd_bar[2];
   MemoryRegion data_bar[2];
+    bool user_created;
   };
     static inline IDEState *bmdma_active_if(BMDMAState *bmdma)
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 5980045db0..f0d95761ac 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -108,6 +108,13 @@ static void bmdma_setup_bar(PCIIDEState *d)
   }
   }
   +static void piix_ide_set_irq(void *opaque, int n, int level)
+{
+    PCIIDEState *d = opaque;
+
+    qemu_set_irq(d->isa_irqs[n], level);
+}
+
   static void piix_ide_reset(DeviceState *dev)
   {
   PCIIDEState *d = PCI_IDE(dev);
@@ -138,11 +145,18 @@ static void pci_piix_init_ports(PCIIDEState *d, ISABus 
*isa_bus)
   };
   int i;
   +    if (isa_bus) {
+    d->isa_irqs[0] = isa_bus->irqs[port_info[0].isairq];
+    d->isa_irqs[1] = isa_bus->irqs[port_info[1].isairq];
+    } else {
+    qdev_init_gpio_out(DEVICE(d), d->isa_irqs, 2);
+    }
+
   for (i = 0; i < 2; i++) {
   ide_bus_init(>bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
   ide_init_ioport(>bus[i], NULL, port_info[i].iobase,
   port_info[i].iobase2);
-    ide_init2(>bus[i], isa_bus->irqs[port_info[i].isairq]);
+    ide_init2(>bus[i], qdev_get_gpio_in(DEVICE(d), i));
     bmdma_init(>bus[i], >bmdma[i], d);
   d->bmdma[i].bus = >bus[i];
@@ -154,8 +168,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
   {
   PCIIDEState *d = PCI_IDE(dev);
   uint8_t *pci_conf = dev->config;
-    ISABus *isa_bus;
-    bool ambiguous;
+    ISABus *isa_bus = NULL;
     pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
   @@ -164,22 +177,36 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
     vmstate_register(VMSTATE_IF(dev), 0, _ide_pci, d);
   -    isa_bus = ISA_BUS(object_resolve_path_type("", TYPE_ISA_BUS, 
));
-    if (ambiguous) {
-    error_setg(errp,
-   "More than one ISA bus found while %s supports only one",
-   object_get_typename(OBJECT(dev)));
-    return;
-    }
-    if (!isa_bus) {
-    error_setg(errp, "No ISA bus found while %s requires one",
-   object_get_typename(OBJECT(dev)));
-    return;
+    if (d->user_created) {
+    bool ambiguous;
+
+    isa_bus = ISA_BUS(object_resolve_path_type("", TYPE_ISA_BUS,
+   ));
+
+    if (ambiguous) {
+    error_setg(errp,
+   "More than one ISA bus found while %s supports only 
one",
+   object_get_typename(OBJECT(dev)));
+    return;
+    }
+
+    if (!isa_bus) {
+    error_setg(errp, "No ISA bus found while %s requires one",
+   object_get_typename(OBJECT(dev)));
+    return;
+    }
   }
     pci_piix_init_ports(d, isa_bus);
   }
   +static void pci_piix_ide_init(Object *obj)
+{
+    DeviceState *dev = DEVICE(obj);
+
+    qdev_init_gpio_in(dev, piix_ide_set_irq, 2);
+}
+
   static void pci_piix_ide_exitfn(PCIDevice *dev)
   {
   PCIIDEState *d = PCI_IDE(dev);
@@ -191,6 +218,11 @@ static void pci_piix_ide_exitfn(PCIDevice *dev)
   }
   }
   +static Property piix_ide_properties[] = {
+    DEFINE_PROP_BOOL("user-created", PCIIDEState, user_created, true),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
   /* NOTE: for the PIIX3, the IRQs and IOports are hardcoded */
   static void piix3_ide_class_init(ObjectClass *klass, void *data)
   {
@@ -205,11 +237,13 @@ static void piix3_ide_class_init(ObjectClass *klass, void 
*data)
   k->class_id = PCI_CLASS_STORAGE_IDE;
   set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
   dc->hotpluggable = false;
+    device_class_set_props(dc, piix_ide_properties);
   }
     static const TypeInfo piix3_ide_info = {
   .name  = TYPE_PIIX3_IDE,
   .parent    = TYPE_PCI_IDE,
+    .instance_init = 

[RFC PATCH] tests: add LKFT baseline test to avocado

2023-02-07 Thread Alex Bennée
The Linux Kernel Function Test (LKFT) project uses QEMU to test a wide
variety of kernel configurations on wide range of our emulated
platforms. They publish a known good set of images at:

  https://storage.tuxboot.com/

to help with bisecting regressions in either the kernel, firmware or
QEMU itself. The tests are pretty lightweight as they contain just a
kernel with a minimal rootfs which boots a lot faster than most of the
distros. In time they might be persuaded to version there known good
baselines and we can then enable proper checksums.

Total run time: 140s

Overall coverage rate:
  lines..: 8.7% (96412 of 1106284 lines)
  functions..: 10.8% (11515 of 106651 functions)
  branches...: 8.3% (30685 of 370255 branches)

Signed-off-by: Alex Bennée 
Cc: Anders Roxell 
---
 tests/avocado/lkft_baselines.py | 257 
 1 file changed, 257 insertions(+)
 create mode 100755 tests/avocado/lkft_baselines.py

diff --git a/tests/avocado/lkft_baselines.py b/tests/avocado/lkft_baselines.py
new file mode 100755
index 00..f347150df7
--- /dev/null
+++ b/tests/avocado/lkft_baselines.py
@@ -0,0 +1,257 @@
+# Functional test that boots known good lkft images
+#
+# Copyright (c) 2023 Linaro Ltd.
+#
+# Author:
+#  Alex Bennée 
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import os
+import time
+
+from avocado_qemu import QemuSystemTest
+from avocado_qemu import exec_command, exec_command_and_wait_for_pattern
+from avocado_qemu import wait_for_console_pattern
+from avocado.utils import process
+from avocado.utils.path import find_command
+
+
+class LKFTBaselineTest(QemuSystemTest):
+KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 root=/dev/vda'
+
+def setUp(self):
+super().setUp()
+
+# We need zstd for all the lkft tests
+zstd = find_command('zstd', False)
+if zstd is False:
+self.cancel('Could not find "zstd", which is required to '
+'decompress rootfs')
+self.zstd = zstd
+
+# Process the LKFT specific tags, most machines work with
+# reasonable defaults but we sometimes need to tweak the
+# config.
+
+# The lfkt tag matches the root directory
+self.lkft = self.params.get('lkft',
+default = self._get_unique_tag_val('lkft'))
+
+# Most Linux's use ttyS0 for their serial port
+console = self.params.get('console',
+  default = 
self._get_unique_tag_val('console'))
+if console:
+self.console = console
+else:
+self.console = "ttyS0"
+
+# Does the machine shutdown QEMU nicely on "halt"
+self.shutdown = self.params.get('shutdown',
+default = 
self._get_unique_tag_val('shutdown'))
+
+# The name of the kernel Image file
+image = self.params.get('image',
+default = self._get_unique_tag_val('image'))
+if not image:
+self.image = "Image"
+else:
+self.image = image
+
+# The block device drive type
+drive = self.params.get('drive',
+default = self._get_unique_tag_val('drive'))
+if not drive:
+self.drive = "virtio-blk-device"
+else:
+self.drive = drive
+
+def wait_for_console_pattern(self, success_message, vm=None):
+wait_for_console_pattern(self, success_message,
+ failure_message='Kernel panic - not syncing',
+ vm=vm)
+
+def fetch_lkft_assets(self, dt=None):
+"""
+Fetch the LKFT assets. They are stored in a standard way so we
+use the per-test tags to fetch details.
+"""
+base_url = f"https://storage.tuxboot.com/{self.lkft}/;
+kernel_image =  self.fetch_asset(base_url + self.image)
+disk_image_zst = self.fetch_asset(base_url + "rootfs.ext4.zst")
+
+cmd = f"{self.zstd} -d {disk_image_zst} -o {self.workdir}/rootfs.ext4"
+process.run(cmd)
+
+if dt:
+dtb = self.fetch_asset(base_url + dt)
+else:
+dtb = None
+
+return (kernel_image, self.workdir + "/rootfs.ext4", dtb)
+
+def prepare_run(self, kernel, disk, dtb=None):
+"""
+Setup to run and add the common parameters to the system
+"""
+self.vm.set_console()
+
+# all block devices are raw ext4's
+blockdev = "driver=raw,file.driver=file," \
++ f"file.filename={disk},node-name=hd0"
+
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + \
+f" console={self.console}"
+
+self.vm.add_args('-kernel', kernel,
+ '-append', kernel_command_line,
+ '-blockdev', blockdev,
+ '-device', f"{self.drive},drive=hd0")
+
+# Some machines 

[PATCH v4 4/4] iotests/detect-zeroes-registered-buf: add new test

2023-02-07 Thread Stefan Hajnoczi
This regression test demonstrates that detect-zeroes works with
registered buffers. Bug details:
https://gitlab.com/qemu-project/qemu/-/issues/1404

Reviewed-by: Eric Blake 
Reviewed-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
---
 .../tests/detect-zeroes-registered-buf| 58 +++
 .../tests/detect-zeroes-registered-buf.out|  7 +++
 2 files changed, 65 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/detect-zeroes-registered-buf
 create mode 100644 tests/qemu-iotests/tests/detect-zeroes-registered-buf.out

diff --git a/tests/qemu-iotests/tests/detect-zeroes-registered-buf 
b/tests/qemu-iotests/tests/detect-zeroes-registered-buf
new file mode 100755
index 00..edb5f2cee5
--- /dev/null
+++ b/tests/qemu-iotests/tests/detect-zeroes-registered-buf
@@ -0,0 +1,58 @@
+#!/usr/bin/env bash
+# group: rw auto quick
+#
+# Check that detect-zeroes=unmap works on writes with registered I/O buffers.
+# This is a regression test for
+# https://gitlab.com/qemu-project/qemu/-/issues/1404 where I/O requests failed
+# unexpectedly.
+#
+# Copyright Red Hat
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=stefa...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+cd ..
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto generic
+
+size=128M
+_make_test_img $size
+IMGSPEC="driver=$IMGFMT,file.filename=$TEST_IMG,discard=unmap,detect-zeroes=unmap"
+
+echo
+echo "== writing zero buffer to image =="
+QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS_NO_FMT" $QEMU_IO -c "write -r -P 0 0 4k" 
--image-opts "$IMGSPEC" | _filter_qemu_io
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/detect-zeroes-registered-buf.out 
b/tests/qemu-iotests/tests/detect-zeroes-registered-buf.out
new file mode 100644
index 00..42c56fcc8d
--- /dev/null
+++ b/tests/qemu-iotests/tests/detect-zeroes-registered-buf.out
@@ -0,0 +1,7 @@
+QA output created by detect-zeroes-registered-buf
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
+
+== writing zero buffer to image ==
+wrote 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+*** done
-- 
2.39.1




[PATCH v4 0/4] block: fix detect-zeroes= with BDRV_REQ_REGISTERED_BUF

2023-02-07 Thread Stefan Hajnoczi
v4:
- Add 'r' to read_f() getopt() call [Hanna]
- Fix qemu_io_alloc() and friends buf and len with qemuio_misalign [Hanna]
- Fix qemu_iovec_destroy()/qemu_io_free() ordering in aio_write_done() [Hanna]
- Add mutually exclusive -z -r option check in aio_write_f() [Hanna]
v3:
- Restore alphabetical order in getopt strings [Eric]
v2:
- Add comment explaining unbalanced error code path in
  qemu_io_alloc_from_file() [Eric]
- List options alphabetically in help output [Eric]
- Add Tested-by/Reviewed-by
- CC qemu-stable on the fix

The first patch fixes a regression in QEMU 7.2 where detect-zeroes breaks with
virtio-blk devices due to a BDRV_REQ_REGISTERED_BUF bug. Details of the
regression can be found here:
https://gitlab.com/qemu-project/qemu/-/issues/1404

The remaining patches add a regression test that will protect this code path in
the future. The qemu-io command is extended with the new -r option that calls
blk_register_buf(). This allows a qemu-iotests test case to trigger the same
bug as virtio-blk.

Stefan Hajnoczi (4):
  block: fix detect-zeroes= with BDRV_REQ_REGISTERED_BUF
  qemu-io: use BdrvRequestFlags instead of int
  qemu-io: add -r option to register I/O buffer
  iotests/detect-zeroes-registered-buf: add new test

 block/io.c|   3 +
 qemu-io-cmds.c| 215 +++---
 .../tests/detect-zeroes-registered-buf|  58 +
 .../tests/detect-zeroes-registered-buf.out|   7 +
 4 files changed, 203 insertions(+), 80 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/detect-zeroes-registered-buf
 create mode 100644 tests/qemu-iotests/tests/detect-zeroes-registered-buf.out

-- 
2.39.1




[PATCH v4 3/4] qemu-io: add -r option to register I/O buffer

2023-02-07 Thread Stefan Hajnoczi
The blk_register_buf() API is an optimization hint that allows some
block drivers to avoid I/O buffer housekeeping or bounce buffers.

Add an -r option to register the I/O buffer so that qemu-io can be used
to test the blk_register_buf() API. The next commit will add a test that
uses the new option.

Reviewed-by: Eric Blake 
Signed-off-by: Stefan Hajnoczi 
---
 qemu-io-cmds.c | 204 +++--
 1 file changed, 129 insertions(+), 75 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 1f60c23ba4..e7a02f5b99 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -338,7 +338,8 @@ static int parse_pattern(const char *arg)
  */
 
 #define MISALIGN_OFFSET 16
-static void *qemu_io_alloc(BlockBackend *blk, size_t len, int pattern)
+static void *qemu_io_alloc(BlockBackend *blk, size_t len, int pattern,
+   bool register_buf)
 {
 void *buf;
 
@@ -347,16 +348,24 @@ static void *qemu_io_alloc(BlockBackend *blk, size_t len, 
int pattern)
 }
 buf = blk_blockalign(blk, len);
 memset(buf, pattern, len);
+if (register_buf) {
+blk_register_buf(blk, buf, len, _abort);
+}
 if (qemuio_misalign) {
 buf += MISALIGN_OFFSET;
 }
 return buf;
 }
 
-static void qemu_io_free(void *p)
+static void qemu_io_free(BlockBackend *blk, void *p, size_t len,
+ bool unregister_buf)
 {
 if (qemuio_misalign) {
 p -= MISALIGN_OFFSET;
+len += MISALIGN_OFFSET;
+}
+if (unregister_buf) {
+blk_unregister_buf(blk, p, len);
 }
 qemu_vfree(p);
 }
@@ -371,14 +380,16 @@ static void qemu_io_free(void *p)
  * @blk - the block backend where the buffer content is going to be written to
  * @len - the buffer length
  * @file_name - the file to read the content from
+ * @register_buf - call blk_register_buf()
  *
  * Returns: the buffer pointer on success
  *  NULL on error
  */
 static void *qemu_io_alloc_from_file(BlockBackend *blk, size_t len,
- const char *file_name)
+ const char *file_name, bool register_buf)
 {
-char *buf, *buf_origin;
+size_t alloc_len = len + (qemuio_misalign ? MISALIGN_OFFSET : 0);
+char *alloc_buf, *buf, *end;
 FILE *f = fopen(file_name, "r");
 int pattern_len;
 
@@ -387,19 +398,13 @@ static void *qemu_io_alloc_from_file(BlockBackend *blk, 
size_t len,
 return NULL;
 }
 
-if (qemuio_misalign) {
-len += MISALIGN_OFFSET;
-}
-
-buf_origin = buf = blk_blockalign(blk, len);
+alloc_buf = buf = blk_blockalign(blk, alloc_len);
 
 if (qemuio_misalign) {
-buf_origin += MISALIGN_OFFSET;
 buf += MISALIGN_OFFSET;
-len -= MISALIGN_OFFSET;
 }
 
-pattern_len = fread(buf_origin, 1, len, f);
+pattern_len = fread(buf, 1, len, f);
 
 if (ferror(f)) {
 perror(file_name);
@@ -414,24 +419,23 @@ static void *qemu_io_alloc_from_file(BlockBackend *blk, 
size_t len,
 fclose(f);
 f = NULL;
 
-if (len > pattern_len) {
-len -= pattern_len;
-buf += pattern_len;
-
-while (len > 0) {
-size_t len_to_copy = MIN(pattern_len, len);
-
-memcpy(buf, buf_origin, len_to_copy);
+if (register_buf) {
+blk_register_buf(blk, alloc_buf, alloc_len, _abort);
+}
 
-len -= len_to_copy;
-buf += len_to_copy;
-}
+end = buf + len;
+for (char *p = buf + pattern_len; p < end; p += pattern_len) {
+memcpy(p, buf, MIN(pattern_len, end - p));
 }
 
-return buf_origin;
+return buf;
 
 error:
-qemu_io_free(buf_origin);
+/*
+ * This code path is only taken before blk_register_buf() is called, so
+ * hardcode the qemu_io_free() unregister_buf argument to false.
+ */
+qemu_io_free(blk, alloc_buf, alloc_len, false);
 if (f) {
 fclose(f);
 }
@@ -490,7 +494,7 @@ static void print_report(const char *op, struct timespec 
*t, int64_t offset,
  */
 static void *
 create_iovec(BlockBackend *blk, QEMUIOVector *qiov, char **argv, int nr_iov,
- int pattern)
+ int pattern, bool register_buf)
 {
 size_t *sizes = g_new0(size_t, nr_iov);
 size_t count = 0;
@@ -526,7 +530,7 @@ create_iovec(BlockBackend *blk, QEMUIOVector *qiov, char 
**argv, int nr_iov,
 
 qemu_iovec_init(qiov, nr_iov);
 
-buf = p = qemu_io_alloc(blk, count, pattern);
+buf = p = qemu_io_alloc(blk, count, pattern, register_buf);
 
 for (i = 0; i < nr_iov; i++) {
 qemu_iovec_add(qiov, p, sizes[i]);
@@ -539,7 +543,7 @@ fail:
 }
 
 static int do_pread(BlockBackend *blk, char *buf, int64_t offset,
-int64_t bytes, int64_t *total)
+int64_t bytes, BdrvRequestFlags flags, int64_t *total)
 {
 int ret;
 
@@ -547,7 +551,7 @@ static int do_pread(BlockBackend *blk, char *buf, int64_t 
offset,
 return -ERANGE;
   

[PATCH v4 1/4] block: fix detect-zeroes= with BDRV_REQ_REGISTERED_BUF

2023-02-07 Thread Stefan Hajnoczi
When a write request is converted into a write zeroes request by the
detect-zeroes= feature, it is no longer associated with an I/O buffer.
The BDRV_REQ_REGISTERED_BUF flag doesn't make sense without an I/O
buffer and must be cleared because bdrv_co_do_pwrite_zeroes() fails with
-EINVAL when it's set.

Fiona Ebner  bisected and diagnosed this QEMU 7.2
regression where writes containing zeroes to a blockdev with
discard=unmap,detect-zeroes=unmap fail.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1404
Fixes: e8b6535533be ("block: add BDRV_REQ_REGISTERED_BUF request flag")
Tested-by: Fiona Ebner 
Cc: qemu-sta...@nongnu.org
Reviewed-by: Eric Blake 
Reviewed-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
---
 block/io.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/io.c b/block/io.c
index 2dc0c13e41..d2be37b11e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1926,6 +1926,9 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
 flags |= BDRV_REQ_MAY_UNMAP;
 }
+
+/* Can't use optimization hint with bufferless zero write */
+flags &= ~BDRV_REQ_REGISTERED_BUF;
 }
 
 if (ret < 0) {
-- 
2.39.1




[PATCH v4 2/4] qemu-io: use BdrvRequestFlags instead of int

2023-02-07 Thread Stefan Hajnoczi
The block layer APIs use BdrvRequestFlags while qemu-io code uses int.
Although the code compiles and runs fine, BdrvRequestFlags is clearer
because it differentiates between other types of flags like bdrv_open()
flags.

This is purely refactoring.

Reviewed-by: Eric Blake 
Reviewed-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
---
 qemu-io-cmds.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index a061031615..1f60c23ba4 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -556,7 +556,7 @@ static int do_pread(BlockBackend *blk, char *buf, int64_t 
offset,
 }
 
 static int do_pwrite(BlockBackend *blk, char *buf, int64_t offset,
- int64_t bytes, int flags, int64_t *total)
+ int64_t bytes, BdrvRequestFlags flags, int64_t *total)
 {
 int ret;
 
@@ -573,7 +573,8 @@ static int do_pwrite(BlockBackend *blk, char *buf, int64_t 
offset,
 }
 
 static int do_pwrite_zeroes(BlockBackend *blk, int64_t offset,
-   int64_t bytes, int flags, int64_t *total)
+   int64_t bytes, BdrvRequestFlags flags,
+   int64_t *total)
 {
 int ret = blk_pwrite_zeroes(blk, offset, bytes,
 flags | BDRV_REQ_ZERO_WRITE);
@@ -651,7 +652,7 @@ static int do_aio_readv(BlockBackend *blk, QEMUIOVector 
*qiov,
 }
 
 static int do_aio_writev(BlockBackend *blk, QEMUIOVector *qiov,
- int64_t offset, int flags, int *total)
+ int64_t offset, BdrvRequestFlags flags, int *total)
 {
 int async_ret = NOT_DONE;
 
@@ -1028,7 +1029,7 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 struct timespec t1, t2;
 bool Cflag = false, qflag = false, bflag = false;
 bool Pflag = false, zflag = false, cflag = false, sflag = false;
-int flags = 0;
+BdrvRequestFlags flags = 0;
 int c, cnt, ret;
 char *buf = NULL;
 int64_t offset;
@@ -1229,7 +1230,7 @@ static int writev_f(BlockBackend *blk, int argc, char 
**argv)
 {
 struct timespec t1, t2;
 bool Cflag = false, qflag = false;
-int flags = 0;
+BdrvRequestFlags flags = 0;
 int c, cnt, ret;
 char *buf;
 int64_t offset;
@@ -1544,7 +1545,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char 
**argv)
 int nr_iov, c;
 int pattern = 0xcd;
 struct aio_ctx *ctx = g_new0(struct aio_ctx, 1);
-int flags = 0;
+BdrvRequestFlags flags = 0;
 
 ctx->blk = blk;
 while ((c = getopt(argc, argv, "CfiqP:uz")) != -1) {
-- 
2.39.1




Re: [PATCH 0/9] hw: Use QOM alias properties and few QOM/QDev cleanups

2023-02-07 Thread Mark Cave-Ayland

On 06/02/2023 23:04, Philippe Mathieu-Daudé wrote:


On 6/2/23 22:54, Mark Cave-Ayland wrote:

On 06/02/2023 15:27, Philippe Mathieu-Daudé wrote:


On 6/2/23 00:29, Mark Cave-Ayland wrote:

On 03/02/2023 11:36, Philippe Mathieu-Daudé wrote:


These patches are extracted from a QOM/QDev refactor series,
so they are preliminary cleanups noticed while working on it:

- Use correct type when calling qdev_prop_set_xxx()
- Unify some qdev properties in MIPS models
- Replace intermediate properties by link properties
- Remove DEFINE_PROP_DMAADDR() macro which is used one time
- Use qdev_realize_and_unref() instead of open-coding it



I must admit to being slightly nervous about using QOM alias properties in this 
way, simply because you start creating implicit dependencies between QOM objects. 
How would this work when trying to build machines from configuration files and/or 
the monitor? Or are the changes restricted to container devices i.e. those which 
consist of in-built child devices?


The latter. All parents forward a property to a contained child.

The parent forwarding property is replaced by a link into the child,
so accessing the parent property transparently access the child one.

The dependencies are already explicit. We can not create a parent
without its children (the children creation is implicit when we
create the parent object).

I thought this was the canonical QOM alias properties use. What is
the normal use then?


The problem I've found with this approach in the past is that it fails when you 
have more than one child device of the same type.


For example imagine the scenario where there is a QEMU device that contains 2 child 
UARTs and each UART has a property to disable hardware handshaking: if you add a 
property alias to the container device, it can only map to a single child UART. 
Furthermore if you then try to alias the UART IRQs onto the container device using 
qdev_pass_gpios(), then that also fails with 2 UARTs because the gpios from each 
UART have the same property name.


I noticed some qdev gpio namespace issues. Thanks for pointing that
qdev_pass_gpios() restriction.

You could then think about solving that problem by using 
object_property_add_alias() directly to specify a different property name for each 
UART's mapped property on the container device, but then you end up accessing the 
child UART properties with different names, but only when using that particular 
parent container device(!).


For this reason I've tended to avoid aliases and setup child objects from the 
container like this:


    static void container_init(Object *obj)
    {
    object_initialize_child(obj, "uart0", >uart0, TYPE_UART);
    object_initialize_child(obj, "uart1", >uart1, TYPE_UART);
    }


Hmm OK, this is the approach used in IMX:

@@ -120,8 +120,12 @@ static void fsl_imx6ul_init(Object *obj)
   * Ethernet
   */
  for (i = 0; i < FSL_IMX6UL_NUM_ETHS; i++) {
+    g_autofree gchar *propname = g_strdup_printf("fec%d-phy-num", i + 1);
  snprintf(name, NAME_SIZE, "eth%d", i);
  object_initialize_child(obj, name, >eth[i], TYPE_IMX_ENET);
+    qdev_prop_set_uint32(DEVICE(>eth[i]), "phy-num", i);
+    object_property_add_alias(obj, propname,
+  OBJECT(>eth[i]), "phy-num");
  }

But then this is how it is used today:

  static Property fsl_imx6ul_properties[] = {
-    DEFINE_PROP_UINT32("fec1-phy-num", FslIMX6ULState, phy_num[0], 0),
-    DEFINE_PROP_UINT32("fec2-phy-num", FslIMX6ULState, phy_num[1], 1),
  DEFINE_PROP_END_OF_LIST(),
  };

What do you mean by "you end up accessing the child UART properties with
different names, but only when using that particular parent container
device(!)."? I tend to see QOM modelling as matching hardware design,
so if a container is used, there is a similar relationship / hierarchy
in the hardware, then accessing the children via a particular parent
container path sounds the correct way. QOM indexed child must have the
same meaning in the hardware layout.

And then when configuring the board it is possible to obtain the UART references 
like this:


    uart0 = UART(object_resolve_path_component(OBJECT(container), "uart0"));
    irq0 = qdev_connect_gpio_out(DEVICE(uart0), 0, ... );

    uart1 = UART(object_resolve_path_component(OBJECT(container), "uart1"));
    irq1 = qdev_connect_gpio_out(DEVICE(uart1), 0, ... );

This allows all UART configuration to be done in the same way regardless of the 
parent container device and number of child devices, and without having to think 
about using different property names depending upon the container device.


OK I think this is the same explanation as what I just wrote earlier.


Yes indeed, I think we're on the same page here now.

One place where it could conceivably be useful is where you have a chip modelled as 
a device and you want to expose the memory regions and IRQs to an interface such as 
ISA, but often even that doesn't 

Re: [PATCH V2] python/machine: QEMUMachine reopen_qmp_connection

2023-02-07 Thread John Snow
On Tue, Feb 7, 2023 at 2:03 PM Steve Sistare  wrote:
>
> Provide reopen_qmp_connection() to reopen a closed monitor connection.
> This will be needed by cpr, because qemu exec closes the monitor socket.
>
> Signed-off-by: Steve Sistare 
> Reviewed-by: John Snow 
> ---
>  python/qemu/machine/machine.py | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index ef94dcf..557209a 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -501,6 +501,16 @@ def _close_qmp_connection(self) -> None:
>  finally:
>  self._qmp_connection = None
>
> +def reopen_qmp_connection(self) -> None:
> +"""Close and re-open the QMP connection."""
> +self._close_qmp_connection()
> +self._qmp_connection = QEMUMonitorProtocol(
> +self._monitor_address,
> +server=True,
> +nickname=self._name
> +)
> +self._qmp.accept(self._qmp_timer)
> +
>  def _early_cleanup(self) -> None:
>  """
>  Perform any cleanup that needs to happen before the VM exits.
> --
> 1.8.3.1
>

This code is still mechanically fine as far as I can tell, but I lost
the plot on why it's needed - Can you please elaborate for me on what
you mean by "qemu exec will close the socket"?

(R-B still stands, but since I am rewriting machine.py to be natively
async, I should be aware of your new use case.)

--js




Re: [PATCH V2] memory: RAM_NAMED_FILE flag

2023-02-07 Thread Peter Xu
On Tue, Feb 07, 2023 at 11:03:33AM -0800, Steve Sistare wrote:
> migrate_ignore_shared() is an optimization that avoids copying memory
> that is visible and can be mapped on the target.  However, a
> memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED
> flag set is not migrated when migrate_ignore_shared() is true.  This is
> wrong, because the block has no named backing store, and its contents will
> be lost.  To fix, ignore shared memory iff it is a named file.  Define a
> new flag RAM_NAMED_FILE to distinguish this case.

There's also TYPE_MEMORY_BACKEND_EPC.  Reading the commit message it seems
it can still be used in similar ways.  Pasting commit message from c6c0232:

Because of its unique requirements, Linux manages EPC separately from
normal memory.  Similar to memfd, the device /dev/sgx_vepc can be
opened to obtain a file descriptor which can in turn be used to mmap()
EPC memory.

I'm not sure whether it means that should apply for RAM_NAMED_FILE too,
neither do I think it's super important..  Still better to define it
properly.

Another comment is, AFAIK this patch will modify senamtics of the old
capability "x-ignore-shared".  But I'd say in a sensible way.  Maybe worth
directly modify qapi/migration.json to reflect it (especially it's x-
prefixed) to something like:

# @x-ignore-shared: If enabled, QEMU will not migrate named shared memory
#   (since 4.0) 

Thanks,

-- 
Peter Xu




[PATCH v2 2/5] gitlab-ci.d/buildtest: Remove aarch64-softmmu from the build-system-ubuntu job

2023-02-07 Thread Thomas Huth
aarch64-softmmu is also checked on the same version of Ubuntu in the
gcov job, so it is redundant to check again in the normal ubuntu job.

Signed-off-by: Thomas Huth 
---
 .gitlab-ci.d/buildtest.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 8f332fc36f..8fff961b44 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -42,7 +42,7 @@ build-system-ubuntu:
   variables:
 IMAGE: ubuntu2004
 CONFIGURE_ARGS: --enable-docs --enable-fdt=system --enable-capstone
-TARGETS: aarch64-softmmu alpha-softmmu cris-softmmu hppa-softmmu
+TARGETS: alpha-softmmu cris-softmmu hppa-softmmu
   microblazeel-softmmu mips64el-softmmu
 MAKE_CHECK_ARGS: check-build
   artifacts:
-- 
2.31.1




[PATCH v2 3/5] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job

2023-02-07 Thread Thomas Huth
We can get rid of the build-coroutine-sigaltstack job by moving
the configure flags that should be tested here to other jobs:
Move --with-coroutine=sigaltstack to the build-system-debian job
(where the coroutines should get some more test coverage with
"make check-block", too) and --enable-trace-backends=ftrace to
the cross-s390x-kvm-only job.

Signed-off-by: Thomas Huth 
---
 .gitlab-ci.d/buildtest.yml   | 13 +
 .gitlab-ci.d/crossbuilds.yml |  2 +-
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 8fff961b44..8697c61072 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -74,6 +74,7 @@ build-system-debian:
 job: amd64-debian-container
   variables:
 IMAGE: debian-amd64
+CONFIGURE_ARGS: --with-coroutine=sigaltstack
 TARGETS: arm-softmmu avr-softmmu i386-softmmu mipsel-softmmu
   riscv64-softmmu sh4eb-softmmu sparc-softmmu xtensaeb-softmmu
 MAKE_CHECK_ARGS: check-build
@@ -534,18 +535,6 @@ build-tci:
 - QTEST_QEMU_BINARY="./qemu-system-s390x" ./tests/qtest/pxe-test -m slow
 - make check-tcg
 
-# Alternate coroutines implementations are only really of interest to KVM users
-# However we can't test against KVM on Gitlab-CI so we can only run unit tests
-build-coroutine-sigaltstack:
-  extends: .native_build_job_template
-  needs:
-job: amd64-ubuntu2004-container
-  variables:
-IMAGE: ubuntu2004
-CONFIGURE_ARGS: --with-coroutine=sigaltstack --disable-tcg
---enable-trace-backends=ftrace
-MAKE_CHECK_ARGS: check-unit
-
 # Check our reduced build configurations
 build-without-defaults:
   extends: .native_build_job_template
diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index 74d6259b90..57637c5127 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -159,7 +159,7 @@ cross-s390x-kvm-only:
 job: s390x-debian-cross-container
   variables:
 IMAGE: debian-s390x-cross
-EXTRA_CONFIGURE_OPTS: --disable-tcg
+EXTRA_CONFIGURE_OPTS: --disable-tcg --enable-trace-backends=ftrace
 
 cross-mips64el-kvm-only:
   extends: .cross_accel_build_job
-- 
2.31.1




[PATCH v2 4/5] .gitlab-ci.d/buildtest-template: Simplify the configure step

2023-02-07 Thread Thomas Huth
It's easier to use ${TARGETS:+--target-list="$TARGETS"} to add
a --target-list parameter depending on whether the TARGETS variable
is set or not.

Signed-off-by: Thomas Huth 
---
 .gitlab-ci.d/buildtest-template.yml | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/.gitlab-ci.d/buildtest-template.yml 
b/.gitlab-ci.d/buildtest-template.yml
index 73ecfabb8d..4a922d9c33 100644
--- a/.gitlab-ci.d/buildtest-template.yml
+++ b/.gitlab-ci.d/buildtest-template.yml
@@ -11,12 +11,10 @@
   fi
 - mkdir build
 - cd build
-- if test -n "$TARGETS";
-  then
-../configure --enable-werror --disable-docs ${LD_JOBS:+--meson=git} 
$CONFIGURE_ARGS --target-list="$TARGETS" ;
-  else
-../configure --enable-werror --disable-docs ${LD_JOBS:+--meson=git} 
$CONFIGURE_ARGS ;
-  fi || { cat config.log meson-logs/meson-log.txt && exit 1; }
+- ../configure --enable-werror --disable-docs
+  ${LD_JOBS:+--meson=git} ${TARGETS:+--target-list="$TARGETS"}
+  $CONFIGURE_ARGS ||
+  { cat config.log meson-logs/meson-log.txt && exit 1; }
 - if test -n "$LD_JOBS";
   then
 ../meson/meson.py configure . -Dbackend_max_links="$LD_JOBS" ;
-- 
2.31.1




[PATCH v2 1/5] build: deprecate --enable-gprof builds and remove from CI

2023-02-07 Thread Thomas Huth
From: Alex Bennée 

As gprof relies on instrumentation you rarely get useful data compared
to a real optimised build. Lets deprecate the build option and
simplify the CI configuration as a result.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1338
Signed-off-by: Alex Bennée 
Message-Id: <20230131094224.861621-1-alex.ben...@linaro.org>
Reviewed-by: Thomas Huth 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Thomas Huth 
---
 docs/about/deprecated.rst  | 14 ++
 meson.build|  7 ++-
 .gitlab-ci.d/buildtest.yml | 19 ---
 meson_options.txt  |  3 ++-
 4 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index da2e6fe63d..9317046177 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -20,6 +20,20 @@ they were first deprecated in the 2.10.0 release.
 What follows is a list of all features currently marked as
 deprecated.
 
+Build options
+-
+
+``gprof`` builds (since 8.0)
+
+
+The ``--enable-gprof`` configure setting relies on compiler
+instrumentation to gather its data which can distort the generated
+profile. As other non-instrumenting tools are available that give a
+more holistic view of the system with non-instrumented binaries we are
+deprecating the build option and no longer defend it in CI. The
+``--enable-gcov`` build option remains for analysis test case
+coverage.
+
 System emulator command line arguments
 --
 
diff --git a/meson.build b/meson.build
index 4ba3bf3431..bc60bf3c4c 100644
--- a/meson.build
+++ b/meson.build
@@ -3784,7 +3784,12 @@ summary_info += {'mutex debugging':   
get_option('debug_mutex')}
 summary_info += {'memory allocator':  get_option('malloc')}
 summary_info += {'avx2 optimization': config_host_data.get('CONFIG_AVX2_OPT')}
 summary_info += {'avx512f optimization': 
config_host_data.get('CONFIG_AVX512F_OPT')}
-summary_info += {'gprof enabled': get_option('gprof')}
+if get_option('gprof')
+  gprof_info = 'YES (deprecated)'
+else
+  gprof_info = get_option('gprof')
+endif
+summary_info += {'gprof': gprof_info}
 summary_info += {'gcov':  get_option('b_coverage')}
 summary_info += {'thread sanitizer':  config_host.has_key('CONFIG_TSAN')}
 summary_info += {'CFI support':   get_option('cfi')}
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 0aa149a352..8f332fc36f 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -467,27 +467,16 @@ tsan-build:
 TARGETS: x86_64-softmmu ppc64-softmmu riscv64-softmmu x86_64-linux-user
 MAKE_CHECK_ARGS: bench V=1
 
-# gprof/gcov are GCC features
-build-gprof-gcov:
+# gcov is a GCC features
+gcov:
   extends: .native_build_job_template
   needs:
 job: amd64-ubuntu2004-container
+  timeout: 80m
   variables:
 IMAGE: ubuntu2004
-CONFIGURE_ARGS: --enable-gprof --enable-gcov
+CONFIGURE_ARGS: --enable-gcov
 TARGETS: aarch64-softmmu ppc64-softmmu s390x-softmmu x86_64-softmmu
-  artifacts:
-expire_in: 1 days
-paths:
-  - build
-
-check-gprof-gcov:
-  extends: .native_test_job_template
-  needs:
-- job: build-gprof-gcov
-  artifacts: true
-  variables:
-IMAGE: ubuntu2004
 MAKE_CHECK_ARGS: check
   after_script:
 - cd build
diff --git a/meson_options.txt b/meson_options.txt
index 559a571b6b..53459c15fc 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -316,7 +316,8 @@ option('debug_stack_usage', type: 'boolean', value: false,
 option('qom_cast_debug', type: 'boolean', value: false,
description: 'cast debugging support')
 option('gprof', type: 'boolean', value: false,
-   description: 'QEMU profiling with gprof')
+   description: 'QEMU profiling with gprof',
+   deprecated: true)
 option('profiler', type: 'boolean', value: false,
description: 'profiler support')
 option('slirp_smbd', type : 'feature', value : 'auto',
-- 
2.31.1




[PATCH v2 5/5] gitlab-ci.d: Build with --enable-fdt=system by default

2023-02-07 Thread Thomas Huth
By using --enable-fdt=system we can make sure that the configure
script does not try to check out the "dtc" submodule. This should
help to safe some precious CI minutes in the long run.

While we're at it, also drop some now-redundant --enable-slirp
and --enable-capstone statements. These used to have the "=system"
suffix in the past, too, which has been dropped when the their
corresponding submodules had been removed. Since these features
are auto-enabled anyway now (since the containers have the right
libraries installed), we do not need the explicit --enable-...
statements anymore.

Signed-off-by: Thomas Huth 
---
 .gitlab-ci.d/buildtest-template.yml  | 2 +-
 .gitlab-ci.d/buildtest.yml   | 9 +++--
 .gitlab-ci.d/crossbuild-template.yml | 5 +++--
 .gitlab-ci.d/crossbuilds.yml | 2 ++
 .gitlab-ci.d/windows.yml | 7 +--
 5 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/.gitlab-ci.d/buildtest-template.yml 
b/.gitlab-ci.d/buildtest-template.yml
index 4a922d9c33..cb96b55c3f 100644
--- a/.gitlab-ci.d/buildtest-template.yml
+++ b/.gitlab-ci.d/buildtest-template.yml
@@ -11,7 +11,7 @@
   fi
 - mkdir build
 - cd build
-- ../configure --enable-werror --disable-docs
+- ../configure --enable-werror --disable-docs --enable-fdt=system
   ${LD_JOBS:+--meson=git} ${TARGETS:+--target-list="$TARGETS"}
   $CONFIGURE_ARGS ||
   { cat config.log meson-logs/meson-log.txt && exit 1; }
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 8697c61072..d903c42798 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -41,7 +41,7 @@ build-system-ubuntu:
 job: amd64-ubuntu2004-container
   variables:
 IMAGE: ubuntu2004
-CONFIGURE_ARGS: --enable-docs --enable-fdt=system --enable-capstone
+CONFIGURE_ARGS: --enable-docs
 TARGETS: alpha-softmmu cris-softmmu hppa-softmmu
   microblazeel-softmmu mips64el-softmmu
 MAKE_CHECK_ARGS: check-build
@@ -120,7 +120,6 @@ build-system-fedora:
   variables:
 IMAGE: fedora
 CONFIGURE_ARGS: --disable-gcrypt --enable-nettle --enable-docs
- --enable-fdt=system --enable-slirp --enable-capstone
 TARGETS: tricore-softmmu microblaze-softmmu mips-softmmu
   xtensa-softmmu m68k-softmmu riscv32-softmmu ppc-softmmu sparc64-softmmu
 MAKE_CHECK_ARGS: check-build
@@ -166,9 +165,8 @@ build-system-centos:
 job: amd64-centos8-container
   variables:
 IMAGE: centos8
-CONFIGURE_ARGS: --disable-nettle --enable-gcrypt --enable-fdt=system
+CONFIGURE_ARGS: --disable-nettle --enable-gcrypt --enable-vfio-user-server
   --enable-modules --enable-trace-backends=dtrace --enable-docs
-  --enable-vfio-user-server
 TARGETS: ppc64-softmmu or1k-softmmu s390x-softmmu
   x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
 MAKE_CHECK_ARGS: check-build
@@ -201,7 +199,6 @@ build-system-opensuse:
 job: amd64-opensuse-leap-container
   variables:
 IMAGE: opensuse-leap
-CONFIGURE_ARGS: --enable-fdt=system
 TARGETS: s390x-softmmu x86_64-softmmu aarch64-softmmu
 MAKE_CHECK_ARGS: check-build
   artifacts:
@@ -464,7 +461,7 @@ tsan-build:
   variables:
 IMAGE: ubuntu2004
 CONFIGURE_ARGS: --enable-tsan --cc=clang-10 --cxx=clang++-10
-  --enable-trace-backends=ust --enable-fdt=system --disable-slirp
+  --enable-trace-backends=ust --disable-slirp
 TARGETS: x86_64-softmmu ppc64-softmmu riscv64-softmmu x86_64-linux-user
 MAKE_CHECK_ARGS: bench V=1
 
diff --git a/.gitlab-ci.d/crossbuild-template.yml 
b/.gitlab-ci.d/crossbuild-template.yml
index 6d709628f1..d07989e3b0 100644
--- a/.gitlab-ci.d/crossbuild-template.yml
+++ b/.gitlab-ci.d/crossbuild-template.yml
@@ -6,8 +6,9 @@
   script:
 - mkdir build
 - cd build
-- ../configure --enable-werror --disable-docs $QEMU_CONFIGURE_OPTS
---disable-user --target-list-exclude="arm-softmmu cris-softmmu
+- ../configure --enable-werror --disable-docs --enable-fdt=system
+--disable-user $QEMU_CONFIGURE_OPTS $EXTRA_CONFIGURE_OPTS
+--target-list-exclude="arm-softmmu cris-softmmu
   i386-softmmu microblaze-softmmu mips-softmmu mipsel-softmmu
   mips64-softmmu ppc-softmmu riscv32-softmmu sh4-softmmu
   sparc-softmmu xtensa-softmmu $CROSS_SKIP_TARGETS"
diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index 57637c5127..101416080c 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -175,6 +175,7 @@ cross-win32-system:
 job: win32-fedora-cross-container
   variables:
 IMAGE: fedora-win32-cross
+EXTRA_CONFIGURE_OPTS: --enable-fdt=internal
 CROSS_SKIP_TARGETS: alpha-softmmu avr-softmmu hppa-softmmu m68k-softmmu
 microblazeel-softmmu mips64el-softmmu nios2-softmmu
   artifacts:
@@ -187,6 +188,7 @@ cross-win64-system:
 job: win64-fedora-cross-container
   variables:
 IMAGE: fedora-win64-cross

[PATCH v2 0/5] Shorten the runtime of some gitlab-CI shared runner jobs

2023-02-07 Thread Thomas Huth
We're currently facing the problem that the gitlab-CI jobs for the
shared runners take too much of the limited CI minutes on gitlab.com.
Here are now some patches that optimize some of the jobs a little bit
to take less runtime.

v2:
- Dropped the patches that have already been merged
- Rework the sigaltstack patch according to Daniel's rewview comments
- Add Alex' patch as replacement for the gcov-gprof patch in v1
- Add patch to compile with --enable-fdt=system by default

Alex Bennée (1):
  build: deprecate --enable-gprof builds and remove from CI

Thomas Huth (4):
  gitlab-ci.d/buildtest: Remove aarch64-softmmu from the
build-system-ubuntu job
  gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack
job
  .gitlab-ci.d/buildtest-template: Simplify the configure step
  gitlab-ci.d: Build with --enable-fdt=system by default

 docs/about/deprecated.rst| 14 +
 meson.build  |  7 -
 .gitlab-ci.d/buildtest-template.yml  | 10 +++
 .gitlab-ci.d/buildtest.yml   | 43 ++--
 .gitlab-ci.d/crossbuild-template.yml |  5 ++--
 .gitlab-ci.d/crossbuilds.yml |  4 ++-
 .gitlab-ci.d/windows.yml |  7 +++--
 meson_options.txt|  3 +-
 8 files changed, 46 insertions(+), 47 deletions(-)

-- 
2.31.1




Re: [PULL 00/32] riscv-to-apply queue

2023-02-07 Thread Peter Maydell
On Tue, 7 Feb 2023 at 07:12, Alistair Francis
 wrote:
>
> From: Alistair Francis 
>
> The following changes since commit 6661b8c7fe3f8b5687d2d90f7b4f3f23d70e3e8b:
>
>   Merge tag 'pull-ppc-20230205' of https://gitlab.com/danielhb/qemu into 
> staging (2023-02-05 16:49:09 +)
>
> are available in the Git repository at:
>
>   https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20230207
>
> for you to fetch changes up to 5474aa4f3e0a3e9c171db7c55b5baf15f2e2778c:
>
>   hw/riscv: virt: Simplify virt_{get,set}_aclint() (2023-02-07 08:21:32 +1000)
>
> 
> Third RISC-V PR for QEMU 8.0
>
> * Update disas for xnor/orn/andn and slli.uw
> * Update opentitan IRQs
> * Fix rom code when Zicsr is disabled
> * Update VS timer whenever htimedelta changes
> * A collection of fixes for virtulisation
> * Set tval for triggered watchpoints
> * Cleanups for board and FDT creation
> * Add support for the T-Head vendor extensions
> * A fix for virtual instr exception
> * Fix ctzw behavior
> * Fix SBI getchar handler for KVM


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM



Re: [PATCH V2] memory: flat section iterator

2023-02-07 Thread Peter Xu
On Tue, Feb 07, 2023 at 11:03:29AM -0800, Steve Sistare wrote:
> Add an iterator over the sections of a flattened address space.
> This will be needed by cpr to issue vfio ioctl's on the same memory
> ranges that are already programmed.

Should this better be proposed with the context of using it?  Or I don't
know how to justify this new interface is needed.

For example, any explanations on why memory region listeners cannot work?

Thanks,

-- 
Peter Xu




Re: [PATCH V2] migration: simplify blockers

2023-02-07 Thread Peter Xu
On Tue, Feb 07, 2023 at 11:03:13AM -0800, Steve Sistare wrote:
> Modify migrate_add_blocker and migrate_del_blocker to take an Error **
> reason.  This allows migration to own the Error object, so that if
> an error occurs, migration code can free the Error and clear the client
> handle, simplifying client code.  This is a pre-requisite for a future
> patch that will allow one Error blocker to be registered for multiple
> migration modes.
> 
> No functional change.
> 
> Signed-off-by: Steve Sistare 

Acked-by: Peter Xu 

One trivial comment below.

[...]

> diff --git a/include/migration/blocker.h b/include/migration/blocker.h
> index 9cebe2b..7c8d326 100644
> --- a/include/migration/blocker.h
> +++ b/include/migration/blocker.h
> @@ -17,19 +17,22 @@
>  /**
>   * @migrate_add_blocker - prevent migration from proceeding
>   *
> - * @reason - an error to be returned whenever migration is attempted
> + * @reasonp - address of an error to be returned whenever migration is 
> attempted
>   *
>   * @errp - [out] The reason (if any) we cannot block migration right now.
>   *
>   * @returns - 0 on success, -EBUSY/-EACCES on failure, with errp set.
> + *
> + * *@reasonp is freed and set to NULL if failure is returned.
> + * On success, the caller no longer owns *@reasonp and must not free it.

This statement reads weird.  IMHO the caller still owns @reasonp, but if it
succeeded it should only free it with a migrate_del_blocker() later.

>   */
> -int migrate_add_blocker(Error *reason, Error **errp);
> +int migrate_add_blocker(Error **reasonp, Error **errp);
>  
>  /**
>   * @migrate_add_blocker_internal - prevent migration from proceeding without
>   * only-migrate implications
>   *
> - * @reason - an error to be returned whenever migration is attempted
> + * @reasonp - address of an error to be returned whenever migration is 
> attempted
>   *
>   * @errp - [out] The reason (if any) we cannot block migration right now.
>   *
> @@ -38,14 +41,19 @@ int migrate_add_blocker(Error *reason, Error **errp);
>   * Some of the migration blockers can be temporary (e.g., for a few seconds),
>   * so it shouldn't need to conflict with "-only-migratable".  For those 
> cases,
>   * we can call this function rather than @migrate_add_blocker().
> + *
> + * *@reasonp is freed and set to NULL if failure is returned.
> + * On success, the caller no longer owns *@reasonp and must not free it.

Same here?

>   */

-- 
Peter Xu




Re: random copy-before-write iotest failure

2023-02-07 Thread Fabiano Rosas
Peter Maydell  writes:

> This is on ppc64 (big-endian), a random failure
> (it was while testing the riscv pullreq, but that doesn't touch
> any of the block stuff):
>
> 616/635 qemu:block / qemu-iotests qcow2
>ERROR
>   101.88s   exit status 1
> ― ✀  ―
> stderr:
> --- /home/pm215/qemu/tests/qemu-iotests/tests/copy-before-write.out
> +++ 
> /home/pm215/qemu/build/all/tests/qemu-iotests/scratch/copy-before-write/copy-before-wri
> te.out.bad
> @@ -1,5 +1,21 @@
> -
> +..F.
> +==
> +FAIL: test_timeout_break_guest 
> (__main__.TestCbwError.test_timeout_break_guest)
> +--
> +Traceback (most recent call last):
> +  File "/home/pm215/qemu/tests/qemu-iotests/tests/copy-before-write",
> line 200, in test_ti
> meout_break_guest
> +self.assertEqual(log, """\
> +AssertionError: 'wrot[90 chars])\nwrote 524288/524288 bytes at offset
> 524288\[151 chars]c)\n' != 'wrot[90 chars])\nwrite failed: Connection
> timed out\nread 10[85 chars]c)\n'
> +  wrote 524288/524288 bytes at offset 0
> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> ++ write failed: Connection timed out
> +- wrote 524288/524288 bytes at offset 524288
> +- 512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +  read 1048576/1048576 bytes at offset 0
> +  1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +
> +
>  --
>  Ran 4 tests
>
> -OK
> +FAILED (failures=1)
>
> thanks
> -- PMM

FWIW, I've seen that one fail on aarch64 with --enable-debug on the
configure line.



  1   2   3   4   >