date:20221007


On Fri, 07 Oct 2022, Gregory Price wrote:


Spec says that volatile devices `may` implement an lsa.


Right you are.


Get LSA (Opcode 4102h)
The Label Storage Area (LSA) shall be supported by a memory device
that provides persistent memory capacity and may be supported by a
device that provides only volatile memory capacity. The format of
the LSA is specified in Section 9.13.2. The size of the Label Storage
Area is retrieved from the Identify Memory Device command.

Re: [PATCH RFC] hw/cxl: type 3 devices can now present volatile or persistent memory


On Thu, 06 Oct 2022, Jonathan Cameron wrote:


One of the blockers for volatile support was that we had no means to poke
it properly as the kernel doesn't yet support volatile capacity and
no one has done the relevant work in EDK2 or similar to do it before the kernel
boots.
There has been some work on EDK2 support for ARM N2 FVPs from
Saanta Pattanayak, but not upstream yet.
https://lpc.events/event/16/contributions/1254/


FWIW, I had been trying to cover some of the firmware bootup steps by building
the ACPI tables that are particular to volatile capacity (SRAT/SLIT, HMAT and
EFI Memory Map) from parameters, but it quickly got out of hand. For example,
the SRAT could use a passed 'node' and have a cxl_build_srat(), etc. But yeah,
it would be nice for the EDK2 work to advance on the x86 end.

Thanks,
Davidlohr

diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index 2bf8c0799359..1c3c6d17c222 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -254,3 +255,46 @@ void build_cxl_osc_method(Aml *dev)
+static int cxl_device_list(Object *obj, void *opaque)
+{
+GSList **list = opaque;
+
+if (object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
+*list = g_slist_append(*list, DEVICE(obj));
+}
+
+object_child_foreach(obj, cxl_device_list, opaque);
+return 0;
+}
+
+static GSList *cxl_get_device_list(void)
+{
+GSList *list = NULL;
+
+object_child_foreach(qdev_get_machine(), cxl_device_list, &list);
+return list;
+}
+
+void cxl_build_srat(GArray *table_data)
+{
+GSList *device_list, *list = cxl_get_device_list();
+
+for (device_list = list; device_list; device_list = device_list->next) {
+DeviceState *dev = device_list->data;
+Object *obj = OBJECT(dev);
+CXLType3Dev *ct3d = CXL_TYPE3(dev);
+MemoryRegion *mr;
+int node;
+
+mr = host_memory_backend_get_memory(ct3d->hostmem);
+if (!mr) {
+continue;
+}
+node = object_property_get_uint(obj, "node", &error_abort);
+
+build_srat_memory(table_data, mr->addr, mr->size, node,
+MEM_AFFINITY_ENABLED);
+}
+
+g_slist_free(list);
+}

Re: [PATCH 2/6] target/ppc: fix msgsync insns flags

2022-10-07 Thread Fabiano Rosas

Matheus Ferst  writes:

> This instruction was added by Power ISA 3.0, using PPC2_PRCNTL makes it
> available for older processors, like the e5500 and e6500.
>
> Fixes: 7af1e7b02264 ("target/ppc: add support for hypervisor doorbells on book3s CPUs")
> Signed-off-by: Matheus Ferst 

Reviewed-by: Fabiano Rosas 

> ---
>  target/ppc/translate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 37d7018d18..eaac8670b1 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -6906,7 +6906,7 @@ GEN_HANDLER2_E(msgsnd, "msgsnd", 0x1F, 0x0E, 0x06, 0x03ff0001,
>  GEN_HANDLER2_E(msgclr, "msgclr", 0x1F, 0x0E, 0x07, 0x03ff0001,
> PPC_NONE, (PPC2_PRCNTL | PPC2_ISA207S)),
>  GEN_HANDLER2_E(msgsync, "msgsync", 0x1F, 0x16, 0x1B, 0x,
> -   PPC_NONE, PPC2_PRCNTL),
> +   PPC_NONE, PPC2_ISA300),
>  GEN_HANDLER(wrtee, 0x1F, 0x03, 0x04, 0x000FFC01, PPC_WRTEE),
>  GEN_HANDLER(wrteei, 0x1F, 0x03, 0x05, 0x000E7C01, PPC_WRTEE),
>  GEN_HANDLER(dlmzb, 0x1F, 0x0E, 0x02, 0x, PPC_440_SPEC),

Re: [PATCH 1/6] target/ppc: fix msgclr/msgsnd insns flags

2022-10-07 Thread Fabiano Rosas

Matheus Ferst  writes:

> On Power ISA v2.07, the category for these instructions became
> "Embedded.Processor Control" or "Book S".
>
> Signed-off-by: Matheus Ferst 

Reviewed-by: Fabiano Rosas 

> ---
>  target/ppc/translate.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index e810842925..37d7018d18 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -6902,9 +6902,9 @@ GEN_HANDLER2_E(tlbivax_booke206, "tlbivax", 0x1F, 0x12, 0x18, 0x0001,
>  GEN_HANDLER2_E(tlbilx_booke206, "tlbilx", 0x1F, 0x12, 0x00, 0x0381,
> PPC_NONE, PPC2_BOOKE206),
>  GEN_HANDLER2_E(msgsnd, "msgsnd", 0x1F, 0x0E, 0x06, 0x03ff0001,
> -   PPC_NONE, PPC2_PRCNTL),
> +   PPC_NONE, (PPC2_PRCNTL | PPC2_ISA207S)),
>  GEN_HANDLER2_E(msgclr, "msgclr", 0x1F, 0x0E, 0x07, 0x03ff0001,
> -   PPC_NONE, PPC2_PRCNTL),
> +   PPC_NONE, (PPC2_PRCNTL | PPC2_ISA207S)),
>  GEN_HANDLER2_E(msgsync, "msgsync", 0x1F, 0x16, 0x1B, 0x,
> PPC_NONE, PPC2_PRCNTL),
>  GEN_HANDLER(wrtee, 0x1F, 0x03, 0x04, 0x000FFC01, PPC_WRTEE),

Re: [PATCH 3/6] target/ppc: fix REQUIRE_HV macro definition

2022-10-07 Thread Fabiano Rosas

Matheus Ferst  writes:

> The macro is missing a '{' after the if condition. Any use of REQUIRE_HV
> would cause a compilation error.
>
> Fixes: fc34e81acd51 ("target/ppc: add macros to check privilege level")
> Signed-off-by: Matheus Ferst 

Reviewed-by: Fabiano Rosas 

> ---
>  target/ppc/translate.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index eaac8670b1..435066c4a3 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -6545,12 +6545,12 @@ static int64_t dw_compose_ea(DisasContext *ctx, int x)
>  }   \
>  } while (0)
>  
> -#define REQUIRE_HV(CTX) \
> -do {\
> -if (unlikely((CTX)->pr || !(CTX)->hv))  \
> -gen_priv_opc(CTX);  \
> -return true;\
> -}   \
> +#define REQUIRE_HV(CTX) \
> +do {\
> +if (unlikely((CTX)->pr || !(CTX)->hv)) {\
> +gen_priv_opc(CTX);  \
> +return true;\
> +}   \
>  } while (0)
>  #else
>  #define REQUIRE_SV(CTX) do { gen_priv_opc(CTX); return true; } while (0)
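
For anyone who wants to see the macro shape in isolation: below is a minimal
sketch of the corrected pattern outside QEMU. Everything prefixed with
sketch_/SKETCH_ is hypothetical and only stands in for DisasContext and
gen_priv_opc(); unlike the real trans functions, the caller here returns false
when the check passes, purely so the two outcomes can be told apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for DisasContext: only the fields the macro reads. */
struct sketch_ctx {
    bool pr;              /* problem (user) state */
    bool hv;              /* hypervisor state */
    int priv_exceptions;  /* how often the stand-in handler ran */
};

/* Hypothetical stand-in for gen_priv_opc(): just counts the calls. */
static void sketch_gen_priv_opc(struct sketch_ctx *ctx)
{
    ctx->priv_exceptions++;
}

/*
 * Same shape as the fixed macro: the braces keep both statements under the
 * `if`, and do { } while (0) makes the expansion behave as one statement.
 */
#define SKETCH_REQUIRE_HV(CTX)                  \
    do {                                        \
        if ((CTX)->pr || !(CTX)->hv) {          \
            sketch_gen_priv_opc(CTX);           \
            return true;                        \
        }                                       \
    } while (0)

/* Returns true when the privilege check failed and the macro returned early. */
static bool sketch_trans_insn(struct sketch_ctx *ctx)
{
    assert(ctx != NULL);
    SKETCH_REQUIRE_HV(ctx);
    return false; /* reached only in hypervisor, non-problem state */
}
```

Without the opening brace the `}` before `while (0)` closes the do-block early,
which is why any use of the original macro failed to compile.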

Re: [PATCH RFC] hw/cxl: type 3 devices can now present volatile or persistent memory

2022-10-07 Thread Gregory Price

On Fri, Oct 07, 2022 at 11:16:19AM -0700, Davidlohr Bueso wrote:
> 
> Yeah, putting this back together was on my todo list, but happy to see
> patches are out. Recollecting my thoughts on this, my original approach
> was also to support only volatile or persistent capacities, but through
> two backends, and thus two address spaces. Afaik the last idea that was
> discussed on IRC in this regard was to do it with a single backend along
> with a pmem_offset=N boundary (0 or 100% for example for one type or the
> other) tunable.
> 

This makes sense.  Referencing another message I sent: are the region
areas in the DVSECs an artifact from CXL 1.x? They suggest only two
regions are supported.  Was this overridden by the introduction of CDAT
fields that describe the memory layout?

(sorry, just trying to put together the puzzle pieces here, jumping in a
bit late to the party).

> > > > > >  Example command lines
> > > > > >  -
> > > > > > -A very simple setup with just one directly attached CXL Type 3 device::
> > > > > > +A very simple setup with just one directly attached CXL Type 3 Persistent Memory device::
> > > > > >
> > > > > >qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
> > > > > >...
> > > > > > @@ -308,7 +308,18 @@ A very simple setup with just one directly attached CXL Type 3 device::
> > > > > >-object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
> > > > > >-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> > > > > >-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> > > > > > -  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
> > > > > > +  -device cxl-type3,bus=root_port13,pmem=true,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
> 
> So regardless of the interface we end up with, volatile and lsa parameters
> should be mutually exclusive.
> 

Spec says that volatile devices `may` implement an lsa.

Get LSA (Opcode 4102h)
The Label Storage Area (LSA) shall be supported by a memory device
that provides persistent memory capacity and may be supported by a
device that provides only volatile memory capacity. The format of
the LSA is specified in Section 9.13.2. The size of the Label Storage
Area is retrieved from the Identify Memory Device command.
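
Read as a validation rule, the spec text says: persistent capacity requires an
LSA, volatile-only capacity may have one or not. A minimal sketch of that
check follows; the struct and its field names are hypothetical and are not
QEMU's device properties.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device description; these are not QEMU's property names. */
struct sketch_ct3_config {
    uint64_t persistent_size; /* bytes of persistent capacity */
    uint64_t volatile_size;   /* bytes of volatile capacity */
    bool has_lsa;             /* a Label Storage Area backend was supplied */
};

/*
 * Persistent capacity requires an LSA ("shall"). A volatile-only device
 * may have one or not ("may"), so both choices are accepted there.
 */
static bool sketch_lsa_config_valid(const struct sketch_ct3_config *cfg)
{
    assert(cfg != NULL);
    if (cfg->persistent_size > 0) {
        return cfg->has_lsa;
    }
    return true;
}
```

Under this reading, a volatile device with an lsa= backend would be accepted
rather than rejected as mutually exclusive.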

Re: [PATCH RFC] hw/cxl: type 3 devices can now present volatile or persistent memory

On Thu, 06 Oct 2022, Jonathan Cameron wrote:

3) Upstream linux drivers haven't touched ram configurations yet.  I
just configured this with Dan Williams yesterday on IRC.  My
understanding is that it's been worked on but nothing has been
upstreamed, in part because there are only a very small set of devices
available to developers at the moment.

There was an offer of similar volatile memory QEMU emulation in the
session on QEMU CXL at Linux Plumbers.  That will look something like you have
here and maybe reflects that someone has hardware as well...

Yeah, putting this back together was on my todo list, but happy to see
patches are out. Recollecting my thoughts on this, my original approach
was also to support only volatile or persistent capacities, but through
two backends, and thus two address spaces. Afaik the last idea that was
discussed on IRC in this regard was to do it with a single backend along
with a pmem_offset=N boundary (0 or 100% for example for one type or the
other) tunable.

...

Seems a little odd to use two memory backends.  Of what use is it to the
software developers, it should be completely transparent to them, right?

The only thing I can think of is maybe reset mechanics for volatile
regions being set differently than persistent regions, but even then it
seems simple enough to just emulate the behavior and use a single
backing device.

It's a very convenient path as lets us define sizes and things from the
actual memdev rather than duplicating all the configuration in multiple
devices.  If it weren't for the ability to change the allocations at runtime
I'd definitely say this was the best path.  That corner makes it complex
but I'd still like the simplicity of you throw a backend at the device
and we set up all the description on basis that backend is what we want
to use.

Agreed.

...

> > >  Example command lines
> > >  -
> > > -A very simple setup with just one directly attached CXL Type 3 device::
> > > +A very simple setup with just one directly attached CXL Type 3 Persistent Memory device::
> > >
> > >qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
> > >...
> > > @@ -308,7 +308,18 @@ A very simple setup with just one directly attached CXL Type 3 device::
> > >-object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
> > >-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> > >-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> > > -  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
> > > +  -device cxl-type3,bus=root_port13,pmem=true,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \

So regardless of the interface we end up with, volatile and lsa parameters
should be mutually exclusive.

Thanks,
Davidlohr

Re: [PATCH 1/2] hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL


On Thu, 06 Oct 2022, Gregory Price wrote:


Current code sets to STORAGE_EXPRESS and then overrides it.


Good catch.

Reviewed-by: Davidlohr Bueso 



Signed-off-by: Gregory Price 
---
hw/mem/cxl_type3.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index ada2108fac..1837c1c83a 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -146,7 +146,6 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
}

pci_config_set_prog_interface(pci_conf, 0x10);
-pci_config_set_class(pci_conf, PCI_CLASS_MEMORY_CXL);

pcie_endpoint_cap_init(pci_dev, 0x80);
cxl_cstate->dvsec_offset = 0x100;
@@ -335,7 +334,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)

pc->realize = ct3_realize;
pc->exit = ct3_exit;
-pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
+pc->class_id = PCI_CLASS_MEMORY_CXL;
pc->vendor_id = PCI_VENDOR_ID_INTEL;
pc->device_id = 0xd93; /* LVF for now */
pc->revision = 1;
--
2.37.3

Re: [PATCH v3 41/42] target/arm: Implement FEAT_HAFDBS


On 10/7/22 09:50, Peter Maydell wrote:

On Fri, 7 Oct 2022 at 17:45, Richard Henderson
 wrote:


On 10/7/22 06:47, Peter Maydell wrote:

Are there definitely no code paths where we might try to do
a page table walk with the iothread already locked ?


I'll double-check, but another possibility is to simply perform the atomic operation on
the low 32 bits, where both AF and DB are located.  Another trick I learned from x86...


Doesn't that cause a problem where we don't detect that some other
CPU wrote to the high 32 bits of the descriptor ? We're supposed to
be using those high 32 bits, not the ones we have in hand...


Hmm, yes.  Which now makes me wonder if the x86 case is in fact buggy...


If we do need the iothread lock, we could do it the way that
io_readx() does, I guess, where we track whether we needed to
lock it or not.


yes.


r~

Re: [PATCH v6 02/13] blkio: add libblkio block driver

2022-10-07 Thread Stefan Hajnoczi

On Fri, 7 Oct 2022 at 11:41, Markus Armbruster  wrote:
>
> Stefan Hajnoczi  writes:
>
> > libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
> > high-performance disk I/O. It currently supports io_uring,
> > virtio-blk-vhost-user, and virtio-blk-vhost-vdpa with additional drivers
> > under development.
> >
> > One of the reasons for developing libblkio is that other applications
> > besides QEMU can use it. This will be particularly useful for
> > virtio-blk-vhost-user which applications may wish to use for connecting
> > to qemu-storage-daemon.
> >
> > libblkio also gives us an opportunity to develop in Rust behind a C API
> > that is easy to consume from QEMU.
> >
> > This commit adds io_uring, virtio-blk-vhost-user, and
> > virtio-blk-vhost-vdpa BlockDrivers to QEMU using libblkio. It will be
> > easy to add other libblkio drivers since they will share the majority of
> > code.
> >
> > For now I/O buffers are copied through bounce buffers if the libblkio
> > driver requires it. Later commits add an optimization for
> > pre-registering guest RAM to avoid bounce buffers.
> >
> > The syntax is:
> >
> >   --blockdev io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off
> >
> > and:
> >
> >   --blockdev virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off
>
> The patch also adds nvme-io_uring.  Shouldn't the commit message mention
> it?

Yes, will fix in the next revision. Thanks!
>
> >
> > Signed-off-by: Stefan Hajnoczi 
> > Acked-by: Markus Armbruster 
> > ---
> >  MAINTAINERS   |   6 +
> >  meson_options.txt |   2 +
> >  qapi/block-core.json  |  75 ++-
> >  meson.build   |   9 +
> >  block/blkio.c | 830 ++
> >  tests/qtest/modules-test.c|   3 +
> >  block/meson.build |   1 +
> >  scripts/meson-buildoptions.sh |   3 +
> >  8 files changed, 925 insertions(+), 4 deletions(-)
> >  create mode 100644 block/blkio.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index e1530b51a2..0dcae6168a 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3403,6 +3403,12 @@ L: qemu-bl...@nongnu.org
> >  S: Maintained
> >  F: block/vdi.c
> >
> > +blkio
> > +M: Stefan Hajnoczi 
> > +L: qemu-bl...@nongnu.org
> > +S: Maintained
> > +F: block/blkio.c
> > +
> >  iSCSI
> >  M: Ronnie Sahlberg 
> >  M: Paolo Bonzini 
> > diff --git a/meson_options.txt b/meson_options.txt
> > index 79c6af18d5..66128178bf 100644
> > --- a/meson_options.txt
> > +++ b/meson_options.txt
> > @@ -117,6 +117,8 @@ option('bzip2', type : 'feature', value : 'auto',
> > description: 'bzip2 support for DMG images')
> >  option('cap_ng', type : 'feature', value : 'auto',
> > description: 'cap_ng support')
> > +option('blkio', type : 'feature', value : 'auto',
> > +   description: 'libblkio block device driver')
> >  option('bpf', type : 'feature', value : 'auto',
> >  description: 'eBPF support')
> >  option('cocoa', type : 'feature', value : 'auto',
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index f21fa235f2..6c6ae2885c 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -2951,11 +2951,18 @@
> >  'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
> >  {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
> >  {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
> > -'http', 'https', 'iscsi',
> > -'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
> > -'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
> > +'http', 'https',
> > +{ 'name': 'io_uring', 'if': 'CONFIG_BLKIO' },
> > +'iscsi',
> > +'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme',
> > +{ 'name': 'nvme-io_uring', 'if': 'CONFIG_BLKIO' },
>
> This enumeration value and ...
>
> > +'parallels', 'preallocate', 'qcow', 'qcow2', 'qed', 'quorum',
> > +'raw', 'rbd',
> >  { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
> > -'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> > +'ssh', 'throttle', 'vdi', 'vhdx',
> > +{ 'name': 'virtio-blk-vhost-user', 'if': 'CONFIG_BLKIO' },
> > +{ 'name': 'virtio-blk-vhost-vdpa', 'if': 'CONFIG_BLKIO' },
> > +'vmdk', 'vpc', 'vvfat' ] }
> >
> >  ##
> >  # @BlockdevOptionsFile:
> > @@ -3678,6 +3685,58 @@
> >  '*debug': 'int',
> >  '*logfile': 'str' } }
> >
> > +##
> > +# @BlockdevOptionsIoUring:
> > +#
> > +# Driver specific block device options for the io_uring backend.
> > +#
> > +# @filename: path to the image file
> > +#
> > +# Since: 7.2
> > +##
> > +{ 'struct': 'BlockdevOptionsIoUring',
> > +  'data': { 'filename': 'str' },
> > +  'if': 'CONFIG_BLKIO' }
> > +
> > +##
> > +#

[PATCH v3] mips/malta: pass RNG seed to kernel via env var

2022-10-07 Thread Jason A. Donenfeld

As of the kernel commit linked below, Linux ingests an RNG seed
passed as part of the environment block by the bootloader or firmware.
This mechanism works across all different environment block types,
generically, which pass some block via the second firmware argument. On
malta, this has been tested to work when passed as an argument from
U-Boot's linux_env_set.

As is the case on most other architectures (such as boston), when
booting with `-kernel`, QEMU, acting as the bootloader, should pass the
RNG seed, so that the machine has good entropy for Linux to consume. So
this commit implements that quite simply by using the guest random API,
which is what is used on nearly all other archs too. It also
reinitializes the seed on reboot, so that it is always fresh.

Link: https://git.kernel.org/torvalds/c/056a68cea01
Signed-off-by: Jason A. Donenfeld 
---
 hw/mips/malta.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/hw/mips/malta.c b/hw/mips/malta.c
index 0e932988e0..9d793b3c17 100644
--- a/hw/mips/malta.c
+++ b/hw/mips/malta.c
@@ -26,6 +26,7 @@
 #include "qemu/units.h"
 #include "qemu/bitops.h"
 #include "qemu/datadir.h"
+#include "qemu/guest-random.h"
 #include "hw/clock.h"
 #include "hw/southbridge/piix.h"
 #include "hw/isa/superio.h"
@@ -1017,6 +1018,17 @@ static void G_GNUC_PRINTF(3, 4) prom_set(uint32_t *prom_buf, int index,
 va_end(ap);
 }
 
+static void reinitialize_rng_seed(void *opaque)
+{
+char *rng_seed_hex = opaque;
+uint8_t rng_seed[32];
+
+qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
+for (size_t i = 0; i < sizeof(rng_seed); ++i) {
+sprintf(rng_seed_hex + i * 2, "%02x", rng_seed[i]);
+}
+}
+
 /* Kernel */
 static uint64_t load_kernel(void)
 {
@@ -1028,6 +1040,8 @@ static uint64_t load_kernel(void)
 long prom_size;
 int prom_index = 0;
 uint64_t (*xlate_to_kseg0) (void *opaque, uint64_t addr);
+uint8_t rng_seed[32];
+char rng_seed_hex[sizeof(rng_seed) * 2 + 1];
 
 #if TARGET_BIG_ENDIAN
 big_endian = 1;
@@ -1115,9 +1129,20 @@ static uint64_t load_kernel(void)
 
 prom_set(prom_buf, prom_index++, "modetty0");
 prom_set(prom_buf, prom_index++, "38400n8r");
+
+qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
+for (size_t i = 0; i < sizeof(rng_seed); ++i) {
+sprintf(rng_seed_hex + i * 2, "%02x", rng_seed[i]);
+}
+prom_set(prom_buf, prom_index++, "rngseed");
+prom_set(prom_buf, prom_index++, "%s", rng_seed_hex);
+
 prom_set(prom_buf, prom_index++, NULL);
 
 rom_add_blob_fixed("prom", prom_buf, prom_size, ENVP_PADDR);
+qemu_register_reset(reinitialize_rng_seed,
+memmem(rom_ptr(ENVP_PADDR, prom_size), prom_size,
+   rng_seed_hex, sizeof(rng_seed_hex)));
 
 g_free(prom_buf);
 return kernel_entry;
-- 
2.37.3
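
The hex encoding step in the patch can be looked at on its own. Below is a
minimal standalone sketch of it; sketch_hex_encode is a hypothetical helper,
not something the patch adds, and it uses snprintf with an explicit bound
instead of the patch's sprintf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Encode `len` bytes as lowercase hex. `out` must hold 2 * len + 1 bytes.
 * Each snprintf call writes two digits plus a NUL; the next iteration
 * overwrites that NUL, so only the last one terminates the string.
 */
static void sketch_hex_encode(const uint8_t *in, size_t len, char *out)
{
    assert(in != NULL && out != NULL);
    out[0] = '\0'; /* keeps the result a valid string when len == 0 */
    for (size_t i = 0; i < len; ++i) {
        snprintf(out + i * 2, 3, "%02x", (unsigned)in[i]);
    }
}
```

Because the encoded seed always has the same length (64 hex digits for the
32-byte seed in the patch), the reset handler can overwrite it in place inside
the prom blob without touching the surrounding environment entries.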

Re: [PATCH v1] vhost-vdpa : add support for vIOMMU

2022-10-07 Thread Eugenio Perez Martin

On Thu, Oct 6, 2022 at 7:44 AM Cindy Lu  wrote:
>
> Add support for vIOMMU. Register a memory listener to dma_as in
> vhost_vdpa_dev_start
> - during region_add register a specific IOMMU notifier, and store all notifiers in a list.
> - during region_del, compare and delete the IOMMU notifier from the list
> - also change the IOTLB batch flag to support IOMMU batch send
>
> Verified in vp_vdpa and vdpa_sim_net driver
>
> Signed-off-by: Cindy Lu 
> ---
>  hw/virtio/vhost-vdpa.c | 253 +++--
>  include/hw/virtio/vhost-vdpa.h |  26 +++-
>  2 files changed, 267 insertions(+), 12 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 3ff9ce3501..d2ac40c261 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -26,6 +26,7 @@
>  #include "cpu.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "hw/virtio/virtio-access.h"
>
>  /*
>   * Return one past the end of the end of section. Be careful with uint64_t
> @@ -136,14 +137,15 @@ static void vhost_vdpa_listener_begin_batch(struct vhost_vdpa *v)
>  }
>  }
>
> -static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
> +static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v,
> +  enum iotlb_batch_flag flag)
>  {
>  if (v->dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH) &&
>  !v->iotlb_batch_begin_sent) {
>  vhost_vdpa_listener_begin_batch(v);
>  }
>
> -v->iotlb_batch_begin_sent = true;
> +v->iotlb_batch_begin_sent |= flag;
>  }
>
>  static void vhost_vdpa_listener_commit(MemoryListener *listener)
> @@ -157,7 +159,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
>  return;
>  }
>
> -if (!v->iotlb_batch_begin_sent) {
> +if (!(v->iotlb_batch_begin_sent & VDPA_IOTLB_BATCH_SEND)) {
>  return;
>  }
>
> @@ -169,8 +171,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
>  error_report("failed to write, fd=%d, errno=%d (%s)",
>   fd, errno, strerror(errno));
>  }
> -
> -v->iotlb_batch_begin_sent = false;
> +v->iotlb_batch_begin_sent &= ~VDPA_IOTLB_BATCH_SEND;
>  }
>
>  static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> @@ -186,6 +187,9 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>  v->iova_range.last)) {
>  return;
>  }
> +if (memory_region_is_iommu(section->mr)) {
> +return;
> +}
>
>  if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
>   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> @@ -227,9 +231,9 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>  iova = mem_region.iova;
>  }
>
> -vhost_vdpa_iotlb_batch_begin_once(v);
> -ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> - vaddr, section->readonly);
> +vhost_vdpa_iotlb_batch_begin_once(v, VDPA_IOTLB_BATCH_SEND);
> +ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize), vaddr,
> + section->readonly);
>  if (ret) {
>  error_report("vhost vdpa map fail!");
>  goto fail;
> @@ -260,6 +264,9 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>  v->iova_range.last)) {
>  return;
>  }
> +if (memory_region_is_iommu(section->mr)) {
> +return;
> +}
>
>  if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
>   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> @@ -292,7 +299,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>  iova = result->iova;
>  vhost_iova_tree_remove(v->iova_tree, result);
>  }
> -vhost_vdpa_iotlb_batch_begin_once(v);
> +vhost_vdpa_iotlb_batch_begin_once(v, VDPA_IOTLB_BATCH_SEND);
>  ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
>  if (ret) {
>  error_report("vhost_vdpa dma unmap error!");
> @@ -312,6 +319,212 @@ static const MemoryListener vhost_vdpa_memory_listener = {
>  .region_del = vhost_vdpa_listener_region_del,
>  };
>
> +static void vhost_vdpa_listener_iommu_commit(MemoryListener *listener)
> +{
> +struct vhost_vdpa *v =
> +container_of(listener, struct vhost_vdpa, iommu_listener);
> +struct vhost_dev *dev = v->dev;
> +struct vhost_msg_v2 msg = {};
> +int fd = v->device_fd;
> +
> +if (!(dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
> +return;
> +}
> +
> +if (!(v->iotlb_batch_begin_sent & VDPA_IOTLB_BATCH_IOMMU_SEND)) {
> +return;
> +}
> +
> +msg.type = v->msg_type;
> +msg.iotlb.type = VHOST_IOTLB_BATCH_END;
> +
> +

Re: [PATCH v3 32/42] target/arm: Extract HA and HD in aa64_va_parameters

On Fri, 7 Oct 2022 at 17:13, Richard Henderson
 wrote:
>
> On 10/7/22 09:11, Peter Maydell wrote:
> > On Fri, 7 Oct 2022 at 16:37, Richard Henderson
> >  wrote:
> >>
> >> On 10/7/22 02:24, Peter Maydell wrote:
>  +.ha = ha,
>  +.hd = ha & hd,
> >>>
> >>> This is a bitwise operation on two bools, should be && ?
> >>
> >> Bitwise works fine, but I can use boolean if you like.
> >>
> >> I'd be surprised (and filing a missed optimization bug) if the compiler treated these two
> >> operations differently in this case (simple bool operands with no side effects).
> >
> > The different treatment I would expect would be that in the '&'
> > case it warns you about using a bitwise operation on a boolean :-)
>
> Oh, well, no compiler should ever do that, because bool implicitly converts to int for any
> arithmetic, just like char.

Yeah, but -Wbool-operation is there to catch bugs where the bitwise
operation was unintended and the wrong behaviour.

-- PMM
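
To make the two positions in this exchange concrete, here is a small sketch.
For plain bool operands `&` and `&&` agree on every input; they only differ
once the right-hand operand has a side effect, because `&&` short-circuits.
All sketch_ names are hypothetical and unrelated to the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Bitwise AND: both operands are promoted to int and always evaluated. */
static bool sketch_and_bitwise(bool ha, bool hd)
{
    return ha & hd;
}

/* Logical AND: the right operand is evaluated only when the left is true. */
static bool sketch_and_logical(bool ha, bool hd)
{
    return ha && hd;
}

/* Hypothetical operand with a side effect, to show where the two differ. */
static int sketch_calls;

static bool sketch_counted_true(void)
{
    sketch_calls++;
    return true;
}
```

With `bool left = false;`, the expression `left & sketch_counted_true()` still
calls the function, while `left && sketch_counted_true()` does not. That
difference is the class of bug a compiler warning about bitwise operations on
booleans is meant to flag.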

Re: [PATCH v3] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-10-07 Thread Venu Busireddy

On 2022-10-07 06:55:15 -0400, Paolo Bonzini wrote:
> On Thu, 6 Oct 2022, 15:25 Venu Busireddy wrote:
> 
> > I do see that the Solaris driver does send the 0x1a command during
> > the initialization, perhaps (?) seeking the value of UA_INTLCK_CTRL.
> > Since QEMU currently does not support it, QEMU sends back a
> > key/asc/ascq=0x05/0x24/0x00 response, indicating that 0x1a is an Illegal
> > Request.
> 
> 
> What is your QEMU command line and what is the full CDB (apart from 0x1a)?
> 
> > I am assuming that the Solaris driver does not handle that
> > response well (I still don't have access to the source code to verify
> > that), confuses itself about the value of UA_INTLCK_CTRL, and hence does
> > not handle the response to the REPORT_LUNS command correctly.
> 
> 
> No this has nothing to do with what's happening. The most likely reason for
> the bug IMO is simple: the event is causing the driver to send the REPORT
> LUNS command, but it does so in a way that does not handle the unit
> attention when it is reported.

I had a developer with access to the Solaris code review how the response
to REPORT_LUNS is being handled. And they do see that the response to
REPORT_LUNS is mishandled.

With the fix proposed in v4, and fixing the handling of REPORT_LUNS
on the Solaris side, we believe we will have a complete working
solution. Therefore, I believe we can conclude this thread on v3.
Do you agree?

Venu

> 
> > Maybe the
> > Solaris driver assumes that QEMU will retain the unit attention condition
> > (UA_INTLCK_CTRL = 10b?), and will respond with a REPORTED_LUNS_CHANGED
> > for a subsequent command?
> >
> > Based on your confirmation that we want to handle the REPORT_LUNS command
> > as if UA_INTLCK_CTRL is set to 0, I will proceed with the assumption
> > that the Solaris driver is at fault, and will work with the Solaris
> > driver folks.
> >
> > In the meantime, as you suggested, I will post v4 with the bus unit
> > attention mechanism implemented. We still need that.
> >
> > Venu
> >
> >
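
For readers less familiar with the key/asc/ascq triples mentioned in this
thread, here is a minimal sketch of how such a triple sits in an 18-byte
fixed-format sense buffer. The helper and macro names are hypothetical and this
is not QEMU's SCSI code; the byte offsets follow the fixed-format sense data
layout in the SCSI Primary Commands standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SENSE_LEN 18

/* Sense key values from the SCSI standards. */
#define SKETCH_KEY_ILLEGAL_REQUEST 0x05
#define SKETCH_KEY_UNIT_ATTENTION  0x06

/*
 * Fill an 18-byte fixed-format sense buffer (response code 0x70):
 * sense key in byte 2, ASC in byte 12, ASCQ in byte 13.
 */
static void sketch_build_fixed_sense(uint8_t *buf, uint8_t key,
                                     uint8_t asc, uint8_t ascq)
{
    assert(buf != NULL);
    memset(buf, 0, SKETCH_SENSE_LEN);
    buf[0] = 0x70;                 /* current error, fixed format */
    buf[2] = key & 0x0f;           /* sense key lives in the low nibble */
    buf[7] = SKETCH_SENSE_LEN - 8; /* additional sense length */
    buf[12] = asc;
    buf[13] = ascq;
}
```

The response quoted above, 0x05/0x24/0x00, is ILLEGAL REQUEST with INVALID
FIELD IN CDB. The unit attention under discussion, REPORTED LUNS DATA HAS
CHANGED, is 0x06/0x3f/0x0e.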

Re: [PATCH v2 03/11] bdrv_change_aio_context: use hash table instead of list of visited nodes

On 25.07.2022 at 14:21, Emanuele Giuseppe Esposito wrote:
> Minor performance improvement, but given that we have hash tables
> available, avoid iterating in the visited nodes list every time just
> to check if a node has been already visited.
> 
> The data structure is not actually a proper hash map, but a hash set,
> as we are just adding nodes and not key,value pairs.
> 
> Suggested-by: Hanna Reitz 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf
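
As an illustration of the idea in the commit message (membership test in
roughly constant time instead of walking a list), here is a minimal
self-contained pointer set. It is hand-rolled only so that it builds without
dependencies; it is not the hash table type the patch uses, and every sketch_
name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SET_SLOTS 64 /* power of two; the sketch never resizes */

/* A fixed-size set of non-NULL pointers using linear probing. */
struct sketch_ptr_set {
    const void *slots[SKETCH_SET_SLOTS];
    size_t used;
};

static size_t sketch_hash_ptr(const void *p)
{
    uintptr_t v = (uintptr_t)p;
    return (size_t)((v >> 4) ^ (v >> 12)) & (SKETCH_SET_SLOTS - 1);
}

/*
 * Insert `p` into the set. Returns true if it was newly added and false if
 * it was already present, so one call both tests and marks a node as visited.
 */
static bool sketch_set_add(struct sketch_ptr_set *set, const void *p)
{
    assert(set != NULL && p != NULL);
    size_t i = sketch_hash_ptr(p);
    while (set->slots[i] != NULL) {
        if (set->slots[i] == p) {
            return false;
        }
        i = (i + 1) & (SKETCH_SET_SLOTS - 1);
    }
    /* Always leave at least one empty slot so the probe loop terminates. */
    assert(set->used < SKETCH_SET_SLOTS - 1);
    set->slots[i] = p;
    set->used++;
    return true;
}
```

A graph walk would call sketch_set_add() on each node and skip the node when
it returns false, which replaces the "iterate the visited list" check.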

Re: [PATCH 1/2] vhost-user: Refactor vhost acked features saving

2022-10-07 Thread Hyman Huang





On 2022/10/7 22:01, Michael S. Tsirkin wrote:

On Mon, Sep 26, 2022 at 02:36:40PM +0800, huang...@chinatelecom.cn wrote:

From: Hyman Huang(黄勇) 

Abstract vhost acked features saving into
vhost_user_save_acked_features, export it as util function.

Signed-off-by: Hyman Huang(黄勇) 
Signed-off-by: Guoyi Tu 
---
  include/net/vhost-user.h |  2 ++
  net/vhost-user.c | 35 +++
  2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
index 5bcd8a6285..00d46613d3 100644
--- a/include/net/vhost-user.h
+++ b/include/net/vhost-user.h
@@ -14,5 +14,7 @@
  struct vhost_net;
  struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
  uint64_t vhost_user_get_acked_features(NetClientState *nc);
+void vhost_user_save_acked_features(NetClientState *nc,
+bool cleanup);
  
  #endif /* VHOST_USER_H */

diff --git a/net/vhost-user.c b/net/vhost-user.c
index b1a0247b59..c512cc9727 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -45,24 +45,31 @@ uint64_t vhost_user_get_acked_features(NetClientState *nc)
  return s->acked_features;
  }
  
-static void vhost_user_stop(int queues, NetClientState *ncs[])

+void vhost_user_save_acked_features(NetClientState *nc, bool cleanup)
  {
  NetVhostUserState *s;
+
+s = DO_UPCAST(NetVhostUserState, nc, nc);
+if (s->vhost_net) {
+uint64_t features = vhost_net_get_acked_features(s->vhost_net);
+if (features) {
+s->acked_features = features;
+}


Note it does not set  acked_features if features are 0.
Which might be the case for legacy ...
I will need to analyze this more to figure out what's
the correct behaviour


Thanks Michael for commenting. :)

Indeed, the way acked_features is backed up in the two functions,
chr_closed_bh() and vhost_user_stop(), differs as noted above; it also seems
a little weird to me.


IMHO, we can always keep the acked_features in NetVhostUserState the
same as acked_features in vhost_dev no matter what the features are, since
this is the role that acked_features in NetVhostUserState plays, and we
can just focus on the validation of acked_features in vhost_dev if
something goes wrong.


Thanks,
Yong


Stefano? Raphael?


+
+if (cleanup) {
+vhost_net_cleanup(s->vhost_net);
+}
+}
+}
+
+static void vhost_user_stop(int queues, NetClientState *ncs[])
+{
  int i;
  
  for (i = 0; i < queues; i++) {

  assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
  
-s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);

-
-if (s->vhost_net) {
-/* save acked features */
-uint64_t features = vhost_net_get_acked_features(s->vhost_net);
-if (features) {
-s->acked_features = features;
-}
-vhost_net_cleanup(s->vhost_net);
-}
+vhost_user_save_acked_features(ncs[i], true);
  }
  }
  
@@ -251,11 +258,7 @@ static void chr_closed_bh(void *opaque)

  s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
  
  for (i = queues -1; i >= 0; i--) {

-s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
-
-if (s->vhost_net) {
-s->acked_features = vhost_net_get_acked_features(s->vhost_net);
-}
+vhost_user_save_acked_features(ncs[i], false);
  }
  
  qmp_set_link(name, false, &err);

--
2.27.0




--
Best regards

Hyman Huang(黄勇)

Re: [PATCH v3 32/42] target/arm: Extract HA and HD in aa64_va_parameters


On 10/7/22 02:24, Peter Maydell wrote:

+.ha = ha,
+.hd = ha & hd,


This is a bitwise operation on two bools, should be && ?


Bitwise works fine, but I can use boolean if you like.

I'd be surprised (and filing a missed optimization bug) if the compiler treated these two 
operations differently in this case (simple bool operands with no side effects).



r~

Re: [PATCH v3 41/42] target/arm: Implement FEAT_HAFDBS

On Fri, 7 Oct 2022 at 17:45, Richard Henderson
 wrote:
>
> On 10/7/22 06:47, Peter Maydell wrote:
> > Are there definitely no code paths where we might try to do
> > a page table walk with the iothread already locked ?
>
> I'll double-check, but another possibility is to simply perform the atomic
> operation on the low 32-bits, where both AF and DB are located.  Another
> trick I learned from x86...

Doesn't that cause a problem where we don't detect that some other
CPU wrote to the high 32 bits of the descriptor ? We're supposed to
be using those high 32 bits, not the ones we have in hand...
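The concern can be illustrated with a small model. This is only a sketch: the descriptor is modelled as two separately stored 32-bit halves (ModelDesc is a hypothetical type, not QEMU's), and the bit position used for AF (bit 10) is an assumption taken from the VMSAv8-64 descriptor layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a 64-bit descriptor stored as two independently updated halves. */
typedef struct ModelDesc {
    _Atomic uint32_t lo;   /* holds AF and the AP/S2AP bit in this model */
    _Atomic uint32_t hi;
} ModelDesc;

/* Compare-and-swap on the low half only; the high half is never examined. */
static bool cas_low_half(ModelDesc *d, uint32_t expected_lo, uint32_t new_lo)
{
    return atomic_compare_exchange_strong(&d->lo, &expected_lo, new_lo);
}

/*
 * Returns true if a low-half CAS succeeds even though another writer
 * changed the high half after the walker read the descriptor.
 */
static bool low_half_cas_misses_high_write(void)
{
    ModelDesc d;
    atomic_init(&d.lo, 0x00000001u);
    atomic_init(&d.hi, 0x00001234u);

    /* The walker reads the whole descriptor. */
    uint32_t seen_lo = atomic_load(&d.lo);
    uint32_t seen_hi = atomic_load(&d.hi);

    /* Another CPU rewrites only the high half. */
    atomic_store(&d.hi, 0x00005678u);

    /* The walker sets bit 10 (AF) in the low half. */
    bool ok = cas_low_half(&d, seen_lo, seen_lo | (1u << 10));

    /* The CAS succeeded, but the walker's copy of the high half is stale. */
    return ok && seen_hi != atomic_load(&d.hi);
}
```

In this model the update goes through while the walker is still holding stale high bits, which is the situation the question above is asking about.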

If we do need the iothread lock, we could do it the way that
io_readx() does, I guess, where we track whether we needed to
lock it or not.
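A sketch of that "lock only if not already held, and remember whether we did" pattern, using a hypothetical single-threaded model (model_lock(), model_unlock() and model_lock_held are stand-ins for illustration, not the iothread lock API):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a big lock with a per-thread "held" flag. */
static _Thread_local bool model_lock_held;
static int model_lock_acquisitions;

static void model_lock(void)
{
    assert(!model_lock_held);   /* taking it twice would deadlock */
    model_lock_held = true;
    model_lock_acquisitions++;
}

static void model_unlock(void)
{
    assert(model_lock_held);
    model_lock_held = false;
}

/* Performs an update under the lock, taking it only when the caller has not. */
static void model_update(int *value)
{
    bool locked = false;

    if (!model_lock_held) {
        model_lock();
        locked = true;
    }

    (*value)++;

    /* Release only what this function acquired. */
    if (locked) {
        model_unlock();
    }
}
```

The caller's lock state is the same on exit as on entry, whichever path was taken.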

thanks
-- PMM

Re: [PATCH] target/arm: Make the final stage1+2 write to secure be unconditional


On 10/7/22 09:20, Peter Maydell wrote:

-/* Check if IPA translates to secure or non-secure PA space. */
-if (is_secure) {
-if (ipa_secure) {
-result->attrs.secure =
-!(env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW));
-} else {
-result->attrs.secure =
-!((env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW))
-|| (env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW)));
-}
-}


If:
  is_secure == true
  ipa_secure == false
  (env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW)) is non-zero
  (env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW)) is zero
then the old code sets attrs.secure to true...


No, I think the misalignment of the two lines wrt the !() may have been 
confusing:

  if (true) {
if (false) {
} else {
  secure = !((0) || (non-zero))
 = !(1)
 = 0
}
  }


r~




+/*
+ * Check if IPA translates to secure or non-secure PA space.
+ * Note that VSTCR overrides VTCR and {N}SW overrides {N}SA.
+ */
+result->attrs.secure =
+(is_secure
+ && !(env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW))
+ && (ipa_secure
+ || !(env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW))));


...but the new code will set it to false, I think ?

thanks
-- PMM
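The two forms can be compared exhaustively with a small truth-table check. This is a sketch under stated assumptions: the register tests are reduced to booleans (vstcr_bits stands for "VSTCR_SA | VSTCR_SW is non-zero", vtcr_bits for "VTCR_NSA | VTCR_NSW is non-zero"), and attrs.secure is assumed to be false on entry, which is what the patch description expects for a non-secure input. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Old logic; 'prev' is the value attrs.secure had before the block ran. */
static bool secure_old(bool prev, bool is_secure, bool ipa_secure,
                       bool vstcr_bits, bool vtcr_bits)
{
    bool secure = prev;

    if (is_secure) {
        if (ipa_secure) {
            secure = !vstcr_bits;
        } else {
            secure = !(vtcr_bits || vstcr_bits);
        }
    }
    return secure;
}

/* New logic: a single unconditional expression. */
static bool secure_new(bool is_secure, bool ipa_secure,
                       bool vstcr_bits, bool vtcr_bits)
{
    return is_secure && !vstcr_bits && (ipa_secure || !vtcr_bits);
}

/* Counts input combinations where the two disagree, with prev == false. */
static int count_secure_mismatches(void)
{
    int mismatches = 0;

    for (int i = 0; i < 16; i++) {
        bool is_secure = i & 1, ipa_secure = i & 2;
        bool vstcr_bits = i & 4, vtcr_bits = i & 8;

        if (secure_old(false, is_secure, ipa_secure, vstcr_bits, vtcr_bits) !=
            secure_new(is_secure, ipa_secure, vstcr_bits, vtcr_bits)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

For the case raised above (is_secure true, ipa_secure false, VSTCR bits non-zero, VTCR bits zero) both functions return false.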

Re: [PATCH v3 32/42] target/arm: Extract HA and HD in aa64_va_parameters

On Fri, 7 Oct 2022 at 16:37, Richard Henderson
 wrote:
>
> On 10/7/22 02:24, Peter Maydell wrote:
> >> +.ha = ha,
> >> +.hd = ha & hd,
> >
> > This is a bitwise operation on two bools, should be && ?
>
> Bitwise works fine, but I can use boolean if you like.
>
> I'd be surprised (and filing a missed optimization bug) if the compiler
> treated these two operations differently in this case (simple bool operands
> with no side effects).

The different treatment I would expect would be that in the '&'
case it warns you about using a bitwise operation on a boolean :-)

-- PMM

Re: [PATCH] error handling: Use TFR() macro where applicable

On Fri, 7 Oct 2022 at 12:44, Nikita Ivanov  wrote:
>
> Hi!
> Sorry for such a long absence, I've been resolving some other issues in my 
> life for a while. I've adjusted the patch according to your latest comments. 
> Could you check it out, please?

Hi; thanks for coming back to this. (I'd been meaning to re-read the
thread but hadn't found time to do so; sorry.) As Christian says,
you should send the patches as a proper new patchset thread of their
own, but for the moment:

> From 5389c5ccc8789f8f666ab99e50d38af728bd2c9c Mon Sep 17 00:00:00 2001
> From: Nikita Ivanov 
> Date: Wed, 3 Aug 2022 12:54:00 +0300
> Subject: [PATCH 1/2] error handling: Use TFR() macro where applicable
>
> There is a defined TFR() macro in qemu/osdep.h which
> handles the same while loop.
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/415
>
> Signed-off-by: Nikita Ivanov 

> diff --git a/block/file-posix.c b/block/file-posix.c
> index 66fdb07820..7892bdea31 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1238,9 +1238,9 @@ static int hdev_get_max_segments(int fd, struct stat *st)
>  ret = -errno;
>  goto out;
>  }
> -do {
> -ret = read(sysfd, buf, sizeof(buf) - 1);
> -} while (ret == -1 && errno == EINTR);
> +TFR(
> +ret = read(sysfd, buf, sizeof(buf) - 1)
> +);

I think this patch is doing things in the wrong order. Instead of
converting code to use the old macro that we don't like and then
updating it again in patch 2 to use the new macro, we should
first introduce the new macro, and then after that we can update
code that's currently not using a macro at all to use the new one.
This makes code review easier because we don't have to look at a
change to this code which is then going to be rewritten anyway.

> From 7a9fccf00ec2d1c6b30b2ed1cb98398b49ddb0bc Mon Sep 17 00:00:00 2001
> From: Nikita Ivanov 
> Date: Mon, 8 Aug 2022 20:43:45 +0300
> Subject: [PATCH 2/2] Refactoring: rename TFR() to TEMP_FAILURE_RETRY()
>
> glibc's unistd.h header provides the same macro with the
> subtle difference in type casting. Adjust macro name to the
> common standard and refactor it to expression.
>
> Signed-off-by: Nikita Ivanov 

> diff --git a/block/file-posix.c b/block/file-posix.c
> index 7892bdea31..ee7f60c78a 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1238,8 +1238,8 @@ static int hdev_get_max_segments(int fd, struct stat *st)
>  ret = -errno;
>  goto out;
>  }
> -TFR(
> -ret = read(sysfd, buf, sizeof(buf) - 1)
> +ret = TEMP_FAILURE_RETRY(
> +read(sysfd, buf, sizeof(buf) - 1)
>  );

This doesn't need these newlines in it. If the whole thing fits on
a single line, that's easier to read.

>  if (ret < 0) {
>  ret = -errno;

> @@ -1472,8 +1472,8 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
>  {
>  ssize_t len;
>
> -TFR(
> -len = (aiocb->aio_type & QEMU_AIO_WRITE) ?
> +len = TEMP_FAILURE_RETRY(
> +(aiocb->aio_type & QEMU_AIO_WRITE) ?
>  qemu_pwritev(aiocb->aio_fildes,
> aiocb->io.iov,
> aiocb->io.niov,

I'm not sure why you've put the TEMP_FAILURE_RETRY on the outside here
rather than just on the individual function calls.

> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index b1c161c035..6e244f15fa 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -243,7 +243,13 @@ void QEMU_ERROR("code path is reachable")
>  #define ESHUTDOWN 4099
>  #endif
>
> -#define TFR(expr) do { if ((expr) != -1) break; } while (errno == EINTR)
> +#define TEMP_FAILURE_RETRY(expr) \

We can't call the macro this, because the glibc system headers already
may define a macro of that name, so the compiler will complain if they're
both defined at the same time, and depending on header ordering it might
not be clear which version you're getting.

> +(__extension__  \
> +({ typeof(int64_t) __result;   \

As Christian says, the point of the typeof is to use the type
of the expression. "typeof(int64_t)" is always just "int64_t".
You want "typeof(expr) __result;".

> +   do { \
> +__result = (typeof(int64_t)) (expression); \

Then you don't need this cast, because both __result and expr
are the same type anyway.

Also, how did this compile? 'expression' isn't the name of the macro argument.

> +   } while (__result == -1L && errno == EINTR); \

I think you don't need the 'L' suffix here.

> +   __result; }))

thanks
-- PMM
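Putting the review comments together, an expression-style macro might look like the sketch below. The name RETRY_ON_EINTR_SKETCH is hypothetical, chosen only so that it does not clash with glibc's TEMP_FAILURE_RETRY; fake_read() is a test double, not a real I/O call. It relies on the GNU C statement-expression and __typeof__ extensions, as the quoted patch does.

```c
#include <assert.h>
#include <errno.h>

/*
 * The result takes the type of the expression itself, so no cast is
 * needed, and the comparison uses a plain -1.
 */
#define RETRY_ON_EINTR_SKETCH(expr)                     \
    (__extension__ ({                                   \
        __typeof__(expr) result_;                       \
        do {                                            \
            result_ = (expr);                           \
        } while (result_ == -1 && errno == EINTR);      \
        result_;                                        \
    }))

/* Test double: fails with EINTR 'failures' times, then returns 5. */
static int fake_calls;

static long fake_read(int failures)
{
    fake_calls++;
    if (fake_calls <= failures) {
        errno = EINTR;
        return -1;
    }
    return 5;
}
```

With a macro of this shape, the first quoted hunk would fit on one line, for example `ret = RETRY_ON_EINTR_SKETCH(read(sysfd, buf, sizeof(buf) - 1));`.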

Re: [PATCH v3 41/42] target/arm: Implement FEAT_HAFDBS


On 10/7/22 06:47, Peter Maydell wrote:

On Sat, 1 Oct 2022 at 18:04, Richard Henderson
 wrote:


Perform the atomic update for hardware management of the access flag
and the dirty bit.

A limitation of the implementation so far is that the page table
itself must already be writable, i.e. the dirty bit for the stage2
page table must already be set, i.e. we cannot set both dirty bits
at the same time.

This is allowed because it is CONSTRAINED UNPREDICTABLE whether any
atomic update happens at all.  The implementation is allowed to simply
fall back on software update at any time.


I can't see where in the Arm ARM this is stated.

In any case, if HA is set you can't simply return ARMFault_AccessFlag
without breaking the bit in D5.4.12 which says
"When hardware updates of the Access flag are enabled for a stage of
  translation an address translation instruction that uses that stage
  of translation will not report that the address will give rise to
  an Access flag fault in the PAR"


I think this may have been a loose (or incorrect) reading of what has become R_QSPMS in I_a, 
via "access generates ... a Permission fault", due to the pte page being read-only, due to 
the dirty bit being clear.


However, having performed the same atomic update conversion for x86, I can see now that I 
merely need to perform the lookup for s1 ptw with MMU_DATA_STORE rather than MMU_DATA_LOAD 
in order for the s1 pte page to have its own dirty bit processed and become writable.



+t = FIELD_DP64(t, ID_AA64MMFR1, HAFDBS, 2);   /* FEAT_HAFDBS */


I think we should split the access flag update and the
dirty-bit update into separate commits. It might be useful
for bisection purposes later, and it looks like they're pretty
cleanly separable. (Though if you look at my last comment in this
email, maybe not quite so clean as in the code as you have it here.)


Shouldn't be too hard to split.  I'll try, at least.


+static uint64_t arm_casq_ptw(CPUARMState *env, uint64_t old_val,
+ uint64_t new_val, const S1TranslateResult *s1,
+ ARMMMUFaultInfo *fi)
+{
+uint64_t cur_val;
+
+if (unlikely(!s1->hphys)) {
+fi->type = ARMFault_UnsuppAtomicUpdate;
+fi->s1ptw = true;
+return 0;
+}
+
+#ifndef CONFIG_ATOMIC64
+/*
+ * We can't support the atomic operation on the host.  We should be
+ * running in round-robin mode though, which means that we would only
+ * race with dma i/o.
+ */
+qemu_mutex_lock_iothread();


Are there definitely no code paths where we might try to do
a page table walk with the iothread already locked ?


I'll double-check, but another possibility is to simply perform the atomic operation on 
the low 32-bits, where both AF and DB are located.  Another trick I learned from x86...



+old_des = descriptor;
+if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+new_des = descriptor | (1ull << 7);   /* S2AP[1] */
+} else {
+new_des = descriptor & ~(1ull << 7);  /* AP[2] */
+}


If we update the prot bits, we need to also re-calculate the exec bit,
I think, because the execute-never stuff depends on whether the page is
writeable. Alternatively you can do it the way the pseudocode does and
pre-figure-out the final permission bits value (see AArch64.S1ApplyOutputPerms()
and AArch64.S2ApplyOutputPerms()) so you only need to calculate the
exec bit once.
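One way to picture the "work out the final descriptor first" approach is the sketch below. It is a simplified model, not the patch's code: only bit 7 (from the quoted hunk) and AF at bit 10 (an assumption taken from the VMSAv8-64 descriptor layout) are handled, all other permission inputs are ignored, and the function names are hypothetical. It also assumes a host with 64-bit atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_BIT7 (1ull << 7)   /* AP[2] at stage 1, S2AP[1] at stage 2 */
#define DESC_AF   (1ull << 10)  /* access flag */

/* Compute the descriptor we want to install, before touching memory. */
static uint64_t desc_with_updates(uint64_t desc, bool stage2,
                                  bool set_af, bool set_dirty)
{
    if (set_af) {
        desc |= DESC_AF;
    }
    if (set_dirty) {
        if (stage2) {
            desc |= DESC_BIT7;    /* S2AP[1] = 1: write permitted */
        } else {
            desc &= ~DESC_BIT7;   /* AP[2] = 0: not read-only */
        }
    }
    return desc;
}

/* Writability according to bit 7 alone; every other input is ignored. */
static bool desc_bit7_writable(uint64_t desc, bool stage2)
{
    return stage2 ? (desc & DESC_BIT7) != 0 : (desc & DESC_BIT7) == 0;
}

/*
 * One compare-and-swap attempt. On success *desc holds the value that was
 * installed; on failure it holds what is currently in memory, so the
 * caller can restart from that value.
 */
static bool desc_try_update(_Atomic uint64_t *pte, uint64_t *desc,
                            bool stage2, bool set_af, bool set_dirty)
{
    uint64_t new_des = desc_with_updates(*desc, stage2, set_af, set_dirty);

    if (atomic_compare_exchange_strong(pte, desc, new_des)) {
        *desc = new_des;
        return true;
    }
    return false;
}
```

Because the caller ends up with the descriptor that was actually installed, permissions (including anything that depends on writability) need to be derived only once, from that final value.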


Good catch.


r~

Re: [PATCH v2 02/11] block: use transactions as a replacement of ->{can_}set_aio_context()

On 25.07.2022 at 14:21, Emanuele Giuseppe Esposito wrote:
> Simplify the way the aiocontext can be changed in a BDS graph.
> There are currently two problems in bdrv_try_set_aio_context:
> - There is a confusion of AioContext locks taken and released, because
>   we assume that old aiocontext is always taken and new one is
>   taken inside.
> 
> - It doesn't look very safe to call bdrv_drained_begin while some
>   nodes have already switched to the new aiocontext and others haven't.
>   This could be especially dangerous because bdrv_drained_begin polls, so
>   something else could be executed while graph is in an inconsistent
>   state.
> 
> Additional minor nitpick: can_set and set_ callbacks both traverse the
> graph, both using the ignored list of visited nodes in a different way.
> 
> Therefore, get rid of all of this and introduce a new callback,
> change_aio_context, that uses transactions to efficiently, cleanly
> and most importantly safely change the aiocontext of a graph.
> 
> This new callback is a "merge" of the two previous ones:
> - Just like can_set_aio_context, recursively traverses the graph.
>   Marks all nodes that are visited using a GList, and checks if
>   they *could* change the aio_context.
> - For each node that passes the above check, drain it and add a new
>   transaction that implements a callback that effectively changes the
>   aiocontext.
> - Once done, the recursive function returns if *all* nodes can change
>   the AioContext. If so, commit the above transactions.
>   Regardless of the outcome, call transaction.clean() to undo all drains
>   done in the recursion.
> - The transaction list is scanned only after all nodes are being drained, so
>   we are sure that they all are in the same context, and then
>   we switch their AioContext, concluding the drain only after all nodes
>   switched to the new AioContext. In this way we make sure that
>   bdrv_drained_begin() is always called under the old AioContext, and
>   bdrv_drained_end() under the new one.
> - Because of the above, we don't need to release and re-acquire the
>   old AioContext every time, as everything is done once (and not
>   per-node drain and aiocontext change).
> 
> Note that the "change" API is not yet invoked anywhere.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
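The prepare/commit/clean flow described in the quoted commit message can be sketched with a toy graph. Everything here is a hypothetical model for illustration (Node, NodeList, prepare() and change_ctx() are not the block layer's types or functions), and "drain" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

typedef struct Node {
    int ctx;                /* models the node's AioContext */
    int drained;            /* models the drain count */
    bool can_change;        /* whether this node accepts a context change */
    struct Node *child[2];
} Node;

typedef struct NodeList {
    Node *node[MAX_NODES];
    size_t n;
} NodeList;

static bool list_has(const NodeList *l, const Node *n)
{
    for (size_t i = 0; i < l->n; i++) {
        if (l->node[i] == n) {
            return true;
        }
    }
    return false;
}

static void list_add(NodeList *l, Node *n)
{
    assert(l->n < MAX_NODES);
    l->node[l->n++] = n;
}

/* Recursive check: mark visited, drain, and queue the change for later. */
static bool prepare(Node *n, NodeList *visited, NodeList *txn)
{
    if (!n || list_has(visited, n)) {
        return true;
    }
    list_add(visited, n);
    if (!n->can_change) {
        return false;
    }
    n->drained++;           /* drain begins while still in the old context */
    list_add(txn, n);
    for (size_t i = 0; i < 2; i++) {
        if (!prepare(n->child[i], visited, txn)) {
            return false;
        }
    }
    return true;
}

static bool change_ctx(Node *root, int new_ctx)
{
    NodeList visited = { .n = 0 };
    NodeList txn = { .n = 0 };
    bool ok = prepare(root, &visited, &txn);

    if (ok) {
        /* Commit: switch every queued node only after all are drained. */
        for (size_t i = 0; i < txn.n; i++) {
            txn.node[i]->ctx = new_ctx;
        }
    }
    /* Clean: undo the drains regardless of the outcome. */
    for (size_t i = 0; i < txn.n; i++) {
        txn.node[i]->drained--;
    }
    return ok;
}
```

If any node refuses, no context is changed and all drains are still undone; if every node agrees, all contexts switch together before the drains end.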

For future work, please change the way you construct your series. It's
not good practice to have many patches that just add dead code (and even
patches that optimise that dead code!) and then a final patch to enable
everything at once.

It's not only hard to review because you never know what to compare it
to, but also any regression will always happen on the final patch and
you can't know which patch actually contains the broken code.

Or looking at it from a slightly different angle, we should also try to
ensure that the code makes sense after each individual commit. Having
lots of duplicated code doesn't necessarily make a lot of sense.

> diff --git a/block.c b/block.c
> index 58a9cfc8b7..c80e49009a 100644
> --- a/block.c
> +++ b/block.c
> @@ -108,6 +108,10 @@ static void bdrv_reopen_abort(BDRVReopenState *reopen_state);
>  
>  static bool bdrv_backing_overridden(BlockDriverState *bs);
>  
> +static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
> +GSList **visited, Transaction *tran,
> +Error **errp);
> +
>  /* If non-zero, use only whitelisted block drivers */
>  static int use_bdrv_whitelist;
>  
> @@ -7325,7 +7329,7 @@ static void bdrv_attach_aio_context(BlockDriverState *bs,
>   * must not own the AioContext lock for new_context (unless new_context is the
>   * same as the current context of bs).
>   *
> - * @ignore will accumulate all visited BdrvChild object. The caller is
> + * @ignore will accumulate all visited BdrvChild objects. The caller is
>   * responsible for freeing the list afterwards.
>   */
>  void bdrv_set_aio_context_ignore(BlockDriverState *bs,
> @@ -7434,6 +7438,38 @@ static bool bdrv_parent_can_set_aio_context(BdrvChild *c, AioContext *ctx,
>  return true;
>  }
>  
> +typedef struct BdrvStateSetAioContext {
> +AioContext *new_ctx;
> +BlockDriverState *bs;
> +} BdrvStateSetAioContext;
> +
> +static bool bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
> +   GSList **visited, Transaction *tran,
> +   Error **errp)
> +{
> +GLOBAL_STATE_CODE();
> +if (g_slist_find(*visited, c)) {
> +return true;
> +}
> +*visited = g_slist_prepend(*visited, c);
> +
> +/*
> + * A BdrvChildClass that doesn't handle AioContext changes cannot
> + * tolerate any AioContext changes
> + */
> +if (!c->klass->change_aio_ctx) {
> +char *user = bdrv_child_user_desc(c);
> +error_setg(errp, "Changing iothreads is not supported by %s", user);
> +g_free(user);
>

[PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange

From: Huai-Cheng Kuo 

The CDAT can be specified in two ways. One is to add ",cdat="
in "-device cxl-type3"'s command option. The file is required to provide
the whole CDAT table in binary mode. The other is to use the default
that provides some 'reasonable' numbers based on type of memory and
size.

The DOE capability supporting CDAT is added to hw/mem/cxl_type3.c with
capability offset 0x190. The config read/write to this capability range
can be generated in the OS to request the CDAT data.

Signed-off-by: Huai-Cheng Kuo 
Signed-off-by: Chris Browy 
Signed-off-by: Jonathan Cameron 

--
Changes since RFC:
- Break out type 3 user of library as separate patch.
- Change reported data for default to be based on the options provided
  for the type 3 device.
---
 hw/mem/cxl_type3.c | 227 +
 1 file changed, 227 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 568c9d62f5..3fa5d70662 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -12,9 +12,218 @@
 #include "qemu/range.h"
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
+#include "sysemu/numa.h"
 #include "hw/cxl/cxl.h"
 #include "hw/pci/msix.h"
 
+#define DWORD_BYTE 4
+
+static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
+void *priv)
+{
+g_autofree CDATDsmas *dsmas_nonvolatile = NULL;
+g_autofree CDATDslbis *dslbis_nonvolatile = NULL;
+g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
+CXLType3Dev *ct3d = priv;
+int len = 0;
+int i = 0;
+int next_dsmad_handle = 0;
+int nonvolatile_dsmad = -1;
+int dslbis_nonvolatile_num = 4;
+MemoryRegion *mr;
+
+/* Non volatile aspects */
+if (ct3d->hostmem) {
+dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
+if (!dsmas_nonvolatile) {
+return -ENOMEM;
+}
+nonvolatile_dsmad = next_dsmad_handle++;
+mr = host_memory_backend_get_memory(ct3d->hostmem);
+if (!mr) {
+return -EINVAL;
+}
+*dsmas_nonvolatile = (CDATDsmas) {
+.header = {
+.type = CDAT_TYPE_DSMAS,
+.length = sizeof(*dsmas_nonvolatile),
+},
+.DSMADhandle = nonvolatile_dsmad,
+.flags = CDAT_DSMAS_FLAG_NV,
+.DPA_base = 0,
+.DPA_length = int128_get64(mr->size),
+};
+len++;
+
+/* For now, no memory side cache, plausiblish numbers */
+dslbis_nonvolatile = g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
+if (!dslbis_nonvolatile)
+return -ENOMEM;
+
+dslbis_nonvolatile[0] = (CDATDslbis) {
+.header = {
+.type = CDAT_TYPE_DSLBIS,
+.length = sizeof(*dslbis_nonvolatile),
+},
+.handle = nonvolatile_dsmad,
+.flags = HMAT_LB_MEM_MEMORY,
+.data_type = HMAT_LB_DATA_READ_LATENCY,
+.entry_base_unit = 1, /* 10ns base */
+.entry[0] = 15, /* 150ns */
+};
+len++;
+
+dslbis_nonvolatile[1] = (CDATDslbis) {
+.header = {
+.type = CDAT_TYPE_DSLBIS,
+.length = sizeof(*dslbis_nonvolatile),
+},
+.handle = nonvolatile_dsmad,
+.flags = HMAT_LB_MEM_MEMORY,
+.data_type = HMAT_LB_DATA_WRITE_LATENCY,
+.entry_base_unit = 1,
+.entry[0] = 25, /* 250ns */
+};
+len++;
+   
+dslbis_nonvolatile[2] = (CDATDslbis) {
+.header = {
+.type = CDAT_TYPE_DSLBIS,
+.length = sizeof(*dslbis_nonvolatile),
+},
+.handle = nonvolatile_dsmad,
+.flags = HMAT_LB_MEM_MEMORY,
+.data_type = HMAT_LB_DATA_READ_BANDWIDTH,
+.entry_base_unit = 1000, /* GB/s */
+.entry[0] = 16,
+};
+len++;
+
+dslbis_nonvolatile[3] = (CDATDslbis) {
+.header = {
+.type = CDAT_TYPE_DSLBIS,
+.length = sizeof(*dslbis_nonvolatile),
+},
+.handle = nonvolatile_dsmad,
+.flags = HMAT_LB_MEM_MEMORY,
+.data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
+.entry_base_unit = 1000, /* GB/s */
+.entry[0] = 16,
+};
+len++;
+
+mr = host_memory_backend_get_memory(ct3d->hostmem);
+if (!mr) {
+return -EINVAL;
+}
+dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
+*dsemts_nonvolatile = (CDATDsemts) {
+.header = {
+.type = CDAT_TYPE_DSEMTS,
+.length = sizeof(*dsemts_nonvolatile),
+},
+.DSMAS_handle = nonvolatile_dsmad,
+.EFI_memory_type_attr = 2, /* Reserved - the non volatile from DSMAS matters */
+.DPA_offset = 0,
+.DPA_length =

Re: [PATCH v1 5/8] migration: Export dirty-limit time info

2022-10-07 Thread Hyman Huang





On 2022/10/7 23:09, Markus Armbruster wrote:

Hyman Huang  writes:


On 2022/10/2 2:31, Markus Armbruster wrote:

huang...@chinatelecom.cn writes:


From: Hyman Huang(黄勇) 

Export dirty limit throttle time and estimated ring full
time, through which we can observe the process of dirty
limit during live migration.

Signed-off-by: Hyman Huang(黄勇) 

[...]


diff --git a/qapi/migration.json b/qapi/migration.json
index bc4bc96..c263d54 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -242,6 +242,12 @@
   #   Present and non-empty when migration is blocked.
   #   (since 6.0)
   #
+# @dirty-limit-throttle-us-per-full: Throttle time (us) during the period of
+#dirty ring full (since 7.0)
+#
+# @dirty-limit-us-ring-full: Estimated periodic time (us) of dirty ring full.
+#(since 7.0)
+#


Can you explain what is measured here a bit more verbosely?


The two new fields of migration info aim to export the dirty-limit throttle time so
that upper-layer applications can observe the progress of live migration,
as with 'cpu-throttle-percentage'.

The commit "tests: Add migration dirty-limit capability test" makes use of
'dirty-limit-throttle-us-per-full' to check whether dirty-limit has
started; the commit "tests/migration: Introduce dirty-limit into guestperf"
introduces the two fields so that the guestperf tools also show the
progress of dirty-limit migration.

I also use qmp_query_migrate to observe the migration by checking these two
fields.

I'm not sure whether the above explanation is exactly what you want; please feel free to
start any discussion about this feature.


You explained use cases, which is always welcome.

I'm trying to understand the two new members' meaning, i.e. what exactly
is being measured.


dirty-limit-throttle-us-per-full:
This is the time a vCPU should sleep once its dirty ring gets full. Since we
set the limit on a vCPU every time it returns to QEMU for the
KVM_EXIT_DIRTY_RING_FULL reason, the sleep time may also change every
time the dirty ring gets full. 'dirty-limit-throttle-us-per-full' can be
read as 'throttle time (us) every time a vCPU's dirty ring gets
full'. The 'dirty-limit' prefix just marks the parameter as
dirty-limit-related.


dirty-limit-us-ring-full:
It is an estimated value: the time it takes for a vCPU's dirty ring to get
full. It depends on the vCPU's dirty page rate; the higher the rate,
the smaller dirty-limit-us-ring-full is.


dirty-limit-throttle-us-per-full / dirty-limit-us-ring-full * 100 is 
kind of like 'cpu-throttle-percentage'.
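As a worked illustration of that ratio, here is a small sketch. The function name and the choice to return 0 when the ring-full time is 0 are assumptions made for the example; they are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of each ring-full period spent throttled, in percent.
 * Returns 0 when us_ring_full is 0 to avoid dividing by zero.
 */
static double dirty_limit_throttle_percent(uint64_t throttle_us_per_full,
                                           uint64_t us_ring_full)
{
    if (us_ring_full == 0) {
        return 0.0;
    }
    return (double)throttle_us_per_full / (double)us_ring_full * 100.0;
}
```

For instance, a vCPU that sleeps 500 us each time its ring fills, with an estimated 2000 us between fills, would come out at 25 percent.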


Thanks,

Yong



For existing @cpu-throttle-percentage, the doc comment tells me:
"percentage of time guest cpus are being throttled during
auto-converge."

For your new members, the doc comment tries to tell me, but it
doesn't succeed.  If you explain what is being measured more verbosely,
we may be able to improve the doc comment.



--
Best regards

Hyman Huang(黄勇)

[PATCH v7 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation

From: Huai-Cheng Kuo 

The Data Object Exchange implementation of CXL Coherent Device Attribute
Table (CDAT). This implementation is referring to "Coherent Device
Attribute Table Specification, Rev. 1.02, Oct. 2020" and "Compute
Express Link Specification, Rev. 2.0, Oct. 2020"

This patch adds core support that will be shared by both
end-points and switch port emulation.

Signed-off-by: Huai-Cheng Kuo 
Signed-off-by: Chris Browy 
Signed-off-by: Jonathan Cameron 

---
Changes since RFC:
- Split out library code from specific device.
---
 hw/cxl/cxl-cdat.c  | 222 +
 hw/cxl/meson.build |   1 +
 include/hw/cxl/cxl_cdat.h  | 165 
 include/hw/cxl/cxl_component.h |   7 ++
 include/hw/cxl/cxl_device.h|   3 +
 include/hw/cxl/cxl_pci.h   |   1 +
 6 files changed, 399 insertions(+)

diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
new file mode 100644
index 00..137178632b
--- /dev/null
+++ b/hw/cxl/cxl-cdat.c
@@ -0,0 +1,222 @@
+/*
+ * CXL CDAT Structure
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+
+static void cdat_len_check(CDATSubHeader *hdr, Error **errp)
+{
+assert(hdr->length);
+assert(hdr->reserved == 0);
+
+switch (hdr->type) {
+case CDAT_TYPE_DSMAS:
+assert(hdr->length == sizeof(CDATDsmas));
+break;
+case CDAT_TYPE_DSLBIS:
+assert(hdr->length == sizeof(CDATDslbis));
+break;
+case CDAT_TYPE_DSMSCIS:
+assert(hdr->length == sizeof(CDATDsmscis));
+break;
+case CDAT_TYPE_DSIS:
+assert(hdr->length == sizeof(CDATDsis));
+break;
+case CDAT_TYPE_DSEMTS:
+assert(hdr->length == sizeof(CDATDsemts));
+break;
+case CDAT_TYPE_SSLBIS:
+assert(hdr->length >= sizeof(CDATSslbisHeader));
+assert((hdr->length - sizeof(CDATSslbisHeader)) %
+   sizeof(CDATSslbe) == 0);
+break;
+default:
+error_setg(errp, "Type %d is reserved", hdr->type);
+}
+}
+
+static void ct3_build_cdat(CDATObject *cdat, Error **errp)
+{
+g_autofree CDATTableHeader *cdat_header = NULL;
+g_autofree CDATEntry *cdat_st = NULL;
+uint8_t sum = 0;
+int ent, i;
+
+/* Use default table if fopen == NULL */
+assert(cdat->build_cdat_table);
+
+cdat_header = g_malloc0(sizeof(*cdat_header));
+if (!cdat_header) {
+error_setg(errp, "Failed to allocate CDAT header");
+return;
+}
+
+cdat->built_buf_len = cdat->build_cdat_table(&cdat->built_buf, cdat->private);
+
+if (!cdat->built_buf_len) {
+/* Build later as not all data available yet */
+cdat->to_update = true;
+return;
+}
+cdat->to_update = false;
+
+cdat_st = g_malloc0(sizeof(*cdat_st) * (cdat->built_buf_len + 1));
+if (!cdat_st) {
+error_setg(errp, "Failed to allocate CDAT entry array");
+return;
+}
+
+/* Entry 0 for CDAT header, starts with Entry 1 */
+for (ent = 1; ent < cdat->built_buf_len + 1; ent++) {
+CDATSubHeader *hdr = cdat->built_buf[ent - 1];
+uint8_t *buf = (uint8_t *)cdat->built_buf[ent - 1];
+
+cdat_st[ent].base = hdr;
+cdat_st[ent].length = hdr->length;
+
+cdat_header->length += hdr->length;
+for (i = 0; i < hdr->length; i++) {
+sum += buf[i];
+}
+}
+
+/* CDAT header */
+cdat_header->revision = CXL_CDAT_REV;
+/* For now, no runtime updates */
+cdat_header->sequence = 0;
+cdat_header->length += sizeof(CDATTableHeader);
+sum += cdat_header->revision + cdat_header->sequence +
+cdat_header->length;
+/* Sum of all bytes including checksum must be 0 */
+cdat_header->checksum = ~sum + 1;
+
+cdat_st[0].base = g_steal_pointer(&cdat_header);
+cdat_st[0].length = sizeof(*cdat_header);
+cdat->entry_len = 1 + cdat->built_buf_len;
+cdat->entry = g_steal_pointer(&cdat_st);
+}
+
+static void ct3_load_cdat(CDATObject *cdat, Error **errp)
+{
+g_autofree CDATEntry *cdat_st = NULL;
+uint8_t sum = 0;
+int num_ent;
+int i = 0, ent = 1, file_size = 0;
+CDATSubHeader *hdr;
+FILE *fp = NULL;
+
+/* Read CDAT file and create its cache */
+fp = fopen(cdat->filename, "r");
+if (!fp) {
+error_setg(errp, "CDAT: Unable to open file");
+return;
+}
+
+fseek(fp, 0, SEEK_END);
+file_size = ftell(fp);
+fseek(fp, 0, SEEK_SET);
+cdat->buf = g_malloc0(file_size);
+
+if (fread(cdat->buf, file_size, 1, fp) == 0) {
+error_setg(errp, "CDAT: File read failed");
+return;
+}
+
+fclose(fp);
+
+if (file_size < sizeof(CDATTableHeader)) {
+error_setg(errp, "CDAT: File
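
The checksum rule stated in ct3_build_cdat() above ("Sum of all bytes including checksum must be 0") can be checked with a byte-wise sketch. The helper names are hypothetical; summing byte by byte is one way to make sure every byte of a multi-byte field such as the table length contributes to the sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes, modulo 256. */
static uint8_t sum_bytes(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Two's-complement checksum: the byte that brings the total to zero. */
static uint8_t checksum_for(const uint8_t *buf, size_t len)
{
    return (uint8_t)(~sum_bytes(buf, len) + 1);
}
```

Storing checksum_for() of the other bytes in the checksum slot makes sum_bytes() over the whole table return 0.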

Re: [PATCH 1/2] hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL

On Thu,  6 Oct 2022 19:37:01 -0400
Gregory Price  wrote:

> Current code sets to STORAGE_EXPRESS and then overrides it.
> 
> Signed-off-by: Gregory Price 

I'm carrying the same patch after you reported it the other day.

Reviewed-by: Jonathan Cameron 

> ---
>  hw/mem/cxl_type3.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index ada2108fac..1837c1c83a 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -146,7 +146,6 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
>  }
>  
>  pci_config_set_prog_interface(pci_conf, 0x10);
> -pci_config_set_class(pci_conf, PCI_CLASS_MEMORY_CXL);
>  
>  pcie_endpoint_cap_init(pci_dev, 0x80);
>  cxl_cstate->dvsec_offset = 0x100;
> @@ -335,7 +334,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
>  
>  pc->realize = ct3_realize;
>  pc->exit = ct3_exit;
> -pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
> +pc->class_id = PCI_CLASS_MEMORY_CXL;
>  pc->vendor_id = PCI_VENDOR_ID_INTEL;
>  pc->device_id = 0xd93; /* LVF for now */
>  pc->revision = 1;

Re: [PATCH v3 32/42] target/arm: Extract HA and HD in aa64_va_parameters


On 10/7/22 09:11, Peter Maydell wrote:

On Fri, 7 Oct 2022 at 16:37, Richard Henderson
 wrote:


On 10/7/22 02:24, Peter Maydell wrote:

+.ha = ha,
+.hd = ha & hd,


This is a bitwise operation on two bools, should be && ?


Bitwise works fine, but I can use boolean if you like.

I'd be surprised (and filing a missed optimization bug) if the compiler treated
these two operations differently in this case (simple bool operands with no side effects).


The different treatment I would expect would be that in the '&'
case it warns you about using a bitwise operation on a boolean :-)


Oh, well, no compiler should ever do that, because bool implicitly converts to int for any 
arithmetic, just like char.
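A short check of both points, as a sketch (the function name is made up for the example): for operands of type bool, '&' and '&&' give the same value for all four input combinations, and the operands of '&' are promoted to int first.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts bool input pairs for which '&' and '&&' give different results. */
static int count_and_mismatches(void)
{
    int mismatches = 0;

    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            bool ha = a, hd = b;
            bool bitwise = ha & hd;
            bool logical = ha && hd;

            /* The '&' expression has type int after integer promotion. */
            assert(sizeof(ha & hd) == sizeof(int));

            if (bitwise != logical) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

The equivalence holds only because both operands are bool and have no side effects; '&&' would skip evaluating its right operand when the left one is false.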



r~

Re: [PATCH] block/io_uring: revert "Use io_uring_register_ring_fd() to skip fd operations"

2022-10-07 Thread Dario Faggioli

Yes, we did hit this bug as well, in the QEMU 7.1 package, for openSUSE
Tumbleweed (more info
here: https://bugzilla.suse.com/show_bug.cgi?id=1204082)

FWIW, I can confirm that applying this patch fixes the issue, so this
can have:

On Sat, 2022-09-24 at 22:48 +0800, Sam Li wrote:
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1193
> 
> [...]
>
> This reverts commit e2848bc574fe2715c694bf8fe9a1ba7f78a1125a
> and 77e3f038af1764983087e3551a0fde9951952c4d.
> 
> Signed-off-by: Sam Li 
>
Tested-by: Dario Faggioli 

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/

[PATCH v7 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0

Whilst I have carried on Huai-Cheng Kuo's series version numbering and
naming, there have been very substantial changes since v6 so I would
suggest fresh review makes sense for anyone who has looked at this before.
In particular, if the Avery design folks could check I haven't broken
anything, that would be great.

For reference v6: QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
https://lore.kernel.org/qemu-devel/1623330943-18290-1-git-send-email-cbr...@avery-design.com/

Summary of changes:
1) Linux headers definitions for DOE are now upstream so drop that patch.
2) Add CDAT for switch upstream port.
3) Generate 'plausible' default CDAT tables when a file is not provided.
4) General refactoring to calculate the correct table sizes and allocate
based on that rather than copying from a local static array.
5) Changes from earlier reviews such as matching QEMU type naming style.
6) Moved compliance and SPDM usecases to future patch sets.

Sign-offs on these are complex because the patches were originally developed
by Huai-Cheng Kuo, but posted by Chris Browy and then picked up by Jonathan
Cameron who made substantial changes.

Huai-Cheng Kuo / Chris Browy, please confirm you are still happy to maintain
this code as per the original MAINTAINERS entry.

What's here?

This series brings generic PCI Express Data Object Exchange (DOE) support.
DOE is defined in the PCIe Base Spec r6.0. It consists of a mailbox in PCI
config space via a PCIe Extended Capability Structure.
The PCIe spec defines several protocols (including one to discover what
protocols a given DOE instance supports) and other specifications such as
CXL define additional protocols using their own vendor IDs.

In this series we make use of the DOE to support the CXL spec defined
Table Access Protocol, specifically to provide access to CDAT - a
table specified in a specification that is hosted by the UEFI forum
and is used to provide runtime discoverability of the sort of information
that would otherwise be available in firmware tables (memory types,
latency and bandwidth information etc).
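
As a rough illustration of how a DOE data object is framed, the sketch below
packs the two header dwords that precede the payload. The field layout (vendor
ID in bits 15:0 and object type in bits 23:16 of the first dword, length in
dwords in bits 17:0 of the second) is my reading of the PCIe DOE definition.
The CXL vendor ID 0x1E98 and Table Access type 2 are the values I understand
the Linux CXL driver to use. The helper and macro names are invented for this
example and are not taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; the names are not from the QEMU series. */
#define EX_DOE_HDR1_TYPE_SHIFT 16           /* data object type, bits 23:16 */
#define EX_DOE_HDR2_LEN_MASK   0x0003ffffu  /* length in dwords, bits 17:0 */

#define EX_VID_CXL             0x1e98u      /* assumed CXL vendor ID */
#define EX_CXL_TABLE_ACCESS    2u           /* assumed Table Access type */

/* First header dword: which vendor and which protocol. */
static uint32_t ex_doe_header1(uint16_t vendor_id, uint8_t type)
{
    return (uint32_t)vendor_id | ((uint32_t)type << EX_DOE_HDR1_TYPE_SHIFT);
}

/* Second header dword: total object length in dwords, headers included. */
static uint32_t ex_doe_header2(uint32_t payload_dwords)
{
    uint32_t total = payload_dwords + 2;     /* two header dwords */

    assert(total <= EX_DOE_HDR2_LEN_MASK);   /* stay within the 18-bit field */
    return total & EX_DOE_HDR2_LEN_MASK;
}
```

A CDAT read request with a one-dword payload would then start with
ex_doe_header1(EX_VID_CXL, EX_CXL_TABLE_ACCESS) followed by ex_doe_header2(1).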

The Linux kernel gained support for DOE / CDAT on CXL type 3 EPs in 6.0.
The version merged did not support interrupts (earlier versions did
so that support in the emulation was tested a while back).

This series provides CDAT emulation for CXL switch upstream ports
and CXL type 3 memory devices. Note that to exercise the switch support
additional Linux kernel patches are needed.
https://lore.kernel.org/linux-cxl/20220503153449.4088-1-jonathan.came...@huawei.com/
(I'll post a new version of that support shortly)

Additional protocols will be supported by follow on patch sets:
* CXL compliance protocol.
* CMA / SPDM device attestation.
(Old version at https://gitlab.com/jic23/qemu/-/commits/cxl-next - will refresh
that tree next week)

Huai-Cheng Kuo (3):
hw/pci: PCIe Data Object Exchange emulation
hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange

Jonathan Cameron (2):
hw/mem/cxl-type3: Add MSIX support
hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE

--
2.37.2

Re: [PATCH] target/arm: Make the final stage1+2 write to secure be unconditional

On Fri, 7 Oct 2022 at 16:22, Richard Henderson
 wrote:
>
> While the stage2 call to get_phys_addr_lpae should never set
> attrs.secure when given a non-secure input, it's just as easy
> to make the final update to attrs.secure be unconditional and
> false in the case of non-secure input.
>
> Suggested-by: Peter Maydell 
> Signed-off-by: Richard Henderson 
> ---
>
> Hi Peter,
>
> This is the promised patch 1.5 for v3 FEAT_HAFDBS.  It generates minor
> conflicts down the line, which I have already fixed up locally.  I think
> the first one you would encounter is beyond the proposed 20 that you
> indicated that you intend to take into target-arm.next right now.
>
>
> r~
>
> ---
>  target/arm/ptw.c | 21 ++---
>  1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/target/arm/ptw.c b/target/arm/ptw.c
> index b8c494ad9f..7d763a5847 100644
> --- a/target/arm/ptw.c
> +++ b/target/arm/ptw.c
> @@ -2365,17 +2365,16 @@ bool get_phys_addr(CPUARMState *env, target_ulong 
> address,
>  result->cacheattrs = combine_cacheattrs(env, cacheattrs1,
>  result->cacheattrs);
>
> -/* Check if IPA translates to secure or non-secure PA space. */
> -if (is_secure) {
> -if (ipa_secure) {
> -result->attrs.secure =
> -!(env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW));
> -} else {
> -result->attrs.secure =
> -!((env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW))
> -|| (env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW)));
> -}
> -}

If:
 is_secure == true
 ipa_secure == false
 (env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW)) is non-zero
 (env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW)) is zero
then the old code sets attrs.secure to true...

> +/*
> + * Check if IPA translates to secure or non-secure PA space.
> + * Note that VSTCR overrides VTCR and {N}SW overrides {N}SA.
> + */
> +result->attrs.secure =
> +(is_secure
> + && !(env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW))
> + && (ipa_secure
> + || !(env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW))));

...but the new code will set it to false, I think ?

thanks
-- PMM
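
A quick way to check a concern like this is to enumerate the inputs. The
sketch below transcribes the removed and the added expressions from the hunk
into two standalone functions and counts the combinations on which they
differ. The bit values are placeholders and the registers are plain integers,
so this only tests the boolean structure of the quoted code, not QEMU itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for this sketch, not copied from QEMU. */
#define EX_SW (1u << 29)
#define EX_SA (1u << 30)

/* Pre-patch form: only a secure input updates the attribute. */
static bool secure_old(bool prev, bool is_secure, bool ipa_secure,
                       uint32_t vstcr, uint32_t vtcr)
{
    bool secure = prev;

    if (is_secure) {
        if (ipa_secure) {
            secure = !(vstcr & (EX_SA | EX_SW));
        } else {
            secure = !((vtcr & (EX_SA | EX_SW))
                       || (vstcr & (EX_SA | EX_SW)));
        }
    }
    return secure;
}

/* Post-patch form: a single unconditional expression. */
static bool secure_new(bool is_secure, bool ipa_secure,
                       uint32_t vstcr, uint32_t vtcr)
{
    return is_secure
        && !(vstcr & (EX_SA | EX_SW))
        && (ipa_secure || !(vtcr & (EX_SA | EX_SW)));
}

/* Count the input combinations on which the two forms disagree. */
static int count_mismatches(void)
{
    static const uint32_t regs[] = { 0, EX_SW, EX_SA, EX_SA | EX_SW };
    int mismatches = 0;

    for (int s = 0; s < 2; s++) {
        for (int ipa = 0; ipa < 2; ipa++) {
            for (int i = 0; i < 4; i++) {
                for (int j = 0; j < 4; j++) {
                    /* prev = false models a non-secure stage 2 result. */
                    bool o = secure_old(false, s, ipa, regs[i], regs[j]);
                    bool n = secure_new(s, ipa, regs[i], regs[j]);

                    mismatches += (o != n);
                }
            }
        }
    }
    return mismatches;
}
```

As transcribed, count_mismatches() comes out as 0: for a secure input with a
non-secure IPA, the old `!(A || B)` and the new `!B && !A` are the same by
De Morgan's law. If the tree differs from the hunk quoted here, the result may
differ too.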

Re: [PATCH v2 04/11] bdrv_child_try_change_aio_context: add transaction parameter

Am 25.07.2022 um 14:21 hat Emanuele Giuseppe Esposito geschrieben:
> This enables the caller to use the same transaction to also
> keep track of aiocontext changes.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

What you're really doing here is factoring out the recursive phase.
However, the factored out function is never used from anywhere else,
so I don't understand the purpose of this patch. It feels like an
unnecessary complication of the code.

The commit message is unclear to me, too: Who is the caller of
bdrv_child_try_change_aio_context() that it mentions, and why does it
make a difference to it how the code is organised internally?

Is this some artifact of changes you made and we don't need it any more
now?

>  block.c| 31 --
>  include/block/block-global-state.h |  5 +
>  2 files changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/block.c b/block.c
> index c02a628336..221bf90268 100644
> --- a/block.c
> +++ b/block.c
> @@ -7643,17 +7643,16 @@ int bdrv_child_try_set_aio_context(BlockDriverState 
> *bs, AioContext *ctx,
>   * For the same reason, it temporarily holds also the new AioContext, since
>   * bdrv_drained_end calls BDRV_POLL_WHILE that assumes the lock is taken too.
>   */
> -int bdrv_child_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
> -  BdrvChild *ignore_child, Error **errp)
> +int bdrv_child_try_change_aio_context_tran(BlockDriverState *bs,
> +   AioContext *ctx,
> +   BdrvChild *ignore_child,
> +   Transaction *tran,
> +   Error **errp)

As mentioned above, this is never used anywhere else than from
bdrv_child_try_change_aio_context(), so if we want to keep the patch, it
should be static at least.

Maybe find a better name, too, because all of the transaction related
operations are in the caller.

The function comment is not accurate any more either because it
described the whole of bdrv_child_try_change_aio_context(), while this
function only contains the recursive part.

Kevin

Re: [PATCH v2 0/3] fix for two ACPI GTDT physical addresses

2022-10-07 Thread Ani Sinha

On Fri, Oct 7, 2022 at 8:16 PM Miguel Luis  wrote:
>
> The ACPI GTDT table contains two invalid 64-bit physical addresses according to
> the ACPI spec. 6.5 [1]. Those are the Counter Control Base physical address and
> the Counter Read Base physical address. Those fields of the GTDT table should be
> set to 0xFFFFFFFFFFFFFFFF if not provided, rather than 0x0.
>
> [1]: 
> https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gtdt-table-structure
>
> Changelog:
>
> v2:
> Updated with collected tags from v1.

For future reference, there is no need to send out a new version with
just the tags added. The tooling makes sure that the tags are collected
correctly from the last version.

>
> v1: https://lists.nongnu.org/archive/html/qemu-devel/2022-09/msg02847.html
>
> Miguel Luis (3):
>   tests/acpi: virt: allow acpi GTDT changes
>   acpi: arm/virt: build_gtdt: fix invalid 64-bit physical addresses
>   tests/acpi: virt: update ACPI GTDT binaries
>
>  hw/arm/virt-acpi-build.c  |   5 ++---
>  tests/data/acpi/virt/GTDT | Bin 96 -> 96 bytes
>  tests/data/acpi/virt/GTDT.memhp   | Bin 96 -> 96 bytes
>  tests/data/acpi/virt/GTDT.numamem | Bin 96 -> 96 bytes
>  4 files changed, 2 insertions(+), 3 deletions(-)
>
> --
> 2.37.3
>
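
For readers who want to see what the GTDT change amounts to, here is a minimal
standalone sketch. It assumes the "not provided" value is all-ones
(0xFFFFFFFFFFFFFFFF), as I read the ACPI 6.5 GTDT field descriptions, and that
the fields are stored little-endian like the rest of an ACPI table. The helper
names are hypothetical; QEMU's own table builder is not used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel for an absent 64-bit GTDT physical address. */
#define EX_GTDT_ADDR_NOT_PROVIDED UINT64_C(0xFFFFFFFFFFFFFFFF)

/* Append a 64-bit value to buf in little-endian byte order. */
static size_t ex_append_le64(uint8_t *buf, size_t off, uint64_t value)
{
    assert(buf != NULL);
    for (int i = 0; i < 8; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + 8;
}

/* Emit an optional address: the real one if present, else the sentinel. */
static size_t ex_append_optional_addr(uint8_t *buf, size_t off,
                                      int provided, uint64_t addr)
{
    return ex_append_le64(buf, off,
                          provided ? addr : EX_GTDT_ADDR_NOT_PROVIDED);
}
```

Writing 0x0 instead would describe a counter block at physical address zero,
which is why the two fields were considered invalid.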

Re: [PATCH v3 41/42] target/arm: Implement FEAT_HAFDBS

On Fri, 7 Oct 2022 at 14:47, Peter Maydell  wrote:
> Do we really need to go all the way back to restart_atomic_update?

> Are we allowed to do the access and dirty bit updates with separate
> atomic accesses?

I've just discovered that the latest revision of the Arm ARM (rev I.a)
is clearer on this -- R_SGJBL and I_YZSVV clearly say that we need to
go back to restart_atomic_update for dirty bit updates, and it's a
reasonable assumption that this is true also for atomic updates.
And R_PRHKD says you're permitted to do everything in one rmw,
which clearly implies that you're permitted also not to. And if
you restart the descriptor handling it's architecturally not
distinguishable whether you did one rmw or two. So both of these
are fine the way you have them in your patch.

thanks
-- PMM
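
To make the "one rmw, restart on conflict" shape concrete, here is a heavily
simplified sketch using C11 atomics. It is not the code from the patch: it
models a single stage 1 descriptor as a 64-bit word, uses the architectural
bit positions for AF (bit 10), AP[2] (bit 7) and DBM (bit 51), and ignores
everything else a real walk has to consider. The function name is
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_DESC_VALID  (UINT64_C(1) << 0)
#define EX_DESC_AP2_RO (UINT64_C(1) << 7)   /* AP[2]: set means read-only */
#define EX_DESC_AF     (UINT64_C(1) << 10)  /* access flag */
#define EX_DESC_DBM    (UINT64_C(1) << 51)  /* dirty bit modifier */

/*
 * Set the access flag and, for a write to a DBM page, mark it dirty, in one
 * read-modify-write. Returns false if the descriptor does not permit the
 * access. If another agent changes the descriptor between the load and the
 * compare-exchange, the loop starts over from the fresh value.
 */
static bool ex_update_descriptor(_Atomic uint64_t *desc, bool is_write)
{
    uint64_t old_val;

    assert(desc != NULL);
    old_val = atomic_load(desc);

    for (;;) {
        uint64_t new_val = old_val;

        if (!(old_val & EX_DESC_VALID)) {
            return false;                   /* translation fault */
        }
        new_val |= EX_DESC_AF;
        if (is_write && (old_val & EX_DESC_AP2_RO)) {
            if (!(old_val & EX_DESC_DBM)) {
                return false;               /* permission fault */
            }
            new_val &= ~EX_DESC_AP2_RO;     /* writable-clean becomes dirty */
        }
        if (new_val == old_val) {
            return true;                    /* nothing to write back */
        }
        /* On failure old_val is reloaded and every check is redone. */
        if (atomic_compare_exchange_weak(desc, &old_val, new_val)) {
            return true;
        }
    }
}
```

Because every check is repeated after a failed compare-exchange, an observer
cannot tell whether the flag and the dirty state were updated in one step or
two, which is the point made above.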

Re: [PATCH v3 27/42] target/arm: Use softmmu tlbs for page table walking

On Fri, 7 Oct 2022 at 16:27, Richard Henderson
 wrote:
>
> On 10/7/22 02:01, Peter Maydell wrote:
> > The upcoming v8R support has its stage 2 attributes in the MAIR
> > format, so it might be a little awkward to assume the v8A-stage-2
> > format here rather than being able to add the "if !is_s2_format"
> > condition. I guess we'll deal with that when we get to it...
>
> Ah.  I had wondered whether it would be better to convert the result here, so 
> that we
> always have the MAIR format.  I decided against it within the scope of this 
> patch set
> because it meant that I kept the existing s1+s2 attribute merging logic 
> unchanged.

Unfortunately, for A-profile you can't just convert the s2 attrs
to MAIR format, because their interpretation depends on the
s1 attr values in the FWB case. So you have to keep the s2
attrs as raw until they get to the point of combination.
(v8R doesn't have any equivalent of FWB.)

thanks
-- PMM

Re: [PATCH v3 27/42] target/arm: Use softmmu tlbs for page table walking


On 10/7/22 02:01, Peter Maydell wrote:

The upcoming v8R support has its stage 2 attributes in the MAIR
format, so it might be a little awkward to assume the v8A-stage-2
format here rather than being able to add the "if !is_s2_format"
condition. I guess we'll deal with that when we get to it...


Ah.  I had wondered whether it would be better to convert the result here, so that we 
always have the MAIR format.  I decided against it within the scope of this patch set 
because it meant that I kept the existing s1+s2 attribute merging logic unchanged.



+/*
+ * Allow S1_ptw_translate to see any fault generated here.
+ * Since this may recurse, read and clear.
+ */
+fi = cpu->env.tlb_fi;
+if (fi) {
+cpu->env.tlb_fi = NULL;
+} else {
+fi = memset(&local_fi, 0, sizeof(local_fi));
+}


This makes two architectures now that want to do "call a probe_access
function, and get information that's known in the architecture-specific
tlb_fill function", and need to do it via this awkward "have tlb_fill
know that it should stash the info away in the CPU state struct somewhere"
trick (the other being s390 tlb_fill_exc/tlb_fill_tec). But I don't
really have a better idea...


A better idea would be most welcome, if anyone has one...  :-)


r~

Re: [PATCH v1 5/8] migration: Export dirty-limit time info

2022-10-07 Thread Markus Armbruster

Hyman Huang  writes:

> On 2022/10/2 2:31, Markus Armbruster wrote:
>> huang...@chinatelecom.cn writes:
>> 
>>> From: Hyman Huang(黄勇) 
>>>
>>> Export dirty limit throttle time and estimated ring full
>>> time, through which we can observe the process of dirty
>>> limit during live migration.
>>>
>>> Signed-off-by: Hyman Huang(黄勇) 
>> [...]
>> 
>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index bc4bc96..c263d54 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -242,6 +242,12 @@
>>>   #   Present and non-empty when migration is blocked.
>>>   #   (since 6.0)
>>>   #
>>> +# @dirty-limit-throttle-us-per-full: Throttle time (us) during the period of
>>> +#dirty ring full (since 7.0)
>>> +#
>>> +# @dirty-limit-us-ring-full: Estimated periodic time (us) of dirty ring full.
>>> +#(since 7.0)
>>> +#
>>
>> Can you explain what is measured here a bit more verbosely?
>
> The two fields of migration info aim to export the dirty-limit throttle time so
> that upper-layer apps can follow the progress of live migration,
> like 'cpu-throttle-percentage'.
>
> The commit "tests: Add migration dirty-limit capability test" makes use of
> 'dirty-limit-throttle-us-per-full' to check whether dirty-limit has
> started, and the commit "tests/migration: Introduce dirty-limit into guestperf"
> introduces the two fields so that the guestperf tools also show the
> progress of dirty-limit migration.
>
> I also use qmp_query_migrate to observe the migration by checking these
> two fields.
>
> I'm not sure if the above explanation is exactly what you want; please feel free
> to start any discussion about this feature.

You explained use cases, which is always welcome.

I'm trying to understand the two new members' meaning, i.e. what exactly
is being measured.

For existing @cpu-throttle-percentage, the doc comment tells me:
"percentage of time guest cpus are being throttled during
auto-converge."

For your new members, the doc comment tries to tell me, but it
doesn't succeed.  If you explain what is being measured more verbosely,
we may be able to improve the doc comment.

Re: [PATCH] m25p80: Add the w25q01jvq SFPD table

On [2022 Oct 06] Thu 17:44:24, Patrick Williams wrote:
> Generated from hardware using the following command and then padding
> with 0xff to fill out a power-of-2:
> hexdump -v -e '8/1 "0x%02x, " "\n"' sfdp
> 
> Signed-off-by: Patrick Williams 
> ---
>  hw/block/m25p80.c  |  3 ++-
>  hw/block/m25p80_sfdp.c | 36 
>  hw/block/m25p80_sfdp.h |  2 ++
>  3 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> index 8ba9d732a3..86343160ef 100644
> --- a/hw/block/m25p80.c
> +++ b/hw/block/m25p80.c
> @@ -349,7 +349,8 @@ static const FlashPartInfo known_devices[] = {
>.sfdp_read = m25p80_sfdp_w25q256 },
>  { INFO("w25q512jv",   0xef4020,  0,  64 << 10, 1024, ER_4K),
>.sfdp_read = m25p80_sfdp_w25q512jv },
> -{ INFO("w25q01jvq",   0xef4021,  0,  64 << 10, 2048, ER_4K) },
> +{ INFO("w25q01jvq",   0xef4021,  0,  64 << 10, 2048, ER_4K),
> +  .sfdp_read = m25p80_sfdp_w25q01jvq },
>  };
>  
>  typedef enum {
> diff --git a/hw/block/m25p80_sfdp.c b/hw/block/m25p80_sfdp.c
> index dad3d7e64f..77615fa29e 100644
> --- a/hw/block/m25p80_sfdp.c
> +++ b/hw/block/m25p80_sfdp.c
> @@ -294,3 +294,39 @@ static const uint8_t sfdp_w25q512jv[] = {
>  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>  };
>  define_sfdp_read(w25q512jv);
> +
> +static const uint8_t sfdp_w25q01jvq[] = {
> +0x53, 0x46, 0x44, 0x50, 0x06, 0x01, 0x01, 0xff,
> +0x00, 0x06, 0x01, 0x10, 0x80, 0x00, 0x00, 0xff,
> +0x84, 0x00, 0x01, 0x02, 0xd0, 0x00, 0x00, 0xff,
> +0x03, 0x00, 0x01, 0x02, 0xf0, 0x00, 0x00, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xe5, 0x20, 0xfb, 0xff, 0xff, 0xff, 0xff, 0x3f,
> +0x44, 0xeb, 0x08, 0x6b, 0x08, 0x3b, 0x42, 0xbb,
> +0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
> +0xff, 0xff, 0x40, 0xeb, 0x0c, 0x20, 0x0f, 0x52,
> +0x10, 0xd8, 0x00, 0x00, 0x36, 0x02, 0xa6, 0x00,
> +0x82, 0xea, 0x14, 0xe2, 0xe9, 0x63, 0x76, 0x33,
> +0x7a, 0x75, 0x7a, 0x75, 0xf7, 0xa2, 0xd5, 0x5c,
> +0x19, 0xf7, 0x4d, 0xff, 0xe9, 0x70, 0xf9, 0xa5,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0x0a, 0xf0, 0xff, 0x21, 0xff, 0xdc, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +};
> +define_sfdp_read(w25q01jvq);
> diff --git a/hw/block/m25p80_sfdp.h b/hw/block/m25p80_sfdp.h
> index 62f140a2fc..8fb1cd3f8a 100644
> --- a/hw/block/m25p80_sfdp.h
> +++ b/hw/block/m25p80_sfdp.h
> @@ -24,4 +24,6 @@ extern uint8_t m25p80_sfdp_mx66l1g45g(uint32_t addr);
>  extern uint8_t m25p80_sfdp_w25q256(uint32_t addr);
>  extern uint8_t m25p80_sfdp_w25q512jv(uint32_t addr);
>  
> +extern uint8_t m25p80_sfdp_w25q01jvq(uint32_t addr);
(optional -extern)

Reviewed-by: Francisco Iglesias 

> +
>  #endif
> -- 
> 2.35.1
>
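
One plausible reason for padding the table to a power of 2 is that the read
helper can then wrap out-of-range addresses with a simple mask. The sketch
below shows that idea on a toy 16-byte table. I believe the define_sfdp_read()
macro works along these lines, but treat that as an assumption; the names here
are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 16-byte table: "SFDP" signature, header bytes, then 0xff padding. */
static const uint8_t ex_sfdp[16] = {
    0x53, 0x46, 0x44, 0x50, 0x06, 0x01, 0x01, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
};

static int ex_is_power_of_2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Read one byte; addresses past the end wrap because the size is 2^n. */
static uint8_t ex_sfdp_read(uint32_t addr)
{
    assert(ex_is_power_of_2(sizeof(ex_sfdp)));
    return ex_sfdp[addr & (sizeof(ex_sfdp) - 1)];
}
```

With a table whose size is not a power of 2, the mask would skip or alias
entries, so the 0xff padding keeps the lookup branch-free.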

Re: [PATCH] vhost-vdpa: fix assert !virtio_net_get_subqueue(nc)->async_tx.elem in virtio_net_reset

2022-10-07 Thread Eugenio Perez Martin

On Tue, Oct 4, 2022 at 11:05 PM Si-Wei Liu  wrote:
>
> The cited commit has incorrect code in vhost_vdpa_receive() that returns
> zero instead of the full packet size to the caller. This leaves pending packets
> unable to be freed, so they stay clogged in the tx queue forever. When the device
> is reset later on, the assertion failure below ensues:
>
> 0  0x7f86d53bb387 in raise () from /lib64/libc.so.6
> 1  0x7f86d53bca78 in abort () from /lib64/libc.so.6
> 2  0x7f86d53b41a6 in __assert_fail_base () from /lib64/libc.so.6
> 3  0x7f86d53b4252 in __assert_fail () from /lib64/libc.so.6
> 4  0x55b8f6ff6fcc in virtio_net_reset (vdev=<optimized out>) at 
> /usr/src/debug/qemu/hw/net/virtio-net.c:563
> 5  0x55b8f7012fcf in virtio_reset (opaque=0x55b8faf881f0) at 
> /usr/src/debug/qemu/hw/virtio/virtio.c:1993
> 6  0x55b8f71f0086 in virtio_bus_reset (bus=bus@entry=0x55b8faf88178) at 
> /usr/src/debug/qemu/hw/virtio/virtio-bus.c:102
> 7  0x55b8f71f1620 in virtio_pci_reset (qdev=<optimized out>) at 
> /usr/src/debug/qemu/hw/virtio/virtio-pci.c:1845
> 8  0x55b8f6fafc6c in memory_region_write_accessor (mr=<optimized out>, 
> addr=<optimized out>, value=<optimized out>,
>size=<optimized out>, shift=<optimized out>, mask=<optimized out>, 
> attrs=...) at /usr/src/debug/qemu/memory.c:483
> 9  0x55b8f6fadce9 in access_with_adjusted_size (addr=addr@entry=20, 
> value=value@entry=0x7f867e7fb7e8, size=size@entry=1,
>access_size_min=<optimized out>, access_size_max=<optimized out>, 
> access_fn=0x55b8f6fafc20 <memory_region_write_accessor>,
>mr=0x55b8faf80a50, attrs=...) at /usr/src/debug/qemu/memory.c:544
> 10 0x55b8f6fb1d0b in memory_region_dispatch_write 
> (mr=mr@entry=0x55b8faf80a50, addr=addr@entry=20, data=0, op=<optimized out>,
>attrs=attrs@entry=...) at /usr/src/debug/qemu/memory.c:1470
> 11 0x55b8f6f62ada in flatview_write_continue (fv=fv@entry=0x7f86ac04cd20, 
> addr=addr@entry=549755813908, attrs=...,
>attrs@entry=..., buf=buf@entry=0x7f86d0223028 <Address 0x7f86d0223028 out of bounds>, len=len@entry=1, addr1=20, l=1,
>mr=0x55b8faf80a50) at /usr/src/debug/qemu/exec.c:3266
> 12 0x55b8f6f62c8f in flatview_write (fv=0x7f86ac04cd20, 
> addr=549755813908, attrs=...,
>buf=0x7f86d0223028 <Address 0x7f86d0223028 out of bounds>, len=1) at 
> /usr/src/debug/qemu/exec.c:3306
> 13 0x55b8f6f674cb in address_space_write (as=<optimized out>, 
> addr=<optimized out>, attrs=..., buf=<optimized out>,
>len=<optimized out>) at /usr/src/debug/qemu/exec.c:3396
> 14 0x55b8f6f67575 in address_space_rw (as=<optimized out>, 
> addr=<optimized out>, attrs=..., attrs@entry=...,
>buf=buf@entry=0x7f86d0223028 <Address 0x7f86d0223028 out of bounds>, 
> len=<optimized out>, is_write=<optimized out>)
>at /usr/src/debug/qemu/exec.c:3406
> 15 0x55b8f6fc1cc8 in kvm_cpu_exec (cpu=cpu@entry=0x55b8f9aa0e10) at 
> /usr/src/debug/qemu/accel/kvm/kvm-all.c:2410
> 16 0x55b8f6fa5f5e in qemu_kvm_cpu_thread_fn (arg=0x55b8f9aa0e10) at 
> /usr/src/debug/qemu/cpus.c:1318
> 17 0x55b8f7336e16 in qemu_thread_start (args=0x55b8f9ac8480) at 
> /usr/src/debug/qemu/util/qemu-thread-posix.c:519
> 18 0x7f86d575aea5 in start_thread () from /lib64/libpthread.so.0
> 19 0x7f86d5483b2d in clone () from /lib64/libc.so.6
>
> Make vhost_vdpa_receive() return the size passed in as is, so that the
> caller qemu_deliver_packet_iov() eventually propagates it back to
> virtio_net_flush_tx() to release pending packets from the async_tx queue.
> This corresponds to the drop path where qemu_sendv_packet_async() returns
> non-zero in virtio_net_flush_tx().
>

Acked-by: Eugenio Pérez 


> Fixes: 846a1e85da64 ("vdpa: Add dummy receive callback")
> Cc: Eugenio Perez Martin 
> Signed-off-by: Si-Wei Liu 
> ---
>  net/vhost-vdpa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 4bc3fd0..182b3a1 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -211,7 +211,7 @@ static bool vhost_vdpa_check_peer_type(NetClientState 
> *nc, ObjectClass *oc,
>  static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>size_t size)
>  {
> -return 0;
> +return size;
>  }
>
>  static NetClientInfo net_vhost_vdpa_info = {
> --
> 1.8.3.1
>
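
The effect of the one-line change is easier to see in a toy model. The sketch
below is not QEMU code: it reduces the sender to a single "in flight" flag that
stands in for async_tx.elem, and treats a zero return from the peer's receive
callback as "not consumed yet, wait for a completion". All names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

typedef ssize_t (*ex_receive_fn)(const unsigned char *buf, size_t size);

struct ex_tx_queue {
    bool in_flight;   /* stands in for async_tx.elem being non-NULL */
};

/* Old behaviour: claims the packet could not be taken right now. */
static ssize_t ex_receive_returns_zero(const unsigned char *buf, size_t size)
{
    (void)buf;
    (void)size;
    return 0;
}

/* New behaviour: reports the packet as consumed, i.e. silently dropped. */
static ssize_t ex_receive_returns_size(const unsigned char *buf, size_t size)
{
    (void)buf;
    return (ssize_t)size;
}

/* Send one packet; a zero return parks it until a completion callback. */
static void ex_flush_tx(struct ex_tx_queue *q, ex_receive_fn peer_receive,
                        const unsigned char *buf, size_t size)
{
    assert(size > 0);   /* zero would be ambiguous with "not consumed" */
    q->in_flight = (peer_receive(buf, size) == 0);
}

/* Mirrors the failing assertion: reset expects nothing in flight. */
static bool ex_reset_is_safe(const struct ex_tx_queue *q)
{
    return !q->in_flight;
}
```

A dummy receiver never issues the completion, so with the zero return the
packet stays parked and a later reset trips the assertion.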

Re: [RFC PATCH 0/4] Idea for using hardfloat in PPC


On 10/7/22 06:42, Alex Bennée wrote:

Is ppc unique in not persisting the inexact flag from previous
operations?


Better phrased as "having additional per-operation flags for inexact and 'rounded'", 
because ppc also has the standard ieee sticky inexact flag.  But yes, as far as I know ppc 
is unique in this.



r~

Re: [PATCH v3 30/42] target/arm: Add ptw_idx argument to S1_ptw_translate


On 10/7/22 02:19, Peter Maydell wrote:

I don't think this works, because the s2_mmu_idx is not necessarily
the same through the whole of a page table walk. See the comment in
get_phys_addr_lpae():
 /*
  * Secure accesses start with the page table in secure memory and
  * can be downgraded to non-secure at any step. Non-secure accesses
  * remain non-secure. We implement this by just ORing in the NSTable/NS
  * bits at each step.
  */

Currently get_phys_addr_lpae() updates the nstable bit in tableattrs and
passes that to arm_ldq_ptw() for each level of the page tables, which in
turn causes S1_ptw_translate() to select ARMMMUIdx_Stage2_S or ARMMMUIdx_Stage2.


Ouch.  I had missed this subtlety.

We could play lsb games with the mmu_idx itself, knowing that we have either 
ARMMMUIdx_{Stage2,Phys}_S and generate ARMMMUIdx_{Stage2,Phys}.  I'll have another good 
long look at this.




  if (regime_translation_disabled(env, mmu_idx, is_secure)) {
-return get_phys_addr_disabled(env, address, access_type, mmu_idx,
-  is_secure, result, fi);
+goto do_disabled;
  }


I'd prefer to avoid this goto back up into the middle of an unrelated
switch statement.


Oops, I guess I missed this one when I went back through to eliminate the gotos.


r~

Re: [PATCH] vhost-vdpa: allow passing opened vhostfd to vhost-vdpa

2022-10-07 Thread Eugenio Perez Martin

On Tue, Oct 4, 2022 at 11:09 PM Si-Wei Liu  wrote:
>
> Similar to other vhost backends, vhostfd can be passed to vhost-vdpa
> backend as another parameter to instantiate vhost-vdpa net client.
> This would benefit the use case where only open fd's, as oppposed to

s/oppposed/opposed/ (realized by the mail client actually).

Also, not an English native, but is it correct to use "fd's" there?
Just "fds" or "file descriptors" sounds better to me, but I'm not sure
about it.

> raw vhost-vdpa device paths, are accessible from the QEMU process.
>
> (qemu) netdev_add type=vhost-vdpa,vhostfd=61,id=vhost-vdpa1
>
> Signed-off-by: Si-Wei Liu 

Apart from the typos,

Acked-by: Eugenio Pérez 

> ---
>  net/vhost-vdpa.c | 25 -
>  qapi/net.json|  3 +++
>  qemu-options.hx  |  6 --
>  3 files changed, 27 insertions(+), 7 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 182b3a1..366b070 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -683,14 +683,29 @@ int net_init_vhost_vdpa(const Netdev *netdev, const 
> char *name,
>
>  assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>  opts = &netdev->u.vhost_vdpa;
> -if (!opts->vhostdev) {
> -error_setg(errp, "vdpa character device not specified with 
> vhostdev");
> +if (!opts->has_vhostdev && !opts->has_vhostfd) {
> +error_setg(errp,
> +   "vhost-vdpa: neither vhostdev= nor vhostfd= was 
> specified");
>  return -1;
>  }
>
> -vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
> -if (vdpa_device_fd == -1) {
> -return -errno;
> +if (opts->has_vhostdev && opts->has_vhostfd) {
> +error_setg(errp,
> +   "vhost-vdpa: vhostdev= and vhostfd= are mutually 
> exclusive");
> +return -1;
> +}
> +
> +if (opts->has_vhostdev) {
> +vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
> +if (vdpa_device_fd == -1) {
> +return -errno;
> +}
> +} else if (opts->has_vhostfd) {
> +vdpa_device_fd = monitor_fd_param(monitor_cur(), opts->vhostfd, 
> errp);
> +if (vdpa_device_fd == -1) {
> +error_prepend(errp, "vhost-vdpa: unable to parse vhostfd: ");
> +return -1;
> +}
>  }
>
>  r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
> diff --git a/qapi/net.json b/qapi/net.json
> index dd088c0..926ecc8 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -442,6 +442,8 @@
>  # @vhostdev: path of vhost-vdpa device
>  #(default:'/dev/vhost-vdpa-0')
>  #
> +# @vhostfd: file descriptor of an already opened vhost vdpa device
> +#
>  # @queues: number of queues to be created for multiqueue vhost-vdpa
>  #  (default: 1)
>  #
> @@ -456,6 +458,7 @@
>  { 'struct': 'NetdevVhostVDPAOptions',
>'data': {
>  '*vhostdev': 'str',
> +'*vhostfd':  'str',
>  '*queues':   'int',
>  '*x-svq':{'type': 'bool', 'features' : [ 'unstable'] } } }
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 913c71e..c040f74 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2774,8 +2774,10 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
>  "configure a vhost-user network, backed by a chardev 
> 'dev'\n"
>  #endif
>  #ifdef __linux__
> -"-netdev vhost-vdpa,id=str,vhostdev=/path/to/dev\n"
> +"-netdev vhost-vdpa,id=str[,vhostdev=/path/to/dev][,vhostfd=h]\n"
>  "configure a vhost-vdpa network,Establish a vhost-vdpa 
> netdev\n"
> +"use 'vhostdev=/path/to/dev' to open a vhost vdpa 
> device\n"
> +"use 'vhostfd=h' to connect to an already opened vhost 
> vdpa device\n"
>  #endif
>  #ifdef CONFIG_VMNET
>  "-netdev vmnet-host,id=str[,isolated=on|off][,net-uuid=uuid]\n"
> @@ -3280,7 +3282,7 @@ SRST
>   -netdev type=vhost-user,id=net0,chardev=chr0 \
>   -device virtio-net-pci,netdev=net0
>
> -``-netdev vhost-vdpa,vhostdev=/path/to/dev``
> +``-netdev vhost-vdpa[,vhostdev=/path/to/dev][,vhostfd=h]``
>  Establish a vhost-vdpa netdev.
>
>  vDPA device is a device that uses a datapath which complies with
> --
> 1.8.3.1
>
>
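
The option handling in the hunk above boils down to "exactly one of vhostdev=
and vhostfd= must be given". A minimal standalone sketch of that rule follows.
The struct is a cut-down stand-in for the generated options type, and the
function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for the generated options struct. */
struct ex_vdpa_opts {
    bool has_vhostdev;
    const char *vhostdev;
    bool has_vhostfd;
    const char *vhostfd;
};

enum ex_source { EX_SRC_ERROR, EX_SRC_PATH, EX_SRC_FD };

/* Decide where the device fd comes from; exactly one option must be set. */
static enum ex_source ex_pick_source(const struct ex_vdpa_opts *o,
                                     const char **err)
{
    assert(o != NULL && err != NULL);
    *err = NULL;
    if (!o->has_vhostdev && !o->has_vhostfd) {
        *err = "neither vhostdev= nor vhostfd= was specified";
        return EX_SRC_ERROR;
    }
    if (o->has_vhostdev && o->has_vhostfd) {
        *err = "vhostdev= and vhostfd= are mutually exclusive";
        return EX_SRC_ERROR;
    }
    return o->has_vhostdev ? EX_SRC_PATH : EX_SRC_FD;
}
```

Checking both error cases before opening anything means no file descriptor
has to be cleaned up on the failure paths.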

[PATCH v2] linux-user: mprotect() should returns 0 when len is 0.

2022-10-07 Thread Soichiro Isshiki

From: sisshiki1969 

On Fri, Oct 7, 2022 at 9:38 AM Richard Henderson wrote:
| Although, sorta, this smells like a kernel bug.
| Why should mprotect(-4096, 0, 0) succeed while mprotect(-4096, 4096, 0) fails?

This may be a kind of bug compatibility...

| But anyway, if we're going to fix len == 0 to match, we might as well fix all 3 test
| ordering bugs at the same time.

Yes, I agree, and I have made another patch.
It also adds a check for wrap-around, which I think is necessary.

A small test program is shown below.

```sh
> cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
  char *addr;
  int prot = PROT_READ | PROT_EXEC;
  int map = MAP_SHARED | MAP_ANONYMOUS;
  addr = mmap(NULL, 4096, prot, map, -1, 0);
  if (addr == MAP_FAILED) {
    perror("mmap");
    exit(EXIT_FAILURE);
  }

  /* len == 0 with an unaligned, a NULL and a mapped address */
  void *addrs[] = { (void *)77, NULL, addr };
  for (int i = 0; i < 3; i++) {
    if (mprotect(addrs[i], 0, PROT_READ) == -1) {
      perror("mprotect");
    } else {
      printf("OK\n");
    }
  }

  // invalid prot
  if (mprotect(addr, 2048, PROT_READ | 0x20) == -1) {
    perror("mprotect");
  } else {
    printf("OK\n");
  }
}
> cc test.c -o test
> ./test
mprotect: Invalid argument
OK
OK
mprotect: Invalid argument
> qemu-x86_64 test  # current master
mprotect: Invalid argument
OK
mprotect: Cannot allocate memory
mprotect: Invalid argument
> build/qemu-x86_64 test# after the patch applied
mprotect: Invalid argument
OK
OK
mprotect: Invalid argument
```

seems good.

Soichiro Isshiki

Signed-off-by: sisshiki1969 
---
 linux-user/mmap.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 28f3bc85ed..757709eeba 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -124,17 +124,20 @@ int target_mprotect(abi_ulong start, abi_ulong len, int 
target_prot)
 if ((start & ~TARGET_PAGE_MASK) != 0) {
 return -TARGET_EINVAL;
 }
-page_flags = validate_prot_to_pageflags(&host_prot, target_prot);
-if (!page_flags) {
-return -TARGET_EINVAL;
+if (len == 0) {
+return 0;
 }
 len = TARGET_PAGE_ALIGN(len);
 end = start + len;
+if (end <= start) {
+return -TARGET_ENOMEM;
+}
 if (!guest_range_valid_untagged(start, len)) {
 return -TARGET_ENOMEM;
 }
-if (len == 0) {
-return 0;
+page_flags = validate_prot_to_pageflags(&host_prot, target_prot);
+if (!page_flags) {
+return -TARGET_EINVAL;
 }
 
 mmap_lock();
-- 
2.25.1
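
For reference, the check order after the patch can be sketched as a standalone
function. This is a simplification under stated assumptions: a fixed 4 KiB
page, 64-bit addresses, only read/write/exec accepted as protection bits, and
no equivalent of guest_range_valid_untagged(). The names and errno values are
local to the example.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SIZE  UINT64_C(4096)
#define EX_PAGE_MASK  (~(EX_PAGE_SIZE - 1))
#define EX_EINVAL     22
#define EX_ENOMEM     12
#define EX_PROT_VALID 0x7   /* read | write | exec only, for this sketch */

/* Returns 0 or a negative errno, checking in the post-patch order. */
static int ex_mprotect_checks(uint64_t start, uint64_t len, int prot)
{
    uint64_t end;

    assert((EX_PAGE_SIZE & (EX_PAGE_SIZE - 1)) == 0);

    if (start & ~EX_PAGE_MASK) {
        return -EX_EINVAL;              /* 1. unaligned start */
    }
    if (len == 0) {
        return 0;                       /* 2. empty range succeeds */
    }
    len = (len + EX_PAGE_SIZE - 1) & EX_PAGE_MASK;
    end = start + len;
    if (end <= start) {
        return -EX_ENOMEM;              /* 3. range wraps around */
    }
    if (prot & ~EX_PROT_VALID) {
        return -EX_EINVAL;              /* 4. unsupported protection bits */
    }
    return 0;
}
```

With this ordering, a zero length on an aligned address succeeds before the
range or the protection bits are looked at, which matches the native output
shown in the test run above.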

Re: [PATCH v6 02/13] blkio: add libblkio block driver

2022-10-07 Thread Markus Armbruster

Stefan Hajnoczi  writes:

> libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
> high-performance disk I/O. It currently supports io_uring,
> virtio-blk-vhost-user, and virtio-blk-vhost-vdpa with additional drivers
> under development.
>
> One of the reasons for developing libblkio is that other applications
> besides QEMU can use it. This will be particularly useful for
> virtio-blk-vhost-user which applications may wish to use for connecting
> to qemu-storage-daemon.
>
> libblkio also gives us an opportunity to develop in Rust behind a C API
> that is easy to consume from QEMU.
>
> This commit adds io_uring, virtio-blk-vhost-user, and
> virtio-blk-vhost-vdpa BlockDrivers to QEMU using libblkio. It will be
> easy to add other libblkio drivers since they will share the majority of
> code.
>
> For now I/O buffers are copied through bounce buffers if the libblkio
> driver requires it. Later commits add an optimization for
> pre-registering guest RAM to avoid bounce buffers.
>
> The syntax is:
>
>   --blockdev 
> io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off
>
> and:
>
>   --blockdev 
> virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off

The patch also adds nvme-io_uring.  Shouldn't the commit message mention
it?

>
> Signed-off-by: Stefan Hajnoczi 
> Acked-by: Markus Armbruster 
> ---
>  MAINTAINERS   |   6 +
>  meson_options.txt |   2 +
>  qapi/block-core.json  |  75 ++-
>  meson.build   |   9 +
>  block/blkio.c | 830 ++
>  tests/qtest/modules-test.c|   3 +
>  block/meson.build |   1 +
>  scripts/meson-buildoptions.sh |   3 +
>  8 files changed, 925 insertions(+), 4 deletions(-)
>  create mode 100644 block/blkio.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e1530b51a2..0dcae6168a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3403,6 +3403,12 @@ L: qemu-bl...@nongnu.org
>  S: Maintained
>  F: block/vdi.c
>  
> +blkio
> +M: Stefan Hajnoczi 
> +L: qemu-bl...@nongnu.org
> +S: Maintained
> +F: block/blkio.c
> +
>  iSCSI
>  M: Ronnie Sahlberg 
>  M: Paolo Bonzini 
> diff --git a/meson_options.txt b/meson_options.txt
> index 79c6af18d5..66128178bf 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -117,6 +117,8 @@ option('bzip2', type : 'feature', value : 'auto',
> description: 'bzip2 support for DMG images')
>  option('cap_ng', type : 'feature', value : 'auto',
> description: 'cap_ng support')
> +option('blkio', type : 'feature', value : 'auto',
> +   description: 'libblkio block device driver')
>  option('bpf', type : 'feature', value : 'auto',
>  description: 'eBPF support')
>  option('cocoa', type : 'feature', value : 'auto',
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index f21fa235f2..6c6ae2885c 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2951,11 +2951,18 @@
>  'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
>  {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
>  {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
> -'http', 'https', 'iscsi',
> -'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
> -'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
> +'http', 'https',
> +{ 'name': 'io_uring', 'if': 'CONFIG_BLKIO' },
> +'iscsi',
> +'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme',
> +{ 'name': 'nvme-io_uring', 'if': 'CONFIG_BLKIO' },

This enumeration value and ...

> +'parallels', 'preallocate', 'qcow', 'qcow2', 'qed', 'quorum',
> +'raw', 'rbd',
>  { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
> -'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> +'ssh', 'throttle', 'vdi', 'vhdx',
> +{ 'name': 'virtio-blk-vhost-user', 'if': 'CONFIG_BLKIO' },
> +{ 'name': 'virtio-blk-vhost-vdpa', 'if': 'CONFIG_BLKIO' },
> +'vmdk', 'vpc', 'vvfat' ] }
>  
>  ##
>  # @BlockdevOptionsFile:
> @@ -3678,6 +3685,58 @@
>  '*debug': 'int',
>  '*logfile': 'str' } }
>  
> +##
> +# @BlockdevOptionsIoUring:
> +#
> +# Driver specific block device options for the io_uring backend.
> +#
> +# @filename: path to the image file
> +#
> +# Since: 7.2
> +##
> +{ 'struct': 'BlockdevOptionsIoUring',
> +  'data': { 'filename': 'str' },
> +  'if': 'CONFIG_BLKIO' }
> +
> +##
> +# @BlockdevOptionsNvmeIoUring:
> +#
> +# Driver specific block device options for the nvme-io_uring backend.
> +#
> +# @filename: path to the image file
> +#
> +# Since: 7.2
> +##
> +{ 'struct': 'BlockdevOptionsNvmeIoUring',
> +  'data': { 'filename': 'str' },
> +  'if': 'CONFIG_BLKIO' }

... this type aren't used in this patch.  Did you ...

> +
> +##
> +#

Re: [External] : Re: [PATCH v2 0/3] fix for two ACPI GTDT physical addresses




> On 7 Oct 2022, at 15:21, Ani Sinha  wrote:
> 
> On Fri, Oct 7, 2022 at 8:16 PM Miguel Luis  wrote:
>> 
>> The ACPI GTDT table contains two invalid 64-bit physical addresses according
>> to the ACPI spec. 6.5 [1]. Those are the Counter Control Base physical address
>> and the Counter Read Base physical address. Those fields of the GTDT table
>> should be set to 0xFFFFFFFFFFFFFFFF if not provided, rather than 0x0.
>> 
>> [1]: 
>> https://urldefense.com/v3/__https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html*gtdt-table-structure__;Iw!!ACWV5N9M2RV99hQ!I-YqmAwYNhk19YHxcbjQBMwEE9a8rZOvufvOOonAPEtgTynOYOf5AyYKLTTGJ2RRzsjvkjIuleSubpg$
>>   
>> 
>> Changelog:
>> 
>> v2:
>>Updated with collected tags from v1.
> 
> For future reference, there is no need to send out a new version with
> just the tags added. The tooling makes sure that the tags are collected
> correctly from the last version.
> 

Great! Thanks for the tip which is very helpful.

Miguel

>> 
>> v1: 
>> https://urldefense.com/v3/__https://lists.nongnu.org/archive/html/qemu-devel/2022-09/msg02847.html__;!!ACWV5N9M2RV99hQ!I-YqmAwYNhk19YHxcbjQBMwEE9a8rZOvufvOOonAPEtgTynOYOf5AyYKLTTGJ2RRzsjvkjIulSis4m4$
>>   
>> 
>> Miguel Luis (3):
>>  tests/acpi: virt: allow acpi GTDT changes
>>  acpi: arm/virt: build_gtdt: fix invalid 64-bit physical addresses
>>  tests/acpi: virt: update ACPI GTDT binaries
>> 
>> hw/arm/virt-acpi-build.c  |   5 ++---
>> tests/data/acpi/virt/GTDT | Bin 96 -> 96 bytes
>> tests/data/acpi/virt/GTDT.memhp   | Bin 96 -> 96 bytes
>> tests/data/acpi/virt/GTDT.numamem | Bin 96 -> 96 bytes
>> 4 files changed, 2 insertions(+), 3 deletions(-)
>> 
>> --
>> 2.37.3
>>
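
For readers following the thread, here is a minimal standalone sketch of the convention the quoted cover letter describes: an optional 64-bit GTDT address that is not provided is encoded as all ones rather than zero. The names GTDT_ADDR_NOT_PROVIDED, put_le64() and gtdt_put_optional_addr() are hypothetical and are not QEMU helpers; the series itself changes build_gtdt() in hw/arm/virt-acpi-build.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the "not provided" marker used by the two fields. */
#define GTDT_ADDR_NOT_PROVIDED UINT64_C(0xFFFFFFFFFFFFFFFF)

/* Store a 64-bit value in little-endian byte order, as ACPI tables use. */
static void put_le64(uint8_t *dst, uint64_t v)
{
    for (size_t i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(v >> (8 * i));
    }
}

/*
 * Hypothetical helper: write an optional physical address field.
 * When the platform does not provide the address, emit all ones
 * instead of zero, since zero could be read as a real address.
 */
static void gtdt_put_optional_addr(uint8_t *field, int provided, uint64_t addr)
{
    put_le64(field, provided ? addr : GTDT_ADDR_NOT_PROVIDED);
}
```

Calling gtdt_put_optional_addr(field, 0, 0) fills the eight bytes with 0xFF, which matches the FF FF FF FF FF FF FF FF runs at offsets 0x24 and 0x50 in the updated table dump of patch 3/3.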

Re: [PATCH v3 6/8] m25p80: Add the w25q256 SFPD table

On [2022 Jul 22] Fri 08:36:00, Cédric Le Goater wrote:
> The SFDP table size is 0x100 bytes long. Only the mandatory table for
> basic features is available at byte 0x80.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/block/m25p80_sfdp.h |  2 ++
>  hw/block/m25p80.c  |  3 ++-
>  hw/block/m25p80_sfdp.c | 40 
>  3 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/m25p80_sfdp.h b/hw/block/m25p80_sfdp.h
> index 468e3434151b..f60429ab8542 100644
> --- a/hw/block/m25p80_sfdp.h
> +++ b/hw/block/m25p80_sfdp.h
> @@ -21,4 +21,6 @@ extern uint8_t m25p80_sfdp_mx25l25635e(uint32_t addr);
>  extern uint8_t m25p80_sfdp_mx25l25635f(uint32_t addr);
>  extern uint8_t m25p80_sfdp_mx66l1g45g(uint32_t addr);
>  
> +extern uint8_t m25p80_sfdp_w25q256(uint32_t addr);
(optional -extern)

Reviewed-by: Francisco Iglesias 

> +
>  #endif
> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> index 52df24d24751..220dbc8fb327 100644
> --- a/hw/block/m25p80.c
> +++ b/hw/block/m25p80.c
> @@ -345,7 +345,8 @@ static const FlashPartInfo known_devices[] = {
>  { INFO("w25q64",  0xef4017,  0,  64 << 10, 128, ER_4K) },
>  { INFO("w25q80",  0xef5014,  0,  64 << 10,  16, ER_4K) },
>  { INFO("w25q80bl",0xef4014,  0,  64 << 10,  16, ER_4K) },
> -{ INFO("w25q256", 0xef4019,  0,  64 << 10, 512, ER_4K) },
> +{ INFO("w25q256", 0xef4019,  0,  64 << 10, 512, ER_4K),
> +  .sfdp_read = m25p80_sfdp_w25q256 },
>  { INFO("w25q512jv",   0xef4020,  0,  64 << 10, 1024, ER_4K) },
>  { INFO("w25q01jvq",   0xef4021,  0,  64 << 10, 2048, ER_4K) },
>  };
> diff --git a/hw/block/m25p80_sfdp.c b/hw/block/m25p80_sfdp.c
> index 38c3ced34d2e..5b011559d43d 100644
> --- a/hw/block/m25p80_sfdp.c
> +++ b/hw/block/m25p80_sfdp.c
> @@ -218,3 +218,43 @@ static const uint8_t sfdp_mx66l1g45g[] = {
>  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>  };
>  define_sfdp_read(mx66l1g45g);
> +
> +/*
> + * Winbond
> + */
> +
> +static const uint8_t sfdp_w25q256[] = {
> +0x53, 0x46, 0x44, 0x50, 0x00, 0x01, 0x00, 0xff,
> +0x00, 0x00, 0x01, 0x09, 0x80, 0x00, 0x00, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xe5, 0x20, 0xf3, 0xff, 0xff, 0xff, 0xff, 0x0f,
> +0x44, 0xeb, 0x08, 0x6b, 0x08, 0x3b, 0x42, 0xbb,
> +0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
> +0xff, 0xff, 0x21, 0xeb, 0x0c, 0x20, 0x0f, 0x52,
> +0x10, 0xd8, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +};
> +define_sfdp_read(w25q256);
> -- 
> 2.35.3
>
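
As a reading aid for the byte array above, the following sketch decodes one SFDP parameter header. It assumes the JESD216 layout as I understand it: a 4-byte "SFDP" signature, minor and major revision, a zero-based count of parameter headers in byte 6, then 8-byte parameter headers carrying an ID, a length in 32-bit words and a 24-bit table offset. The struct and function names are hypothetical and are not part of m25p80_sfdp.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded view of one 8-byte SFDP parameter header (hypothetical struct). */
typedef struct SfdpParamHeader {
    uint16_t id;         /* byte 7 (MSB) and byte 0 (LSB) of the header */
    uint8_t minor;       /* parameter table minor revision */
    uint8_t major;       /* parameter table major revision */
    uint8_t len_dwords;  /* table length in 32-bit words */
    uint32_t offset;     /* 24-bit byte offset of the table */
} SfdpParamHeader;

/*
 * Decode parameter header number n from a raw SFDP image.
 * Returns 0 on success, -1 on a bad signature or out-of-range n.
 */
static int sfdp_param_header(const uint8_t *sfdp, size_t size, unsigned n,
                             SfdpParamHeader *out)
{
    const uint8_t *p;

    if (size < 8 || memcmp(sfdp, "SFDP", 4) != 0) {
        return -1;
    }
    /* Byte 6 is the number of parameter headers, counted from zero. */
    if (n > sfdp[6] || 8 + ((size_t)n + 1) * 8 > size) {
        return -1;
    }

    p = sfdp + 8 + (size_t)n * 8;
    out->id = (uint16_t)((p[7] << 8) | p[0]);
    out->minor = p[1];
    out->major = p[2];
    out->len_dwords = p[3];
    out->offset = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                  ((uint32_t)p[6] << 16);
    return 0;
}
```

Applied to the first 16 bytes of sfdp_w25q256, header 0 decodes to a table of 9 words at offset 0x80, which agrees with the commit message and with the 36 bytes of non-0xff data starting at that offset.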

[PATCH] target/arm: Make the final stage1+2 write to secure be unconditional

While the stage2 call to get_phys_addr_lpae should never set
attrs.secure when given a non-secure input, it's just as easy
to make the final update to attrs.secure be unconditional and
false in the case of non-secure input.

Suggested-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---

Hi Peter,

This is the promised patch 1.5 for v3 FEAT_HAFDBS.  It generates minor
conflicts down the line, which I have already fixed up locally.  I think
the first one you would encounter is beyond the proposed 20 that you
indicated that you intend to take into target-arm.next right now.


r~

---
 target/arm/ptw.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index b8c494ad9f..7d763a5847 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2365,17 +2365,16 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 result->cacheattrs = combine_cacheattrs(env, cacheattrs1,
 result->cacheattrs);
 
-/* Check if IPA translates to secure or non-secure PA space. */
-if (is_secure) {
-if (ipa_secure) {
-result->attrs.secure =
-!(env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW));
-} else {
-result->attrs.secure =
-!((env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW))
-|| (env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW)));
-}
-}
+/*
+ * Check if IPA translates to secure or non-secure PA space.
+ * Note that VSTCR overrides VTCR and {N}SW overrides {N}SA.
+ */
+result->attrs.secure =
+(is_secure
+ && !(env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW))
+ && (ipa_secure
+ || !(env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW))));
+
 return 0;
 } else {
 /*
-- 
2.34.1
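
To check the claim in the commit message, the two forms can be modelled with plain booleans. This is a standalone sketch, not QEMU code: vstcr_sa_sw stands for (vstcr_el2 & (VSTCR_SA | VSTCR_SW)) != 0, vtcr_nsa_nsw for (vtcr_el2 & (VTCR_NSA | VTCR_NSW)) != 0, and prev for whatever the stage 2 walk left in attrs.secure. The function names are made up for illustration.

```c
#include <stdbool.h>

/* Old form: only writes the result when the input is secure. */
static bool secure_old(bool prev, bool is_secure, bool ipa_secure,
                       bool vstcr_sa_sw, bool vtcr_nsa_nsw)
{
    if (is_secure) {
        if (ipa_secure) {
            return !vstcr_sa_sw;
        }
        return !(vtcr_nsa_nsw || vstcr_sa_sw);
    }
    return prev; /* left untouched for non-secure input */
}

/* New form: one unconditional expression, false for non-secure input. */
static bool secure_new(bool is_secure, bool ipa_secure,
                       bool vstcr_sa_sw, bool vtcr_nsa_nsw)
{
    return is_secure && !vstcr_sa_sw && (ipa_secure || !vtcr_nsa_nsw);
}
```

For secure input the two agree on all eight combinations of the remaining flags. For non-secure input the new form always yields false, which equals the old result as long as stage 2 left attrs.secure clear, as the commit message expects.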

[PATCH v7 5/5] hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE

This Data Object Exchange Mailbox allows software to query the
latency and bandwidth between ports on the switch. For now
only provide information on routes between the upstream port and
each downstream port (not p2p).
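
The response handler in the diff below returns one CDAT entry per request together with the handle of the next entry, so software reads the table by following handles until an end marker. A minimal sketch of that walk and of the length rounding follows. It assumes the end marker CXL_DOE_TAB_ENT_MAX is 0xFFFF, which is not shown in this patch, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value of CXL_DOE_TAB_ENT_MAX; not visible in this patch. */
#define TAB_ENT_END 0xFFFF

/* Length in 32-bit words, rounded up, as DOE object lengths are expressed. */
static uint32_t dwords_for(size_t bytes)
{
    return (uint32_t)((bytes + sizeof(uint32_t) - 1) / sizeof(uint32_t));
}

/* Handle reported in the response: next entry, or the end marker. */
static uint16_t next_entry_handle(uint16_t ent, uint16_t entry_len)
{
    return (ent < entry_len - 1) ? (uint16_t)(ent + 1) : TAB_ENT_END;
}

/*
 * Count the requests needed to read a table with entry_len entries.
 * entry_len must be at least 1, matching the assert in the handler.
 */
static unsigned walk_entries(uint16_t entry_len)
{
    unsigned requests = 0;
    uint16_t ent = 0;

    do {
        requests++;
        ent = next_entry_handle(ent, entry_len);
    } while (ent != TAB_ENT_END);

    return requests;
}
```

With three entries the walk issues three requests and the last response carries the end marker.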

Signed-off-by: Jonathan Cameron 
---
 hw/pci-bridge/cxl_upstream.c | 182 ++-
 include/hw/cxl/cxl_cdat.h|   1 +
 2 files changed, 182 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
index a83a3e81e4..9209c704ae 100644
--- a/hw/pci-bridge/cxl_upstream.c
+++ b/hw/pci-bridge/cxl_upstream.c
@@ -10,11 +10,12 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "hw/qdev-properties.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/pcie.h"
 #include "hw/pci/pcie_port.h"
 
-#define CXL_UPSTREAM_PORT_MSI_NR_VECTOR 1
+#define CXL_UPSTREAM_PORT_MSI_NR_VECTOR 2
 
 #define CXL_UPSTREAM_PORT_MSI_OFFSET 0x70
 #define CXL_UPSTREAM_PORT_PCIE_CAP_OFFSET 0x90
@@ -28,6 +29,7 @@ typedef struct CXLUpstreamPort {
 
 /*< public >*/
 CXLComponentState cxl_cstate;
+DOECap doe_cdat;
 } CXLUpstreamPort;
 
 CXLComponentState *cxl_usp_to_cstate(CXLUpstreamPort *usp)
@@ -60,6 +62,9 @@ static void cxl_usp_dvsec_write_config(PCIDevice *dev, uint32_t addr,
 static void cxl_usp_write_config(PCIDevice *d, uint32_t address,
  uint32_t val, int len)
 {
+CXLUpstreamPort *usp = CXL_USP(d);
+
+pcie_doe_write_config(&usp->doe_cdat, address, val, len);
 pci_bridge_write_config(d, address, val, len);
 pcie_cap_flr_write_config(d, address, val, len);
 pcie_aer_write_config(d, address, val, len);
@@ -67,6 +72,18 @@ static void cxl_usp_write_config(PCIDevice *d, uint32_t address,
 cxl_usp_dvsec_write_config(d, address, val, len);
 }
 
+static uint32_t cxl_usp_read_config(PCIDevice *d, uint32_t address, int len)
+{
+CXLUpstreamPort *usp = CXL_USP(d);
+uint32_t val;
+
+if (pcie_doe_read_config(&usp->doe_cdat, address, len, &val)) {
+return val;
+}
+
+return pci_default_read_config(d, address, len);
+}
+
 static void latch_registers(CXLUpstreamPort *usp)
 {
 uint32_t *reg_state = usp->cxl_cstate.crb.cache_mem_registers;
@@ -119,6 +136,155 @@ static void build_dvsecs(CXLComponentState *cxl)
REG_LOC_DVSEC_REVID, dvsec);
 }
 
+static bool cxl_doe_cdat_rsp(DOECap *doe_cap)
+{
+CDATObject *cdat = &CXL_USP(doe_cap->pdev)->cxl_cstate.cdat;
+uint16_t ent;
+void *base;
+uint32_t len;
+CDATReq *req = pcie_doe_get_write_mbox_ptr(doe_cap);
+CDATRsp rsp;
+
+cxl_doe_cdat_update(&CXL_USP(doe_cap->pdev)->cxl_cstate, &error_fatal);
+assert(cdat->entry_len);
+
+/* Discard if request length mismatched */
+if (pcie_doe_get_obj_len(req) <
+DIV_ROUND_UP(sizeof(CDATReq), sizeof(uint32_t))) {
+return false;
+}
+
+ent = req->entry_handle;
+base = cdat->entry[ent].base;
+len = cdat->entry[ent].length;
+
+rsp = (CDATRsp) {
+.header = {
+.vendor_id = CXL_VENDOR_ID,
+.data_obj_type = CXL_DOE_TABLE_ACCESS,
+.reserved = 0x0,
+.length = DIV_ROUND_UP((sizeof(rsp) + len), sizeof(uint32_t)),
+},
+.rsp_code = CXL_DOE_TAB_RSP,
+.table_type = CXL_DOE_TAB_TYPE_CDAT,
+.entry_handle = (ent < cdat->entry_len - 1) ?
+ent + 1 : CXL_DOE_TAB_ENT_MAX,
+};
+
+memcpy(doe_cap->read_mbox, &rsp, sizeof(rsp));
+memcpy(doe_cap->read_mbox + DIV_ROUND_UP(sizeof(rsp), sizeof(uint32_t)),
+   base, len);
+
+doe_cap->read_mbox_len += rsp.header.length;
+
+return true;
+}
+
+static DOEProtocol doe_cdat_prot[] = {
+{ CXL_VENDOR_ID, CXL_DOE_TABLE_ACCESS, cxl_doe_cdat_rsp },
+{ }
+};
+
+static int build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
+{
+g_autofree CDATSslbis *sslbis_latency = NULL;
+g_autofree CDATSslbis *sslbis_bandwidth = NULL;
+CXLUpstreamPort *us = CXL_USP(priv);
+PCIBus *bus = &PCI_BRIDGE(us)->sec_bus;
+int devfn, sslbis_size;
+int len = 0;
+int i = 0;
+int count = 0;
+uint16_t port_ids[256];
+
+for (devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
+PCIDevice *d = bus->devices[devfn];
+PCIEPort *port;
+
+if (!d || !pci_is_express(d) || !d->exp.exp_cap) {
+continue;
+}
+
+/*
+ * Whilst the PCI express spec doesn't allow anything other than
+ * downstream ports on this bus, let us be a little paranoid
+ */
+if (!object_dynamic_cast(OBJECT(d), TYPE_PCIE_PORT)) {
+continue;
+}
+
+port = PCIE_PORT(d);
+port_ids[count] = port->port;
+count++;
+}
+
+/* May not yet have any ports - try again later */
+if (count == 0) {
+return 0;
+}
+
+sslbis_size = sizeof(CDATSslbis) + sizeof(*sslbis_latency->sslbe) * count;
+sslbis_latency = g_malloc(sslbis_size);
+

Re: [PATCH v3] win32: set threads name


On 10/7/22 02:52, Marc-André Lureau wrote:

Hi

On Fri, Oct 7, 2022 at 1:04 AM Richard Henderson wrote:


On 10/6/22 05:51, Marc-André Lureau wrote:
 > Hi Richard
 >
 > On Mon, Oct 3, 2022 at 11:39 AM Marc-André Lureau wrote:
 >
 >     Hi
 >
 >     On Fri, Sep 30, 2022 at 6:10 PM Richard Henderson wrote:
 >      >
 >      > On 9/30/22 07:03, marcandre.lur...@redhat.com wrote:
 >      > > +static bool
 >      > > +set_thread_description(HANDLE h, const char *name)
 >      > > +{
 >      > > +  HRESULT hr;
 >      > > +  g_autofree wchar_t *namew = NULL;
 >      > > +
 >      > > +  if (!load_set_thread_description()) {
 >      > > +      return false;
 >      > > +  }
 >      >
 >      > I don't understand why you're retaining this.
 >      > What is your logic?
 >      >
 >
 >     Also, if we change the "static bool name_threads" to be true by
 >     default, then set_thread_description() might be called without calling
 >     qemu_thread_naming()
 >
 >     It really shouldn't hurt to keep it that way.
 >
 >
 >
 > Let me know if the current form is ok for you, thanks

I guess it's ok, sure.


Should I take that for an Ack?  :)


Yes, that was sloppy.
Acked-by: Richard Henderson 


r~

[PATCH v7 2/5] hw/mem/cxl-type3: Add MSIX support

This will be used by several upcoming patch sets so break it out
such that it doesn't matter which one lands first.

Signed-off-by: Jonathan Cameron 
---
 hw/mem/cxl_type3.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index a71bf1afeb..568c9d62f5 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -13,6 +13,7 @@
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
 #include "hw/cxl/cxl.h"
+#include "hw/pci/msix.h"
 
 /*
  * Null value of all Fs suggested by IEEE RA guidelines for use of
@@ -146,6 +147,8 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 ComponentRegisters *regs = &cxl_cstate->crb;
 MemoryRegion *mr = &regs->component_registers;
 uint8_t *pci_conf = pci_dev->config;
+unsigned short msix_num = 1;
+int i;
 
 if (!cxl_setup_memory(ct3d, errp)) {
 return;
@@ -180,6 +183,12 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
  PCI_BASE_ADDRESS_SPACE_MEMORY |
  PCI_BASE_ADDRESS_MEM_TYPE_64,
  &ct3d->cxl_dstate.device_registers);
+
+/* MSI(-X) Initialization */
+msix_init_exclusive_bar(pci_dev, msix_num, 4, NULL);
+for (i = 0; i < msix_num; i++) {
+msix_vector_use(pci_dev, i);
+}
 }
 
 static void ct3_exit(PCIDevice *pci_dev)
-- 
2.37.2

Re: [PATCH v3 7/8] m25p80: Add the w25q512jv SFPD table

On [2022 Jul 22] Fri 08:36:01, Cédric Le Goater wrote:
> The SFDP table size is 0x100 bytes long. The mandatory table for basic
> features is available at byte 0x80 and two extra Winbond-specific
> tables are available at 0xC0 and 0xF0.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/block/m25p80_sfdp.h |  1 +
>  hw/block/m25p80.c  |  3 ++-
>  hw/block/m25p80_sfdp.c | 36 
>  3 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/m25p80_sfdp.h b/hw/block/m25p80_sfdp.h
> index f60429ab8542..62f140a2fcef 100644
> --- a/hw/block/m25p80_sfdp.h
> +++ b/hw/block/m25p80_sfdp.h
> @@ -22,5 +22,6 @@ extern uint8_t m25p80_sfdp_mx25l25635f(uint32_t addr);
>  extern uint8_t m25p80_sfdp_mx66l1g45g(uint32_t addr);
>  
>  extern uint8_t m25p80_sfdp_w25q256(uint32_t addr);
> +extern uint8_t m25p80_sfdp_w25q512jv(uint32_t addr);
(optional -extern)

Reviewed-by: Francisco Iglesias 

>  
>  #endif
> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> index 220dbc8fb327..8ba9d732a323 100644
> --- a/hw/block/m25p80.c
> +++ b/hw/block/m25p80.c
> @@ -347,7 +347,8 @@ static const FlashPartInfo known_devices[] = {
>  { INFO("w25q80bl",0xef4014,  0,  64 << 10,  16, ER_4K) },
>  { INFO("w25q256", 0xef4019,  0,  64 << 10, 512, ER_4K),
>.sfdp_read = m25p80_sfdp_w25q256 },
> -{ INFO("w25q512jv",   0xef4020,  0,  64 << 10, 1024, ER_4K) },
> +{ INFO("w25q512jv",   0xef4020,  0,  64 << 10, 1024, ER_4K),
> +  .sfdp_read = m25p80_sfdp_w25q512jv },
>  { INFO("w25q01jvq",   0xef4021,  0,  64 << 10, 2048, ER_4K) },
>  };
>  
> diff --git a/hw/block/m25p80_sfdp.c b/hw/block/m25p80_sfdp.c
> index 5b011559d43d..dad3d7e64f9f 100644
> --- a/hw/block/m25p80_sfdp.c
> +++ b/hw/block/m25p80_sfdp.c
> @@ -258,3 +258,39 @@ static const uint8_t sfdp_w25q256[] = {
>  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>  };
>  define_sfdp_read(w25q256);
> +
> +static const uint8_t sfdp_w25q512jv[] = {
> +0x53, 0x46, 0x44, 0x50, 0x06, 0x01, 0x01, 0xff,
> +0x00, 0x06, 0x01, 0x10, 0x80, 0x00, 0x00, 0xff,
> +0x84, 0x00, 0x01, 0x02, 0xd0, 0x00, 0x00, 0xff,
> +0x03, 0x00, 0x01, 0x02, 0xf0, 0x00, 0x00, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xe5, 0x20, 0xfb, 0xff, 0xff, 0xff, 0xff, 0x1f,
> +0x44, 0xeb, 0x08, 0x6b, 0x08, 0x3b, 0x42, 0xbb,
> +0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00,
> +0xff, 0xff, 0x40, 0xeb, 0x0c, 0x20, 0x0f, 0x52,
> +0x10, 0xd8, 0x00, 0x00, 0x36, 0x02, 0xa6, 0x00,
> +0x82, 0xea, 0x14, 0xe2, 0xe9, 0x63, 0x76, 0x33,
> +0x7a, 0x75, 0x7a, 0x75, 0xf7, 0xa2, 0xd5, 0x5c,
> +0x19, 0xf7, 0x4d, 0xff, 0xe9, 0x70, 0xf9, 0xa5,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0x0a, 0xf0, 0xff, 0x21, 0xff, 0xdc, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +};
> +define_sfdp_read(w25q512jv);
> -- 
> 2.35.3
>

Re: [PATCH 0/4] Add a new backend for cryptodev

On Mon, Sep 19, 2022 at 11:53:16AM +0800, Lei He wrote:
> This patch adds a new backend called LKCF to cryptodev; LKCF stands
> for Linux Kernel Cryptography Framework. If a cryptographic
> accelerator that supports LKCF is installed on the host (you can
> see which algorithms are supported in host's LKCF by executing
> 'cat /proc/crypto'), then RSA operations can be offloaded.
> For more background, see https://lwn.net/Articles/895399/,
> 'keyctl[5]' in the picture.
> 
> This patch:
> 1. Modified some interfaces of cryptodev and cryptodev-backend to
> support asynchronous requests.
> 2. Extended the DER encoder in crypto, so that we can export the
> RSA private key into PKCS#8 format and upload it to host kernel.
> 3. Added a new backend for cryptodev.
> 
> I tested the backend with a QAT card, the qps of RSA-2048-decryption
> is about 25k/s, and the main-loop becomes the bottleneck. The qps
> using OpenSSL directly is about 6k/s (with 6 vCPUs). We will support 
> IO-thread for cryptodev in another series later.
> 
> Lei He (4):
>   virtio-crypto: Support asynchronous mode
>   crypto: Support DER encodings
>   crypto: Support export akcipher to pkcs8
>   cryptodev: Add a lkcf-backend for cryptodev

Seems to fail build for me - probably a conflict applying.
Could you please rebase and repost? Sorry about the noise.

>  backends/cryptodev-builtin.c|  69 +++--
>  backends/cryptodev-lkcf.c   | 620 
> 
>  backends/cryptodev-vhost-user.c |  51 +++-
>  backends/cryptodev.c|  44 +--
>  backends/meson.build|   3 +
>  crypto/akcipher.c   |  17 ++
>  crypto/der.c| 307 ++--
>  crypto/der.h| 211 +-
>  crypto/rsakey.c |  42 +++
>  crypto/rsakey.h |  11 +-
>  hw/virtio/virtio-crypto.c   | 324 -
>  include/crypto/akcipher.h   |  21 ++
>  include/sysemu/cryptodev.h  |  61 ++--
>  qapi/qom.json   |   2 +
>  tests/unit/test-crypto-der.c| 126 ++--
>  15 files changed, 1649 insertions(+), 260 deletions(-)
>  create mode 100644 backends/cryptodev-lkcf.c
> 
> --
> 2.11.0

[PATCH v7 1/5] hw/pci: PCIe Data Object Exchange emulation

From: Huai-Cheng Kuo 

Emulation of PCIe Data Object Exchange (DOE)
PCIE Base Specification r6.0 6.3 Data Object Exchange

Supports multiple DOE PCIe Extended Capabilities for a single PCIe
device. For each capability, a static array of DOEProtocol should be passed
to pcie_doe_init(). The protocols in that array will be registered under
the DOE capability structure. For each protocol, vendor ID, type, and
corresponding callback function (handle_request()) should be implemented.
This callback function represents how the DOE request for corresponding
protocol will be handled.

pcie_doe_{read/write}_config() must be appended to corresponding PCI
device's config_read/write() handler to enable DOE access. In
pcie_doe_read_config(), false will be returned if pci_config_read()
offset is not within DOE capability range. In pcie_doe_write_config(),
the function will have no effect if the address is not within the related
DOE PCIE extended capability.
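
As an illustration of the protocol table described above, the discovery lookup implemented by pcie_doe_discovery() in this patch can be modelled in isolation. The sketch below assumes that protocol_num counts the built-in discovery protocol plus the registered ones (which is what the `index - 1` lookup suggests), that the PCI-SIG vendor ID is 0x0001 and that the discovery object type is 0x00. The type and function names are hypothetical.

```c
#include <stdint.h>

/* One registered protocol: vendor ID plus data object type. */
typedef struct Proto {
    uint16_t vendor_id;
    uint8_t data_obj_type;
} Proto;

/* Payload of a discovery response, without the DOE header. */
typedef struct DiscoveryRsp {
    uint16_t vendor_id;
    uint8_t data_obj_type;
    uint8_t next_index;
} DiscoveryRsp;

/*
 * Model of the lookup: index 0 is always discovery itself, indexes
 * 1..protocol_num-1 map to the registered array, and anything beyond
 * that reports the all-ones "no such protocol" values.
 */
static DiscoveryRsp discover(const Proto *protos, unsigned protocol_num,
                             uint8_t index)
{
    DiscoveryRsp rsp;

    if (index == 0) {
        rsp.vendor_id = 0x0001;     /* assumed PCI-SIG vendor ID */
        rsp.data_obj_type = 0x00;   /* assumed discovery object type */
    } else if (index < protocol_num) {
        rsp.vendor_id = protos[index - 1].vendor_id;
        rsp.data_obj_type = protos[index - 1].data_obj_type;
    } else {
        rsp.vendor_id = 0xFFFF;
        rsp.data_obj_type = 0xFF;
    }

    /* A next_index of 0 tells software the list is complete. */
    rsp.next_index = (index + 1u == protocol_num) ? 0 : (uint8_t)(index + 1);
    return rsp;
}
```

With a single registered protocol, protocol_num is 2 under the stated assumption: querying index 0 returns discovery with next_index 1, and querying index 1 returns the registered protocol with next_index 0.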

Signed-off-by: Huai-Cheng Kuo 
Signed-off-by: Chris Browy 
Signed-off-by: Jonathan Cameron 
---
 MAINTAINERS|   7 +
 hw/pci/meson.build |   1 +
 hw/pci/pcie_doe.c  | 367 +
 include/hw/pci/pci_ids.h   |   3 +
 include/hw/pci/pcie.h  |   1 +
 include/hw/pci/pcie_doe.h  | 123 +
 include/hw/pci/pcie_regs.h |   4 +
 7 files changed, 506 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e1530b51a2..9c8d9280a0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1832,6 +1832,13 @@ F: qapi/pci.json
 F: docs/pci*
 F: docs/specs/*pci*
 
+PCIE DOE
+M: Huai-Cheng Kuo 
+M: Chris Browy 
+S: Supported
+F: include/hw/pci/pcie_doe.h
+F: hw/pci/pcie_doe.c
+
 ACPI/SMBIOS
 M: Michael S. Tsirkin 
 M: Igor Mammedov 
diff --git a/hw/pci/meson.build b/hw/pci/meson.build
index bcc9c75919..5aff7ed1c6 100644
--- a/hw/pci/meson.build
+++ b/hw/pci/meson.build
@@ -13,6 +13,7 @@ pci_ss.add(files(
 # allow plugging PCIe devices into PCI buses, include them even if
 # CONFIG_PCI_EXPRESS=n.
 pci_ss.add(files('pcie.c', 'pcie_aer.c'))
+pci_ss.add(files('pcie_doe.c'))
 softmmu_ss.add(when: 'CONFIG_PCI_EXPRESS', if_true: files('pcie_port.c', 'pcie_host.c'))
 softmmu_ss.add_all(when: 'CONFIG_PCI', if_true: pci_ss)
 
diff --git a/hw/pci/pcie_doe.c b/hw/pci/pcie_doe.c
new file mode 100644
index 00..2210f86968
--- /dev/null
+++ b/hw/pci/pcie_doe.c
@@ -0,0 +1,367 @@
+/*
+ * PCIe Data Object Exchange
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qemu/range.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pcie.h"
+#include "hw/pci/pcie_doe.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+
+#define DWORD_BYTE 4
+
+typedef struct DoeDiscoveryReq {
+DOEHeader header;
+uint8_t index;
+uint8_t reserved[3];
+} QEMU_PACKED DoeDiscoveryReq;
+
+typedef struct DoeDiscoveryRsp {
+DOEHeader header;
+uint16_t vendor_id;
+uint8_t data_obj_type;
+uint8_t next_index;
+} QEMU_PACKED DoeDiscoveryRsp;
+
+static bool pcie_doe_discovery(DOECap *doe_cap)
+{
+DoeDiscoveryReq *req = pcie_doe_get_write_mbox_ptr(doe_cap);
+DoeDiscoveryRsp rsp;
+uint8_t index = req->index;
+DOEProtocol *prot;
+
+/* Discard request if length does not match DoeDiscoveryReq */
+if (pcie_doe_get_obj_len(req) <
+DIV_ROUND_UP(sizeof(DoeDiscoveryReq), DWORD_BYTE)) {
+return false;
+}
+
+rsp.header = (DOEHeader) {
+.vendor_id = PCI_VENDOR_ID_PCI_SIG,
+.data_obj_type = PCI_SIG_DOE_DISCOVERY,
+.length = DIV_ROUND_UP(sizeof(DoeDiscoveryRsp), DWORD_BYTE),
+};
+
+/* Point to the requested protocol, index 0 must be Discovery */
+if (index == 0) {
+rsp.vendor_id = PCI_VENDOR_ID_PCI_SIG;
+rsp.data_obj_type = PCI_SIG_DOE_DISCOVERY;
+} else {
+if (index < doe_cap->protocol_num) {
+prot = &doe_cap->protocols[index - 1];
+rsp.vendor_id = prot->vendor_id;
+rsp.data_obj_type = prot->data_obj_type;
+} else {
+rsp.vendor_id = 0xFFFF;
+rsp.data_obj_type = 0xFF;
+}
+}
+
+if (index + 1 == doe_cap->protocol_num) {
+rsp.next_index = 0;
+} else {
+rsp.next_index = index + 1;
+}
+
+pcie_doe_set_rsp(doe_cap, &rsp);
+
+return true;
+}
+
+static void pcie_doe_reset_mbox(DOECap *st)
+{
+st->read_mbox_idx = 0;
+st->read_mbox_len = 0;
+st->write_mbox_len = 0;
+
+memset(st->read_mbox, 0, PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+memset(st->write_mbox, 0, PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+}
+
+void pcie_doe_init(PCIDevice *dev, DOECap *doe_cap, uint16_t offset,
+   DOEProtocol *protocols, bool intr, uint16_t vec)
+{
+pcie_add_capability(dev,

Re: [PATCH 1/2] vhost-user: Refactor vhost acked features saving

2022-10-07 Thread Stefano Garzarella


On Fri, Oct 07, 2022 at 10:01:21AM -0400, Michael S. Tsirkin wrote:

On Mon, Sep 26, 2022 at 02:36:40PM +0800, huang...@chinatelecom.cn wrote:

From: Hyman Huang(黄勇) 

Abstract vhost acked features saving into
vhost_user_save_acked_features, export it as util function.

Signed-off-by: Hyman Huang(黄勇) 
Signed-off-by: Guoyi Tu 
---
 include/net/vhost-user.h |  2 ++
 net/vhost-user.c | 35 +++
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
index 5bcd8a6285..00d46613d3 100644
--- a/include/net/vhost-user.h
+++ b/include/net/vhost-user.h
@@ -14,5 +14,7 @@
 struct vhost_net;
 struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
 uint64_t vhost_user_get_acked_features(NetClientState *nc);
+void vhost_user_save_acked_features(NetClientState *nc,
+bool cleanup);

 #endif /* VHOST_USER_H */
diff --git a/net/vhost-user.c b/net/vhost-user.c
index b1a0247b59..c512cc9727 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -45,24 +45,31 @@ uint64_t vhost_user_get_acked_features(NetClientState *nc)
 return s->acked_features;
 }

-static void vhost_user_stop(int queues, NetClientState *ncs[])
+void vhost_user_save_acked_features(NetClientState *nc, bool cleanup)
 {
 NetVhostUserState *s;
+
+s = DO_UPCAST(NetVhostUserState, nc, nc);
+if (s->vhost_net) {
+uint64_t features = vhost_net_get_acked_features(s->vhost_net);
+if (features) {
+s->acked_features = features;
+}


Note it does not set  acked_features if features are 0.
Which might be the case for legacy ...
I will need to analyze this more to figure out what's
the correct behaviour

Stefano? Raphael?


I just noticed that we now call vhost_user_stop() only in the error path 
of vhost_user_start(). We used it elsewhere, but over time we removed 
those calls.


Do we still need to save acked_feature in that path?

About doing it only if the features aren't 0, it isn't clear to me.
I see we did in e6bcb1b617 ("vhost-user: keep vhost_net after a 
disconnection"). @Marc-André do you remember why?


Thanks,
Stefano




+
+if (cleanup) {
+vhost_net_cleanup(s->vhost_net);
+}
+}
+}
+
+static void vhost_user_stop(int queues, NetClientState *ncs[])
+{
 int i;

 for (i = 0; i < queues; i++) {
 assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);

-s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
-
-if (s->vhost_net) {
-/* save acked features */
-uint64_t features = vhost_net_get_acked_features(s->vhost_net);
-if (features) {
-s->acked_features = features;
-}
-vhost_net_cleanup(s->vhost_net);
-}
+vhost_user_save_acked_features(ncs[i], true);
 }
 }

@@ -251,11 +258,7 @@ static void chr_closed_bh(void *opaque)
 s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);

 for (i = queues -1; i >= 0; i--) {
-s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
-
-if (s->vhost_net) {
-s->acked_features = vhost_net_get_acked_features(s->vhost_net);
-}
+vhost_user_save_acked_features(ncs[i], false);
 }

 qmp_set_link(name, false, );
--
2.27.0
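
The point raised in review can be seen by modelling the two save policies side by side. Before this patch, chr_closed_bh() overwrote acked_features unconditionally, while vhost_user_stop() only saved a non-zero value; the refactored helper applies the second policy to both callers. The sketch below is a standalone model with hypothetical names, not the QEMU code.

```c
#include <stdint.h>

/* Stand-in for the one field of NetVhostUserState that matters here. */
typedef struct FeatState {
    uint64_t acked_features;
} FeatState;

/* Policy of the old chr_closed_bh(): always overwrite. */
static void save_always(FeatState *s, uint64_t features)
{
    s->acked_features = features;
}

/* Policy of vhost_user_stop() and the new helper: keep the old value on 0. */
static void save_if_nonzero(FeatState *s, uint64_t features)
{
    if (features) {
        s->acked_features = features;
    }
}
```

If a device previously acked some features and later reports 0 (the legacy case mentioned above), save_always() records 0 while save_if_nonzero() keeps the stale value, so the refactor changes behaviour for the chr_closed_bh() path.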

[PATCH v2 3/3] tests/acpi: virt: update ACPI GTDT binaries

Step 6 & 7 of the bios-tables-test.c documented procedure.

Differences between disassembled ASL files for GTDT:

@@ -13,14 +13,14 @@
 [000h    4]Signature : "GTDT"[Generic Timer 
Description Table]
 [004h 0004   4] Table Length : 0060
 [008h 0008   1] Revision : 02
-[009h 0009   1] Checksum : 8C
+[009h 0009   1] Checksum : 9C
 [00Ah 0010   6]   Oem ID : "BOCHS "
 [010h 0016   8] Oem Table ID : "BXPC"
 [018h 0024   4] Oem Revision : 0001
 [01Ch 0028   4]  Asl Compiler ID : "BXPC"
 [020h 0032   4]Asl Compiler Revision : 0001

-[024h 0036   8]Counter Block Address : 0000000000000000
+[024h 0036   8]Counter Block Address : FFFFFFFFFFFFFFFF
 [02Ch 0044   4] Reserved : 

 [030h 0048   4] Secure EL1 Interrupt : 001D
@@ -46,16 +46,16 @@
 Trigger Mode : 0
 Polarity : 0
Always On : 0
-[050h 0080   8]   Counter Read Block Address : 0000000000000000
+[050h 0080   8]   Counter Read Block Address : FFFFFFFFFFFFFFFF

 [058h 0088   4] Platform Timer Count : 
 [05Ch 0092   4]Platform Timer Offset : 

 Raw Table Data: Length 96 (0x60)

-: 47 54 44 54 60 00 00 00 02 8C 42 4F 43 48 53 20  // 
GTDT`.BOCHS
+: 47 54 44 54 60 00 00 00 02 9C 42 4F 43 48 53 20  // 
GTDT`.BOCHS
 0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC
BXPC
-0020: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 

+0020: 01 00 00 00 FF FF FF FF FF FF FF FF 00 00 00 00  // 

 0030: 1D 00 00 00 00 00 00 00 1E 00 00 00 04 00 00 00  // 

 0040: 1B 00 00 00 00 00 00 00 1A 00 00 00 00 00 00 00  // 

-0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 

+0050: FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00  // 


Signed-off-by: Miguel Luis 
Acked-by: Ani Sinha 
---
 tests/data/acpi/virt/GTDT   | Bin 96 -> 96 bytes
 tests/data/acpi/virt/GTDT.memhp | Bin 96 -> 96 bytes
 tests/data/acpi/virt/GTDT.numamem   | Bin 96 -> 96 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   3 ---
 4 files changed, 3 deletions(-)

diff --git a/tests/data/acpi/virt/GTDT b/tests/data/acpi/virt/GTDT
index 
9408b71b59c0e0f2991c0053562280155b47bc0b..6f8cb9b8f30b55f4c93fe515982621e3db50feb2
 100644
GIT binary patch
delta 45
kcmYdD;BpUf2}xjJU|^avkxPo>KNL*VQ4xT#fs$YV0LH=;ng9R*

delta 45
jcmYdD;BpUf2}xjJU|{N*$R))AWPrg$9Tfo>8%6^Foy!E8

diff --git a/tests/data/acpi/virt/GTDT.memhp b/tests/data/acpi/virt/GTDT.memhp
index 
9408b71b59c0e0f2991c0053562280155b47bc0b..6f8cb9b8f30b55f4c93fe515982621e3db50feb2
 100644
GIT binary patch
delta 45
kcmYdD;BpUf2}xjJU|^avkxPo>KNL*VQ4xT#fs$YV0LH=;ng9R*

delta 45
jcmYdD;BpUf2}xjJU|{N*$R))AWPrg$9Tfo>8%6^Foy!E8

diff --git a/tests/data/acpi/virt/GTDT.numamem 
b/tests/data/acpi/virt/GTDT.numamem
index 
9408b71b59c0e0f2991c0053562280155b47bc0b..6f8cb9b8f30b55f4c93fe515982621e3db50feb2
 100644
GIT binary patch
delta 45
kcmYdD;BpUf2}xjJU|^avkxPo>KNL*VQ4xT#fs$YV0LH=;ng9R*

delta 45
jcmYdD;BpUf2}xjJU|{N*$R))AWPrg$9Tfo>8%6^Foy!E8

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index 957bd1b4f6..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,4 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/virt/GTDT",
-"tests/data/acpi/virt/GTDT.memhp",
-"tests/data/acpi/virt/GTDT.numamem",
-- 
2.37.3
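
The checksum change from 8C to 9C in the disassembly above follows from the ACPI rule that all bytes of a table, checksum byte included, sum to zero modulo 256. A small sketch that recomputes it from the raw dump in the commit message is given below; acpi_checksum() is a hypothetical helper written for this note, not the function used by QEMU or iasl.

```c
#include <stddef.h>
#include <stdint.h>

#define GTDT_LEN      96
#define GTDT_CSUM_OFF 9   /* offset of the Checksum field */

/* Raw bytes of the updated table, copied from the dump above. */
static const uint8_t gtdt_new[GTDT_LEN] = {
    0x47, 0x54, 0x44, 0x54, 0x60, 0x00, 0x00, 0x00,
    0x02, 0x9C, 0x42, 0x4F, 0x43, 0x48, 0x53, 0x20,
    0x42, 0x58, 0x50, 0x43, 0x20, 0x20, 0x20, 0x20,
    0x01, 0x00, 0x00, 0x00, 0x42, 0x58, 0x50, 0x43,
    0x01, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00,
    0x1D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x1E, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
    0x1B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x1A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/*
 * Compute the checksum byte that makes the whole table sum to zero
 * modulo 256, ignoring whatever is currently stored at csum_off.
 */
static uint8_t acpi_checksum(const uint8_t *table, size_t len, size_t csum_off)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        if (i != csum_off) {
            sum = (uint8_t)(sum + table[i]);
        }
    }
    return (uint8_t)(0u - sum);
}
```

Each byte that changes from 0x00 to 0xFF adds 0xFF to the sum, which is -1 modulo 256, so the checksum grows by one per byte. Sixteen bytes changed (two 8-byte fields), giving 0x8C + 0x10 = 0x9C, and acpi_checksum(gtdt_new, GTDT_LEN, GTDT_CSUM_OFF) returns that same value.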

Re: [PATCH v3 2/8] m25p80: Add the n25q256a SFDP table

On [2022 Jul 22] Fri 08:35:56, Cédric Le Goater wrote:
> The same values were collected on 4 different OpenPower systems,
> palmettos, romulus and tacoma.
> 
> The SFDP table size is defined as being 0x100 bytes but it could be
> bigger. Only the mandatory table for basic features is available at
> byte 0x30.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/block/m25p80_sfdp.h |  2 ++
>  hw/block/m25p80.c  |  8 +++---
>  hw/block/m25p80_sfdp.c | 58 ++
>  hw/block/meson.build   |  1 +
>  4 files changed, 66 insertions(+), 3 deletions(-)
>  create mode 100644 hw/block/m25p80_sfdp.c
> 
> diff --git a/hw/block/m25p80_sfdp.h b/hw/block/m25p80_sfdp.h
> index 230b07ef3308..d3a0a778ae84 100644
> --- a/hw/block/m25p80_sfdp.h
> +++ b/hw/block/m25p80_sfdp.h
> @@ -15,4 +15,6 @@
>   */
>  #define M25P80_SFDP_MAX_SIZE  (1 << 24)
>  
> +extern uint8_t m25p80_sfdp_n25q256a(uint32_t addr);

(-extern above if we would like)


> +
>  #endif
> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> index abdc4c0b0da7..13e7b28fd2b0 100644
> --- a/hw/block/m25p80.c
> +++ b/hw/block/m25p80.c
> @@ -247,13 +247,15 @@ static const FlashPartInfo known_devices[] = {
>  { INFO("n25q128a11",  0x20bb18,  0,  64 << 10, 256, ER_4K) },
>  { INFO("n25q128a13",  0x20ba18,  0,  64 << 10, 256, ER_4K) },
>  { INFO("n25q256a11",  0x20bb19,  0,  64 << 10, 512, ER_4K) },
> -{ INFO("n25q256a13",  0x20ba19,  0,  64 << 10, 512, ER_4K) },
> +{ INFO("n25q256a13",  0x20ba19,  0,  64 << 10, 512, ER_4K),
> +  .sfdp_read = m25p80_sfdp_n25q256a },
>  { INFO("n25q512a11",  0x20bb20,  0,  64 << 10, 1024, ER_4K) },
>  { INFO("n25q512a13",  0x20ba20,  0,  64 << 10, 1024, ER_4K) },
>  { INFO("n25q128", 0x20ba18,  0,  64 << 10, 256, 0) },
>  { INFO("n25q256a",0x20ba19,  0,  64 << 10, 512,
> -   ER_4K | HAS_SR_BP3_BIT6 | HAS_SR_TB) },
> -{ INFO("n25q512a",0x20ba20,  0,  64 << 10, 1024, ER_4K) },
> +   ER_4K | HAS_SR_BP3_BIT6 | HAS_SR_TB),
> +  .sfdp_read = m25p80_sfdp_n25q256a },
> +   { INFO("n25q512a",0x20ba20,  0,  64 << 10, 1024, ER_4K) },
>  { INFO("n25q512ax3",  0x20ba20,  0x1000,  64 << 10, 1024, ER_4K) },
>  { INFO("mt25ql512ab", 0x20ba20, 0x1044, 64 << 10, 1024, ER_4K | ER_32K) 
> },
>  { INFO_STACKED("mt35xu01g", 0x2c5b1b, 0x104100, 128 << 10, 1024,
> diff --git a/hw/block/m25p80_sfdp.c b/hw/block/m25p80_sfdp.c
> new file mode 100644
> index ..24ec05de79a1
> --- /dev/null
> +++ b/hw/block/m25p80_sfdp.c
> @@ -0,0 +1,58 @@
> +/*
> + * M25P80 Serial Flash Discoverable Parameter (SFDP)
> + *
> + * Copyright (c) 2020, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/host-utils.h"
> +#include "m25p80_sfdp.h"
> +
> +#define define_sfdp_read(model)   \
> +uint8_t m25p80_sfdp_##model(uint32_t addr)\
> +{ \
> +assert(is_power_of_2(sizeof(sfdp_##model)));  \
> +return sfdp_##model[addr & (sizeof(sfdp_##model) - 1)];   \
> +}
> +
> +/*
> + * Micron
> + */
> +static const uint8_t sfdp_n25q256a[] = {

The datasheets I found didn't completely match this table, but I can't argue
with the hardware readout from 4 flashes.

Reviewed-by: Francisco Iglesias 

> +0x53, 0x46, 0x44, 0x50, 0x00, 0x01, 0x00, 0xff,
> +0x00, 0x00, 0x01, 0x09, 0x30, 0x00, 0x00, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xe5, 0x20, 0xfb, 0xff, 0xff, 0xff, 0xff, 0x0f,
> +0x29, 0xeb, 0x27, 0x6b, 0x08, 0x3b, 0x27, 0xbb,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x27, 0xbb,
> +0xff, 0xff, 0x29, 0xeb, 0x0c, 0x20, 0x10, 0xd8,
> +0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff,
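
For readers following along, the define_sfdp_read() macro above depends on the
table size being a power of two, so that any address can be wrapped with a
mask. Below is a minimal standalone sketch of that idea. It is not QEMU code:
demo_sfdp, demo_is_power_of_2() and demo_sfdp_read() are hypothetical names,
and the 8-byte table holds only the first row of the table quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte table: the first row of the quoted n25q256a data. */
static const uint8_t demo_sfdp[] = {
    0x53, 0x46, 0x44, 0x50, 0x00, 0x01, 0x00, 0xff,
};

/* Nonzero when x is a power of two (zero is rejected). */
static int demo_is_power_of_2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Read one byte; addresses past the end wrap around via the size mask. */
static uint8_t demo_sfdp_read(uint32_t addr)
{
    assert(demo_is_power_of_2(sizeof(demo_sfdp)));
    return demo_sfdp[addr & (sizeof(demo_sfdp) - 1)];
}
```

With a power-of-two size the mask is equivalent to a modulo, so a read at
offset 8 in this sketch returns the same byte as a read at offset 0.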

Re: [PATCH v3 5/8] m25p80: Add the mx66l1g45g SFDP table

On [2022 Jul 22] Fri 08:35:59, Cédric Le Goater wrote:
> The SFDP table size is 0x200 bytes long. The mandatory table for basic
> features is available at byte 0x30 plus some more Macronix specific
> tables.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/block/m25p80_sfdp.h |  2 +-
>  hw/block/m25p80.c  |  3 +-
>  hw/block/m25p80_sfdp.c | 68 ++
>  3 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/m25p80_sfdp.h b/hw/block/m25p80_sfdp.h
> index 87690a173c78..468e3434151b 100644
> --- a/hw/block/m25p80_sfdp.h
> +++ b/hw/block/m25p80_sfdp.h
> @@ -19,6 +19,6 @@ extern uint8_t m25p80_sfdp_n25q256a(uint32_t addr);
>  
>  extern uint8_t m25p80_sfdp_mx25l25635e(uint32_t addr);
>  extern uint8_t m25p80_sfdp_mx25l25635f(uint32_t addr);
> -
> +extern uint8_t m25p80_sfdp_mx66l1g45g(uint32_t addr);

(optional -extern)

Reviewed-by: Francisco Iglesias 

>  
>  #endif
> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> index 6b120ce65212..52df24d24751 100644
> --- a/hw/block/m25p80.c
> +++ b/hw/block/m25p80.c
> @@ -240,7 +240,8 @@ static const FlashPartInfo known_devices[] = {
>  { INFO("mx66l51235f", 0xc2201a,  0,  64 << 10, 1024, ER_4K | ER_32K) 
> },
>  { INFO("mx66u51235f", 0xc2253a,  0,  64 << 10, 1024, ER_4K | ER_32K) 
> },
>  { INFO("mx66u1g45g",  0xc2253b,  0,  64 << 10, 2048, ER_4K | ER_32K) 
> },
> -{ INFO("mx66l1g45g",  0xc2201b,  0,  64 << 10, 2048, ER_4K | ER_32K) 
> },
> +{ INFO("mx66l1g45g",  0xc2201b,  0,  64 << 10, 2048, ER_4K | ER_32K),
> +  .sfdp_read = m25p80_sfdp_mx66l1g45g },
>  
>  /* Micron */
>  { INFO("n25q032a11",  0x20bb16,  0,  64 << 10,  64, ER_4K) },
> diff --git a/hw/block/m25p80_sfdp.c b/hw/block/m25p80_sfdp.c
> index 70c13aea7c63..38c3ced34d2e 100644
> --- a/hw/block/m25p80_sfdp.c
> +++ b/hw/block/m25p80_sfdp.c
> @@ -150,3 +150,71 @@ static const uint8_t sfdp_mx25l25635f[] = {
>  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>  };
>  define_sfdp_read(mx25l25635f);
> +
> +static const uint8_t sfdp_mx66l1g45g[] = {
> +0x53, 0x46, 0x44, 0x50, 0x06, 0x01, 0x02, 0xff,
> +0x00, 0x06, 0x01, 0x10, 0x30, 0x00, 0x00, 0xff,
> +0xc2, 0x00, 0x01, 0x04, 0x10, 0x01, 0x00, 0xff,
> +0x84, 0x00, 0x01, 0x02, 0xc0, 0x00, 0x00, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xe5, 0x20, 0xfb, 0xff, 0xff, 0xff, 0xff, 0x3f,
> +0x44, 0xeb, 0x08, 0x6b, 0x08, 0x3b, 0x04, 0xbb,
> +0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0xff,
> +0xff, 0xff, 0x44, 0xeb, 0x0c, 0x20, 0x0f, 0x52,
> +0x10, 0xd8, 0x00, 0xff, 0xd6, 0x49, 0xc5, 0x00,
> +0x85, 0xdf, 0x04, 0xe3, 0x44, 0x03, 0x67, 0x38,
> +0x30, 0xb0, 0x30, 0xb0, 0xf7, 0xbd, 0xd5, 0x5c,
> +0x4a, 0x9e, 0x29, 0xff, 0xf0, 0x50, 0xf9, 0x85,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0x7f, 0xef, 0xff, 0xff, 0x21, 0x5c, 0xdc, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0x00, 0x36, 0x00, 0x27, 0x9d, 0xf9, 0xc0, 0x64,
> +0x85, 0xcb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff,
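
The commit message says the basic table sits at byte 0x30 and is followed by
Macronix specific tables. The sketch below shows how those offsets can be read
back from the first 32 bytes of the table quoted above. It assumes the JEDEC
JESD216 header layout as I understand it (signature, header count at byte 6,
then 8-byte parameter headers starting at byte 8); DemoParamHeader and the
demo_* helpers are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* First 32 bytes of the mx66l1g45g table quoted above. */
static const uint8_t demo_hdr[32] = {
    0x53, 0x46, 0x44, 0x50, 0x06, 0x01, 0x02, 0xff,
    0x00, 0x06, 0x01, 0x10, 0x30, 0x00, 0x00, 0xff,
    0xc2, 0x00, 0x01, 0x04, 0x10, 0x01, 0x00, 0xff,
    0x84, 0x00, 0x01, 0x02, 0xc0, 0x00, 0x00, 0xff,
};

typedef struct {
    uint16_t id;         /* ID MSB (byte 7) : ID LSB (byte 0) */
    uint8_t  len_dwords; /* parameter table length in 32-bit words */
    uint32_t ptr;        /* 24-bit byte offset of the parameter table */
} DemoParamHeader;

/* The table must start with the ASCII signature "SFDP". */
static int demo_sfdp_signature_ok(const uint8_t *t)
{
    return memcmp(t, "SFDP", 4) == 0;
}

/* Byte 6 stores the number of parameter headers minus one. */
static unsigned demo_sfdp_num_headers(const uint8_t *t)
{
    return t[6] + 1u;
}

/* Decode the n-th 8-byte parameter header (the first one is at byte 8). */
static DemoParamHeader demo_sfdp_param_header(const uint8_t *t, unsigned n)
{
    const uint8_t *p = t + 8 + 8 * n;
    DemoParamHeader h;

    h.id = (uint16_t)((p[7] << 8) | p[0]);
    h.len_dwords = p[3];
    h.ptr = (uint32_t)p[4] | ((uint32_t)p[5] << 8) | ((uint32_t)p[6] << 16);
    return h;
}
```

Under that layout the three headers point at 0x30, 0x110 and 0xc0, which are
the offsets where non-0xff data appears in the quoted table.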

Re: [PATCH 1/2] vhost-user: Refactor vhost acked features saving

On Mon, Sep 26, 2022 at 02:36:40PM +0800, huang...@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) 
> 
> Abstract vhost acked features saving into
> vhost_user_save_acked_features, export it as util function.
> 
> Signed-off-by: Hyman Huang(黄勇) 
> Signed-off-by: Guoyi Tu 
> ---
>  include/net/vhost-user.h |  2 ++
>  net/vhost-user.c | 35 +++
>  2 files changed, 21 insertions(+), 16 deletions(-)
> 
> diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
> index 5bcd8a6285..00d46613d3 100644
> --- a/include/net/vhost-user.h
> +++ b/include/net/vhost-user.h
> @@ -14,5 +14,7 @@
>  struct vhost_net;
>  struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
>  uint64_t vhost_user_get_acked_features(NetClientState *nc);
> +void vhost_user_save_acked_features(NetClientState *nc,
> +bool cleanup);
>  
>  #endif /* VHOST_USER_H */
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index b1a0247b59..c512cc9727 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -45,24 +45,31 @@ uint64_t vhost_user_get_acked_features(NetClientState *nc)
>  return s->acked_features;
>  }
>  
> -static void vhost_user_stop(int queues, NetClientState *ncs[])
> +void vhost_user_save_acked_features(NetClientState *nc, bool cleanup)
>  {
>  NetVhostUserState *s;
> +
> +s = DO_UPCAST(NetVhostUserState, nc, nc);
> +if (s->vhost_net) {
> +uint64_t features = vhost_net_get_acked_features(s->vhost_net);
> +if (features) {
> +s->acked_features = features;
> +}

Note that it does not set acked_features if features are 0,
which might be the case for legacy ...
I will need to analyze this more to figure out what
the correct behaviour is.

Stefano? Raphael?

> +
> +if (cleanup) {
> +vhost_net_cleanup(s->vhost_net);
> +}
> +}
> +}
> +
> +static void vhost_user_stop(int queues, NetClientState *ncs[])
> +{
>  int i;
>  
>  for (i = 0; i < queues; i++) {
>  assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
>  
> -s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
> -
> -if (s->vhost_net) {
> -/* save acked features */
> -uint64_t features = vhost_net_get_acked_features(s->vhost_net);
> -if (features) {
> -s->acked_features = features;
> -}
> -vhost_net_cleanup(s->vhost_net);
> -}
> +vhost_user_save_acked_features(ncs[i], true);
>  }
>  }
>  
> @@ -251,11 +258,7 @@ static void chr_closed_bh(void *opaque)
>  s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
>  
>  for (i = queues -1; i >= 0; i--) {
> -s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
> -
> -if (s->vhost_net) {
> -s->acked_features = vhost_net_get_acked_features(s->vhost_net);
> -}
> +vhost_user_save_acked_features(ncs[i], false);
>  }
>  
>  qmp_set_link(name, false, );
> -- 
> 2.27.0
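
To make the review concern concrete: the code removed from chr_closed_bh()
overwrote acked_features unconditionally, while the new helper skips the store
when the value is 0. The sketch below contrasts the two behaviours. It is a
simplified model, not QEMU code: DemoState and both demo_save_* functions are
hypothetical stand-ins for NetVhostUserState and the code paths in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for NetVhostUserState; not the QEMU type. */
typedef struct {
    bool     has_backend;      /* models s->vhost_net != NULL */
    uint64_t backend_features; /* models vhost_net_get_acked_features() */
    uint64_t acked_features;   /* the saved copy */
} DemoState;

/* Behaviour of the refactored helper: a zero value is never saved. */
static void demo_save_if_nonzero(DemoState *s)
{
    if (s->has_backend && s->backend_features) {
        s->acked_features = s->backend_features;
    }
}

/* Behaviour of the code removed from chr_closed_bh(): always overwrite. */
static void demo_save_always(DemoState *s)
{
    if (s->has_backend) {
        s->acked_features = s->backend_features;
    }
}
```

If the backend reports 0 after a previous non-zero negotiation, the first
variant keeps the stale value and the second clears it, which is the
behavioural difference raised above.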

[PATCH v2 1/3] tests/acpi: virt: allow acpi GTDT changes

Step 3 from bios-tables-test.c documented procedure.

Signed-off-by: Miguel Luis 
Acked-by: Ani Sinha 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..957bd1b4f6 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,4 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/GTDT",
+"tests/data/acpi/virt/GTDT.memhp",
+"tests/data/acpi/virt/GTDT.numamem",
-- 
2.37.3

Re: [PATCH v3 1/2] qpci_device_enable: Allow for command bits hardwired to 0

On Sun, Sep 25, 2022 at 09:37:58AM +, Lev Kujawski wrote:
> Devices like the PIIX3/4 IDE controller do not support certain modes
> of operation, such as memory space accesses, and indicate this lack of
> support by hardwiring the applicable bits to zero.  Extend the QEMU
> PCI device testing framework to accommodate such devices.
> 
> * tests/qtest/libqos/pci.h: Add the command_disabled word to indicate
>   bits hardwired to 0.
> * tests/qtest/libqos/pci.c: Verify that hardwired bits are actually
>   hardwired.
> 
> Signed-off-by: Lev Kujawski 
> ---
>  tests/qtest/libqos/pci.c | 13 +++--
>  tests/qtest/libqos/pci.h |  1 +
>  2 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/qtest/libqos/pci.c b/tests/qtest/libqos/pci.c
> index b23d72346b..4f3d28d8d9 100644
> --- a/tests/qtest/libqos/pci.c
> +++ b/tests/qtest/libqos/pci.c
> @@ -220,18 +220,19 @@ int qpci_secondary_buses_init(QPCIBus *bus)
>  
>  void qpci_device_enable(QPCIDevice *dev)
>  {
> -uint16_t cmd;
> +const uint16_t enable_bits =
> +PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER;
> +uint16_t cmd, new_cmd;
>  
>  /* FIXME -- does this need to be a bus callout? */
>  cmd = qpci_config_readw(dev, PCI_COMMAND);
> -cmd |= PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER;
> +cmd |= enable_bits;
>  qpci_config_writew(dev, PCI_COMMAND, cmd);
>  
>  /* Verify the bits are now set. */
> -cmd = qpci_config_readw(dev, PCI_COMMAND);
> -g_assert_cmphex(cmd & PCI_COMMAND_IO, ==, PCI_COMMAND_IO);
> -g_assert_cmphex(cmd & PCI_COMMAND_MEMORY, ==, PCI_COMMAND_MEMORY);
> -g_assert_cmphex(cmd & PCI_COMMAND_MASTER, ==, PCI_COMMAND_MASTER);
> +new_cmd = qpci_config_readw(dev, PCI_COMMAND);
> +new_cmd &= enable_bits;
> +g_assert_cmphex(new_cmd, ==, enable_bits & ~dev->command_disabled);
>  }
>  
>  /**
> diff --git a/tests/qtest/libqos/pci.h b/tests/qtest/libqos/pci.h
> index 8389614523..eaedb98588 100644
> --- a/tests/qtest/libqos/pci.h
> +++ b/tests/qtest/libqos/pci.h
> @@ -68,6 +68,7 @@ struct QPCIDevice
>  bool msix_enabled;
>  QPCIBar msix_table_bar, msix_pba_bar;
>  uint64_t msix_table_off, msix_pba_off;
> +uint16_t command_disabled;


Can we get this from the device's wmask?

>  };
>  
>  struct QPCIAddress {
> -- 
> 2.34.1
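
As an illustration of the check the patch introduces, the sketch below models
a device whose command register forces some bits to 0 and verifies that the
bits read back equal enable_bits with the expected hardwired bits masked out.
It is a standalone model and not libqos code: DemoPciDev, the demo_* functions
and the DEMO_PCI_COMMAND_* macros are hypothetical names (the bit values match
the standard PCI command register bits for I/O, memory and bus master).

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PCI_COMMAND_IO     0x1 /* I/O space enable */
#define DEMO_PCI_COMMAND_MEMORY 0x2 /* memory space enable */
#define DEMO_PCI_COMMAND_MASTER 0x4 /* bus master enable */

typedef struct {
    uint16_t command;          /* modelled command register */
    uint16_t hardwired_zero;   /* bits the device forces to 0 */
    uint16_t command_disabled; /* bits the test expects to be hardwired */
} DemoPciDev;

/* Writes never stick on hardwired bits. */
static void demo_config_writew(DemoPciDev *d, uint16_t v)
{
    d->command = (uint16_t)(v & ~d->hardwired_zero);
}

/* Returns nonzero when the read-back bits match the expectation. */
static int demo_device_enable(DemoPciDev *d)
{
    const uint16_t enable_bits = DEMO_PCI_COMMAND_IO |
                                 DEMO_PCI_COMMAND_MEMORY |
                                 DEMO_PCI_COMMAND_MASTER;
    uint16_t new_cmd;

    demo_config_writew(d, (uint16_t)(d->command | enable_bits));
    new_cmd = d->command & enable_bits;
    return new_cmd == (enable_bits & ~d->command_disabled);
}
```

The comparison is an equality, so it fails both when an expected bit did not
get set and when a bit declared as hardwired turns out to be writable.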

Re: [PATCH RFC] hw/cxl: type 3 devices can now present volatile or persistent memory

2022-10-07 Thread Gregory Price

Now that I've had some time to look at the spec, the DVSEC CXL Capability
register (8.1.3.1 in the 3.0 spec) only supports enabling two HDM ranges at
the moment, which to me means we should implement

memdev0=..., memdev1=...

Yesterday I pushed a patch proposal that separated the regions into
persistent and volatile, but I had discovered that there isn't a good way
(presently) to determine if a MemoryBacking object is both File-type AND has
pmem=true, meaning we'd have to either expose that interface (this seems
dubious to me) or do the following:

memdev0=mem0,memdev0-pmem=true,memdev1=mem0,memdev0-pmem=false

and then simply store a bit for each hostmem region accordingly to report
pmem/volatile. This would allow the flexibility for the backing device to be
whatever the user wants while still being able to mark the behavior of the
type-3 device as pmem or volatile. Alternatively we could make
`memdev0-type=` and allow for new types in the future, I suppose.
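
A rough sketch of what "store a bit for each hostmem region" could look like,
and how the pmem and ram sizes reported by the device would fall out of it.
Everything here is hypothetical (DemoRegion, DemoType3 and the demo_*
functions are made-up names for illustration, not the QEMU type-3 device
state):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_REGIONS 2 /* two HDM ranges, as discussed above */

typedef struct {
    bool     present; /* a backend was supplied for this slot */
    bool     is_pmem; /* the per-region pmem/volatile bit */
    uint64_t size;    /* backend size in bytes */
} DemoRegion;

typedef struct {
    DemoRegion region[DEMO_MAX_REGIONS];
} DemoType3;

/* Sum the sizes of all present regions whose pmem bit equals want_pmem. */
static uint64_t demo_capacity(const DemoType3 *d, bool want_pmem)
{
    uint64_t total = 0;
    int i;

    for (i = 0; i < DEMO_MAX_REGIONS; i++) {
        if (d->region[i].present && d->region[i].is_pmem == want_pmem) {
            total += d->region[i].size;
        }
    }
    return total;
}

static uint64_t demo_pmem_size(const DemoType3 *d)
{
    return demo_capacity(d, true);
}

static uint64_t demo_ram_size(const DemoType3 *d)
{
    return demo_capacity(d, false);
}
```

The backend type never enters the calculation, which is the flexibility
argued for above: only the user-supplied bit decides how a region is reported.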

The one thing I'm a bit confused on is the Media_type and Memory_Class
fields in the DVSEC CXL Range registers (8.1.3.8.2). Right now they're set
to "010b" = "Memory Characteristics are communicated via CDAT". Do the
devices presently emulate this? I'm finding it hard to pick apart the code
to identify it.

On Thu, Oct 6, 2022 at 1:30 PM Gregory Price 
wrote:

> On Thu, Oct 06, 2022 at 05:42:14PM +0100, Jonathan Cameron wrote:
> > >
> > > 1) The PCI device type is set prior to realize/attributes, and is
> > > currently still set to PCI_CLASS_STORAGE_EXPRESS.  Should this instead
> > > be PCI_CLASS_MEMORY_CXL when presenting as a simple memory expander?
> >
> > We override this later in realize but indeed a bit odd that we first set
> > it to the wrong thing. I guess that is really old code.  Nice thing to
> clean up.
> > I just tried it and setting it right in the first place + dropping where
> we
> > change it later works fine.
> >
>
> I'll add it as a pullout patch ahead of my next revision.
>
> / snip - skipping ahead for the sake of brevity /
>
>
> I was unaware that an SLD could be comprised of multiple regions
> of both persistent and volatile memory.  I was under the impression that
> it could only be one type of memory.  Of course that makes sense in the
> case of a memory expander that simply lets you plug DIMMs in *facepalm*
>
> I see the reason to have separate backends in this case.
>
> The reason to allow an array of backing devices is if we believe each
> individual DIMM plugged into a memexpander is likely to show up as
> (configurably?) individual NUMA nodes, or if it's likely to get
> classified as one numa node.
>
> Maybe we should consider 2 new options:
> --persistent-memdevs=pm1 pm2 pm3
> --volatile-memdevs=vm1 vm2 vm3
>
> etc, and deprecate --memdev, and go with your array of memdevs idea.
>
> I think I could probably whip that up in a day or two.  Thoughts?
>
>
>
> > >
> > > 2) EDK2 sets the memory area as a reserved, and the memory is not
> > > configured by the system as ram.  I'm fairly sure edk2 just doesn't
> > > support this yet, but there's a chicken/egg problem.  If the device
> > > isn't there, there's nothing to test against... if there's nothing to
> > > test against, no one will write the support.  So I figure we should
> kick
> > > start the process (probably by getting it wrong on the first go
> around!)
> >
> > Yup, if the bios left it alone, OS drivers need to treat it the same as
> > they would deal with hotplugged memory.  Note my strong suspicion is
> there
> > will be host vendors who won't ever handle volatile CXL memory in
> firmware.
> > They will just let the OS bring it up after boot. As long as you have DDR
> > as well on the system that will be fine.  Means there is one code path
> > to verify rather than two.  Not everyone will care about legacy OS
> support.
> >
>
> Presently I'm failing to bring up a region of memory even when this is
> set to persistent (even on upstream configuration).  The kernel is
> presently failing to set_size because the region is used.
>
> I can't tell if this is a driver error or because EDK2 is marking the
> region as reserved.
>
> relevant boot output:
> [0.00] BIOS-e820: [mem 0x00029000-0x00029fff]
> reserved
> [1.229097] acpi ACPI0016:00: _OSC: OS supports [ExtendedConfig ASPM
> ClockPM Segments MSI EDR HPX-Type3]
> [1.244082] acpi ACPI0016:00: _OSC: OS supports [CXL20PortDevRegAccess
> CXLProtocolErrorReporting CXLNativeHotPlug]
> [1.261245] acpi ACPI0016:00: _OSC: platform does not support [LTR DPC]
> [1.272347] acpi ACPI0016:00: _OSC: OS now controls [PCIeHotplug
> SHPCHotplug PME AER PCIeCapability]
> [1.286092] acpi ACPI0016:00: _OSC: OS now controls
> [CXLMemErrorReporting]
>
> The device is otherwise available for use
>
> cli output
> # cxl list
> [
>   {
> "memdev":"mem0",
> "pmem_size":268435456,
> "ram_size":0,
> "serial":0,
> "host":":35:00.0"
>   }
> ]
>
> but it fails to setup

Re: [PATCH v8 2/8] KVM: Extend the memslot to support fd-based private memory

2022-10-07 Thread Sean Christopherson

On Fri, Oct 07, 2022, Jarkko Sakkinen wrote:
> On Thu, Oct 06, 2022 at 03:34:58PM +, Sean Christopherson wrote:
> > On Thu, Oct 06, 2022, Jarkko Sakkinen wrote:
> > > On Thu, Oct 06, 2022 at 05:58:03PM +0300, Jarkko Sakkinen wrote:
> > > > On Thu, Sep 15, 2022 at 10:29:07PM +0800, Chao Peng wrote:
> > > > > This new extension, indicated by the new flag KVM_MEM_PRIVATE, adds 
> > > > > two
> > > > > additional KVM memslot fields private_fd/private_offset to allow
> > > > > userspace to specify that guest private memory provided from the
> > > > > private_fd and guest_phys_addr mapped at the private_offset of the
> > > > > private_fd, spanning a range of memory_size.
> > > > > 
> > > > > The extended memslot can still have the userspace_addr(hva). When 
> > > > > use, a
> > > > > single memslot can maintain both private memory through private
> > > > > fd(private_fd/private_offset) and shared memory through
> > > > > hva(userspace_addr). Whether the private or shared part is visible to
> > > > > guest is maintained by other KVM code.
> > > > 
> > > > What is anyway the appeal of private_offset field, instead of having 
> > > > just
> > > > 1:1 association between regions and files, i.e. one memfd per region?
> > 
> > Modifying memslots is slow, both in KVM and in QEMU (not sure about 
> > Google's VMM).
> > E.g. if a vCPU converts a single page, it will be forced to wait until all 
> > other
> > vCPUs drop SRCU, which can have severe latency spikes, e.g. if KVM is 
> > faulting in
> > memory.  KVM's memslot updates also hold a mutex for the entire duration of 
> > the
> > update, i.e. conversions on different vCPUs would be fully serialized, 
> > exacerbating
> > the SRCU problem.
> > 
> > KVM also has historical baggage where it "needs" to zap _all_ SPTEs when any
> > memslot is deleted.
> > 
> > Taking both a private_fd and a shared userspace address allows userspace to 
> > convert
> > between private and shared without having to manipulate memslots.
> 
> Right, this was really good explanation, thank you.
> 
> Still wondering could this possibly work (or not):
> 
> 1. Union userspace_addr and private_fd.

No, because userspace needs to be able to provide both userspace_addr (shared
memory) and private_fd (private memory) for a single memslot.

> 2. Instead of introducing private_offset, use guest_phys_addr as the
>offset.

No, because that would force userspace to use a single private_fd for all of
guest memory since it effectively means private_offset=0.  And userspace
couldn't skip over holes in guest memory, i.e. the size of the memfd would
need to follow the max guest gpa.  In other words, dropping private_offset
could work, but it'd be quite kludgy and not worth saving 8 bytes.
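
To illustrate the two answers above, here is a small sketch of a memslot that
carries both a shared userspace address and a private fd with its own offset.
It is only a model of the idea: DemoMemslot and the demo_* helpers are
hypothetical and do not reproduce the KVM UAPI structure. The field names
follow the ones used in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the extended memslot; not the KVM UAPI struct. */
typedef struct {
    uint64_t guest_phys_addr; /* first gpa covered by the slot */
    uint64_t memory_size;     /* bytes covered by the slot */
    uint64_t userspace_addr;  /* shared memory: hva of the first byte */
    int      private_fd;      /* private memory: backing file */
    uint64_t private_offset;  /* private memory: offset of the first byte */
} DemoMemslot;

static bool demo_slot_contains(const DemoMemslot *s, uint64_t gpa)
{
    return gpa >= s->guest_phys_addr &&
           gpa - s->guest_phys_addr < s->memory_size;
}

/* Offset into private_fd that backs gpa when the page is private. */
static uint64_t demo_private_file_offset(const DemoMemslot *s, uint64_t gpa)
{
    return s->private_offset + (gpa - s->guest_phys_addr);
}

/* Host virtual address that backs gpa when the page is shared. */
static uint64_t demo_shared_hva(const DemoMemslot *s, uint64_t gpa)
{
    return s->userspace_addr + (gpa - s->guest_phys_addr);
}
```

With private_offset, a slot starting at gpa 4 GiB can sit right after a 2 GiB
slot inside the same file, so the file does not need to span the hole between
them. Using the gpa itself as the offset would require the file to extend to
the highest guest physical address.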

[PATCH v2 0/3] fix for two ACPI GTDT physical addresses

The ACPI GTDT table contains two invalid 64-bit physical addresses according to
the ACPI spec. 6.5 [1]. Those are the Counter Control Base physical address and
the Counter Read Base physical address. Those fields of the GTDT table should be
set to 0x if not provided, rather than 0x0.

[1]: 
https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gtdt-table-structure

Changelog:

v2:
Updated with collected tags from v1.

v1: https://lists.nongnu.org/archive/html/qemu-devel/2022-09/msg02847.html

Miguel Luis (3):
  tests/acpi: virt: allow acpi GTDT changes
  acpi: arm/virt: build_gtdt: fix invalid 64-bit physical addresses
  tests/acpi: virt: update ACPI GTDT binaries

 hw/arm/virt-acpi-build.c  |   5 ++---
 tests/data/acpi/virt/GTDT | Bin 96 -> 96 bytes
 tests/data/acpi/virt/GTDT.memhp   | Bin 96 -> 96 bytes
 tests/data/acpi/virt/GTDT.numamem | Bin 96 -> 96 bytes
 4 files changed, 2 insertions(+), 3 deletions(-)

-- 
2.37.3

Re: [PATCH] error handling: Use TFR() macro where applicable

2022-10-07 Thread Christian Schoenebeck

On Freitag, 7. Oktober 2022 13:44:28 CEST Nikita Ivanov wrote:
> Hi!

Hi Nikita!

> Sorry for such a long absence, I've been resolving some other issues in my
> life for a while. I've adjusted the patch according to your latest
> comments. Could you check it out, please?

Sorry for the drill, but I fear this has to follow the common form for patch
submissions:

  * one email as cover letter

  * two (additional) separate emails for the two patches, each referencing the 
cover letter email

  * bumping the version in subject line

https://www.qemu.org/docs/master/devel/submitting-a-patch.html

One more comment on patch 2 below ...

> From 5389c5ccc8789f8f666ab99e50d38af728bd2c9c Mon Sep 17 00:00:00 2001
> From: Nikita Ivanov 
> Date: Wed, 3 Aug 2022 12:54:00 +0300
> Subject: [PATCH 1/2] error handling: Use TFR() macro where applicable
> 
> There is a defined TFR() macro in qemu/osdep.h which
> handles the same while loop.
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/415
> 
> Signed-off-by: Nikita Ivanov 
> ---
>  block/file-posix.c| 39 ++-
>  chardev/char-pty.c|  4 +---
>  hw/9pfs/9p-local.c|  8 ++--
>  net/l2tpv3.c  | 15 +++
>  net/socket.c  | 16 +++-
>  net/tap.c |  8 ++--
>  qga/commands-posix.c  |  4 +---
>  semihosting/syscalls.c|  4 +---
>  tests/qtest/libqtest.c| 14 +++---
>  tests/vhost-user-bridge.c |  4 +---
>  util/main-loop.c  |  4 +---
>  util/osdep.c  |  4 +---
>  util/vfio-helpers.c   | 12 ++--
>  13 files changed, 51 insertions(+), 85 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 66fdb07820..7892bdea31 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1238,9 +1238,9 @@ static int hdev_get_max_segments(int fd, struct stat
> *st)
>  ret = -errno;
>  goto out;
>  }
> -do {
> -ret = read(sysfd, buf, sizeof(buf) - 1);
> -} while (ret == -1 && errno == EINTR);
> +TFR(
> +ret = read(sysfd, buf, sizeof(buf) - 1)
> +);
>  if (ret < 0) {
>  ret = -errno;
>  goto out;
> @@ -1388,9 +1388,9 @@ static int handle_aiocb_ioctl(void *opaque)
>  RawPosixAIOData *aiocb = opaque;
>  int ret;
> 
> -do {
> -ret = ioctl(aiocb->aio_fildes, aiocb->ioctl.cmd, aiocb->ioctl.buf);
> -} while (ret == -1 && errno == EINTR);
> +TFR(
> +ret = ioctl(aiocb->aio_fildes, aiocb->ioctl.cmd, aiocb->ioctl.buf)
> +);
>  if (ret == -1) {
>  return -errno;
>  }
> @@ -1472,18 +1472,17 @@ static ssize_t
> handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
>  {
>  ssize_t len;
> 
> -do {
> -if (aiocb->aio_type & QEMU_AIO_WRITE)
> -len = qemu_pwritev(aiocb->aio_fildes,
> -   aiocb->io.iov,
> -   aiocb->io.niov,
> -   aiocb->aio_offset);
> - else
> -len = qemu_preadv(aiocb->aio_fildes,
> -  aiocb->io.iov,
> -  aiocb->io.niov,
> -  aiocb->aio_offset);
> -} while (len == -1 && errno == EINTR);
> +TFR(
> +len = (aiocb->aio_type & QEMU_AIO_WRITE) ?
> +qemu_pwritev(aiocb->aio_fildes,
> +   aiocb->io.iov,
> +   aiocb->io.niov,
> +   aiocb->aio_offset) :
> +qemu_preadv(aiocb->aio_fildes,
> +  aiocb->io.iov,
> +  aiocb->io.niov,
> +  aiocb->aio_offset)
> +);
> 
>  if (len == -1) {
>  return -errno;
> @@ -1908,9 +1907,7 @@ static int allocate_first_block(int fd, size_t
> max_size)
>  buf = qemu_memalign(max_align, write_size);
>  memset(buf, 0, write_size);
> 
> -do {
> -n = pwrite(fd, buf, write_size, 0);
> -} while (n == -1 && errno == EINTR);
> +TFR(n = pwrite(fd, buf, write_size, 0));
> 
>  ret = (n == -1) ? -errno : 0;
> 
> diff --git a/chardev/char-pty.c b/chardev/char-pty.c
> index 53f25c6bbd..b2f490bacf 100644
> --- a/chardev/char-pty.c
> +++ b/chardev/char-pty.c
> @@ -93,9 +93,7 @@ static void pty_chr_update_read_handler(Chardev *chr)
>  pfd.fd = fioc->fd;
>  pfd.events = G_IO_OUT;
>  pfd.revents = 0;
> -do {
> -rc = g_poll(, 1, 0);
> -} while (rc == -1 && errno == EINTR);
> +TFR(rc = g_poll(, 1, 0));
>  assert(rc >= 0);
> 
>  if (pfd.revents & G_IO_HUP) {
> diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
> index d42ce6d8b8..c90ab947ba 100644
> --- a/hw/9pfs/9p-local.c
> +++ b/hw/9pfs/9p-local.c
> @@ -470,9 +470,7 @@ static ssize_t local_readlink(FsContext *fs_ctx,
> V9fsPath *fs_path,
>  if (fd == -1) {
>  return -1;
>  }
> -do {
> -tsize =

Re: [PATCH v3 41/42] target/arm: Implement FEAT_HAFDBS

On Sat, 1 Oct 2022 at 18:04, Richard Henderson
 wrote:
>
> Perform the atomic update for hardware management of the access flag
> and the dirty bit.
>
> A limitation of the implementation so far is that the page table
> itself must already be writable, i.e. the dirty bit for the stage2
> page table must already be set, i.e. we cannot set both dirty bits
> at the same time.
>
> This is allowed because it is CONSTRAINED UNPREDICTABLE whether any
> atomic update happens at all.  The implementation is allowed to simply
> fall back on software update at any time.

I can't see where in the Arm ARM this is stated.

In any case, if HA is set you can't simply return ARMFault_AccessFlag
without breaking the bit in D5.4.12 which says
"When hardware updates of the Access flag are enabled for a stage of
 translation an address translation instruction that uses that stage
 of translation will not report that the address will give rise to
 an Access flag fault in the PAR"

> Signed-off-by: Richard Henderson 
> ---
>  docs/system/arm/emulation.rst |   1 +
>  target/arm/cpu64.c|   1 +
>  target/arm/ptw.c  | 119 --
>  3 files changed, 115 insertions(+), 6 deletions(-)
>
> diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
> index be7bbffe59..c3582d075e 100644
> --- a/docs/system/arm/emulation.rst
> +++ b/docs/system/arm/emulation.rst
> @@ -31,6 +31,7 @@ the following architecture extensions:
>  - FEAT_FRINTTS (Floating-point to integer instructions)
>  - FEAT_FlagM (Flag manipulation instructions v2)
>  - FEAT_FlagM2 (Enhancements to flag manipulation instructions)
> +- FEAT_HAFDBS (Hardware management of the access flag and dirty bit state)
>  - FEAT_HCX (Support for the HCRX_EL2 register)
>  - FEAT_HPDS (Hierarchical permission disables)
>  - FEAT_I8MM (AArch64 Int8 matrix multiplication instructions)
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index e6314e86d2..b064dc7964 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -1116,6 +1116,7 @@ static void aarch64_max_initfn(Object *obj)
>  cpu->isar.id_aa64mmfr0 = t;
>
>  t = cpu->isar.id_aa64mmfr1;
> +t = FIELD_DP64(t, ID_AA64MMFR1, HAFDBS, 2);   /* FEAT_HAFDBS */

I think we should split the access flag update and the
dirty-bit update into separate commits. It might be useful
for bisection purposes later, and it looks like they're pretty
cleanly separable. (Though if you look at my last comment in this
email, maybe not quite so clean as in the code as you have it here.)

>  t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* FEAT_VMID16 */
>  t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);   /* FEAT_VHE */
>  t = FIELD_DP64(t, ID_AA64MMFR1, HPDS, 1); /* FEAT_HPDS */
> diff --git a/target/arm/ptw.c b/target/arm/ptw.c
> index 45734b0d28..14ab56d1b5 100644
> --- a/target/arm/ptw.c
> +++ b/target/arm/ptw.c
> @@ -223,6 +223,7 @@ static bool S2_attrs_are_device(uint64_t hcr, uint8_t 
> attrs)
>  typedef struct {
>  bool is_secure;
>  bool be;
> +bool rw;
>  void *hphys;
>  hwaddr gphys;
>  } S1TranslateResult;
> @@ -261,7 +262,8 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx 
> mmu_idx,
>  pte_attrs = s2.cacheattrs.attrs;
>  pte_secure = s2.f.attrs.secure;
>  }
> -res->hphys = NULL;
> +res->hphys = NULL;  /* force slow path */
> +res->rw = false;/* debug never modifies */
>  } else {
>  CPUTLBEntryFull *full;
>  int flags;
> @@ -276,6 +278,7 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx 
> mmu_idx,
>  goto fail;
>  }
>  res->gphys = full->phys_addr;
> +res->rw = full->prot & PAGE_WRITE;
>  pte_attrs = full->pte_attrs;
>  pte_secure = full->attrs.secure;
>  }
> @@ -381,6 +384,56 @@ static uint64_t arm_ldq_ptw(CPUARMState *env, const 
> S1TranslateResult *s1,
>  return data;
>  }
>
> +static uint64_t arm_casq_ptw(CPUARMState *env, uint64_t old_val,
> + uint64_t new_val, const S1TranslateResult *s1,
> + ARMMMUFaultInfo *fi)
> +{
> +uint64_t cur_val;
> +
> +if (unlikely(!s1->hphys)) {
> +fi->type = ARMFault_UnsuppAtomicUpdate;
> +fi->s1ptw = true;
> +return 0;
> +}
> +
> +#ifndef CONFIG_ATOMIC64
> +/*
> + * We can't support the atomic operation on the host.  We should be
> + * running in round-robin mode though, which means that we would only
> + * race with dma i/o.
> + */
> +qemu_mutex_lock_iothread();

Are there definitely no code paths where we might try to do
a page table walk with the iothread already locked ?
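
For reference, the compare-and-swap semantics that the quoted fallback path relies on can be sketched in plain C as below. This is only an illustration, not the QEMU code: `load_be64`, `store_be64` and `cas64_be` are hypothetical names, only the big-endian case is shown, and locking is deliberately left out (the caller is assumed to hold whatever lock serialises access, which is exactly where the question above about an already-held lock would matter).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 64-bit big-endian value from a byte buffer. */
uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (size_t i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Hypothetical helper: write a 64-bit value in big-endian byte order. */
void store_be64(uint8_t *p, uint64_t v)
{
    for (size_t i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/*
 * Non-atomic compare-and-swap: store new_val only if the memory still
 * holds old_val.  The value found in memory is always returned, so the
 * caller can tell whether the swap happened (result == old_val).
 * The caller must serialise access; no locking is done here.
 */
uint64_t cas64_be(uint8_t *p, uint64_t old_val, uint64_t new_val)
{
    uint64_t cur_val;

    assert(p != NULL);
    cur_val = load_be64(p);
    if (cur_val == old_val) {
        store_be64(p, new_val);
    }
    return cur_val;
}
```

A caller compares the return value with `old_val`: equal means the descriptor was updated, anything else means another writer got there first and the walk has to be retried with the returned value.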

> +if (s1->be) {
> +cur_val = ldq_be_p(s1->hphys);
> +if (cur_val == old_val) {
> +stq_be_p(s1->hphys, new_val);
> +}
> +} else {
> +cur_val = ldq_le_p(s1->hphys);
> +if (cur_val ==

[PATCH v2 2/3] acpi: arm/virt: build_gtdt: fix invalid 64-bit physical addresses

Per the ACPI 6.5 specification, in the GTDT Table Structure, the Counter Control
Block Address and Counter Read Block Address fields of the GTDT should be
set to 0xFFFFFFFFFFFFFFFF if not provided, rather than 0x0.
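
As a rough illustration of what the change amounts to at the byte level, the sketch below appends integers to a table buffer in little-endian order and uses the 64-bit all-ones value (the eight FF bytes visible in the raw table dump of the test update) for an address that is not provided. `ByteBuf`, `append_int_le` and `GTDT_ADDR_NOT_PROVIDED` are hypothetical names for this example only; QEMU's real helper operates on a GArray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed name: marker for "block address not provided" (all ones). */
#define GTDT_ADDR_NOT_PROVIDED UINT64_C(0xFFFFFFFFFFFFFFFF)

/* Hypothetical fixed-size stand-in for a growable table buffer. */
typedef struct {
    uint8_t data[64];
    size_t len;
} ByteBuf;

/* Append the low 'size' bytes of 'value', least significant byte first. */
void append_int_le(ByteBuf *buf, uint64_t value, size_t size)
{
    assert(size <= 8 && buf->len + size <= sizeof(buf->data));
    for (size_t i = 0; i < size; i++) {
        buf->data[buf->len++] = (uint8_t)(value & 0xff);
        value >>= 8;
    }
}
```

Appending `GTDT_ADDR_NOT_PROVIDED` with size 8 followed by a 4-byte zero yields eight FF bytes and four 00 bytes, which is the pattern the fixed table carries at offset 0x24.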

Fixes: 41041e57085 ("acpi: arm/virt: build_gtdt: use 
acpi_table_begin()/acpi_table_end() instead of build_header()")

Signed-off-by: Miguel Luis 
Reviewed-by: Ani Sinha 
---
 hw/arm/virt-acpi-build.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 9b3aee01bf..13c6e3e468 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -592,8 +592,7 @@ build_gtdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 acpi_table_begin(&table, table_data);
 
 /* CntControlBase Physical Address */
-/* FIXME: invalid value, should be 0xFFFFFFFFFFFFFFFF if not impl. ? */
-build_append_int_noprefix(table_data, 0, 8);
+build_append_int_noprefix(table_data, 0xFFFFFFFFFFFFFFFF, 8);
 build_append_int_noprefix(table_data, 0, 4); /* Reserved */
 /*
  * FIXME: clarify comment:
@@ -618,7 +617,7 @@ build_gtdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 /* Non-Secure EL2 timer Flags */
 build_append_int_noprefix(table_data, irqflags, 4);
 /* CntReadBase Physical address */
-build_append_int_noprefix(table_data, 0, 8);
+build_append_int_noprefix(table_data, 0xFFFFFFFFFFFFFFFF, 8);
 /* Platform Timer Count */
 build_append_int_noprefix(table_data, 0, 4);
 /* Platform Timer Offset */
-- 
2.37.3

Re: [External] : Re: [PATCH 3/3] tests/acpi: virt: update ACPI GTDT binaries


> On 21 Sep 2022, at 03:39, Ani Sinha  wrote:
> 
> 
> 
> On Tue, 20 Sep 2022, Miguel Luis wrote:
> 
>> Step 6 & 7 of the bios-tables-test.c documented procedure.
>> 
>> Differences between disassembled ASL files for GTDT:
>> 
>>@@ -13,14 +13,14 @@
>> [000h    4]Signature : "GTDT"[Generic Timer 
>> Description Table]
>> [004h 0004   4] Table Length : 0060
>> [008h 0008   1] Revision : 02
>>-[009h 0009   1] Checksum : 8C
>>+[009h 0009   1] Checksum : 9C
>> [00Ah 0010   6]   Oem ID : "BOCHS "
>> [010h 0016   8] Oem Table ID : "BXPC"
>> [018h 0024   4] Oem Revision : 0001
>> [01Ch 0028   4]  Asl Compiler ID : "BXPC"
>> [020h 0032   4]Asl Compiler Revision : 0001
>> 
>>-[024h 0036   8]Counter Block Address : 0000000000000000
>>+[024h 0036   8]Counter Block Address : FFFFFFFFFFFFFFFF
>> [02Ch 0044   4] Reserved : 
>> 
>> [030h 0048   4] Secure EL1 Interrupt : 001D
>>@@ -46,16 +46,16 @@
>> Trigger Mode : 0
>> Polarity : 0
>>Always On : 0
>>-[050h 0080   8]   Counter Read Block Address : 0000000000000000
>>+[050h 0080   8]   Counter Read Block Address : FFFFFFFFFFFFFFFF
>> 
>> [058h 0088   4] Platform Timer Count : 
>> [05Ch 0092   4]Platform Timer Offset : 
>> 
>> Raw Table Data: Length 96 (0x60)
>> 
>>-: 47 54 44 54 60 00 00 00 02 8C 42 4F 43 48 53 20  // 
>> GTDT`.BOCHS
>>+: 47 54 44 54 60 00 00 00 02 9C 42 4F 43 48 53 20  // 
>> GTDT`.BOCHS
>> 0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC
>> BXPC
>>-0020: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
>> 
>>+0020: 01 00 00 00 FF FF FF FF FF FF FF FF 00 00 00 00  // 
>> 
>> 0030: 1D 00 00 00 00 00 00 00 1E 00 00 00 04 00 00 00  // 
>> 
>> 0040: 1B 00 00 00 00 00 00 00 1A 00 00 00 00 00 00 00  // 
>> 
>>-0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
>> 
>>+0050: FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00  // 
>> 
>> 
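
The checksum change from 8C to 9C in the dump above follows from the ACPI rule that all bytes of a table sum to zero modulo 256. Sixteen bytes go from 00 to FF, which adds 16 * 0xFF = 0xFF0 to the sum; its low byte is 0xF0, so the checksum has to grow by 0x10 to compensate, and 0x8C + 0x10 = 0x9C. The sketch below reproduces this with the old table bytes transcribed from the dump; `acpi_checksum` and `gtdt_mark_absent` are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GTDT_LEN      96
#define GTDT_CSUM_OFF 9   /* checksum byte of the ACPI table header */

/* Old GTDT image, transcribed from the "-" lines of the raw dump. */
const uint8_t gtdt_old[GTDT_LEN] = {
    /* 0000 */ 0x47, 0x54, 0x44, 0x54, 0x60, 0x00, 0x00, 0x00,
               0x02, 0x8C, 0x42, 0x4F, 0x43, 0x48, 0x53, 0x20,
    /* 0010 */ 0x42, 0x58, 0x50, 0x43, 0x20, 0x20, 0x20, 0x20,
               0x01, 0x00, 0x00, 0x00, 0x42, 0x58, 0x50, 0x43,
    /* 0020 */ 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
               0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    /* 0030 */ 0x1D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
               0x1E, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
    /* 0040 */ 0x1B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
               0x1A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    /* 0050: all zero, left to implicit initialisation */
};

/* Return the checksum byte that makes the whole table sum to 0 mod 256. */
uint8_t acpi_checksum(const uint8_t *table, size_t len, size_t csum_off)
{
    uint8_t sum = 0;

    assert(csum_off < len);
    for (size_t i = 0; i < len; i++) {
        if (i != csum_off) {
            sum = (uint8_t)(sum + table[i]);
        }
    }
    return (uint8_t)(0x100 - sum);
}

/* Set both 8-byte block address fields (0x24 and 0x50) to all ones. */
void gtdt_mark_absent(uint8_t *table)
{
    memset(table + 0x24, 0xFF, 8);
    memset(table + 0x50, 0xFF, 8);
}
```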
>> Signed-off-by: Miguel Luis 
> 
> Acked-by: Ani Sinha 
> 

Thank you for the tags Ani. I’ve collected them and will spin v2 soon.

Thanks,
Miguel

>> ---
>> tests/data/acpi/virt/GTDT   | Bin 96 -> 96 bytes
>> tests/data/acpi/virt/GTDT.memhp | Bin 96 -> 96 bytes
>> tests/data/acpi/virt/GTDT.numamem   | Bin 96 -> 96 bytes
>> tests/qtest/bios-tables-test-allowed-diff.h |   3 ---
>> 4 files changed, 3 deletions(-)
>> 
>> diff --git a/tests/data/acpi/virt/GTDT b/tests/data/acpi/virt/GTDT
>> index 
>> 9408b71b59c0e0f2991c0053562280155b47bc0b..6f8cb9b8f30b55f4c93fe515982621e3db50feb2
>>  100644
>> GIT binary patch
>> delta 45
>> kcmYdD;BpUf2}xjJU|^avkxPo>KNL*VQ4xT#fs$YV0LH=;ng9R*
>> 
>> delta 45
>> jcmYdD;BpUf2}xjJU|{N*$R))AWPrg$9Tfo>8%6^Foy!E8
>> 
>> diff --git a/tests/data/acpi/virt/GTDT.memhp 
>> b/tests/data/acpi/virt/GTDT.memhp
>> index 
>> 9408b71b59c0e0f2991c0053562280155b47bc0b..6f8cb9b8f30b55f4c93fe515982621e3db50feb2
>>  100644
>> GIT binary patch
>> delta 45
>> kcmYdD;BpUf2}xjJU|^avkxPo>KNL*VQ4xT#fs$YV0LH=;ng9R*
>> 
>> delta 45
>> jcmYdD;BpUf2}xjJU|{N*$R))AWPrg$9Tfo>8%6^Foy!E8
>> 
>> diff --git a/tests/data/acpi/virt/GTDT.numamem 
>> b/tests/data/acpi/virt/GTDT.numamem
>> index 
>> 9408b71b59c0e0f2991c0053562280155b47bc0b..6f8cb9b8f30b55f4c93fe515982621e3db50feb2
>>  100644
>> GIT binary patch
>> delta 45
>> kcmYdD;BpUf2}xjJU|^avkxPo>KNL*VQ4xT#fs$YV0LH=;ng9R*
>> 
>> delta 45
>> jcmYdD;BpUf2}xjJU|{N*$R))AWPrg$9Tfo>8%6^Foy!E8
>> 
>> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
>> b/tests/qtest/bios-tables-test-allowed-diff.h
>> index 957bd1b4f6..dfb8523c8b 100644
>> --- a/tests/qtest/bios-tables-test-allowed-diff.h
>> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
>> @@ -1,4 +1 @@
>> /* List of comma-separated changed AML files to ignore */
>> -"tests/data/acpi/virt/GTDT",
>> -"tests/data/acpi/virt/GTDT.memhp",
>> -"tests/data/acpi/virt/GTDT.numamem",
>> --
>> 2.36.0
>> 
>>

Re: [RFC PATCH 0/4] Idea for using hardfloat in PPC

2022-10-07 Thread Alex Bennée

Richard Henderson  writes:

> On 10/5/22 07:37, Víctor Colombo wrote:
>> However, the performance impact was not what was expected. On x86_64 I
>> had a small 3% improvement, while on a Power9 machine there was a small
>> performance loss, as can be seen below (100 executions).
>> |        | min [s] | max [s] | avg [s] |
>> | before | 122.309 | 123.459 | 122.747 |
>> | after  | 123.906 | 125.016 | 124.373 |
>
> I hope this is because you didn't handle the most common cases: add, sub, 
> mul, div.
>
> The logic seems plausible, as far as it goes, and would work for the
> FR bit as well which afair isn't handled at all at the moment.  I'll
> review properly in a little while.

I wonder if this is something that could be generalised and pushed up
into the fpu stuff itself. We could after all cache the op and
decomposed parameters here in a generic way. The trick would be working
out how to do that without slowing down the current common case.

Is ppc unique in not persisting the inexact flag from previous
operations?

>
>
> r~

-- 
Alex Bennée

Re: [PATCH v3 4/8] m25p80: Add the mx25l25635f SFPD table

On [2022 Jul 22] Fri 08:35:58, Cédric Le Goater wrote:
> The mx25l25635e and mx25l25635f chips have the same JEDEC id but the
> mx25l25635f has more capabilities reported in the SFDP table. Support
> for 4B opcodes is of interest because it is exploited by the Linux
> kernel.
> 
> The SFDP table size is 0x200 bytes long. The mandatory table for basic
> features is available at byte 0x30 and an extra Macronix specific
> table is available at 0x60.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/block/m25p80_sfdp.h |  1 +
>  hw/block/m25p80.c  |  2 ++
>  hw/block/m25p80_sfdp.c | 68 ++
>  3 files changed, 71 insertions(+)
> 
> diff --git a/hw/block/m25p80_sfdp.h b/hw/block/m25p80_sfdp.h
> index 0c46e669b335..87690a173c78 100644
> --- a/hw/block/m25p80_sfdp.h
> +++ b/hw/block/m25p80_sfdp.h
> @@ -18,6 +18,7 @@
>  extern uint8_t m25p80_sfdp_n25q256a(uint32_t addr);
>  
>  extern uint8_t m25p80_sfdp_mx25l25635e(uint32_t addr);
> +extern uint8_t m25p80_sfdp_mx25l25635f(uint32_t addr);
(optional -extern above)

>  
>  
>  #endif
> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> index 028b026d8ba2..6b120ce65212 100644
> --- a/hw/block/m25p80.c
> +++ b/hw/block/m25p80.c
> @@ -234,6 +234,8 @@ static const FlashPartInfo known_devices[] = {
>  { INFO("mx25l12855e", 0xc22618,  0,  64 << 10, 256, 0) },
>  { INFO6("mx25l25635e", 0xc22019, 0xc22019,  64 << 10, 512, 0),
>.sfdp_read = m25p80_sfdp_mx25l25635e },
> +{ INFO6("mx25l25635f", 0xc22019, 0xc22019,  64 << 10, 512, 0),

I don't see the extended id in the datasheet I found, so you might be
able to switch to plain INFO with _ext_id 0 above (the same may apply to the
similar flash in the previous patch). Otherwise looks good to
me:

Reviewed-by: Francisco Iglesias 


> +  .sfdp_read = m25p80_sfdp_mx25l25635f },
>  { INFO("mx25l25655e", 0xc22619,  0,  64 << 10, 512, 0) },
>  { INFO("mx66l51235f", 0xc2201a,  0,  64 << 10, 1024, ER_4K | ER_32K) 
> },
>  { INFO("mx66u51235f", 0xc2253a,  0,  64 << 10, 1024, ER_4K | ER_32K) 
> },
> diff --git a/hw/block/m25p80_sfdp.c b/hw/block/m25p80_sfdp.c
> index 6499c4c39954..70c13aea7c63 100644
> --- a/hw/block/m25p80_sfdp.c
> +++ b/hw/block/m25p80_sfdp.c
> @@ -82,3 +82,71 @@ static const uint8_t sfdp_mx25l25635e[] = {
>  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>  };
>  define_sfdp_read(mx25l25635e)
> +
> +static const uint8_t sfdp_mx25l25635f[] = {
> +0x53, 0x46, 0x44, 0x50, 0x00, 0x01, 0x01, 0xff,
> +0x00, 0x00, 0x01, 0x09, 0x30, 0x00, 0x00, 0xff,
> +0xc2, 0x00, 0x01, 0x04, 0x60, 0x00, 0x00, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xe5, 0x20, 0xf3, 0xff, 0xff, 0xff, 0xff, 0x0f,
> +0x44, 0xeb, 0x08, 0x6b, 0x08, 0x3b, 0x04, 0xbb,
> +0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0xff,
> +0xff, 0xff, 0x44, 0xeb, 0x0c, 0x20, 0x0f, 0x52,
> +0x10, 0xd8, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0x00, 0x36, 0x00, 0x27, 0x9d, 0xf9, 0xc0, 0x64,
> +0x85, 0xcb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xc2, 0xf5, 0x08, 0x0a,
> +0x08, 0x04, 0x03, 0x06, 0x00, 0x00, 0x07, 0x29,
> +0x17, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff,

Re: [PATCH v4] x86: add etc/phys-bits fw_cfg file

On Fri, Sep 23, 2022 at 08:23:12AM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > > Given newer processors have more than 40 and for older ones we know
> > > the possible values for the two relevant x86 vendors we could do
> > > something along the lines of:
> > >
> > >phys-bits >= 41   -> valid
> > >phys-bits == 40+ AuthenticAMD -> valid
> > >phys-bits == 36,39 + GenuineIntel -> valid
> > >everything else   -> invalid
> > >
> > > Does that look sensible to you?
> > >
> > 
> > Yes, it does! Is phys-bits == 36 the same as invalid?
> 
> 'invalid' would continue to use the current guesswork codepath for
> phys-bits.  Which will end up with phys-bits = 36 for smaller VMs, but
> it can go beyond that in VMs with a lot (32G or more) of memory.  That
> logic assumes that physical machines with enough RAM for 32G+ guests
> have a physical address space > 64G.
> 
> 'phys-bits = 36' would be a hard limit.
> 
> So, it's not exactly the same but small VMs wouldn't see a difference.
> 
> take care,
>   Gerd
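
The validity rule proposed in the quoted exchange can be written down as a small predicate. This is a sketch of the rule as listed above, not firmware code; the function name `phys_bits_plausible` is made up here, and the vendor strings are the CPUID vendor names used in the quoted list.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Decide whether a reported phys-bits value can be trusted:
 *   >= 41                         -> valid
 *   == 40 and AuthenticAMD        -> valid
 *   == 36 or 39 and GenuineIntel  -> valid
 *   everything else               -> invalid (fall back to guesswork)
 */
bool phys_bits_plausible(unsigned phys_bits, const char *vendor)
{
    assert(vendor != NULL);

    if (phys_bits >= 41) {
        return true;
    }
    if (phys_bits == 40) {
        return strcmp(vendor, "AuthenticAMD") == 0;
    }
    if (phys_bits == 36 || phys_bits == 39) {
        return strcmp(vendor, "GenuineIntel") == 0;
    }
    return false;
}
```

With this rule, a value of 36 from an Intel CPU counts as a hard limit, while 36 from any other vendor is treated as invalid and leaves the existing guesswork path in charge.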

I dropped the patch for now.

-- 
MST

Re: [PATCH v3 2/2] hw/ide/piix: Ignore writes of hardwired PCI command register bits

On Sun, Sep 25, 2022 at 09:37:59AM +, Lev Kujawski wrote:
> One method to enable PCI bus mastering for IDE controllers, often used
> by x86 firmware, is to write 0x7 to the PCI command register.  Neither
> the PIIX3 specification nor actual hardware (a Tyan S1686D system)
> permit modification of the Memory Space Enable (MSE) bit, 1, and thus
> the command register would be left in an unspecified state without
> this patch.
> 
> * hw/ide/pci.c
>   Call post_load if provided by derived IDE controller.
> * hw/ide/piix.c
>   a) Add references to the PIIX data sheets.
>   b) Mask the MSE bit using the QEMU PCI device wmask field.
>   c) Add a post_load function to mask bits from saved machine states.
>   d) Specify post_load for both the PIIX3/4 IDE controllers.
> * include/hw/ide/pci.h
>   Switch from SIMPLE_TYPE to TYPE, explicitly create a PCIIDEClass
>   that includes the post_load function pointer.
> * tests/qtest/ide-test.c
>   Use the command_disabled field of the QPCIDevice testing model to
>   indicate that PCI_COMMAND_MEMORY is hardwired in the PIIX3/4 IDE
>   controller.
> 
> Signed-off-by: Lev Kujawski 
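
The effect of the write mask described in the commit message can be shown with a few lines of C. This is a simplified model rather than QEMU's PCI core: `masked_config_write`, `sanitize_loaded_command` and the `CMD_*` names are chosen for this example, with the bit positions taken from the commit message (I/O Space Enable is bit 0, Memory Space Enable is bit 1, Bus Master Enable is bit 2).

```c
#include <assert.h>
#include <stdint.h>

#define CMD_IO      0x0001  /* I/O Space Enable, bit 0 */
#define CMD_MEMORY  0x0002  /* Memory Space Enable, bit 1 */
#define CMD_MASTER  0x0004  /* Bus Master Enable, bit 2 */

/*
 * Model of a guest write to a config register with a write mask:
 * bits set in wmask take the written value, all others keep their
 * current (hardwired) value.
 */
uint16_t masked_config_write(uint16_t cur, uint16_t val, uint16_t wmask)
{
    uint16_t result = (uint16_t)((cur & ~wmask) | (val & wmask));

    /* Bits outside the mask must be unchanged. */
    assert(((result ^ cur) & ~wmask) == 0);
    return result;
}

/*
 * Model of the post-load fixup: a state saved by an older version may
 * have read-only bits set, so clear everything outside the mask.
 */
uint16_t sanitize_loaded_command(uint16_t saved, uint16_t wmask)
{
    return (uint16_t)(saved & wmask);
}
```

With a mask of `CMD_MASTER | CMD_IO`, firmware writing 0x7 to a cleared command register ends up with 0x5, and a migrated value of 0x7 is reduced to 0x5 as well.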


I guess this can work but what I had in mind is much
simpler. Add an internal property (name starting with "x-")
enabling the buggy behaviour and set it in hw compat array.
If set - do not touch the wmask register.

post load hooks are harder to reason about.

Sorry about not being clear originally.

> ---
> (v2) Use QEMU's built-in PCI bit-masking support rather than attempting
>  to manually filter writes.  Thanks to Philippe Mathieu-Daude and
>  Michael S. Tsirkin for review and the pointer.
> (v3) Handle migration of older machine states, which may have set bits
>  masked by this patch, via a new post_load method of PCIIDEClass.
>  Thanks to Michael S. Tsirkin for catching this via review.
> 
>  hw/ide/pci.c   |  5 +
>  hw/ide/piix.c  | 39 +++
>  include/hw/ide/pci.h   |  7 ++-
>  tests/qtest/ide-test.c |  1 +
>  4 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 84ba733548..e42c7b9415 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -447,6 +447,7 @@ static const VMStateDescription vmstate_bmdma = {
>  
>  static int ide_pci_post_load(void *opaque, int version_id)
>  {
> +PCIIDEClass *dc = PCI_IDE_GET_CLASS(opaque);
>  PCIIDEState *d = opaque;
>  int i;
>  
> @@ -457,6 +458,10 @@ static int ide_pci_post_load(void *opaque, int 
> version_id)
>  ide_bmdma_post_load(&d->bmdma[i], -1);
>  }
>  
> +if (dc->post_load) {
> +dc->post_load(d, version_id);
> +}
> +
>  return 0;
>  }
>  
> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> index 9a9b28078e..fd55ecbd36 100644
> --- a/hw/ide/piix.c
> +++ b/hw/ide/piix.c
> @@ -21,6 +21,12 @@
>   * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
>   * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>   * THE SOFTWARE.
> + *
> + * References:
> + *  [1] 82371FB (PIIX) AND 82371SB (PIIX3) PCI ISA IDE XCELERATOR,
> + *  290550-002, Intel Corporation, April 1997.
> + *  [2] 82371AB PCI-TO-ISA / IDE XCELERATOR (PIIX4), 290562-001,
> + *  Intel Corporation, April 1997.
>   */
>  
>  #include "qemu/osdep.h"
> @@ -159,6 +165,19 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
> **errp)
>  uint8_t *pci_conf = dev->config;
>  int rc;
>  
> +/*
> + * Mask all IDE PCI command register bits except for Bus Master
> + * Function Enable (bit 2) and I/O Space Enable (bit 0), as the
> + * remainder are hardwired to 0 [1, p.48] [2, p.89-90].
> + *
> + * NOTE: According to the PIIX3 datasheet [1], the Memory Space
> + * Enable (MSE, bit 1) is hardwired to 1, but this is contradicted
> + * by actual PIIX3 hardware, the datasheet itself (viz., Default
> + * Value: h), and the PIIX4 datasheet [2].
> + */
> +pci_set_word(dev->wmask + PCI_COMMAND,
> + PCI_COMMAND_MASTER | PCI_COMMAND_IO);
> +
>  pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
>  
>  bmdma_setup_bar(d);
> @@ -184,11 +203,28 @@ static void pci_piix_ide_exitfn(PCIDevice *dev)
>  }
>  }
>  
> +static int pci_piix_ide_post_load(PCIIDEState *s, int version_id)
> +{
> +PCIDevice *dev = PCI_DEVICE(s);
> +uint8_t *pci_conf = dev->config;
> +
> +/*
> + * To preserve backward compatibility, handle saved machine states
> + * with reserved bits set (see comment in pci_piix_ide_realize()).
> + */
> +pci_set_word(pci_conf + PCI_COMMAND,
> + pci_get_word(pci_conf + PCI_COMMAND) &
> + (PCI_COMMAND_MASTER | PCI_COMMAND_IO));
> +
> +return 0;
> +}
> +
>  /* NOTE: for the PIIX3, the IRQs and IOports are hardcoded */
>  static void piix3_ide_class_init(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
>

Re: [PATCH v2 01/11] block.c: assert bs->aio_context is written under BQL and drains

Am 25.07.2022 um 14:21 hat Emanuele Giuseppe Esposito geschrieben:
> Also here ->aio_context is read by I/O threads and written
> under BQL.
> 
> Reviewed-by: Hanna Reitz 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf

Re: [PATCH 1/4] hw/acpi/aml-build: Only generate cluster node in PPTT when specified

On Thu, Sep 22, 2022 at 09:11:40PM +0800, Yicong Yang wrote:
> From: Yicong Yang 
> 
> Currently we'll always generate a cluster node whether or not the user has
> specified '-smp clusters=X'. Cluster is an optional level
> and it's unnecessary to build it if the user doesn't need it. So only generate
> it when the user specifies it explicitly.
> 
> Also update the test ACPI tables.
> 
> Signed-off-by: Yicong Yang 

This is an example of a commit log repeating what the patch does.
Which is ok but the important thing is to explain the motivation -
why is it a bug to generate a cluster node without '-smp clusters'?


> ---
>  hw/acpi/aml-build.c   | 2 +-
>  hw/core/machine-smp.c | 3 +++
>  include/hw/boards.h   | 2 ++
>  3 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index e6bfac95c7..aab73af66d 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -2030,7 +2030,7 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, 
> MachineState *ms,
>  0, socket_id, NULL, 0);
>  }
>  
> -if (mc->smp_props.clusters_supported) {
> +if (mc->smp_props.clusters_supported && ms->smp.build_cluster) {
>  if (cpus->cpus[n].props.cluster_id != cluster_id) {
>  assert(cpus->cpus[n].props.cluster_id > cluster_id);
>  cluster_id = cpus->cpus[n].props.cluster_id;
> diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
> index b39ed21e65..5d37e8d07a 100644
> --- a/hw/core/machine-smp.c
> +++ b/hw/core/machine-smp.c
> @@ -158,6 +158,9 @@ void machine_parse_smp_config(MachineState *ms,
>  ms->smp.threads = threads;
>  ms->smp.max_cpus = maxcpus;
>  
> +if (config->has_clusters)
> +ms->smp.build_cluster = true;
> +
>  /* sanity-check of the computed topology */
>  if (sockets * dies * clusters * cores * threads != maxcpus) {
>  g_autofree char *topo_msg = cpu_hierarchy_to_string(ms);
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 7b416c9787..24aafc213d 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -305,6 +305,7 @@ typedef struct DeviceMemoryState {
>   * @cores: the number of cores in one cluster
>   * @threads: the number of threads in one core
>   * @max_cpus: the maximum number of logical processors on the machine
> + * @build_cluster: build cluster topology or not
>   */
>  typedef struct CpuTopology {
>  unsigned int cpus;
> @@ -314,6 +315,7 @@ typedef struct CpuTopology {
>  unsigned int cores;
>  unsigned int threads;
>  unsigned int max_cpus;
> +bool build_cluster;
>  } CpuTopology;
>  
>  /**
> -- 
> 2.24.0

Re: [PATCH v3 8/8] arm/aspeed: Replace mx25l25635e chip model

On [2022 Jul 22] Fri 08:36:02, Cédric Le Goater wrote:
> A mx25l25635f chip model is generally found on these machines. It's
> newer and uses 4B opcodes which is better to exercise the support in
> the Linux kernel.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: Francisco Iglesias 

> ---
>  hw/arm/aspeed.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index 1c611284819d..7e95abc55b09 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -1157,7 +1157,7 @@ static void 
> aspeed_machine_palmetto_class_init(ObjectClass *oc, void *data)
>  amc->soc_name  = "ast2400-a1";
>  amc->hw_strap1 = PALMETTO_BMC_HW_STRAP1;
>  amc->fmc_model = "n25q256a";
> -amc->spi_model = "mx25l25635e";
> +amc->spi_model = "mx25l25635f";
>  amc->num_cs= 1;
>  amc->i2c_init  = palmetto_bmc_i2c_init;
>  mc->default_ram_size   = 256 * MiB;
> @@ -1208,7 +1208,7 @@ static void 
> aspeed_machine_ast2500_evb_class_init(ObjectClass *oc, void *data)
>  amc->soc_name  = "ast2500-a1";
>  amc->hw_strap1 = AST2500_EVB_HW_STRAP1;
>  amc->fmc_model = "mx25l25635e";
> -amc->spi_model = "mx25l25635e";
> +amc->spi_model = "mx25l25635f";
>  amc->num_cs= 1;
>  amc->i2c_init  = ast2500_evb_i2c_init;
>  mc->default_ram_size   = 512 * MiB;
> @@ -1258,7 +1258,7 @@ static void 
> aspeed_machine_witherspoon_class_init(ObjectClass *oc, void *data)
>  mc->desc   = "OpenPOWER Witherspoon BMC (ARM1176)";
>  amc->soc_name  = "ast2500-a1";
>  amc->hw_strap1 = WITHERSPOON_BMC_HW_STRAP1;
> -amc->fmc_model = "mx25l25635e";
> +amc->fmc_model = "mx25l25635f";
>  amc->spi_model = "mx66l1g45g";
>  amc->num_cs= 2;
>  amc->i2c_init  = witherspoon_bmc_i2c_init;
> -- 
> 2.35.3
>

Re: [PATCH v3 3/8] m25p80: Add the mx25l25635e SFPD table

Hi Cedric,

On [2022 Jul 22] Fri 08:35:57, Cédric Le Goater wrote:
> The SFDP table is 0x80 bytes long. The mandatory table for basic
> features is available at byte 0x30 and an extra Macronix specific
> table is available at 0x60.
> 
> 4B opcodes are not supported.
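
The offsets mentioned in the commit message can be read straight out of the table bytes. The sketch below decodes the SFDP header and the parameter headers that follow it, using the layout as I understand it from JEDEC JESD216 (an 8-byte SFDP header, then 8-byte parameter headers carrying a 24-bit little-endian table pointer). The struct and function names are invented for this example and are not part of the m25p80 model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded view of one 8-byte SFDP parameter header. */
typedef struct {
    uint8_t id_lsb;       /* 0x00 = basic table, 0xc2 = Macronix */
    uint8_t minor;
    uint8_t major;
    uint8_t len_dwords;   /* parameter table length in 32-bit words */
    uint32_t offset;      /* byte offset of the parameter table */
} SfdpParamHeader;

/* The table must start with the ASCII signature "SFDP". */
bool sfdp_signature_ok(const uint8_t *t, size_t len)
{
    return len >= 8 && memcmp(t, "SFDP", 4) == 0;
}

/* Byte 6 holds the number of parameter headers minus one. */
unsigned sfdp_num_headers(const uint8_t *t)
{
    return t[6] + 1u;
}

/* Decode parameter header n (0-based); headers start right after byte 7. */
SfdpParamHeader sfdp_param_header(const uint8_t *t, size_t len, unsigned n)
{
    SfdpParamHeader ph;
    size_t base = 8 + 8 * (size_t)n;

    assert(base + 8 <= len);
    ph.id_lsb = t[base + 0];
    ph.minor = t[base + 1];
    ph.major = t[base + 2];
    ph.len_dwords = t[base + 3];
    ph.offset = (uint32_t)t[base + 4] |
                ((uint32_t)t[base + 5] << 8) |
                ((uint32_t)t[base + 6] << 16);
    return ph;
}
```

Fed the first 24 bytes of the table in this patch, this yields two headers: the basic table at 0x30 (9 dwords) and the Macronix table, ID 0xc2, at 0x60 (4 dwords).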
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/block/m25p80_sfdp.h |  3 +++
>  hw/block/m25p80.c  |  3 ++-
>  hw/block/m25p80_sfdp.c | 26 ++
>  3 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/m25p80_sfdp.h b/hw/block/m25p80_sfdp.h
> index d3a0a778ae84..0c46e669b335 100644
> --- a/hw/block/m25p80_sfdp.h
> +++ b/hw/block/m25p80_sfdp.h
> @@ -17,4 +17,7 @@
>  
>  extern uint8_t m25p80_sfdp_n25q256a(uint32_t addr);
>  
> +extern uint8_t m25p80_sfdp_mx25l25635e(uint32_t addr);

We could drop 'extern' in the header above if we like (same for the other
patches); either way:

Reviewed-by: Francisco Iglesias 

> +
> +
>  #endif
> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> index 13e7b28fd2b0..028b026d8ba2 100644
> --- a/hw/block/m25p80.c
> +++ b/hw/block/m25p80.c
> @@ -232,7 +232,8 @@ static const FlashPartInfo known_devices[] = {
>  { INFO("mx25l6405d",  0xc22017,  0,  64 << 10, 128, 0) },
>  { INFO("mx25l12805d", 0xc22018,  0,  64 << 10, 256, 0) },
>  { INFO("mx25l12855e", 0xc22618,  0,  64 << 10, 256, 0) },
> -{ INFO6("mx25l25635e", 0xc22019, 0xc22019,  64 << 10, 512, 0) },
> +{ INFO6("mx25l25635e", 0xc22019, 0xc22019,  64 << 10, 512, 0),
> +  .sfdp_read = m25p80_sfdp_mx25l25635e },
>  { INFO("mx25l25655e", 0xc22619,  0,  64 << 10, 512, 0) },
>  { INFO("mx66l51235f", 0xc2201a,  0,  64 << 10, 1024, ER_4K | ER_32K) 
> },
>  { INFO("mx66u51235f", 0xc2253a,  0,  64 << 10, 1024, ER_4K | ER_32K) 
> },
> diff --git a/hw/block/m25p80_sfdp.c b/hw/block/m25p80_sfdp.c
> index 24ec05de79a1..6499c4c39954 100644
> --- a/hw/block/m25p80_sfdp.c
> +++ b/hw/block/m25p80_sfdp.c
> @@ -56,3 +56,29 @@ static const uint8_t sfdp_n25q256a[] = {
>  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>  };
>  define_sfdp_read(n25q256a);
> +
> +
> +/*
> + * Matronix
> + */
> +
> +/* mx25l25635e. No 4B opcodes */
> +static const uint8_t sfdp_mx25l25635e[] = {
> +0x53, 0x46, 0x44, 0x50, 0x00, 0x01, 0x01, 0xff,
> +0x00, 0x00, 0x01, 0x09, 0x30, 0x00, 0x00, 0xff,
> +0xc2, 0x00, 0x01, 0x04, 0x60, 0x00, 0x00, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xe5, 0x20, 0xf3, 0xff, 0xff, 0xff, 0xff, 0x0f,
> +0x44, 0xeb, 0x08, 0x6b, 0x08, 0x3b, 0x04, 0xbb,
> +0xee, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0xff,
> +0xff, 0xff, 0x00, 0xff, 0x0c, 0x20, 0x0f, 0x52,
> +0x10, 0xd8, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0x00, 0x36, 0x00, 0x27, 0xf7, 0x4f, 0xff, 0xff,
> +0xd9, 0xc8, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +};
> +define_sfdp_read(mx25l25635e)
> -- 
> 2.35.3
>

Re: [PATCH v3 01/42] target/arm: Split s2walk_secure from ipa_secure in get_phys_addr

On Thu, 6 Oct 2022 at 21:58, Richard Henderson
 wrote:
>
> On 10/6/22 11:55, Peter Maydell wrote:
> > On Thu, 6 Oct 2022 at 19:20, Richard Henderson
> >  wrote:
> >>
> >> On 10/6/22 08:22, Peter Maydell wrote:
> >>> Yeah, cleared-at-start is fine. But here we're also relying on
> >>> the stage 2 call to get_phys_addr_lpae() not setting it to 1,
> >>> because we pass that the same 'result' pointer, not a fresh one.
> >>
> >> I clear it first: that patch is already merged:
> >>
> >>   memset(result, 0, sizeof(*result));
> >>   ret = get_phys_addr_lpae(env, ipa, access_type, s2_mmu_idx,
> >>is_el0, result, fi);
> >
> > Yes, but that doesn't help if this ^^^ get_phys_addr_lpae()
> > call sets result->attrs.secure = true.
>
> Ok, sure, let's make the write to .secure be unconditional.
> I've split this out into a new patch 2 for clarity.

If you can send that extra patch out, I can take it plus
1..20 from this series into target-arm.next, so your next revision
of this series can be smaller.

thanks
-- PMM

Re: [PATCH v2] hw/arm/aspeed: increase Bletchley memory size

2022-10-07 Thread Cédric Le Goater


On 10/7/22 13:05, Patrick Williams wrote:

For the PVT-class hardware we have increased the memory size of
this device to 2 GiB.  Adjust the device model accordingly.

Signed-off-by: Patrick Williams 


Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/arm/aspeed.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 7d2162c6ed..f8bc6d4a14 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -1330,6 +1330,13 @@ static void aspeed_machine_fuji_class_init(ObjectClass 
*oc, void *data)
  aspeed_soc_num_cpus(amc->soc_name);
  };
  
+/* On 32-bit hosts, lower RAM to 1G because of the 2047 MB limit */

+#if HOST_LONG_BITS == 32
+#define BLETCHLEY_BMC_RAM_SIZE (1 * GiB)
+#else
+#define BLETCHLEY_BMC_RAM_SIZE (2 * GiB)
+#endif
+
  static void aspeed_machine_bletchley_class_init(ObjectClass *oc, void *data)
  {
  MachineClass *mc = MACHINE_CLASS(oc);
@@ -1344,7 +1351,7 @@ static void 
aspeed_machine_bletchley_class_init(ObjectClass *oc, void *data)
  amc->num_cs= 2;
  amc->macs_mask = ASPEED_MAC2_ON;
  amc->i2c_init  = bletchley_bmc_i2c_init;
-mc->default_ram_size = 512 * MiB;
+mc->default_ram_size = BLETCHLEY_BMC_RAM_SIZE;
  mc->default_cpus = mc->min_cpus = mc->max_cpus =
  aspeed_soc_num_cpus(amc->soc_name);
  }

Re: [PATCH 0/4] Only generate cluster node in PPTT when specified

On Thu, Sep 22, 2022 at 09:11:39PM +0800, Yicong Yang wrote:
> From: Yicong Yang 
> 
> This series mainly changes the policy for building a cluster topology node
> in PPTT. Previously we always built a cluster node in PPTT without
> asking the user; after this set the cluster node will be built only when
> the user specifies it through "-smp clusters=X".
> 
> Update the tests and test tables accordingly.

Given it's virt machine only, I'd like an ack from relevant maintainers.


> Yicong Yang (4):
>   hw/acpi/aml-build: Only generate cluster node in PPTT when specified
>   tests: virt: update expected ACPI tables for virt test
>   tests: acpi: aarch64: add topology test for aarch64
>   tests: acpi: aarch64: add *.topology tables
> 
>  hw/acpi/aml-build.c|   2 +-
>  hw/core/machine-smp.c  |   3 +++
>  include/hw/boards.h|   2 ++
>  tests/data/acpi/virt/APIC.pxb  | Bin 0 -> 168 bytes
>  tests/data/acpi/virt/APIC.topology | Bin 0 -> 700 bytes
>  tests/data/acpi/virt/DBG2.memhp| Bin 0 -> 87 bytes
>  tests/data/acpi/virt/DBG2.numamem  | Bin 0 -> 87 bytes
>  tests/data/acpi/virt/DBG2.pxb  | Bin 0 -> 87 bytes
>  tests/data/acpi/virt/DBG2.topology | Bin 0 -> 87 bytes
>  tests/data/acpi/virt/DSDT.topology | Bin 0 -> 5398 bytes
>  tests/data/acpi/virt/FACP.pxb  | Bin 0 -> 268 bytes
>  tests/data/acpi/virt/FACP.topology | Bin 0 -> 268 bytes
>  tests/data/acpi/virt/GTDT.pxb  | Bin 0 -> 96 bytes
>  tests/data/acpi/virt/GTDT.topology | Bin 0 -> 96 bytes
>  tests/data/acpi/virt/IORT.topology | Bin 0 -> 128 bytes
>  tests/data/acpi/virt/MCFG.pxb  | Bin 0 -> 60 bytes
>  tests/data/acpi/virt/MCFG.topology | Bin 0 -> 60 bytes
>  tests/data/acpi/virt/PPTT  | Bin 96 -> 76 bytes
>  tests/data/acpi/virt/PPTT.memhp| Bin 0 -> 76 bytes
>  tests/data/acpi/virt/PPTT.numamem  | Bin 0 -> 76 bytes
>  tests/data/acpi/virt/PPTT.pxb  | Bin 0 -> 76 bytes
>  tests/data/acpi/virt/PPTT.topology | Bin 0 -> 336 bytes
>  tests/data/acpi/virt/SPCR.pxb  | Bin 0 -> 80 bytes
>  tests/data/acpi/virt/SPCR.topology | Bin 0 -> 80 bytes
>  tests/qtest/bios-tables-test.c |  22 ++
>  25 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 tests/data/acpi/virt/APIC.pxb
>  create mode 100644 tests/data/acpi/virt/APIC.topology
>  create mode 100644 tests/data/acpi/virt/DBG2.memhp
>  create mode 100644 tests/data/acpi/virt/DBG2.numamem
>  create mode 100644 tests/data/acpi/virt/DBG2.pxb
>  create mode 100644 tests/data/acpi/virt/DBG2.topology
>  create mode 100644 tests/data/acpi/virt/DSDT.topology
>  create mode 100644 tests/data/acpi/virt/FACP.pxb
>  create mode 100644 tests/data/acpi/virt/FACP.topology
>  create mode 100644 tests/data/acpi/virt/GTDT.pxb
>  create mode 100644 tests/data/acpi/virt/GTDT.topology
>  create mode 100644 tests/data/acpi/virt/IORT.topology
>  create mode 100644 tests/data/acpi/virt/MCFG.pxb
>  create mode 100644 tests/data/acpi/virt/MCFG.topology
>  create mode 100644 tests/data/acpi/virt/PPTT.memhp
>  create mode 100644 tests/data/acpi/virt/PPTT.numamem
>  create mode 100644 tests/data/acpi/virt/PPTT.pxb
>  create mode 100644 tests/data/acpi/virt/PPTT.topology
>  create mode 100644 tests/data/acpi/virt/SPCR.pxb
>  create mode 100644 tests/data/acpi/virt/SPCR.topology
> 
> -- 
> 2.24.0

Re: [PATCH v4 5/5] test/acpi/bios-tables-test: SSDT: update golden master binaries

2022-10-07 Thread Robert Hoo

Ping...
On Tue, 2022-09-27 at 08:30 +0800, Robert Hoo wrote:
> On Mon, 2022-09-26 at 15:22 +0200, Igor Mammedov wrote:
> > > > 0800200c9a66"), One, 0x05, Local0, One)
> > > > +CreateDWordField (Local3, Zero, STTS)
> > > > +CreateField (Local3, 0x20, (LEN << 0x03), LDAT)
> > > > +Name (LSA, Buffer (Zero){})
> > > > +ToBuffer (LDAT, LSA) /* \_SB_.NVDR.NV00._LSR.LSA_ */
> > > > +Local1 = Package (0x02)
> > > > +{
> > > > +STTS,
> > > > +LSA
> > > > +}  
> > > 
> > > Hi Igor,
> > > 
> > > Here is a little different from original proposal 
> > > https://lore.kernel.org/qemu-devel/80b09055416c790922c7c3db60d2ba865792d1b0.ca...@linux.intel.com/
> > > 
> > >Local1 = Package (0x2) {STTS, toBuffer(LDAT)}
> > > 
> > > Because in my test, Linux guest complains:
> > > 
> > > [3.884656] ACPI Error: AE_SUPPORT, Expressions within package
> > > elements are not supported (20220331/dspkginit-172)
> > > [3.887104] ACPI Error: Aborting method \_SB.NVDR.NV00._LSR
> > > due
> > > to
> > > previous error (AE_SUPPORT) (20220331/psparse-531)
> > > 
> > > 
> > > > So I have to move toBuffer() out of Package{} and name LSA to hold the
> > > > buffer. If you have a better idea, please let me know.
> > 
> > Would something like following work?
> > 
> > LocalX =  Buffer (Zero){}
> > LocalY = Package (0x01) { LocalX }
> 
> 
> No, Package{} doesn't accept LocalX as elements.
> 
> PackageTerm :=
> Package (
> NumElements // Nothing | ByteConstExpr | TermArg => Integer
> ) {PackageList} => Package
> 
> PackageList :=
> Nothing | 
> 
> PackageElement :=
> DataObject | NameString

Re: [External] : Re: [RFC PATCH 2/4] acpi: fadt: support revision 6.0 of the ACPI specification

Hi Ani,

> On 7 Oct 2022, at 04:25, Ani Sinha  wrote:
> 
> 
> 
> On Thu, 6 Oct 2022, Miguel Luis wrote:
> 
>> Update the Fixed ACPI Description Table (FADT) to revision 6.0 of the ACPI
>> specification adding the field "Hypervisor Vendor Identity" that was missing.
>> 
>> This field's description states the following: "64-bit identifier of hypervisor
>> vendor. All bytes in this field are considered part of the vendor identity.
>> These identifiers are defined independently by the vendors themselves,
>> usually following the name of the hypervisor product. Version information
>> should NOT be included in this field - this shall simply denote the vendor's
>> name or identifier. Version information can be communicated through a
>> supplemental vendor-specific hypervisor API. Firmware implementers would
>> place zero bytes into this field, denoting that no hypervisor is present in
>> the actual firmware."
>> 
>> Hereupon, what should a valid identifier of a Hypervisor Vendor ID be and
>> where should QEMU provide that information?
>> 
>> This RFC suggests keeping this information in sync with the current
>> accelerator name. This also seems to imply that QEMU, which generates the
>> FADT table, and the FADT consumer need to agree on the values of this field.
>> 
>> Signed-off-by: Miguel Luis 
>> ---
>> hw/acpi/aml-build.c  | 14 +++---
>> hw/arm/virt-acpi-build.c | 10 +-
>> 2 files changed, 16 insertions(+), 8 deletions(-)
>> 
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index e6bfac95c7..5258c4ac64 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -31,6 +31,7 @@
>> #include "hw/pci/pci_bus.h"
>> #include "hw/pci/pci_bridge.h"
>> #include "qemu/cutils.h"
>> +#include "qemu/accel.h"
>> 
>> static GArray *build_alloc_array(void)
>> {
>> @@ -2070,7 +2071,7 @@ void build_pptt(GArray *table_data, BIOSLinker 
>> *linker, MachineState *ms,
>> acpi_table_end(linker, );
>> }
>> 
>> -/* build rev1/rev3/rev5.1 FADT */
>> +/* build rev1/rev3/rev5.1/rev6.0 FADT */
>> void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
>> const char *oem_id, const char *oem_table_id)
>> {
>> @@ -2193,8 +2194,15 @@ void build_fadt(GArray *tbl, BIOSLinker *linker, 
>> const AcpiFadtData *f,
>> /* SLEEP_STATUS_REG */
>> build_append_gas_from_struct(tbl, &f->sleep_sts);
>> 
>> -/* TODO: extra fields need to be added to support revisions above rev5 
>> */
>> -assert(f->rev == 5);
>> +if (f->rev <= 5) {
> 
> <= does not make sense. It should be f->rev == 5.
> The previous code compares f->rev <= 4 since it needs to cover revisions
> 2, 3 and 4.
> 

Indeed, that’s right. I will fix.

>> +goto done;
>> +}
>> +
>> +/* Hypervisor Vendor Identity */
>> +build_append_padded_str(tbl, current_accel_name(), 8, '\0');
> 
> I do not think the vendor identity should change based on the accelerator.
> The accelerator QEMU uses should be hidden from the guest OS as far as
> possible. I would suggest a string like "QEMU" for vendor ID.
> 

Thank you for the suggestion Ani. I will spin the next version with it.
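
To illustrate the suggestion, here is a minimal sketch of how a fixed 8-byte
vendor field might be filled from a short string such as "QEMU".
fill_vendor_id and VENDOR_ID_LEN are hypothetical names used only for this
example, not QEMU API; the patch itself goes through build_append_padded_str().

```c
#include <stddef.h>
#include <string.h>

#define VENDOR_ID_LEN 8 /* the field is described as a 64-bit identifier */

/*
 * Hypothetical helper: copy at most VENDOR_ID_LEN bytes of name into out
 * and fill the remainder with zero bytes. No terminator is added.
 */
static void fill_vendor_id(unsigned char out[VENDOR_ID_LEN], const char *name)
{
    size_t len = strlen(name);

    if (len > VENDOR_ID_LEN) {
        len = VENDOR_ID_LEN;
    }
    memset(out, 0, VENDOR_ID_LEN);
    memcpy(out, name, len);
}
```

With a constant string the field no longer depends on which accelerator is in
use, which is the point of the review comment.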

Thanks,
Miguel

>> +
>> +/* TODO: extra fields need to be added to support revisions above rev6 
>> */
>> +assert(f->rev == 6);
>> 
>> done:
>> acpi_table_end(linker, );
>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
>> index 9b3aee01bf..72bb6f61a5 100644
>> --- a/hw/arm/virt-acpi-build.c
>> +++ b/hw/arm/virt-acpi-build.c
>> @@ -809,13 +809,13 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
>> VirtMachineState *vms)
>> }
>> 
>> /* FADT */
>> -static void build_fadt_rev5(GArray *table_data, BIOSLinker *linker,
>> +static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
>> VirtMachineState *vms, unsigned dsdt_tbl_offset)
>> {
>> -/* ACPI v5.1 */
>> +/* ACPI v6.0 */
>> AcpiFadtData fadt = {
>> -.rev = 5,
>> -.minor_ver = 1,
>> +.rev = 6,
>> +.minor_ver = 0,
>> .flags = 1 << ACPI_FADT_F_HW_REDUCED_ACPI,
>> .xdsdt_tbl_offset = &dsdt_tbl_offset,
>> };
>> @@ -945,7 +945,7 @@ void virt_acpi_build(VirtMachineState *vms, 
>> AcpiBuildTables *tables)
>> 
>> /* FADT MADT PPTT GTDT MCFG SPCR DBG2 pointed to by RSDT */
>> acpi_add_table(table_offsets, tables_blob);
>> -build_fadt_rev5(tables_blob, tables->linker, vms, dsdt);
>> +build_fadt_rev6(tables_blob, tables->linker, vms, dsdt);
>> 
>> acpi_add_table(table_offsets, tables_blob);
>> build_madt(tables_blob, tables->linker, vms);
>> --
>> 2.37.3
>> 
>>

Re: ublk-qcow2: ublk-qcow2 is available

2022-10-07 Thread Ming Lei

On Fri, Oct 07, 2022 at 07:21:51PM +0800, Yongji Xie wrote:
> On Fri, Oct 7, 2022 at 6:51 PM Ming Lei  wrote:
> >
> > On Fri, Oct 07, 2022 at 06:04:29PM +0800, Yongji Xie wrote:
> > > On Thu, Oct 6, 2022 at 7:24 PM Ming Lei  wrote:
> > > >
> > > > On Wed, Oct 05, 2022 at 08:21:45AM -0400, Stefan Hajnoczi wrote:
> > > > > On Wed, 5 Oct 2022 at 00:19, Ming Lei  wrote:
> > > > > >
> > > > > > On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote:
> > > > > > > On Tue, 4 Oct 2022 at 05:44, Ming Lei  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Mon, Oct 03, 2022 at 03:53:41PM -0400, Stefan Hajnoczi wrote:
> > > > > > > > > On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote:
> > > > > > > > > > ublk-qcow2 is available now.
> > > > > > > > >
> > > > > > > > > Cool, thanks for sharing!
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > So far it provides basic read/write function, and 
> > > > > > > > > > compression and snapshot
> > > > > > > > > > aren't supported yet. The target/backend implementation is 
> > > > > > > > > > completely
> > > > > > > > > > based on io_uring, and share the same io_uring with ublk IO 
> > > > > > > > > > command
> > > > > > > > > > handler, just like what ublk-loop does.
> > > > > > > > > >
> > > > > > > > > > Follows the main motivations of ublk-qcow2:
> > > > > > > > > >
> > > > > > > > > > - building one complicated target from scratch helps 
> > > > > > > > > > libublksrv APIs/functions
> > > > > > > > > >   become mature/stable more quickly, since qcow2 is 
> > > > > > > > > > complicated and needs more
> > > > > > > > > >   requirement from libublksrv compared with other simple 
> > > > > > > > > > ones(loop, null)
> > > > > > > > > >
> > > > > > > > > > - there are several attempts of implementing qcow2 driver 
> > > > > > > > > > in kernel, such as
> > > > > > > > > >   ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel 
> > > > > > > > > > qcow2(ro)`` [4], so ublk-qcow2
> > > > > > > > > > >   might be useful for covering requirements in this field
> > > > > > > > > >
> > > > > > > > > > - performance comparison with qemu-nbd, and it was my 1st 
> > > > > > > > > > thought to evaluate
> > > > > > > > > >   performance of ublk/io_uring backend by writing one 
> > > > > > > > > > ublk-qcow2 since ublksrv
> > > > > > > > > >   is started
> > > > > > > > > >
> > > > > > > > > > - help to abstract common building block or design pattern 
> > > > > > > > > > for writing new ublk
> > > > > > > > > >   target/backend
> > > > > > > > > >
> > > > > > > > > > So far it basically passes xfstest(XFS) test by using 
> > > > > > > > > > ublk-qcow2 block
> > > > > > > > > > device as TEST_DEV, and kernel building workload is 
> > > > > > > > > > verified too. Also
> > > > > > > > > > soft update approach is applied in meta flushing, and meta 
> > > > > > > > > > data
> > > > > > > > > > integrity is guaranteed, 'make test T=qcow2/040' covers 
> > > > > > > > > > this kind of
> > > > > > > > > > test, and only cluster leak is reported during this test.
> > > > > > > > > >
> > > > > > > > > > The performance data looks much better compared with 
> > > > > > > > > > qemu-nbd, see
> > > > > > > > > > details in commit log[1], README[5] and STATUS[6]. And the 
> > > > > > > > > > test covers both
> > > > > > > > > > empty image and pre-allocated image, for example of 
> > > > > > > > > > pre-allocated qcow2
> > > > > > > > > > image(8GB):
> > > > > > > > > >
> > > > > > > > > > - qemu-nbd (make test T=qcow2/002)
> > > > > > > > >
> > > > > > > > > Single queue?
> > > > > > > >
> > > > > > > > Yeah.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > randwrite(4k): jobs 1, iops 24605
> > > > > > > > > > randread(4k): jobs 1, iops 30938
> > > > > > > > > > randrw(4k): jobs 1, iops read 13981 write 14001
> > > > > > > > > > rw(512k): jobs 1, iops read 724 write 728
> > > > > > > > >
> > > > > > > > > Please try qemu-storage-daemon's VDUSE export type as well. 
> > > > > > > > > The
> > > > > > > > > command-line should be similar to this:
> > > > > > > > >
> > > > > > > > >   # modprobe virtio_vdpa # attaches vDPA devices to host 
> > > > > > > > > kernel
> > > > > > > >
> > > > > > > > Not found virtio_vdpa module even though I enabled all the 
> > > > > > > > following
> > > > > > > > options:
> > > > > > > >
> > > > > > > > --- vDPA drivers
> > > > > > > >  vDPA device simulator core
> > > > > > > >vDPA simulator for networking device
> > > > > > > >vDPA simulator for block device
> > > > > > > >  VDUSE (vDPA Device in Userspace) support
> > > > > > > >  Intel IFC VF vDPA driver
> > > > > > > >  Virtio PCI bridge vDPA driver
> > > > > > > >  vDPA driver for Alibaba ENI
> > > > > > > >
> > > > > > > > BTW, my test environment is VM and the shared data is done in 
> > > > > > > > VM too, and
> > > > > > > > can virtio_vdpa be used inside VM?
> > > >

Re: [PATCH v8 2/8] KVM: Extend the memslot to support fd-based private memory

2022-10-07 Thread Jarkko Sakkinen

On Thu, Oct 06, 2022 at 03:34:58PM +, Sean Christopherson wrote:
> On Thu, Oct 06, 2022, Jarkko Sakkinen wrote:
> > On Thu, Oct 06, 2022 at 05:58:03PM +0300, Jarkko Sakkinen wrote:
> > > On Thu, Sep 15, 2022 at 10:29:07PM +0800, Chao Peng wrote:
> > > > This new extension, indicated by the new flag KVM_MEM_PRIVATE, adds two
> > > > additional KVM memslot fields private_fd/private_offset to allow
> > > > userspace to specify that guest private memory is provided from the
> > > > private_fd, with guest_phys_addr mapped at the private_offset of the
> > > > private_fd, spanning a range of memory_size.
> > > > 
> > > > The extended memslot can still have the userspace_addr(hva). When used, a
> > > > single memslot can maintain both private memory through private
> > > > fd(private_fd/private_offset) and shared memory through
> > > > hva(userspace_addr). Whether the private or shared part is visible to
> > > > guest is maintained by other KVM code.
> > > 
> > > What is the appeal of the private_offset field anyway, instead of having just
> > > a 1:1 association between regions and files, i.e. one memfd per region?
> 
> Modifying memslots is slow, both in KVM and in QEMU (not sure about Google's VMM).
> E.g. if a vCPU converts a single page, it will be forced to wait until all other
> vCPUs drop SRCU, which can have severe latency spikes, e.g. if KVM is faulting in
> memory.  KVM's memslot updates also hold a mutex for the entire duration of the
> update, i.e. conversions on different vCPUs would be fully serialized, exacerbating
> the SRCU problem.
> 
> KVM also has historical baggage where it "needs" to zap _all_ SPTEs when any
> memslot is deleted.
> 
> Taking both a private_fd and a shared userspace address allows userspace to convert
> between private and shared without having to manipulate memslots.

Right, this was a really good explanation, thank you.

Still wondering whether this could possibly work (or not):

1. Union userspace_addr and private_fd.
2. Instead of introducing private_offset, use guest_phys_addr as the
   offset.
  
BR, Jarkko

Re: [PATCH] gitmodules: recurse by default

2022-10-07 Thread Daniel P . Berrangé

On Fri, Oct 07, 2022 at 11:45:56AM +0100, Daniel P. Berrangé wrote:
> On Fri, Oct 07, 2022 at 06:11:25AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Oct 07, 2022 at 09:07:17AM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Oct 06, 2022 at 08:24:01PM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Oct 06, 2022 at 07:54:52PM +0100, Daniel P. Berrangé wrote:
> > > > > On Thu, Oct 06, 2022 at 07:39:07AM -0400, Michael S. Tsirkin wrote:
> > > > > > The most common complaint about submodules is that
> > > > > > they don't follow when one switches branches in the
> > > > > > main repo. Enable recursing into submodules by default
> > > > > > to address that.
> > > > > > 
> > > > > > Signed-off-by: Michael S. Tsirkin 
> > > > > > ---
> > > > > >  .gitmodules | 23 +++
> > > > > >  1 file changed, 23 insertions(+)

snip

> > I just retested and it's not working for me either :(
> > I was sure it worked but I guess the testing wasn't done properly.
> > Back to the drawing board sorry.
> 
> I think the problem is that this setting doesn't apply in the context
> of .gitmodules. Various commands take a '--recurse-submodules' parameter,
> and like many params this can be set in the .git/config file. The
> problem is .git/config isn't a file we can influence automatically,
> it is up to the dev to set things for every clone they do :-(

With the correct setting in my .git/config, I've just discovered
an unexpected & undesirable consequence of using recurse=true.
It affects the 'push' command. If your submodule contains a hash
that is not present in the upstream of the submodule, then when
you try to push, it will also try to push the submodule change.

E.g., I have a qemu.git branch 'work' and I made a change to
ui/keycodemapdb. If I try to push to my gitlab fork, whose
remote I called 'gitlab', then it will also try to push
ui/keycodemapdb to a fork called 'gitlab'.  Except I don't
have any such fork existing, so my attempt to push my qemu.git
changes fails because of the submodule.

This is going to be annoying to people who are working on branches
with updates to the git submodules if we were to set recurse=true
by default, as they'll also have to set up remotes for submodules
they work on.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH] error handling: Use TFR() macro where applicable

2022-10-07 Thread Nikita Ivanov

Hi!
Sorry for such a long absence, I've been resolving some other issues in my
life for a while. I've adjusted the patch according to your latest
comments. Could you check it out, please?

From 5389c5ccc8789f8f666ab99e50d38af728bd2c9c Mon Sep 17 00:00:00 2001
From: Nikita Ivanov 
Date: Wed, 3 Aug 2022 12:54:00 +0300
Subject: [PATCH 1/2] error handling: Use TFR() macro where applicable

qemu/osdep.h already defines a TFR() macro that implements the same
retry-on-EINTR while loop.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/415

Signed-off-by: Nikita Ivanov 
---
 block/file-posix.c| 39 ++-
 chardev/char-pty.c|  4 +---
 hw/9pfs/9p-local.c|  8 ++--
 net/l2tpv3.c  | 15 +++
 net/socket.c  | 16 +++-
 net/tap.c |  8 ++--
 qga/commands-posix.c  |  4 +---
 semihosting/syscalls.c|  4 +---
 tests/qtest/libqtest.c| 14 +++---
 tests/vhost-user-bridge.c |  4 +---
 util/main-loop.c  |  4 +---
 util/osdep.c  |  4 +---
 util/vfio-helpers.c   | 12 ++--
 13 files changed, 51 insertions(+), 85 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 66fdb07820..7892bdea31 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1238,9 +1238,9 @@ static int hdev_get_max_segments(int fd, struct stat
*st)
 ret = -errno;
 goto out;
 }
-do {
-ret = read(sysfd, buf, sizeof(buf) - 1);
-} while (ret == -1 && errno == EINTR);
+TFR(
+ret = read(sysfd, buf, sizeof(buf) - 1)
+);
 if (ret < 0) {
 ret = -errno;
 goto out;
@@ -1388,9 +1388,9 @@ static int handle_aiocb_ioctl(void *opaque)
 RawPosixAIOData *aiocb = opaque;
 int ret;

-do {
-ret = ioctl(aiocb->aio_fildes, aiocb->ioctl.cmd, aiocb->ioctl.buf);
-} while (ret == -1 && errno == EINTR);
+TFR(
+ret = ioctl(aiocb->aio_fildes, aiocb->ioctl.cmd, aiocb->ioctl.buf)
+);
 if (ret == -1) {
 return -errno;
 }
@@ -1472,18 +1472,17 @@ static ssize_t
handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
 {
 ssize_t len;

-do {
-if (aiocb->aio_type & QEMU_AIO_WRITE)
-len = qemu_pwritev(aiocb->aio_fildes,
-   aiocb->io.iov,
-   aiocb->io.niov,
-   aiocb->aio_offset);
- else
-len = qemu_preadv(aiocb->aio_fildes,
-  aiocb->io.iov,
-  aiocb->io.niov,
-  aiocb->aio_offset);
-} while (len == -1 && errno == EINTR);
+TFR(
+len = (aiocb->aio_type & QEMU_AIO_WRITE) ?
+qemu_pwritev(aiocb->aio_fildes,
+   aiocb->io.iov,
+   aiocb->io.niov,
+   aiocb->aio_offset) :
+qemu_preadv(aiocb->aio_fildes,
+  aiocb->io.iov,
+  aiocb->io.niov,
+  aiocb->aio_offset)
+);

 if (len == -1) {
 return -errno;
@@ -1908,9 +1907,7 @@ static int allocate_first_block(int fd, size_t
max_size)
 buf = qemu_memalign(max_align, write_size);
 memset(buf, 0, write_size);

-do {
-n = pwrite(fd, buf, write_size, 0);
-} while (n == -1 && errno == EINTR);
+TFR(n = pwrite(fd, buf, write_size, 0));

 ret = (n == -1) ? -errno : 0;

diff --git a/chardev/char-pty.c b/chardev/char-pty.c
index 53f25c6bbd..b2f490bacf 100644
--- a/chardev/char-pty.c
+++ b/chardev/char-pty.c
@@ -93,9 +93,7 @@ static void pty_chr_update_read_handler(Chardev *chr)
 pfd.fd = fioc->fd;
 pfd.events = G_IO_OUT;
 pfd.revents = 0;
-do {
-rc = g_poll(&pfd, 1, 0);
-} while (rc == -1 && errno == EINTR);
+TFR(rc = g_poll(&pfd, 1, 0));
 assert(rc >= 0);

 if (pfd.revents & G_IO_HUP) {
diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index d42ce6d8b8..c90ab947ba 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -470,9 +470,7 @@ static ssize_t local_readlink(FsContext *fs_ctx,
V9fsPath *fs_path,
 if (fd == -1) {
 return -1;
 }
-do {
-tsize = read(fd, (void *)buf, bufsz);
-} while (tsize == -1 && errno == EINTR);
+TFR(tsize = read(fd, (void *)buf, bufsz));
 close_preserve_errno(fd);
 } else if ((fs_ctx->export_flags & V9FS_SM_PASSTHROUGH) ||
(fs_ctx->export_flags & V9FS_SM_NONE)) {
@@ -908,9 +906,7 @@ static int local_symlink(FsContext *fs_ctx, const char
*oldpath,
 }
 /* Write the oldpath (target) to the file. */
 oldpath_size = strlen(oldpath);
-do {
-write_size = write(fd, (void *)oldpath, oldpath_size);
-} while (write_size == -1 && errno == EINTR);
+TFR(write_size = write(fd, (void *)oldpath, oldpath_size));
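
As a rough illustration of what the conversion relies on, the sketch below
shows a retry-on-EINTR macro of the kind the commit message describes.
TFR_SKETCH, flaky_read and read_retrying are hypothetical names for this
example; the actual TFR() definition lives in qemu/osdep.h and may differ in
detail.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Illustrative stand-in for TFR(): evaluate expr and repeat for as long as
 * it yields -1 with errno set to EINTR.
 */
#define TFR_SKETCH(expr)            \
    do {                            \
        if ((expr) != -1) {         \
            break;                  \
        }                           \
    } while (errno == EINTR)

/* Hypothetical call that is interrupted a fixed number of times first. */
static int flaky_calls_left;

static long flaky_read(size_t len)
{
    if (flaky_calls_left > 0) {
        flaky_calls_left--;
        errno = EINTR;
        return -1;
    }
    return (long)len;
}

/* Retry flaky_read() until it succeeds or fails for a reason other than EINTR. */
static long read_retrying(size_t len, int interruptions)
{
    long ret;

    flaky_calls_left = interruptions;
    TFR_SKETCH(ret = flaky_read(len));
    return ret;
}
```

The assignment has to be part of the macro argument so that it is re-evaluated
on every retry, which is why the converted call sites read
TFR(ret = read(...)) and not ret = TFR(read(...)).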

Re: [PATCH v6 09/13] block: add BlockRAMRegistrar

2022-10-07 Thread Stefano Garzarella


On Thu, Oct 06, 2022 at 05:35:03PM -0400, Stefan Hajnoczi wrote:

Emulated devices and other BlockBackend users wishing to take advantage
of blk_register_buf() all have the same repetitive job: register
RAMBlocks with the BlockBackend using RAMBlockNotifier.

Add a BlockRAMRegistrar API to do this. A later commit will use this
from hw/block/virtio-blk.c.

Signed-off-by: Stefan Hajnoczi 
---
MAINTAINERS  |  1 +
include/sysemu/block-ram-registrar.h | 37 ++
block/block-ram-registrar.c  | 58 
block/meson.build|  1 +
4 files changed, 97 insertions(+)
create mode 100644 include/sysemu/block-ram-registrar.h
create mode 100644 block/block-ram-registrar.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 0dcae6168a..91aed2cdc7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2498,6 +2498,7 @@ F: block*
F: block/
F: hw/block/
F: include/block/
+F: include/sysemu/block-*.h
F: qemu-img*
F: docs/tools/qemu-img.rst
F: qemu-io*
diff --git a/include/sysemu/block-ram-registrar.h 
b/include/sysemu/block-ram-registrar.h
new file mode 100644
index 00..d8b2f7942b
--- /dev/null
+++ b/include/sysemu/block-ram-registrar.h
@@ -0,0 +1,37 @@
+/*
+ * BlockBackend RAM Registrar
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef BLOCK_RAM_REGISTRAR_H
+#define BLOCK_RAM_REGISTRAR_H
+
+#include "exec/ramlist.h"
+
+/**
+ * struct BlockRAMRegistrar:
+ *
+ * Keeps RAMBlock memory registered with a BlockBackend using
+ * blk_register_buf() including hotplugged memory.
+ *
+ * Emulated devices or other BlockBackend users initialize a BlockRAMRegistrar
+ * with blk_ram_registrar_init() before submitting I/O requests with the
+ * BDRV_REQ_REGISTERED_BUF flag set.
+ */
+typedef struct {
+BlockBackend *blk;
+RAMBlockNotifier notifier;
+bool ok;
+} BlockRAMRegistrar;
+
+void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk);
+void blk_ram_registrar_destroy(BlockRAMRegistrar *r);
+
+/* Have all RAMBlocks been registered successfully? */
+static inline bool blk_ram_registrar_ok(BlockRAMRegistrar *r)
+{
+return r->ok;
+}
+
+#endif /* BLOCK_RAM_REGISTRAR_H */
diff --git a/block/block-ram-registrar.c b/block/block-ram-registrar.c
new file mode 100644
index 00..25dbafa789
--- /dev/null
+++ b/block/block-ram-registrar.c
@@ -0,0 +1,58 @@
+/*
+ * BlockBackend RAM Registrar
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/block-ram-registrar.h"
+#include "qapi/error.h"
+
+static void ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
+size_t max_size)
+{
+BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);
+Error *err = NULL;
+
+if (!r->ok) {
+return; /* don't try again if we've already failed */
+}


The segfault I was seeing is gone, but I now have a doubt.

Here we basically just report the error and prevent new regions from 
being registered. The VM still starts though and the blkio driver works 
as if nothing happened.


For drivers that require all regions to be registered, this can cause 
problems, so should we stop the VM in case of failure or put the blkio 
driver in a state such that IOs are not submitted?


Or maybe it's okay and then the device will somehow report the error 
when it can't find the mapped region?


Thanks,
Stefano
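
To make the concern concrete, here is a minimal model of the sticky-failure
behaviour described above. RegistrarSketch, register_region, region_added and
may_submit_io are hypothetical names, not QEMU API, and gating I/O on the flag
is only one of the options raised here, not what the patch does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockRAMRegistrar: only the ok flag is modelled. */
typedef struct {
    bool ok;
    size_t registered; /* regions successfully registered so far */
} RegistrarSketch;

/* Hypothetical registration hook; zero-sized regions simulate a failure. */
static bool register_region(size_t max_size)
{
    return max_size != 0;
}

static void region_added(RegistrarSketch *r, size_t max_size)
{
    if (!r->ok) {
        return; /* an earlier failure is sticky: later regions are skipped */
    }
    if (!register_region(max_size)) {
        r->ok = false;
        return;
    }
    r->registered++;
}

/* One possible policy: refuse to submit I/O once registration has failed. */
static bool may_submit_io(const RegistrarSketch *r)
{
    return r->ok;
}
```

After one failure, regions added later are never registered, so a driver that
needs every region mapped would have to consult the flag before issuing
requests.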


+
+if (!blk_register_buf(r->blk, host, max_size, &err)) {
+error_report_err(err);
+ram_block_notifier_remove(&r->notifier);
+r->ok = false;
+}
+}
+
+static void ram_block_removed(RAMBlockNotifier *n, void *host, size_t size,
+  size_t max_size)
+{
+BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);
+blk_unregister_buf(r->blk, host, max_size);
+}
+
+void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk)
+{
+r->blk = blk;
+r->notifier = (RAMBlockNotifier){
+.ram_block_added = ram_block_added,
+.ram_block_removed = ram_block_removed,
+
+/*
+ * .ram_block_resized() is not necessary because we use the max_size
+ * value that does not change across resize.
+ */
+};
+r->ok = true;
+
+ram_block_notifier_add(&r->notifier);
+}
+
+void blk_ram_registrar_destroy(BlockRAMRegistrar *r)
+{
+if (r->ok) {
+ram_block_notifier_remove(&r->notifier);
+}
+}
diff --git a/block/meson.build b/block/meson.build
index 500878f082..b7c68b83a3 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -46,6 +46,7 @@ block_ss.add(files(
), zstd, zlib, gnutls)

softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
+softmmu_ss.add(files('block-ram-registrar.c'))

if get_option('qcow1').allowed()
  block_ss.add(files('qcow.c'))
--
2.37.3

[PULL 11/50] nbd: add missing coroutine_fn annotations

From: Paolo Bonzini 

Callers of coroutine_fn must be coroutine_fn themselves, or the call
must be within "if (qemu_in_coroutine())".  Apply coroutine_fn to
functions where this holds.

Reviewed-by: Alberto Faria 
Reviewed-by: Eric Blake 
Signed-off-by: Paolo Bonzini 
Message-Id: <20220922084924.201610-11-pbonz...@redhat.com>
[kwolf: Fixed up coding style]
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block/nbd.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 97683cce27..494b9d683e 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -983,11 +983,12 @@ static void nbd_iter_request_error(NBDReplyChunkIter 
*iter, int ret)
  * nbd_reply_chunk_iter_receive
  * The pointer stored in @payload requires g_free() to free it.
  */
-static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
- NBDReplyChunkIter *iter,
- uint64_t handle,
- QEMUIOVector *qiov, NBDReply *reply,
- void **payload)
+static bool coroutine_fn nbd_reply_chunk_iter_receive(BDRVNBDState *s,
+  NBDReplyChunkIter *iter,
+  uint64_t handle,
+  QEMUIOVector *qiov,
+  NBDReply *reply,
+  void **payload)
 {
 int ret, request_ret;
 NBDReply local_reply;
-- 
2.37.3

Re: [PATCH] i386: Fix KVM_CAP_ADJUST_CLOCK capability check

2022-10-07 Thread Vitaly Kuznetsov

Paolo Bonzini  writes:

> Hi, a similar patch is now in.
>

Indeed,

commit c4ef867f2949bf2a2ae18a4e27cf1a34bbc8aecb
Author: Ray Zhang 
Date:   Thu Sep 22 18:05:23 2022 +0800

target/i386/kvm: fix kvmclock_current_nsec: Assertion `time.tsc_timestamp 
<= migration_tsc' failed

solves the problem as well.

-- 
Vitaly

[PULL 48/50] blockjob: remove unused functions

From: Emanuele Giuseppe Esposito 

These public functions are not used anywhere, thus can be dropped.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Kevin Wolf 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20220926093214.506243-21-eespo...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 include/block/blockjob.h | 31 ---
 blockjob.c   | 16 ++--
 2 files changed, 14 insertions(+), 33 deletions(-)

diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 10c24e240a..03032b2eca 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -102,17 +102,15 @@ typedef struct BlockJob {
  */
 
 /**
- * block_job_next:
+ * block_job_next_locked:
  * @job: A block job, or %NULL.
  *
  * Get the next element from the list of block jobs after @job, or the
  * first one if @job is %NULL.
  *
  * Returns the requested job, or %NULL if there are no more jobs left.
+ * Called with job lock held.
  */
-BlockJob *block_job_next(BlockJob *job);
-
-/* Same as block_job_next(), but called with job lock held. */
 BlockJob *block_job_next_locked(BlockJob *job);
 
 /**
@@ -122,6 +120,7 @@ BlockJob *block_job_next_locked(BlockJob *job);
  * Get the block job identified by @id (which must not be %NULL).
  *
  * Returns the requested job, or %NULL if it doesn't exist.
+ * Called with job lock *not* held.
  */
 BlockJob *block_job_get(const char *id);
 
@@ -161,43 +160,37 @@ void block_job_remove_all_bdrv(BlockJob *job);
 bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs);
 
 /**
- * block_job_set_speed:
+ * block_job_set_speed_locked:
  * @job: The job to set the speed for.
  * @speed: The new value
  * @errp: Error object.
  *
  * Set a rate-limiting parameter for the job; the actual meaning may
  * vary depending on the job type.
- */
-bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
-
-/*
- * Same as block_job_set_speed(), but called with job lock held.
- * Might release the lock temporarily.
+ *
+ * Called with job lock held, but might release it temporarily.
  */
 bool block_job_set_speed_locked(BlockJob *job, int64_t speed, Error **errp);
 
 /**
- * block_job_query:
+ * block_job_query_locked:
  * @job: The job to get information about.
  *
  * Return information about a job.
+ *
+ * Called with job lock held.
  */
-BlockJobInfo *block_job_query(BlockJob *job, Error **errp);
-
-/* Same as block_job_query(), but called with job lock held. */
 BlockJobInfo *block_job_query_locked(BlockJob *job, Error **errp);
 
 /**
- * block_job_iostatus_reset:
+ * block_job_iostatus_reset_locked:
  * @job: The job whose I/O status should be reset.
  *
  * Reset I/O status on @job and on BlockDriverState objects it uses,
  * other than job->blk.
+ *
+ * Called with job lock held.
  */
-void block_job_iostatus_reset(BlockJob *job);
-
-/* Same as block_job_iostatus_reset(), but called with job lock held. */
 void block_job_iostatus_reset_locked(BlockJob *job);
 
 /*
diff --git a/blockjob.c b/blockjob.c
index 120c1b7ead..bdf20a0e35 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -56,12 +56,6 @@ BlockJob *block_job_next_locked(BlockJob *bjob)
 return job ? container_of(job, BlockJob, job) : NULL;
 }
 
-BlockJob *block_job_next(BlockJob *bjob)
-{
-JOB_LOCK_GUARD();
-return block_job_next_locked(bjob);
-}
-
 BlockJob *block_job_get_locked(const char *id)
 {
 Job *job = job_get_locked(id);
@@ -308,7 +302,7 @@ bool block_job_set_speed_locked(BlockJob *job, int64_t 
speed, Error **errp)
 return true;
 }
 
-bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
+static bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 {
 JOB_LOCK_GUARD();
 return block_job_set_speed_locked(job, speed, errp);
@@ -357,12 +351,6 @@ BlockJobInfo *block_job_query_locked(BlockJob *job, Error 
**errp)
 return info;
 }
 
-BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
-{
-JOB_LOCK_GUARD();
-return block_job_query_locked(job, errp);
-}
-
 /* Called with job lock held */
 static void block_job_iostatus_set_err_locked(BlockJob *job, int error)
 {
@@ -525,7 +513,7 @@ void block_job_iostatus_reset_locked(BlockJob *job)
 job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
 }
 
-void block_job_iostatus_reset(BlockJob *job)
+static void block_job_iostatus_reset(BlockJob *job)
 {
 JOB_LOCK_GUARD();
 block_job_iostatus_reset_locked(job);
-- 
2.37.3

Re: [PATCH] i386: Fix KVM_CAP_ADJUST_CLOCK capability check

2022-10-07 Thread Paolo Bonzini

Hi, a similar patch is now in.

Paolo

On Fri, 7 Oct 2022 at 05:26, Vitaly Kuznetsov wrote:

> Vitaly Kuznetsov  writes:
>
> > Vitaly Kuznetsov  writes:
> >
> >> KVM commit c68dc1b577ea ("KVM: x86: Report host tsc and realtime values in
> >> KVM_GET_CLOCK") broke migration of certain workloads, e.g. Win11 + WSL2
> >> guest reboots immediately after migration. KVM, however, is not to
> >> blame this time. When KVM_CAP_ADJUST_CLOCK capability is checked, the
> >> result is all supported flags (which the above mentioned KVM commit
> >> enhanced) but kvm_has_adjust_clock_stable() wants it to be
> >> KVM_CLOCK_TSC_STABLE precisely. The result is that 'clock_is_reliable'
> >> is not set in vmstate and the saved clock reading is discarded in
> >> kvmclock_vm_state_change().
> >>
> >> Signed-off-by: Vitaly Kuznetsov 
> >> ---
> >>  target/i386/kvm/kvm.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> >> index a1fd1f53791d..c33192a87dcb 100644
> >> --- a/target/i386/kvm/kvm.c
> >> +++ b/target/i386/kvm/kvm.c
> >> @@ -157,7 +157,7 @@ bool kvm_has_adjust_clock_stable(void)
> >>  {
> >>  int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK);
> >>
> >> -return (ret == KVM_CLOCK_TSC_STABLE);
> >> +return ret & KVM_CLOCK_TSC_STABLE;
> >>  }
> >>
> >>  bool kvm_has_adjust_clock(void)
> >
> > Ping) This issue seems to introduce major migration issues with KVM >= v5.16
>
> Ping)
>
> --
> Vitaly
>
>
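
A minimal sketch of why the equality test in the quoted patch stops working
once the capability reports more flags. The SKETCH_* names and bit positions
are illustrative stand-ins for the KVM_CLOCK_* constants, and both helper
functions are hypothetical.

```c
#include <stdbool.h>

/* Illustrative flag bits; the real ones are the KVM_CLOCK_* constants. */
#define SKETCH_CLOCK_TSC_STABLE  (1 << 1)
#define SKETCH_CLOCK_REALTIME    (1 << 2)
#define SKETCH_CLOCK_HOST_TSC    (1 << 3)

/* Old check: true only if TSC_STABLE is the sole flag reported. */
static bool has_tsc_stable_exact(int caps)
{
    return caps == SKETCH_CLOCK_TSC_STABLE;
}

/* Patched check: true whenever the TSC_STABLE bit is set. */
static bool has_tsc_stable_masked(int caps)
{
    return (caps & SKETCH_CLOCK_TSC_STABLE) != 0;
}
```

When the kernel reports additional flags alongside the stable bit, the exact
comparison returns false while the masked test still returns true.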

[PULL 49/50] job: remove unused functions