date:20210622

Re: Too slow edk2 bios boot?

2021-06-22 Thread Bin Meng

Hi Laszlo,

On Wed, Jun 23, 2021 at 12:13 AM Laszlo Ersek  wrote:

> On 06/18/21 15:06, Bin Meng wrote:
> > On Fri, Jun 18, 2021 at 7:46 PM Gerd Hoffmann  wrote:
> >
> >> On Fri, Jun 18, 2021 at 06:46:57PM +0800, Bin Meng wrote:
> >>> Hi Laszlo,
> >>>
> >>> Using the QEMU shipped edk2 bios, for i386, it boots very quickly to
> >>> the EFI shell.
> >>>
> >>> $ qemu-system-i386 -nographic -pflash edk2-i386-code.fd
>
> Ouch. Don't do this. If you use just one pflash chip, then a unified FD
> file is expected in that chip, containing both varstore and firmware
> executable.
>
> Upstream QEMU does not bundle / install unified FD files however. What it
> provides are separate executables and varstore *templates*.
>
> If you don't want to create a permanent variable store file for your VM,
> from the template called "edk2-i386-vars.fd", then the minimum command line
> is something like this:
>
> qemu-system-i386 \
>   -drive if=pflash,unit=0,format=raw,readonly=on,file=edk2-i386-code.fd \
>   -drive if=pflash,unit=1,format=raw,snapshot=on,file=edk2-i386-vars.fd \
>
> (Nowadays I should use the "blockdev" syntax instead of "-drive", but I've
> not updated my scripts thus far ;))
>

Thank you. I suggest we document this in the QEMU documentation [1]


> >>>
> >>> However with x86_64, it takes a very long time to boot to the EFI
> >>> shell. It seems it got stuck in the PXE boot. Any ideas?
> >>
> >> One year ago ia32 efi netboot support was dropped (and you are the first
> >> who noticed).
>
> I certainly noticed:
>
> http://mid.mail-archive.com/e6078611-789f-027b-bea5-759e02b10eee@redhat.com
>
>
> >>
> >
> > I guess not many people play with ia32 these days :)
> >
> >
> >>
> >> commit 9ed02fbb847277bef88dbe6a677cf3e5f39e5a38
> >> Author: Gerd Hoffmann 
> >> Date:   Wed Jul 22 12:24:35 2020 +0200
> >>
> >> ipxe: drop ia32 efi roms
> >>
> >> UEFI on ia32 never really took off.  Basically the BIOS -> UEFI shift
> >> came too late, x64 was widespread already, so vendors went from BIOS
> >> straight to UEFI on x64.
> >>
> >> Signed-off-by: Gerd Hoffmann 
> >>
> >>
> >>> I checked the boot manager, and it seems only 64-bit edk2 bios has
> >>> built-in PXE boot while 32-bit does not.
> >>
> >> It isn't edk2 but the nic boot roms, but yes, lack of pxe support on
> >> ia32 is the root cause.
> >>
> >
> > Got it.
> >
> >
> >>> Any idea to speed up this whole PXE boot thing?
> >>
> >> qemu -nic none ?
> >>
> >
> > Yep this works. Thanks a lot!
>
> If you need neither NICs nor disks in your guest at all, then "-nic none"
> is indeed the simplest solution.
>

If the guest does use NICs, do we then have to adjust the boot order in the
BIOS boot menu?

[1] https://qemu.readthedocs.io/en/latest/system/target-i386.html

Regards,
Bin

Re: [PATCH 2/5] usb: drop usb_host_dev_is_scsi_storage hook

2021-06-22 Thread David Gibson

On Tue, Jun 22, 2021 at 02:49:12PM +0200, Gerd Hoffmann wrote:
> Introduce an usb device flag instead, set it when usb-host looks at the
> device descriptors anyway.  Also set it for emulated storage devices,
> for consistency.  Add an inline helper function to check the flag.
> 
> Signed-off-by: Gerd Hoffmann 

ppc parts
Acked-by: David Gibson 

> ---
>  include/hw/usb.h |  7 ++-
>  hw/ppc/spapr.c   |  2 +-
>  hw/usb/dev-storage-bot.c |  1 +
>  hw/usb/dev-storage-classic.c |  1 +
>  hw/usb/dev-uas.c |  1 +
>  hw/usb/host-libusb.c | 36 +++-
>  hw/usb/host-stub.c   |  5 -
>  7 files changed, 17 insertions(+), 36 deletions(-)
> 
> diff --git a/include/hw/usb.h b/include/hw/usb.h
> index 436e07b30404..33668dd0a99a 100644
> --- a/include/hw/usb.h
> +++ b/include/hw/usb.h
> @@ -219,6 +219,7 @@ enum USBDeviceFlags {
>  USB_DEV_FLAG_IS_HOST,
>  USB_DEV_FLAG_MSOS_DESC_ENABLE,
>  USB_DEV_FLAG_MSOS_DESC_IN_USE,
> +USB_DEV_FLAG_IS_SCSI_STORAGE,
>  };
>  
>  /* definition of a USB device */
> @@ -465,7 +466,6 @@ void usb_generic_async_ctrl_complete(USBDevice *s, USBPacket *p);
>  
>  /* usb-linux.c */
>  void hmp_info_usbhost(Monitor *mon, const QDict *qdict);
> -bool usb_host_dev_is_scsi_storage(USBDevice *usbdev);
>  
>  /* usb ports of the VM */
>  
> @@ -561,6 +561,11 @@ const char *usb_device_get_product_desc(USBDevice *dev);
>  
>  const USBDesc *usb_device_get_usb_desc(USBDevice *dev);
>  
> +static inline bool usb_device_is_scsi_storage(USBDevice *dev)
> +{
> +return dev->flags & (1 << USB_DEV_FLAG_IS_SCSI_STORAGE);
> +}
> +
>  /* quirks.c */
>  
>  /* In bulk endpoints are streaming data sources (iow behave like isoc eps) */
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 4dd90b75cc52..f83a081af0f1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3106,7 +3106,7 @@ static char *spapr_get_fw_dev_path(FWPathProvider *p, BusState *bus,
>   */
>  if (strcmp("usb-host", qdev_fw_name(dev)) == 0) {
>  USBDevice *usbdev = CAST(USBDevice, dev, TYPE_USB_DEVICE);
> -if (usb_host_dev_is_scsi_storage(usbdev)) {
> +if (usb_device_is_scsi_storage(usbdev)) {
>  return g_strdup_printf("storage@%s/disk", usbdev->port->path);
>  }
>  }
> diff --git a/hw/usb/dev-storage-bot.c b/hw/usb/dev-storage-bot.c
> index 6aad026d1133..68ebaca10c66 100644
> --- a/hw/usb/dev-storage-bot.c
> +++ b/hw/usb/dev-storage-bot.c
> @@ -32,6 +32,7 @@ static void usb_msd_bot_realize(USBDevice *dev, Error **errp)
>  
>  usb_desc_create_serial(dev);
>  usb_desc_init(dev);
> +dev->flags |= (1 << USB_DEV_FLAG_IS_SCSI_STORAGE);
>  if (d->hotplugged) {
>  s->dev.auto_attach = 0;
>  }
> diff --git a/hw/usb/dev-storage-classic.c b/hw/usb/dev-storage-classic.c
> index 00cb34b22f02..3d017a4e6791 100644
> --- a/hw/usb/dev-storage-classic.c
> +++ b/hw/usb/dev-storage-classic.c
> @@ -64,6 +64,7 @@ static void usb_msd_storage_realize(USBDevice *dev, Error **errp)
>  
>  usb_desc_create_serial(dev);
>  usb_desc_init(dev);
> +dev->flags |= (1 << USB_DEV_FLAG_IS_SCSI_STORAGE);
>  scsi_bus_new(&s->bus, sizeof(s->bus), DEVICE(dev),
>   &usb_msd_scsi_info_storage, NULL);
>  scsi_dev = scsi_bus_legacy_add_drive(&s->bus, blk, 0, !!s->removable,
> diff --git a/hw/usb/dev-uas.c b/hw/usb/dev-uas.c
> index d2bd85d3f6bb..263056231c79 100644
> --- a/hw/usb/dev-uas.c
> +++ b/hw/usb/dev-uas.c
> @@ -926,6 +926,7 @@ static void usb_uas_realize(USBDevice *dev, Error **errp)
>  QTAILQ_INIT(&uas->requests);
>  uas->status_bh = qemu_bh_new(usb_uas_send_status_bh, uas);
>  
> +dev->flags |= (1 << USB_DEV_FLAG_IS_SCSI_STORAGE);
>  scsi_bus_new(&uas->bus, sizeof(uas->bus), DEVICE(dev),
>   &usb_uas_scsi_info, NULL);
>  }
> diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c
> index 2518306f527f..e6d21aa8e1d3 100644
> --- a/hw/usb/host-libusb.c
> +++ b/hw/usb/host-libusb.c
> @@ -770,6 +770,13 @@ static void usb_host_speed_compat(USBHostDevice *s)
>  for (i = 0; i < conf->bNumInterfaces; i++) {
>  for (a = 0; a < conf->interface[i].num_altsetting; a++) {
>  intf = &conf->interface[i].altsetting[a];
> +
> +if (intf->bInterfaceClass == LIBUSB_CLASS_MASS_STORAGE &&
> +intf->bInterfaceSubClass == 6) { /* SCSI */
> +udev->flags |= (1 << USB_DEV_FLAG_IS_SCSI_STORAGE);
> +break;
> +}
> +
>  for (e = 0; e < intf->bNumEndpoints; e++) {
>  endp = &intf->endpoint[e];
>  type = endp->bmAttributes & 0x3;
> @@ -1893,35 +1900,6 @@ static void usb_host_auto_check(void *unused)
>  timer_mod(usb_auto_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 2000);
>  }
>  
> -/**
> - * Check whether USB host device has a USB mass storage SCSI interface
> - */
> -bool

Re: [PATCH qemu] hw/net/vmxnet3: Remove g_assert_not_reached() when VMXNET3_REG_ICR is written

2021-06-22 Thread Jason Wang




On 2021/6/23 10:26 AM, Qiang Liu wrote:

From: cyruscyliu 

A malicious guest user can write VMXNET3_REG_ICR to crash QEMU. This
patch removes the g_assert_not_reached() there and lets the access pass.

Fixes: 786fd2b0f87 ("VMXNET3 device implementation")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/309
Buglink: https://bugs.launchpad.net/qemu/+bug/1913923

Signed-off-by: Qiang Liu 



Do we need to warn about the unimplemented register?

Thanks



---
  hw/net/vmxnet3.c | 7 ---
  1 file changed, 7 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index eff299f629..a388918479 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1786,13 +1786,6 @@ vmxnet3_io_bar1_write(void *opaque,
  vmxnet3_set_variable_mac(s, val, s->temp_mac);
  break;

-/* Interrupt Cause Register */
-case VMXNET3_REG_ICR:
-VMW_CBPRN("Write BAR1 [VMXNET3_REG_ICR] = %" PRIx64 ", size %d",
-  val, size);
-g_assert_not_reached();
-break;
-
  /* Event Cause Register */
  case VMXNET3_REG_ECR:
  VMW_CBPRN("Write BAR1 [VMXNET3_REG_ECR] = %" PRIx64 ", size %d",
--
2.30.2

Re: [PATCH v5 0/2] target/s390x: Fix SIGILL/SIGFPE/SIGTRAP psw.addr reporting

2021-06-22 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20210623023250.3667563-1-...@linux.ibm.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210623023250.3667563-1-...@linux.ibm.com
Subject: [PATCH v5 0/2] target/s390x: Fix SIGILL/SIGFPE/SIGTRAP psw.addr reporting

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  
patchew/20210621141452.2045-1-jonathan.albre...@linux.vnet.ibm.com -> 
patchew/20210621141452.2045-1-jonathan.albre...@linux.vnet.ibm.com
 * [new tag] patchew/20210623023250.3667563-1-...@linux.ibm.com -> 
patchew/20210623023250.3667563-1-...@linux.ibm.com
Switched to a new branch 'test'
56bc4f3 tests/tcg/s390x: Test SIGILL and SIGSEGV handling
b6b6d39 target/s390x: Fix SIGILL/SIGFPE/SIGTRAP psw.addr reporting

=== OUTPUT BEGIN ===
1/2 Checking commit b6b6d3978456 (target/s390x: Fix SIGILL/SIGFPE/SIGTRAP psw.addr reporting)
2/2 Checking commit 56bc4f3bb893 (tests/tcg/s390x: Test SIGILL and SIGSEGV handling)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#26: 
new file mode 100644

ERROR: externs should be avoided in .c files
#44: FILE: tests/tcg/s390x/signal.c:14:
+void illegal_op(void);

ERROR: externs should be avoided in .c files
#45: FILE: tests/tcg/s390x/signal.c:15:
+void after_illegal_op(void);

ERROR: externs should be avoided in .c files
#51: FILE: tests/tcg/s390x/signal.c:21:
+void stg(void *dst, unsigned long src);

ERROR: externs should be avoided in .c files
#56: FILE: tests/tcg/s390x/signal.c:26:
+void mvc_8(void *dst, void *src);

total: 4 errors, 1 warnings, 169 lines checked

Patch 2/2 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210623023250.3667563-1-...@linux.ibm.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH] hw/arm/boot: Use NUMA node ID in memory node name

2021-06-22 Thread Gavin Shan


Hi Drew,

On 6/22/21 5:13 PM, Andrew Jones wrote:

On Tue, Jun 22, 2021 at 06:53:41PM +1000, Gavin Shan wrote:

On 6/3/21 2:48 PM, Gavin Shan wrote:

On 6/2/21 9:36 PM, Andrew Jones wrote:

On Wed, Jun 02, 2021 at 11:09:32AM +1000, Gavin Shan wrote:

On 6/1/21 5:50 PM, Andrew Jones wrote:

On Tue, Jun 01, 2021 at 03:30:04PM +0800, Gavin Shan wrote:

We possibly populate empty nodes where memory isn't included and might
be hot added at a later time. The FDT memory nodes can't be created due
to conflicts on their names if multiple empty nodes are specified.
For example, the VM fails to start with the following error messages.

     /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64  \
     -accel kvm -machine virt,gic-version=host    \
     -cpu host -smp 4,sockets=2,cores=2,threads=1 -m 1024M,maxmem=64G \
     -object memory-backend-ram,id=mem0,size=512M \
     -object memory-backend-ram,id=mem1,size=512M \
     -numa node,nodeid=0,cpus=0-1,memdev=mem0 \
     -numa node,nodeid=1,cpus=2-3,memdev=mem1 \
     -numa node,nodeid=2  \
     -numa node,nodeid=3  \
   :
     -device virtio-balloon-pci,id=balloon0,free-page-reporting=yes

     qemu-system-aarch64: FDT: Failed to create subnode /memory@80000000: \
  FDT_ERR_EXISTS

This fixes the issue by using NUMA node ID or zero in the memory node
name to avoid the conflicting memory node names. With this applied, the
VM can boot successfully with above command lines.

Signed-off-by: Gavin Shan 
---
    hw/arm/boot.c | 7 ++-
    1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index d7b059225e..3169bdf595 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -432,7 +432,12 @@ static int fdt_add_memory_node(void *fdt, uint32_t acells, hwaddr mem_base,
    char *nodename;
    int ret;
-    nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
+    if (numa_node_id >= 0) {
+    nodename = g_strdup_printf("/memory@%d", numa_node_id);
+    } else {
+    nodename = g_strdup("/memory@0");
+    }
+
    qemu_fdt_add_subnode(fdt, nodename);
    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
    ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, mem_base,


[...]



I've sent a separate mail to check with Rob Herring. Hopefully he has
ideas, as he maintains the Linux FDT subsystem. You have been included in
that thread. I didn't find anything relevant to this thread after doing
some Google searches either.

Yes, I agree with you that we need to follow the specification strictly. It
seems the 'physical memory map' bus binding requirements are unclear.



I didn't get expected answers from device-tree experts.


What response did you get? Can you please provide a link to the discussion?



Sorry about the confusion. I meant "no response" by "expected answers". Here
is the mail sent to Rob before, no reply so far:

https://lkml.org/lkml/2021/6/2/1446


After rethinking it, I plan to fix this in the following way, but please let
me know if it sounds sensible to you.

The idea is to assign a (non-overlapping) dummy base address to each memory
node in the device-tree. The dummy is (last_valid_memory_address + NUMA ID).
The 'length' of the 'reg' property in the device-tree nodes, corresponding
to empty NUMA nodes, is still zero. This ensures the nodes are still invalid
until memory is added to these nodes.

I had the temporary patch for the implementation. It works fine and VM can
boot up successfully.
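
To make the naming idea concrete, here is a minimal sketch in plain C. It is not the temporary patch mentioned above and not QEMU code: `memory_node_name` and all of its parameters are hypothetical, and it assumes "last_valid_memory_address" means the last byte of populated RAM.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not QEMU code): build the FDT memory node name for
 * one NUMA node. Populated nodes keep their real base address. Empty nodes
 * (size == 0) get the dummy base last_valid_addr + node_id, so two empty
 * nodes never produce the same unit address.
 */
static int memory_node_name(char *buf, size_t len, uint64_t base,
                            uint64_t size, uint64_t last_valid_addr,
                            int node_id)
{
    uint64_t unit = size ? base : last_valid_addr + (uint64_t)node_id;

    return snprintf(buf, len, "/memory@%" PRIx64, unit);
}
```

With RAM ending at 0x7fffffff, empty nodes 2 and 3 would be named /memory@80000001 and /memory@80000002 instead of colliding on one name. The 'reg' length of such nodes stays zero, as described above.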


I would like to be sure that the kernel developers for NUMA, memory
hotplug, and devicetree specifications are all happy with the approach
before adding it to QEMU.



As I understand it, it won't break anything from the perspective of the NUMA
and device-tree specifications. First of all, NUMA cares about the so-called
distance map and the 'numa-node-id' property in the individual device-tree
nodes. The device-tree specification doesn't say that 'length' in the 'reg'
property of a memory node can't be zero. Also, the Linux device-tree
implementation checks 'length': the memory block won't be
added if it's zero.

Documentation/devicetree/bindings/numa.txt has more details about
the required device-tree NUMA properties.

I'm not sure about memory hotplug. I tried memory hot-add and it seems
to work fine. Memory hot-add/remove work without issues:

/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
-accel kvm -machine virt,gic-version=host   \
-cpu host -smp 4,sockets=2,cores=2,threads=1\
-m 4096M,slots=16,maxmem=64G\
-object memory-backend-ram,id=mem0,size=2048M   \
-object memory-backend-ram,id=mem1,size=2048M   \
-numa node,nodeid=0,cpus=0-1,memdev=mem0\
-numa

[PATCH v5 1/2] target/s390x: Fix SIGILL/SIGFPE/SIGTRAP psw.addr reporting

2021-06-22 Thread Ilya Leoshkevich

For SIGILL, SIGFPE and SIGTRAP the PSW must point after the
instruction, and at the instruction for other signals. Currently under
qemu-user it always points at the instruction.

Fix by advancing psw.addr for these signals.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/319
Signed-off-by: Ilya Leoshkevich 
Co-developed-by: Ulrich Weigand 
---
 linux-user/s390x/cpu_loop.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/linux-user/s390x/cpu_loop.c b/linux-user/s390x/cpu_loop.c
index 30568139df..230217feeb 100644
--- a/linux-user/s390x/cpu_loop.c
+++ b/linux-user/s390x/cpu_loop.c
@@ -133,6 +133,11 @@ void cpu_loop(CPUS390XState *env)
 
 do_signal_pc:
 addr = env->psw.addr;
+/*
+ * For SIGILL, SIGFPE and SIGTRAP the PSW must point after the
+ * instruction.
+ */
+env->psw.addr += env->int_pgm_ilen;
 do_signal:
 info.si_signo = sig;
 info.si_errno = 0;
-- 
2.31.1
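
For readers following the thread, the rule from the commit message above can be sketched in plain C. This is only an illustration under stated assumptions, not the QEMU implementation: `reported_psw_addr` is a hypothetical name, `insn_addr` stands for the address of the faulting instruction and `ilen` for its length in bytes.

```c
#include <signal.h>
#include <stdint.h>

/*
 * Hypothetical model (not QEMU code): which PSW address a signal handler
 * should see. For SIGILL, SIGFPE and SIGTRAP the PSW points after the
 * instruction; for every other signal it points at the instruction.
 */
static uint64_t reported_psw_addr(int sig, uint64_t insn_addr, unsigned ilen)
{
    if (sig == SIGILL || sig == SIGFPE || sig == SIGTRAP) {
        return insn_addr + ilen;
    }
    return insn_addr;
}
```

For a 2-byte illegal opcode at address A, a SIGILL handler would see A + 2, while a SIGSEGV raised by an instruction at A would still report A.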

[PATCH v5 2/2] tests/tcg/s390x: Test SIGILL and SIGSEGV handling

2021-06-22 Thread Ilya Leoshkevich

Verify that s390x-specific uc_mcontext.psw.addr is reported correctly.

Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/s390x/Makefile.target |   1 +
 tests/tcg/s390x/signal.c| 165 
 2 files changed, 166 insertions(+)
 create mode 100644 tests/tcg/s390x/signal.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 241ef28f61..cdb7d85316 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -8,3 +8,4 @@ TESTS+=exrl-trtr
 TESTS+=pack
 TESTS+=mvo
 TESTS+=mvc
+TESTS+=signal
diff --git a/tests/tcg/s390x/signal.c b/tests/tcg/s390x/signal.c
new file mode 100644
index 00..37c6665075
--- /dev/null
+++ b/tests/tcg/s390x/signal.c
@@ -0,0 +1,165 @@
+#include <assert.h>
+#include <signal.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <ucontext.h>
+#include <unistd.h>
+
+/*
+ * Various instructions that generate SIGILL and SIGSEGV. They could have been
+ * defined in a separate .s file, but this would complicate the build, so the
+ * inline asm is used instead.
+ */
+
+void illegal_op(void);
+void after_illegal_op(void);
+asm(".globl\tillegal_op\n"
+"illegal_op:\t.byte\t0x00,0x00\n"
+"\t.globl\tafter_illegal_op\n"
+"after_illegal_op:\tbr\t%r14");
+
+void stg(void *dst, unsigned long src);
+asm(".globl\tstg\n"
+"stg:\tstg\t%r3,0(%r2)\n"
+"\tbr\t%r14");
+
+void mvc_8(void *dst, void *src);
+asm(".globl\tmvc_8\n"
+"mvc_8:\tmvc\t0(8,%r2),0(%r3)\n"
+"\tbr\t%r14");
+
+static void safe_puts(const char *s)
+{
+write(0, s, strlen(s));
+write(0, "\n", 1);
+}
+
+enum exception {
+exception_operation,
+exception_translation,
+exception_protection,
+};
+
+static struct {
+int sig;
+void *addr;
+unsigned long psw_addr;
+enum exception exception;
+} expected;
+
+static void handle_signal(int sig, siginfo_t *info, void *ucontext)
+{
+void *page;
+int err;
+
+if (sig != expected.sig) {
+safe_puts("[  FAILED  ] wrong signal");
+_exit(1);
+}
+
+if (info->si_addr != expected.addr) {
+safe_puts("[  FAILED  ] wrong si_addr");
+_exit(1);
+}
+
+if (((ucontext_t *)ucontext)->uc_mcontext.psw.addr != expected.psw_addr) {
+safe_puts("[  FAILED  ] wrong psw.addr");
+_exit(1);
+}
+
+switch (expected.exception) {
+case exception_translation:
+page = mmap(expected.addr, 4096, PROT_READ | PROT_WRITE,
+MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+if (page != expected.addr) {
+safe_puts("[  FAILED  ] mmap() failed");
+_exit(1);
+}
+break;
+case exception_protection:
+err = mprotect(expected.addr, 4096, PROT_READ | PROT_WRITE);
+if (err != 0) {
+safe_puts("[  FAILED  ] mprotect() failed");
+_exit(1);
+}
+break;
+default:
+break;
+}
+}
+
+static void check_sigsegv(void *func, enum exception exception,
+  unsigned long val)
+{
+int prot;
+unsigned long *page;
+unsigned long *addr;
+int err;
+
+prot = exception == exception_translation ? PROT_NONE : PROT_READ;
+page = mmap(NULL, 4096, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+assert(page != MAP_FAILED);
+if (exception == exception_translation) {
+/* Hopefully nothing will be mapped at this address. */
+err = munmap(page, 4096);
+assert(err == 0);
+}
+addr = page + (val & 0x1ff);
+
+expected.sig = SIGSEGV;
+expected.addr = page;
+expected.psw_addr = (unsigned long)func;
+expected.exception = exception;
+if (func == stg) {
+stg(addr, val);
+} else {
+assert(func == mvc_8);
+mvc_8(addr, &val);
+}
+assert(*addr == val);
+
+err = munmap(page, 4096);
+assert(err == 0);
+}
+
+int main(void)
+{
+struct sigaction act;
+int err;
+
+memset(&act, 0, sizeof(act));
+act.sa_sigaction = handle_signal;
+act.sa_flags = SA_SIGINFO;
+err = sigaction(SIGILL, &act, NULL);
+assert(err == 0);
+err = sigaction(SIGSEGV, &act, NULL);
+assert(err == 0);
+
+safe_puts("[ RUN  ] Operation exception");
+expected.sig = SIGILL;
+expected.addr = illegal_op;
+expected.psw_addr = (unsigned long)after_illegal_op;
+expected.exception = exception_operation;
+illegal_op();
+safe_puts("[   OK ]");
+
+safe_puts("[ RUN  ] Translation exception from stg");
+check_sigsegv(stg, exception_translation, 42);
+safe_puts("[   OK ]");
+
+safe_puts("[ RUN  ] Translation exception from mvc");
+check_sigsegv(mvc_8, exception_translation, 4242);
+safe_puts("[   OK ]");
+
+safe_puts("[ RUN  ] Protection exception from stg");
+check_sigsegv(stg, exception_protection, 424242);
+safe_puts("[   OK ]");
+
+safe_puts("[ RUN  ] Protection exception from mvc");
+check_sigsegv(mvc_8, exception_protection, 42424242);
+safe_puts("[

[PATCH v5 0/2] target/s390x: Fix SIGILL/SIGFPE/SIGTRAP psw.addr reporting

2021-06-22 Thread Ilya Leoshkevich

qemu-s390x puts a wrong value into SIGILL's siginfo_t's psw.addr: it
should be a pointer to the instruction following the illegal
instruction, but at the moment it is a pointer to the illegal
instruction itself. This breaks OpenJDK, which relies on this value.
A similar problem exists for SIGFPE and SIGTRAP.

Patch 1 fixes the issue, patch 2 adds a test.

v1: https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06592.html
v1 -> v2: Use a better buglink (Cornelia), simplify the inline asm
  magic in the test and add an explanation (David).

v2: https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06649.html
v2 -> v3: Fix SIGSEGV handling (found when trying to run valgrind under
  qemu-user).

v3: https://lists.nongnu.org/archive/html/qemu-devel/2021-06/msg00299.html
v3 -> v4: Fix compiling the test on Ubuntu 20.04 (Jonathan).

v4: https://lists.nongnu.org/archive/html/qemu-devel/2021-06/msg05848.html
v4 -> v5: Greatly simplify the fix (Ulrich).

Note: the compare-and-trap SIGFPE issue is being fixed separately.
https://lists.nongnu.org/archive/html/qemu-devel/2021-06/msg05690.html

Ilya Leoshkevich (2):
  target/s390x: Fix SIGILL/SIGFPE/SIGTRAP psw.addr reporting
  tests/tcg/s390x: Test SIGILL and SIGSEGV handling

 linux-user/s390x/cpu_loop.c |   5 +
 tests/tcg/s390x/Makefile.target |   1 +
 tests/tcg/s390x/signal.c| 165 
 3 files changed, 171 insertions(+)
 create mode 100644 tests/tcg/s390x/signal.c

-- 
2.31.1

Re: [PATCH 1/2] linux-user/s390x: signal with SIGFPE on compare-and-trap

2021-06-22 Thread Ilya Leoshkevich

On Mon, 2021-06-21 at 10:14 -0400, Jonathan Albrecht wrote:
> Currently when a compare-and-trap instruction is executed, qemu will
> always raise a SIGILL signal. On real hardware, a SIGFPE is raised.
> 
> Change the PGM_DATA case in cpu_loop to follow the behavior in
> linux kernel /arch/s390/kernel/traps.c.
>  * Only raise SIGILL if DXC == 0
>  * If DXC matches an IEEE exception, raise SIGFPE with correct si_code
>  * Raise SIGFPE with si_code == 0 for everything else
> 
> When applied on 20210602002210.3144559-2-...@linux.ibm.com, this fixes
> crashes in the java jdk such as the linked bug.
> 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1920913
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/319
> Signed-off-by: Jonathan Albrecht <
> jonathan.albre...@linux.vnet.ibm.com>
> ---
>  linux-user/s390x/cpu_loop.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)

I tried this on top of my SIGILL patch to run Maven, it worked without
issues.

Acked-by: Ilya Leoshkevich 
Tested-by: Ilya Leoshkevich
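
The three DXC rules quoted above can be sketched as a small classifier. This is a simplified model, not the patch itself: `classify_dxc` and `struct sig_choice` are hypothetical names, and the bit layout (IEEE exceptions have the two low DXC bits clear, with 0x80 invalid, 0x40 divide, 0x20 overflow, 0x10 underflow, 0x08 inexact) is my reading of the kernel's traps.c and should be checked against it. The si_code for SIGILL is not modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

struct sig_choice {
    int signo;
    int si_code;
};

/* Hypothetical helper: map an 8-bit Data Exception Code to a signal. */
static struct sig_choice classify_dxc(unsigned dxc)
{
    struct sig_choice c = { SIGFPE, 0 };

    if (dxc == 0) {
        c.signo = SIGILL;          /* only DXC == 0 yields SIGILL */
        return c;
    }
    if ((dxc & 0x03) == 0) {       /* assumed: IEEE exception encoding */
        if (dxc & 0x80) {
            c.si_code = FPE_FLTINV;
        } else if (dxc & 0x40) {
            c.si_code = FPE_FLTDIV;
        } else if (dxc & 0x20) {
            c.si_code = FPE_FLTOVF;
        } else if (dxc & 0x10) {
            c.si_code = FPE_FLTUND;
        } else if (dxc & 0x08) {
            c.si_code = FPE_FLTRES;
        }
    }
    return c;                      /* everything else: SIGFPE, si_code 0 */
}
```

Under these assumptions a DXC of 0xff, which has its low bits set, falls through to SIGFPE with si_code == 0.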

[PATCH qemu] hw/net/vmxnet3: Remove g_assert_not_reached() when VMXNET3_REG_ICR is written

2021-06-22 Thread Qiang Liu

From: cyruscyliu 

A malicious guest user can write VMXNET3_REG_ICR to crash QEMU. This
patch removes the g_assert_not_reached() there and lets the access pass.

Fixes: 786fd2b0f87 ("VMXNET3 device implementation")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/309
Buglink: https://bugs.launchpad.net/qemu/+bug/1913923

Signed-off-by: Qiang Liu 
---
 hw/net/vmxnet3.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index eff299f629..a388918479 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1786,13 +1786,6 @@ vmxnet3_io_bar1_write(void *opaque,
 vmxnet3_set_variable_mac(s, val, s->temp_mac);
 break;

-/* Interrupt Cause Register */
-case VMXNET3_REG_ICR:
-VMW_CBPRN("Write BAR1 [VMXNET3_REG_ICR] = %" PRIx64 ", size %d",
-  val, size);
-g_assert_not_reached();
-break;
-
 /* Event Cause Register */
 case VMXNET3_REG_ECR:
 VMW_CBPRN("Write BAR1 [VMXNET3_REG_ECR] = %" PRIx64 ", size %d",
--
2.30.2

[v3] migration: fix the memory overwriting risk in add_to_iovec

2021-06-22 Thread Lin Feng

From: Feng Lin 

When testing migration, QEMU crashed with a segmentation fault and dumped core:
0  error_free (err=0x1)
1  0x7f8b862df647 in qemu_fclose (f=f@entry=0x55e06c247640)
2  0x7f8b8516d59a in migrate_fd_cleanup (s=s@entry=0x55e06c0e1ef0)
3  0x7f8b8516d66c in migrate_fd_cleanup_bh (opaque=0x55e06c0e1ef0)
4  0x7f8b8626a47f in aio_bh_poll (ctx=ctx@entry=0x55e06b5a16d0)
5  0x7f8b8626e71f in aio_dispatch (ctx=0x55e06b5a16d0)
6  0x7f8b8626a33d in aio_ctx_dispatch (source=, 
callback=, user_data=)
7  0x7f8b866bdba4 in g_main_context_dispatch ()
8  0x7f8b8626cde9 in glib_pollfds_poll ()
9  0x7f8b8626ce62 in os_host_main_loop_wait (timeout=)
10 0x7f8b8626cffd in main_loop_wait (nonblocking=nonblocking@entry=0)
11 0x7f8b862ef01f in main_loop ()
Printing the struct QEMUFile with gdb shows f = {
  ...,
  iovcnt = 65, last_error = 21984,
  last_error_obj = 0x1, shutdown = true
}
So iovcnt has overflowed, because MAX_IOV_SIZE is 64.
struct QEMUFile {
...;
struct iovec iov[MAX_IOV_SIZE];
unsigned int iovcnt;
int last_error;
Error *last_error_obj;
bool shutdown;
};
iovcnt and last_error are overwritten by add_to_iovec().
Right now, add_to_iovec() increases iovcnt before checking the limit.
It seems that add_to_iovec() assumes that iovcnt will be reset to zero
in qemu_fflush(), but qemu_fflush() returns immediately when f->shutdown
is true.

This situation may occur when libvirtd restarts during migration, after
f->shutdown is set and before qemu_file_set_error() is called in
qemu_file_shutdown().

So the safest way is to check iovcnt before increasing it.

Signed-off-by: Feng Lin 
---
 migration/qemu-file.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index d6e03dbc0e..f6486cf7bc 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -416,6 +416,9 @@ static int add_to_iovec(QEMUFile *f, const uint8_t *buf, size_t size,
 {
 f->iov[f->iovcnt - 1].iov_len += size;
 } else {
+if (f->iovcnt >= MAX_IOV_SIZE) {
+goto fflush;
+}
 if (may_free) {
 set_bit(f->iovcnt, f->may_free);
 }
@@ -423,12 +426,12 @@ static int add_to_iovec(QEMUFile *f, const uint8_t *buf, size_t size,
 f->iov[f->iovcnt++].iov_len = size;
 }
 
-if (f->iovcnt >= MAX_IOV_SIZE) {
-qemu_fflush(f);
-return 1;
+if (f->iovcnt < MAX_IOV_SIZE) {
+return 0;
 }
-
-return 0;
+fflush:
+qemu_fflush(f);
+return 1;
 }
 
 static void add_buf_to_iovec(QEMUFile *f, size_t len)
-- 
2.23.0
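
The check-before-increment pattern from this patch can be shown on a toy structure. This is a minimal sketch, not QEMU's QEMUFile: `toy_file`, `toy_flush`, `toy_add` and `MAX_SLOTS` are hypothetical stand-ins, and the flush simply resets the counter unless the file is shut down, mirroring the behaviour described in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 4  /* stand-in for MAX_IOV_SIZE */

struct toy_file {
    size_t len[MAX_SLOTS];
    unsigned int cnt;
    bool shutdown;
};

/* Resets the slot counter, but does nothing once the file is shut down. */
static void toy_flush(struct toy_file *f)
{
    if (f->shutdown) {
        return;
    }
    f->cnt = 0;
}

/*
 * Adds one entry. The limit is checked before the array is written, so a
 * flush that could not make room can never lead to an out-of-bounds store.
 * Returns 1 if a flush was attempted, 0 otherwise.
 */
static int toy_add(struct toy_file *f, size_t size)
{
    if (f->cnt >= MAX_SLOTS) {
        toy_flush(f);
        return 1;   /* entry is dropped, as with the "goto fflush" path */
    }
    f->len[f->cnt++] = size;
    if (f->cnt < MAX_SLOTS) {
        return 0;
    }
    toy_flush(f);
    return 1;
}
```

With `shutdown` set, repeated calls leave `cnt` pinned at `MAX_SLOTS` instead of walking past the end of the array into the fields that follow it.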

[v2] migration: fix the memory overwriting risk in add_to_iovec

2021-06-22 Thread Lin Feng

From: Feng Lin 

When testing migration, QEMU crashed with a segmentation fault and dumped core:
0  error_free (err=0x1)
1  0x7f8b862df647 in qemu_fclose (f=f@entry=0x55e06c247640)
2  0x7f8b8516d59a in migrate_fd_cleanup (s=s@entry=0x55e06c0e1ef0)
3  0x7f8b8516d66c in migrate_fd_cleanup_bh (opaque=0x55e06c0e1ef0)
4  0x7f8b8626a47f in aio_bh_poll (ctx=ctx@entry=0x55e06b5a16d0)
5  0x7f8b8626e71f in aio_dispatch (ctx=0x55e06b5a16d0)
6  0x7f8b8626a33d in aio_ctx_dispatch (source=, 
callback=, user_data=)
7  0x7f8b866bdba4 in g_main_context_dispatch ()
8  0x7f8b8626cde9 in glib_pollfds_poll ()
9  0x7f8b8626ce62 in os_host_main_loop_wait (timeout=)
10 0x7f8b8626cffd in main_loop_wait (nonblocking=nonblocking@entry=0)
11 0x7f8b862ef01f in main_loop ()
Printing the struct QEMUFile with gdb shows f = {
  ...,
  iovcnt = 65, last_error = 21984,
  last_error_obj = 0x1, shutdown = true
}
So iovcnt has overflowed, because MAX_IOV_SIZE is 64.
struct QEMUFile {
...;
struct iovec iov[MAX_IOV_SIZE];
unsigned int iovcnt;
int last_error;
Error *last_error_obj;
bool shutdown;
};
iovcnt and last_error are overwritten by add_to_iovec().
Right now, add_to_iovec() increases iovcnt before checking the limit.
It seems that add_to_iovec() assumes that iovcnt will be reset to zero
in qemu_fflush(), but qemu_fflush() returns immediately when f->shutdown
is true.

This situation may occur when libvirtd restarts during migration, after
f->shutdown is set and before qemu_file_set_error() is called in
qemu_file_shutdown().

So the safest way is to check iovcnt before increasing it.

Signed-off-by: Feng Lin 
---
 migration/qemu-file.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index d6e03dbc0e..f6486cf7bc 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -416,6 +416,9 @@ static int add_to_iovec(QEMUFile *f, const uint8_t *buf, size_t size,
 {
 f->iov[f->iovcnt - 1].iov_len += size;
 } else {
+if (f->iovcnt >= MAX_IOV_SIZE) {
+goto fflush;
+}
 if (may_free) {
 set_bit(f->iovcnt, f->may_free);
 }
@@ -423,12 +426,12 @@ static int add_to_iovec(QEMUFile *f, const uint8_t *buf, size_t size,
 f->iov[f->iovcnt++].iov_len = size;
 }
 
-if (f->iovcnt >= MAX_IOV_SIZE) {
-qemu_fflush(f);
-return 1;
+if (f->iovcnt < MAX_IOV_SIZE) {
+return 0;
 }
-
-return 0;
+fflush:
+qemu_fflush();
+return 1;
 }
 
 static void add_buf_to_iovec(QEMUFile *f, size_t len)
-- 
2.23.0

Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.

2021-06-22 Thread Jason Wang




On 2021/6/22 5:09 PM, Toke Høiland-Jørgensen wrote:

Daniel P. Berrangé  writes:


On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:

Jason Wang  writes:


On 2021/6/22 11:29 AM, Yuri Benditovich wrote:

On Mon, Jun 21, 2021 at 12:20 PM Jason Wang  wrote:

On 2021/6/19 4:03 AM, Andrew Melnichenko wrote:

Hi Jason,
I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu,
and Debian - no permissions are needed to update BPF maps.

How about RHEL :) ?

If I'm not mistaken, the RHEL releases do not use modern kernels yet
(for BPF we need 5.8+).
So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.

Adding Toke for more ideas on this.

Ignore the kernel version number; we backport all of BPF to RHEL,
basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.

However, we completely disable unprivileged BPF on RHEL kernels. Also,
there's upstream commit:
08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")

which adds a new value of '2' to the unprivileged_bpf_disabled sysctl. I
believe this may end up being the default on Fedora as well.

So any design relying on unprivileged BPF is likely to break; I'd
suggest you look into how you can get this to work with CAP_BPF :)

QEMU will never have any capabilities. Any resources that require
privileges have to be opened by a separate privileged helper, and the
open FD then passed across to the QEMU process. This relies on the
capabilities checks only being performed at time of initial opening,
and *not* on operations performed on the already open FD.

That won't work for regular map updates either, unfortunately: you still
have to perform a bpf() syscall to update an element, and that is a
privileged operation.

You may be able to get around this by using an array map type and
mmap()'ing the map contents, but I'm not sure how well that will work
across process boundaries.

If it doesn't, I only see two possibilities: populate the map
ahead-of-time and leave it in place, or keep the privileged helper
process around to perform map updates on behalf of QEMU...



Right, and this could probably be done by extending and tracking the RSS
update via the rx filter event.


Thanks




-Toke

[PATCH] hw/audio/sb16: Restrict I/O sampling rate range for command 41h/42h

2021-06-22 Thread Qiang Liu

The I/O sampling rate range is enforced to 5000 to 45000 Hz according to
commit a2cd86a9. By setting the I/O sampling rate with command 41h/42h, a guest
user can break this assumption and trigger an assertion in audio_calloc
via command 0xd4. This patch restricts the I/O sampling rate range for
command 41h/42h.

Fixes: 85571bc7415 ("audio merge (malc)")
Signed-off-by: Qiang Liu 
---
 hw/audio/sb16.c  | 31 +++
 tests/qtest/fuzz-sb16-test.c | 17 +
 2 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/hw/audio/sb16.c b/hw/audio/sb16.c
index 5cf121f..60f1f75 100644
--- a/hw/audio/sb16.c
+++ b/hw/audio/sb16.c
@@ -229,6 +229,23 @@ static void continue_dma8 (SB16State *s)
 control (s, 1);
 }

+static inline int restrict_sampling_rate(int freq)
+{
+if (freq < SAMPLE_RATE_MIN) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "sampling range too low: %d, increasing to %u\n",
+  freq, SAMPLE_RATE_MIN);
+return SAMPLE_RATE_MIN;
+} else if (freq > SAMPLE_RATE_MAX) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "sampling range too high: %d, decreasing to %u\n",
+  freq, SAMPLE_RATE_MAX);
+return SAMPLE_RATE_MAX;
+} else {
+return freq;
+}
+}
+
 static void dma_cmd8 (SB16State *s, int mask, int dma_len)
 {
 s->fmt = AUDIO_FORMAT_U8;
@@ -244,17 +261,7 @@ static void dma_cmd8 (SB16State *s, int mask, int
dma_len)
 int tmp = (256 - s->time_const);
 s->freq = (100 + (tmp / 2)) / tmp;
 }
-if (s->freq < SAMPLE_RATE_MIN) {
-qemu_log_mask(LOG_GUEST_ERROR,
-  "sampling range too low: %d, increasing to %u\n",
-  s->freq, SAMPLE_RATE_MIN);
-s->freq = SAMPLE_RATE_MIN;
-} else if (s->freq > SAMPLE_RATE_MAX) {
-qemu_log_mask(LOG_GUEST_ERROR,
-  "sampling range too high: %d, decreasing to %u\n",
-  s->freq, SAMPLE_RATE_MAX);
-s->freq = SAMPLE_RATE_MAX;
-}
+s->freq = restrict_sampling_rate(s->freq);

 if (dma_len != -1) {
 s->block_size = dma_len << s->fmt_stereo;
@@ -768,7 +775,7 @@ static void complete (SB16State *s)
  * and FT2 sets output freq with this (go figure).  Compare:
  *
http://homepages.cae.wisc.edu/~brodskye/sb16doc/sb16doc.html#SamplingRate
  */
-s->freq = dsp_get_hilo (s);
+s->freq = restrict_sampling_rate(dsp_get_hilo(s));
 ldebug ("set freq %d\n", s->freq);
 break;

diff --git a/tests/qtest/fuzz-sb16-test.c b/tests/qtest/fuzz-sb16-test.c
index 51030cd..f47a8bc 100644
--- a/tests/qtest/fuzz-sb16-test.c
+++ b/tests/qtest/fuzz-sb16-test.c
@@ -37,6 +37,22 @@ static void test_fuzz_sb16_0x91(void)
 qtest_quit(s);
 }

+/*
+ * This used to trigger the assert in audio_calloc
+ * through command 0xd4
+ */
+static void test_fuzz_sb16_0xd4(void)
+{
+QTestState *s = qtest_init("-M pc -display none "
+   "-device sb16,audiodev=none "
+   "-audiodev id=none,driver=none");
+qtest_outb(s, 0x22c, 0x41);
+qtest_outb(s, 0x22c, 0x00);
+qtest_outb(s, 0x22c, 0x14);
+qtest_outb(s, 0x22c, 0xd4);
+qtest_quit(s);
+}
+
 int main(int argc, char **argv)
 {
 const char *arch = qtest_get_arch();
@@ -46,6 +62,7 @@ int main(int argc, char **argv)
if (strcmp(arch, "i386") == 0) {
 qtest_add_func("fuzz/test_fuzz_sb16/1c", test_fuzz_sb16_0x1c);
 qtest_add_func("fuzz/test_fuzz_sb16/91", test_fuzz_sb16_0x91);
+qtest_add_func("fuzz/test_fuzz_sb16/d4", test_fuzz_sb16_0xd4);
}

return g_test_run();
--
2.7.4
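
To make the effect of the clamp concrete, here is a standalone sketch that
does not depend on QEMU. The limits are the 5000 and 45000 Hz values named in
the commit message. combine_hilo() and clamp_rate() are hypothetical names,
and the high-byte-first combination is an assumption about how the two data
bytes following command 41h are interpreted.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLE_RATE_MIN 5000
#define SAMPLE_RATE_MAX 45000

/* Assumption: the first data byte is the high byte, the second the low byte. */
static int combine_hilo(uint8_t hi, uint8_t lo)
{
    return (hi << 8) | lo;
}

/* Clamp a guest-supplied rate into the supported range. */
static int clamp_rate(int freq)
{
    int out = freq;

    if (out < SAMPLE_RATE_MIN) {
        out = SAMPLE_RATE_MIN;
    } else if (out > SAMPLE_RATE_MAX) {
        out = SAMPLE_RATE_MAX;
    }
    assert(out >= SAMPLE_RATE_MIN && out <= SAMPLE_RATE_MAX);
    return out;
}
```

Under that assumption, the bytes 0x00 and 0x14 written by the new qtest
combine to 20 Hz, which clamp_rate() raises to 5000 instead of letting it
reach the audio backend unchecked.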

Re: [PATCH v1 1/1] migration: Unregister yank if migration setup fails

2021-06-22 Thread Leonardo Bras Soares Passos

On Tue, Jun 22, 2021 at 2:38 PM Peter Xu  wrote:
[...]
> Yes, looks right to me:
>
> Reviewed-by: Peter Xu 
>
> --
> Peter Xu

Thanks Peter!

RE: [PATCH v5 10/14] target/hexagon: import parser for idef-parser

2021-06-22 Thread Taylor Simpson




> -Original Message-
> From: Alessandro Di Federico 
> Sent: Saturday, June 19, 2021 3:37 AM
> To: qemu-devel@nongnu.org
> Cc: Taylor Simpson ; Brian Cain
> ; bab...@rev.ng; ni...@rev.ng; phi...@redhat.com;
> richard.hender...@linaro.org; Alessandro Di Federico 
> Subject: [PATCH v5 10/14] target/hexagon: import parser for idef-parser
> 
> From: Paolo Montesel 
> 
> Signed-off-by: Alessandro Di Federico 
> Signed-off-by: Paolo Montesel 
> ---
> diff --git a/target/hexagon/idef-parser/parser-helpers.h
> b/target/hexagon/idef-parser/parser-helpers.h
> new file mode 100644
> index 00..fec3ad7819
> --- /dev/null
> +++ b/target/hexagon/idef-parser/parser-helpers.h
> @@ -0,0 +1,347 @@
> +
> +#define OUT_IMPL(c, locp, x)\
> +QEMU_GENERIC(typeof(*x),\
> +(char,  str_print), \
> +(uint64_t,  uint64_print),  \
> +(int,   int_print), \
> +(unsigned,  uint_print),\
> +(HexValue,  rvalue_out),\
> +out_assert  \
> +)(c, locp, x);  \
> +

QEMU_GENERIC has been removed

commit de51d8cbf0f9a9745ac02fb07e02063b7dfe35b9
Author: Richard Henderson 
Date:   Mon Jun 14 16:31:42 2021 -0700

qemu/compiler: Remove QEMU_GENERIC

All previous users now use C11 _Generic.

Signed-off-by: Richard Henderson 
Reviewed-by: Alex Bennée
Message-Id: <20210614233143.1221879-8-richard.hender...@linaro.org>
Signed-off-by: Paolo Bonzini 

You can now write this as

#define OUT_IMPL(c, locp, x)  \
_Generic(*x,  \
char:  str_print, \
uint64_t:  uint64_print,  \
int:   int_print, \
unsigned:  uint_print,\
HexValue:  rvalue_out,\
default: out_assert   \
)(c, locp, x);


Thanks,
Taylor
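
As a self-contained illustration of the C11 _Generic dispatch suggested above,
here is a small sketch. It does not use the idef-parser types; TYPE_NAME and
the name_*() helpers are hypothetical stand-ins for OUT_IMPL and its print
functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-type handlers; each one receives the original pointer. */
static const char *name_char(const char *p)     { (void)p; return "char"; }
static const char *name_u64(const uint64_t *p)  { (void)p; return "uint64_t"; }
static const char *name_int(const int *p)       { (void)p; return "int"; }
static const char *name_other(const void *p)    { (void)p; return "other"; }

/* Select a handler from the pointed-to type, then call it with the pointer.
 * Only the selected function is called, so only its prototype has to
 * accept the argument. */
#define TYPE_NAME(x)            \
    _Generic(*(x),              \
        char:     name_char,    \
        uint64_t: name_u64,     \
        int:      name_int,     \
        default:  name_other    \
    )(x)

static void type_name_demo(void)
{
    int i = 0;
    uint64_t u = 0;
    double d = 0.0;

    assert(strcmp(TYPE_NAME("text"), "char") == 0);
    assert(strcmp(TYPE_NAME(&i), "int") == 0);
    assert(strcmp(TYPE_NAME(&u), "uint64_t") == 0);
    assert(strcmp(TYPE_NAME(&d), "other") == 0);   /* falls to default */
}
```

The default branch plays the role of out_assert in the macro above: it catches
any type that has no explicit association.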

Re: [PATCH RFC 0/6] i386/pc: Fix creation of >= 1Tb guests on AMD systems with IOMMU

2021-06-22 Thread Alex Williamson

On Tue, 22 Jun 2021 16:48:59 +0100
Joao Martins  wrote:

> Hey,
> 
> This series lets QEMU properly spawn i386 guests with >= 1Tb with VFIO,
> particularly when running on AMD systems with an IOMMU.
> 
> Since Linux v5.4, VFIO validates whether the IOVA in DMA_MAP ioctl is valid 
> and it
> will return -EINVAL on those cases. On x86, Intel hosts aren't particularly
> affected by this extra validation. But AMD systems with IOMMU have a hole in
> the 1TB boundary which is *reserved* for HyperTransport I/O addresses located
> here  FD__h - FF__h. See IOMMU manual [1], specifically
> section '2.1.2 IOMMU Logical Topology', Table 3 on what those addresses mean.
> 
> VFIO DMA_MAP calls in this IOVA address range fall through this check and
> hence return -EINVAL, consequently failing the creation of guests bigger
> than 1010G. Example of the failure:
> 
> qemu-system-x86_64: -device vfio-pci,host=:41:10.1,bootindex=-1: 
> VFIO_MAP_DMA: -22
> qemu-system-x86_64: -device vfio-pci,host=:41:10.1,bootindex=-1: vfio 
> :41:10.1: 
>   failed to setup container for group 258: memory listener initialization 
> failed:
>   Region pc.ram: vfio_dma_map(0x55ba53e7a9d0, 0x1, 
> 0xff3000, 0x7ed243e0) = -22 (Invalid argument)
> 
> Prior to v5.4, we could map using these IOVAs *but* that's still not the 
> right thing
> to do and could trigger certain IOMMU events (e.g. INVALID_DEVICE_REQUEST), or
> spurious guest VF failures from the resultant IOMMU target abort (see Errata 
> 1155[2])
> as documented on the links down below.
> 
> This series tries to address that by dealing with this AMD-specific 1Tb hole,
> similarly to how we deal with the 4G hole today in x86 in general. It is split
> as follows:
> 
> * patch 1: initialize the valid IOVA ranges above 4G, adding an iterator
>which gets used too in other parts of pc/acpi besides MR creation. 
> The
>  allowed IOVA *only* changes if it's an AMD host, so no change for
>  Intel. We walk the allowed ranges for memory above 4G, and
>  add a E820_RESERVED type everytime we find a hole (which is at the
>  1TB boundary).
>  
>  NOTE: For purposes of this RFC, I rely on cpuid in hw/i386/pc.c but I
>  understand that it doesn't cover the non-x86 host case running TCG.
> 
>  Additionally, as an alternative to the hardcoded ranges we use today,
>  VFIO could advertise the platform's valid IOVA ranges without
>  requiring a PCI device to be added to the vfio container. That would
>  mean fetching the valid IOVA ranges from VFIO rather than hardcoding
>  them. But sadly, that wouldn't work for older hypervisors.


$ grep -h . /sys/kernel/iommu_groups/*/reserved_regions | sort -u
0xfee0 0xfeef msi
0x00fd 0x00ff reserved

Ideally we might take that into account on all hosts, but of course
then we run into massive compatibility issues when we consider
migration.  We run into similar problems when people try to assign
devices to non-x86 TCG hosts, where the arch doesn't have a natural
memory hole overlapping the msi range.

The issue here is similar to trying to find a set of supported CPU
flags across hosts, QEMU only has visibility to the host where it runs,
an upper level tool needs to be able to pass through information about
compatibility to all possible migration targets.  Towards that end, we
should probably have command line options that either allow to specify
specific usable or reserved GPA address ranges.  For example something
like:
--reserved-mem-ranges=host

Or explicitly:

--reserved-mem-ranges=13G@1010G,1M@4078M
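
To show what such a value could carry, here is a hedged sketch of a parser for
the SIZE@BASE list form used in the example. No such option exists in QEMU
today; the option, the MemRange type and both function names are hypothetical,
and reading each item as "size at base" is an assumption drawn from the
example values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t base;
    uint64_t size;
} MemRange;

/* Parse a decimal number with an optional binary K/M/G/T suffix.
 * On success, stores the value and advances *s past what was consumed. */
static int parse_size(const char **s, uint64_t *out)
{
    char *end;
    unsigned long long v = strtoull(*s, &end, 10);

    if (end == *s) {
        return -1;                      /* no digits at all */
    }
    switch (*end) {
    case 'T': v <<= 10; /* fall through */
    case 'G': v <<= 10; /* fall through */
    case 'M': v <<= 10; /* fall through */
    case 'K': v <<= 10; end++; break;
    default: break;                     /* plain byte count */
    }
    *out = v;
    *s = end;
    return 0;
}

/* Parse "SIZE@BASE[,SIZE@BASE...]". Returns the number of ranges, or -1. */
static int parse_ranges(const char *s, MemRange *r, int max)
{
    int n = 0;

    while (*s) {
        if (n == max) {
            return -1;
        }
        if (parse_size(&s, &r[n].size) || *s++ != '@' ||
            parse_size(&s, &r[n].base)) {
            return -1;
        }
        n++;
        if (*s == ',') {
            s++;
            if (!*s) {
                return -1;              /* trailing comma */
            }
        } else if (*s) {
            return -1;                  /* unexpected character */
        }
    }
    return n;
}
```

With this reading, "1M@4078M" describes a 1 MiB range starting at 0xfee00000.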

> * patch 2 - 5: cover the remaining parts surrounding the memory map,
>  particularly ACPI SRAT ranges, CMOS, hotplug as well as the PCI 64-bit hole.
> 
> * patch 6: Add a machine property which is disabled for older machine types 
> (<=6.0)
>  to keep things as is.
> 
> The 'consequence' of this approach is that we may need more than the default
> phys-bits e.g. a guest with 1024G, will have ~13G be put after the 1TB
> address, consequently needing 41 phys-bits as opposed to the default of 40.
> I can't figure a reasonable way to establish the required phys-bits we
> need for the memory map in a dynamic way, especially considering that
> today there's already a precedent to depend on the user to pick the right 
> value
> of phys-bits (regardless of this series).
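
The phys-bits arithmetic in the paragraph above can be checked with a few
lines of C. This is only a sketch: phys_bits_needed() is a hypothetical
helper, and the "1 TiB plus roughly 13 GiB" top address is taken from the
cover letter's own example rather than from the actual memory map code.

```c
#include <assert.h>
#include <stdint.h>

/* Number of address bits needed to reach every byte below `top`,
 * where `top` is the exclusive end of the guest physical address space. */
static unsigned phys_bits_needed(uint64_t top)
{
    uint64_t last;
    unsigned bits = 0;

    assert(top > 0);
    last = top - 1;                 /* highest address actually used */
    while (last) {
        bits++;
        last >>= 1;
    }
    return bits;
}
```

A map that ends exactly at 1 TiB fits in 40 bits, while one that places about
13 GiB above the 1 TiB boundary needs 41, which matches the numbers quoted
above.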
> 
> Additionally, the reserved region is always added regardless of whether we 
> have
> VFIO devices to cover the VFIO device hotplug case.

Various migration issues as you note later in the series.

> Other options considered:
> 
> a) Consider the reserved range part of RAM, and just marking it as
> E820_RESERVED without SPA allocated for it. So a -m 1024G guest would
> only allocate 1010G of RAM and the

Re: [PATCH v2 1/2] sev/i386: Introduce sev_add_kernel_loader_hashes for measured linux boot

2021-06-22 Thread Connor Kuehl

On 6/21/21 2:05 PM, Dov Murik wrote:
> +static void fill_sev_hash_table_entry(SevHashTableEntry *e, const uint8_t 
> *guid,
> +  const uint8_t *hash, size_t hash_len)
> +{
> +memcpy(e->guid, guid, sizeof(e->guid));
> +e->len = sizeof(*e);
> +memcpy(e->hash, hash, hash_len);

Should this memcpy be constrained to MIN(sizeof(e->hash), hash_len)? Or
perhaps an assert statement since I see below that this function's
caller sets this to HASH_SIZE which is currently == sizeof(e->hash).

Actually, the assert statement would be easier to debug if the input
to this function is ever unexpected, especially since it avoids an
outcome where the input is silently truncated, which is a pitfall that
the memcpy clamping would fall into.

Connor
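
A minimal sketch of the assert variant suggested above. The struct layout here
is a simplified, hypothetical stand-in for SevHashTableEntry, and the 32-byte
hash size is an assumption (a SHA-256 digest), not a value taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_SIZE 32   /* assumption: SHA-256 digest length */

/* Hypothetical, simplified table entry. */
typedef struct {
    uint8_t guid[16];
    uint16_t len;
    uint8_t hash[HASH_SIZE];
} HashEntry;

static void fill_entry(HashEntry *e, const uint8_t *guid,
                       const uint8_t *hash, size_t hash_len)
{
    /* Fail loudly on an unexpected length instead of overflowing
     * e->hash or silently truncating the input. */
    assert(hash_len == sizeof(e->hash));

    memcpy(e->guid, guid, sizeof(e->guid));
    e->len = sizeof(*e);
    memcpy(e->hash, hash, hash_len);
}
```

A caller that ever passes a different hash_len then stops at the assert in a
debug build rather than producing a corrupted or shortened table entry.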

Re: [PATCH v4 6/6] block-copy: atomic .cancelled and .finished fields in BlockCopyCallState

2021-06-22 Thread Emanuele Giuseppe Esposito





On 22/06/2021 12:39, Vladimir Sementsov-Ogievskiy wrote:

22.06.2021 13:20, Paolo Bonzini wrote:

On 22/06/21 11:36, Vladimir Sementsov-Ogievskiy wrote:
It does.  If it returns true, you still want the load of finished to 
happen before the reads that follow.


Hmm.. The worst case if we use just qatomic_read is that assertion 
will not crash when it actually should. That doesn't break the logic. 
But that's not good anyway.


OK, I agree, let's keep it.


You can also have a finished job, but get a stale value for 
error_is_read or ret.  The issue is not in getting the stale value per 
se, but in block_copy_call_status's caller not expecting it.


(I understand you agree, but I guess it can be interesting to learn 
about this too).




Hmm. So, do you mean that we can read ret and error_is_read ONLY after 
explicitly doing load_acquire(finished) and checking that it's true?


That means that we must do it not in assertion (to not be compiled out):

bool finished = load_acquire()

assert(finished);

... read ret and error_is_read ...




If I understand correctly, this was what I was trying to say before: 
maybe it's better that we make sure that @finished is set before reading 
@ret and @error_is_read. And because assert can be disabled, we can do 
like you wrote above.


Anyway, let's wait for Paolo's answer on this. Once this is ready, I will
send v5.


Emanuele
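
The ordering discussed in this thread can be sketched with C11 atomics. This
is an illustration only: it uses <stdatomic.h> in place of QEMU's qatomic
helpers, and the CallState type and call_*() functions are hypothetical, not
the block-copy code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant BlockCopyCallState fields. */
typedef struct {
    int ret;
    bool error_is_read;
    atomic_bool finished;
} CallState;

/* Writer: publish the results, then set the flag with release semantics
 * so the plain stores above cannot be reordered after it. */
static void call_finish(CallState *s, int ret, bool error_is_read)
{
    s->ret = ret;
    s->error_is_read = error_is_read;
    atomic_store_explicit(&s->finished, true, memory_order_release);
}

static bool call_is_finished(CallState *s)
{
    return atomic_load_explicit(&s->finished, memory_order_acquire);
}

/* Reader: do the acquire load in a normal statement, so it is still
 * executed when assert() is compiled out with NDEBUG. */
static int call_status(CallState *s, bool *error_is_read)
{
    bool finished = atomic_load_explicit(&s->finished,
                                         memory_order_acquire);

    assert(finished);
    *error_is_read = s->error_is_read;
    return s->ret;
}
```

The point of keeping the load outside assert() is that the acquire barrier it
provides must be present in every build, not only in debug builds.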

Re: [PATCH v2 2/2] x86/sev: generate SEV kernel loader hashes in x86_load_linux

2021-06-22 Thread Connor Kuehl

On 6/21/21 2:05 PM, Dov Murik wrote:
> If SEV is enabled and a kernel is passed via -kernel, pass the hashes of
> kernel/initrd/cmdline in an encrypted guest page to OVMF for SEV
> measured boot.
> 
> Co-developed-by: James Bottomley 
> Signed-off-by: James Bottomley 
> Signed-off-by: Dov Murik 
> ---
>  hw/i386/x86.c | 25 -
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index ed796fe6ba..5c46463d9f 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -45,6 +45,7 @@
>  #include "hw/i386/fw_cfg.h"
>  #include "hw/intc/i8259.h"
>  #include "hw/rtc/mc146818rtc.h"
> +#include "target/i386/sev_i386.h"
>  
>  #include "hw/acpi/cpu_hotplug.h"
>  #include "hw/irq.h"
> @@ -778,6 +779,7 @@ void x86_load_linux(X86MachineState *x86ms,
>  const char *initrd_filename = machine->initrd_filename;
>  const char *dtb_filename = machine->dtb;
>  const char *kernel_cmdline = machine->kernel_cmdline;
> +KernelLoaderContext kernel_loader_context = {};
>  
>  /* Align to 16 bytes as a paranoia measure */
>  cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
> @@ -924,6 +926,8 @@ void x86_load_linux(X86MachineState *x86ms,
>  fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
>  fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) + 1);
>  fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> +kernel_loader_context.cmdline_data = (char *)kernel_cmdline;
> +kernel_loader_context.cmdline_size = strlen(kernel_cmdline) + 1;

I just wanted to check my understanding: I'm guessing you didn't set
`kernel_loader_context.cmdline_size` to `cmdline_size` (defined above)
so guest owners don't have to be aware of whatever alignment precaution
QEMU takes when producing their own measurement, right?

Otherwise:

Reviewed-by: Connor Kuehl
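
The difference between the two sizes in the question above is easy to see in
isolation. The expressions below are copied from the quoted hunk; the function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size QEMU reserves for the command line, rounded as in the patch
 * context ("Align to 16 bytes as a paranoia measure"). */
static size_t aligned_cmdline_size(const char *cmdline)
{
    assert(cmdline != NULL);
    return (strlen(cmdline) + 16) & ~(size_t)15;
}

/* Size handed to the hashing context: the string plus its NUL byte. */
static size_t hashed_cmdline_size(const char *cmdline)
{
    assert(cmdline != NULL);
    return strlen(cmdline) + 1;
}
```

For "console=ttyS0" (13 characters) the first function gives 16 and the second
gives 14, so a measurement over strlen() + 1 bytes does not depend on QEMU's
internal padding.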

Re: SD/MMC host controller + 64-bit system bus

2021-06-22 Thread Philippe Mathieu-Daudé

Hi Joanne,

On 6/22/21 8:07 PM, Joanne Koong wrote:
> Hello! I noticed that the default SD/MMC host controller only supports a
> 32-bit system bus. Is there a reason 64-bit system buses aren't
> supported by default?

We aim to support the spec v2.00, so this is a bug in the model, 64-bit
system bus should be supported. Do you mind sending a patch?

Thanks,

Phil.

Re: [PATCH v3 03/24] modules: generate modinfo.c

2021-06-22 Thread Jose R. Ziviani

On Fri, Jun 18, 2021 at 06:53:32AM +0200, Gerd Hoffmann wrote:
> Add script to generate C source with a small
> database containing the module meta-data.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  scripts/modinfo-generate.py | 84 +
>  include/qemu/module.h   | 17 
>  softmmu/vl.c|  4 ++
>  util/module.c   | 11 +
>  meson.build | 13 +-
>  5 files changed, 128 insertions(+), 1 deletion(-)
>  create mode 100755 scripts/modinfo-generate.py
> 
> diff --git a/scripts/modinfo-generate.py b/scripts/modinfo-generate.py
> new file mode 100755
> index ..2b925432655a
> --- /dev/null
> +++ b/scripts/modinfo-generate.py
> @@ -0,0 +1,84 @@
> +#!/usr/bin/env python3
> +# -*- coding: utf-8 -*-
> +
> +import os
> +import sys
> +
> +def print_array(name, values):
> +if len(values) == 0:
> +return
> +list = ", ".join(values)
> +print(".%s = ((const char*[]){ %s, NULL })," % (name, list))
> +
> +def parse_line(line):
> +kind = ""
> +data = ""
> +get_kind = False
> +get_data = False
> +for item in line.split():
> +if item == "MODINFO_START":
> +get_kind = True
> +continue
> +if item.startswith("MODINFO_END"):
> +get_data = False
> +continue
> +if get_kind:
> +kind = item
> +get_kind = False
> +get_data = True
> +continue
> +if get_data:
> +data += " " + item
> +continue
> +return (kind, data)
> +
> +def generate(name, lines):
> +arch = ""
> +objs = []
> +deps = []
> +opts = []
> +for line in lines:
> +if line.find("MODINFO_START") != -1:
> +(kind, data) = parse_line(line)
> +if kind == 'obj':
> +objs.append(data)
> +elif kind == 'dep':
> +deps.append(data)
> +elif kind == 'opts':
> +opts.append(data)
> +elif kind == 'arch':
> +arch = data;
> +else:
> +print("unknown:", kind)
> +exit(1)
> +
> +print(".name = \"%s\"," % name)
> +if arch != "":
> +print(".arch = %s," % arch)
> +print_array("objs", objs)
> +print_array("deps", deps)
> +print_array("opts", opts)
> +print("},{");
> +
> +def print_pre():
> +print("/* generated by scripts/modinfo.py */")
> +print("#include \"qemu/osdep.h\"")
> +print("#include \"qemu/module.h\"")
> +print("const QemuModinfo qemu_modinfo[] = {{")
> +
> +def print_post():
> +print("/* end of list */")
> +print("}};")
> +
> +def main(args):
> +print_pre()
> +for modinfo in args:
> +with open(modinfo) as f:
> +lines = f.readlines()
> +print("/* %s */" % modinfo)
> +(basename, ext) = os.path.splitext(modinfo)
> +generate(basename, lines)
> +print_post()

I attached a patch in this message to check if a dependency can be satisfied. 
It will detect the following case:
(in any module)
module_dep("blabla");

[2/5] Generating modinfo.c with a custom command (wrapped by meson to capture 
output)
FAILED: modinfo.c 
/usr/bin/meson --internal exe --capture modinfo.c -- /home/...
Dependency blabla cannot be satisfied
/* generated by scripts/modinfo.py */
#include "qemu/osdep.h"
#include "qemu/module.h"
const QemuModinfo qemu_modinfo[] = {{
/* ui-curses.modinfo */
.name = "ui-curses",
...
},{
/* accel-tcg-x86_64.modinfo */
.name = "accel-tcg-x86_64",
.arch =  "x86_64",
.objs = ((const char*[]){  ("tcg" "-" "accel" "-ops"), NULL }),
.deps = ((const char*[]){  "blabla", NULL }),
},{
/* end of list */
}};
ninja: build stopped: subcommand failed.
make: *** [Makefile:154: run-ninja] Error 1

It can help developers to know early they have a typo somewhere.
You can add this code if you like.

> +
> +if __name__ == "__main__":
> +main(sys.argv[1:])
> diff --git a/include/qemu/module.h b/include/qemu/module.h
> index 81ef086da023..a98748d501d3 100644
> --- a/include/qemu/module.h
> +++ b/include/qemu/module.h
> @@ -98,4 +98,21 @@ void module_load_qom_all(void);
>  /* module registers QemuOpts  */
>  #define module_opts(name) modinfo(opts, name)
>  
> +/*
> + * module info database
> + *
> + * scripts/modinfo-generate.c will build this using the data collected
> + * by scripts/modinfo-collect.py
> + */
> +typedef struct QemuModinfo QemuModinfo;
> +struct QemuModinfo {
> +const char *name;
> +const char *arch;
> +const char **objs;
> +const char **deps;
> +const char **opts;
> +};
> +extern const QemuModinfo qemu_modinfo[];
> +void module_init_info(const QemuModinfo *info);
> +
>  #endif
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 326c1e908008..a4857ec43ff3 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2740,6 +2740,10 @@ void

Re: [PATCH] target/mips: fix emulation of nanoMIPS BPOSGE32 instruction

2021-06-22 Thread Philippe Mathieu-Daudé

Hi Filip and Aleksandar,

On 6/15/21 7:33 PM, Philippe Mathieu-Daudé wrote:
> On 6/15/21 7:22 PM, Aleksandar Rikalo wrote:
>> Per the "MIPS® Architecture Extension: nanoMIPS32 DSP Technical
>> Reference Manual — Revision 0.04" p. 88 "BPOSGE32C", offset argument (imm)
>> should be left-shifted first.
>> This change was tested against test_dsp_r1_bposge32.c DSP test.
> 
> Thank you! Could you add a job to run these tests on the mainstream CI?
> You simply need to open a GitLab account, add your job (probably in
> .gitlab-ci.d/buildtest.yml) and push your branch to test it.

One week passed, so I'll proceed with the MIPS pull request without
these tests. If you want to send them later, they are still welcomed!

Regards,

Phil.

>> Reviewed-by: Aleksandar Rikalo 
>> Signed-off-by: Filip Vidojevic 
> 
> Reviewed-by: Philippe Mathieu-Daudé 
> 
>> ---
>>  target/mips/tcg/translate.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
>> index 797eba4434..2d0a723061 100644
>> --- a/target/mips/tcg/translate.c
>> +++ b/target/mips/tcg/translate.c
>> @@ -21137,7 +21137,7 @@ static int
>> decode_nanomips_32_48_opc(CPUMIPSState *env, DisasContext *ctx)
>>                                        extract32(ctx->opcode, 0, 1) << 13;
>>  
>>                          gen_compute_branch_nm(ctx, OPC_BPOSGE32, 4, -1, -2,
>> -                                              imm);
>> +                                              imm << 1);
>>                      }
>>                      break;
>>                  default:
>> -- 
>> 2.25.1
>

Re: [PATCH 0/1] Add features and cpu models

2021-06-22 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20210622201923.150205-1-borntrae...@de.ibm.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210622201923.150205-1-borntrae...@de.ibm.com
Subject: [PATCH 0/1] Add features and cpu models

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
b71b2c5 s390x/cpumodel: add 3931 and 3932

=== OUTPUT BEGIN ===
ERROR: line over 90 characters
#35: FILE: target/s390x/cpu_features_def.h.inc:113:
+DEF_FEAT(VECTOR_PACKED_DECIMAL_ENH2, "vxpdeh2", STFL, 192, 
"Vector-Packed-Decimal-Enhancement facility 2")

WARNING: line over 80 characters
#38: FILE: target/s390x/cpu_features_def.h.inc:116:
+DEF_FEAT(ACTIVITY, "activity", STFL, 196, "Processor-Activity-Instrumentation 
facility")

WARNING: line over 80 characters
#59: FILE: target/s390x/cpu_models.c:817:
+{ S390_FEAT_VECTOR_PACKED_DECIMAL_ENH, S390_FEAT_VECTOR_PACKED_DECIMAL 
},

WARNING: line over 80 characters
#60: FILE: target/s390x/cpu_models.c:818:
+{ S390_FEAT_VECTOR_PACKED_DECIMAL_ENH2, 
S390_FEAT_VECTOR_PACKED_DECIMAL_ENH },

total: 1 errors, 3 warnings, 73 lines checked

Commit b71b2c5056b7 (s390x/cpumodel: add 3931 and 3932) has style problems, 
please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210622201923.150205-1-borntrae...@de.ibm.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v2] coreaudio: Lock only the buffer

2021-06-22 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20210622201740.38005-1-akihiko.od...@gmail.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210622201740.38005-1-akihiko.od...@gmail.com
Subject: [PATCH v2] coreaudio: Lock only the buffer

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20210608040532.56449-1-ma.mando...@gmail.com -> 
patchew/20210608040532.56449-1-ma.mando...@gmail.com
 - [tag update]  patchew/20210621103337.36637-1-eespo...@redhat.com -> 
patchew/20210621103337.36637-1-eespo...@redhat.com
 * [new tag] patchew/20210622201740.38005-1-akihiko.od...@gmail.com -> 
patchew/20210622201740.38005-1-akihiko.od...@gmail.com
 * [new tag] patchew/20210622201923.150205-1-borntrae...@de.ibm.com -> 
patchew/20210622201923.150205-1-borntrae...@de.ibm.com
Switched to a new branch 'test'
6c7537d coreaudio: Lock only the buffer

=== OUTPUT BEGIN ===
ERROR: space prohibited between function name and open parenthesis '('
#58: FILE: audio/coreaudio.c:245:
+static int coreaudio_buf_lock (coreaudioVoiceOut *core, const char *fn_name)

ERROR: space prohibited between function name and open parenthesis '('
#63: FILE: audio/coreaudio.c:249:
+err = pthread_mutex_lock (&core->buf_mutex);

ERROR: space prohibited between function name and open parenthesis '('
#72: FILE: audio/coreaudio.c:258:
+static int coreaudio_buf_unlock (coreaudioVoiceOut *core, const char *fn_name)

ERROR: space prohibited between function name and open parenthesis '('
#77: FILE: audio/coreaudio.c:262:
+err = pthread_mutex_unlock (&core->buf_mutex);

ERROR: space prohibited between function name and open parenthesis '('
#114: FILE: audio/coreaudio.c:314:
+if (coreaudio_buf_lock (core, "audioDeviceIOProc")) {

ERROR: space prohibited between function name and open parenthesis '('
#121: FILE: audio/coreaudio.c:320:
+coreaudio_buf_unlock (core, "audioDeviceIOProc(old device)");

ERROR: space prohibited between function name and open parenthesis '('
#130: FILE: audio/coreaudio.c:330:
+coreaudio_buf_unlock (core, "audioDeviceIOProc(empty)");

ERROR: space prohibited between function name and open parenthesis '('
#139: FILE: audio/coreaudio.c:352:
+coreaudio_buf_unlock (core, "audioDeviceIOProc");

total: 8 errors, 0 warnings, 237 lines checked

Commit 6c7537d704dd (coreaudio: Lock only the buffer) has style problems, 
please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210622201740.38005-1-akihiko.od...@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH 1/1] s390x/cpumodel: add 3931 and 3932

2021-06-22 Thread Christian Borntraeger

This defines 5 new facilities and the new 3931 and 3932 machines.
As before, the names are not yet known, so we use gen16a and gen16b.
The new features are part of the full model.

The default model is still empty (same as z15) and will be added
in a separate patch at a later point in time.

Also add the dependencies of the new facilities and, as a fix for z15, add
a dependency from S390_FEAT_VECTOR_PACKED_DECIMAL_ENH to
S390_FEAT_VECTOR_PACKED_DECIMAL.

Signed-off-by: Christian Borntraeger 
---
 target/s390x/cpu_features_def.h.inc |  5 +
 target/s390x/cpu_models.c   |  6 ++
 target/s390x/gen-features.c | 14 ++
 3 files changed, 25 insertions(+)

diff --git a/target/s390x/cpu_features_def.h.inc 
b/target/s390x/cpu_features_def.h.inc
index 7db3449e0434..c71caee74411 100644
--- a/target/s390x/cpu_features_def.h.inc
+++ b/target/s390x/cpu_features_def.h.inc
@@ -109,6 +109,11 @@ DEF_FEAT(VECTOR_PACKED_DECIMAL_ENH, "vxpdeh", STFL, 152, 
"Vector-Packed-Decimal-
 DEF_FEAT(MSA_EXT_9, "msa9-base", STFL, 155, 
"Message-security-assist-extension-9 facility (excluding subfunctions)")
 DEF_FEAT(ETOKEN, "etoken", STFL, 156, "Etoken facility")
 DEF_FEAT(UNPACK, "unpack", STFL, 161, "Unpack facility")
+DEF_FEAT(NNPA, "nnpa", STFL, 165, "NNPA facility")
+DEF_FEAT(VECTOR_PACKED_DECIMAL_ENH2, "vxpdeh2", STFL, 192, 
"Vector-Packed-Decimal-Enhancement facility 2")
+DEF_FEAT(BEAR, "bear", STFL, 193, "BEAR-enhancement facility")
+DEF_FEAT(RDP, "rdp", STFL, 194, "Reset-DAT-protection facility")
+DEF_FEAT(ACTIVITY, "activity", STFL, 196, "Processor-Activity-Instrumentation 
facility")
 
 /* Features exposed via SCLP SCCB Byte 80 - 98  (bit numbers relative to 
byte-80) */
 DEF_FEAT(SIE_GSLS, "gsls", SCLP_CONF_CHAR, 40, "SIE: 
Guest-storage-limit-suppression facility")
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 94090a6e223d..9699823b2074 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -88,6 +88,8 @@ static S390CPUDef s390_cpu_defs[] = {
 CPUDEF_INIT(0x3907, 14, 1, 47, 0x0800U, "z14ZR1", "IBM z14 Model ZR1 
GA1"),
 CPUDEF_INIT(0x8561, 15, 1, 47, 0x0800U, "gen15a", "IBM z15 T01 GA1"),
 CPUDEF_INIT(0x8562, 15, 1, 47, 0x0800U, "gen15b", "IBM z15 T02 GA1"),
+CPUDEF_INIT(0x3931, 16, 1, 47, 0x0800U, "gen16a", "IBM 3931 GA1"),
+CPUDEF_INIT(0x3932, 16, 1, 47, 0x0800U, "gen16b", "IBM 3932 GA1"),
 };
 
 #define QEMU_MAX_CPU_TYPE 0x3906
@@ -812,6 +814,8 @@ static void check_consistency(const S390CPUModel *model)
 { S390_FEAT_MSA_EXT_9, S390_FEAT_MSA_EXT_4 },
 { S390_FEAT_MULTIPLE_EPOCH, S390_FEAT_TOD_CLOCK_STEERING },
 { S390_FEAT_VECTOR_PACKED_DECIMAL, S390_FEAT_VECTOR },
+{ S390_FEAT_VECTOR_PACKED_DECIMAL_ENH, S390_FEAT_VECTOR_PACKED_DECIMAL 
},
+{ S390_FEAT_VECTOR_PACKED_DECIMAL_ENH2, 
S390_FEAT_VECTOR_PACKED_DECIMAL_ENH },
 { S390_FEAT_VECTOR_ENH, S390_FEAT_VECTOR },
 { S390_FEAT_INSTRUCTION_EXEC_PROT, S390_FEAT_SIDE_EFFECT_ACCESS_ESOP2 
},
 { S390_FEAT_SIDE_EFFECT_ACCESS_ESOP2, S390_FEAT_ESOP },
@@ -843,6 +847,8 @@ static void check_consistency(const S390CPUModel *model)
 { S390_FEAT_PTFF_STOUE, S390_FEAT_MULTIPLE_EPOCH },
 { S390_FEAT_AP_QUEUE_INTERRUPT_CONTROL, S390_FEAT_AP },
 { S390_FEAT_DIAG_318, S390_FEAT_EXTENDED_LENGTH_SCCB },
+{ S390_FEAT_NNPA, S390_FEAT_VECTOR },
+{ S390_FEAT_RDP, S390_FEAT_LOCAL_TLB_CLEARING },
 };
 int i;
 
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 242c95ede48a..a7396d3d5f30 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -424,6 +424,8 @@ static uint16_t base_GEN15_GA1[] = {
 S390_FEAT_MISC_INSTRUCTION_EXT3,
 };
 
+#define base_GEN16_GA1 EmptyFeat
+
 /* Full features (in order of release)
  * Automatically includes corresponding base features.
  * Full features are all features this hardware supports even if kvm/QEMU do 
not
@@ -567,6 +569,15 @@ static uint16_t full_GEN15_GA1[] = {
 S390_FEAT_UNPACK,
 };
 
+static uint16_t full_GEN16_GA1[] = {
+S390_FEAT_NNPA,
+S390_FEAT_VECTOR_PACKED_DECIMAL_ENH2,
+S390_FEAT_BEAR,
+S390_FEAT_RDP,
+S390_FEAT_ACTIVITY,
+};
+
+
 /* Default features (in order of release)
  * Automatically includes corresponding base features.
  * Default features are all features this version of QEMU supports for this
@@ -652,6 +663,8 @@ static uint16_t default_GEN15_GA1[] = {
 S390_FEAT_ETOKEN,
 };
 
+#define default_GEN16_GA1 EmptyFeat
+
 /* QEMU (CPU model) features */
 
 static uint16_t qemu_V2_11[] = {
@@ -785,6 +798,7 @@ static CpuFeatDefSpec CpuFeatDef[] = {
 CPU_FEAT_INITIALIZER(GEN14_GA1),
 CPU_FEAT_INITIALIZER(GEN14_GA2),
 CPU_FEAT_INITIALIZER(GEN15_GA1),
+CPU_FEAT_INITIALIZER(GEN16_GA1),
 };
 
 #define FEAT_GROUP_INITIALIZER(_name)  \
-- 
2.31.1
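
The dependency pairs added to check_consistency() follow a simple "feature
requires feature" table. Here is a standalone sketch of that idea; the enum,
the table and count_missing_deps() are hypothetical and much smaller than the
real s390x feature set, which uses its own bitmap type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature feature set. */
enum {
    FEAT_VECTOR,
    FEAT_VXPD,      /* vector packed decimal */
    FEAT_VXPDEH,    /* ... enhancement */
    FEAT_VXPDEH2,   /* ... enhancement 2 */
    FEAT_COUNT
};

/* Each row reads: { feature, feature it depends on }. */
static const int deps[][2] = {
    { FEAT_VXPD,    FEAT_VECTOR },
    { FEAT_VXPDEH,  FEAT_VXPD },
    { FEAT_VXPDEH2, FEAT_VXPDEH },
};

/* Count table rows whose feature is enabled while its dependency is not. */
static int count_missing_deps(uint32_t features)
{
    int missing = 0;
    size_t i;

    assert(FEAT_COUNT <= 32);       /* the bitmap below is 32 bits wide */
    for (i = 0; i < sizeof(deps) / sizeof(deps[0]); i++) {
        if ((features & (1u << deps[i][0])) &&
            !(features & (1u << deps[i][1]))) {
            missing++;
        }
    }
    return missing;
}
```

A model that enables the second enhancement without the first is flagged once,
while a model with the whole chain enabled passes cleanly.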

[PATCH 0/1] Add features and cpu models

2021-06-22 Thread Christian Borntraeger

5 new features and 2 new models

Christian Borntraeger (1):
  s390x/cpumodel: add 3931 and 3932

 target/s390x/cpu_features_def.h.inc |  5 +
 target/s390x/cpu_models.c   |  6 ++
 target/s390x/gen-features.c | 14 ++
 3 files changed, 25 insertions(+)

-- 
2.31.1

Re: [PATCH v3 7/7] tests/acceptance: Handle cpu tag on x86_cpu_model_versions tests

2021-06-22 Thread Willian Rampazzo

On Fri, Apr 30, 2021 at 10:35 AM Wainer dos Santos Moschetta
 wrote:
>
> Some test cases on x86_cpu_model_versions.py are corner cases because they
> need to pass extra options to the -cpu argument. Once the avocado_qemu
> framework sets -cpu automatically, the value should be reset. This changes
> those tests to call set_vm_arg() to overwrite the -cpu value.
>
> Signed-off-by: Wainer dos Santos Moschetta 
> ---
>  tests/acceptance/x86_cpu_model_versions.py | 40 +-
>  1 file changed, 32 insertions(+), 8 deletions(-)
>

Reviewed-by: Willian Rampazzo

[PATCH v2] coreaudio: Lock only the buffer

2021-06-22 Thread Akihiko Odaki

On macOS 11.3.1, Core Audio calls AudioDeviceIOProc after calling an
internal function named HALB_Mutex::Lock(), which locks a mutex in
HALB_IOThread::Entry(void*). HALB_Mutex::Lock() is also called in
AudioObjectGetPropertyData, which is called by coreaudio driver.
Therefore, a deadlock will occur if coreaudio driver calls
AudioObjectGetPropertyData while holding a lock for a mutex and tries
to lock the same mutex in AudioDeviceIOProc.

audioDeviceIOProc, which implements AudioDeviceIOProc in coreaudio
driver, requires an exclusive access for the device configuration and
the buffer. Fortunately, a mutex is necessary only for the buffer in
audioDeviceIOProc because a change for the device configuration occurs
only before setting up AudioDeviceIOProc or after stopping the playback
with AudioDeviceStop.

With this change, the mutex owned by the driver will only be used for
the buffer, and the device configuration change will be protected with
the implicit iothread mutex.

Signed-off-by: Akihiko Odaki 
---
 audio/coreaudio.c | 102 +++---
 1 file changed, 41 insertions(+), 61 deletions(-)

diff --git a/audio/coreaudio.c b/audio/coreaudio.c
index 578ec9b8b2e..c239f756337 100644
--- a/audio/coreaudio.c
+++ b/audio/coreaudio.c
@@ -26,6 +26,7 @@
 #include <CoreAudio/CoreAudio.h>
 #include <pthread.h>            /* pthread_X */
 
+#include "qemu/main-loop.h"
 #include "qemu/module.h"
 #include "audio.h"
 
@@ -34,7 +35,7 @@
 
 typedef struct coreaudioVoiceOut {
 HWVoiceOut hw;
-pthread_mutex_t mutex;
+pthread_mutex_t buf_mutex;
 AudioDeviceID outputDeviceID;
 int frameSizeSetting;
 uint32_t bufferCount;
@@ -260,11 +261,11 @@ static void GCC_FMT_ATTR (3, 4) coreaudio_logerr2 (
 #define coreaudio_playback_logerr(status, ...) \
 coreaudio_logerr2(status, "playback", __VA_ARGS__)
 
-static int coreaudio_lock (coreaudioVoiceOut *core, const char *fn_name)
+static int coreaudio_buf_lock (coreaudioVoiceOut *core, const char *fn_name)
 {
 int err;
 
-err = pthread_mutex_lock (&core->mutex);
+err = pthread_mutex_lock (&core->buf_mutex);
 if (err) {
 dolog ("Could not lock voice for %s\nReason: %s\n",
fn_name, strerror (err));
@@ -273,11 +274,11 @@ static int coreaudio_lock (coreaudioVoiceOut *core, const char *fn_name)
 return 0;
 }
 
-static int coreaudio_unlock (coreaudioVoiceOut *core, const char *fn_name)
+static int coreaudio_buf_unlock (coreaudioVoiceOut *core, const char *fn_name)
 {
 int err;
 
-err = pthread_mutex_unlock (&core->mutex);
+err = pthread_mutex_unlock (&core->buf_mutex);
 if (err) {
 dolog ("Could not unlock voice for %s\nReason: %s\n",
fn_name, strerror (err));
@@ -292,13 +293,13 @@ static int coreaudio_unlock (coreaudioVoiceOut *core, const char *fn_name)
 coreaudioVoiceOut *core = (coreaudioVoiceOut *) hw; \
 ret_type ret;   \
 \
-if (coreaudio_lock(core, "coreaudio_" #name)) { \
+if (coreaudio_buf_lock(core, "coreaudio_" #name)) { \
 return 0;   \
 }   \
 \
 ret = glue(audio_generic_, name)args;   \
 \
-coreaudio_unlock(core, "coreaudio_" #name); \
+coreaudio_buf_unlock(core, "coreaudio_" #name); \
 return ret; \
 }
 COREAUDIO_WRAPPER_FUNC(get_buffer_out, void *, (HWVoiceOut *hw, size_t *size),
@@ -310,7 +311,10 @@ COREAUDIO_WRAPPER_FUNC(write, size_t, (HWVoiceOut *hw, void *buf, size_t size),
(hw, buf, size))
 #undef COREAUDIO_WRAPPER_FUNC
 
-/* callback to feed audiooutput buffer */
+/*
+ * callback to feed audiooutput buffer. called without iothread lock.
+ * allowed to lock "buf_mutex", but disallowed to have any other locks.
+ */
 static OSStatus audioDeviceIOProc(
 AudioDeviceID inDevice,
 const AudioTimeStamp *inNow,
@@ -326,13 +330,13 @@ static OSStatus audioDeviceIOProc(
 coreaudioVoiceOut *core = (coreaudioVoiceOut *) hwptr;
 size_t len;
 
-if (coreaudio_lock (core, "audioDeviceIOProc")) {
+if (coreaudio_buf_lock (core, "audioDeviceIOProc")) {
 inInputTime = 0;
 return 0;
 }
 
 if (inDevice != core->outputDeviceID) {
-coreaudio_unlock (core, "audioDeviceIOProc(old device)");
+coreaudio_buf_unlock (core, "audioDeviceIOProc(old device)");
 return 0;
 }
 
@@ -342,7 +346,7 @@ static OSStatus audioDeviceIOProc(
 /* if there are not enough samples, set signal and return */
 if (pending_frames < frameCount) {
 inInputTime = 0;
-coreaudio_unlock (core,

[Bug 1776920] Re: qemu-img convert on Mac OSX creates corrupt images

2021-06-22 Thread Juan Niño

Hey there! I tested @wkozaczuk's suggested minimal steps and THEY WORKED
FOR ME!!

The steps executed on my mac:
1. dd if=boot.bin of=image.img > /dev/null 2>&1
2. dd if=lzloader.elf of=image.img conv=notrunc seek=128 > /dev/null 2>&1
3. qemu-img convert image.img -O qcow2 image.qemu
4. qemu-img convert image.qemu -O qcow2 image2.qemu

The end result:
-rw-r--r--  1 ***  ***  6684672 Jun 22 14:19 image.img
-rw-r--r--  1 ***  ***  7012352 Jun 22 14:20 image.qemu
-rw-r--r--  1 ***  ***  7012352 Jun 22 14:20 image2.qemu
-rw-r--r--  1 ***  ***  6750208 Jun 22 14:22 image2.vbox

The result of regular compare:
qemu-img compare image.qemu image2.qemu
Images are identical.

The result of strict compare:
qemu-img compare -s image.qemu image2.qemu
Images are identical.

Qemu-img on my Mac:
qemu-img --version
qemu-img version 6.0.0
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

Hardware Overview:

  Model Name:   MacBook Pro
  Model Identifier: MacBookPro16,1
  Processor Name:   8-Core Intel Core i9
  Processor Speed:  2,4 GHz
  Number of Processors: 1
  Total Number of Cores:8
  L2 Cache (per Core):  256 KB
  L3 Cache: 16 MB
  Hyper-Threading Technology:   Enabled
  Memory:   64 GB
  Activation Lock Status:   Enabled

Storage:

  Mount Point:  /
  File System:  APFS
  Writable: No
  Ignore Ownership: No
  BSD Name: disk1s1
  Volume UUID:  67798918-C522-45C3-918F-3C4155EF4D13
  Physical Drive:
  Device Name:  APPLE SSD AP1024N
  Media Name:   AppleAPFSMedia
  Medium Type:  SSD
  Protocol: PCI-Express
  Internal: Yes
  Partition Map Type:   Unknown
  S.M.A.R.T. Status:Verified

System Software Overview:

  System Version:   macOS 10.15.7 (19H1217)
  Kernel Version:   Darwin 19.6.0
  Boot Volume:  Macintosh HD
  Boot Mode:Normal
  Secure Virtual Memory:Enabled
  System Integrity Protection:  Enabled

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1776920

Title:
  qemu-img convert on Mac OSX creates corrupt images

Status in QEMU:
  Expired

Bug description:
  An image created by qemu-img create, then modified by another program
  is converted to bad/corrupt image when using convert sub command on
  Mac OSX. The same convert works on Linux. The version of qemu-img is
  2.12.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1776920/+subscriptions

Re: [PATCH v3 6/7] tests/acceptance: Add set_vm_arg() to the Test class

2021-06-22 Thread Willian Rampazzo

On Fri, Apr 30, 2021 at 10:35 AM Wainer dos Santos Moschetta
 wrote:
>
> The set_vm_arg method is added to avocado_qemu.Test class on this
> change. Use that method to set (or replace) an argument to the list of
> arguments given to the QEMU binary.
>
> Suggested-by: Cleber Rosa 
> Signed-off-by: Wainer dos Santos Moschetta 
> ---
>  tests/acceptance/avocado_qemu/__init__.py | 21 +
>  1 file changed, 21 insertions(+)
>
> diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> b/tests/acceptance/avocado_qemu/__init__.py
> index 7f8e703757..14c6ae70c8 100644
> --- a/tests/acceptance/avocado_qemu/__init__.py
> +++ b/tests/acceptance/avocado_qemu/__init__.py
> @@ -240,6 +240,27 @@ def get_vm(self, *args, name=None):
>  self._vms[name].set_machine(self.machine)
>  return self._vms[name]
>
> +def set_vm_arg(self, arg, value):
> +"""
> +Set an argument to list of extra arguments to be given to the QEMU
> +binary. If the argument already exists then its value is replaced.
> +
> +:param arg: the QEMU argument, such as "-cpu" in "-cpu host"
> +:type arg: str
> +:param value: the argument value, such as "host" in "-cpu host"
> +:type value: str
> +"""
> +if not arg or not value:
> +return
> +if arg not in self.vm.args:
> +self.vm.args.extend([arg, value])
> +else:
> +idx = self.vm.args.index(arg) + 1
> +if idx < len(self.vm.args):
> +self.vm.args[idx] = value
> +else:
> +self.vm.args.append(value)
> +

Considering all args in self.vm.args are composed of [arg,value]:

Reviewed-by: Willian Rampazzo 

>  def tearDown(self):
>  for vm in self._vms.values():
>  vm.shutdown()
> --
> 2.29.2
>
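
The caveat in the review above ("considering all args in self.vm.args are composed of [arg,value]") can be illustrated with a minimal standalone sketch. It copies the logic of set_vm_arg() onto a plain Python list; the free function and the sample argument lists are assumptions made for illustration and are not part of avocado_qemu.

```python
def set_vm_arg(args, arg, value):
    """Set or replace the value of `arg` in a flat [arg, value, ...] list."""
    if not arg or not value:
        return
    if arg not in args:
        args.extend([arg, value])
    else:
        idx = args.index(arg) + 1
        if idx < len(args):
            args[idx] = value
        else:
            args.append(value)


# Expected case: "-cpu" is followed by its value, which gets replaced.
args = ["-machine", "pc", "-cpu", "qemu64"]
set_vm_arg(args, "-cpu", "host")
print(args)

# Caveat: a flag that takes no value breaks the [arg, value] assumption,
# so the element that happens to follow it is overwritten.
flags = ["-nographic", "-cpu", "qemu64"]
set_vm_arg(flags, "-nographic", "x")
print(flags)
```

In the second call the "-cpu" entry is clobbered, which is why the Reviewed-by is conditional on every entry in the list being an [arg, value] pair.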

Re: [PATCH 5/6] tests/acceptance: add replay kernel test for alpha

2021-06-22 Thread Willian Rampazzo

On Thu, Jun 10, 2021 at 8:25 AM Pavel Dovgalyuk
 wrote:
>
> This patch adds record/replay test which boots Linux
> kernel on alpha platform. The test uses kernel binaries
> taken from boot_linux_console test.
>
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  tests/acceptance/replay_kernel.py |   17 +
>  1 file changed, 17 insertions(+)
>

Reviewed-by: Willian Rampazzo

Re: [PATCH 4/6] tests/acceptance: add replay kernel test for nios2

2021-06-22 Thread Willian Rampazzo

On Thu, Jun 10, 2021 at 8:25 AM Pavel Dovgalyuk
 wrote:
>
> This patch adds record/replay test which boots Linux
> kernel on nios2 platform. The test uses kernel binaries
> taken from boot_linux_console test.
>
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  tests/acceptance/replay_kernel.py |   11 +++
>  1 file changed, 11 insertions(+)
>

Reviewed-by: Willian Rampazzo

Re: [PATCH 3/6] tests/acceptance: add replay kernel test for openrisc

2021-06-22 Thread Willian Rampazzo

On Thu, Jun 10, 2021 at 8:25 AM Pavel Dovgalyuk
 wrote:
>
> This patch adds record/replay test which boots Linux
> kernel on openrisc platform. The test uses kernel binaries
> taken from boot_linux_console test.
>
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  tests/acceptance/replay_kernel.py |   11 +++
>  1 file changed, 11 insertions(+)
>

Reviewed-by: Willian Rampazzo

Re: [PATCH 2/6] tests/acceptance: add replay kernel test for ppc64

2021-06-22 Thread Willian Rampazzo

On Thu, Jun 10, 2021 at 8:25 AM Pavel Dovgalyuk
 wrote:
>
> This patch adds record/replay test which boots Linux
> kernel on ppc64 platform. The test uses kernel binaries
> taken from boot_linux_console test.
>
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  tests/acceptance/boot_linux_console.py |   12 
>  1 file changed, 12 insertions(+)
>

This is already upstream, right? b52d7e216c6 or am I missing something?

Re: [PATCH 1/6] tests/acceptance: add replay kernel test for s390

2021-06-22 Thread Willian Rampazzo

On Thu, Jun 10, 2021 at 8:24 AM Pavel Dovgalyuk
 wrote:
>
> This patch adds record/replay test which boots Linux
> kernel on s390x platform. The test uses kernel binaries
> taken from boot_linux_console test.
>
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  tests/acceptance/replay_kernel.py |   16 
>  1 file changed, 16 insertions(+)
>

Reviewed-by: Willian Rampazzo

Re: RFC: Implementation of QMP documentation retrieval command

2021-06-22 Thread Niteesh G. S.

Hi Stefan,
On Tue, Jun 22, 2021 at 3:05 PM Stefan Hajnoczi  wrote:

> On Mon, Jun 21, 2021 at 11:56:30PM +0530, Niteesh G. S. wrote:
> > TLDR: The goal of this mail wasn't to review the dummy command I had posted
> > but rather start a discussion regarding the implementation of the QMP
> > documentation retrieval command for the TUI.
>
> It's not clear to me what exactly you wanted to discuss. Here is the QMP
> schema from the commit you linked. It's something concrete that we can
> discuss:
>

I wanted to discuss the implementation of the documentation retrieval
command, along with other questions that have to be answered before
proceeding to implement it. Things like:
1) The JSON schema we will use to represent the documentation.
2) How will we parse documentation from the JSON files under qemu/qapi?
3) How and where will we store this parsed information?
4) Where do we get the data for autocomplete in the TUI?

- One easy way is to hardcode all available commands in the TUI
   autocomplete. But then we have to make sure to update the autocomplete
   list for the TUI every time a new command gets added to QMP.

>   ##
>   # @CommandDocumentation:
>   #
>   # An object representing documentation for a command.
>   #
>   # @name: Command name
>   #
>   # @doc: A string containing the documentation.
>
> Is @doc in some kind of markup or plain text?
>

Since this is just a prototype I have used plain text. But for the real
command I expect something more structured, since the comments I have seen
in the QAPI schema have some structure associated with them, e.g.:
##
# @query-status:
#
# Query the run status of all VCPUs
#
# Returns: @StatusInfo reflecting all VCPUs
#
# Since:  0.14
#
# Example:
#
# -> { "execute": "query-status" }
# <- { "return": { "running": true,
#  "singlestep": false,
#  "status": "running" } }
#
##
We have the following structure:
1) Command name
2) Documentation
3) Arguments (if any)
4) Return type, with references to non-primitive data types like
   structs/enums etc.
5) Since
6) Example

In the case of commands referring to structures/enums and other
non-primitive data types, if possible we should also add their
documentation along with the documentation for the command.
Yes, we could find out all the data types referenced by the current command
and add them to the documentation if possible. This will make it easy for
the user. If it isn't possible, then we must also allow querying the
documentation related to structs/enums etc.

>   #
>   ##
>   { 'struct': 'CommandDocumentation',
> 'data': {'name': 'str', 'doc': 'str'} }
>
>   ##
>   # @query-cmd-doc:
>   #
>   # (A simple *prototype* implementation)
>   # Command query-cmd-doc will return the documentation for the command
>   # specified. This will help QMP clients (currently the AQMP TUI) show
>   # documentation related to a specific command.
>   #
>   # @command-name: The command name to query documentation for
>   #
>   # Returns: A @CommandDocumentation object containing the documentation.
>   #
>   # Since: TODO: Add a number
>   ##
>   { 'command': 'query-cmd-doc',
> 'data': { 'command-name': 'str' },
> 'returns': 'CommandDocumentation' }
>
> Is there a way to retrieve struct/enum/etc documentation?
>
Not sure. I have gone through the parser code in qemu/scripts/qapi and have
also seen the parser being used for documentation generation, but I still
don't understand the capabilities of the parser.

> Do you see a need to query multiple items of documentation in a single
> command? A single item query command can become a performance bottleneck
> if the client wants to query the documentation for all commands, for
> example. This can be solved by making the return value an array and
> allowing multiple commands to be queried at once.
>
Why would clients want to query the documentation for all commands? Even if
they do, won't that be an infrequent operation?
From the TUI perspective, I think it will be enough if we just have the
capability to service one command at a time. We can also have the TUI cache
the results and validate the cache during the greeting process by sending
some kind of hash to signal whether any documentation has changed.

>
> Do you see a need for wildcard queries where the client does not have
> the full command name? I guess the client has enough auto-completion
> information to search all commands on the client side, so maybe this
> functionality isn't necessary here?
>
One of my major questions (also mentioned above) is how the client side
will get information regarding all the commands available in QMP. If we
implement a proper autocomplete feature then I don't think we will have to
worry about wildcard queries.

> Stefan
>
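
The structure listed earlier in this message (command name, documentation, Returns, Since, Example) can be pulled out of a doc comment with a small amount of code. The following is only a rough sketch under stated assumptions: it is not the parser in qemu/scripts/qapi, the function name parse_doc_block and the returned dictionary layout are made up for illustration, and it ignores arguments and every tag other than the three shown in the query-status example.

```python
import re

SAMPLE = """\
##
# @query-status:
#
# Query the run status of all VCPUs
#
# Returns: @StatusInfo reflecting all VCPUs
#
# Since:  0.14
#
# Example:
#
# -> { "execute": "query-status" }
# <- { "return": { "running": true,
#                  "singlestep": false,
#                  "status": "running" } }
#
##
"""

# Only the tags that appear in the sample are recognised (an assumption).
SECTION = re.compile(r"^(Returns|Since|Example):\s*(.*)$")


def parse_doc_block(text):
    """Split one doc comment into its name, free text and tagged sections."""
    result = {"name": None, "doc": []}
    current = "doc"
    for raw in text.splitlines():
        if raw.strip() == "##":
            continue                      # opening/closing marker
        line = raw[1:] if raw.startswith("#") else raw
        line = line[1:] if line.startswith(" ") else line
        if result["name"] is None:
            m = re.match(r"^@([\w-]+):$", line.strip())
            if m:
                result["name"] = m.group(1)
                continue
        m = SECTION.match(line)
        if m:
            current = m.group(1).lower()  # start a new tagged section
            result[current] = [m.group(2)] if m.group(2) else []
            continue
        result.setdefault(current, []).append(line)
    return {k: v if k == "name" else "\n".join(v).strip()
            for k, v in result.items()}


info = parse_doc_block(SAMPLE)
for key, value in info.items():
    print(key, "=>", repr(value))
```

A dictionary like this could be serialized as the reply of a documentation query, but how the real command should be structured is exactly the open question of this thread.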

Re: [PATCH 2/2] target/ppc: Drop PowerPCCPUClass::interrupts_big_endian()

2021-06-22 Thread Fabiano Rosas

Greg Kurz  writes:

> This isn't used anymore.
>
> Signed-off-by: Greg Kurz 

Reviewed-by: Fabiano Rosas 

> ---
>  target/ppc/cpu-qom.h  |  1 -
>  target/ppc/cpu_init.c | 17 -
>  2 files changed, 18 deletions(-)
>
> diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
> index 06b6571bc9d5..7b424e3cb0bc 100644
> --- a/target/ppc/cpu-qom.h
> +++ b/target/ppc/cpu-qom.h
> @@ -199,7 +199,6 @@ struct PowerPCCPUClass {
>  void (*init_proc)(CPUPPCState *env);
>  int  (*check_pow)(CPUPPCState *env);
>  int (*handle_mmu_fault)(PowerPCCPU *cpu, vaddr eaddr, int rwx, int mmu_idx);
> -bool (*interrupts_big_endian)(PowerPCCPU *cpu);
>  };
>  
>  #ifndef CONFIG_USER_ONLY
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index d0411e7302a2..1a22aef874b1 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -2666,18 +2666,6 @@ static int check_pow_hid0_74xx(CPUPPCState *env)
>  return 0;
>  }
>  
> -static bool ppc_cpu_interrupts_big_endian_always(PowerPCCPU *cpu)
> -{
> -return true;
> -}
> -
> -#ifdef TARGET_PPC64
> -static bool ppc_cpu_interrupts_big_endian_lpcr(PowerPCCPU *cpu)
> -{
> -return !(cpu->env.spr[SPR_LPCR] & LPCR_ILE);
> -}
> -#endif
> -
>  /*****************************************************************************/
>  /* PowerPC implementations definitions                                       */
>  
> @@ -7740,7 +7728,6 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
>   POWERPC_FLAG_VSX;
>  pcc->l1_dcache_size = 0x8000;
>  pcc->l1_icache_size = 0x8000;
> -pcc->interrupts_big_endian = ppc_cpu_interrupts_big_endian_lpcr;
>  }
>  
>  static void init_proc_POWER8(CPUPPCState *env)
> @@ -7918,7 +7905,6 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
>   POWERPC_FLAG_VSX | POWERPC_FLAG_TM;
>  pcc->l1_dcache_size = 0x8000;
>  pcc->l1_icache_size = 0x8000;
> -pcc->interrupts_big_endian = ppc_cpu_interrupts_big_endian_lpcr;
>  }
>  
>  #ifdef CONFIG_SOFTMMU
> @@ -8136,7 +8122,6 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
>   POWERPC_FLAG_VSX | POWERPC_FLAG_TM | POWERPC_FLAG_SCV;
>  pcc->l1_dcache_size = 0x8000;
>  pcc->l1_icache_size = 0x8000;
> -pcc->interrupts_big_endian = ppc_cpu_interrupts_big_endian_lpcr;
>  }
>  
>  #ifdef CONFIG_SOFTMMU
> @@ -8347,7 +8332,6 @@ POWERPC_FAMILY(POWER10)(ObjectClass *oc, void *data)
>   POWERPC_FLAG_VSX | POWERPC_FLAG_TM | POWERPC_FLAG_SCV;
>  pcc->l1_dcache_size = 0x8000;
>  pcc->l1_icache_size = 0x8000;
> -pcc->interrupts_big_endian = ppc_cpu_interrupts_big_endian_lpcr;
>  }
>  
>  #if !defined(CONFIG_USER_ONLY)
> @@ -9094,7 +9078,6 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
>  device_class_set_parent_unrealize(dc, ppc_cpu_unrealize,
>                                        &pcc->parent_unrealize);
>  pcc->pvr_match = ppc_pvr_match_default;
> -pcc->interrupts_big_endian = ppc_cpu_interrupts_big_endian_always;
>  device_class_set_props(dc, ppc_cpu_properties);
>  
>  device_class_set_parent_reset(dc, ppc_cpu_reset, &pcc->parent_reset);

Re: [PATCH 1/2] target/ppc: Introduce ppc_interrupts_little_endian()

2021-06-22 Thread Fabiano Rosas

Greg Kurz  writes:

> PowerPC CPUs use big endian by default but starting with POWER7,
> server grade CPUs use the ILE bit of the LPCR special purpose
> register to decide on the endianness to use when handling
> interrupts. This gives a clue to QEMU on the endianness the
> guest kernel is running, which is needed when generating an
> ELF dump of the guest or when delivering an FWNMI machine
> check interrupt.
>
> Commit 382d2db62bcb ("target-ppc: Introduce callback for interrupt
> endianness") added a class method to PowerPCCPUClass to modelize
> this : default implementation returns a fixed "big endian" value,
> while POWER7 and newer do the LPCR_ILE check. This is suboptimal
> as it forces to implement the method for every new CPU family, and
> it is very unlikely that this will ever be different than what we
> have today.
>
> We basically only have three cases to consider:
> a) CPU doesn't have an LPCR => big endian
> b) CPU has an LPCR but doesn't support the ILE bit => big endian
> c) CPU has an LPCR and supports the ILE bit => little or big endian
>
> Instead of class methods, introduce an inline helper that checks the
> ILE bit in the LPCR_MASK to decide on the outcome. The new helper
> words little endian instead of big endian. This allows to drop a !
> operator in ppc_cpu_do_fwnmi_machine_check().
>
> Signed-off-by: Greg Kurz 

Reviewed-by: Fabiano Rosas 

> ---
>  target/ppc/cpu.h | 15 +++
>  target/ppc/arch_dump.c   |  8 +++-
>  target/ppc/excp_helper.c |  3 +--
>  3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index b4de0db7ff5c..93d308ac8f2d 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -2643,6 +2643,21 @@ static inline bool ppc_has_spr(PowerPCCPU *cpu, int spr)
>  return cpu->env.spr_cb[spr].name != NULL;
>  }
>  
> +static inline bool ppc_interrupts_little_endian(PowerPCCPU *cpu)
> +{
> +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +
> +/*
> + * Only models that have an LPCR and know about LPCR_ILE can do little
> + * endian.
> + */
> +if (pcc->lpcr_mask & LPCR_ILE) {
> +return !!(cpu->env.spr[SPR_LPCR] & LPCR_ILE);
> +}
> +
> +return false;
> +}
> +
>  void dump_mmu(CPUPPCState *env);
>  
>  void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
> diff --git a/target/ppc/arch_dump.c b/target/ppc/arch_dump.c
> index 9210e61ef463..bb392f6d8885 100644
> --- a/target/ppc/arch_dump.c
> +++ b/target/ppc/arch_dump.c
> @@ -227,22 +227,20 @@ int cpu_get_dump_info(ArchDumpInfo *info,
>const struct GuestPhysBlockList *guest_phys_blocks)
>  {
>  PowerPCCPU *cpu;
> -PowerPCCPUClass *pcc;
>  
>  if (first_cpu == NULL) {
>  return -1;
>  }
>  
>  cpu = POWERPC_CPU(first_cpu);
> -pcc = POWERPC_CPU_GET_CLASS(cpu);
>  
>  info->d_machine = PPC_ELF_MACHINE;
>  info->d_class = ELFCLASS;
>  
> -if ((*pcc->interrupts_big_endian)(cpu)) {
> -info->d_endian = ELFDATA2MSB;
> -} else {
> +if (ppc_interrupts_little_endian(cpu)) {
>  info->d_endian = ELFDATA2LSB;
> +} else {
> +info->d_endian = ELFDATA2MSB;
>  }
>  /* 64KB is the max page size for pseries kernel */
>  if (strncmp(object_get_typename(qdev_get_machine()),
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index fd147e2a3766..a79a0ed465e5 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -1099,7 +1099,6 @@ void ppc_cpu_do_fwnmi_machine_check(CPUState *cs, target_ulong vector)
>  {
>  PowerPCCPU *cpu = POWERPC_CPU(cs);
>  CPUPPCState *env = &cpu->env;
> -PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>  target_ulong msr = 0;
>  
>  /*
> @@ -1108,7 +1107,7 @@ void ppc_cpu_do_fwnmi_machine_check(CPUState *cs, target_ulong vector)
>   */
>  msr = (1ULL << MSR_ME);
>  msr |= env->msr & (1ULL << MSR_SF);
> -if (!(*pcc->interrupts_big_endian)(cpu)) {
> +if (ppc_interrupts_little_endian(cpu)) {
>  msr |= (1ULL << MSR_LE);
>  }
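
The three cases from the commit message (no LPCR, LPCR without ILE support, LPCR with ILE support) reduce to a single mask check. Below is a small Python model of that decision, not QEMU code: the LPCR_ILE value is a placeholder chosen for illustration (the real definition lives in target/ppc/cpu.h), and treating "no LPCR" as an all-zero mask is an assumption.

```python
# Placeholder bit, for illustration only; not taken from the patch.
LPCR_ILE = 1 << 25


def interrupts_little_endian(lpcr_mask, lpcr):
    """Little endian only if the model supports ILE and the bit is set."""
    if lpcr_mask & LPCR_ILE:
        return bool(lpcr & LPCR_ILE)
    return False


# a) CPU has no LPCR: modelled here as an empty mask
case_a = interrupts_little_endian(0, 0)
# b) CPU has an LPCR but ILE is not in the mask; a stray bit is ignored
case_b = interrupts_little_endian(0x1, LPCR_ILE)
# c) CPU supports ILE: the result follows the register value
case_c_le = interrupts_little_endian(LPCR_ILE, LPCR_ILE)
case_c_be = interrupts_little_endian(LPCR_ILE, 0)

print(case_a, case_b, case_c_le, case_c_be)
```

Because the mask already encodes which models support ILE, no per-family callback is needed, which is the point the commit message makes.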

RE: Denormal input handling

2021-06-22 Thread Michael Morrell

OK, I've done more testing.  I'm not sure if we need any specialization, but 
the setting for float_flag_inorm_denormal isn't right for x86.

It is set unconditionally when flush_inputs_to_zero is false, but it needs to 
take into account the other operand(s).   Given "denorm / 0" or any instruction 
with a NaN operand, float_flag_inorm_denormal should not be set (and that way, 
the DE bit in MXCSR won't be set when it shouldn't be).

   Michael

-Original Message-
From: Michael Morrell 
Sent: Monday, June 21, 2021 5:57 PM
To: 'Richard Henderson' ; qemu-devel@nongnu.org
Subject: RE: Denormal input handling

Richard,

I was under the mistaken impression that your changes in this area (splitting 
float_flag_input_denormal into 2 flags) were already checked in, but I see now 
that is not the case.  I should probably wait until that is done before I try 
to claim there are additional issues here.

Michael

-Original Message-
From: Richard Henderson 
Sent: Monday, June 21, 2021 4:30 PM
To: Michael Morrell ; qemu-devel@nongnu.org
Subject: Re: Denormal input handling

On 6/21/21 4:13 PM, Michael Morrell wrote:
> I have another couple of thoughts around input denormal handling.
> 
> The first is straightforward.  I noticed that the Aarch64 port doesn't 
> report input denormals (I could not find any code which sets the IDC 
> bit in the FPSR).  I found code in the arm (not aarch64) port that 
> sets other bits like IXC, but nothing for IDC.   Is that simply because no 
> one has bothered to add this support?

It's because we failed to use symbolic constants.  See 
vfp_exceptbits_from_host.  Which
*is* used for both aarch64 and arm.

> The second concerns support for cases where multiple exception 
> conditions occur.   I had originally thought that denormal input 
> handling would be orthogonal to everything else and so a case like 
> "sNaN  + denorm" would set both the invalid and input denormal flags 
> or "denorm / 0" would set both divide by zero and input denormal, but I don't 
> think that is true for at least some architectures.  Perhaps some 
> specialization is needed here?

If you've got a specific example, we can look at it.  There's no point adding 
specialization that isn't going to be used.

r~
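
The rule described at the top of this message (for x86, do not raise the input-denormal flag when another operand is a NaN, or for "denorm / 0") can be written down as a small predicate. This is a sketch of the behaviour as stated in the email only; it has not been checked against hardware or against QEMU's softfloat code, and the function names are hypothetical.

```python
import math
import sys


def is_denormal(x):
    """True for a non-zero double below the smallest normal magnitude."""
    return x != 0.0 and not math.isnan(x) and abs(x) < sys.float_info.min


def flags_input_denormal(op, a, b):
    """Whether the input-denormal flag should be raised under the stated rule."""
    if not (is_denormal(a) or is_denormal(b)):
        return False
    if math.isnan(a) or math.isnan(b):
        return False          # a NaN operand suppresses the flag
    if op == "div" and b == 0.0:
        return False          # "denorm / 0": only divide-by-zero is reported
    return True


tiny = 5e-324  # smallest positive subnormal double
print(flags_input_denormal("add", tiny, 1.0))
print(flags_input_denormal("add", tiny, float("nan")))
print(flags_input_denormal("div", tiny, 0.0))
```

The contrast with the behaviour reported above is that the flag depends on all operands of the instruction, not only on whether one input is denormal.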

Re: [PATCH v3 03/24] modules: generate modinfo.c

2021-06-22 Thread Jose R. Ziviani

Hello,

Just a small change.

On Fri, Jun 18, 2021 at 06:53:32AM +0200, Gerd Hoffmann wrote:
> Add script to generate C source with a small
> database containing the module meta-data.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  scripts/modinfo-generate.py | 84 +
>  include/qemu/module.h   | 17 
>  softmmu/vl.c|  4 ++
>  util/module.c   | 11 +
>  meson.build | 13 +-
>  5 files changed, 128 insertions(+), 1 deletion(-)
>  create mode 100755 scripts/modinfo-generate.py
> 
> diff --git a/scripts/modinfo-generate.py b/scripts/modinfo-generate.py
> new file mode 100755
> index ..2b925432655a
> --- /dev/null
> +++ b/scripts/modinfo-generate.py
> @@ -0,0 +1,84 @@
> +#!/usr/bin/env python3
> +# -*- coding: utf-8 -*-
> +
> +import os
> +import sys
> +
> +def print_array(name, values):
> +if len(values) == 0:
> +return
> +list = ", ".join(values)
> +print(".%s = ((const char*[]){ %s, NULL })," % (name, list))
> +
> +def parse_line(line):
> +kind = ""
> +data = ""
> +get_kind = False
> +get_data = False
> +for item in line.split():
> +if item == "MODINFO_START":
> +get_kind = True
> +continue
> +if item.startswith("MODINFO_END"):
> +get_data = False
> +continue
> +if get_kind:
> +kind = item
> +get_kind = False
> +get_data = True
> +continue
> +if get_data:
> +data += " " + item
> +continue
> +return (kind, data)
> +
> +def generate(name, lines):
> +arch = ""
> +objs = []
> +deps = []
> +opts = []
> +for line in lines:
> +if line.find("MODINFO_START") != -1:
> +(kind, data) = parse_line(line)
> +if kind == 'obj':
> +objs.append(data)
> +elif kind == 'dep':
> +deps.append(data)
> +elif kind == 'opts':
> +opts.append(data)
> +elif kind == 'arch':
> +arch = data;
> +else:
> +print("unknown:", kind)
> +exit(1)
> +
> +print(".name = \"%s\"," % name)
> +if arch != "":
> +print(".arch = %s," % arch)
> +print_array("objs", objs)
> +print_array("deps", deps)
> +print_array("opts", opts)
> +print("},{");
> +
> +def print_pre():
> +print("/* generated by scripts/modinfo.py */")

generated by scripts/modinfo-generate.py

> +print("#include \"qemu/osdep.h\"")
> +print("#include \"qemu/module.h\"")
> +print("const QemuModinfo qemu_modinfo[] = {{")
> +
> +def print_post():
> +print("/* end of list */")
> +print("}};")
> +
> +def main(args):
> +print_pre()
> +for modinfo in args:
> +with open(modinfo) as f:
> +lines = f.readlines()
> +print("/* %s */" % modinfo)
> +(basename, ext) = os.path.splitext(modinfo)
> +generate(basename, lines)
> +print_post()
> +
> +if __name__ == "__main__":
> +main(sys.argv[1:])
> diff --git a/include/qemu/module.h b/include/qemu/module.h
> index 81ef086da023..a98748d501d3 100644
> --- a/include/qemu/module.h
> +++ b/include/qemu/module.h
> @@ -98,4 +98,21 @@ void module_load_qom_all(void);
>  /* module registers QemuOpts  */
>  #define module_opts(name) modinfo(opts, name)
>  
> +/*
> + * module info database
> + *
> + * scripts/modinfo-generate.c will build this using the data collected
> + * by scripts/modinfo-collect.py
> + */
> +typedef struct QemuModinfo QemuModinfo;
> +struct QemuModinfo {
> +const char *name;
> +const char *arch;
> +const char **objs;
> +const char **deps;
> +const char **opts;
> +};
> +extern const QemuModinfo qemu_modinfo[];
> +void module_init_info(const QemuModinfo *info);
> +
>  #endif
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 326c1e908008..a4857ec43ff3 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2740,6 +2740,10 @@ void qemu_init(int argc, char **argv, char **envp)
>  error_init(argv[0]);
>  qemu_init_exec_dir(argv[0]);
>  
> +#ifdef CONFIG_MODULES
> +module_init_info(qemu_modinfo);
> +#endif
> +
>  qemu_init_subsystems();
>  
>  /* first pass of option parsing */
> diff --git a/util/module.c b/util/module.c
> index eee8ff2de136..8d3e8275b9f7 100644
> --- a/util/module.c
> +++ b/util/module.c
> @@ -110,6 +110,17 @@ void module_call_init(module_init_type type)
>  }
>  
>  #ifdef CONFIG_MODULES
> +
> +static const QemuModinfo module_info_stub[] = { {
> +/* end of list */
> +} };
> +static const QemuModinfo *module_info = module_info_stub;
> +
> +void module_init_info(const QemuModinfo *info)
> +{
> +module_info = info;
> +}
> +
>  static int module_load_file(const char *fname, bool mayfail, bool export_symbols)
>  {
>  GModule *g_module;
> diff --git a/meson.build
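
To see what parse_line() in the quoted script does with one record, here is the function copied from the patch and applied to a single line. The sample input is an assumption made for illustration; the exact contents of the collected modinfo files are not shown in this excerpt.

```python
def parse_line(line):
    kind = ""
    data = ""
    get_kind = False
    get_data = False
    for item in line.split():
        if item == "MODINFO_START":
            get_kind = True      # the next token names the record kind
            continue
        if item.startswith("MODINFO_END"):
            get_data = False     # stop collecting data tokens
            continue
        if get_kind:
            kind = item
            get_kind = False
            get_data = True
            continue
        if get_data:
            data += " " + item   # each data token is prefixed with a space
            continue
    return (kind, data)


# Hypothetical record, assumed to look like the collected input.
sample = 'MODINFO_START obj "hw-display-qxl" MODINFO_END'
kind, data = parse_line(sample)
print(repr(kind), repr(data))
```

Note that the returned data keeps a leading space and the surrounding quotes, so print_array() can paste it directly into the generated C initializer.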

SD/MMC host controller + 64-bit system bus

2021-06-22 Thread Joanne Koong

Hello! I noticed that the default SD/MMC host controller only supports a
32-bit system bus. Is there a reason 64-bit system buses aren't supported
by default?

Thanks!

Re: [PATCH v2 2/1] qemu-img: Add "backing":true to unallocated map segments

2021-06-22 Thread Vladimir Sementsov-Ogievskiy


22.06.2021 18:38, Kevin Wolf wrote:

Am 11.06.2021 um 21:03 hat Eric Blake geschrieben:

To save the user from having to check 'qemu-img info --backing-chain'
or other followup command to determine which "depth":n goes beyond the
chain, add a boolean field "backing" that is set only for unallocated
portions of the disk.

Signed-off-by: Eric Blake 
---

Touches the same iotest output as 1/1.  If we decide that switching to
"depth":n+1 is too risky, and that the mere addition of "backing":true
while keeping "depth":n is good enough, then we'd have just one patch,
instead of this double churn.  Preferences?


I think the additional flag is better because it's guaranteed to be
backwards compatible, and because you don't need to know the number of
layers to infer whether a cluster was allocated in the whole backing
chain. And by exposing ALLOCATED we definitely give access to the whole
information that exists in QEMU.

However, to continue with the bike shedding: I won't insist on
"allocated" even if that is what the flag is called internally and
consistency is usually helpful, but "backing" is misleading, too,
because intuitively it doesn't cover the top layer or standalone images
without a backing file. How about something like "present"?



IMHO, it does cover. If we have a qcow2 image with unallocated clusters but
an unspecified backing file, it's good to know that these unallocated clusters
are not simply ZERO, but actually point to a backing file, which is just absent
now (and therefore a read returns zeros). The user may start qemu and specify
the backing file by options, or set the backing file in the image, etc. So the
information does make sense anyway.

I think it would be good to start talking about backing chains explicitly, and
not hide them under the "allocated" concept, which has different meanings.


--
Best regards,
Vladimir

Re: [PATCH] tcg: Avoid including 'trace-tcg.h' in target translate.c

2021-06-22 Thread Richard Henderson


On 6/22/21 9:15 AM, Philippe Mathieu-Daudé wrote:

The root trace-events only declares a single TCG event:

   $ git grep -w tcg trace-events
   trace-events:115:# tcg/tcg-op.c
   trace-events:137:vcpu tcg guest_mem_before(TCGv vaddr, uint16_t info) "info=%d", 
"vaddr=0x%016"PRIx64" info=%d"

and only a tcg/tcg-op.c uses it:

   $ git grep -l trace_guest_mem_before_tcg
   tcg/tcg-op.c

therefore it is pointless to include "trace-tcg.h" in each target
(because it is not used). Remove it.

Signed-off-by: Philippe Mathieu-Daudé
---


Thanks, queued.


r~

Re: [RFC PATCH v4 0/7] hw/arm/virt: Introduce cpu topology support

2021-06-22 Thread Daniel P . Berrangé

On Tue, Jun 22, 2021 at 07:29:34PM +0200, Andrew Jones wrote:
> On Tue, Jun 22, 2021 at 06:14:25PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Jun 22, 2021 at 05:40:13PM +0200, Igor Mammedov wrote:
> > > On Tue, 22 Jun 2021 16:29:15 +0200
> > > Andrew Jones  wrote:
> > > 
> > > > On Tue, Jun 22, 2021 at 03:10:57PM +0100, Daniel P. Berrangé wrote:
> > > > > On Tue, Jun 22, 2021 at 10:04:52PM +0800, wangyanan (Y) wrote:  
> > > > > > Hi Daniel,
> > > > > > 
> > > > > > On 2021/6/22 20:41, Daniel P. Berrangé wrote:  
> > > > > > > On Tue, Jun 22, 2021 at 08:31:22PM +0800, wangyanan (Y) wrote:  
> > > > > > > > 
> > > > > > > > On 2021/6/22 19:46, Andrew Jones wrote:  
> > > > > > > > > On Tue, Jun 22, 2021 at 11:18:09AM +0100, Daniel P. Berrangé 
> > > > > > > > > wrote:  
> > > > > > > > > > On Tue, Jun 22, 2021 at 05:34:06PM +0800, Yanan Wang wrote: 
> > > > > > > > > >  
> > > > > > > > > > > Hi,
> > > > > > > > > > > 
> > > > > > > > > > > This is v4 of the series [1] that I posted to introduce 
> > > > > > > > > > > support for
> > > > > > > > > > > generating cpu topology descriptions to guest. Comments 
> > > > > > > > > > > are welcome!
> > > > > > > > > > > 
> > > > > > > > > > > Description:
> > > > > > > > > > > Once the view of an accurate virtual cpu topology is 
> > > > > > > > > > > provided to guest,
> > > > > > > > > > > with a well-designed vCPU pinning to the pCPU we may get 
> > > > > > > > > > > a huge benefit,
> > > > > > > > > > > e.g., the scheduling performance improvement. See Dario 
> > > > > > > > > > > Faggioli's
> > > > > > > > > > > research and the related performance tests in [2] for 
> > > > > > > > > > > reference. So here
> > > > > > > > > > > we go, this patch series introduces cpu topology support 
> > > > > > > > > > > for ARM platform.
> > > > > > > > > > > 
> > > > > > > > > > > In this series, instead of quietly enforcing the support 
> > > > > > > > > > > for the latest
> > > > > > > > > > > machine type, a new parameter "expose=on|off" in -smp 
> > > > > > > > > > > command line is
> > > > > > > > > > > introduced to leave QEMU users a choice to decide whether 
> > > > > > > > > > > to enable the
> > > > > > > > > > > feature or not. This will allow the feature to work on 
> > > > > > > > > > > different machine
> > > > > > > > > > > types and also ideally compat with already in-use -smp 
> > > > > > > > > > > command lines.
> > > > > > > > > > > Also we make much stricter requirement for the topology 
> > > > > > > > > > > configuration
> > > > > > > > > > > with "expose=on".  
> > > > > > > > > > Seeing this 'expose=on' parameter feels to me like we're 
> > > > > > > > > > adding a
> > > > > > > > > > "make-it-work=yes" parameter. IMHO this is just something 
> > > > > > > > > > that should
> > > > > > > > > > be done by default for the current machine type version and 
> > > > > > > > > > beyond.
> > > > > > > > > > I don't see the need for a parameter to turn this on,
> > > > > > > > > > especially since
> > > > > > > > > > it is being made architecture specific.
> > > > > > > > > >   
> > > > > > > > > I agree.
> > > > > > > > > 
> > > > > > > > > Yanan, we never discussed an "expose" parameter in the 
> > > > > > > > > previous versions
> > > > > > > > > of this series. We discussed a "strict" parameter though, 
> > > > > > > > > which would
> > > > > > > > > allow existing command lines to "work" using assumptions of 
> > > > > > > > > what the user
> > > > > > > > > meant and strict=on users to get what they mean or an error 
> > > > > > > > > saying that
> > > > > > > > > they asked for something that won't work or would require 
> > > > > > > > > unreasonable
> > > > > > > > > assumptions. Why was this changed to an "expose" parameter?  
> > > > > > > > Yes, we indeed discuss a new "strict" parameter but not a 
> > > > > > > > "expose" in v2 [1]
> > > > > > > > of this series.
> > > > > > > > [1] 
> > > > > > > > https://patchwork.kernel.org/project/qemu-devel/patch/20210413080745.33004-6-wangyana...@huawei.com/
> > > > > > > > 
> > > > > > > > And in the discussion, we hoped things would work like below 
> > > > > > > > with "strict"
> > > > > > > > parameter:
> > > > > > > > Users who want to describe cpu topology should provide cmdline 
> > > > > > > > like
> > > > > > > > 
> > > > > > > > -smp strict=on,cpus=4,sockets=2,cores=2,threads=1
> > > > > > > > 
> > > > > > > > and in this case we require a more accurate -smp configuration
> > > > > > > > and
> > > > > > > > then generate the cpu topology description through ACPI/DT.
> > > > > > > > 
> > > > > > > > While without a strict description, no cpu topology description 
> > > > > > > > would
> > > > > > > > be generated, so they get nothing through ACPI/DT.
> > > > > > > > 
> > > > > > > > It seems to me that the "strict" parameter actually serves as a 
> > > > > > > > knob to
> > > > > > > > turn on/off the exposure of topology, and this is the reason I 
> > > > > > > > changed
> > > > > > > >

Re: [PATCH v1 1/1] migration: Unregister yank if migration setup fails

2021-06-22 Thread Peter Xu

On Mon, Jun 21, 2021 at 11:42:36PM -0300, Leonardo Bras wrote:
> Currently, if a qemu instance is started with "-incoming defer" and
> an incorrect parameter is passed to "migrate_incoming", it will print the
> expected error and reply with "duplicate yank instance" for any upcoming
> "migrate_incoming" command.
> 
> This renders current qemu process unusable, and requires a new qemu
> process to be started before accepting a migration.
> 
> This is caused by a yank_register_instance() that happens in
> qemu_start_incoming_migration() but is never reverted if any error
> happens.
> 
> Solve this by unregistering the instance if anything goes wrong
> in the function, allowing a new "migrate_incoming" command to be
> accepted.
> 
> Fixes: b5eea99ec2f ("migration: Add yank feature", 2021-01-13)
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1974366
> Signed-off-by: Leonardo Bras 
> 
> ---
>  migration/migration.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 4228635d18..ddcf9e1868 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -474,9 +474,13 @@ static void qemu_start_incoming_migration(const char 
> *uri, Error **errp)
>  } else if (strstart(uri, "fd:", &p)) {
>  fd_start_incoming_migration(p, errp);
>  } else {
> -yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>  error_setg(errp, "unknown migration protocol: %s", uri);
>  }
> +
> +if (*errp) {
> +yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> +}
> +
>  }

Yes, looks right to me:

Reviewed-by: Peter Xu 

-- 
Peter Xu
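
For readers following along, the failure mode and the fix can be modelled in a few lines. This is only a toy sketch in Python, not QEMU code: the YankRegistry class, the start_incoming() helper, the cleanup_on_error switch and the "missing address" error are all invented for illustration, with the last one standing in for any per-protocol failure. The point is that a single cleanup check at the end of the function covers every failing branch, so a later attempt is not rejected as a duplicate.

```python
INSTANCE = "migration"  # hypothetical stand-in for MIGRATION_YANK_INSTANCE


class YankRegistry:
    """Toy registry: registering the same name twice is an error."""

    def __init__(self):
        self._instances = set()

    def register(self, name):
        if name in self._instances:
            raise RuntimeError("duplicate yank instance")
        self._instances.add(name)

    def unregister(self, name):
        self._instances.discard(name)


def start_incoming(registry, uri, cleanup_on_error=True):
    """Return None on success, or an error string (playing the role of *errp)."""
    registry.register(INSTANCE)
    err = None
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme not in ("tcp", "unix", "fd"):
        err = "unknown migration protocol: %s" % uri
    elif not rest:
        # Stands in for any failure inside a known protocol branch.
        err = "missing address in: %s" % uri
    if err is not None and cleanup_on_error:
        # One cleanup point for all error paths, as in the patch.
        registry.unregister(INSTANCE)
    return err


fixed = YankRegistry()
print(start_incoming(fixed, "tcp:"))          # fails, instance is unregistered
print(start_incoming(fixed, "tcp:0:4444"))    # retry is accepted
```

With cleanup_on_error=False the second call raises "duplicate yank instance", which is the behaviour described in the commit message.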

Re: [PATCH v3 3/3] avocado_qemu: Add Intel iommu tests

2021-06-22 Thread Peter Xu

Hi, Eric,

On Mon, Jun 21, 2021 at 10:08:24AM +0200, Eric Auger wrote:
> Add Intel IOMMU functional tests based on fedora 31.
> Different configs are checked:
> - strict
> - caching mode, strict
> - passthrough.
> 
> Signed-off-by: Eric Auger 

Acked-by: Peter Xu 

Thanks for adding this test!

-- 
Peter Xu

Re: [RFC PATCH v4 0/7] hw/arm/virt: Introduce cpu topology support

2021-06-22 Thread Andrew Jones

On Tue, Jun 22, 2021 at 06:14:25PM +0100, Daniel P. Berrangé wrote:
> On Tue, Jun 22, 2021 at 05:40:13PM +0200, Igor Mammedov wrote:
> > On Tue, 22 Jun 2021 16:29:15 +0200
> > Andrew Jones  wrote:
> > 
> > > On Tue, Jun 22, 2021 at 03:10:57PM +0100, Daniel P. Berrangé wrote:
> > > > On Tue, Jun 22, 2021 at 10:04:52PM +0800, wangyanan (Y) wrote:  
> > > > > Hi Daniel,
> > > > > 
> > > > > On 2021/6/22 20:41, Daniel P. Berrangé wrote:  
> > > > > > On Tue, Jun 22, 2021 at 08:31:22PM +0800, wangyanan (Y) wrote:  
> > > > > > > 
> > > > > > > On 2021/6/22 19:46, Andrew Jones wrote:  
> > > > > > > > On Tue, Jun 22, 2021 at 11:18:09AM +0100, Daniel P. Berrangé 
> > > > > > > > wrote:  
> > > > > > > > > On Tue, Jun 22, 2021 at 05:34:06PM +0800, Yanan Wang wrote:  
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > This is v4 of the series [1] that I posted to introduce 
> > > > > > > > > > support for
> > > > > > > > > > generating cpu topology descriptions to guest. Comments are 
> > > > > > > > > > welcome!
> > > > > > > > > > 
> > > > > > > > > > Description:
> > > > > > > > > > Once the view of an accurate virtual cpu topology is 
> > > > > > > > > > provided to guest,
> > > > > > > > > > with a well-designed vCPU pinning to the pCPU we may get a 
> > > > > > > > > > huge benefit,
> > > > > > > > > > e.g., the scheduling performance improvement. See Dario 
> > > > > > > > > > Faggioli's
> > > > > > > > > > research and the related performance tests in [2] for 
> > > > > > > > > > reference. So here
> > > > > > > > > > we go, this patch series introduces cpu topology support 
> > > > > > > > > > for ARM platform.
> > > > > > > > > > 
> > > > > > > > > > In this series, instead of quietly enforcing the support 
> > > > > > > > > > for the latest
> > > > > > > > > > machine type, a new parameter "expose=on|off" in -smp 
> > > > > > > > > > command line is
> > > > > > > > > > introduced to leave QEMU users a choice to decide whether 
> > > > > > > > > > to enable the
> > > > > > > > > > feature or not. This will allow the feature to work on 
> > > > > > > > > > different machine
> > > > > > > > > > types and also ideally compat with already in-use -smp 
> > > > > > > > > > command lines.
> > > > > > > > > > Also we make much stricter requirement for the topology 
> > > > > > > > > > configuration
> > > > > > > > > > with "expose=on".  
> > > > > > > > > Seeing this 'expose=on' parameter feels to me like we're 
> > > > > > > > > adding a
> > > > > > > > > "make-it-work=yes" parameter. IMHO this is just something 
> > > > > > > > > that should
> > > > > > > > > be done by default for the current machine type version and 
> > > > > > > > > beyond.
> > > > > > > > > I don't see the need for a parameter to turn this on,
> > > > > > > > > especially since
> > > > > > > > > it is being made architecture specific.
> > > > > > > > >   
> > > > > > > > I agree.
> > > > > > > > 
> > > > > > > > Yanan, we never discussed an "expose" parameter in the previous 
> > > > > > > > versions
> > > > > > > > of this series. We discussed a "strict" parameter though, which 
> > > > > > > > would
> > > > > > > > allow existing command lines to "work" using assumptions of 
> > > > > > > > what the user
> > > > > > > > meant and strict=on users to get what they mean or an error 
> > > > > > > > saying that
> > > > > > > > they asked for something that won't work or would require 
> > > > > > > > unreasonable
> > > > > > > > assumptions. Why was this changed to an "expose" parameter?  
> > > > > > > Yes, we indeed discuss a new "strict" parameter but not a 
> > > > > > > "expose" in v2 [1]
> > > > > > > of this series.
> > > > > > > [1] 
> > > > > > > https://patchwork.kernel.org/project/qemu-devel/patch/20210413080745.33004-6-wangyana...@huawei.com/
> > > > > > > 
> > > > > > > And in the discussion, we hoped things would work like below with 
> > > > > > > "strict"
> > > > > > > parameter:
> > > > > > > Users who want to describe cpu topology should provide cmdline 
> > > > > > > like
> > > > > > > 
> > > > > > > -smp strict=on,cpus=4,sockets=2,cores=2,threads=1
> > > > > > > 
> > > > > > > and in this case we require a more accurate -smp configuration
> > > > > > > and
> > > > > > > then generate the cpu topology description through ACPI/DT.
> > > > > > > 
> > > > > > > While without a strict description, no cpu topology description 
> > > > > > > would
> > > > > > > be generated, so they get nothing through ACPI/DT.
> > > > > > > 
> > > > > > > It seems to me that the "strict" parameter actually serves as a 
> > > > > > > knob to
> > > > > > > turn on/off the exposure of topology, and this is the reason I 
> > > > > > > changed
> > > > > > > the name.  
> > > > > > Yes, the use of 'strict=on' is no better than expose=on IMHO.
> > > > > > 
> > > > > > If I give QEMU a cli
> > > > > > 
> > > > > >-smp cpus=4,sockets=2,cores=2,threads=1
> > > > > > 
> > > > > > then I expect that topology to be exposed to the guest.
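
As a rough illustration of the kind of consistency check being argued about in this thread, here is a minimal Python sketch. It assumes the only rule is that cpus must equal sockets * cores * threads; the check_smp() name is hypothetical, and QEMU's real -smp parsing handles more than this (defaults for omitted values, maxcpus, and so on), so treat it as a sketch of the idea rather than of the implementation.

```python
def check_smp(cpus, sockets, cores, threads):
    """Return None if the topology is self-consistent, else an error string.

    Assumption for this sketch: every value is given explicitly and the
    only requirement is cpus == sockets * cores * threads.
    """
    expected = sockets * cores * threads
    if cpus != expected:
        return ("cpus=%d does not match sockets*cores*threads=%d"
                % (cpus, expected))
    return None


# The command line quoted above: -smp cpus=4,sockets=2,cores=2,threads=1
print(check_smp(cpus=4, sockets=2, cores=2, threads=1))
# An inconsistent request that a strict parser would reject
print(check_smp(cpus=8, sockets=2, cores=2, threads=1))
```

Under this assumption the first call reports no error, so nothing would stop that topology from being described to the guest as given.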

Re: [RFC PATCH v4 0/7] hw/arm/virt: Introduce cpu topology support

2021-06-22 Thread Daniel P . Berrangé

On Tue, Jun 22, 2021 at 05:40:13PM +0200, Igor Mammedov wrote:
> On Tue, 22 Jun 2021 16:29:15 +0200
> Andrew Jones  wrote:
> 
> > On Tue, Jun 22, 2021 at 03:10:57PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Jun 22, 2021 at 10:04:52PM +0800, wangyanan (Y) wrote:  
> > > > Hi Daniel,
> > > > 
> > > > On 2021/6/22 20:41, Daniel P. Berrangé wrote:  
> > > > > On Tue, Jun 22, 2021 at 08:31:22PM +0800, wangyanan (Y) wrote:  
> > > > > > 
> > > > > > On 2021/6/22 19:46, Andrew Jones wrote:  
> > > > > > > On Tue, Jun 22, 2021 at 11:18:09AM +0100, Daniel P. Berrangé 
> > > > > > > wrote:  
> > > > > > > > On Tue, Jun 22, 2021 at 05:34:06PM +0800, Yanan Wang wrote:  
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > This is v4 of the series [1] that I posted to introduce 
> > > > > > > > > support for
> > > > > > > > > generating cpu topology descriptions to guest. Comments are 
> > > > > > > > > welcome!
> > > > > > > > > 
> > > > > > > > > Description:
> > > > > > > > > Once the view of an accurate virtual cpu topology is provided 
> > > > > > > > > to guest,
> > > > > > > > > with a well-designed vCPU pinning to the pCPU we may get a 
> > > > > > > > > huge benefit,
> > > > > > > > > e.g., the scheduling performance improvement. See Dario 
> > > > > > > > > Faggioli's
> > > > > > > > > research and the related performance tests in [2] for 
> > > > > > > > > reference. So here
> > > > > > > > > we go, this patch series introduces cpu topology support for 
> > > > > > > > > ARM platform.
> > > > > > > > > 
> > > > > > > > > In this series, instead of quietly enforcing the support for 
> > > > > > > > > the latest
> > > > > > > > > machine type, a new parameter "expose=on|off" in -smp command 
> > > > > > > > > line is
> > > > > > > > > introduced to leave QEMU users a choice to decide whether to 
> > > > > > > > > enable the
> > > > > > > > > feature or not. This will allow the feature to work on 
> > > > > > > > > different machine
> > > > > > > > > types and also ideally compat with already in-use -smp 
> > > > > > > > > command lines.
> > > > > > > > > Also we make much stricter requirement for the topology 
> > > > > > > > > configuration
> > > > > > > > > with "expose=on".  
> > > > > > > > Seeing this 'expose=on' parameter feels to me like we're adding 
> > > > > > > > a
> > > > > > > > "make-it-work=yes" parameter. IMHO this is just something that 
> > > > > > > > should
> > > > > > > > be done by default for the current machine type version and 
> > > > > > > > beyond.
> > > > > > > > I don't see the need for a parameter to turn this on, especially
> > > > > > > > since
> > > > > > > > it is being made architecture specific.
> > > > > > > >   
> > > > > > > I agree.
> > > > > > > 
> > > > > > > Yanan, we never discussed an "expose" parameter in the previous 
> > > > > > > versions
> > > > > > > of this series. We discussed a "strict" parameter though, which 
> > > > > > > would
> > > > > > > allow existing command lines to "work" using assumptions of what 
> > > > > > > the user
> > > > > > > meant and strict=on users to get what they mean or an error 
> > > > > > > saying that
> > > > > > > they asked for something that won't work or would require 
> > > > > > > unreasonable
> > > > > > > assumptions. Why was this changed to an "expose" parameter?  
> > > > > > Yes, we indeed discuss a new "strict" parameter but not a "expose" 
> > > > > > in v2 [1]
> > > > > > of this series.
> > > > > > [1] 
> > > > > > https://patchwork.kernel.org/project/qemu-devel/patch/20210413080745.33004-6-wangyana...@huawei.com/
> > > > > > 
> > > > > > And in the discussion, we hoped things would work like below with 
> > > > > > "strict"
> > > > > > parameter:
> > > > > > Users who want to describe cpu topology should provide cmdline like
> > > > > > 
> > > > > > -smp strict=on,cpus=4,sockets=2,cores=2,threads=1
> > > > > > 
> > > > > > and in this case we require a more accurate -smp configuration and
> > > > > > then generate the cpu topology description through ACPI/DT.
> > > > > > 
> > > > > > While without a strict description, no cpu topology description 
> > > > > > would
> > > > > > be generated, so they get nothing through ACPI/DT.
> > > > > > 
> > > > > > It seems to me that the "strict" parameter actually serves as a 
> > > > > > knob to
> > > > > > turn on/off the exposure of topology, and this is the reason I 
> > > > > > changed
> > > > > > the name.  
> > > > > Yes, the use of 'strict=on' is no better than expose=on IMHO.
> > > > > 
> > > > > If I give QEMU a cli
> > > > > 
> > > > >-smp cpus=4,sockets=2,cores=2,threads=1
> > > > > 
> > > > > then I expect that topology to be exposed to the guest. I shouldn't
> > > > > have to add extra flags to make that happen.
> > > > > 
> > > > > Looking at the thread, it seems the concern was around the fact that
> > > > > the settings were not honoured historically and thus the CLI values
> > > > > could be garbage. ie  -smp

Re: [PULL 0/9] Linux user for 6.1 patches

2021-06-22 Thread Peter Maydell

On Mon, 21 Jun 2021 at 12:07, Laurent Vivier  wrote:
>
> The following changes since commit 1ea06abceec61b6f3ab33dadb0510b6e09fb61e2:
>
>   Merge remote-tracking branch 
> 'remotes/berrange-gitlab/tags/misc-fixes-pull-request' into staging 
> (2021-06-14 15:59:13 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/vivier/qemu.git tags/linux-user-for-6.1-pull-request
>
> for you to fetch changes up to 96ff758c6e9cd5a01443ee15afbd0df4f00c37a8:
>
>   linux-user: Use public sigev_notify_thread_id member if available 
> (2021-06-20 16:41:47 +0200)
>
> 
> Linux-user pull request 20210621
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM

Re: [PATCH 3/4] modules: module.h kerneldoc annotations

2021-06-22 Thread Jose R. Ziviani

Hello Gerd,

On Tue, Jun 22, 2021 at 02:51:09PM +0200, Gerd Hoffmann wrote:
> ---
>  include/qemu/module.h | 59 +--
>  1 file changed, 45 insertions(+), 14 deletions(-)

This header has a copyright date from 2009. Not sure if it requires an
update.

> 
> diff --git a/include/qemu/module.h b/include/qemu/module.h
> index 7f4b1af8198c..8bc80535a4d4 100644
> --- a/include/qemu/module.h
> +++ b/include/qemu/module.h
> @@ -74,11 +74,18 @@ void module_load_qom_one(const char *type);
>  void module_load_qom_all(void);
>  void module_allow_arch(const char *arch);
>  
> -/*
> - * module info annotation macros
> +/**
> + * DOC: module info annotation macros
>   *
> - * scripts/modinfo-collect.py will collect module info,
> - * using the preprocessor and -DQEMU_MODINFO
> + * `scripts/modinfo-collect.py` will collect module info,
> + * using the preprocessor and -DQEMU_MODINFO.
> + *
> + * `scripts/modinfo-generate.py` will create a module meta-data database
> + * from the collected information so qemu knows about module
> + * dependencies and QOM objects implemented by modules.
> + *
> + * See `*.modinfo` and `modinfo.c` in the build directory to check the
> + * script results.
>   */
>  #ifdef QEMU_MODINFO
>  # define modinfo(kind, value) \
> @@ -87,24 +94,48 @@ void module_allow_arch(const char *arch);
>  # define modinfo(kind, value)
>  #endif
>  
> -/* module implements QOM type  */
> +/**
> + * module_obj
> + *
> + * @name: QOM type.
> + *
> + * This module implements QOM type @name.
> + */
>  #define module_obj(name) modinfo(obj, name)
>  
> -/* module has a dependency on  */
> +/**
> + * module_dep
> + *
> + * @name: module name
> + *
> + * This module depends on module @name.
> + */
>  #define module_dep(name) modinfo(dep, name)
>  
> -/* module is for target architecture  */
> +/**
> + * module_arch
> + *
> + * @arch: target architecture
> + *
> + * This module is for target architecture @arch.
> + *
> + * Note that target-dependent modules are tagged automatically, so
> + * this is only needed in case target-independent modules should be
> + * restricted.  Use case example: the ccw bus is implemented by s390x
> + * only.
> + */
>  #define module_arch(name) modinfo(arch, name)
>  
> -/* module registers QemuOpts  */
> +/**
> + * module_opts
> + *
> + * @name: QemuOpts name
> + *
> + * This module registers QemuOpts @name.
> + */
>  #define module_opts(name) modinfo(opts, name)
>  
> -/*
> - * module info database
> - *
> - * scripts/modinfo-generate.c will build this using the data collected
> - * by scripts/modinfo-collect.py
> - */
> +/* module info database (created by scripts/modinfo-generate.py) */
>  typedef struct QemuModinfo QemuModinfo;
>  struct QemuModinfo {
>  const char *name;
> -- 
> 2.31.1
> 
> 



Re: [RFC PATCH v4 0/7] hw/arm/virt: Introduce cpu topology support

2021-06-22 Thread Andrew Jones

On Tue, Jun 22, 2021 at 05:40:13PM +0200, Igor Mammedov wrote:
> On Tue, 22 Jun 2021 16:29:15 +0200
> Andrew Jones  wrote:
> 
> > On Tue, Jun 22, 2021 at 03:10:57PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Jun 22, 2021 at 10:04:52PM +0800, wangyanan (Y) wrote:  
> > > > Hi Daniel,
> > > > 
> > > > On 2021/6/22 20:41, Daniel P. Berrangé wrote:  
> > > > > On Tue, Jun 22, 2021 at 08:31:22PM +0800, wangyanan (Y) wrote:  
> > > > > > 
> > > > > > On 2021/6/22 19:46, Andrew Jones wrote:  
> > > > > > > On Tue, Jun 22, 2021 at 11:18:09AM +0100, Daniel P. Berrangé 
> > > > > > > wrote:  
> > > > > > > > On Tue, Jun 22, 2021 at 05:34:06PM +0800, Yanan Wang wrote:  
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > This is v4 of the series [1] that I posted to introduce 
> > > > > > > > > support for
> > > > > > > > > generating cpu topology descriptions to guest. Comments are 
> > > > > > > > > welcome!
> > > > > > > > > 
> > > > > > > > > Description:
> > > > > > > > > Once the view of an accurate virtual cpu topology is provided 
> > > > > > > > > to guest,
> > > > > > > > > with a well-designed vCPU pinning to the pCPU we may get a 
> > > > > > > > > huge benefit,
> > > > > > > > > e.g., the scheduling performance improvement. See Dario 
> > > > > > > > > Faggioli's
> > > > > > > > > research and the related performance tests in [2] for 
> > > > > > > > > reference. So here
> > > > > > > > > we go, this patch series introduces cpu topology support for 
> > > > > > > > > ARM platform.
> > > > > > > > > 
> > > > > > > > > In this series, instead of quietly enforcing the support for 
> > > > > > > > > the latest
> > > > > > > > > machine type, a new parameter "expose=on|off" in -smp command 
> > > > > > > > > line is
> > > > > > > > > introduced to leave QEMU users a choice to decide whether to 
> > > > > > > > > enable the
> > > > > > > > > feature or not. This will allow the feature to work on 
> > > > > > > > > different machine
> > > > > > > > > types and also ideally compat with already in-use -smp 
> > > > > > > > > command lines.
> > > > > > > > > Also we make much stricter requirement for the topology 
> > > > > > > > > configuration
> > > > > > > > > with "expose=on".  
> > > > > > > > Seeing this 'expose=on' parameter feels to me like we're adding 
> > > > > > > > a
> > > > > > > > "make-it-work=yes" parameter. IMHO this is just something that 
> > > > > > > > should
> > > > > > > > be done by default for the current machine type version and 
> > > > > > > > beyond.
> > > > > > > > I don't see the need for a parameter to turn this on, especially
> > > > > > > > since
> > > > > > > > it is being made architecture specific.
> > > > > > > >   
> > > > > > > I agree.
> > > > > > > 
> > > > > > > Yanan, we never discussed an "expose" parameter in the previous 
> > > > > > > versions
> > > > > > > of this series. We discussed a "strict" parameter though, which 
> > > > > > > would
> > > > > > > allow existing command lines to "work" using assumptions of what 
> > > > > > > the user
> > > > > > > meant and strict=on users to get what they mean or an error 
> > > > > > > saying that
> > > > > > > they asked for something that won't work or would require 
> > > > > > > unreasonable
> > > > > > > assumptions. Why was this changed to an "expose" parameter?  
> > > > > > Yes, we indeed discuss a new "strict" parameter but not a "expose" 
> > > > > > in v2 [1]
> > > > > > of this series.
> > > > > > [1] 
> > > > > > https://patchwork.kernel.org/project/qemu-devel/patch/20210413080745.33004-6-wangyana...@huawei.com/
> > > > > > 
> > > > > > And in the discussion, we hoped things would work like below with 
> > > > > > "strict"
> > > > > > parameter:
> > > > > > Users who want to describe cpu topology should provide cmdline like
> > > > > > 
> > > > > > -smp strict=on,cpus=4,sockets=2,cores=2,threads=1
> > > > > > 
> > > > > > and in this case we require a more accurate -smp configuration and
> > > > > > then generate the cpu topology description through ACPI/DT.
> > > > > > 
> > > > > > While without a strict description, no cpu topology description 
> > > > > > would
> > > > > > be generated, so they get nothing through ACPI/DT.
> > > > > > 
> > > > > > It seems to me that the "strict" parameter actually serves as a 
> > > > > > knob to
> > > > > > turn on/off the exposure of topology, and this is the reason I 
> > > > > > changed
> > > > > > the name.  
> > > > > Yes, the use of 'strict=on' is no better than expose=on IMHO.
> > > > > 
> > > > > If I give QEMU a cli
> > > > > 
> > > > >-smp cpus=4,sockets=2,cores=2,threads=1
> > > > > 
> > > > > then I expect that topology to be exposed to the guest. I shouldn't
> > > > > have to add extra flags to make that happen.
> > > > > 
> > > > > Looking at the thread, it seems the concern was around the fact that
> > > > > the settings were not honoured historically and thus the CLI values
> > > > > could be garbage. ie  -smp

Re: [PATCH v2 2/1] qemu-img: Add "backing":true to unallocated map segments

2021-06-22 Thread Nir Soffer

On Fri, Jun 11, 2021 at 10:03 PM Eric Blake  wrote:
>
> To save the user from having to check 'qemu-img info --backing-chain'
> or other followup command to determine which "depth":n goes beyond the
> chain, add a boolean field "backing" that is set only for unallocated
> portions of the disk.
>
> Signed-off-by: Eric Blake 
> ---
>
> Touches the same iotest output as 1/1.  If we decide that switching to
> "depth":n+1 is too risky, and that the mere addition of "backing":true
> while keeping "depth":n is good enough, then we'd have just one patch,
> instead of this double churn.  Preferences?
>
>  docs/tools/qemu-img.rst|  3 ++
>  qapi/block-core.json   |  7 ++-
>  qemu-img.c | 15 +-
>  tests/qemu-iotests/122.out | 34 +++---
>  tests/qemu-iotests/154.out | 96 +++---
>  tests/qemu-iotests/179.out | 66 +-
>  tests/qemu-iotests/223.out | 24 +-
>  tests/qemu-iotests/244.out |  6 +--
>  tests/qemu-iotests/252.out |  4 +-
>  tests/qemu-iotests/274.out | 16 +++
>  tests/qemu-iotests/291.out |  8 ++--
>  tests/qemu-iotests/309.out |  4 +-
>  12 files changed, 150 insertions(+), 133 deletions(-)
>
> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> index c155b1bf3cc8..fbc623b645c3 100644
> --- a/docs/tools/qemu-img.rst
> +++ b/docs/tools/qemu-img.rst
> @@ -601,6 +601,9 @@ Command description:
>  a ``depth``; for example, a depth of 2 refers to the backing file
>  of the backing file of *FILENAME*.  Depth will be one larger than
>  the chain length if no file in the chain provides the data.
> +  - an optional ``backing`` field is present with value true if no
> +file in the backing chain provides the data (making it easier to
> +identify when ``depth`` exceeds the chain length).
>
>In JSON format, the ``offset`` field is optional; it is absent in
>cases where ``human`` format would omit the entry or exit with an error.
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 2ea294129e08..cebe12ba16a0 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -264,6 +264,9 @@
>  # @offset: if present, the image file stores the data for this range
>  #  in raw format at the given (host) offset
>  #
> +# @backing: if present, the range is not allocated within the backing
> +#   chain (since 6.1)
> +#
>  # @filename: filename that is referred to by @offset
>  #
>  # Since: 2.6
> @@ -271,8 +274,8 @@
>  ##
>  { 'struct': 'MapEntry',
>'data': {'start': 'int', 'length': 'int', 'data': 'bool',
> -   'zero': 'bool', 'depth': 'int', '*offset': 'int',
> -   '*filename': 'str' } }
> +   'zero': 'bool', 'depth': 'int', '*backing': 'bool',
> +   '*offset': 'int', '*filename': 'str' } }
>
>  ##
>  # @BlockdevCacheInfo:
> diff --git a/qemu-img.c b/qemu-img.c
> index 33a5cd012b8b..4d357f534803 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -2977,8 +2977,13 @@ static int dump_map_entry(OutputFormat output_format, 
> MapEntry *e,
>  break;
>  case OFORMAT_JSON:
>  printf("{ \"start\": %"PRId64", \"length\": %"PRId64","
> -   " \"depth\": %"PRId64", \"zero\": %s, \"data\": %s",
> -   e->start, e->length, e->depth,
> +   " \"depth\": %"PRId64, e->start, e->length, e->depth);
> +if (e->has_backing) {
> +/* Backing should only be set at the end of the chain */
> +assert(e->backing && e->depth > 0);
> +printf(", \"backing\": true");
> +}

It will be easier to inspect the output if common fields come before
optional fields.

> +printf(", \"zero\": %s, \"data\": %s",
> e->zero ? "true" : "false",
> e->data ? "true" : "false");
>  if (e->has_offset) {
...
> diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
> index 779dab4847f0..c5aa2c9866f1 100644
> --- a/tests/qemu-iotests/122.out
> +++ b/tests/qemu-iotests/122.out
> @@ -68,11 +68,11 @@ read 65536/65536 bytes at offset 4194304
>  read 65536/65536 bytes at offset 8388608
>  64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  [{ "start": 0, "length": 65536, "depth": 0, "zero": false, "data": true},
> -{ "start": 65536, "length": 4128768, "depth": 1, "zero": true, "data": 
> false},
> +{ "start": 65536, "length": 4128768, "depth": 1, "backing": true, "zero": 
> true, "data": false},

So this output would be:

[{ "start": 0, "length": 65536, "depth": 0, "zero": false, "data": true},
 { "start": 65536, "length": 4128768, "depth": 1, "zero": true,
"data": false, "backing": true},
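
To show how a consumer might use the flag, here is a small Python sketch that picks out the ranges no file in the chain provides. It assumes the field keeps the proposed name "backing" (still under discussion in this thread); the SAMPLE text is the two entries above closed off into a valid JSON array, and the beyond_chain() helper is invented for illustration.

```python
import json

# The two entries shown above, closed into a valid JSON array for the sketch.
SAMPLE = """
[{ "start": 0, "length": 65536, "depth": 0, "zero": false, "data": true},
 { "start": 65536, "length": 4128768, "depth": 1, "zero": true,
   "data": false, "backing": true}]
"""


def beyond_chain(map_json):
    """Return (start, length) for every entry flagged "backing": true.

    The field is optional, so a missing key is treated as false.
    """
    return [(entry["start"], entry["length"])
            for entry in json.loads(map_json)
            if entry.get("backing", False)]


print(beyond_chain(SAMPLE))
```

A JSON parser does not care where the optional key sits in the object, so the ordering suggested here only affects how easy the raw output is to read by eye.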

Re: [PATCH v2 2/1] qemu-img: Add "backing":true to unallocated map segments

2021-06-22 Thread Nir Soffer

On Tue, Jun 22, 2021 at 6:38 PM Kevin Wolf  wrote:
>
> Am 11.06.2021 um 21:03 hat Eric Blake geschrieben:
> > To save the user from having to check 'qemu-img info --backing-chain'
> > or other followup command to determine which "depth":n goes beyond the
> > chain, add a boolean field "backing" that is set only for unallocated
> > portions of the disk.
> >
> > Signed-off-by: Eric Blake 
> > ---
> >
> > Touches the same iotest output as 1/1.  If we decide that switching to
> > "depth":n+1 is too risky, and that the mere addition of "backing":true
> > while keeping "depth":n is good enough, then we'd have just one patch,
> > instead of this double churn.  Preferences?
>
> I think the additional flag is better because it's guaranteed to be
> backwards compatible, and because you don't need to know the number of
> layers to infer whether a cluster was allocated in the whole backing
> chain. And by exposing ALLOCATED we definitely give access to the whole
> information that exists in QEMU.
>
> However, to continue with the bike shedding: I won't insist on
> "allocated" even if that is what the flag is called internally and
> consistency is usually helpful, but "backing" is misleading, too,
> because intuitively it doesn't cover the top layer or standalone images
> without a backing file. How about something like "present"?

Looks hard to document:

# @present: if present and false, the range is not allocated within the
#   backing chain (since 6.1)

And is not consistent with "offset". It would work better as:

# @present: if present, the range is allocated within the backing
#   chain (since 6.1)

Or:

# @absent: if present, the range is not allocated within the backing
#   chain (since 6.1)

This is used by libnbd now:
https://github.com/libguestfs/libnbd/commit/1d01d2ac4f6443b160b7d81119d555e1aaedb56d

But I'm fine with "backing"; it is consistent with BLK_BACKING_FILE,
meaning this area exposes data from a backing file (if one exists).

We use "backing" internally to be consistent with future qemu-img.

Re: [PATCH 3/4] export/fuse: Let permissions be adjustable

2021-06-22 Thread Kevin Wolf

Am 22.06.2021 um 17:22 hat Max Reitz geschrieben:
> On 22.06.21 17:02, Kevin Wolf wrote:
> > Am 14.06.2021 um 16:44 hat Max Reitz geschrieben:
> > > Allow changing the file mode, UID, and GID through SETATTR.
> > > 
> > > This only really makes sense with allow-other, though (because without
> > > it, the effective access mode is fixed to be 0600 (u+rw) with qemu's
> > > user being the file's owner), so changing these stat fields is not
> > > allowed without allow-other.
> > > 
> > > Signed-off-by: Max Reitz 
> > > ---
> > >   block/export/fuse.c | 48 ++---
> > >   1 file changed, 37 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/block/export/fuse.c b/block/export/fuse.c
> > > index 1d54286d90..742e0af657 100644
> > > --- a/block/export/fuse.c
> > > +++ b/block/export/fuse.c
> > > @@ -47,6 +47,10 @@ typedef struct FuseExport {
> > >   bool writable;
> > >   bool growable;
> > >   bool allow_other;
> > > +
> > > +mode_t st_mode;
> > > +uid_t st_uid;
> > > +gid_t st_gid;
> > >   } FuseExport;
> > >   static GHashTable *exports;
> > > @@ -120,6 +124,13 @@ static int fuse_export_create(BlockExport *blk_exp,
> > >   exp->growable = args->growable;
> > >   exp->allow_other = args->allow_other;
> > > +exp->st_mode = S_IFREG | S_IRUSR;
> > > +if (exp->writable) {
> > > +exp->st_mode |= S_IWUSR;
> > > +}
> > > +exp->st_uid = getuid();
> > > +exp->st_gid = getgid();
> > > +
> > >   ret = setup_fuse_export(exp, args->mountpoint, errp);
> > >   if (ret < 0) {
> > >   goto fail;
> > > @@ -329,7 +340,6 @@ static void fuse_getattr(fuse_req_t req, fuse_ino_t 
> > > inode,
> > >   int64_t length, allocated_blocks;
> > >   time_t now = time(NULL);
> > >   FuseExport *exp = fuse_req_userdata(req);
> > > -mode_t mode;
> > >   length = blk_getlength(exp->common.blk);
> > >   if (length < 0) {
> > > @@ -344,17 +354,12 @@ static void fuse_getattr(fuse_req_t req, fuse_ino_t 
> > > inode,
> > >   allocated_blocks = DIV_ROUND_UP(allocated_blocks, 512);
> > >   }
> > > -mode = S_IFREG | S_IRUSR;
> > > -if (exp->writable) {
> > > -mode |= S_IWUSR;
> > > -}
> > > -
> > >   statbuf = (struct stat) {
> > >   .st_ino = inode,
> > > -.st_mode= mode,
> > > +.st_mode= exp->st_mode,
> > >   .st_nlink   = 1,
> > > -.st_uid = getuid(),
> > > -.st_gid = getgid(),
> > > +.st_uid = exp->st_uid,
> > > +.st_gid = exp->st_gid,
> > >   .st_size= length,
> > >   .st_blksize = blk_bs(exp->common.blk)->bl.request_alignment,
> > >   .st_blocks  = allocated_blocks,
> > > @@ -400,15 +405,23 @@ static int fuse_do_truncate(const FuseExport *exp, 
> > > int64_t size,
> > >   }
> > >   /**
> > > - * Let clients set file attributes.  Only resizing is supported.
> > > + * Let clients set file attributes.  With allow_other, only resizing and
> > > + * changing permissions (st_mode, st_uid, st_gid) is allowed.  Without
> > > + * allow_other, only resizing is supported.
> > >*/
> > >   static void fuse_setattr(fuse_req_t req, fuse_ino_t inode, struct stat 
> > > *statbuf,
> > >int to_set, struct fuse_file_info *fi)
> > >   {
> > >   FuseExport *exp = fuse_req_userdata(req);
> > > +int supported_attrs;
> > >   int ret;
> > > -if (to_set & ~FUSE_SET_ATTR_SIZE) {
> > > +supported_attrs = FUSE_SET_ATTR_SIZE;
> > > +if (exp->allow_other) {
> > > +supported_attrs |= FUSE_SET_ATTR_MODE | FUSE_SET_ATTR_UID |
> > > +FUSE_SET_ATTR_GID;
> > > +}
> > > +if (to_set & ~supported_attrs) {
> > >   fuse_reply_err(req, ENOTSUP);
> > >   return;
> > >   }
> > > @@ -426,6 +439,19 @@ static void fuse_setattr(fuse_req_t req, fuse_ino_t 
> > > inode, struct stat *statbuf,
> > >   }
> > >   }
> > > +if (to_set & FUSE_SET_ATTR_MODE) {
> > > +/* Only allow changing the file mode, not the type */
> > > +exp->st_mode = (statbuf->st_mode & 0) | S_IFREG;
> > > +}
> > Should we check that the mode actually makes sense? Not sure if making
> > an image executable has a good use case, and making it writable in the
> > permissions for a read-only export isn't a good idea either.
> 
> I mean, I don’t mind what the user does.  It doesn’t really faze us, I
> believe.  If the image contains an executable ELF and the user wants to run
> it directly from FUSE...  I don’t mind.
> 
> As for +w on RO exports, I’m not sure.  This reminds me of `sudo chattr +i
> $file`, which effectively makes any regular file read-only, too, and it can
> still keep +w.  So the file permissions are basically just ACLs, getting
> permission for something doesn’t mean you can actually do it.
> 
> OTOH, the difference to `chattr +i` is that we’d allow opening the export
> R/W, only

[PATCH] tcg: Avoid including 'trace-tcg.h' in target translate.c

2021-06-22 Thread Philippe Mathieu-Daudé

The root trace-events only declares a single TCG event:

  $ git grep -w tcg trace-events
  trace-events:115:# tcg/tcg-op.c
  trace-events:137:vcpu tcg guest_mem_before(TCGv vaddr, uint16_t info) 
"info=%d", "vaddr=0x%016"PRIx64" info=%d"

and only tcg/tcg-op.c uses it:

  $ git grep -l trace_guest_mem_before_tcg
  tcg/tcg-op.c

therefore it is pointless to include "trace-tcg.h" in each target
(because it is not used). Remove it.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/alpha/translate.c  | 1 -
 target/arm/translate-a64.c| 1 -
 target/arm/translate-sve.c| 1 -
 target/arm/translate.c| 1 -
 target/cris/translate.c   | 1 -
 target/hppa/translate.c   | 1 -
 target/i386/tcg/translate.c   | 1 -
 target/m68k/translate.c   | 1 -
 target/microblaze/translate.c | 1 -
 target/mips/tcg/translate.c   | 1 -
 target/openrisc/translate.c   | 1 -
 target/ppc/translate.c| 1 -
 target/rx/translate.c | 1 -
 target/s390x/translate.c  | 1 -
 target/sh4/translate.c| 1 -
 target/sparc/translate.c  | 1 -
 target/xtensa/translate.c | 1 -
 17 files changed, 17 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index f454adea5e0..5fcedd85d36 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -28,7 +28,6 @@
 #include "exec/cpu_ldst.h"
 #include "exec/helper-proto.h"
 #include "exec/helper-gen.h"
-#include "trace-tcg.h"
 #include "exec/translator.h"
 #include "exec/log.h"
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 7f74d0e81a9..066d942b7fc 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -35,7 +35,6 @@
 #include "exec/helper-gen.h"
 #include "exec/log.h"
 
-#include "trace-tcg.h"
 #include "translate-a64.h"
 #include "qemu/atomic128.h"
 
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 46210eb696d..35d838aa068 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -30,7 +30,6 @@
 #include "exec/helper-proto.h"
 #include "exec/helper-gen.h"
 #include "exec/log.h"
-#include "trace-tcg.h"
 #include "translate-a64.h"
 #include "fpu/softfloat.h"
 
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9e2cca77077..3a3ccc97eb6 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -34,7 +34,6 @@
 #include "exec/helper-proto.h"
 #include "exec/helper-gen.h"
 
-#include "trace-tcg.h"
 #include "exec/log.h"
 
 
diff --git a/target/cris/translate.c b/target/cris/translate.c
index 6dd5a267a61..044e587eeb2 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -37,7 +37,6 @@
 
 #include "exec/helper-gen.h"
 
-#include "trace-tcg.h"
 #include "exec/log.h"
 
 
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 64af1e0d5cc..424ec3252ed 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -27,7 +27,6 @@
 #include "exec/helper-proto.h"
 #include "exec/helper-gen.h"
 #include "exec/translator.h"
-#include "trace-tcg.h"
 #include "exec/log.h"
 
 /* Since we have a distinction between register size and address size,
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index a7f5c0c8f20..5fb3350c5a8 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -30,7 +30,6 @@
 #include "exec/helper-gen.h"
 #include "helper-tcg.h"
 
-#include "trace-tcg.h"
 #include "exec/log.h"
 
 #define PREFIX_REPZ   0x01
diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index f0c5bf9154e..348fc6e844e 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -31,7 +31,6 @@
 #include "exec/helper-proto.h"
 #include "exec/helper-gen.h"
 
-#include "trace-tcg.h"
 #include "exec/log.h"
 #include "fpu/softfloat.h"
 
diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c
index c1b13f4c7d3..5dfb08d49f1 100644
--- a/target/microblaze/translate.c
+++ b/target/microblaze/translate.c
@@ -29,7 +29,6 @@
 #include "exec/translator.h"
 #include "qemu/qemu-print.h"
 
-#include "trace-tcg.h"
 #include "exec/log.h"
 
 #define EXTRACT_FIELD(src, start, end) \
diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
index 797eba44347..9895f7336c6 100644
--- a/target/mips/tcg/translate.c
+++ b/target/mips/tcg/translate.c
@@ -32,7 +32,6 @@
 #include "semihosting/semihost.h"
 
 #include "target/mips/trace.h"
-#include "trace-tcg.h"
 #include "exec/translator.h"
 #include "exec/log.h"
 #include "qemu/qemu-print.h"
diff --git a/target/openrisc/translate.c b/target/openrisc/translate.c
index a9c81f8bd5b..5db63d76093 100644
--- a/target/openrisc/translate.c
+++ b/target/openrisc/translate.c
@@ -33,7 +33,6 @@
 #include "exec/helper-gen.h"
 #include "exec/gen-icount.h"
 
-#include "trace-tcg.h"
 #include "exec/log.h"
 
 /* is_jmp field values */
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index f65d1e81eac..07d79acc08f 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@

Re: Too slow edk2 bios boot?

2021-06-22 Thread Laszlo Ersek

On 06/18/21 15:06, Bin Meng wrote:
> On Fri, Jun 18, 2021 at 7:46 PM Gerd Hoffmann  wrote:
> 
>> On Fri, Jun 18, 2021 at 06:46:57PM +0800, Bin Meng wrote:
>>> Hi Laszlo,
>>>
>>> Using the QEMU shipped edk2 bios, for i386, it boots very quickly to
>>> the EFI shell.
>>>
>>> $ qemu-system-i386 -nographic -pflash edk2-i386-code.fd

Ouch. Don't do this. If you use just one pflash chip, then a unified FD file is 
expected in that chip, containing both varstore and firmware executable.

Upstream QEMU does not bundle / install unified FD files however. What it 
provides are separate executables and varstore *templates*.

If you don't want to create a permanent variable store file for your VM, from 
the template called "edk2-i386-vars.fd", then the minimum command line is 
something like this:

qemu-system-i386 \
  -drive if=pflash,unit=0,format=raw,readonly=on,file=edk2-i386-code.fd \
  -drive if=pflash,unit=1,format=raw,snapshot=on,file=edk2-i386-vars.fd \

(Nowadays I should use the "blockdev" syntax instead of "-drive", but I've not 
updated my scripts thus far ;))


>>>
>>> However with x86_64, it takes a very long time to boot to the EFI
>>> shell. It seems it got stuck in the PXE boot. Any ideas?
>>
>> One year ago ia32 efi netboot support was dropped (and you are the first
>> who noticed  ).

I certainly noticed:

http://mid.mail-archive.com/e6078611-789f-027b-bea5-759e02b10eee@redhat.com


>>
> 
> I guess not many people play with ia32 these days :)
> 
> 
>>
>> commit 9ed02fbb847277bef88dbe6a677cf3e5f39e5a38
>> Author: Gerd Hoffmann 
>> Date:   Wed Jul 22 12:24:35 2020 +0200
>>
>> ipxe: drop ia32 efi roms
>>
>> UEFI on ia32 never really took off.  Basically the BIOS -> UEFI shift
>> came too late, x64 was widespread already, so vendors went from BIOS
>> straight to UEFI on x64.
>>
>> Signed-off-by: Gerd Hoffmann 
>>
>>
>>> I checked the boot manager, and it seems only 64-bit edk2 bios has
>>> built-in PXE boot while 32-bit does not.
>>
>> It isn't edk2 but the nic boot roms, but yes, lack of pxe support on
>> ia32 is the root cause.
>>
> 
> Got it.
> 
> 
>>> Any idea to speed up this whole PXE boot thing?
>>
>> qemu -nic none ?
>>
> 
> Yep this works. Thanks a lot!

If you need neither NICs nor disks in your guest at all, then "-nic none" is 
indeed the simplest solution.

Thanks,
Laszlo

[Bug 1907497] Re: [OSS-Fuzz] Issue 28435 qemu:qemu-fuzz-i386-target-generic-fuzz-intel-hda: Stack-overflow in ldl_le_dma

2021-06-22 Thread Mauro Matteo Cascella

Just FYI, this issue was assigned CVE-2021-3611 by Red Hat.

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-3611

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1907497

Title:
  [OSS-Fuzz] Issue 28435 qemu:qemu-fuzz-i386-target-generic-fuzz-intel-
  hda: Stack-overflow in ldl_le_dma

Status in QEMU:
  Confirmed

Bug description:
   affects qemu

  === Reproducer (build with --enable-sanitizers) ===

  cat << EOF | ./qemu-system-i386 -machine q35 -nodefaults \
  -device intel-hda,id=hda0 -device hda-output,bus=hda0.0 \
  -device hda-micro,bus=hda0.0 -device hda-duplex,bus=hda0.0 \
  -qtest stdio
  outl 0xcf8 0x8804
  outw 0xcfc 0x
  write 0x0 0x1 0x12
  write 0x2 0x1 0x2f
  outl 0xcf8 0x8811
  outl 0xcfc 0x5a6a4406
  write 0x6a44005a 0x1 0x11
  write 0x6a44005c 0x1 0x3f
  write 0x6a442050 0x4 0x446a
  write 0x6a44204a 0x1 0xf3
  write 0x6a44204c 0x1 0xff
  writeq 0x6a44005a 0x17b3f0011
  write 0x6a442050 0x4 0x446a
  write 0x6a44204a 0x1 0xf3
  write 0x6a44204c 0x1 0xff
  EOF

  === Stack Trace ===
  ==411958==ERROR: AddressSanitizer: stack-overflow on address 0x7ffcaeb8bc88 
(pc 0x55c7c9dc1159 bp 0x7ffcaeb8c4d0 sp 0x7ffcaeb8bc90 T0)
  #0 0x55c7c9dc1159 in __asan_memcpy (u-system-i386+0x2a13159)
  #1 0x55c7cb2a457e in flatview_do_translate softmmu/physmem.c:513:12
  #2 0x55c7cb2bdab0 in flatview_translate softmmu/physmem.c:563:15
  #3 0x55c7cb2bdab0 in flatview_read softmmu/physmem.c:2861:10
  #4 0x55c7cb2bdab0 in address_space_read_full softmmu/physmem.c:2875:18
  #5 0x55c7caaec937 in dma_memory_rw_relaxed include/sysemu/dma.h:87:18
  #6 0x55c7caaec937 in dma_memory_rw include/sysemu/dma.h:110:12
  #7 0x55c7caaec937 in dma_memory_read include/sysemu/dma.h:116:12
  #8 0x55c7caaec937 in ldl_le_dma include/sysemu/dma.h:179:1
  #9 0x55c7caaec937 in ldl_le_pci_dma include/hw/pci/pci.h:816:1
  #10 0x55c7caaec937 in intel_hda_corb_run hw/audio/intel-hda.c:338:16
  #11 0x55c7cb2e7198 in memory_region_write_accessor softmmu/memory.c:491:5
  #12 0x55c7cb2e6bd3 in access_with_adjusted_size softmmu/memory.c:552:18
  #13 0x55c7cb2e646c in memory_region_dispatch_write softmmu/memory.c
  #14 0x55c7cb2c8445 in flatview_write_continue softmmu/physmem.c:2759:23
  #15 0x55c7cb2bdfb8 in flatview_write softmmu/physmem.c:2799:14
  #16 0x55c7cb2bdfb8 in address_space_write softmmu/physmem.c:2891:18
  #17 0x55c7caae2c54 in dma_memory_rw_relaxed include/sysemu/dma.h:87:18
  #18 0x55c7caae2c54 in dma_memory_rw include/sysemu/dma.h:110:12
  #19 0x55c7caae2c54 in dma_memory_write include/sysemu/dma.h:122:12
  #20 0x55c7caae2c54 in stl_le_dma include/sysemu/dma.h:179:1
  #21 0x55c7caae2c54 in stl_le_pci_dma include/hw/pci/pci.h:816:1
  #22 0x55c7caae2c54 in intel_hda_response hw/audio/intel-hda.c:370:5
  #23 0x55c7caaeca00 in intel_hda_corb_run hw/audio/intel-hda.c:342:9
  #24 0x55c7cb2e7198 in memory_region_write_accessor softmmu/memory.c:491:5
  ...

  OSS-Fuzz Report: https://bugs.chromium.org/p/oss-
  fuzz/issues/detail?id=28435

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1907497/+subscriptions

[PATCH RFC 5/6] i386/acpi: Fix SRAT ranges in accordance to usable IOVA

2021-06-22 Thread Joao Martins

On configurations that lead to the creation of an SRAT with PXM entries
(-numa ...), E820 and SRAT do not match, so Linux tends to ignore
the ranges from SRAT, thus breaking the NUMA topology in the guest.

When we start adding the ranges after the 4G hole, use the newly added
iterator in add_srat_memory() to create the SRAT PXM entries for the
usable GPA regions.

Signed-off-by: Joao Martins 
---
 hw/i386/acpi-build.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 796ffc6f5c40..bb0918025296 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -57,6 +57,7 @@
 #include "hw/acpi/pcihp.h"
 #include "hw/i386/fw_cfg.h"
 #include "hw/i386/ich9.h"
+#include "hw/i386/pc.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/pci-host/q35.h"
 #include "hw/i386/x86-iommu.h"
@@ -1872,6 +1873,23 @@ build_tpm_tcpa(GArray *table_data, BIOSLinker *linker, 
GArray *tcpalog,
 #define HOLE_640K_START  (640 * KiB)
 #define HOLE_640K_END   (1 * MiB)
 
+static hwaddr add_srat_memory(hwaddr base, hwaddr size, GArray *table_data,
+  int pxm)
+{
+AcpiSratMemoryAffinity *numamem;
+hwaddr start, region_size;
+struct GPARange *range;
+uint32_t index;
+
+for_each_usable_range(index, base, size, range, start, region_size) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, start, region_size, pxm,
+  MEM_AFFINITY_ENABLED);
+}
+
+return start + region_size;
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
@@ -1967,9 +1985,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 }
 
 if (mem_len > 0) {
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
+next_base = add_srat_memory(mem_base, mem_len, table_data, i - 1);
 }
 }
 
-- 
2.17.1

Re: [PATCH 0/4] modules: update developer documentation

2021-06-22 Thread Paolo Bonzini


On 22/06/21 14:51, Gerd Hoffmann wrote:

> Depends on the "modules: add meta-data database" patch series.
>
> Gerd Hoffmann (4):
>    modules: add documentation for module sourcesets
>    modules: add module_obj() note to QOM docs
>    modules: module.h kerneldoc annotations
>    modules: hook up modules.h to docs build
>
>   include/qemu/module.h   | 59 -
>   docs/devel/build-system.rst | 17 +++
>   docs/devel/index.rst|  1 +
>   docs/devel/modules.rst  |  5 
>   docs/devel/qom.rst  |  8 +
>   5 files changed, 76 insertions(+), 14 deletions(-)
>   create mode 100644 docs/devel/modules.rst



Reviewed-by: Paolo Bonzini 

Thank you very much!

Paolo

[PATCH RFC 4/6] i386/pc: Keep PCI 64-bit hole within usable IOVA space

2021-06-22 Thread Joao Martins

pci_memory initialized by q35 and i440fx is set to a range
of 0 .. UINT64_MAX, and as a consequence when ACPI and pci-host
pick the hole64_start it does not account for allowed IOVA ranges.

Rather than blindly returning, round up the hole64_start
value to the allowable IOVA range, such that it accounts for
the 1Tb hole *on AMD*. On Intel it returns the input value
for hole64 start.
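
For illustration, the rounding described above could be sketched as follows. This is not the series' allowed_round_up(), whose body is not shown in this mail: `sketch_round_up_to_usable()` and the range table are hypothetical, and the AMD boundary values are assumptions taken from the commit message of patch 1/6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GiB (1ULL << 30)
#define SKETCH_TiB (1ULL << 40)

/* Inclusive usable range [start, end]; hypothetical, for this sketch only. */
typedef struct SketchRange {
    uint64_t start;
    uint64_t end;
} SketchRange;

/*
 * Assumed AMD layout: usable up to just below the HyperTransport hole
 * (which ends at 1 TiB), and usable again from 1 TiB upwards.
 */
static const SketchRange sketch_amd_ranges[] = {
    { 4 * SKETCH_GiB, 0xfcffffffffULL },
    { 1 * SKETCH_TiB, UINT64_MAX },
};

/*
 * Return the lowest address >= @start at which a window of @size bytes
 * fits entirely inside one usable range, or UINT64_MAX if there is none.
 * @r must be sorted by address and non-overlapping.
 */
static uint64_t sketch_round_up_to_usable(const SketchRange *r, size_t n,
                                          uint64_t start, uint64_t size)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t base = start > r[i].start ? start : r[i].start;

        if (base > r[i].end) {
            continue;               /* @start lies beyond this range */
        }
        /* Compare against (room - 1) to avoid overflow at UINT64_MAX. */
        if (size == 0 || size - 1 <= r[i].end - base) {
            return base;
        }
    }
    return UINT64_MAX;
}
```

With a single range covering everything above 4G (the non-AMD case) the input value comes back unchanged, which matches the behaviour the commit message describes for Intel.
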

Suggested-by: David Edmondson 
Signed-off-by: Joao Martins 
---
 hw/i386/pc.c | 17 +++--
 hw/pci-host/i440fx.c |  4 +++-
 hw/pci-host/q35.c|  4 +++-
 include/hw/i386/pc.h |  3 ++-
 4 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2e2ea82a4661..65885cc16037 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1141,7 +1141,7 @@ void pc_memory_init(PCMachineState *pcms,
  * The 64bit pci hole starts after "above 4G RAM" and
  * potentially the space reserved for memory hotplug.
  */
-uint64_t pc_pci_hole64_start(void)
+uint64_t pc_pci_hole64_start(uint64_t size)
 {
 PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
@@ -1155,12 +1155,25 @@ uint64_t pc_pci_hole64_start(void)
 hole64_start += memory_region_size(>device_memory->mr);
 }
 } else {
-hole64_start = 0x100000000ULL + x86ms->above_4g_mem_size;
+if (!x86ms->above_1t_mem_size) {
+hole64_start = 0x100000000ULL + x86ms->above_4g_mem_size;
+} else {
+hole64_start = x86ms->above_1t_maxram_start;
+}
 }
+hole64_start = allowed_round_up(hole64_start, size);
 
 return ROUND_UP(hole64_start, 1 * GiB);
 }
 
+uint64_t pc_pci_hole64_start_aligned(uint64_t start, uint64_t size)
+{
+if (nb_iova_ranges == DEFAULT_NR_USABLE_IOVAS) {
+return start;
+}
+return allowed_round_up(start, size);
+}
+
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus)
 {
 DeviceState *dev = NULL;
diff --git a/hw/pci-host/i440fx.c b/hw/pci-host/i440fx.c
index 28c9bae89944..e8eaebfe1034 100644
--- a/hw/pci-host/i440fx.c
+++ b/hw/pci-host/i440fx.c
@@ -163,8 +163,10 @@ static uint64_t 
i440fx_pcihost_get_pci_hole64_start_value(Object *obj)
 pci_bus_get_w64_range(h->bus, );
 value = range_is_empty() ? 0 : range_lob();
 if (!value && s->pci_hole64_fix) {
-value = pc_pci_hole64_start();
+value = pc_pci_hole64_start(s->pci_hole64_size);
 }
+/* This returns @value when not on AMD */
+value = pc_pci_hole64_start_aligned(value, s->pci_hole64_size);
 return value;
 }
 
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 2eb729dff585..d556eb965ddb 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -126,8 +126,10 @@ static uint64_t q35_host_get_pci_hole64_start_value(Object 
*obj)
 pci_bus_get_w64_range(h->bus, );
 value = range_is_empty() ? 0 : range_lob();
 if (!value && s->pci_hole64_fix) {
-value = pc_pci_hole64_start();
+value = pc_pci_hole64_start(s->mch.pci_hole64_size);
 }
+/* This returns @value when not on AMD */
+value = pc_pci_hole64_start_aligned(value, s->mch.pci_hole64_size);
 return value;
 }
 
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 73b8e2900c72..b924aef3a218 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -217,7 +217,8 @@ void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
 MemoryRegion **ram_memory);
-uint64_t pc_pci_hole64_start(void);
+uint64_t pc_pci_hole64_start(uint64_t size);
+uint64_t pc_pci_hole64_start_aligned(uint64_t value, uint64_t size);
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
 void pc_basic_device_init(struct PCMachineState *pcms,
   ISABus *isa_bus, qemu_irq *gsi,
-- 
2.17.1

Re: [PATCH v3 02/24] modules: collect module meta-data

2021-06-22 Thread Paolo Bonzini


On 21/06/21 14:52, Gerd Hoffmann wrote:

> ninja: error: 'libui-curses.a.p/meson-generated_.._config-host.h.o', needed by 
> 'ui-curses.modinfo.test', missing and no known rule to make it
>
> Hmm, not sure where this comes from.  meson doesn't try to link
> config-host.h.o into libui-curses.a, so why does extract_all_objects()
> return it?
>
> Test patch (incremental to this series) below.


Bug in Meson, fix at https://github.com/mesonbuild/meson/pull/8900.  You 
can just ignore missing files.


Paolo

[PATCH RFC 3/6] pc/cmos: Adjust CMOS above 4G memory size according to 1Tb boundary

2021-06-22 Thread Joao Martins

CMOS doesn't have the notion of reserved spaces, much like E820, so
limit the amount of memory above 4G to not account for the memory
above 1Tb.
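
A worked example of the arithmetic, for illustration only. The 64 KiB unit and the three CMOS bytes follow the hunk below; the helper name `sketch_cmos_above_4g_bytes()` and the sample sizes are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the arithmetic of the hunk:
 * the memory above 4G, minus the part relocated above 1T, expressed in
 * 64 KiB units and split into three bytes (low byte first), as written
 * to CMOS offsets 0x5b, 0x5c and 0x5d.
 */
static void sketch_cmos_above_4g_bytes(uint64_t above_4g_size,
                                       uint64_t above_1t_size,
                                       uint8_t out[3])
{
    uint64_t val = (above_4g_size - above_1t_size) / 65536;

    out[0] = (uint8_t)(val & 0xff);
    out[1] = (uint8_t)((val >> 8) & 0xff);
    out[2] = (uint8_t)((val >> 16) & 0xff);
}
```

For example, with 1020 GiB above 4G of which 12 GiB sit above 1T, the value is 1008 GiB / 64 KiB = 0xfc0000, so the three bytes are 0x00, 0x00 and 0xfc.
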

Suggested-by: David Edmondson 
Signed-off-by: Joao Martins 
---
 hw/i386/pc.c  | 14 --
 include/hw/i386/x86.h |  4 
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 94497f22b908..2e2ea82a4661 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -644,8 +644,12 @@ void pc_cmos_init(PCMachineState *pcms,
 val = 65535;
 rtc_set_memory(s, 0x34, val);
 rtc_set_memory(s, 0x35, val >> 8);
-/* memory above 4GiB */
-val = x86ms->above_4g_mem_size / 65536;
+/* memory above 4GiB but below 1Tib (where applicable) */
+if (!x86ms->above_1t_mem_size) {
+val = x86ms->above_4g_mem_size / 65536;
+} else {
+val = (x86ms->above_4g_mem_size - x86ms->above_1t_mem_size) / 65536;
+}
 rtc_set_memory(s, 0x5b, val);
 rtc_set_memory(s, 0x5c, val >> 8);
 rtc_set_memory(s, 0x5d, val >> 16);
@@ -1019,6 +1023,12 @@ void pc_memory_init(PCMachineState *pcms,
  x86ms->above_4g_mem_size);
 exit(EXIT_FAILURE);
 }
+
+if (nb_iova_ranges != DEFAULT_NR_USABLE_IOVAS) {
+x86ms->above_1t_maxram_start = maxram_start;
+if (maxram_start > AMD_MAX_PHYSADDR_BELOW_1TB)
+x86ms->above_1t_mem_size = maxram_start - 1 * TiB;
+}
 }
 
 if (!pcmc->has_reserved_memory &&
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 25a1f16f0121..cc22e30bd08c 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -57,6 +57,10 @@ struct X86MachineState {
 /* RAM information (sizes, addresses, configuration): */
 ram_addr_t below_4g_mem_size, above_4g_mem_size;
 
+/* RAM information when there's a hole in 1Tb */
+ram_addr_t above_1t_mem_size;
+uint64_t above_1t_maxram_start;
+
 /* CPU and apic information: */
 bool apic_xrupt_override;
 unsigned pci_irq_mask;
-- 
2.17.1

[PATCH RFC 1/6] i386/pc: Account IOVA reserved ranges above 4G boundary

2021-06-22 Thread Joao Martins

It is assumed that the whole GPA space is available to be
DMA addressable, within a given address space limit. With
v5.4-based kernels that is no longer true: VFIO will validate whether
the selected IOVA is indeed valid, i.e. not reserved by the IOMMU
on behalf of some specific devices or platform-defined.

AMD systems with an IOMMU are examples of such platforms and
particularly may export only these ranges as allowed:

0000000000000000 - 00000000fedfffff (0      .. 3.982G)
00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb)

We already account for the 4G hole, but if the
guest is big enough we will fail to allocate more than ~1010G given
the ~12G hole at the 1Tb boundary, reserved for HyperTransport.

When creating the region above 4G, take into account what
IOVAs are allowed by defining the known allowed ranges
and searching for the next free IOVA range. When finding an
invalid IOVA we mark it as reserved and proceed to the
next allowed IOVA region.

After accounting for the 1Tb hole on AMD hosts, mtree should
look like:

0000000100000000-000000fcffffffff (prio 0, i/o):
alias ram-above-4g @pc.ram 0000000080000000-000000fc7fffffff
0000010000000000-000001037fffffff (prio 0, i/o):
alias ram-above-1t @pc.ram 000000fc80000000-000000ffffffffff
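
For illustration, the splitting of RAM across the allowed ranges can be sketched as below. This is not the for_each_usable_range() iterator from the patch: `sketch_split_ram()`, the types and the range table are hypothetical, the AMD boundary values are assumptions based on the ranges listed above, and the sizes used are made-up sample numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GiB (1ULL << 30)
#define SKETCH_TiB (1ULL << 40)

/* Inclusive usable range [start, end]; hypothetical, for this sketch only. */
typedef struct SketchRange {
    uint64_t start;
    uint64_t end;
} SketchRange;

/* One contiguous piece of guest RAM placed inside a usable range. */
typedef struct SketchChunk {
    uint64_t start;
    uint64_t size;
} SketchChunk;

/* Assumed AMD layout: the hole before 1 TiB is skipped. */
static const SketchRange sketch_amd_ranges[] = {
    { 4 * SKETCH_GiB, 0xfcffffffffULL },
    { 1 * SKETCH_TiB, UINT64_MAX },
};

/*
 * Place @size bytes of RAM at or after @base, using only usable ranges.
 * @r must be sorted by address and non-overlapping.
 * Returns the number of chunks written to @out, or 0 if the RAM does not
 * fit or more than @max chunks would be needed.
 */
static size_t sketch_split_ram(const SketchRange *r, size_t n,
                               uint64_t base, uint64_t size,
                               SketchChunk *out, size_t max)
{
    size_t count = 0;

    for (size_t i = 0; i < n && size > 0; i++) {
        uint64_t start = base > r[i].start ? base : r[i].start;
        uint64_t room_minus_1, take;

        if (start > r[i].end) {
            continue;                   /* nothing usable left here */
        }
        room_minus_1 = r[i].end - start;
        /* Take everything that is left, or fill the range to its end. */
        take = (size - 1 <= room_minus_1) ? size : room_minus_1 + 1;

        if (count == max) {
            return 0;
        }
        out[count].start = start;
        out[count].size = take;
        count++;
        size -= take;
    }

    return size == 0 ? count : 0;
}
```

With 1022 GiB to place starting at 4G, the sketch yields one chunk of 1008 GiB below the hole and one chunk of 14 GiB starting at 1 TiB. The address space between the two chunks is what the patch marks as reserved.
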

Co-developed-by: Daniel Jordan 
Signed-off-by: Daniel Jordan 
Signed-off-by: Joao Martins 
---
 hw/i386/pc.c | 103 +++
 include/hw/i386/pc.h |  57 
 target/i386/cpu.h|   3 ++
 3 files changed, 154 insertions(+), 9 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c6d8d0d84d91..52a5473ba846 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -91,6 +91,7 @@
 #include "qapi/qmp/qerror.h"
 #include "e820_memory_layout.h"
 #include "fw_cfg.h"
+#include "target/i386/cpu.h"
 #include "trace.h"
 #include CONFIG_DEVICES
 
@@ -860,6 +861,93 @@ void xen_load_linux(PCMachineState *pcms)
 x86ms->fw_cfg = fw_cfg;
 }
 
+struct GPARange usable_iova_ranges[] = {
+{ .start = 4 * GiB, .end = UINT64_MAX, .name = "ram-above-4g" },
+
+/*
+ * AMD systems with an IOMMU have an additional hole close to the
+ * 1Tb, which are special GPAs that cannot be DMA mapped. Depending
+ * on kernel version, VFIO may or may not let you DMA map those ranges.
+ * Starting v5.4 we validate it, and can't create guests on AMD machines
+ * with certain memory sizes. The range is:
+ *
+ * FD_0000_0000h - FF_FFFF_FFFFh
+ *
+ * The ranges represent the following:
+ *
+ * Base Address   Top Address  Use
+ *
+ * FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space
+ * FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl
+ * FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK
+ * FD_F910_0000h FD_F91F_FFFFh System Management
+ * FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables
+ * FD_FB00_0000h FD_FBFF_FFFFh Address Translation
+ * FD_FC00_0000h FD_FDFF_FFFFh I/O Space
+ * FD_FE00_0000h FD_FFFF_FFFFh Configuration
+ * FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages
+ * FE_2000_0000h FF_FFFF_FFFFh Reserved
+ *
+ * See AMD IOMMU spec, section 2.1.2 "IOMMU Logical Topology",
+ * Table 3: Special Address Controls (GPA) for more information.
+ */
+#define DEFAULT_NR_USABLE_IOVAS 1
+#define AMD_MAX_PHYSADDR_BELOW_1TB  0xfcffffffff
+{ .start = 1 * TiB, .end = UINT64_MAX, .name = "ram-above-1t" },
+};
+
+ uint32_t nb_iova_ranges = DEFAULT_NR_USABLE_IOVAS;
+
+static void init_usable_iova_ranges(void)
+{
+uint32_t eax, vendor[3];
+
+host_cpuid(0x0, 0, &eax, &vendor[0], &vendor[2], &vendor[1]);
+if (IS_AMD_VENDOR(vendor)) {
+usable_iova_ranges[0].end = AMD_MAX_PHYSADDR_BELOW_1TB;
+nb_iova_ranges++;
+}
+}
+
+static void add_memory_region(MemoryRegion *system_memory, MemoryRegion *ram,
+hwaddr base, hwaddr size, hwaddr offset)
+{
+hwaddr start, region_size, resv_start, resv_end;
+struct GPARange *range;
+MemoryRegion *region;
+uint32_t index;
+
+for_each_usable_range(index, base, size, range, start, region_size) {
+region = g_malloc(sizeof(*region));
+memory_region_init_alias(region, NULL, range->name, ram,
+ offset, region_size);
+memory_region_add_subregion(system_memory, start, region);
+e820_add_entry(start, region_size, E820_RAM);
+
+assert(size >= region_size);
+if (size == region_size) {
+return;
+}
+
+/*
+ * There's memory left to create a region for, so there should be
+ * another valid IOVA range left.  Creating the reserved region
+ * would also be pointless.
+ */
+if (index + 1 == nb_iova_ranges) {
+return;
+}
+
+resv_start = start + region_size;
+resv_end = usable_iova_ranges[index + 1].start;
+
+/* Create a reserved region in the IOVA hole. */
+

[PATCH RFC 6/6] i386/pc: Add a machine property for AMD-only enforcing of valid IOVAs

2021-06-22 Thread Joao Martins

The added enforcing is only relevant in the case of AMD, where the range
right before 1TB is restricted and cannot be DMA mapped by the
kernel, consequently leading to IOMMU INVALID_DEVICE_REQUEST or possibly
other kinds of IOMMU events in the AMD IOMMU.

However, there's a case where it may make sense to disable the IOVA
relocation/validation: when migrating from a non-valid-IOVA-aware qemu to
one that supports it.

Relocating RAM regions to after the 1Tb hole has consequences for guest
ABI because we are changing the memory mapping. Thus it may make
sense to allow the admin to disable the validation (e.g. upon migration) to
either 1) fail early when the VFIO DMA_MAP ioctl fails, thus preventing
the migration from happening 'gracefully', or 2) allow booting a guest
unchanged from the source host without risking the PCI mmio hole64
or other things we consider in the valid IOVA range changing underneath
the guest.

Signed-off-by: Joao Martins 
---
 hw/i386/pc.c | 29 +++--
 hw/i386/pc_piix.c|  2 ++
 hw/i386/pc_q35.c |  2 ++
 include/hw/i386/pc.h |  2 ++
 4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 65885cc16037..eb08a6d1a2b9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -902,10 +902,14 @@ struct GPARange usable_iova_ranges[] = {
 
  uint32_t nb_iova_ranges = DEFAULT_NR_USABLE_IOVAS;
 
-static void init_usable_iova_ranges(void)
+static void init_usable_iova_ranges(PCMachineClass *pcmc)
 {
 uint32_t eax, vendor[3];
 
+if (!pcmc->enforce_valid_iova) {
+return;
+}
+
 host_cpuid(0x0, 0, &eax, &vendor[0], &vendor[2], &vendor[1]);
 if (IS_AMD_VENDOR(vendor)) {
 usable_iova_ranges[0].end = AMD_MAX_PHYSADDR_BELOW_1TB;
@@ -1000,7 +1004,7 @@ void pc_memory_init(PCMachineState *pcms,
 assert(machine->ram_size == x86ms->below_4g_mem_size +
 x86ms->above_4g_mem_size);
 
-init_usable_iova_ranges();
+init_usable_iova_ranges(pcmc);
 
 linux_boot = (machine->kernel_filename != NULL);
 
@@ -1685,6 +1689,23 @@ static void pc_machine_set_hpet(Object *obj, bool value, Error **errp)
 pcms->hpet_enabled = value;
 }
 
+static bool pc_machine_get_enforce_valid_iova(Object *obj, Error **errp)
+{
+PCMachineState *pcms = PC_MACHINE(obj);
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+
+return pcmc->enforce_valid_iova;
+}
+
+static void pc_machine_set_enforce_valid_iova(Object *obj, bool value,
+  Error **errp)
+{
+PCMachineState *pcms = PC_MACHINE(obj);
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+
+pcmc->enforce_valid_iova = value;
+}
+
 static void pc_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
 const char *name, void *opaque,
 Error **errp)
@@ -1851,6 +1872,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 pcmc->has_reserved_memory = true;
 pcmc->kvmclock_enabled = true;
 pcmc->enforce_aligned_dimm = true;
+pcmc->enforce_valid_iova = true;
 /* BIOS ACPI tables: 128K. Other BIOS datastructures: less than 4K reported
  * to be used at the moment, 32K should be enough for a while.  */
 pcmc->acpi_data_size = 0x20000 + 0x8000;
@@ -1913,6 +1935,9 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 NULL, NULL);
 object_class_property_set_description(oc, PC_MACHINE_MAX_FW_SIZE,
 "Maximum combined firmware size");
+
+object_class_property_add_bool(oc, PC_MACHINE_ENFORCE_VALID_IOVA,
+pc_machine_get_enforce_valid_iova, pc_machine_set_enforce_valid_iova);
 }
 
 static const TypeInfo pc_machine_info = {
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 30b8bd6ea92d..21a08e2f6a4c 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -427,11 +427,13 @@ DEFINE_I440FX_MACHINE(v6_1, "pc-i440fx-6.1", NULL,
 
 static void pc_i440fx_6_0_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_i440fx_6_1_machine_options(m);
 m->alias = NULL;
 m->is_default = false;
 compat_props_add(m->compat_props, hw_compat_6_0, hw_compat_6_0_len);
 compat_props_add(m->compat_props, pc_compat_6_0, pc_compat_6_0_len);
+pcmc->enforce_valid_iova = false;
 }
 
 DEFINE_I440FX_MACHINE(v6_0, "pc-i440fx-6.0", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 46a0f196f413..80bb89a9bae1 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -357,10 +357,12 @@ DEFINE_Q35_MACHINE(v6_1, "pc-q35-6.1", NULL,
 
 static void pc_q35_6_0_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_q35_6_1_machine_options(m);
 m->alias = NULL;
 compat_props_add(m->compat_props, hw_compat_6_0, hw_compat_6_0_len);
 compat_props_add(m->compat_props, pc_compat_6_0, pc_compat_6_0_len);
+pcmc->enforce_valid_iova = false;
 }

Re: [PATCH] virtiofsd: Don't allow file creation with FUSE_OPEN

2021-06-22 Thread Greg Kurz

On Mon, 21 Jun 2021 14:36:12 +0100
Stefan Hajnoczi  wrote:

> On Thu, Jun 17, 2021 at 04:15:18PM +0200, Greg Kurz wrote:
> > A well behaved FUSE client uses FUSE_CREATE to create files. It isn't
> > supposed to pass O_CREAT along a FUSE_OPEN request, as documented in
> > the "fuse_lowlevel.h" header :
> > 
> > /**
> >  * Open a file
> >  *
> >  * Open flags are available in fi->flags. The following rules
> >  * apply.
> >  *
> >  *  - Creation (O_CREAT, O_EXCL, O_NOCTTY) flags will be
> >  *filtered out / handled by the kernel.
> > 
> > But if it does anyway, virtiofsd crashes with:
> > 
> > *** invalid openat64 call: O_CREAT or O_TMPFILE without mode ***: terminated
> > 
> > This is because virtiofsd ends up passing this flag to openat() without
> > passing a mode_t 4th argument which is mandatory with O_CREAT, and glibc
> > aborts.
> > 
> > The offending path is:
> > 
> > lo_open()
> > lo_do_open()
> > lo_inode_open()
> > 
> > Other callers of lo_inode_open() only pass O_RDWR and lo_create()
> > passes a valid fd to lo_do_open() which thus doesn't even call
> > lo_inode_open() in this case.
> > 
> > Specifying O_CREAT with FUSE_OPEN is a protocol violation. Check this
> > in lo_open() and return an error to the client : EINVAL since this is
> > already what glibc returns with other illegal flag combinations.
> > 
> > The FUSE filesystem doesn't currently support O_TMPFILE, but the very
> > same would happen if O_TMPFILE was passed in a FUSE_OPEN request. Check
> > that as well.
> > 
> > Signed-off-by: Greg Kurz 
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 6 ++
> >  1 file changed, 6 insertions(+)
> 
> Thank you!
> 
> Reviewed-by: Stefan Hajnoczi 

Upstream libfuse folks suggested making the change in fuse_lowlevel.c so
that it fixes all filesystems, not only those based on passthrough_ll.c.

I'll thus post a new version.

They also seemed to be a little concerned by open() returning EINVAL
to the end user, who did nothing wrong (the kernel did). They suggested
that the server should rather print out an error and exit... which
isn't really an option for us. And anyway, we already return EINVAL
when we can't extract the arguments of the request. So I won't
address this concern, but I still wanted to share it here.
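
For illustration only, the flag check discussed above could look roughly
like the sketch below. The helper name check_fuse_open_flags() is
hypothetical and is not taken from the patch or from libfuse; the only
behaviour assumed from the thread is "reject creation flags on FUSE_OPEN
with EINVAL".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes O_TMPFILE with _GNU_SOURCE */
#endif
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper, not part of virtiofsd or libfuse.
 * Returns 0 if 'flags' are acceptable for a FUSE_OPEN request,
 * or EINVAL if they request file creation.
 */
static int check_fuse_open_flags(int flags)
{
    /* File creation must go through FUSE_CREATE, never FUSE_OPEN. */
    if (flags & O_CREAT) {
        return EINVAL;
    }
#ifdef O_TMPFILE
    /*
     * On Linux, O_TMPFILE includes the O_DIRECTORY bit, so compare
     * against all of its bits instead of testing for any overlap.
     */
    if ((flags & O_TMPFILE) == O_TMPFILE) {
        return EINVAL;
    }
#endif
    return 0;
}
```

A request handler would call this before reaching openat(), so that a
misbehaving client gets an error reply instead of triggering the glibc
abort quoted above.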



[PATCH RFC 2/6] i386/pc: Round up the hotpluggable memory within valid IOVA ranges

2021-06-22 Thread Joao Martins

When accounting for the allowed IOVA ranges above the 4G hole, we also need to
make sure that hotpluggable memory sits within the allowed ranges.

Without such validation, when we hotplug memory
and DMA map it, the DMA_MAP ioctl() fails because of the invalid IOVA,
which leads to a catastrophic failure and QEMU exiting.

Similar to the region above 4G, we also need to create a
region for the [ram .. maxram] GPA range, and select a base which
is within the allowed IOVA ranges, preventing any such failures
in the future.

Co-developed-by: Daniel Jordan 
Signed-off-by: Daniel Jordan 
Signed-off-by: Joao Martins 
---
 hw/i386/pc.c | 55 ++--
 1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 52a5473ba846..94497f22b908 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -909,7 +909,35 @@ static void init_usable_iova_ranges(void)
 }
 }
 
-static void add_memory_region(MemoryRegion *system_memory, MemoryRegion *ram,
+static hwaddr allowed_round_up(hwaddr base, hwaddr size)
+{
+hwaddr base_aligned = ROUND_UP(base, 1 * GiB), addr;
+uint32_t index;
+
+for (index = 0; index < nb_iova_ranges; index++) {
+hwaddr min_iova, max_iova;
+
+min_iova = usable_iova_ranges[index].start;
+max_iova = usable_iova_ranges[index].end;
+
+if (max_iova < base_aligned) {
+continue;
+}
+
+addr = MAX(ROUND_UP(min_iova, 1 * GiB), base_aligned);
+if (addr > max_iova) {
+continue;
+}
+
+if (max_iova - addr >= size) {
+return addr;
+}
+}
+
+return 0;
+}
+
+static hwaddr add_memory_region(MemoryRegion *system_memory, MemoryRegion *ram,
 hwaddr base, hwaddr size, hwaddr offset)
 {
 hwaddr start, region_size, resv_start, resv_end;
@@ -926,7 +954,7 @@ static void add_memory_region(MemoryRegion *system_memory, MemoryRegion *ram,
 
 assert(size >= region_size);
 if (size == region_size) {
-return;
+return start + region_size;
 }
 
 /*
@@ -935,7 +963,7 @@ static void add_memory_region(MemoryRegion *system_memory, MemoryRegion *ram,
  * would also be pointless.
  */
 if (index + 1 == nb_iova_ranges) {
-return;
+break;
 }
 
 resv_start = start + region_size;
@@ -946,6 +974,8 @@ static void add_memory_region(MemoryRegion *system_memory, MemoryRegion *ram,
 
 offset += region_size;
 }
+
+return 0;
 }
 
 void pc_memory_init(PCMachineState *pcms,
@@ -961,6 +991,7 @@ void pc_memory_init(PCMachineState *pcms,
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
+hwaddr maxram_start = 4 * GiB + x86ms->above_4g_mem_size;
 
 assert(machine->ram_size == x86ms->below_4g_mem_size +
 x86ms->above_4g_mem_size);
@@ -981,8 +1012,13 @@ void pc_memory_init(PCMachineState *pcms,
 
 e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM);
 if (x86ms->above_4g_mem_size > 0) {
-add_memory_region(system_memory, machine->ram, 4 * GiB,
+maxram_start = add_memory_region(system_memory, machine->ram, 4 * GiB,
   x86ms->above_4g_mem_size, x86ms->below_4g_mem_size);
+if (!maxram_start) {
+error_report("unsupported amount of memory: %"PRIu64,
+ x86ms->above_4g_mem_size);
+exit(EXIT_FAILURE);
+}
 }
 
 if (!pcmc->has_reserved_memory &&
@@ -1001,6 +1037,7 @@ void pc_memory_init(PCMachineState *pcms,
 if (pcmc->has_reserved_memory &&
 (machine->ram_size < machine->maxram_size)) {
 ram_addr_t device_mem_size = machine->maxram_size - machine->ram_size;
+hwaddr device_mem_base;
 
 if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
 error_report("unsupported amount of memory slots: %"PRIu64,
@@ -1015,8 +1052,14 @@ void pc_memory_init(PCMachineState *pcms,
 exit(EXIT_FAILURE);
 }
 
-machine->device_memory->base =
-ROUND_UP(0x100000000ULL + x86ms->above_4g_mem_size, 1 * GiB);
+device_mem_base = allowed_round_up(maxram_start, device_mem_size);
+if (!device_mem_base) {
+error_report("unable to find device memory base for %"PRIu64
+ " - %"PRIu64, maxram_start, device_mem_size);
+exit(EXIT_FAILURE);
+}
+
+machine->device_memory->base = device_mem_base;
 
 if (pcmc->enforce_aligned_dimm) {
 /* size device region assuming 1G page max alignment per slot */
-- 
2.17.1
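
As a rough, standalone illustration of the search that allowed_round_up()
performs in the patch above, the sketch below walks a table of usable
ranges and returns the first 1 GiB-aligned base that can hold the
requested size. The range values, the struct name and the function name
find_device_mem_base() are assumptions made for this example; the real
table comes from patch 1 of the series, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define GiB (1ULL << 30)
#define ROUND_UP_TO(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* Inclusive [start, end] range, like the GPARange used by the patch. */
struct iova_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Illustrative values only: everything below the reserved hole that
 * ends at 1 TiB, then everything from 1 TiB upwards.
 */
static const struct iova_range usable[] = {
    { 0,          0xFCFFFFFFFFULL },
    { 1024 * GiB, UINT64_MAX },
};

/* Returns a 1 GiB-aligned base >= 'base' that fits 'size', or 0 if none. */
static uint64_t find_device_mem_base(uint64_t base, uint64_t size)
{
    uint64_t base_aligned = ROUND_UP_TO(base, GiB);
    size_t i;

    for (i = 0; i < sizeof(usable) / sizeof(usable[0]); i++) {
        uint64_t addr;

        /* This range ends before the requested base: skip it. */
        if (usable[i].end < base_aligned) {
            continue;
        }

        /* Start at whichever is higher: the range start or the base. */
        addr = ROUND_UP_TO(usable[i].start, GiB);
        if (addr < base_aligned) {
            addr = base_aligned;
        }
        if (addr > usable[i].end) {
            continue;
        }

        /* Same fit test as the patch: enough room left in this range? */
        if (usable[i].end - addr >= size) {
            return addr;
        }
    }

    return 0;
}
```

With these example ranges, a 64 GiB device memory region requested at
1000 GiB does not fit below the hole and is moved to 1 TiB, while a
16 GiB region requested at 8 GiB stays where it is.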

[PATCH RFC 0/6] i386/pc: Fix creation of >= 1Tb guests on AMD systems with IOMMU

2021-06-22 Thread Joao Martins

Hey,

This series lets QEMU properly spawn i386 guests with >= 1TB of memory with VFIO,
particularly when running on AMD systems with an IOMMU.

Since Linux v5.4, VFIO validates whether the IOVA in the DMA_MAP ioctl is valid and
returns -EINVAL when it is not. On x86, Intel hosts aren't particularly
affected by this extra validation. But AMD systems with an IOMMU have a hole at
the 1TB boundary which is *reserved* for HyperTransport I/O addresses, located
at FD_0000_0000h - FF_FFFF_FFFFh. See the IOMMU manual [1], specifically
section '2.1.2 IOMMU Logical Topology', Table 3, on what those addresses mean.

VFIO DMA_MAP calls in this IOVA address range fail this check and hence return
-EINVAL, consequently failing the creation of guests bigger than 1010G. Example
of the failure:

qemu-system-x86_64: -device vfio-pci,host=:41:10.1,bootindex=-1: VFIO_MAP_DMA: -22
qemu-system-x86_64: -device vfio-pci,host=:41:10.1,bootindex=-1: vfio :41:10.1: failed to setup container for group 258: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55ba53e7a9d0, 0x1, 0xff3000, 0x7ed243e0) = -22 (Invalid argument)

Prior to v5.4, we could map using these IOVAs, *but* that's still not the right thing
to do and could trigger certain IOMMU events (e.g. INVALID_DEVICE_REQUEST), or
spurious guest VF failures from the resultant IOMMU target abort (see Errata 1155 [2]),
as documented in the links below.

This series tries to address that by dealing with this AMD-specific 1TB hole,
similarly to how we deal with the 4G hole today on x86 in general. It is split
as follows:

* patch 1: initialize the valid IOVA ranges above 4G, adding an iterator
   which also gets used in other parts of pc/acpi besides MR creation. The
   allowed IOVA ranges *only* change if it's an AMD host, so no change for
   Intel. We walk the allowed ranges for memory above 4G, and
   add an E820_RESERVED entry every time we find a hole (which is at the
   1TB boundary).

   NOTE: For purposes of this RFC, I rely on cpuid in hw/i386/pc.c but I
   understand that it doesn't cover the non-x86 host case running TCG.

   Additionally, as an alternative to the hardcoded ranges we use today,
   VFIO could advertise the platform's valid IOVA ranges without necessarily
   requiring a PCI device to be added to the vfio container. That would mean
   fetching the valid IOVA ranges from VFIO, rather than using hardcoded IOVA
   ranges as we do today. But sadly, that wouldn't work for older hypervisors.

* patch 2 - 5: cover the remaining parts of the surrounding memory map, particularly
   ACPI SRAT ranges, CMOS, hotplug as well as the PCI 64-bit hole.

* patch 6: add a machine property which is disabled for older machine types (<=6.0)
   to keep things as is.

The 'consequence' of this approach is that we may need more than the default
phys-bits, e.g. a guest with 1024G will have ~13G placed after the 1TB
address, consequently needing 41 phys-bits as opposed to the default of 40.
I can't figure out a reasonable way to establish the required phys-bits we
need for the memory map in a dynamic way, especially considering that
today there's already a precedent of depending on the user to pick the right value
of phys-bits (regardless of this series).
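
To make the phys-bits arithmetic above concrete, here is a small
standalone sketch. The helper phys_bits_needed() is purely illustrative
and not part of this series; it only computes how many address bits are
required to reach the last byte below a given end address.

```c
#include <stdint.h>

#define GiB (1ULL << 30)

/*
 * Illustrative helper, not part of the series.
 * Returns the smallest number of address bits that can address
 * every byte in [0, end).
 */
static unsigned phys_bits_needed(uint64_t end)
{
    unsigned bits = 0;

    if (end == 0) {
        return 0;
    }
    /* Count the bits needed to represent the highest address, end - 1. */
    while (bits < 64 && ((end - 1) >> bits) != 0) {
        bits++;
    }
    return bits;
}
```

A memory map that ends exactly at 1TB (1024 * GiB) needs 40 bits, while
one that ends ~13G past 1TB needs 41, which matches the example given
above.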

Additionally, the reserved region is always added regardless of whether we have
VFIO devices, in order to cover the VFIO device hotplug case.

Other options considered:

a) Consider the reserved range part of RAM, and just mark it as
E820_RESERVED without SPA allocated for it. So a -m 1024G guest would
only allocate 1010G of RAM and the remainder would be marked reserved.
This is not what we do today for the 4G hole, i.e. the RAM
actually allocated is the value specified by the user and thus the RAM available
to the guest (IIUC).

b) Avoid VFIO DMA_MAP ioctl() calls to the reserved range. Similar to a) but done at a
later stage, when RAM MRs are already allocated at the invalid GPAs. That
alone wouldn't fix the way MRs are laid out, though, which is fundamentally where the
problem is.

The approach proposed in this series works regardless of the kernel, and
is relevant for both old and new QEMU.

Open to alternatives/comments/suggestions that I should pursue instead.

Joao

[1] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
[2] https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf

Joao Martins (6):
  i386/pc: Account IOVA reserved ranges above 4G boundary
  i386/pc: Round up the hotpluggable memory within valid IOVA ranges
  pc/cmos: Adjust CMOS above 4G memory size according to 1Tb boundary
  i386/pc: Keep PCI 64-bit hole within usable IOVA space
  i386/acpi: Fix SRAT ranges in accordance to usable IOVA
  i386/pc: Add a machine property for AMD-only enforcing of valid IOVAs

 hw/i386/acpi-build.c  |  22 -
 hw/i386/pc.c  | 206

[PATCH v3 13/15] target/cris: Improve JMP_INDIRECT

2021-06-22 Thread Richard Henderson

Use movcond instead of brcond to set env_pc.
Discard the btarget and btaken variables to improve
register allocation and avoid unnecessary writeback.

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index ea6efe19d9..05be0a41bd 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -522,17 +522,6 @@ static void t_gen_swapr(TCGv d, TCGv s)
 tcg_temp_free(org_s);
 }
 
-static void t_gen_cc_jmp(TCGv pc_true, TCGv pc_false)
-{
-TCGLabel *l1 = gen_new_label();
-
-/* Conditional jmp.  */
-tcg_gen_mov_tl(env_pc, pc_false);
-tcg_gen_brcondi_tl(TCG_COND_EQ, env_btaken, 0, l1);
-tcg_gen_mov_tl(env_pc, pc_true);
-gen_set_label(l1);
-}
-
 static bool use_goto_tb(DisasContext *dc, target_ulong dest)
 {
 return ((dest ^ dc->base.pc_first) & TARGET_PAGE_MASK) == 0;
@@ -3319,8 +3308,17 @@ static void cris_tr_tb_stop(DisasContextBase *dcbase, CPUState *cpu)
 /* fall through */
 
 case JMP_INDIRECT:
-t_gen_cc_jmp(env_btarget, tcg_constant_tl(npc));
+tcg_gen_movcond_tl(TCG_COND_NE, env_pc,
+   env_btaken, tcg_constant_tl(0),
+   env_btarget, tcg_constant_tl(npc));
 is_jmp = dc->cpustate_changed ? DISAS_UPDATE : DISAS_JUMP;
+
+/*
+ * We have now consumed btaken and btarget.  Hint to the
+ * tcg compiler that the writeback to env may be dropped.
+ */
+tcg_gen_discard_tl(env_btaken);
+tcg_gen_discard_tl(env_btarget);
 break;
 
 default:
-- 
2.25.1
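
As a plain-C model of what this patch changes (not TCG code, and with
function names invented for the example), the removed t_gen_cc_jmp()
computed the next pc with a branch around a move, whereas movcond with
TCG_COND_NE expresses the same result as a single select:

```c
#include <stdint.h>

/* Branchy form, modelling the removed t_gen_cc_jmp(btarget, npc). */
static uint32_t next_pc_branch(uint32_t btaken, uint32_t btarget,
                               uint32_t npc)
{
    uint32_t pc = npc;      /* mov env_pc, pc_false */

    if (btaken != 0) {      /* brcond EQ btaken, 0 skips the next move */
        pc = btarget;       /* mov env_pc, pc_true */
    }
    return pc;
}

/* Select form, modelling movcond NE env_pc, btaken, 0, btarget, npc. */
static uint32_t next_pc_select(uint32_t btaken, uint32_t btarget,
                               uint32_t npc)
{
    return btaken != 0 ? btarget : npc;
}
```

Both forms give the same pc for every input; the select form just avoids
a label and a branch in the generated code.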

[PATCH v3 15/15] target/cris: Do not exit tb for X_FLAG changes

2021-06-22 Thread Richard Henderson

We always know the exact value of X, that's all that matters.
This avoids splitting the TB e.g. between "ax" and "addq".

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 45548ffb5e..78cc70a320 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -3160,9 +3160,6 @@ static void cris_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
 cris_clear_x_flag(dc);
 }
 
-/* Fold unhandled changes to X_FLAG into cpustate_changed. */
-dc->cpustate_changed |= dc->flags_x != (dc->base.tb->flags & X_FLAG);
-
 /*
  * All branches are delayed branches, handled immediately below.
  * We don't expect to see odd combinations of exit conditions.
-- 
2.25.1

[PATCH v3 11/15] target/cris: Add DISAS_DBRANCH

2021-06-22 Thread Richard Henderson

Move delayed branch handling to tb_stop, where we can re-use other
end-of-tb code, e.g. the evaluation of flags.  Honor single stepping.
Validate that we aren't losing state by overwriting is_jmp.

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 96 -
 1 file changed, 56 insertions(+), 40 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index c9822eae4c..f58f6f2e5e 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -61,6 +61,8 @@
 #define DISAS_UPDATEDISAS_TARGET_1
 /* Cpu state was modified dynamically, excluding pc -- use npc */
 #define DISAS_UPDATE_NEXT   DISAS_TARGET_2
+/* PC update for delayed branch, see cpustate_changed otherwise */
+#define DISAS_DBRANCH   DISAS_TARGET_3
 
 /* Used by the decoder.  */
 #define EXTRACT_FIELD(src, start, end) \
@@ -3228,50 +3230,22 @@ static void cris_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
 dc->cpustate_changed |= dc->flags_x != (dc->base.tb->flags & X_FLAG);
 
 /*
- * Check for delayed branches here.  If we do it before
- * actually generating any host code, the simulator will just
- * loop doing nothing for on this program location.
+ * All branches are delayed branches, handled immediately below.
+ * We don't expect to see odd combinations of exit conditions.
  */
+assert(dc->base.is_jmp == DISAS_NEXT || dc->cpustate_changed);
+
 if (dc->delayed_branch && --dc->delayed_branch == 0) {
-if (dc->base.tb->flags & 7) {
-t_gen_movi_env_TN(dslot, 0);
-}
+dc->base.is_jmp = DISAS_DBRANCH;
+return;
+}
 
-if (dc->cpustate_changed) {
-cris_store_direct_jmp(dc);
-}
-
-if (dc->clear_locked_irq) {
-dc->clear_locked_irq = 0;
-t_gen_movi_env_TN(locked_irq, 0);
-}
-
-if (dc->jmp == JMP_DIRECT_CC) {
-TCGLabel *l1 = gen_new_label();
-cris_evaluate_flags(dc);
-
-/* Conditional jmp.  */
-tcg_gen_brcondi_tl(TCG_COND_EQ, env_btaken, 0, l1);
-gen_goto_tb(dc, 1, dc->jmp_pc);
-gen_set_label(l1);
-gen_goto_tb(dc, 0, dc->pc);
-dc->base.is_jmp = DISAS_NORETURN;
-dc->jmp = JMP_NOJMP;
-} else if (dc->jmp == JMP_DIRECT) {
-cris_evaluate_flags(dc);
-gen_goto_tb(dc, 0, dc->jmp_pc);
-dc->base.is_jmp = DISAS_NORETURN;
-dc->jmp = JMP_NOJMP;
-} else {
-TCGv c = tcg_const_tl(dc->pc);
-t_gen_cc_jmp(env_btarget, c);
-tcg_temp_free(c);
-dc->base.is_jmp = DISAS_JUMP;
-}
+if (dc->base.is_jmp != DISAS_NEXT) {
+return;
 }
 
 /* Force an update if the per-tb cpu state has changed.  */
-if (dc->base.is_jmp == DISAS_NEXT && dc->cpustate_changed) {
+if (dc->cpustate_changed) {
 dc->base.is_jmp = DISAS_UPDATE_NEXT;
 return;
 }
@@ -3281,8 +3255,7 @@ static void cris_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
  * If we can detect the length of the next insn easily, we should.
  * In the meantime, simply stop when we do cross.
  */
-if (dc->base.is_jmp == DISAS_NEXT
-&& ((dc->pc ^ dc->base.pc_first) & TARGET_PAGE_MASK) != 0) {
+if ((dc->pc ^ dc->base.pc_first) & TARGET_PAGE_MASK) {
 dc->base.is_jmp = DISAS_TOO_MANY;
 }
 }
@@ -3312,6 +3285,49 @@ static void cris_tr_tb_stop(DisasContextBase *dcbase, CPUState *cpu)
 
 cris_evaluate_flags(dc);
 
+/* Evaluate delayed branch destination and fold to another is_jmp case. */
+if (is_jmp == DISAS_DBRANCH) {
+if (dc->base.tb->flags & 7) {
+t_gen_movi_env_TN(dslot, 0);
+}
+
+switch (dc->jmp) {
+case JMP_DIRECT:
+npc = dc->jmp_pc;
+is_jmp = dc->cpustate_changed ? DISAS_UPDATE_NEXT : DISAS_TOO_MANY;
+break;
+
+case JMP_DIRECT_CC:
+/*
+ * Use a conditional branch if either taken or not-taken path
+ * can use goto_tb.  If neither can, then treat it as indirect.
+ */
+if (likely(!dc->base.singlestep_enabled)
+&& likely(!dc->cpustate_changed)
+&& (use_goto_tb(dc, dc->jmp_pc) || use_goto_tb(dc, npc))) {
+TCGLabel *not_taken = gen_new_label();
+
+tcg_gen_brcondi_tl(TCG_COND_EQ, env_btaken, 0, not_taken);
+gen_goto_tb(dc, 1, dc->jmp_pc);
+gen_set_label(not_taken);
+
+/* not-taken case handled below. */
+is_jmp = DISAS_TOO_MANY;
+break;
+}
+tcg_gen_movi_tl(env_btarget, dc->jmp_pc);
+/* fall through */
+
+case JMP_INDIRECT:
+t_gen_cc_jmp(env_btarget, tcg_constant_tl(npc));
+is_jmp = dc->cpustate_changed

[PATCH v3 08/15] target/cris: Mark static arrays const

2021-06-22 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 19 ++-
 target/cris/translate_v10.c.inc |  6 +++---
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index eabede5251..e14b7acb10 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -137,14 +137,15 @@ static void gen_BUG(DisasContext *dc, const char *file, int line)
 cpu_abort(CPU(dc->cpu), "%s:%d pc=%x\n", file, line, dc->pc);
 }
 
-static const char *regnames_v32[] =
+static const char * const regnames_v32[] =
 {
 "$r0", "$r1", "$r2", "$r3",
 "$r4", "$r5", "$r6", "$r7",
 "$r8", "$r9", "$r10", "$r11",
 "$r12", "$r13", "$sp", "$acr",
 };
-static const char *pregnames_v32[] =
+
+static const char * const pregnames_v32[] =
 {
 "$bz", "$vr", "$pid", "$srs",
 "$wz", "$exs", "$eda", "$mof",
@@ -153,7 +154,7 @@ static const char *pregnames_v32[] =
 };
 
 /* We need this table to handle preg-moves with implicit width.  */
-static int preg_sizes[] = {
+static const int preg_sizes[] = {
 1, /* bz.  */
 1, /* vr.  */
 4, /* pid.  */
@@ -475,9 +476,9 @@ static inline void t_gen_swapw(TCGv d, TCGv s)
((T0 >> 5) & 0x02020202) |
((T0 >> 7) & 0x01010101));
  */
-static inline void t_gen_swapr(TCGv d, TCGv s)
+static void t_gen_swapr(TCGv d, TCGv s)
 {
-struct {
+static const struct {
 int shift; /* LSL when positive, LSR when negative.  */
 uint32_t mask;
 } bitrev[] = {
@@ -1279,7 +1280,7 @@ static int dec_prep_alu_m(CPUCRISState *env, DisasContext *dc,
 #if DISAS_CRIS
 static const char *cc_name(int cc)
 {
-static const char *cc_names[16] = {
+static const char * const cc_names[16] = {
 "cc", "cs", "ne", "eq", "vc", "vs", "pl", "mi",
 "ls", "hi", "ge", "lt", "gt", "le", "a", "p"
 };
@@ -2926,7 +2927,7 @@ static int dec_null(CPUCRISState *env, DisasContext *dc)
 return 2;
 }
 
-static struct decoder_info {
+static const struct decoder_info {
 struct {
 uint32_t bits;
 uint32_t mask;
@@ -3363,8 +3364,8 @@ void cris_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 {
 CRISCPU *cpu = CRIS_CPU(cs);
 CPUCRISState *env = &cpu->env;
-const char **regnames;
-const char **pregnames;
+const char * const *regnames;
+const char * const *pregnames;
 int i;
 
 if (!env) {
diff --git a/target/cris/translate_v10.c.inc b/target/cris/translate_v10.c.inc
index 0ba2aca96f..4ab43dc404 100644
--- a/target/cris/translate_v10.c.inc
+++ b/target/cris/translate_v10.c.inc
@@ -21,7 +21,7 @@
 #include "qemu/osdep.h"
 #include "crisv10-decode.h"
 
-static const char *regnames_v10[] =
+static const char * const regnames_v10[] =
 {
 "$r0", "$r1", "$r2", "$r3",
 "$r4", "$r5", "$r6", "$r7",
@@ -29,7 +29,7 @@ static const char *regnames_v10[] =
 "$r12", "$r13", "$sp", "$pc",
 };
 
-static const char *pregnames_v10[] =
+static const char * const pregnames_v10[] =
 {
 "$bz", "$vr", "$p2", "$p3",
 "$wz", "$ccr", "$p6-prefix", "$mof",
@@ -38,7 +38,7 @@ static const char *pregnames_v10[] =
 };
 
 /* We need this table to handle preg-moves with implicit width.  */
-static int preg_sizes_v10[] = {
+static const int preg_sizes_v10[] = {
 1, /* bz.  */
 1, /* vr.  */
 1, /* pid. */
-- 
2.25.1

[PATCH v3 12/15] target/cris: Use tcg_gen_lookup_and_goto_ptr

2021-06-22 Thread Richard Henderson

We can use this in gen_goto_tb and for DISAS_JUMP
to indirectly chain to the next TB.

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index f58f6f2e5e..ea6efe19d9 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -546,7 +546,7 @@ static void gen_goto_tb(DisasContext *dc, int n, target_ulong dest)
 tcg_gen_exit_tb(dc->base.tb, n);
 } else {
 tcg_gen_movi_tl(env_pc, dest);
-tcg_gen_exit_tb(NULL, 0);
+tcg_gen_lookup_and_goto_ptr();
 }
 }
 
@@ -3352,6 +3352,8 @@ static void cris_tr_tb_stop(DisasContextBase *dcbase, CPUState *cpu)
 tcg_gen_movi_tl(env_pc, npc);
 /* fall through */
 case DISAS_JUMP:
+tcg_gen_lookup_and_goto_ptr();
+break;
 case DISAS_UPDATE:
 /* Indicate that interupts must be re-evaluated before the next TB. */
 tcg_gen_exit_tb(NULL, 0);
-- 
2.25.1

[PATCH v3 05/15] target/cris: Fix use_goto_tb

2021-06-22 Thread Richard Henderson

Do not skip the page check for user-only -- mmap/mprotect can
still change page mappings.  Only check dc->base.pc_first, not
dc->ppc -- the start page is the only one that's relevant.

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 24dbae6d58..9e1f2f9239 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -524,14 +524,9 @@ static void t_gen_cc_jmp(TCGv pc_true, TCGv pc_false)
 gen_set_label(l1);
 }
 
-static inline bool use_goto_tb(DisasContext *dc, target_ulong dest)
+static bool use_goto_tb(DisasContext *dc, target_ulong dest)
 {
-#ifndef CONFIG_USER_ONLY
-return (dc->base.pc_first & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
-   (dc->ppc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK);
-#else
-return true;
-#endif
+return ((dest ^ dc->base.pc_first) & TARGET_PAGE_MASK) == 0;
 }
 
 static void gen_goto_tb(DisasContext *dc, int n, target_ulong dest)
-- 
2.25.1
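
The new use_goto_tb() body relies on a common idiom: two addresses are on
the same page exactly when their XOR has no bits set above the page
offset. A standalone sketch of that idiom follows; the 13-bit page size
and the names PAGE_BITS, PAGE_MASK and same_page() are assumptions made
for this example, not values taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the example: 8 KiB pages (13 offset bits). */
#define PAGE_BITS 13
#define PAGE_MASK (~(((uint32_t)1 << PAGE_BITS) - 1))

/*
 * True if 'a' and 'b' fall within the same page: XOR leaves only the
 * differing bits, and the mask discards differences in the page offset.
 */
static bool same_page(uint32_t a, uint32_t b)
{
    return ((a ^ b) & PAGE_MASK) == 0;
}
```

This is equivalent to comparing (a & PAGE_MASK) with (b & PAGE_MASK), but
needs only one mask operation.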

[PATCH v3 10/15] target/cris: Add DISAS_UPDATE_NEXT

2021-06-22 Thread Richard Henderson

Move this pc update into tb_stop.
We will be able to re-use this code shortly.

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 80276ae84d..c9822eae4c 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -52,9 +52,15 @@
 #define BUG() (gen_BUG(dc, __FILE__, __LINE__))
 #define BUG_ON(x) ({if (x) BUG();})
 
-/* is_jmp field values */
-#define DISAS_JUMPDISAS_TARGET_0 /* only pc was modified dynamically */
-#define DISAS_UPDATE  DISAS_TARGET_1 /* cpu state was modified dynamically */
+/*
+ * Target-specific is_jmp field values
+ */
+/* Only pc was modified dynamically */
+#define DISAS_JUMP  DISAS_TARGET_0
+/* Cpu state was modified dynamically, including pc */
+#define DISAS_UPDATEDISAS_TARGET_1
+/* Cpu state was modified dynamically, excluding pc -- use npc */
+#define DISAS_UPDATE_NEXT   DISAS_TARGET_2
 
 /* Used by the decoder.  */
 #define EXTRACT_FIELD(src, start, end) \
@@ -3266,8 +3272,8 @@ static void cris_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
 
 /* Force an update if the per-tb cpu state has changed.  */
 if (dc->base.is_jmp == DISAS_NEXT && dc->cpustate_changed) {
-dc->base.is_jmp = DISAS_UPDATE;
-tcg_gen_movi_tl(env_pc, dc->pc);
+dc->base.is_jmp = DISAS_UPDATE_NEXT;
+return;
 }
 
 /*
@@ -3309,6 +3315,7 @@ static void cris_tr_tb_stop(DisasContextBase *dcbase, CPUState *cpu)
 if (unlikely(dc->base.singlestep_enabled)) {
 switch (is_jmp) {
 case DISAS_TOO_MANY:
+case DISAS_UPDATE_NEXT:
 tcg_gen_movi_tl(env_pc, npc);
 /* fall through */
 case DISAS_JUMP:
@@ -3325,6 +3332,9 @@ static void cris_tr_tb_stop(DisasContextBase *dcbase, CPUState *cpu)
 case DISAS_TOO_MANY:
 gen_goto_tb(dc, 0, npc);
 break;
+case DISAS_UPDATE_NEXT:
+tcg_gen_movi_tl(env_pc, npc);
+/* fall through */
 case DISAS_JUMP:
 case DISAS_UPDATE:
 /* Indicate that interupts must be re-evaluated before the next TB. */
-- 
2.25.1

Re: [PATCH v4 0/6] acpi: Error Record Serialization Table, ERST, support for QEMU

2021-06-22 Thread Igor Mammedov

On Fri, 11 Jun 2021 14:31:17 -0400
Eric DeVolder  wrote:

> This patchset introduces support for the ACPI Error Record
> Serialization Table, ERST.
> 
> Linux uses the persistent storage filesystem, pstore, to record
> information (eg. dmesg tail) upon panics and shutdowns.  Pstore is
> independent of, and runs before, kdump.  In certain scenarios (ie.
> hosts/guests with root filesystems on NFS/iSCSI where networking
> software and/or hardware fails), pstore may contain the only
> information available for post-mortem debugging.
> 
> Two common storage backends for the pstore filesystem are ACPI ERST
> and UEFI. Most BIOS implement ACPI ERST; however, ACPI ERST is not
> currently supported in QEMU, and UEFI is not utilized in all guests.
> By implementing ACPI ERST within QEMU, then the ACPI ERST becomes a
> viable pstore storage backend for virtual machines (as it is now for
> bare metal machines).
> 
> Enabling support for ACPI ERST facilitates a consistent method to
> capture kernel panic information in a wide range of guests: from
> resource-constrained microvms to very large guests, and in
> particular, in direct-boot environments (which would lack UEFI
> run-time services).
> 
> Note that Microsoft Windows also utilizes the ACPI ERST for certain
> crash information, if available.
> 
> The ACPI ERST persistent storage is contained within a single backing
> file. The size and location of the backing file is specified upon
> QEMU startup of the ACPI ERST device.
> 
> The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
> (APEI)", and specifically subsection "Error Serialization", outlines
> a method for storing error records into persistent storage.
> 
> [1] "Advanced Configuration and Power Interface Specification",
> version 6.2, May 2017.
> https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> 
> [2] "Unified Extensible Firmware Interface Specification",
> version 2.8, March 2019.
> https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> 
> Suggested-by: Konrad Wilk 
> Signed-off-by: Eric DeVolder 
> 
> ---
> v4: 11jun2021
>  - Converted to a PCI device, per Igor.

The series looks much better now that the implementation is split into
backend/frontend parts and uses dynamic MMIO placement.

I left some mandatory nit-picking about
comments, style, overall documentation, and leftovers
from previous revisions,
and also some suggestions on how we can simplify the implementation a bit more.


>  - Updated qtest.
> 
> v3: 28may2021
>  - Converted to using a TYPE_MEMORY_BACKEND_FILE object rather than
>internal array with explicit file operations, per Igor.
>  - Changed the way the qdev and base address are handled, allowing
>ERST to be disabled at run-time. Also aligns better with other
>existing code.
> 
> v2: 8feb2021
>  - Added qtest/smoke test per Paolo Bonzini
>  - Split patch into smaller chunks, per Igor Mammedov
>  - Did away with use of ACPI packed structures, per Igor Mammedov
> 
> v1: 26oct2020
>  - initial post
> 
> ---
> Eric DeVolder (6):
>   ACPI ERST: bios-tables-test.c steps 1 and 2
>   ACPI ERST: header file for ERST
>   ACPI ERST: support for ACPI ERST feature
>   ACPI ERST: create ACPI ERST table for pc/x86 machines.
>   ACPI ERST: qtest for ERST
>   ACPI ERST: step 6 of bios-tables-test.c
> 
>  hw/acpi/erst.c   | 880 +++
>  hw/acpi/meson.build  |   1 +
>  hw/i386/acpi-build.c |   5 +
>  include/hw/acpi/erst.h   |  79 
>  tests/data/acpi/microvm/ERST |   0
>  tests/data/acpi/pc/ERST  | Bin 0 -> 976 bytes
>  tests/data/acpi/q35/ERST | Bin 0 -> 976 bytes
>  tests/qtest/erst-test.c  | 109 ++
>  tests/qtest/meson.build  |   2 +
>  9 files changed, 1076 insertions(+)
>  create mode 100644 hw/acpi/erst.c
>  create mode 100644 include/hw/acpi/erst.h
>  create mode 100644 tests/data/acpi/microvm/ERST
>  create mode 100644 tests/data/acpi/pc/ERST
>  create mode 100644 tests/data/acpi/q35/ERST
>  create mode 100644 tests/qtest/erst-test.c
>

[PATCH v3 07/15] target/cris: Mark helper_raise_exception noreturn

2021-06-22 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 target/cris/helper.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/cris/helper.h b/target/cris/helper.h
index 20d21c4358..3abf608682 100644
--- a/target/cris/helper.h
+++ b/target/cris/helper.h
@@ -1,4 +1,4 @@
-DEF_HELPER_2(raise_exception, void, env, i32)
+DEF_HELPER_2(raise_exception, noreturn, env, i32)
 DEF_HELPER_2(tlb_flush_pid, void, env, i32)
 DEF_HELPER_2(spc_write, void, env, i32)
 DEF_HELPER_1(rfe, void, env)
-- 
2.25.1

[PATCH v3 04/15] target/cris: Mark exceptions as DISAS_NORETURN

2021-06-22 Thread Richard Henderson

After we've raised the exception, we have left the TB.

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 5 +++--
 target/cris/translate_v10.c.inc | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index e086ff9131..24dbae6d58 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -2873,6 +2873,7 @@ static int dec_rfe_etc(CPUCRISState *env, DisasContext *dc)
-offsetof(CRISCPU, env) + offsetof(CPUState, halted));
 tcg_gen_movi_tl(env_pc, dc->pc + 2);
 t_gen_raise_exception(EXCP_HLT);
+dc->base.is_jmp = DISAS_NORETURN;
 return 2;
 }
 
@@ -2900,7 +2901,7 @@ static int dec_rfe_etc(CPUCRISState *env, DisasContext *dc)
 /* Breaks start at 16 in the exception vector.  */
 t_gen_movi_env_TN(trap_vector, dc->op1 + 16);
 t_gen_raise_exception(EXCP_BREAK);
-dc->base.is_jmp = DISAS_UPDATE;
+dc->base.is_jmp = DISAS_NORETURN;
 break;
 default:
 printf("op2=%x\n", dc->op2);
@@ -3188,7 +3189,7 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
 cris_evaluate_flags(dc);
 tcg_gen_movi_tl(env_pc, dc->pc);
 t_gen_raise_exception(EXCP_DEBUG);
-dc->base.is_jmp = DISAS_UPDATE;
+dc->base.is_jmp = DISAS_NORETURN;
 /* The address covered by the breakpoint must be included in
[tb->pc, tb->pc + tb->size) in order to for it to be
properly cleared -- thus we increment the PC here so that
diff --git a/target/cris/translate_v10.c.inc b/target/cris/translate_v10.c.inc
index dd44a7eb97..0ba2aca96f 100644
--- a/target/cris/translate_v10.c.inc
+++ b/target/cris/translate_v10.c.inc
@@ -61,6 +61,7 @@ static inline void cris_illegal_insn(DisasContext *dc)
 {
 qemu_log_mask(LOG_GUEST_ERROR, "illegal insn at pc=%x\n", dc->pc);
 t_gen_raise_exception(EXCP_BREAK);
+dc->base.is_jmp = DISAS_NORETURN;
 }
 
 static void gen_store_v10_conditional(DisasContext *dc, TCGv addr, TCGv val,
@@ -1169,7 +1170,7 @@ static unsigned int dec10_ind(CPUCRISState *env, DisasContext *dc)
 t_gen_mov_env_TN(trap_vector, c);
 tcg_temp_free(c);
 t_gen_raise_exception(EXCP_BREAK);
-dc->base.is_jmp = DISAS_UPDATE;
+dc->base.is_jmp = DISAS_NORETURN;
 return insn_len;
 }
 LOG_DIS("%d: jump.%d %d r%d r%d\n", __LINE__, size,
-- 
2.25.1

[PATCH v3 14/15] target/cris: Remove dc->flagx_known

2021-06-22 Thread Richard Henderson

Ever since 2a44f7f17364, flagx_known is always true.
Fold away all of the tests against the flag.

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 99 -
 target/cris/translate_v10.c.inc |  6 +-
 2 files changed, 24 insertions(+), 81 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 05be0a41bd..45548ffb5e 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -120,8 +120,6 @@ typedef struct DisasContext {
 
 int cc_x_uptodate;  /* 1 - ccs, 2 - known | X_FLAG. 0 not up-to-date.  */
 int flags_uptodate; /* Whether or not $ccs is up-to-date.  */
-int flagx_known; /* Whether or not flags_x has the x flag known at
-translation time.  */
 int flags_x;
 
 int clear_x; /* Clear x after this insn?  */
@@ -377,66 +375,26 @@ static inline void t_gen_add_flag(TCGv d, int flag)
 
 static inline void t_gen_addx_carry(DisasContext *dc, TCGv d)
 {
-if (dc->flagx_known) {
-if (dc->flags_x) {
-TCGv c;
-
-c = tcg_temp_new();
-t_gen_mov_TN_preg(c, PR_CCS);
-/* C flag is already at bit 0.  */
-tcg_gen_andi_tl(c, c, C_FLAG);
-tcg_gen_add_tl(d, d, c);
-tcg_temp_free(c);
-}
-} else {
-TCGv x, c;
+if (dc->flags_x) {
+TCGv c = tcg_temp_new();
 
-x = tcg_temp_new();
-c = tcg_temp_new();
-t_gen_mov_TN_preg(x, PR_CCS);
-tcg_gen_mov_tl(c, x);
-
-/* Propagate carry into d if X is set. Branch free.  */
+t_gen_mov_TN_preg(c, PR_CCS);
+/* C flag is already at bit 0.  */
 tcg_gen_andi_tl(c, c, C_FLAG);
-tcg_gen_andi_tl(x, x, X_FLAG);
-tcg_gen_shri_tl(x, x, 4);
-
-tcg_gen_and_tl(x, x, c);
-tcg_gen_add_tl(d, d, x);
-tcg_temp_free(x);
+tcg_gen_add_tl(d, d, c);
 tcg_temp_free(c);
 }
 }
 
 static inline void t_gen_subx_carry(DisasContext *dc, TCGv d)
 {
-if (dc->flagx_known) {
-if (dc->flags_x) {
-TCGv c;
-
-c = tcg_temp_new();
-t_gen_mov_TN_preg(c, PR_CCS);
-/* C flag is already at bit 0.  */
-tcg_gen_andi_tl(c, c, C_FLAG);
-tcg_gen_sub_tl(d, d, c);
-tcg_temp_free(c);
-}
-} else {
-TCGv x, c;
+if (dc->flags_x) {
+TCGv c = tcg_temp_new();
 
-x = tcg_temp_new();
-c = tcg_temp_new();
-t_gen_mov_TN_preg(x, PR_CCS);
-tcg_gen_mov_tl(c, x);
-
-/* Propagate carry into d if X is set. Branch free.  */
+t_gen_mov_TN_preg(c, PR_CCS);
+/* C flag is already at bit 0.  */
 tcg_gen_andi_tl(c, c, C_FLAG);
-tcg_gen_andi_tl(x, x, X_FLAG);
-tcg_gen_shri_tl(x, x, 4);
-
-tcg_gen_and_tl(x, x, c);
-tcg_gen_sub_tl(d, d, x);
-tcg_temp_free(x);
+tcg_gen_sub_tl(d, d, c);
 tcg_temp_free(c);
 }
 }
@@ -541,11 +499,9 @@ static void gen_goto_tb(DisasContext *dc, int n, target_ulong dest)
 
 static inline void cris_clear_x_flag(DisasContext *dc)
 {
-if (dc->flagx_known && dc->flags_x) {
+if (dc->flags_x) {
 dc->flags_uptodate = 0;
 }
-
-dc->flagx_known = 1;
 dc->flags_x = 0;
 }
 
@@ -630,12 +586,10 @@ static void cris_evaluate_flags(DisasContext *dc)
 break;
 }
 
-if (dc->flagx_known) {
-if (dc->flags_x) {
-tcg_gen_ori_tl(cpu_PR[PR_CCS], cpu_PR[PR_CCS], X_FLAG);
-} else if (dc->cc_op == CC_OP_FLAGS) {
-tcg_gen_andi_tl(cpu_PR[PR_CCS], cpu_PR[PR_CCS], ~X_FLAG);
-}
+if (dc->flags_x) {
+tcg_gen_ori_tl(cpu_PR[PR_CCS], cpu_PR[PR_CCS], X_FLAG);
+} else if (dc->cc_op == CC_OP_FLAGS) {
+tcg_gen_andi_tl(cpu_PR[PR_CCS], cpu_PR[PR_CCS], ~X_FLAG);
 }
 dc->flags_uptodate = 1;
 }
@@ -670,16 +624,11 @@ static void cris_update_cc_op(DisasContext *dc, int op, int size)
 static inline void cris_update_cc_x(DisasContext *dc)
 {
 /* Save the x flag state at the time of the cc snapshot.  */
-if (dc->flagx_known) {
-if (dc->cc_x_uptodate == (2 | dc->flags_x)) {
-return;
-}
-tcg_gen_movi_tl(cc_x, dc->flags_x);
-dc->cc_x_uptodate = 2 | dc->flags_x;
-} else {
-tcg_gen_andi_tl(cc_x, cpu_PR[PR_CCS], X_FLAG);
-dc->cc_x_uptodate = 1;
+if (dc->cc_x_uptodate == (2 | dc->flags_x)) {
+return;
 }
+tcg_gen_movi_tl(cc_x, dc->flags_x);
+dc->cc_x_uptodate = 2 | dc->flags_x;
 }
 
 /* Update cc prior to executing ALU op. Needs source operands untouched.  */
@@ -1131,7 +1080,7 @@ static void gen_store (DisasContext *dc, TCGv addr, TCGv val,
 
 /* Conditional writes. We only support the kind were X and P are known
at translation time.  */
-if (dc->flagx_known && dc->flags_x && (dc->tb_flags & P_FLAG)) {
+if (dc->flags_x &&

[PATCH v3 09/15] target/cris: Fold unhandled X_FLAG changes into cpustate_changed

2021-06-22 Thread Richard Henderson

We really do this already, by including them into the same test.
This just hoists the expression up a bit.

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index e14b7acb10..80276ae84d 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -3217,6 +3217,10 @@ static void cris_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
 cris_clear_x_flag(dc);
 }
 
+/* Fold unhandled changes to X_FLAG into cpustate_changed. */
+dc->cpustate_changed |= !dc->flagx_known;
+dc->cpustate_changed |= dc->flags_x != (dc->base.tb->flags & X_FLAG);
+
 /*
  * Check for delayed branches here.  If we do it before
  * actually generating any host code, the simulator will just
@@ -3227,9 +3231,7 @@ static void cris_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
 t_gen_movi_env_TN(dslot, 0);
 }
 
-if (dc->cpustate_changed
-|| !dc->flagx_known
-|| (dc->flags_x != (dc->base.tb->flags & X_FLAG))) {
+if (dc->cpustate_changed) {
 cris_store_direct_jmp(dc);
 }
 
@@ -3263,10 +3265,7 @@ static void cris_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
 }
 
 /* Force an update if the per-tb cpu state has changed.  */
-if (dc->base.is_jmp == DISAS_NEXT
-&& (dc->cpustate_changed
-|| !dc->flagx_known
-|| (dc->flags_x != (dc->base.tb->flags & X_FLAG)))) {
+if (dc->base.is_jmp == DISAS_NEXT && dc->cpustate_changed) {
 dc->base.is_jmp = DISAS_UPDATE;
 tcg_gen_movi_tl(env_pc, dc->pc);
 }
-- 
2.25.1
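
The hoisting in this patch rests on a simple boolean identity: OR-ing the
two X_FLAG terms into cpustate_changed up front and then testing only
cpustate_changed gives the same result as testing all three terms together.
The following is a minimal sketch that checks the identity exhaustively in
Python. It is not QEMU code: the function names are hypothetical, and
X_FLAG = 0x10 is an assumption made for illustration only.

```python
from itertools import product

X_FLAG = 0x10  # assumed bit value, used only for this illustration


def old_condition(cpustate_changed, flagx_known, flags_x, tb_flags):
    """The combined test as it appeared before the patch."""
    return bool(cpustate_changed
                or not flagx_known
                or flags_x != (tb_flags & X_FLAG))


def new_condition(cpustate_changed, flagx_known, flags_x, tb_flags):
    """Fold the X_FLAG terms into cpustate_changed, then test it alone."""
    cpustate_changed |= (not flagx_known)
    cpustate_changed |= (flags_x != (tb_flags & X_FLAG))
    return bool(cpustate_changed)


def conditions_agree():
    """Compare both forms over every combination of the inputs."""
    cases = product([0, 1], [0, 1], [0, X_FLAG], [0, X_FLAG, X_FLAG | 1])
    return all(old_condition(*c) == new_condition(*c) for c in cases)
```

Calling conditions_agree() compares the two forms over all 24 input
combinations. The sketch only covers a single evaluation; it does not model
that the folded value stays set in dc->cpustate_changed afterwards.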

[PATCH v3 06/15] target/cris: Convert to TranslatorOps

2021-06-22 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 317 ++--
 1 file changed, 174 insertions(+), 143 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 9e1f2f9239..eabede5251 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -3114,17 +3114,12 @@ static unsigned int crisv32_decoder(CPUCRISState *env, DisasContext *dc)
  *
  */
 
-/* generate intermediate code for basic block 'tb'.  */
-void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
+static void cris_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
 {
+DisasContext *dc = container_of(dcbase, DisasContext, base);
 CPUCRISState *env = cs->env_ptr;
+uint32_t tb_flags = dc->base.tb->flags;
 uint32_t pc_start;
-unsigned int insn_len;
-struct DisasContext ctx;
-struct DisasContext *dc = &ctx;
-uint32_t page_start;
-target_ulong npc;
-int num_insns;
 
 if (env->pregs[PR_VR] == 32) {
 dc->decoder = crisv32_decoder;
@@ -3134,150 +3129,174 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
 dc->clear_locked_irq = 1;
 }
 
-/* Odd PC indicates that branch is rexecuting due to exception in the
+/*
+ * Odd PC indicates that branch is rexecuting due to exception in the
  * delayslot, like in real hw.
  */
-pc_start = tb->pc & ~1;
-
-dc->base.tb = tb;
+pc_start = dc->base.pc_first & ~1;
 dc->base.pc_first = pc_start;
 dc->base.pc_next = pc_start;
-dc->base.is_jmp = DISAS_NEXT;
-dc->base.singlestep_enabled = cs->singlestep_enabled;
 
 dc->cpu = env_archcpu(env);
 dc->ppc = pc_start;
 dc->pc = pc_start;
 dc->flags_uptodate = 1;
 dc->flagx_known = 1;
-dc->flags_x = tb->flags & X_FLAG;
+dc->flags_x = tb_flags & X_FLAG;
 dc->cc_x_uptodate = 0;
 dc->cc_mask = 0;
 dc->update_cc = 0;
 dc->clear_prefix = 0;
+dc->cpustate_changed = 0;
 
 cris_update_cc_op(dc, CC_OP_FLAGS, 4);
 dc->cc_size_uptodate = -1;
 
 /* Decode TB flags.  */
-dc->tb_flags = tb->flags & (S_FLAG | P_FLAG | U_FLAG \
-| X_FLAG | PFIX_FLAG);
-dc->delayed_branch = !!(tb->flags & 7);
+dc->tb_flags = tb_flags & (S_FLAG | P_FLAG | U_FLAG | X_FLAG | PFIX_FLAG);
+dc->delayed_branch = !!(tb_flags & 7);
 if (dc->delayed_branch) {
 dc->jmp = JMP_INDIRECT;
 } else {
 dc->jmp = JMP_NOJMP;
 }
+}
 
-dc->cpustate_changed = 0;
+static void cris_tr_tb_start(DisasContextBase *db, CPUState *cpu)
+{
+}
 
-page_start = pc_start & TARGET_PAGE_MASK;
-num_insns = 0;
+static void cris_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
+{
+DisasContext *dc = container_of(dcbase, DisasContext, base);
 
-gen_tb_start(tb);
-do {
-tcg_gen_insn_start(dc->delayed_branch == 1
-   ? dc->ppc | 1 : dc->pc);
-num_insns++;
+tcg_gen_insn_start(dc->delayed_branch == 1 ? dc->ppc | 1 : dc->pc);
+}
 
-if (unlikely(cpu_breakpoint_test(cs, dc->pc, BP_ANY))) {
+static bool cris_tr_breakpoint_check(DisasContextBase *dcbase, CPUState *cpu,
+ const CPUBreakpoint *bp)
+{
+DisasContext *dc = container_of(dcbase, DisasContext, base);
+
+cris_evaluate_flags(dc);
+tcg_gen_movi_tl(env_pc, dc->pc);
+t_gen_raise_exception(EXCP_DEBUG);
+dc->base.is_jmp = DISAS_NORETURN;
+/*
+ * The address covered by the breakpoint must be included in
+ * [tb->pc, tb->pc + tb->size) in order to for it to be
+ * properly cleared -- thus we increment the PC here so that
+ * the logic setting tb->size below does the right thing.
+ */
+dc->pc += 2;
+return true;
+}
+
+static void cris_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
+{
+DisasContext *dc = container_of(dcbase, DisasContext, base);
+CPUCRISState *env = cs->env_ptr;
+unsigned int insn_len;
+
+/* Pretty disas.  */
+LOG_DIS("%8.8x:\t", dc->pc);
+
+dc->clear_x = 1;
+
+insn_len = dc->decoder(env, dc);
+dc->ppc = dc->pc;
+dc->pc += insn_len;
+dc->base.pc_next += insn_len;
+
+if (dc->base.is_jmp == DISAS_NORETURN) {
+return;
+}
+
+if (dc->clear_x) {
+cris_clear_x_flag(dc);
+}
+
+/*
+ * Check for delayed branches here.  If we do it before
+ * actually generating any host code, the simulator will just
+ * loop doing nothing for on this program location.
+ */
+if (dc->delayed_branch && --dc->delayed_branch == 0) {
+if (dc->base.tb->flags & 7) {
+t_gen_movi_env_TN(dslot, 0);
+}
+
+if (dc->cpustate_changed
+|| !dc->flagx_known
+|| (dc->flags_x != (dc->base.tb->flags & X_FLAG))) {
+cris_store_direct_jmp(dc);
+}
+
+if (dc->clear_locked_irq) {
+dc->clear_locked_irq = 0;
+

[PATCH v3 00/15] target/cris: Convert to TranslatorOps

2021-06-22 Thread Richard Henderson

Changes for v3:
  * Fix delayed branch changes vs cpustate_changed.
  * Tidy some X_FLAG handling.

Changes for v2:
  * Fix (drop) singlestep check for max_insns.
We already do that generically.
  * Move delay branch handling to tb_stop.
  * Improve tcg_gen_lookup_and_goto_ptr patch.
  * Patch 8, Use movcond for t_gen_cc_jmp, now
folded into single caller, for JMP_INDIRECT.


r~


Richard Henderson (15):
  target/cris: Add DisasContextBase to DisasContext
  target/cris: Remove DISAS_SWI
  target/cris: Replace DISAS_TB_JUMP with DISAS_NORETURN
  target/cris: Mark exceptions as DISAS_NORETURN
  target/cris: Fix use_goto_tb
  target/cris: Convert to TranslatorOps
  target/cris: Mark helper_raise_exception noreturn
  target/cris: Mark static arrays const
  target/cris: Fold unhandled X_FLAG changes into cpustate_changed
  target/cris: Add DISAS_UPDATE_NEXT
  target/cris: Add DISAS_DBRANCH
  target/cris: Use tcg_gen_lookup_and_goto_ptr
  target/cris: Improve JMP_INDIRECT
  target/cris: Remove dc->flagx_known
  target/cris: Do not exit tb for X_FLAG changes

 target/cris/helper.h|   2 +-
 target/cris/translate.c | 513 
 target/cris/translate_v10.c.inc |  17 +-
 3 files changed, 262 insertions(+), 270 deletions(-)

-- 
2.25.1

[PATCH v3 03/15] target/cris: Replace DISAS_TB_JUMP with DISAS_NORETURN

2021-06-22 Thread Richard Henderson

The only semantic of DISAS_TB_JUMP is that we've done goto_tb,
which is the same as DISAS_NORETURN -- we've exited the tb.

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 8c1bad9564..e086ff9131 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -55,7 +55,6 @@
 /* is_jmp field values */
 #define DISAS_JUMP    DISAS_TARGET_0 /* only pc was modified dynamically */
 #define DISAS_UPDATE  DISAS_TARGET_1 /* cpu state was modified dynamically */
-#define DISAS_TB_JUMP DISAS_TARGET_2 /* only pc was modified statically */
 
 /* Used by the decoder.  */
 #define EXTRACT_FIELD(src, start, end) \
@@ -3242,12 +3241,12 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
 gen_goto_tb(dc, 1, dc->jmp_pc);
 gen_set_label(l1);
 gen_goto_tb(dc, 0, dc->pc);
-dc->base.is_jmp = DISAS_TB_JUMP;
+dc->base.is_jmp = DISAS_NORETURN;
 dc->jmp = JMP_NOJMP;
 } else if (dc->jmp == JMP_DIRECT) {
 cris_evaluate_flags(dc);
 gen_goto_tb(dc, 0, dc->jmp_pc);
-dc->base.is_jmp = DISAS_TB_JUMP;
+dc->base.is_jmp = DISAS_NORETURN;
 dc->jmp = JMP_NOJMP;
 } else {
 TCGv c = tcg_const_tl(dc->pc);
@@ -3309,7 +3308,7 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
to find the next TB */
 tcg_gen_exit_tb(NULL, 0);
 break;
-case DISAS_TB_JUMP:
+case DISAS_NORETURN:
 /* nothing more to generate */
 break;
 }
-- 
2.25.1

[PATCH v3 02/15] target/cris: Remove DISAS_SWI

2021-06-22 Thread Richard Henderson

This value is unused.

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index bed7a7ed10..8c1bad9564 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -56,7 +56,6 @@
 #define DISAS_JUMP    DISAS_TARGET_0 /* only pc was modified dynamically */
 #define DISAS_UPDATE  DISAS_TARGET_1 /* cpu state was modified dynamically */
 #define DISAS_TB_JUMP DISAS_TARGET_2 /* only pc was modified statically */
-#define DISAS_SWI DISAS_TARGET_3
 
 /* Used by the decoder.  */
 #define EXTRACT_FIELD(src, start, end) \
@@ -3310,7 +3309,6 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
to find the next TB */
 tcg_gen_exit_tb(NULL, 0);
 break;
-case DISAS_SWI:
 case DISAS_TB_JUMP:
 /* nothing more to generate */
 break;
-- 
2.25.1

[PATCH v3 01/15] target/cris: Add DisasContextBase to DisasContext

2021-06-22 Thread Richard Henderson

Migrate the is_jmp, tb and singlestep_enabled fields
from DisasContext into the base.

Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 49 +
 target/cris/translate_v10.c.inc |  4 +--
 2 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 6dd5a267a6..bed7a7ed10 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -85,6 +85,8 @@ static TCGv env_pc;
 
 /* This is the state at translation time.  */
 typedef struct DisasContext {
+DisasContextBase base;
+
 CRISCPU *cpu;
 target_ulong pc, ppc;
 
@@ -121,7 +123,6 @@ typedef struct DisasContext {
 int clear_locked_irq; /* Clear the irq lockout.  */
 int cpustate_changed;
 unsigned int tb_flags; /* tb dependent flags.  */
-int is_jmp;
 
 #define JMP_NOJMP 0
 #define JMP_DIRECT    1
@@ -131,9 +132,6 @@ typedef struct DisasContext {
 uint32_t jmp_pc;
 
 int delayed_branch;
-
-TranslationBlock *tb;
-int singlestep_enabled;
 } DisasContext;
 
 static void gen_BUG(DisasContext *dc, const char *file, int line)
@@ -531,7 +529,7 @@ static void t_gen_cc_jmp(TCGv pc_true, TCGv pc_false)
 static inline bool use_goto_tb(DisasContext *dc, target_ulong dest)
 {
 #ifndef CONFIG_USER_ONLY
-return (dc->tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
+return (dc->base.pc_first & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
(dc->ppc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK);
 #else
 return true;
@@ -543,7 +541,7 @@ static void gen_goto_tb(DisasContext *dc, int n, target_ulong dest)
 if (use_goto_tb(dc, dest)) {
 tcg_gen_goto_tb(n);
 tcg_gen_movi_tl(env_pc, dest);
-tcg_gen_exit_tb(dc->tb, n);
+tcg_gen_exit_tb(dc->base.tb, n);
 } else {
 tcg_gen_movi_tl(env_pc, dest);
 tcg_gen_exit_tb(NULL, 0);
@@ -2037,14 +2035,14 @@ static int dec_setclrf(CPUCRISState *env, DisasContext *dc)
 /* Break the TB if any of the SPI flag changes.  */
 if (flags & (P_FLAG | S_FLAG)) {
 tcg_gen_movi_tl(env_pc, dc->pc + 2);
-dc->is_jmp = DISAS_UPDATE;
+dc->base.is_jmp = DISAS_UPDATE;
 dc->cpustate_changed = 1;
 }
 
 /* For the I flag, only act on posedge.  */
 if ((flags & I_FLAG)) {
 tcg_gen_movi_tl(env_pc, dc->pc + 2);
-dc->is_jmp = DISAS_UPDATE;
+dc->base.is_jmp = DISAS_UPDATE;
 dc->cpustate_changed = 1;
 }
 
@@ -2886,14 +2884,14 @@ static int dec_rfe_etc(CPUCRISState *env, DisasContext *dc)
 LOG_DIS("rfe\n");
 cris_evaluate_flags(dc);
 gen_helper_rfe(cpu_env);
-dc->is_jmp = DISAS_UPDATE;
+dc->base.is_jmp = DISAS_UPDATE;
 break;
 case 5:
 /* rfn.  */
 LOG_DIS("rfn\n");
 cris_evaluate_flags(dc);
 gen_helper_rfn(cpu_env);
-dc->is_jmp = DISAS_UPDATE;
+dc->base.is_jmp = DISAS_UPDATE;
 break;
 case 6:
 LOG_DIS("break %d\n", dc->op1);
@@ -2904,7 +2902,7 @@ static int dec_rfe_etc(CPUCRISState *env, DisasContext *dc)
 /* Breaks start at 16 in the exception vector.  */
 t_gen_movi_env_TN(trap_vector, dc->op1 + 16);
 t_gen_raise_exception(EXCP_BREAK);
-dc->is_jmp = DISAS_UPDATE;
+dc->base.is_jmp = DISAS_UPDATE;
 break;
 default:
 printf("op2=%x\n", dc->op2);
@@ -3146,13 +3144,16 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
  * delayslot, like in real hw.
  */
 pc_start = tb->pc & ~1;
-dc->cpu = env_archcpu(env);
-dc->tb = tb;
 
-dc->is_jmp = DISAS_NEXT;
+dc->base.tb = tb;
+dc->base.pc_first = pc_start;
+dc->base.pc_next = pc_start;
+dc->base.is_jmp = DISAS_NEXT;
+dc->base.singlestep_enabled = cs->singlestep_enabled;
+
+dc->cpu = env_archcpu(env);
 dc->ppc = pc_start;
 dc->pc = pc_start;
-dc->singlestep_enabled = cs->singlestep_enabled;
 dc->flags_uptodate = 1;
 dc->flagx_known = 1;
 dc->flags_x = tb->flags & X_FLAG;
@@ -3189,7 +3190,7 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
 cris_evaluate_flags(dc);
 tcg_gen_movi_tl(env_pc, dc->pc);
 t_gen_raise_exception(EXCP_DEBUG);
-dc->is_jmp = DISAS_UPDATE;
+dc->base.is_jmp = DISAS_UPDATE;
 /* The address covered by the breakpoint must be included in
[tb->pc, tb->pc + tb->size) in order to for it to be
properly cleared -- thus we increment the PC here so that
@@ -3242,18 +3243,18 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
 gen_goto_tb(dc, 1, dc->jmp_pc);
 gen_set_label(l1);
 gen_goto_tb(dc, 0, dc->pc);
-dc->is_jmp = DISAS_TB_JUMP;
+

Re: [PATCH 2/4] Python QEMU utils: introduce a generic feature list

2021-06-22 Thread John Snow


On 6/8/21 10:09 AM, Cleber Rosa wrote:

Which can be used to check for any "feature" that is available as a
QEMU command line option, and that will return its list of available
options.

This is a generalization of the list_accel() utility function, which
is itself re-implemented in terms of the more generic feature.

Signed-off-by: Cleber Rosa 
---
  python/qemu/utils/__init__.py |  2 ++
  python/qemu/utils/accel.py| 15 ++--
  python/qemu/utils/feature.py  | 44 +++
  3 files changed, 48 insertions(+), 13 deletions(-)
  create mode 100644 python/qemu/utils/feature.py

diff --git a/python/qemu/utils/__init__.py b/python/qemu/utils/__init__.py
index 7f1a5138c4..1d0789eaa2 100644
--- a/python/qemu/utils/__init__.py
+++ b/python/qemu/utils/__init__.py
@@ -20,12 +20,14 @@
  
  # pylint: disable=import-error

  from .accel import kvm_available, list_accel, tcg_available
+from .feature import list_feature
  
  
  __all__ = (

  'get_info_usernet_hostfwd_port',
  'kvm_available',
  'list_accel',
+'list_feature',
  'tcg_available',
  )
  
diff --git a/python/qemu/utils/accel.py b/python/qemu/utils/accel.py

index 297933df2a..b5bb80c6d3 100644
--- a/python/qemu/utils/accel.py
+++ b/python/qemu/utils/accel.py
@@ -14,13 +14,11 @@
  # the COPYING file in the top-level directory.
  #
  
-import logging

  import os
-import subprocess
  from typing import List, Optional
  
+from qemu.utils.feature import list_feature
  
-LOG = logging.getLogger(__name__)
  
  # Mapping host architecture to any additional architectures it can

  # support which often includes its 32 bit cousin.
@@ -39,16 +37,7 @@ def list_accel(qemu_bin: str) -> List[str]:
  @raise Exception: if failed to run `qemu -accel help`
  @return a list of accelerator names.
  """
-if not qemu_bin:
-return []
-try:
-out = subprocess.check_output([qemu_bin, '-accel', 'help'],
-  universal_newlines=True)
-except:
-LOG.debug("Failed to get the list of accelerators in %s", qemu_bin)
-raise
-# Skip the first line which is the header.
-return [acc.strip() for acc in out.splitlines()[1:]]
+return list_feature(qemu_bin, 'accel')
  
  
  def kvm_available(target_arch: Optional[str] = None,

diff --git a/python/qemu/utils/feature.py b/python/qemu/utils/feature.py
new file mode 100644
index 0000000000..b4a5f929ab
--- /dev/null
+++ b/python/qemu/utils/feature.py
@@ -0,0 +1,44 @@
+"""
+QEMU feature module:
+
+This module provides a utility for discovering the availability of
+generic features.
+"""
+# Copyright (C) 2022 Red Hat Inc.
+#
+# Authors:
+#  Cleber Rosa 
+#
+# This work is licensed under the terms of the GNU GPL, version 2.  See
+# the COPYING file in the top-level directory.
+#
+
+import logging
+import subprocess
+from typing import List
+
+
+LOG = logging.getLogger(__name__)
+
+
+def list_feature(qemu_bin: str, feature: str) -> List[str]:
+"""
+List available options the QEMU binary for a given feature type.
+
+By calling a "qemu $feature -help" and parsing its output.
+
+@param qemu_bin (str): path to the QEMU binary.
+@param feature (str): feature name, matching the command line option.
+@raise Exception: if failed to run `qemu -feature help`
+@return a list of available options for the given feature.
+"""
+if not qemu_bin:
+return []
+try:
+out = subprocess.check_output([qemu_bin, '-%s' % feature, 'help'],
+  universal_newlines=True)
+except:
+LOG.debug("Failed to get the list of %s(s) in %s", feature, qemu_bin)
+raise
+# Skip the first line which is the header.
+return [item.split(' ', 1)[0] for item in out.splitlines()[1:]]
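
The parsing step at the end of list_feature() can be tried without a QEMU
binary. The sketch below is a rough illustration, not the patch's code:
parse_help_output() is a hypothetical helper, the sample strings are made
up for illustration, and the blank-line filtering is an addition that the
patch does not have.

```python
from typing import List


def parse_help_output(out: str) -> List[str]:
    """Return the first word of every non-empty line after the header."""
    names = []
    # Skip the first line, which is the header.
    for line in out.splitlines()[1:]:
        stripped = line.strip()
        if stripped:
            # Keep only the option name, dropping any trailing description.
            names.append(stripped.split(' ', 1)[0])
    return names


# Made-up sample outputs, used only to exercise the parser.
SAMPLE_PLAIN = "Accelerators supported in QEMU binary:\ntcg\nkvm\n"
SAMPLE_DESCRIBED = "Header line:\nfoo   first option\nbar   second option\n"
```

With these samples, parse_help_output(SAMPLE_PLAIN) yields the two names
"tcg" and "kvm", and the described variant yields "foo" and "bar". Stripping
each line first avoids returning an empty string for lines that start with
whitespace, which a plain split(' ', 1)[0] would do.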



It's messy stuff, but all of machine.py is pretty messy stuff right now. 
No real qualms with more messy stuff going into qemu.utils for the time 
being.


Eventually, we will want to come up with a more universal way to 
interrogate features present in QEMU binaries. Using introspection data 
or QMP queries would be my preferred (and ideally SOLE) way to detect 
QEMU features.


But that's something to worry about later, I suppose.

As long as it passes the CI and doesn't break any tests, I'll toss you 
my ACK here and trust your judgment:


Acked-by: John Snow 

--js

Re: [RFC PATCH v4 0/7] hw/arm/virt: Introduce cpu topology support

2021-06-22 Thread Igor Mammedov

On Tue, 22 Jun 2021 16:29:15 +0200
Andrew Jones  wrote:

> On Tue, Jun 22, 2021 at 03:10:57PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Jun 22, 2021 at 10:04:52PM +0800, wangyanan (Y) wrote:  
> > > Hi Daniel,
> > > 
> > > On 2021/6/22 20:41, Daniel P. Berrangé wrote:  
> > > > On Tue, Jun 22, 2021 at 08:31:22PM +0800, wangyanan (Y) wrote:  
> > > > > 
> > > > > On 2021/6/22 19:46, Andrew Jones wrote:  
> > > > > > On Tue, Jun 22, 2021 at 11:18:09AM +0100, Daniel P. Berrangé wrote: 
> > > > > >  
> > > > > > > On Tue, Jun 22, 2021 at 05:34:06PM +0800, Yanan Wang wrote:  
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > This is v4 of the series [1] that I posted to introduce support 
> > > > > > > > for
> > > > > > > > generating cpu topology descriptions to guest. Comments are 
> > > > > > > > welcome!
> > > > > > > > 
> > > > > > > > Description:
> > > > > > > > Once the view of an accurate virtual cpu topology is provided 
> > > > > > > > to guest,
> > > > > > > > with a well-designed vCPU pinning to the pCPU we may get a huge 
> > > > > > > > benefit,
> > > > > > > > e.g., the scheduling performance improvement. See Dario 
> > > > > > > > Faggioli's
> > > > > > > > research and the related performance tests in [2] for 
> > > > > > > > reference. So here
> > > > > > > > we go, this patch series introduces cpu topology support for 
> > > > > > > > ARM platform.
> > > > > > > > 
> > > > > > > > In this series, instead of quietly enforcing the support for 
> > > > > > > > the latest
> > > > > > > > machine type, a new parameter "expose=on|off" in -smp command 
> > > > > > > > line is
> > > > > > > > introduced to leave QEMU users a choice to decide whether to 
> > > > > > > > enable the
> > > > > > > > feature or not. This will allow the feature to work on different machine
> > > > > > > > types and also ideally stay compatible with already in-use -smp command lines.
> > > > > > > > Also we make much stricter requirement for the topology 
> > > > > > > > configuration
> > > > > > > > with "expose=on".  
> > > > > > > Seeing this 'expose=on' parameter feels to me like we're adding a
> > > > > > > "make-it-work=yes" parameter. IMHO this is just something that 
> > > > > > > should
> > > > > > > be done by default for the current machine type version and 
> > > > > > > beyond.
> > > > > > > > I don't see the need for a parameter to turn this on, especially since
> > > > > > > it is being made architecture specific.
> > > > > > >   
> > > > > > I agree.
> > > > > > 
> > > > > > Yanan, we never discussed an "expose" parameter in the previous 
> > > > > > versions
> > > > > > of this series. We discussed a "strict" parameter though, which 
> > > > > > would
> > > > > > allow existing command lines to "work" using assumptions of what 
> > > > > > the user
> > > > > > meant and strict=on users to get what they mean or an error saying 
> > > > > > that
> > > > > > they asked for something that won't work or would require 
> > > > > > unreasonable
> > > > > > assumptions. Why was this changed to an "expose" parameter?  
> > > > > > Yes, we did indeed discuss a new "strict" parameter, not an "expose"
> > > > > > one, in v2 [1] of this series.
> > > > > [1] 
> > > > > https://patchwork.kernel.org/project/qemu-devel/patch/20210413080745.33004-6-wangyana...@huawei.com/
> > > > > 
> > > > > And in the discussion, we hoped things would work like below with 
> > > > > "strict"
> > > > > parameter:
> > > > > Users who want to describe cpu topology should provide cmdline like
> > > > > 
> > > > > -smp strict=on,cpus=4,sockets=2,cores=2,threads=1
> > > > > 
> > > > > > and in this case we require a more accurate -smp configuration and
> > > > > then generate the cpu topology description through ACPI/DT.
> > > > > 
> > > > > While without a strict description, no cpu topology description would
> > > > > be generated, so they get nothing through ACPI/DT.
> > > > > 
> > > > > It seems to me that the "strict" parameter actually serves as a knob 
> > > > > to
> > > > > turn on/off the exposure of topology, and this is the reason I changed
> > > > > the name.  
> > > > Yes, the use of 'strict=on' is no better than expose=on IMHO.
> > > > 
> > > > If I give QEMU a cli
> > > > 
> > > >-smp cpus=4,sockets=2,cores=2,threads=1
> > > > 
> > > > then I expect that topology to be exposed to the guest. I shouldn't
> > > > have to add extra flags to make that happen.
> > > > 
> > > > Looking at the thread, it seems the concern was around the fact that
> > > > the settings were not honoured historically and thus the CLI values
> > > > could be garbage, i.e. -smp cpus=4,sockets=8,cores=3,threads=9
> > > This "-smp cpus=4,sockets=8,cores=3,threads=9" is an invalid
> > > configuration, and the parsing function already reports an error for
> > > this case.
> > > 
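
The difference between the two -smp examples in this thread comes down to
one product check. The sketch below is a rough Python illustration under
stated assumptions: check_smp() is a hypothetical name, this is not QEMU's
actual parsing function, and options such as maxcpus are ignored.

```python
def check_smp(cpus: int, sockets: int, cores: int, threads: int) -> int:
    """Return the CPU count if the topology is consistent, else raise."""
    expected = sockets * cores * threads
    if cpus != expected:
        raise ValueError(
            "cpus=%d does not match sockets*cores*threads=%d"
            % (cpus, expected))
    return expected
```

Under this check, cpus=4,sockets=2,cores=2,threads=1 is accepted because
2 * 2 * 1 = 4, while cpus=4,sockets=8,cores=3,threads=9 is rejected because
8 * 3 * 9 = 216.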
> > > We hope more complete config like "-smp 4,sockets=2,cores=2,threads=1"
> > > for exposure of topology, and the incomplete ones

Re: [PATCH v2 2/1] qemu-img: Add "backing":true to unallocated map segments

2021-06-22 Thread Kevin Wolf

Am 11.06.2021 um 21:03 hat Eric Blake geschrieben:
> To save the user from having to check 'qemu-img info --backing-chain'
> or other followup command to determine which "depth":n goes beyond the
> chain, add a boolean field "backing" that is set only for unallocated
> portions of the disk.
> 
> Signed-off-by: Eric Blake 
> ---
> 
> Touches the same iotest output as 1/1.  If we decide that switching to
> "depth":n+1 is too risky, and that the mere addition of "backing":true
> while keeping "depth":n is good enough, then we'd have just one patch,
> instead of this double churn.  Preferences?

I think the additional flag is better because it's guaranteed to be
backwards compatible, and because you don't need to know the number of
layers to infer whether a cluster was allocated in the whole backing
chain. And by exposing ALLOCATED we definitely give access to the whole
information that exists in QEMU.

However, to continue with the bike shedding: I won't insist on
"allocated" even if that is what the flag is called internally and
consistency is usually helpful, but "backing" is misleading, too,
because intuitively it doesn't cover the top layer or standalone images
without a backing file. How about something like "present"?

Kevin
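
To make the trade-off concrete, here is a minimal C sketch (not QEMU code) of how a consumer of the map output might decide that a segment is absent from the whole backing chain. The MapSegment struct, both helper names, and the "present" field are assumptions for illustration only; the flag name was still under discussion in this thread.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one entry of the map output. */
typedef struct MapSegment {
    int depth;      /* layer index, 0 = top image */
    bool present;   /* proposed flag: allocated somewhere in the chain */
} MapSegment;

/* Variant 1 ("depth":n+1 idea): the caller must know the chain length. */
static bool absent_by_depth(const MapSegment *seg, int chain_len)
{
    return seg->depth >= chain_len;
}

/* Variant 2 (extra flag): no knowledge of the chain is needed. */
static bool absent_by_flag(const MapSegment *seg)
{
    return !seg->present;
}
```

The second variant is the point made above: the flag alone answers the question, whereas the depth-based check needs the number of layers from a separate query.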

Re: [PATCH 4/4] iotests/308: Test allow-other

2021-06-22 Thread Max Reitz


On 22.06.21 17:08, Kevin Wolf wrote:

Am 14.06.2021 um 16:44 hat Max Reitz geschrieben:

We cannot reasonably test the main point of allow-other, which is to
allow users other than the current one to access the FUSE export,
because that would require access to sudo, which this test most likely
will not have.  (Also, we would need to figure out some user/group that
is on the machine and that is not the current user/group, which may
become a bit hairy.)

But we can test some byproducts: First, whether changing permissions
works (our FUSE code only allows this with allow-other=true), and second,
whether the kernel applies permission checks with allow-other=true
(because that implies default_permissions).

Signed-off-by: Max Reitz 

This seems to have the problem that you mentioned:

--- /home/kwolf/source/qemu/tests/qemu-iotests/308.out
+++ 308.out.bad
@@ -205,7 +205,9 @@
   'writable': true,
   'allow-other': true
} }
-{"return": {}}
+fusermount3: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf
+{"error": {"class": "GenericError", "desc": "Failed to mount FUSE session to export"}}
+Timeout waiting for return on handle 2
  (Invoking chmod)
  Permissions post-chmod: 666
  (Removing all permissions)

Maybe it should be a separate test case that is skipped when
user_allow_other is disabled.


Right.
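
One way such a skip condition could be decided is by scanning the FUSE configuration for the option named in the error message above. The following C sketch is only an illustration: the helper name is hypothetical, and the parsing rules (whitespace trimmed, "#" starts a comment) are an assumption rather than a description of how fusermount3 reads the file.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if the stream has an uncommented "user_allow_other" line. */
static bool conf_has_user_allow_other(FILE *f)
{
    char line[256];

    while (fgets(line, sizeof(line), f)) {
        char *p = line + strspn(line, " \t");   /* skip leading blanks */
        size_t len;

        p[strcspn(p, "#\r\n")] = '\0';          /* drop comment and newline */
        len = strlen(p);
        while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t')) {
            p[--len] = '\0';                    /* trim trailing blanks */
        }
        if (strcmp(p, "user_allow_other") == 0) {
            return true;
        }
    }
    return false;
}
```

A test wrapper could open /etc/fuse.conf, call this helper, and skip the allow-other case when it returns false or the file cannot be opened.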


  tests/qemu-iotests/308 | 91 ++
  tests/qemu-iotests/308.out | 47 
  2 files changed, 138 insertions(+)

diff --git a/tests/qemu-iotests/308 b/tests/qemu-iotests/308
index f122065d0f..1b2f908947 100755
--- a/tests/qemu-iotests/308
+++ b/tests/qemu-iotests/308
@@ -334,6 +334,97 @@ echo '=== Compare copy with original ==='
  
  $QEMU_IMG compare -f raw -F $IMGFMT "$COPIED_IMG" "$TEST_IMG"
  
+echo

+echo '=== Test permissions ==='
+
+# Test that you can only change permissions on the export with allow-other=true.
+# We cannot really test the primary reason behind allow-other (i.e. to allow
+# users other than the current one access to the export), because for that we
+# would need sudo, which realistically nobody will allow this test to use.
+# What we can do is test that allow-other=true also enables default_permissions,
+# i.e. whether we can still read from the file if we remove the read permission.

We already have other test cases that use sudo if available. Though I
guess it means that these tests aren't run very often.


Yes, I know, but honestly I don’t really want to deal with user 
management either.  I had a paragraph about that in a preliminary 
version but decided to cut it, because I thought it wouldn’t really matter.


The problem is that I’d need to run qemu-io as some different user, and 
the question is, who is a different user?  I suppose I could rely on 
“root” and “nobody” being valid users on any system, but I don’t think I 
can be sure that the user running the tests isn’t either of those.  So I 
would have to check whether the current user is “root”, and then run it 
as “nobody”, or otherwise run it as “root”, but that just seems like I’m 
getting in too deep for something that isn’t really useful anyway, 
because on developers’ machines, it will most likely be skipped anyway.


Max
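
The user selection described above is small enough to sketch in C. This is only an illustration of the rule in the message, with a hypothetical helper name; it assumes that both "root" and "nobody" exist on the system and that root has UID 0.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Pick a user name that is guaranteed to differ from the caller:
 * root runs the access check as "nobody", everyone else as "root".
 */
static const char *pick_other_user(uid_t current_uid)
{
    return current_uid == 0 ? "nobody" : "root";
}
```

A caller would pass getuid() and hand the result to sudo -u, which is exactly the extra machinery the message argues is not worth it for a test that is usually skipped.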

Re: [PATCH v4 3/6] ACPI ERST: support for ACPI ERST feature

2021-06-22 Thread Igor Mammedov

On Fri, 11 Jun 2021 14:31:20 -0400
Eric DeVolder  wrote:

> This change implements the support for the ACPI ERST feature[1,2].
> 
> To utilize ACPI ERST, a memory-backend-file object and acpi-erst
> device must be created, for example:
> 
>  qemu ...
>  -object memory-backend-file,id=erstram,mem-path=acpi-erst.backing,
>   size=0x1,shared=on
>  -device acpi-erst,memdev=erstram,bus=pcie.0
> 
> For proper operation, the ACPI ERST device needs a memory-backend-file
> object with the following parameters mem-path, size, and shared.
> 
>  - id: The id of the memory-backend-file object is used to associate
>this memory with the acpi-erst device.
> 
>  - size: The size of the ACPI ERST backing storage. This parameter is
>required.
>  - mem-path: The location of the ACPI ERST backing storage file. This
>parameter is also required.
> 
>  - shared: The shared=on parameter is required so that updates to the
>ERST back store are written to the file immediately as well. Without
>it, updates to the backing file are unpredictable and may not
>properly persist (eg. if qemu should crash).
> 
> The ACPI ERST device is a simple PCI device, and requires these two
> parameters:
> 
>  - memdev: Is the object id of the memory-backend-file.
> 
>  - bus: The name of the pci bus to which to connect.

isn't it picked automatically for you if omitted?

> 
> This change also includes erst.c in the build of general ACPI support.
> 
> [1] "Advanced Configuration and Power Interface Specification",
> version 6.2, May 2017.
> https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
For specifications, it's usually sufficient to give the name and revision;
that provides enough info to find the document
(as links tend to go stale).

> 
> [2] "Unified Extensible Firmware Interface Specification",
> version 2.8, March 2019.
> https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> 
> Signed-off-by: Eric DeVolder 
> ---
>  hw/acpi/erst.c  | 880 
> 

To simplify review, please split the patch into at least 2 parts:
  1: hw (device implementation)
  2: ACPI tables

Also, the spec (ERST) is rather (maybe intentionally) vague on specifics,
so it would be better if the patch that implements the hw part were
preceded by a doc patch describing the concrete implementation. As a model
you can use docs/specs/acpi_hest_ghes.rst or other docs/specs/acpi_* files.
I'd start by posting/discussing that spec within this thread
to avoid spamming the list until the doc is settled.

It should describe the ABI exposed to the guest (mapping of MMIO regions)
and what is actually supported/implemented. It would be good
if it gave a general understanding of how the new interface is
supposed to work. Maybe describe the backing file layout.

See more comments below, mostly on the ACPI parts;
I just skimmed through the MMIO register handling and will leave it
out until there is a doc describing the implementation.

>  hw/acpi/meson.build |   1 +
>  2 files changed, 881 insertions(+)
>  create mode 100644 hw/acpi/erst.c
> 
> diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
> new file mode 100644
> index 000..1a72fad
> --- /dev/null
> +++ b/hw/acpi/erst.c
> @@ -0,0 +1,880 @@
> +/*
> + * ACPI Error Record Serialization Table, ERST, Implementation
> + *
> + * Copyright (c) 2021 Oracle and/or its affiliates.
> + *
> + * See ACPI specification,
> + * "ACPI Platform Error Interfaces" : "Error Serialization"
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation;
> + * version 2 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see 
> 
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "hw/qdev-core.h"
> +#include "exec/memory.h"
> +#include "qom/object.h"
> +#include "hw/pci/pci.h"
> +#include "qom/object_interfaces.h"
> +#include "qemu/error-report.h"
> +#include "migration/vmstate.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/acpi-defs.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "exec/address-spaces.h"
> +#include "sysemu/hostmem.h"
> +#include "hw/acpi/erst.h"
> +


> +#ifdef _ERST_DEBUG
> +#define erst_debug(fmt, ...) \
> +do { fprintf(stderr, fmt, ## __VA_ARGS__); fflush(stderr); } while (0)
> +#else
> +#define erst_debug(fmt, ...) do { } while (0)
> +#endif
see docs/devel/tracing.rst for how to do tracing with current QEMU

Re: [PATCH 3/4] export/fuse: Let permissions be adjustable

2021-06-22 Thread Max Reitz


On 22.06.21 17:02, Kevin Wolf wrote:

Am 14.06.2021 um 16:44 hat Max Reitz geschrieben:

Allow changing the file mode, UID, and GID through SETATTR.

This only really makes sense with allow-other, though (because without
it, the effective access mode is fixed to be 0600 (u+rw) with qemu's
user being the file's owner), so changing these stat fields is not
allowed without allow-other.

Signed-off-by: Max Reitz 
---
  block/export/fuse.c | 48 ++---
  1 file changed, 37 insertions(+), 11 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 1d54286d90..742e0af657 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -47,6 +47,10 @@ typedef struct FuseExport {
  bool writable;
  bool growable;
  bool allow_other;
+
+mode_t st_mode;
+uid_t st_uid;
+gid_t st_gid;
  } FuseExport;
  
  static GHashTable *exports;

@@ -120,6 +124,13 @@ static int fuse_export_create(BlockExport *blk_exp,
  exp->growable = args->growable;
  exp->allow_other = args->allow_other;
  
+exp->st_mode = S_IFREG | S_IRUSR;

+if (exp->writable) {
+exp->st_mode |= S_IWUSR;
+}
+exp->st_uid = getuid();
+exp->st_gid = getgid();
+
  ret = setup_fuse_export(exp, args->mountpoint, errp);
  if (ret < 0) {
  goto fail;
@@ -329,7 +340,6 @@ static void fuse_getattr(fuse_req_t req, fuse_ino_t inode,
  int64_t length, allocated_blocks;
  time_t now = time(NULL);
  FuseExport *exp = fuse_req_userdata(req);
-mode_t mode;
  
  length = blk_getlength(exp->common.blk);

  if (length < 0) {
@@ -344,17 +354,12 @@ static void fuse_getattr(fuse_req_t req, fuse_ino_t inode,
  allocated_blocks = DIV_ROUND_UP(allocated_blocks, 512);
  }
  
-mode = S_IFREG | S_IRUSR;

-if (exp->writable) {
-mode |= S_IWUSR;
-}
-
  statbuf = (struct stat) {
  .st_ino = inode,
-.st_mode= mode,
+.st_mode= exp->st_mode,
  .st_nlink   = 1,
-.st_uid = getuid(),
-.st_gid = getgid(),
+.st_uid = exp->st_uid,
+.st_gid = exp->st_gid,
  .st_size= length,
  .st_blksize = blk_bs(exp->common.blk)->bl.request_alignment,
  .st_blocks  = allocated_blocks,
@@ -400,15 +405,23 @@ static int fuse_do_truncate(const FuseExport *exp, int64_t size,
  }
  
  /**

- * Let clients set file attributes.  Only resizing is supported.
+ * Let clients set file attributes.  With allow_other, only resizing and
+ * changing permissions (st_mode, st_uid, st_gid) is allowed.  Without
+ * allow_other, only resizing is supported.
   */
  static void fuse_setattr(fuse_req_t req, fuse_ino_t inode, struct stat *statbuf,
   int to_set, struct fuse_file_info *fi)
  {
  FuseExport *exp = fuse_req_userdata(req);
+int supported_attrs;
  int ret;
  
-if (to_set & ~FUSE_SET_ATTR_SIZE) {

+supported_attrs = FUSE_SET_ATTR_SIZE;
+if (exp->allow_other) {
+supported_attrs |= FUSE_SET_ATTR_MODE | FUSE_SET_ATTR_UID |
+FUSE_SET_ATTR_GID;
+}
+if (to_set & ~supported_attrs) {
  fuse_reply_err(req, ENOTSUP);
  return;
  }
@@ -426,6 +439,19 @@ static void fuse_setattr(fuse_req_t req, fuse_ino_t inode, struct stat *statbuf,
  }
  }
  
+if (to_set & FUSE_SET_ATTR_MODE) {

+/* Only allow changing the file mode, not the type */
+exp->st_mode = (statbuf->st_mode & 0) | S_IFREG;
+}

Should we check that the mode actually makes sense? Not sure if making
an image executable has a good use case, and making it writable in the
permissions for a read-only export isn't a good idea either.


I mean, I don’t mind what the user does.  It doesn’t really faze us, I 
believe.  If the image contains an executable ELF and the user wants to 
run it directly from FUSE...  I don’t mind.


As for +w on RO exports, I’m not sure.  This reminds me of `sudo chattr 
+i $file`, which effectively makes any regular file read-only, too, and 
it can still keep +w.  So the file permissions are basically just ACLs, 
getting permission for something doesn’t mean you can actually do it.


OTOH, the difference to `chattr +i` is that we’d allow opening the 
export R/W, only writing would then fail.  `chattr +i` does give EPERM 
when opening the file.


So I’m not quite sure.  I don’t really want to prevent the user from 
setting any access restrictions they want, but on the other hand, if 
writing can never work, then there really is no point in allowing +w.  
(Then I’m wondering, if we don’t allow +w, should we silently drop it or 
return an error?  I guess returning success but not actually succeeding 
is weird, so we probably should return EROFS.)


But +x can technically work, so I wouldn’t disallow it.

Max

Auditing QEMU to replace NULL with &error_abort

2021-06-22 Thread John Snow

One of our Bite-Sized tasks on the wiki was to audit QEMU and, where 
applicable, replace NULL with &error_abort.


Everywhere else where it is intentional, we ought to add a comment or 
some other indication explaining why it's the right thing to do in that 
case.


That task was ported to GitLab here:
https://gitlab.com/qemu-project/qemu/-/issues/414

mreitz and thuth have chimed in with excellent clarifications. Phil 
suggests that we should replace the intentional cases of NULL with 
&error_ignore, to possibly log squelched errors in debugging mode. This 
sounds like a great idea to me:


- It allows us to remove NULL entirely, which as mreitz states "is 
fishy", but sometimes valid.
- It annotates callsites where we have decided the ignore is intentional 
and correct.
- It gives us a review opportunity to require good comments at those 
callsites.
- It gives us a good way to measure progress of the audit by making the 
removal of NULL a concrete goal. (Can we use coccinelle to find all 
instances of the literal NULL being passed to a variable named errp?)


From a brief chat on IRC, Markus is "reluctant to deviate from GError 
even more". It sounds like there isn't consensus here. We should 
probably reach consensus on this point before trying to pass the task 
off to a neophyte, though -- so I'm raising this discussion on the list 
and CC'ing Markus to see if we can define the task better or not.


--js


(Personally, I've got no horse in the race beyond moving these tasks off 
the wiki and onto the tracker. Since I moved the issue, though, I might 
as well make sure the filing is accurate.)
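
For readers unfamiliar with the convention being audited, the following self-contained C mock shows the three behaviors under discussion: passing NULL, passing an abort sentinel, and passing the proposed ignore sentinel. Everything here (the Error struct, the mock_* names, the counter) is a stand-in invented for this sketch; it is not QEMU's error API and does not claim to match its implementation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-in for an error object; NOT QEMU's real Error type. */
typedef struct Error {
    const char *msg;
} Error;

/* Sentinels: callers pass their address instead of a real Error pointer. */
static Error *mock_error_abort;     /* any error is a programming bug */
static Error *mock_error_ignore;    /* error is deliberately ignored */
static unsigned ignored_errors;     /* lets a debug build count them */

static void mock_error_set(Error **errp, const char *msg)
{
    if (errp == NULL) {
        return;                     /* silently dropped, intent unclear */
    }
    if (errp == &mock_error_abort) {
        fprintf(stderr, "Unexpected error: %s\n", msg);
        abort();
    }
    if (errp == &mock_error_ignore) {
        ignored_errors++;           /* squelched, but observable */
        return;
    }
    *errp = malloc(sizeof(**errp));
    if (*errp) {
        (*errp)->msg = msg;
    }
}

/* Example callee that fails on request. */
static int mock_operation(int fail, Error **errp)
{
    if (fail) {
        mock_error_set(errp, "operation failed");
        return -1;
    }
    return 0;
}
```

With a dedicated ignore sentinel, a remaining literal NULL at a call site becomes something an audit can search for, which is the measurable goal described in the list above.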

Re: [RFC PATCH v4 0/7] hw/arm/virt: Introduce cpu topology support

2021-06-22 Thread Daniel P . Berrangé

On Tue, Jun 22, 2021 at 04:29:15PM +0200, Andrew Jones wrote:
> On Tue, Jun 22, 2021 at 03:10:57PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Jun 22, 2021 at 10:04:52PM +0800, wangyanan (Y) wrote:
> > > Hi Daniel,
> > > 
> > > On 2021/6/22 20:41, Daniel P. Berrangé wrote:
> > > > On Tue, Jun 22, 2021 at 08:31:22PM +0800, wangyanan (Y) wrote:
> > > > > 
> > > > > On 2021/6/22 19:46, Andrew Jones wrote:
> > > > > > On Tue, Jun 22, 2021 at 11:18:09AM +0100, Daniel P. Berrangé wrote:
> > > > > > > On Tue, Jun 22, 2021 at 05:34:06PM +0800, Yanan Wang wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > > This is v4 of the series [1] that I posted to introduce support for
> > > > > > > > > generating cpu topology descriptions to guest. Comments are welcome!
> > > > > > > > > 
> > > > > > > > > Description:
> > > > > > > > > Once the view of an accurate virtual cpu topology is provided to guest,
> > > > > > > > > with a well-designed vCPU pinning to the pCPU we may get a huge benefit,
> > > > > > > > > e.g., the scheduling performance improvement. See Dario Faggioli's
> > > > > > > > > research and the related performance tests in [2] for reference. So here
> > > > > > > > > we go, this patch series introduces cpu topology support for ARM platform.
> > > > > > > > > 
> > > > > > > > > In this series, instead of quietly enforcing the support for the latest
> > > > > > > > > machine type, a new parameter "expose=on|off" in -smp command line is
> > > > > > > > > introduced to leave QEMU users a choice to decide whether to enable the
> > > > > > > > > feature or not. This will allow the feature to work on different machine
> > > > > > > > > types and also ideally compat with already in-use -smp command lines.
> > > > > > > > > Also we make much stricter requirement for the topology configuration
> > > > > > > > > with "expose=on".
> > > > > > > > Seeing this 'expose=on' parameter feels to me like we're adding a
> > > > > > > > "make-it-work=yes" parameter. IMHO this is just something that should
> > > > > > > > be done by default for the current machine type version and beyond.
> > > > > > > > I don't see the need for a parameter to turn this on, especially since
> > > > > > > > it is being made architecture specific.
> > > > > > > 
> > > > > > I agree.
> > > > > > 
> > > > > > > Yanan, we never discussed an "expose" parameter in the previous versions
> > > > > > > of this series. We discussed a "strict" parameter though, which would
> > > > > > > allow existing command lines to "work" using assumptions of what the user
> > > > > > > meant and strict=on users to get what they mean or an error saying that
> > > > > > > they asked for something that won't work or would require unreasonable
> > > > > > > assumptions. Why was this changed to an "expose" parameter?
> > > > > > Yes, we indeed discussed a new "strict" parameter but not an "expose" in v2 [1]
> > > > > > of this series.
> > > > > > [1] https://patchwork.kernel.org/project/qemu-devel/patch/20210413080745.33004-6-wangyana...@huawei.com/
> > > > > 
> > > > > > And in the discussion, we hoped things would work like below with "strict"
> > > > > > parameter:
> > > > > Users who want to describe cpu topology should provide cmdline like
> > > > > 
> > > > > -smp strict=on,cpus=4,sockets=2,cores=2,threads=1
> > > > > 
> > > > > > and in this case we require a more accurate -smp configuration and
> > > > > then generate the cpu topology description through ACPI/DT.
> > > > > 
> > > > > While without a strict description, no cpu topology description would
> > > > > be generated, so they get nothing through ACPI/DT.
> > > > > 
> > > > > > It seems to me that the "strict" parameter actually serves as a knob to
> > > > > > turn on/off the exposure of topology, and this is the reason I changed
> > > > > > the name.
> > > > Yes, the use of 'strict=on' is no better than expose=on IMHO.
> > > > 
> > > > If I give QEMU a cli
> > > > 
> > > >-smp cpus=4,sockets=2,cores=2,threads=1
> > > > 
> > > > then I expect that topology to be exposed to the guest. I shouldn't
> > > > have to add extra flags to make that happen.
> > > > 
> > > > Looking at the thread, it seems the concern was around the fact that
> > > > the settings were not honoured historically and thus the CLI values
> > > > could be garbage. ie  -smp cpus=4,sockets=8,cores=3,thread=9
> > > > This "-smp cpus=4,sockets=8,cores=3,threads=9" behaves as a wrong
> > > > configuration, and the parsing function already reports an error for this
> > > > case.
> > > 
> > > We hope more complete config like "-smp 4,sockets=2,cores=2,threads=1"
> > > for exposure of topology, and the incomplete ones like "-smp 4,sockets=1"
> > > or

[PATCH v7 1/7] virtiofsd: Fix fuse setxattr() API change issue

2021-06-22 Thread Vivek Goyal

With kernel header updates fuse_setxattr_in struct has grown in size.
But this new struct size only takes effect if user has opted in
for fuse feature FUSE_SETXATTR_EXT otherwise fuse continues to
send "fuse_setxattr_in" of older size. Older size is determined
by FUSE_COMPAT_SETXATTR_IN_SIZE.

Fix this. If we have not opted in for FUSE_SETXATTR_EXT, then
expect that we will get fuse_setxattr_in of size FUSE_COMPAT_SETXATTR_IN_SIZE
and not sizeof(struct fuse_setxattr_in).

Signed-off-by: Vivek Goyal 
---
 tools/virtiofsd/fuse_common.h   | 5 +
 tools/virtiofsd/fuse_lowlevel.c | 7 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index fa9671872e..0c2665b977 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -372,6 +372,11 @@ struct fuse_file_info {
  */
 #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
 
+/**
+ * Indicates that file server supports extended struct fuse_setxattr_in
+ */
+#define FUSE_CAP_SETXATTR_EXT (1 << 29)
+
 /**
  * Ioctl flags
  *
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 7fe2cef1eb..c2b6ff1686 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1419,8 +1419,13 @@ static void do_setxattr(fuse_req_t req, fuse_ino_t nodeid,
 struct fuse_setxattr_in *arg;
 const char *name;
 const char *value;
+bool setxattr_ext = req->se->conn.want & FUSE_CAP_SETXATTR_EXT;
 
-arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+if (setxattr_ext) {
+arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
+} else {
+arg = fuse_mbuf_iter_advance(iter, FUSE_COMPAT_SETXATTR_IN_SIZE);
+}
 name = fuse_mbuf_iter_advance_str(iter);
 if (!arg || !name) {
 fuse_reply_err(req, EINVAL);
-- 
2.25.4
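
The size selection performed by this patch can be illustrated in isolation. The C sketch below uses a mock struct rather than the kernel's FUSE header; the field names and the resulting sizes are assumptions of the sketch, and only the idea (a shorter legacy layout unless the extension was negotiated) comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of an extended request header; not the real kernel definition. */
struct mock_setxattr_in {
    uint32_t size;
    uint32_t flags;
    uint32_t setxattr_flags;    /* fields assumed to be added by the */
    uint32_t padding;           /* extension                         */
};

/* Legacy layout: everything before the first extension field. */
#define MOCK_COMPAT_SETXATTR_IN_SIZE \
    offsetof(struct mock_setxattr_in, setxattr_flags)

/* How many header bytes to consume before the attribute name. */
static size_t setxattr_header_size(bool ext_negotiated)
{
    return ext_negotiated ? sizeof(struct mock_setxattr_in)
                          : MOCK_COMPAT_SETXATTR_IN_SIZE;
}
```

Consuming the larger size when the peer sent the legacy layout would swallow the first bytes of the attribute name, which is the failure mode the patch guards against.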

[PATCH v7 5/7] virtiofsd: Add capability to change/restore umask

2021-06-22 Thread Vivek Goyal

When parent directory has default acl and a file is created in that
directory, then umask is ignored and final file permissions are
determined using default acl instead. (man 2 umask).

Currently, fuse applies the umask and sends modified mode in create
request accordingly. fuse server can set FUSE_DONT_MASK and tell
fuse client to not apply umask and fuse server will take care of
it as needed.

With posix acls enabled, requirement will be that we want umask
to determine final file mode if parent directory does not have
default acl.

So if posix acls are enabled, opt in for FUSE_DONT_MASK. virtiofsd
will set umask of the thread doing file creation. And host kernel
should use that umask if parent directory does not have default
acls, otherwise umask does not take effect.

Miklos mentioned that we already call unshare(CLONE_FS) for
every thread. That means umask is now a per-thread property
and it should be ok to manipulate it in the file creation path.

This patch only adds capability to change umask and restore it. It
does not enable it yet. Next few patches will add capability to enable it
based on if user enabled posix_acl or not.

This should fix fstest generic/099.

Reported-by: Luis Henriques 
Signed-off-by: Vivek Goyal 
Reviewed-by: Stefan Hajnoczi 
---
 tools/virtiofsd/passthrough_ll.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 9f5cd98fb5..0c9084ea15 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -122,6 +122,7 @@ struct lo_inode {
 struct lo_cred {
 uid_t euid;
 gid_t egid;
+mode_t umask;
 };
 
 enum {
@@ -172,6 +173,8 @@ struct lo_data {
 /* An O_PATH file descriptor to /proc/self/fd/ */
 int proc_self_fd;
 int user_killpriv_v2, killpriv_v2;
+/* If set, virtiofsd is responsible for setting umask during creation */
+bool change_umask;
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -1134,7 +1137,8 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
  * ownership of caller.
  * TODO: What about selinux context?
  */
-static int lo_change_cred(fuse_req_t req, struct lo_cred *old)
+static int lo_change_cred(fuse_req_t req, struct lo_cred *old,
+  bool change_umask)
 {
 int res;
 
@@ -1154,11 +1158,14 @@ static int lo_change_cred(fuse_req_t req, struct lo_cred *old)
 return errno_save;
 }
 
+if (change_umask) {
+old->umask = umask(req->ctx.umask);
+}
 return 0;
 }
 
 /* Regain Privileges */
-static void lo_restore_cred(struct lo_cred *old)
+static void lo_restore_cred(struct lo_cred *old, bool restore_umask)
 {
 int res;
 
@@ -1173,6 +1180,9 @@ static void lo_restore_cred(struct lo_cred *old)
 fuse_log(FUSE_LOG_ERR, "setegid(%u): %m\n", old->egid);
 exit(1);
 }
+
+if (restore_umask)
+umask(old->umask);
 }
 
 static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
@@ -1202,7 +1212,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
 return;
 }
 
-saverr = lo_change_cred(req, &old);
+saverr = lo_change_cred(req, &old, lo->change_umask && !S_ISLNK(mode));
 if (saverr) {
 goto out;
 }
@@ -1211,7 +1221,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
 
 saverr = errno;
 
-lo_restore_cred(&old);
+lo_restore_cred(&old, lo->change_umask && !S_ISLNK(mode));
 
 if (res == -1) {
 goto out;
@@ -1917,7 +1927,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
 return;
 }
 
-err = lo_change_cred(req, &old);
+err = lo_change_cred(req, &old, lo->change_umask);
 if (err) {
 goto out;
 }
@@ -1928,7 +1938,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
 fd = openat(parent_inode->fd, name, fi->flags | O_CREAT | O_EXCL, mode);
 err = fd == -1 ? errno : 0;
 
-lo_restore_cred(&old);
+lo_restore_cred(&old, lo->change_umask);
 
 /* Ignore the error if file exists and O_EXCL was not given */
 if (err && (err != EEXIST || (fi->flags & O_EXCL))) {
-- 
2.25.4
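
The change/restore pattern this patch adds to lo_change_cred() and lo_restore_cred() can be shown on its own. The C sketch below is not virtiofsd code; the helper name is hypothetical. It relies on the POSIX behavior that umask() returns the previous mask. Without unshare(CLONE_FS), the mask is shared by all threads of a process, so the sketch is only safe single-threaded.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a file under a caller-supplied umask, then restore the old one.
 * Returns the new fd, or -1 with errno holding the error from open().
 */
static int create_with_umask(const char *path, mode_t mode, mode_t mask)
{
    mode_t old_mask = umask(mask);  /* umask() returns the previous mask */
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, mode);
    int saved_errno = errno;

    umask(old_mask);                /* always restore, even on failure */
    errno = saved_errno;
    return fd;
}
```

In a directory without a default ACL, calling this with mode 0666 and mask 027 yields a file with mode 0640; with a default ACL on the parent, the kernel ignores the mask, as the commit message explains.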

[PATCH v7 2/7] virtiofsd: Fix xattr operations overwriting errno

2021-06-22 Thread Vivek Goyal

getxattr/setxattr/removexattr/listxattr operations handle regular
and non-regular files differently. For the case of non-regular files
we do fchdir(/proc/self/fd) and the xattr operation and then revert
back to original working directory. After this we are saving errno
and that's buggy because fchdir() will overwrite the errno.

FCHDIR_NOFAIL(lo->proc_self_fd);
ret = getxattr(procname, name, value, size);
FCHDIR_NOFAIL(lo->root.fd);

if (ret == -1)
saverr = errno

In the above example, if getxattr() failed, we will still return 0 to the
caller, as errno will have been overwritten by the FCHDIR_NOFAIL(lo->root.fd)
call. Fix all such instances and capture "errno" early and save in "saverr"
variable.

Signed-off-by: Vivek Goyal 
---
 tools/virtiofsd/passthrough_ll.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 49c21fd855..ec91b3c133 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2791,15 +2791,17 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
 goto out_err;
 }
 ret = fgetxattr(fd, name, value, size);
+saverr = ret == -1 ? errno : 0;
 } else {
 /* fchdir should not fail here */
 FCHDIR_NOFAIL(lo->proc_self_fd);
 ret = getxattr(procname, name, value, size);
+saverr = ret == -1 ? errno : 0;
 FCHDIR_NOFAIL(lo->root.fd);
 }
 
 if (ret == -1) {
-goto out_err;
+goto out;
 }
 if (size) {
 saverr = 0;
@@ -2864,15 +2866,17 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 goto out_err;
 }
 ret = flistxattr(fd, value, size);
+saverr = ret == -1 ? errno : 0;
 } else {
 /* fchdir should not fail here */
 FCHDIR_NOFAIL(lo->proc_self_fd);
 ret = listxattr(procname, value, size);
+saverr = ret == -1 ? errno : 0;
 FCHDIR_NOFAIL(lo->root.fd);
 }
 
 if (ret == -1) {
-goto out_err;
+goto out;
 }
 if (size) {
 saverr = 0;
@@ -2998,15 +3002,15 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
 goto out;
 }
 ret = fsetxattr(fd, name, value, size, flags);
+saverr = ret == -1 ? errno : 0;
 } else {
 /* fchdir should not fail here */
 FCHDIR_NOFAIL(lo->proc_self_fd);
 ret = setxattr(procname, name, value, size, flags);
+saverr = ret == -1 ? errno : 0;
 FCHDIR_NOFAIL(lo->root.fd);
 }
 
-saverr = ret == -1 ? errno : 0;
-
 out:
 if (fd >= 0) {
 close(fd);
@@ -3064,15 +3068,15 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
 goto out;
 }
 ret = fremovexattr(fd, name);
+saverr = ret == -1 ? errno : 0;
 } else {
 /* fchdir should not fail here */
 FCHDIR_NOFAIL(lo->proc_self_fd);
 ret = removexattr(procname, name);
+saverr = ret == -1 ? errno : 0;
 FCHDIR_NOFAIL(lo->root.fd);
 }
 
-saverr = ret == -1 ? errno : 0;
-
 out:
 if (fd >= 0) {
 close(fd);
-- 
2.25.4
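
The bug class fixed here is easy to reproduce outside virtiofsd. In the C sketch below, the clobbering follow-up call is simulated by a helper that assigns errno directly; that helper and the function names are inventions of the sketch, standing in for any later call that modifies errno.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Stand-in for a follow-up call that modifies errno (simulated). */
static void followup_call_clobbering_errno(void)
{
    errno = 0;
}

/* Returns 0 on success or the errno value of the failed open(). */
static int open_then_cleanup(const char *path)
{
    int fd = open(path, O_RDONLY);
    int saverr = fd == -1 ? errno : 0;  /* capture before anything else */

    followup_call_clobbering_errno();
    if (fd != -1) {
        close(fd);
    }
    return saverr;
}
```

Reading errno only after the follow-up call would report success for a failed open(), which mirrors the getxattr() case in the commit message.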

[PATCH v7 7/7] virtiofsd: Add an option to enable/disable posix acls

2021-06-22 Thread Vivek Goyal

fuse has an option FUSE_POSIX_ACL which needs to be opted in by fuse
server to enable posix acls. As of now we are not opting in for this,
so posix acls are disabled on virtiofs by default.

Add virtiofsd option "-o posix_acl/no_posix_acl" to let users enable/disable
posix acl support. By default it is disabled as of now due to performance
concerns with cache=none.

Currently even if file server has not opted in for FUSE_POSIX_ACL, user can
still query acl and set acl, and system.posix_acl_access and
system.posix_acl_default xattrs show up in the listxattr response.

Miklos said this is confusing. So he said let's block and filter
system.posix_acl_access and system.posix_acl_default xattrs in
getxattr/setxattr/listxattr if user has explicitly disabled
posix acls using -o no_posix_acl.

For now, keep the existing behavior if user did not
specify any option to disable acl support due to concerns about backward
compatibility.

Signed-off-by: Vivek Goyal 
---
 docs/tools/virtiofsd.rst |   3 +
 tools/virtiofsd/helper.c |   1 +
 tools/virtiofsd/passthrough_ll.c | 115 ++-
 3 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/docs/tools/virtiofsd.rst b/docs/tools/virtiofsd.rst
index 00554c75bd..a41f934999 100644
--- a/docs/tools/virtiofsd.rst
+++ b/docs/tools/virtiofsd.rst
@@ -101,6 +101,9 @@ Options
 Enable/disable extended attributes (xattr) on files and directories.  The
 default is ``no_xattr``.
 
+  * posix_acl|no_posix_acl -
+Enable/disable posix acl support.  Posix ACLs are disabled by default.
+
 .. option:: --socket-path=PATH
 
   Listen on vhost-user UNIX domain socket at PATH.
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index 5e98ed702b..a8295d975a 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -186,6 +186,7 @@ void fuse_cmdline_help(void)
 "   to virtiofsd from guest applications.\n"
 "   default: no_allow_direct_io\n"
 "-o announce_submounts  Announce sub-mount points to the guest\n"
+   "-o posix_acl/no_posix_acl  Enable/Disable posix_acl. (default: disabled)\n"
);
 }
 
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 113c725def..e80fd76d71 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -175,7 +175,7 @@ struct lo_data {
 int user_killpriv_v2, killpriv_v2;
 /* If set, virtiofsd is responsible for setting umask during creation */
 bool change_umask;
-int posix_acl;
+int user_posix_acl, posix_acl;
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -208,6 +208,8 @@ static const struct fuse_opt lo_opts[] = {
 { "announce_submounts", offsetof(struct lo_data, announce_submounts), 1 },
 { "killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 1 },
 { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
+{ "posix_acl", offsetof(struct lo_data, user_posix_acl), 1 },
+{ "no_posix_acl", offsetof(struct lo_data, user_posix_acl), 0 },
 FUSE_OPT_END
 };
 static bool use_syslog = false;
@@ -706,6 +708,32 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
 conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
 lo->killpriv_v2 = 0;
 }
+
+if (lo->user_posix_acl == 1) {
+/*
+ * User explicitly asked for this option. Enable it unconditionally.
+ * If connection does not have this capability, print error message
+ * now. It will fail later in fuse_lowlevel.c
+ */
+if (!(conn->capable & FUSE_CAP_POSIX_ACL) ||
+!(conn->capable & FUSE_CAP_DONT_MASK) ||
+!(conn->capable & FUSE_CAP_SETXATTR_EXT)) {
+fuse_log(FUSE_LOG_ERR, "lo_init: Can not enable posix acl."
+ " kernel does not support FUSE_POSIX_ACL, FUSE_DONT_MASK"
+ " or FUSE_SETXATTR_EXT capability.\n");
+} else {
+fuse_log(FUSE_LOG_DEBUG, "lo_init: enabling posix acl\n");
+}
+
+conn->want |= FUSE_CAP_POSIX_ACL | FUSE_CAP_DONT_MASK |
+  FUSE_CAP_SETXATTR_EXT;
+lo->change_umask = true;
+lo->posix_acl = true;
+} else {
+/* User either did not specify anything or wants it disabled */
+fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling posix_acl\n");
+conn->want &= ~FUSE_CAP_POSIX_ACL;
+}
 }
 
 static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
@@ -2783,6 +2811,63 @@ static int xattr_map_server(const struct lo_data *lo, const char *server_name,
 assert(fchdir_res == 0);   \
 } while (0)
 
+static bool block_xattr(struct lo_data *lo, const char *name)
+{
+/*
+ * If user explicitly enabled posix_acl or did not provide any option,
+ * do not block acl. Otherwise block system.posix_acl_access and
+

[PATCH v7 6/7] virtiofsd: Switch creds, drop FSETID for system.posix_acl_access xattr

2021-06-22 Thread Vivek Goyal

When posix access acls are set on a file, it can lead to adjusting file
permissions (mode) as well. If caller does not have CAP_FSETID and it
also does not have membership of owner group, this will lead to clearing
SGID bit in mode.
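
As a rough model of the rule in the paragraph above (a simplification for
illustration, not the kernel's actual code path), the effect on the mode can
be written as a pure function. mode_after_acl_update() and its parameters are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Simplified model: when setting an access ACL adjusts the file mode,
 * SGID survives only if the caller has CAP_FSETID or is a member of
 * the file's owning group.
 */
static mode_t mode_after_acl_update(mode_t mode, bool has_cap_fsetid,
                                    bool in_owner_group)
{
    if (!has_cap_fsetid && !in_owner_group) {
        mode &= ~(mode_t)S_ISGID;
    }
    return mode;
}
```

For example, a 02755 file ends up as 0755 when the caller has neither
CAP_FSETID nor membership of the owner group, and stays 02755 otherwise.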

Current fuse code is written in such a way that it expects file server
to take care of changing file mode (permission), if there is a need.
Right now, host kernel does not clear SGID bit because virtiofsd is
running as root and has CAP_FSETID. For host kernel to clear SGID,
virtiofsd needs to switch to gid of caller in guest and also drop
CAP_FSETID (if caller did not have it to begin with).

If SGID needs to be cleared, client will set the flag
FUSE_SETXATTR_ACL_KILL_SGID in setxattr request. In that case server
should kill sgid.
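
The condition under which the server has to act can be sketched as a small
predicate. This is an illustrative sketch only: must_switch_creds() is a
hypothetical helper, and SKETCH_SETXATTR_ACL_KILL_SGID is a local stand-in
for the real FUSE_SETXATTR_ACL_KILL_SGID flag, which comes from the FUSE
headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in bit for illustration; use the FUSE header definition in real code. */
#define SKETCH_SETXATTR_ACL_KILL_SGID (1u << 0)

/*
 * Returns true when a setxattr request should run with the caller's
 * credentials and without CAP_FSETID, so that the host kernel clears SGID.
 */
static bool must_switch_creds(bool posix_acl, const char *name,
                              uint32_t extra_flags)
{
    assert(name != NULL);

    return posix_acl &&
           strcmp(name, "system.posix_acl_access") == 0 &&
           (extra_flags & SKETCH_SETXATTR_ACL_KILL_SGID) != 0;
}
```

All three conditions have to hold: ACL support is enabled, the xattr is the
access ACL, and the client asked for SGID to be killed.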

Currently just switch to uid/gid of the caller and drop CAP_FSETID
and that should do it.

This should fix the xfstest generic/375 test case.

We don't have to switch uid for this to work. One possible optimization
is to pass a parameter to lo_change_cred() so that it switches only gid
and not uid.

Also this will not work whenever (if ever) we support idmapped mounts. In
that case it is possible that uid/gid in request are 0/0 but still we
need to clear SGID. So we will have to pick a non-root gid and switch
to that instead. That's a TODO item for future when idmapped mount
support is introduced.

This patch only adds the capability to switch creds and drop FSETID
when acl xattr is set. This does not take effect yet. It can take
effect when next patch adds the capability to enable posix_acl.

Reported-by: Luis Henriques 
Signed-off-by: Vivek Goyal 
---
 tools/virtiofsd/passthrough_ll.c | 75 
 1 file changed, 75 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 0c9084ea15..113c725def 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -175,6 +175,7 @@ struct lo_data {
 int user_killpriv_v2, killpriv_v2;
 /* If set, virtiofsd is responsible for setting umask during creation */
 bool change_umask;
+int posix_acl;
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -1185,6 +1186,51 @@ static void lo_restore_cred(struct lo_cred *old, bool restore_umask)
 umask(old->umask);
 }
 
+/*
+ * A helper to change cred and drop capability. Returns 0 on success and
+ * errno on error
+ */
+static int lo_drop_cap_change_cred(fuse_req_t req, struct lo_cred *old,
+   bool change_umask, const char *cap_name,
+   bool *cap_dropped)
+{
+int ret;
+bool __cap_dropped;
+
+assert(cap_name);
+
+ret = drop_effective_cap(cap_name, &__cap_dropped);
+if (ret) {
+return ret;
+}
+
+ret = lo_change_cred(req, old, change_umask);
+if (ret) {
+if (__cap_dropped) {
+if (gain_effective_cap(cap_name)) {
+fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_%s\n", cap_name);
+}
+}
+}
+
+if (cap_dropped) {
+*cap_dropped = __cap_dropped;
+}
+return ret;
+}
+
+static void lo_restore_cred_gain_cap(struct lo_cred *old, bool restore_umask,
+ const char *cap_name)
+{
+assert(cap_name);
+
+lo_restore_cred(old, restore_umask);
+
+if (gain_effective_cap(cap_name)) {
+fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_%s\n", cap_name);
+}
+}
+
 static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
  const char *name, mode_t mode, dev_t rdev,
  const char *link)
@@ -2976,6 +3022,9 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
 ssize_t ret;
 int saverr;
 int fd = -1;
+bool switched_creds = false;
+bool cap_fsetid_dropped = false;
+struct lo_cred old = {};
 
 mapped_name = NULL;
 name = in_name;
@@ -3006,6 +3055,26 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
  ", name=%s value=%s size=%zd)\n", ino, name, value, size);
 
 sprintf(procname, "%i", inode->fd);
+/*
+ * If we are setting posix access acl and if SGID needs to be
+ * cleared, then switch to caller's gid and drop CAP_FSETID
+ * and that should make sure host kernel clears SGID.
+ *
+ * This probably will not work when we support idmapped mounts.
+ * In that case we will need to find a non-root gid and switch
+ * to it. (Instead of gid in request). Fix it when we support
+ * idmapped mounts.
+ */
+if (lo->posix_acl && !strcmp(name, "system.posix_acl_access")
+&& (extra_flags & FUSE_SETXATTR_ACL_KILL_SGID)) {
+ret = lo_drop_cap_change_cred(req, &old, false, "FSETID",
+  &cap_fsetid_dropped);
+if (ret) {
+saverr = ret;
+goto out;
+}
+switched_creds = true;
+}
 if

[PATCH v7 0/7] virtiofsd: Add support to enable/disable posix acls

2021-06-22 Thread Vivek Goyal

Hi,

This is V7 of the patches.

Changes since V6.

- Dropped kernel header update patch as somebody else did it.
- Fixed coding style issues.

Currently posix ACL support does not work well with virtiofs and a bunch
of tests fail when I run xfstests "./check -g acl".

This patch series fixes the issues with virtiofs posix acl support
and provides options to enable/disable posix acl (-o posix_acl/no_posix_acl).
By default posix_acls are disabled.

With this patch series applied and virtiofsd running with "-o posix_acl",
xfstests "./check -g acl" passes.

Thanks
Vivek


Vivek Goyal (7):
  virtiofsd: Fix fuse setxattr() API change issue
  virtiofsd: Fix xattr operations overwriting errno
  virtiofsd: Add support for extended setxattr
  virtiofsd: Add umask to seccom allow list
  virtiofsd: Add capability to change/restore umask
  virtiofsd: Switch creds, drop FSETID for system.posix_acl_access xattr
  virtiofsd: Add an option to enable/disable posix acls

 docs/tools/virtiofsd.rst  |   3 +
 tools/virtiofsd/fuse_common.h |  10 ++
 tools/virtiofsd/fuse_lowlevel.c   |  18 +-
 tools/virtiofsd/fuse_lowlevel.h   |   3 +-
 tools/virtiofsd/helper.c  |   1 +
 tools/virtiofsd/passthrough_ll.c  | 229 --
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 7 files changed, 249 insertions(+), 16 deletions(-)

-- 
2.25.4

[PATCH v7 4/7] virtiofsd: Add umask to seccom allow list

2021-06-22 Thread Vivek Goyal

Patches in this series are going to make use of "umask" syscall.
So allow it.

Signed-off-by: Vivek Goyal 
Reviewed-by: Stefan Hajnoczi 
---
 tools/virtiofsd/passthrough_seccomp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index 62441cfcdb..f49ed94b5e 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -114,6 +114,7 @@ static const int syscall_allowlist[] = {
 SCMP_SYS(utimensat),
 SCMP_SYS(write),
 SCMP_SYS(writev),
+SCMP_SYS(umask),
 };
 
 /* Syscalls used when --syslog is enabled */
-- 
2.25.4
