date:20210222

Re: Editing QEMU POWER Platform wiki page

2021-02-22 Thread Greg Kurz

On Tue, 23 Feb 2021 15:51:19 +1100
David Gibson  wrote:

> On Mon, Feb 22, 2021 at 06:18:08PM -0300, Leonardo Augusto Guimarães Garcia 
> wrote:
> > On 2/22/21 8:01 AM, Greg Kurz wrote:
> > > On Thu, 18 Feb 2021 10:16:25 -0300
> > > Leonardo Augusto Guimarães Garcia  wrote:
> > > 
> > > > Hi there,
> > > > 
> > > > I would like to edit the wiki page at [0] as it contains some outdated
> > > > information. Could anyone that has access to the wiki please help me
> > > > create a user so that I can edit it?
> > > > 
> > > > 0. https://wiki.qemu.org/Documentation/Platforms/POWER
> > > > 
> > > Hi Leo,
> > > 
> > > User creation isn't publicly available to avoid spam : only an existing
> > > user can create a new account.
> > 
> > Yeah, I saw that. That's why I asked here.
> 
> The other concerns raised in this thread are valid, but those
> notwithstanding, I think it makes sense to let you update the Wiki if
> you have the time and inclination.
> 

Sure, but the point is that this incentive to update documentation
would be better used in the main QEMU documentation, i.e. the
docs/system/ppc/pseries.rst file in Cedric's "docs/system: Extend PPC
section" patch.

> I have a wiki account, and I know who you are, so I've created an
> account for you.  Credentials to follow via private communication.
> 



pgp_UvY9uakne.pgp
Description: OpenPGP digital signature

Re: [PATCH qemu v13] spapr: Implement Open Firmware client interface

2021-02-22 Thread Alexey Kardashevskiy





On 23/02/2021 16:28, David Gibson wrote:

On Tue, Feb 23, 2021 at 04:01:00PM +1100, Alexey Kardashevskiy wrote:



On 23/02/2021 14:07, David Gibson wrote:

On Tue, Feb 09, 2021 at 10:02:52PM +1100, Alexey Kardashevskiy wrote:

The PAPR platform which describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.

This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.

The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.

This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements the device tree fetching and
simple memory allocator - "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report the client about available memory.

This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.

In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic instances support which are managed by a hash map
ihandle -> [phandle].

Before the guest started, the used memory is:
0..4000 - the initial firmware
1..18 - stack

This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.

With this basic support, this can only boot into kernel directly.
However this is just enough for the petitboot kernel and initradmdisk to
boot from any possible source. Note this requires reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735

This does not use the QEMU coding style for the firmware as using
it for assemler is rather strange in the POWERPC world, according
to POWERPC veterans; mixing styles in the firmware's .c and .s is
weird too IMO.

Signed-off-by: Alexey Kardashevskiy 
---

The example command line is:

-c 0 /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-enable-kvm \
-m 2G \
-machine 
pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off
 \
-kernel pbuild/kernel-le-guest/vmlinux \
-initrd t/le.cpio \
-drive 
id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof/nvram.bin,format=raw \
-global spapr-nvram.drive=DRIVE0 \
-snapshot \
-smp 8,threads=8 \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors \
-chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
-mon chardev=SOCKET0,mode=control



---
Changes:
v13:
* rebase on latest ppc-for-6.0
* shuffled code around to touch spapr.c less

v12:
* split VOF and SPAPR

v11:
* added g_autofree
* fixed gcc warnings
* fixed few leaks
* added nvram image to make "nvram --print-config" not crash;
Note that contrary to  MIN_NVRAM_SIZE (8 * KiB), the actual minimum size
is 16K, or it just does not work (empty output from "nvram")

v10:
* now rebased to compile with meson

v9:
* remove special handling of /rtas/rtas-size as now we always add it in QEMU
* removed leftovers from scsi/grub/stdout/stdin/...

v8:
* no read/write/seek
* no @dev in instances
* the machine flag is "x-vof" for now

v7:
* now we have a small firmware which loads at 0 as SLOF and starts from
0x100 as SLOF
* no MBR/ELF/GRUB business in QEMU anymore
* blockdev is a separate patch
* networking is a separate patch

v6:
* borrowed a big chunk of commit log introduction from David
* fixed initial stack pointer (points to the highest address of stack)
* traces for "interpret" and others
* disabled  translate_kernel_address() hack so grub can load (work in
progress)
* added "milliseconds" for grub
* fixed "claim" allocator again
* moved FDT_MAX_SIZE to spapr.h as

[PATCH] target/riscv: fix vs() to return proper error code

2021-02-22 Thread frank . chang

From: Frank Chang 

vs() should return -RISCV_EXCP_ILLEGAL_INST instead of -1 if rvv feature
is not enabled.

If -1 is returned, exception will be raised and cs->exception_index will
be set to the negative return value. The exception will then be treated
as an instruction access fault instead of illegal instruction fault.

Signed-off-by: Frank Chang 
---
 target/riscv/csr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index fd2e6363f39..d2ae73e4a08 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -54,7 +54,7 @@ static int vs(CPURISCVState *env, int csrno)
 if (env->misa & RVV) {
 return 0;
 }
-return -1;
+return -RISCV_EXCP_ILLEGAL_INST;
 }
 
 static int ctr(CPURISCVState *env, int csrno)
-- 
2.17.1

Re: who's using the ozlabs patchwork install for QEMU patches ?

2021-02-22 Thread Bin Meng

On Mon, Feb 22, 2021 at 10:43 PM Greg Kurz  wrote:
>
> On Mon, 22 Feb 2021 13:59:34 +
> Peter Maydell  wrote:
>
> > On Mon, 22 Feb 2021 at 07:21, Greg Kurz  wrote:
> > >
> > > On Fri, 19 Feb 2021 17:51:02 +0100
> > > Thomas Huth  wrote:
> > >
> > > > On 19/02/2021 17.26, Peter Maydell wrote:
> > > > > Does anybody use the ozlabs patchwork install for QEMU patches,
> > > > > either occasionally or on a regular basis ?
> > > > > http://patchwork.ozlabs.org/project/qemu-devel/list/
> > > > > The admins for that system are trying to identify which of
> > > > > the various projects are really using their patchwork instances,
> > > > > so I figured I'd do a quick survey here. We don't use it
> > > > > as an official project tool but it's certainly possible to
> > > > > use it as an individual developer in one way or another.
> > > >
> > > > I think it might be used by some of the ppc hackers ... so CC:-ing to
> > > > qemu-pcc ...
> > > >
> > >
> > > I do on a very regular basis.
> >
> > Thanks for the reports. Do you use the features like assigning
> > patches to people and changing patch status, or do you mostly
> > just use it as a read-only archive-of-patches ?
> >
>
> Only the latter but mostly because I don't have the permissions
> to change status, e.g. when trying to change status of this
> recent patch from Cedric to rearrange the PowerPC docs:
>
> You don't have permissions to edit patch 'docs/system: Extend PPC section'
>
> My understanding is that users must be "maintainer" to edit other's
> patches. Only three 'maintainers' are currently listed at ozlabs for
> QEMU:

I can update my patch status in the QEMU project. I am not sure if
this is due to I am a maintainer of another project hosted on
ozlabs.org.

>
> https://patchwork.ozlabs.org/api/1.0/projects/14/
>
> We had a discussion about that a few months back with Christian Schoenebeck
> (9pfs maintainer, Cc'd) who also uses patchworks. It turned out we didn't
> quite know how to go further because of lack of documentation, but I'd be
> glad to experiment the full patchwork experience if someone knows how to
> do it :-)

I personally found patchwork is really helpful for mainatiner's work.
But it looks the maintainers from the QEMU community do not use it.

Regards,
Bin

[PATCH v4] configure: Improve OpenGL dependency detections

2021-02-22 Thread Akihiko Odaki

This has the following visible changes:

- GBM is required only for OpenGL dma-buf.
- X11 is explicitly required by gtk-egl.
- EGL is now mandatory for the OpenGL displays.

The last one needs some detailed description. Before this change,
EGL was tested only for OpenGL dma-buf with the check of
EGL_MESA_image_dma_buf_export. However, all of the OpenGL
displays depend on EGL and EGL_MESA_image_dma_buf_export is always
defined by epoxy's EGL interface.
Therefore, it makes more sense to always check the presence of EGL
and say the OpenGL displays are available along with OpenGL dma-buf
if it is present.

Signed-off-by: Akihiko Odaki 
---
 configure| 37 +++-
 docs/interop/vhost-user.json |  3 ++-
 include/ui/egl-helpers.h |  9 -
 include/ui/spice-display.h   |  2 +-
 meson.build  |  2 +-
 ui/egl-helpers.c |  8 ++--
 ui/gtk-egl.c |  6 +++---
 ui/gtk-gl-area.c |  2 +-
 ui/gtk.c | 14 ++
 ui/meson.build   |  8 +---
 10 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/configure b/configure
index a79b3746d4c..b922d1ea260 100755
--- a/configure
+++ b/configure
@@ -394,7 +394,6 @@ u2f="auto"
 libusb="$default_feature"
 usb_redir="$default_feature"
 opengl="$default_feature"
-opengl_dmabuf="no"
 cpuid_h="no"
 avx2_opt="$default_feature"
 capstone="auto"
@@ -3607,14 +3606,24 @@ if $pkg_config gbm; then
 fi
 
 if test "$opengl" != "no" ; then
-  opengl_pkgs="epoxy gbm"
-  if $pkg_config $opengl_pkgs; then
-opengl_cflags="$($pkg_config --cflags $opengl_pkgs)"
-opengl_libs="$($pkg_config --libs $opengl_pkgs)"
+  epoxy=no
+  if $pkg_config epoxy; then
+cat > $TMPC << EOF
+#include 
+int main(void) { return 0; }
+EOF
+if compile_prog "" "" ; then
+  epoxy=yes
+fi
+  fi
+
+  if test "$epoxy" = "yes" ; then
+opengl_cflags="$($pkg_config --cflags epoxy)"
+opengl_libs="$($pkg_config --libs epoxy)"
 opengl=yes
   else
 if test "$opengl" = "yes" ; then
-  feature_not_found "opengl" "Please install opengl (mesa) devel pkgs: 
$opengl_pkgs"
+  feature_not_found "opengl" "Please install epoxy with EGL"
 fi
 opengl_cflags=""
 opengl_libs=""
@@ -3622,19 +3631,6 @@ if test "$opengl" != "no" ; then
   fi
 fi
 
-if test "$opengl" = "yes"; then
-  cat > $TMPC << EOF
-#include 
-#ifndef EGL_MESA_image_dma_buf_export
-# error mesa/epoxy lacks support for dmabufs (mesa 10.6+)
-#endif
-int main(void) { return 0; }
-EOF
-  if compile_prog "" "" ; then
-opengl_dmabuf=yes
-  fi
-fi
-
 ##
 # libxml2 probe
 if test "$libxml2" != "no" ; then
@@ -5837,9 +5833,6 @@ if test "$opengl" = "yes" ; then
   echo "CONFIG_OPENGL=y" >> $config_host_mak
   echo "OPENGL_CFLAGS=$opengl_cflags" >> $config_host_mak
   echo "OPENGL_LIBS=$opengl_libs" >> $config_host_mak
-  if test "$opengl_dmabuf" = "yes" ; then
-echo "CONFIG_OPENGL_DMABUF=y" >> $config_host_mak
-  fi
 fi
 
 if test "$gbm" = "yes" ; then
diff --git a/docs/interop/vhost-user.json b/docs/interop/vhost-user.json
index feb5fe58cad..b6ade9e4931 100644
--- a/docs/interop/vhost-user.json
+++ b/docs/interop/vhost-user.json
@@ -250,7 +250,8 @@
 #   "type": "gpu",
 #   "binary": "/usr/libexec/qemu/vhost-user-gpu",
 #   "tags": [
-# "CONFIG_OPENGL_DMABUF=y"
+# "CONFIG_OPENGL=y",
+# "CONFIG_GBM=y"
 #   ]
 # }
 #
diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
index 5b1f7fafe0b..f1bf8f97fc3 100644
--- a/include/ui/egl-helpers.h
+++ b/include/ui/egl-helpers.h
@@ -3,7 +3,9 @@
 
 #include 
 #include 
+#ifdef CONFIG_GBM
 #include 
+#endif
 #include "ui/console.h"
 #include "ui/shader.h"
 
@@ -31,7 +33,7 @@ void egl_texture_blit(QemuGLShader *gls, egl_fb *dst, egl_fb 
*src, bool flip);
 void egl_texture_blend(QemuGLShader *gls, egl_fb *dst, egl_fb *src, bool flip,
int x, int y, double scale_x, double scale_y);
 
-#ifdef CONFIG_OPENGL_DMABUF
+#ifdef CONFIG_GBM
 
 extern int qemu_egl_rn_fd;
 extern struct gbm_device *qemu_egl_rn_gbm_dev;
@@ -48,8 +50,13 @@ void egl_dmabuf_release_texture(QemuDmaBuf *dmabuf);
 
 EGLSurface qemu_egl_init_surface_x11(EGLContext ectx, EGLNativeWindowType win);
 
+#if defined(CONFIG_X11) || defined(CONFIG_GBM)
+
 int qemu_egl_init_dpy_x11(EGLNativeDisplayType dpy, DisplayGLMode mode);
 int qemu_egl_init_dpy_mesa(EGLNativeDisplayType dpy, DisplayGLMode mode);
+
+#endif
+
 EGLContext qemu_egl_init_ctx(void);
 bool qemu_egl_has_dmabuf(void);
 
diff --git a/include/ui/spice-display.h b/include/ui/spice-display.h
index 4a47ffdd4c8..ed298d58f06 100644
--- a/include/ui/spice-display.h
+++ b/include/ui/spice-display.h
@@ -27,7 +27,7 @@
 #include "ui/qemu-pixman.h"
 #include "ui/console.h"
 
-#if defined(CONFIG_OPENGL_DMABUF)
+#if defined(CONFIG_OPENGL) && defined(CONFIG_GBM)
 # if SPICE_SERVER_VERSION >= 0x000d01 /* release 0.13.1 */
 #  define HAVE_SPICE_GL

Re: [PATCH] docs/system: Extend PPC section

2021-02-22 Thread Cédric Le Goater

On 2/23/21 1:28 AM, David Gibson wrote:
> On Mon, Feb 22, 2021 at 03:04:41PM +0100, Greg Kurz wrote:
>> On Mon, 22 Feb 2021 14:39:56 +0100
>> Cédric Le Goater  wrote:
>>
>>> This moves the current documentation in files specific to each
>>> platform family. PowerNV machine is updated, the other machines need
>>> to be done.
>>>
>>> Signed-off-by: Cédric Le Goater 
>>> ---
>>
>> Looks pretty good to me. Just one small nit in docs/system/target-ppc.rst.
>>
>> Reviewed-by: Greg Kurz 
> 
> Applied to ppc-for-6.0...
> 
> [snip]
> 
>>> -QEMU emulates the following PowerMac peripherals:
>>> +you can get a complete list by running ``qemu-system-ppc64 --machine
>>
>> Usual capitalization rules call for s/you/You .
> 
> .. and I corrected that inline.
> 

Thanks,

C.

[PATCH v3] configure: Improve OpenGL dependency detections

2021-02-22 Thread Akihiko Odaki

This has the following visible changes:

- GBM is required only for OpenGL dma-buf.
- X11 is explicitly required by gtk-egl.
- EGL is now mandatory for the OpenGL displays.

The last one needs some detailed description. Before this change,
EGL was tested only for OpenGL dma-buf with the check of
EGL_MESA_image_dma_buf_export. However, all of the OpenGL
displays depend on EGL and EGL_MESA_image_dma_buf_export is always
defined by epoxy's EGL interface.
Therefore, it makes more sense to always check the presence of EGL
and say the OpenGL displays are available along with OpenGL dma-buf
if it is present.

Signed-off-by: Akihiko Odaki 
---
 configure| 36 +++-
 docs/interop/vhost-user.json |  3 ++-
 include/ui/egl-helpers.h |  9 -
 include/ui/spice-display.h   |  2 +-
 meson.build  |  2 +-
 ui/egl-helpers.c |  8 ++--
 ui/gtk-egl.c |  6 +++---
 ui/gtk.c | 14 ++
 ui/meson.build   |  8 +---
 9 files changed, 55 insertions(+), 33 deletions(-)

diff --git a/configure b/configure
index a79b3746d4c..f0869ca65c1 100755
--- a/configure
+++ b/configure
@@ -3607,14 +3607,24 @@ if $pkg_config gbm; then
 fi
 
 if test "$opengl" != "no" ; then
-  opengl_pkgs="epoxy gbm"
-  if $pkg_config $opengl_pkgs; then
-opengl_cflags="$($pkg_config --cflags $opengl_pkgs)"
-opengl_libs="$($pkg_config --libs $opengl_pkgs)"
+  epoxy=no
+  if $pkg_config epoxy; then
+cat > $TMPC << EOF
+#include 
+int main(void) { return 0; }
+EOF
+if compile_prog "" "" ; then
+  epoxy=yes
+fi
+  fi
+
+  if test "$epoxy" = "yes" ; then
+opengl_cflags="$($pkg_config --cflags epoxy)"
+opengl_libs="$($pkg_config --libs epoxy)"
 opengl=yes
   else
 if test "$opengl" = "yes" ; then
-  feature_not_found "opengl" "Please install opengl (mesa) devel pkgs: 
$opengl_pkgs"
+  feature_not_found "opengl" "Please install epoxy with EGL"
 fi
 opengl_cflags=""
 opengl_libs=""
@@ -3622,19 +3632,6 @@ if test "$opengl" != "no" ; then
   fi
 fi
 
-if test "$opengl" = "yes"; then
-  cat > $TMPC << EOF
-#include 
-#ifndef EGL_MESA_image_dma_buf_export
-# error mesa/epoxy lacks support for dmabufs (mesa 10.6+)
-#endif
-int main(void) { return 0; }
-EOF
-  if compile_prog "" "" ; then
-opengl_dmabuf=yes
-  fi
-fi
-
 ##
 # libxml2 probe
 if test "$libxml2" != "no" ; then
@@ -5837,9 +5834,6 @@ if test "$opengl" = "yes" ; then
   echo "CONFIG_OPENGL=y" >> $config_host_mak
   echo "OPENGL_CFLAGS=$opengl_cflags" >> $config_host_mak
   echo "OPENGL_LIBS=$opengl_libs" >> $config_host_mak
-  if test "$opengl_dmabuf" = "yes" ; then
-echo "CONFIG_OPENGL_DMABUF=y" >> $config_host_mak
-  fi
 fi
 
 if test "$gbm" = "yes" ; then
diff --git a/docs/interop/vhost-user.json b/docs/interop/vhost-user.json
index feb5fe58cad..b6ade9e4931 100644
--- a/docs/interop/vhost-user.json
+++ b/docs/interop/vhost-user.json
@@ -250,7 +250,8 @@
 #   "type": "gpu",
 #   "binary": "/usr/libexec/qemu/vhost-user-gpu",
 #   "tags": [
-# "CONFIG_OPENGL_DMABUF=y"
+# "CONFIG_OPENGL=y",
+# "CONFIG_GBM=y"
 #   ]
 # }
 #
diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
index 5b1f7fafe0b..f1bf8f97fc3 100644
--- a/include/ui/egl-helpers.h
+++ b/include/ui/egl-helpers.h
@@ -3,7 +3,9 @@
 
 #include 
 #include 
+#ifdef CONFIG_GBM
 #include 
+#endif
 #include "ui/console.h"
 #include "ui/shader.h"
 
@@ -31,7 +33,7 @@ void egl_texture_blit(QemuGLShader *gls, egl_fb *dst, egl_fb 
*src, bool flip);
 void egl_texture_blend(QemuGLShader *gls, egl_fb *dst, egl_fb *src, bool flip,
int x, int y, double scale_x, double scale_y);
 
-#ifdef CONFIG_OPENGL_DMABUF
+#ifdef CONFIG_GBM
 
 extern int qemu_egl_rn_fd;
 extern struct gbm_device *qemu_egl_rn_gbm_dev;
@@ -48,8 +50,13 @@ void egl_dmabuf_release_texture(QemuDmaBuf *dmabuf);
 
 EGLSurface qemu_egl_init_surface_x11(EGLContext ectx, EGLNativeWindowType win);
 
+#if defined(CONFIG_X11) || defined(CONFIG_GBM)
+
 int qemu_egl_init_dpy_x11(EGLNativeDisplayType dpy, DisplayGLMode mode);
 int qemu_egl_init_dpy_mesa(EGLNativeDisplayType dpy, DisplayGLMode mode);
+
+#endif
+
 EGLContext qemu_egl_init_ctx(void);
 bool qemu_egl_has_dmabuf(void);
 
diff --git a/include/ui/spice-display.h b/include/ui/spice-display.h
index 4a47ffdd4c8..ed298d58f06 100644
--- a/include/ui/spice-display.h
+++ b/include/ui/spice-display.h
@@ -27,7 +27,7 @@
 #include "ui/qemu-pixman.h"
 #include "ui/console.h"
 
-#if defined(CONFIG_OPENGL_DMABUF)
+#if defined(CONFIG_OPENGL) && defined(CONFIG_GBM)
 # if SPICE_SERVER_VERSION >= 0x000d01 /* release 0.13.1 */
 #  define HAVE_SPICE_GL 1
 #  include "ui/egl-helpers.h"
diff --git a/meson.build b/meson.build
index 58321a0ea25..1f09e31bfbf 100644
--- a/meson.build
+++ b/meson.build
@@ -2639,7 +2639,7 @@ summary_info += {'U2F support':   u2f.found()}
 summary_info

[PATCH v2] virtio-blk: Respect discard granularity

2021-02-22 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
---
 hw/block/virtio-blk.c  | 8 +++-
 hw/core/machine.c  | 9 -
 include/hw/virtio/virtio-blk.h | 1 +
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index bac2d6fa2b2..f4378e61182 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -962,10 +962,14 @@ static void virtio_blk_update_config(VirtIODevice *vdev, 
uint8_t *config)
 blkcfg.wce = blk_enable_write_cache(s->blk);
 virtio_stw_p(vdev, _queues, s->conf.num_queues);
 if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD)) {
+uint32_t discard_granularity = conf->discard_granularity;
+if (discard_granularity == -1 || !s->conf.report_discard_granularity) {
+discard_granularity = blk_size;
+}
 virtio_stl_p(vdev, _discard_sectors,
  s->conf.max_discard_sectors);
 virtio_stl_p(vdev, _sector_alignment,
- blk_size >> BDRV_SECTOR_BITS);
+ discard_granularity >> BDRV_SECTOR_BITS);
 /*
  * We support only one segment per request since multiple segments
  * are not widely used and there are no userspace APIs that allow
@@ -1299,6 +1303,8 @@ static Property virtio_blk_properties[] = {
  IOThread *),
 DEFINE_PROP_BIT64("discard", VirtIOBlock, host_features,
   VIRTIO_BLK_F_DISCARD, true),
+DEFINE_PROP_BOOL("report-discard-granularity", VirtIOBlock,
+ conf.report_discard_granularity, true),
 DEFINE_PROP_BIT64("write-zeroes", VirtIOBlock, host_features,
   VIRTIO_BLK_F_WRITE_ZEROES, true),
 DEFINE_PROP_UINT32("max-discard-sectors", VirtIOBlock,
diff --git a/hw/core/machine.c b/hw/core/machine.c
index de3b8f1b318..3ba976e5bbc 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -33,7 +33,9 @@
 #include "migration/global_state.h"
 #include "migration/vmstate.h"
 
-GlobalProperty hw_compat_5_2[] = {};
+GlobalProperty hw_compat_5_2[] = {
+{ "virtio-blk-device", "report-discard-granularity", "off" },
+};
 const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2);
 
 GlobalProperty hw_compat_5_1[] = {
@@ -41,6 +43,7 @@ GlobalProperty hw_compat_5_1[] = {
 { "vhost-user-blk", "num-queues", "1"},
 { "vhost-user-scsi", "num_queues", "1"},
 { "virtio-blk-device", "num-queues", "1"},
+{ "virtio-blk-device", "report-discard-granularity", "off" },
 { "virtio-scsi-device", "num_queues", "1"},
 { "nvme", "use-intel-id", "on"},
 { "pvpanic", "events", "1"}, /* PVPANIC_PANICKED */
@@ -50,6 +53,7 @@ const size_t hw_compat_5_1_len = G_N_ELEMENTS(hw_compat_5_1);
 GlobalProperty hw_compat_5_0[] = {
 { "pci-host-bridge", "x-config-reg-migration-enabled", "off" },
 { "virtio-balloon-device", "page-poison", "false" },
+{ "virtio-blk-device", "report-discard-granularity", "off" },
 { "vmport", "x-read-set-eax", "off" },
 { "vmport", "x-signal-unsupported-cmd", "off" },
 { "vmport", "x-report-vmx-type", "off" },
@@ -59,6 +63,7 @@ GlobalProperty hw_compat_5_0[] = {
 const size_t hw_compat_5_0_len = G_N_ELEMENTS(hw_compat_5_0);
 
 GlobalProperty hw_compat_4_2[] = {
+{ "virtio-blk-device", "report-discard-granularity", "off" },
 { "virtio-blk-device", "queue-size", "128"},
 { "virtio-scsi-device", "virtqueue_size", "128"},
 { "virtio-blk-device", "x-enable-wce-if-config-wce", "off" },
@@ -74,6 +79,7 @@ GlobalProperty hw_compat_4_2[] = {
 const size_t hw_compat_4_2_len = G_N_ELEMENTS(hw_compat_4_2);
 
 GlobalProperty hw_compat_4_1[] = {
+{ "virtio-blk-device", "report-discard-granularity", "off" },
 { "virtio-pci", "x-pcie-flr-init", "off" },
 { "virtio-device", "use-disabled-flag", "false" },
 };
@@ -83,6 +89,7 @@ GlobalProperty hw_compat_4_0[] = {
 { "VGA","edid", "false" },
 { "secondary-vga",  "edid", "false" },
 { "bochs-display",  "edid", "false" },
+{ "virtio-blk-device", "report-discard-granularity", "off" },
 { "virtio-vga", "edid", "false" },
 { "virtio-gpu-device", "edid", "false" },
 { "virtio-device", "use-started", "false" },
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 214ab748229..29655a406dd 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -41,6 +41,7 @@ struct VirtIOBlkConf
 uint16_t num_queues;
 uint16_t queue_size;
 bool seg_max_adjust;
+bool report_discard_granularity;
 uint32_t max_discard_sectors;
 uint32_t max_write_zeroes_sectors;
 bool x_enable_wce_if_config_wce;
-- 
2.24.3 (Apple Git-128)

Re: [PATCH qemu v13] spapr: Implement Open Firmware client interface

2021-02-22 Thread David Gibson

On Tue, Feb 23, 2021 at 04:01:00PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 23/02/2021 14:07, David Gibson wrote:
> > On Tue, Feb 09, 2021 at 10:02:52PM +1100, Alexey Kardashevskiy wrote:
> > > The PAPR platform which describes an OS environment that's presented by
> > > a combination of a hypervisor and firmware. The features it specifies
> > > require collaboration between the firmware and the hypervisor.
> > > 
> > > Since the beginning, the runtime component of the firmware (RTAS) has
> > > been implemented as a 20 byte shim which simply forwards it to
> > > a hypercall implemented in qemu. The boot time firmware component is
> > > SLOF - but a build that's specific to qemu, and has always needed to be
> > > updated in sync with it. Even though we've managed to limit the amount
> > > of runtime communication we need between qemu and SLOF, there's some,
> > > and it has become increasingly awkward to handle as we've implemented
> > > new features.
> > > 
> > > This implements a boot time OF client interface (CI) which is
> > > enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
> > > Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
> > > which implements Open Firmware Client Interface (OF CI). This allows
> > > using a smaller stateless firmware which does not have to manage
> > > the device tree.
> > > 
> > > The new "vof.bin" firmware image is included with source code under
> > > pc-bios/. It also includes RTAS blob.
> > > 
> > > This implements a handful of CI methods just to get -kernel/-initrd
> > > working. In particular, this implements the device tree fetching and
> > > simple memory allocator - "claim" (an OF CI memory allocator) and updates
> > > "/memory@0/available" to report the client about available memory.
> > > 
> > > This implements changing some device tree properties which we know how
> > > to deal with, the rest is ignored. To allow changes, this skips
> > > fdt_pack() when x-vof=on as not packing the blob leaves some room for
> > > appending.
> > > 
> > > In absence of SLOF, this assigns phandles to device tree nodes to make
> > > device tree traversing work.
> > > 
> > > When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
> > > 
> > > This adds basic instances support which are managed by a hash map
> > > ihandle -> [phandle].
> > > 
> > > Before the guest started, the used memory is:
> > > 0..4000 - the initial firmware
> > > 1..18 - stack
> > > 
> > > This OF CI does not implement "interpret".
> > > 
> > > Unlike SLOF, this does not format uninitialized nvram. Instead, this
> > > includes a disk image with pre-formatted nvram.
> > > 
> > > With this basic support, this can only boot into kernel directly.
> > > However this is just enough for the petitboot kernel and initradmdisk to
> > > boot from any possible source. Note this requires reasonably recent guest
> > > kernel with:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
> > > 
> > > This does not use the QEMU coding style for the firmware as using
> > > it for assemler is rather strange in the POWERPC world, according
> > > to POWERPC veterans; mixing styles in the firmware's .c and .s is
> > > weird too IMO.
> > > 
> > > Signed-off-by: Alexey Kardashevskiy 
> > > ---
> > > 
> > > The example command line is:
> > > 
> > > -c 0 /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
> > > -nodefaults \
> > > -chardev stdio,id=STDIO0,signal=off,mux=on \
> > > -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> > > -mon id=MON0,chardev=STDIO0,mode=readline \
> > > -nographic \
> > > -vga none \
> > > -enable-kvm \
> > > -m 2G \
> > > -machine 
> > > pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off
> > >  \
> > > -kernel pbuild/kernel-le-guest/vmlinux \
> > > -initrd t/le.cpio \
> > > -drive 
> > > id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof/nvram.bin,format=raw 
> > > \
> > > -global spapr-nvram.drive=DRIVE0 \
> > > -snapshot \
> > > -smp 8,threads=8 \
> > > -L /home/aik/t/qemu-ppc64-bios/ \
> > > -trace events=qemu_trace_events \
> > > -d guest_errors \
> > > -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
> > > -mon chardev=SOCKET0,mode=control
> > > 
> > > 
> > > 
> > > ---
> > > Changes:
> > > v13:
> > > * rebase on latest ppc-for-6.0
> > > * shuffled code around to touch spapr.c less
> > > 
> > > v12:
> > > * split VOF and SPAPR
> > > 
> > > v11:
> > > * added g_autofree
> > > * fixed gcc warnings
> > > * fixed few leaks
> > > * added nvram image to make "nvram --print-config" not crash;
> > > Note that contrary to  MIN_NVRAM_SIZE (8 * KiB), the actual minimum size
> > > is 16K, or it just does not work (empty output from "nvram")
> > > 
> > > v10:
> > > * now rebased to compile with meson
> > > 
> > > v9:
> > > * remove special handling of /rtas/rtas-size as now we always add it in 
> > > QEMU
> > > * removed leftovers

Re: [PATCH qemu v13] spapr: Implement Open Firmware client interface

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 04:01:06PM +0100, Greg Kurz wrote:
> On Mon, 22 Feb 2021 22:48:51 +1100
> Alexey Kardashevskiy  wrote:
> 
> > Ping?
> > 
> > I need community support here :) I am hearing that having this mode 
> > helps heaps with development in fully emulated environments as this 
> > skips SLOF entirely, for example. Another rumour I am hearing is that 
> > there is interest in running grub in the userspace which this VOF thing 
> > makes handy too.
> > 
> 
> I had tried a previous version of this : skipping SLOF is very
> beneficial to do guest work, even when running on KVM.
> 
> This patch is quite huge and I don't personally have time to review all
> of it. My main concern is that it doesn't impact support of the paths
> used in production (i.e. pseries + SLOF + KVM).

Yeah, it would be great if this could be split up a bit, but I see
that could be pretty tricky.

> Alexey has made a great job of separating VOF from the rest : changes in
> hw/ppc/spapr.c are simple enough for me now. So from that perspective:
> 
> Acked-by: Greg Kurz 
> 
> Then maybe it would make sense you also add yourself as maintainer
> for hw/ppc/*vof* to share the burden ?

That would be a good idea.

> 
> Anyway, David will have the final say on this patch.
> 
> > 
> > 
> > On 09/02/2021 22:02, Alexey Kardashevskiy wrote:
> > > The PAPR platform which describes an OS environment that's presented by
> > > a combination of a hypervisor and firmware. The features it specifies
> > > require collaboration between the firmware and the hypervisor.
> > > 
> > > Since the beginning, the runtime component of the firmware (RTAS) has
> > > been implemented as a 20 byte shim which simply forwards it to
> > > a hypercall implemented in qemu. The boot time firmware component is
> > > SLOF - but a build that's specific to qemu, and has always needed to be
> > > updated in sync with it. Even though we've managed to limit the amount
> > > of runtime communication we need between qemu and SLOF, there's some,
> > > and it has become increasingly awkward to handle as we've implemented
> > > new features.
> > > 
> > > This implements a boot time OF client interface (CI) which is
> > > enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
> > > Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
> > > which implements Open Firmware Client Interface (OF CI). This allows
> > > using a smaller stateless firmware which does not have to manage
> > > the device tree.
> > > 
> > > The new "vof.bin" firmware image is included with source code under
> > > pc-bios/. It also includes RTAS blob.
> > > 
> > > This implements a handful of CI methods just to get -kernel/-initrd
> > > working. In particular, this implements the device tree fetching and
> > > simple memory allocator - "claim" (an OF CI memory allocator) and updates
> > > "/memory@0/available" to report the client about available memory.
> > > 
> > > This implements changing some device tree properties which we know how
> > > to deal with, the rest is ignored. To allow changes, this skips
> > > fdt_pack() when x-vof=on as not packing the blob leaves some room for
> > > appending.
> > > 
> > > In absence of SLOF, this assigns phandles to device tree nodes to make
> > > device tree traversing work.
> > > 
> > > When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
> > > 
> > > This adds basic instances support which are managed by a hash map
> > > ihandle -> [phandle].
> > > 
> > > Before the guest started, the used memory is:
> > > 0..4000 - the initial firmware
> > > 1..18 - stack
> > > 
> > > This OF CI does not implement "interpret".
> > > 
> > > Unlike SLOF, this does not format uninitialized nvram. Instead, this
> > > includes a disk image with pre-formatted nvram.
> > > 
> > > With this basic support, this can only boot into kernel directly.
> > > However this is just enough for the petitboot kernel and initradmdisk to
> > > boot from any possible source. Note this requires reasonably recent guest
> > > kernel with:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
> > > 
> > > This does not use the QEMU coding style for the firmware as using
> > > it for assemler is rather strange in the POWERPC world, according
> > > to POWERPC veterans; mixing styles in the firmware's .c and .s is
> > > weird too IMO.
> > > 
> > > Signed-off-by: Alexey Kardashevskiy 
> > > ---
> > > 
> > > The example command line is:
> > > 
> > > -c 0 /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
> > > -nodefaults \
> > > -chardev stdio,id=STDIO0,signal=off,mux=on \
> > > -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> > > -mon id=MON0,chardev=STDIO0,mode=readline \
> > > -nographic \
> > > -vga none \
> > > -enable-kvm \
> > > -m 2G \
> > > -machine 
> > > pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off
> > >  \
> > > -kernel

[PATCH] net/slirp: Fix incorrect permissions on samba >= 2.0.5

2021-02-22 Thread Niklas Hambüchen

As the added commend and `man smb.conf` explain, starting
with that samba version, `force user` must be configured
in `[global]` in order to access the configured `smb_dir`.

This broke `-net user,smb=/path/to/folder`:

The `chdir` into e.g. `/run/user/0/qemu-smb.DCZ8Y0` failed.
In verbose logs, this manifested as:

[..., effective(65534, 65534), real(65534, 0)] 
/source3/smbd/service.c:159(chdir_current_service)
  chdir (/run/user/0) failed, reason: Permission denied

[..., effective(65534, 65534), real(65534, 0)] 
/source3/smbd/service.c:167(chdir_current_service)
  chdir (/run/user/0) failed, reason: Permission denied

[..., effective(65534, 65534), real(65534, 0)] 
/source3/smbd/uid.c:448(change_to_user_internal)
  change_to_user_internal: chdir_current_service() failed!

This commit fixes it by setting the `[global]` force user to
the user that owns the directories `smbd` needs to access.

Signed-off-by: Niklas Hambüchen 
---
 net/slirp.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/slirp.c b/net/slirp.c
index be914c0be0..82387bdb19 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -850,6 +850,11 @@ static int slirp_smb(SlirpState* s, const char 
*exported_dir,
 }
 fprintf(f,
 "[global]\n"
+"# In Samba 2.0.5 and above the 'force user' parameter\n"
+"# also causes the primary group of the forced user to be used\n"
+"# as the primary group for all file activity.\n"
+"# This includes the various directories set below.\n"
+"force user=%s\n"
 "private dir=%s\n"
 "interfaces=127.0.0.1\n"
 "bind interfaces only=yes\n"
@@ -871,6 +876,7 @@ static int slirp_smb(SlirpState* s, const char 
*exported_dir,
 "read only=no\n"
 "guest ok=yes\n"
 "force user=%s\n",
+passwd->pw_name,
 s->smb_dir,
 s->smb_dir,
 s->smb_dir,
-- 
2.25.4

Re: [PATCH qemu v13] spapr: Implement Open Firmware client interface

2021-02-22 Thread Alexey Kardashevskiy





On 23/02/2021 14:07, David Gibson wrote:

On Tue, Feb 09, 2021 at 10:02:52PM +1100, Alexey Kardashevskiy wrote:

The PAPR platform which describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.

This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.

The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.

This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements the device tree fetching and
simple memory allocator - "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report the client about available memory.

This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.

In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic instances support which are managed by a hash map
ihandle -> [phandle].

Before the guest started, the used memory is:
0..4000 - the initial firmware
1..18 - stack

This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.

With this basic support, this can only boot into kernel directly.
However this is just enough for the petitboot kernel and initradmdisk to
boot from any possible source. Note this requires reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735

This does not use the QEMU coding style for the firmware as using
it for assemler is rather strange in the POWERPC world, according
to POWERPC veterans; mixing styles in the firmware's .c and .s is
weird too IMO.

Signed-off-by: Alexey Kardashevskiy 
---

The example command line is:

-c 0 /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-enable-kvm \
-m 2G \
-machine 
pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off
 \
-kernel pbuild/kernel-le-guest/vmlinux \
-initrd t/le.cpio \
-drive 
id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof/nvram.bin,format=raw \
-global spapr-nvram.drive=DRIVE0 \
-snapshot \
-smp 8,threads=8 \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors \
-chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
-mon chardev=SOCKET0,mode=control



---
Changes:
v13:
* rebase on latest ppc-for-6.0
* shuffled code around to touch spapr.c less

v12:
* split VOF and SPAPR

v11:
* added g_autofree
* fixed gcc warnings
* fixed few leaks
* added nvram image to make "nvram --print-config" not crash;
Note that contrary to  MIN_NVRAM_SIZE (8 * KiB), the actual minimum size
is 16K, or it just does not work (empty output from "nvram")

v10:
* now rebased to compile with meson

v9:
* remove special handling of /rtas/rtas-size as now we always add it in QEMU
* removed leftovers from scsi/grub/stdout/stdin/...

v8:
* no read/write/seek
* no @dev in instances
* the machine flag is "x-vof" for now

v7:
* now we have a small firmware which loads at 0 as SLOF and starts from
0x100 as SLOF
* no MBR/ELF/GRUB business in QEMU anymore
* blockdev is a separate patch
* networking is a separate patch

v6:
* borrowed a big chunk of commit log introduction from David
* fixed initial stack pointer (points to the highest address of stack)
* traces for "interpret" and others
* disabled  translate_kernel_address() hack so grub can load (work in
progress)
* added "milliseconds" for grub
* fixed "claim" allocator again
* moved FDT_MAX_SIZE to spapr.h as spapr_of_client.c wants it too for CAS
* moved the most code possible from spapr.c to spapr_of_client.c, such as

Re: Editing QEMU POWER Platform wiki page

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 06:18:08PM -0300, Leonardo Augusto Guimarães Garcia 
wrote:
> On 2/22/21 8:01 AM, Greg Kurz wrote:
> > On Thu, 18 Feb 2021 10:16:25 -0300
> > Leonardo Augusto Guimarães Garcia  wrote:
> > 
> > > Hi there,
> > > 
> > > I would like to edit the wiki page at [0] as it contains some outdated
> > > information. Could anyone that has access to the wiki please help me
> > > create a user so that I can edit it?
> > > 
> > > 0. https://wiki.qemu.org/Documentation/Platforms/POWER
> > > 
> > Hi Leo,
> > 
> > User creation isn't publicly available to avoid spam : only an existing
> > user can create a new account.
> 
> Yeah, I saw that. That's why I asked here.

The other concerns raised in this thread are valid, but those
notwithstanding, I think it makes sense to let you update the Wiki if
you have the time and inclination.

I have a wiki account, and I know who you are, so I've created an
account for you.  Credentials to follow via private communication.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH] virtio-gpu: Respect graphics update interval for EDID

2021-02-22 Thread Akihiko Odaki

2021年2月22日(月) 19:57 Gerd Hoffmann :
>
> On Sun, Feb 21, 2021 at 10:34:14PM +0900, Akihiko Odaki wrote:
> > This change introduces an additional member, refresh_rate to
> > qemu_edid_info in include/hw/display/edid.h.
> >
> > This change also isolates the graphics update interval from the
> > display update interval. The guest will update the frame buffer
> > in the graphics update interval, but displays can be updated in a
> > dynamic interval, for example to save update costs aggresively
> > (vnc) or to respond to user-generated events (sdl).
> > It stabilizes the graphics update interval and prevents the guest
> > from being confused.
>
> Hmm.  What problem you are trying to solve here?
>
> The update throttle being visible by the guest was done intentionally,
> so the guest can throttle the display updates too in case nobody is
> watching those display updated anyway.

Indeed, we are throttling the update for vnc to avoid some worthless
work. But typically a guest cannot respond to update interval changes
so often because real display devices the guest is designed for does
not change the update interval in that way. That is why we have to
tell the guest a stable update interval even if it results in wasted
frames.

Regards,
Akihiko Odaki

>
> take care,
>   Gerd
>

Re: [PATCH v2] ui/console: Pass placeholder surface to displays

2021-02-22 Thread Akihiko Odaki

2021年2月22日(月) 19:51 Gerd Hoffmann :
>
>   Hi,
>
> >  #define QEMU_ALLOCATED_FLAG 0x01
> > +#define QEMU_PLACEHOLDER_FLAG   0x02
>
> > +static inline int is_placeholder(DisplaySurface *surface)
> > +{
> > +return surface->flags & QEMU_PLACEHOLDER_FLAG;
> > +}
>
> Interesting idea.  That approach makes sense too.
>
> > +if (!placeholder) {
> > +placeholder = qemu_create_message_surface(640, 480, 
> > placeholder_msg);
> > +placeholder->flags |= QEMU_PLACEHOLDER_FLAG;
>
> I think we should set the placeholder flag in
> qemu_create_message_surface() because every surface created with that
> function is some kind if placeholder.
>
> Also when replacing an existing surface we should make the placeholder
> the same size, to avoid pointless ui window resizes.
>
> > -if (!new_surface) {
> > +if (is_placeholder(new_surface)) {
>
> We should check whenever this is the primary or a secondary window here
> and only destroy secondary windows.  qemu hiding all windows but
> continuing to run has great potential for user confusion ...
>
> > -if (!new_surface) {
> > +if (is_placeholder(new_surface)) {
>
> Same here.

The other surfaces created by qemu_create_message_surface() are not
considered as "placeholder" here, and have contents to be displayed.
Since no emulated devices give NULL to dpy_gfx_replace_surface for the
primary connection, it will never get the "placeholder", and its
window will be always shown.

Regards,
Akihiko Odaki

>
> take care,
>   Gerd
>

Re: [PATCH v3 6/6] hw/ppc: Add emulation of Genesi/bPlan Pegasos II

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 04:22:06PM +0100, BALATON Zoltan wrote:
> Add new machine called pegasos2 emulating the Genesi/bPlan Pegasos II,
> a PowerPC board based on the Marvell MV64361 system controller and the
> VIA VT8231 integrated south bridge/superio chips. It can run Linux,
> AmigaOS and a wide range of MorphOS versions. Currently a firmware ROM
> image is needed to boot and only MorphOS has a video driver to produce
> graphics output. Linux could work too but distros that supported this
> machine don't include usual video drivers so those only run with
> serial console for now.
> 
> Signed-off-by: BALATON Zoltan 
> ---
>  default-configs/devices/ppc-softmmu.mak |   2 +
>  hw/ppc/Kconfig  |  10 ++
>  hw/ppc/meson.build  |   2 +
>  hw/ppc/pegasos2.c   | 144 
>  4 files changed, 158 insertions(+)
>  create mode 100644 hw/ppc/pegasos2.c
> 
> diff --git a/default-configs/devices/ppc-softmmu.mak 
> b/default-configs/devices/ppc-softmmu.mak
> index 61b78b844d..4535993d8d 100644
> --- a/default-configs/devices/ppc-softmmu.mak
> +++ b/default-configs/devices/ppc-softmmu.mak
> @@ -14,5 +14,7 @@ CONFIG_SAM460EX=y
>  CONFIG_MAC_OLDWORLD=y
>  CONFIG_MAC_NEWWORLD=y
>  
> +CONFIG_PEGASOS2=y
> +
>  # For PReP
>  CONFIG_PREP=y
> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
> index d11dc30509..98d8dd1a84 100644
> --- a/hw/ppc/Kconfig
> +++ b/hw/ppc/Kconfig
> @@ -68,6 +68,16 @@ config SAM460EX
>  select USB_OHCI
>  select FDT_PPC
>  
> +config PEGASOS2
> +bool
> +select MV64361
> +select VT82C686
> +select IDE_VIA
> +select SMBUS_EEPROM
> +# These should come with VT82C686
> +select APM
> +select ACPI_X86
> +
>  config PREP
>  bool
>  imply PCI_DEVICES
> diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
> index 218631c883..86d6f379d1 100644
> --- a/hw/ppc/meson.build
> +++ b/hw/ppc/meson.build
> @@ -78,5 +78,7 @@ ppc_ss.add(when: 'CONFIG_E500', if_true: files(
>  ))
>  # PowerPC 440 Xilinx ML507 reference board.
>  ppc_ss.add(when: 'CONFIG_VIRTEX', if_true: files('virtex_ml507.c'))
> +# Pegasos2
> +ppc_ss.add(when: 'CONFIG_PEGASOS2', if_true: files('pegasos2.c'))
>  
>  hw_arch += {'ppc': ppc_ss}
> diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
> new file mode 100644
> index 00..8b96961c90
> --- /dev/null
> +++ b/hw/ppc/pegasos2.c
> @@ -0,0 +1,144 @@
> +/*
> + * QEMU PowerPC CHRP (Genesi/bPlan Pegasos II) hardware System Emulator
> + *
> + * Copyright (c) 2018-2020 BALATON Zoltan
> + *
> + * This work is licensed under the GNU GPL license version 2 or later.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "qemu/units.h"
> +#include "qapi/error.h"
> +#include "hw/hw.h"
> +#include "hw/ppc/ppc.h"
> +#include "hw/sysbus.h"
> +#include "hw/pci/pci_host.h"
> +#include "hw/irq.h"
> +#include "hw/pci-host/mv64361.h"
> +#include "hw/isa/vt82c686.h"
> +#include "hw/ide/pci.h"
> +#include "hw/i2c/smbus_eeprom.h"
> +#include "hw/qdev-properties.h"
> +#include "sysemu/reset.h"
> +#include "hw/boards.h"
> +#include "hw/loader.h"
> +#include "hw/fw-path-provider.h"
> +#include "elf.h"
> +#include "qemu/log.h"
> +#include "qemu/error-report.h"
> +#include "sysemu/kvm.h"
> +#include "kvm_ppc.h"
> +#include "exec/address-spaces.h"
> +#include "trace.h"
> +#include "qemu/datadir.h"
> +#include "sysemu/device_tree.h"
> +
> +#define PROM_FILENAME "pegasos2.rom"
> +#define PROM_ADDR 0xfff0
> +#define PROM_SIZE 0x8
> +
> +#define BUS_FREQ 1
> +
> +static void pegasos2_reset(void *opaque)

I'd suggest pegasos2_cpu_reset() for clarity.  With the current name
I'd assume it was the machine reset function.

> +{
> +PowerPCCPU *cpu = opaque;
> +
> +cpu_reset(CPU(cpu));
> +cpu->env.spr[SPR_HID1] = 7ULL << 28;
> +}
> +
> +static void pegasos2_init(MachineState *machine)
> +{
> +PowerPCCPU *cpu = NULL;
> +MemoryRegion *rom = g_new(MemoryRegion, 1);
> +DeviceState *mv;
> +PCIBus *pci_bus;
> +PCIDevice *dev;
> +I2CBus *i2c_bus;
> +const char *fwname = machine->firmware ?: PROM_FILENAME;
> +char *filename;
> +int sz;
> +uint8_t *spd_data;
> +
> +/* init CPU */
> +cpu = POWERPC_CPU(cpu_create(machine->cpu_type));
> +if (PPC_INPUT(>env) != PPC_FLAGS_INPUT_6xx) {
> +error_report("Incompatible CPU, only 6xx bus supported");
> +exit(1);
> +}
> +
> +/* Set time-base frequency */
> +cpu_ppc_tb_init(>env, BUS_FREQ / 4);
> +qemu_register_reset(pegasos2_reset, cpu);
> +
> +/* RAM */
> +memory_region_add_subregion(get_system_memory(), 0, machine->ram);
> +
> +/* allocate and load firmware */
> +filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, fwname);
> +if (!filename) {
> +error_report("Could not find firmware '%s'", fwname);
> +exit(1);
> +}
> +memory_region_init_rom(rom, NULL, "pegasos2.rom", PROM_SIZE, 
> _fatal);
> +

Re: [PATCH v3 0/6] Pegasos2 emulation

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 04:22:06PM +0100, BALATON Zoltan wrote:
> Hello,
> 
> This is adding a new PPC board called pegasos2. More info on it can be
> found at:
> 
> https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2
> 
> Currently it needs a firmware ROM image that I cannot include due to
> original copyright holder (bPlan) did not release it under a free
> licence but I have plans to write a replacement in the future. With
> the original board firmware it can boot MorphOS now as:
> 
> qemu-system-ppc -M pegasos2 -cdrom morphos.iso -device ati-vga,romfile="" 
> -serial stdio
> 
> then enter "boot cd boot.img" at the firmware "ok" prompt as described
> in the MorphOS.readme. To boot Linux use same command line with e.g.
> -cdrom debian-8.11.0-powerpc-netinst.iso then enter
> "boot cd install/pegasos"
> 
> The last patch adds the actual board code after previous patches
> adding VT8231 and MV64361 system controller chip emulation. The
> mv643xx.h header file is taken from Linux and produces a bunch of
> checkpatch warnings due to different formatting rules it follows, I'm
> not sure we want to adopt it and change formatting or keep it as it
> is.

A couple of overall comments:

 * Adding yourself to MAINTAINERS for the new files would be a good
   idea
 * At least some rudimentary tests would be good, though I guess that
   might be tricky with non-free firmware

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v3 1/6] vt82c686: Implement control of serial port io ranges via config regs

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 04:22:06PM +0100, BALATON Zoltan wrote:
> In VIA super south bridge the io ranges of superio components
> (parallel and serial ports and FDC) can be controlled by superio
> config registers to set their base address and enable/disable them.
> This is not easy to implement in QEMU because ISA emulation is only
> designed to set io base address once on creating the device and io
> ranges are registered at creation and cannot easily be disabled or
> moved later.
> 
> In this patch we hack around that but only for serial ports because
> those have a single io range at port base that's relatively easy to
> handle and it's what guests actually use and set address different
> than the default.
> 
> We do not attempt to handle controlling the parallel and FDC regions
> because those have multiple io ranges so handling them would be messy
> and guests either don't change their deafult or don't care. We could
> even get away with disabling and not emulating them, but since they
> are already there, this patch leaves them mapped at their default
> address just in case this could be useful for a guest in the future.
> 
> Signed-off-by: BALATON Zoltan 

The maintainers of the hw/isa/vt82c686.c should probably be CCed on this.

> ---
>  hw/isa/vt82c686.c | 84 +--
>  1 file changed, 82 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
> index 5db9b1706c..98bd57a074 100644
> --- a/hw/isa/vt82c686.c
> +++ b/hw/isa/vt82c686.c
> @@ -252,8 +252,24 @@ static const TypeInfo vt8231_pm_info = {
>  typedef struct SuperIOConfig {
>  uint8_t regs[0x100];
>  MemoryRegion io;
> +ISASuperIODevice *superio;
> +MemoryRegion *serial_io[SUPERIO_MAX_SERIAL_PORTS];
>  } SuperIOConfig;
>  
> +static MemoryRegion *find_subregion(ISADevice *d, MemoryRegion *parent,
> +int offs)
> +{
> +MemoryRegion *subregion, *mr = NULL;
> +
> +QTAILQ_FOREACH(subregion, >subregions, subregions_link) {
> +if (subregion->addr == offs) {
> +mr = subregion;
> +break;
> +}
> +}
> +return mr;
> +}
> +
>  static void superio_cfg_write(void *opaque, hwaddr addr, uint64_t data,
>unsigned size)
>  {
> @@ -279,7 +295,53 @@ static void superio_cfg_write(void *opaque, hwaddr addr, 
> uint64_t data,
>  case 0xfd ... 0xff:
>  /* ignore write to read only registers */
>  return;
> -/* case 0xe6 ... 0xe8: Should set base port of parallel and serial */
> +case 0xe2:
> +{
> +data &= 0x1f;
> +if (data & BIT(2)) { /* Serial port 1 enable */
> +ISADevice *dev = sc->superio->serial[0];
> +if (!memory_region_is_mapped(sc->serial_io[0])) {
> +memory_region_add_subregion(isa_address_space_io(dev),
> +dev->ioport_id, 
> sc->serial_io[0]);
> +}
> +} else {
> +MemoryRegion *io = isa_address_space_io(sc->superio->serial[0]);
> +if (memory_region_is_mapped(sc->serial_io[0])) {
> +memory_region_del_subregion(io, sc->serial_io[0]);
> +}
> +}
> +if (data & BIT(3)) { /* Serial port 2 enable */
> +ISADevice *dev = sc->superio->serial[1];
> +if (!memory_region_is_mapped(sc->serial_io[1])) {
> +memory_region_add_subregion(isa_address_space_io(dev),
> +dev->ioport_id, 
> sc->serial_io[1]);
> +}
> +} else {
> +MemoryRegion *io = isa_address_space_io(sc->superio->serial[1]);
> +if (memory_region_is_mapped(sc->serial_io[1])) {
> +memory_region_del_subregion(io, sc->serial_io[1]);
> +}
> +}
> +break;
> +}
> +case 0xe7: /* Serial port 1 io base address */
> +{
> +data &= 0xfe;
> +sc->superio->serial[0]->ioport_id = data << 2;
> +if (memory_region_is_mapped(sc->serial_io[0])) {
> +memory_region_set_address(sc->serial_io[0], data << 2);
> +}
> +break;
> +}
> +case 0xe8: /* Serial port 2 io base address */
> +{
> +data &= 0xfe;
> +sc->superio->serial[1]->ioport_id = data << 2;
> +if (memory_region_is_mapped(sc->serial_io[1])) {
> +memory_region_set_address(sc->serial_io[1], data << 2);
> +}
> +break;
> +}
>  default:
>  qemu_log_mask(LOG_UNIMP,
>"via_superio_cfg: unimplemented register 0x%x\n", idx);
> @@ -385,6 +447,7 @@ static void vt82c686b_realize(PCIDevice *d, Error **errp)
>  DeviceState *dev = DEVICE(d);
>  ISABus *isa_bus;
>  qemu_irq *isa_irq;
> +ISASuperIOClass *ic;
>  int i;
>  
>  qdev_init_gpio_out(dev, >cpu_intr, 1);
> @@ -394,7 +457,9 @@ static void vt82c686b_realize(PCIDevice

Re: [PATCH v3 5/6] hw/pci-host: Add emulation of Marvell MV64361 PPC system controller

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 04:22:06PM +0100, BALATON Zoltan wrote:
> The Marvell Discovery II aka. MV64361 is a PowerPC system controller
> chip that is used on the pegasos2 PPC board. This adds emulation of it
> that models the device enough to boot guests on this board. The
> mv643xx.h header with register definitions is taken from Linux 4.15.10
> only fixing end of line white space errors and removing not needed
> parts, it's otherwise keeps Linux formatting.
> 
> Signed-off-by: BALATON Zoltan 

This needs to go before the previous patch to avoid bisect breakage,
doesn't it?

> ---
>  hw/pci-host/Kconfig   |   3 +
>  hw/pci-host/meson.build   |   2 +
>  hw/pci-host/mv64361.c | 966 ++
>  hw/pci-host/mv643xx.h | 919 
>  hw/pci-host/trace-events  |   6 +
>  include/hw/pci-host/mv64361.h |   8 +
>  include/hw/pci/pci_ids.h  |   1 +
>  7 files changed, 1905 insertions(+)
>  create mode 100644 hw/pci-host/mv64361.c
>  create mode 100644 hw/pci-host/mv643xx.h
>  create mode 100644 include/hw/pci-host/mv64361.h
> 
> diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
> index 8b8c763c28..65a983d6fd 100644
> --- a/hw/pci-host/Kconfig
> +++ b/hw/pci-host/Kconfig
> @@ -68,3 +68,6 @@ config PCI_POWERNV
>  
>  config REMOTE_PCIHOST
>  bool
> +
> +config MV64361
> +bool
> diff --git a/hw/pci-host/meson.build b/hw/pci-host/meson.build
> index 1847c69905..3f9e716cfa 100644
> --- a/hw/pci-host/meson.build
> +++ b/hw/pci-host/meson.build
> @@ -18,6 +18,8 @@ pci_ss.add(when: 'CONFIG_GRACKLE_PCI', if_true: 
> files('grackle.c'))
>  pci_ss.add(when: 'CONFIG_UNIN_PCI', if_true: files('uninorth.c'))
>  # PowerPC E500 boards
>  pci_ss.add(when: 'CONFIG_PPCE500_PCI', if_true: files('ppce500.c'))
> +# Pegasos2
> +pci_ss.add(when: 'CONFIG_MV64361', if_true: files('mv64361.c'))
>  
>  # ARM devices
>  pci_ss.add(when: 'CONFIG_VERSATILE_PCI', if_true: files('versatile.c'))
> diff --git a/hw/pci-host/mv64361.c b/hw/pci-host/mv64361.c
> new file mode 100644
> index 00..d71402f8b5
> --- /dev/null
> +++ b/hw/pci-host/mv64361.c
> @@ -0,0 +1,966 @@
> +/*
> + * Marvell Discovery II MV64361 System Controller for
> + * QEMU PowerPC CHRP (Genesi/bPlan Pegasos II) hardware System Emulator
> + *
> + * Copyright (c) 2018-2020 BALATON Zoltan
> + *
> + * This work is licensed under the GNU GPL license version 2 or later.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "qemu/units.h"
> +#include "qapi/error.h"
> +#include "hw/hw.h"
> +#include "hw/sysbus.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/pci_host.h"
> +#include "hw/irq.h"
> +#include "hw/intc/i8259.h"
> +#include "hw/qdev-properties.h"
> +#include "exec/address-spaces.h"
> +#include "qemu/log.h"
> +#include "qemu/error-report.h"
> +#include "trace.h"
> +#include "hw/pci-host/mv64361.h"
> +#include "mv643xx.h"
> +
> +#define TYPE_MV64361_PCI_BRIDGE "mv64361-pcibridge"
> +
> +static void mv64361_pcibridge_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> +k->vendor_id = PCI_VENDOR_ID_MARVELL;
> +k->device_id = PCI_DEVICE_ID_MARVELL_MV6436X;
> +k->class_id = PCI_CLASS_BRIDGE_HOST;
> +/*
> + * PCI-facing part of the host bridge,
> + * not usable without the host-facing part
> + */
> +dc->user_creatable = false;
> +}
> +
> +static const TypeInfo mv64361_pcibridge_info = {
> +.name  = TYPE_MV64361_PCI_BRIDGE,
> +.parent= TYPE_PCI_DEVICE,
> +.instance_size = sizeof(PCIDevice),
> +.class_init= mv64361_pcibridge_class_init,
> +.interfaces = (InterfaceInfo[]) {
> +{ INTERFACE_CONVENTIONAL_PCI_DEVICE },
> +{ },
> +},
> +};
> +
> +
> +#define TYPE_MV64361_PCI "mv64361-pcihost"
> +OBJECT_DECLARE_SIMPLE_TYPE(MV64361PCIState, MV64361_PCI)
> +
> +struct MV64361PCIState {
> +PCIHostState parent_obj;
> +
> +uint8_t index;
> +MemoryRegion io;
> +MemoryRegion mem;
> +qemu_irq irq[PCI_NUM_PINS];
> +
> +uint32_t io_base;
> +uint32_t io_size;
> +uint32_t mem_base[4];
> +uint32_t mem_size[4];
> +uint64_t remap[5];
> +};
> +
> +static int mv64361_pcihost_map_irq(PCIDevice *pci_dev, int n)
> +{
> +return (n + PCI_SLOT(pci_dev->devfn)) % PCI_NUM_PINS;
> +}
> +
> +static void mv64361_pcihost_set_irq(void *opaque, int n, int level)
> +{
> +MV64361PCIState *s = opaque;
> +qemu_set_irq(s->irq[n], level);
> +}
> +
> +static void mv64361_pcihost_realize(DeviceState *dev, Error **errp)
> +{
> +MV64361PCIState *s = MV64361_PCI(dev);
> +PCIHostState *h = PCI_HOST_BRIDGE(dev);
> +char *name;
> +
> +name = g_strdup_printf("pci%d-io", s->index);
> +memory_region_init(>io, OBJECT(dev), name, 0x1);
> +g_free(name);
> +name = g_strdup_printf("pci%d-mem", s->index);
> +memory_region_init(>mem,

Re: [PATCH v4 0/5] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 04:45:26PM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This new version contains fixes proposed during the review of v3.
> Patches were rebased on top of David's ppc-for-6.0.

Applied to ppc-for-6.0.

> 
> 
> changes from v3:
> - former patch 1: already pushed to ppc-for-6.0
> - former patch 2: dropped
> - all patches: commit message trimmed to < 76 chars per line
> - all patches: added R-bs from previous review
> - patch 3:
> * removed the migratable state of the unplug timer
> * added a 'spapr_drc_start_unplug_timeout_timer()' helper to start the 
> timer
> * added a .post_load implementation to vmstate_spapr_drc, pointed to
>   a new spapr_drc_post_load() function
> * spapr_drc_post_load() starts the DRC unplug timer from the beginning 
> using
>   spapr_drc_start_unplug_timeout_timer()
> 
> - patch 4:
> * use spapr_drc_start_unplug_timeout_timer() to start the timer in 
> spapr_drc_unplug_request()
> (To David: I kept your Reviewed-by in this patch despite this change - 
> feel free
>  to review it again)
> 
> - patch 5:
> * removed the 'DIMM' wording when referring to kernel internals
> * move the g_assert() to spapr_clear_pending_dimm_unplug_state()
> * do not g_assert(dev), but g_assert(ds) if dev != NULL inside
>   spapr_clear_pending_dimm_unplug_state()
> 
> - v3 link: https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg04196.html
> 
> Daniel Henrique Barboza (5):
>   spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable
>   spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
>   spapr_drc.c: introduce unplug_timeout_timer
>   spapr_drc.c: add hotunplug timeout for CPUs
>   spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
> 
>  hw/ppc/spapr.c | 53 ++--
>  hw/ppc/spapr_drc.c | 99 +++---
>  hw/ppc/spapr_pci.c |  4 +-
>  hw/ppc/trace-events|  2 +-
>  include/hw/ppc/spapr.h |  2 +
>  include/hw/ppc/spapr_drc.h |  7 ++-
>  6 files changed, 142 insertions(+), 25 deletions(-)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

[PATCH v2] configure: Improve OpenGL dependency detections

2021-02-22 Thread Akihiko Odaki

This has the following visible changes:

- GBM is required only for OpenGL dma-buf.
- X11 is explicitly required by gtk-egl.
- EGL is now mandatory for the OpenGL displays.

The last one needs some detailed description. Before this change,
EGL was tested only for OpenGL dma-buf with the check of
EGL_MESA_image_dma_buf_export. However, all of the OpenGL
displays depend on EGL and EGL_MESA_image_dma_buf_export is always
defined by epoxy's EGL interface.
Therefore, it makes more sense to always check the presence of EGL
and say the OpenGL displays are available along with OpenGL dma-buf
if it is present.

Signed-off-by: Akihiko Odaki 
---
 configure| 36 +++-
 docs/interop/vhost-user.json |  3 ++-
 include/ui/egl-helpers.h |  9 -
 include/ui/spice-display.h   |  2 +-
 meson.build  |  2 +-
 ui/egl-helpers.c |  8 ++--
 ui/gtk-egl.c |  6 +++---
 ui/gtk.c | 14 ++
 ui/meson.build   |  8 +---
 9 files changed, 55 insertions(+), 33 deletions(-)

diff --git a/configure b/configure
index 9f016b06b54..bbcb9436827 100755
--- a/configure
+++ b/configure
@@ -3576,14 +3576,24 @@ if $pkg_config gbm; then
 fi
 
 if test "$opengl" != "no" ; then
-  opengl_pkgs="epoxy gbm"
-  if $pkg_config $opengl_pkgs; then
-opengl_cflags="$($pkg_config --cflags $opengl_pkgs)"
-opengl_libs="$($pkg_config --libs $opengl_pkgs)"
+  epoxy=no
+  if $pkg_config epoxy; then
+cat > $TMPC << EOF
+#include 
+int main(void) { return 0; }
+EOF
+if compile_prog "" "" ; then
+  epoxy=yes
+fi
+  fi
+
+  if test "$epoxy" = "yes" ; then
+opengl_cflags="$($pkg_config --cflags epoxy)"
+opengl_libs="$($pkg_config --libs epoxy)"
 opengl=yes
   else
 if test "$opengl" = "yes" ; then
-  feature_not_found "opengl" "Please install opengl (mesa) devel pkgs: 
$opengl_pkgs"
+  feature_not_found "opengl" "Please install epoxy with EGL"
 fi
 opengl_cflags=""
 opengl_libs=""
@@ -3591,19 +3601,6 @@ if test "$opengl" != "no" ; then
   fi
 fi
 
-if test "$opengl" = "yes"; then
-  cat > $TMPC << EOF
-#include 
-#ifndef EGL_MESA_image_dma_buf_export
-# error mesa/epoxy lacks support for dmabufs (mesa 10.6+)
-#endif
-int main(void) { return 0; }
-EOF
-  if compile_prog "" "" ; then
-opengl_dmabuf=yes
-  fi
-fi
-
 ##
 # libxml2 probe
 if test "$libxml2" != "no" ; then
@@ -5883,9 +5880,6 @@ if test "$opengl" = "yes" ; then
   echo "CONFIG_OPENGL=y" >> $config_host_mak
   echo "OPENGL_CFLAGS=$opengl_cflags" >> $config_host_mak
   echo "OPENGL_LIBS=$opengl_libs" >> $config_host_mak
-  if test "$opengl_dmabuf" = "yes" ; then
-echo "CONFIG_OPENGL_DMABUF=y" >> $config_host_mak
-  fi
 fi
 
 if test "$gbm" = "yes" ; then
diff --git a/docs/interop/vhost-user.json b/docs/interop/vhost-user.json
index feb5fe58cad..b6ade9e4931 100644
--- a/docs/interop/vhost-user.json
+++ b/docs/interop/vhost-user.json
@@ -250,7 +250,8 @@
 #   "type": "gpu",
 #   "binary": "/usr/libexec/qemu/vhost-user-gpu",
 #   "tags": [
-# "CONFIG_OPENGL_DMABUF=y"
+# "CONFIG_OPENGL=y",
+# "CONFIG_GBM=y"
 #   ]
 # }
 #
diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
index 94a4b3e6f3b..c64e048eb54 100644
--- a/include/ui/egl-helpers.h
+++ b/include/ui/egl-helpers.h
@@ -3,7 +3,9 @@
 
 #include 
 #include 
+#ifdef CONFIG_GBM
 #include 
+#endif
 #include "ui/console.h"
 #include "ui/shader.h"
 
@@ -31,7 +33,7 @@ void egl_texture_blit(QemuGLShader *gls, egl_fb *dst, egl_fb 
*src, bool flip);
 void egl_texture_blend(QemuGLShader *gls, egl_fb *dst, egl_fb *src, bool flip,
int x, int y, double scale_x, double scale_y);
 
-#ifdef CONFIG_OPENGL_DMABUF
+#ifdef CONFIG_GBM
 
 extern int qemu_egl_rn_fd;
 extern struct gbm_device *qemu_egl_rn_gbm_dev;
@@ -48,8 +50,13 @@ void egl_dmabuf_release_texture(QemuDmaBuf *dmabuf);
 
 EGLSurface qemu_egl_init_surface_x11(EGLContext ectx, EGLNativeWindowType win);
 
+#if defined(CONFIG_X11) || defined(CONFIG_GBM)
+
 int qemu_egl_init_dpy_x11(EGLNativeDisplayType dpy, DisplayGLMode mode);
 int qemu_egl_init_dpy_mesa(EGLNativeDisplayType dpy, DisplayGLMode mode);
+
+#endif
+
 EGLContext qemu_egl_init_ctx(void);
 
 #endif /* EGL_HELPERS_H */
diff --git a/include/ui/spice-display.h b/include/ui/spice-display.h
index 4a47ffdd4c8..ed298d58f06 100644
--- a/include/ui/spice-display.h
+++ b/include/ui/spice-display.h
@@ -27,7 +27,7 @@
 #include "ui/qemu-pixman.h"
 #include "ui/console.h"
 
-#if defined(CONFIG_OPENGL_DMABUF)
+#if defined(CONFIG_OPENGL) && defined(CONFIG_GBM)
 # if SPICE_SERVER_VERSION >= 0x000d01 /* release 0.13.1 */
 #  define HAVE_SPICE_GL 1
 #  include "ui/egl-helpers.h"
diff --git a/meson.build b/meson.build
index 8302fcbd903..ae5be44898f 100644
--- a/meson.build
+++ b/meson.build
@@ -2477,7 +2477,7 @@ summary_info += {'U2F support':   u2f.found()}
 summary_info +=

Re: [RFC PATCH 02/23] kvm: Switch KVM_CAP_READONLY_MEM to a per-VM ioctl()

2021-02-22 Thread Isaku Yamahata

On Tue, Feb 16, 2021 at 08:56:45AM +0100,
Philippe Mathieu-Daudé  wrote:

> Hi Isaku,
> 
> On 2/16/21 3:12 AM, Isaku Yamahata wrote:
> > Switch to making a VM ioctl() call for KVM_CAP_READONLY_MEM, which may
> > be conditional on VM type in recent versions of KVM, e.g. when TDX is
> > supported.
> > 
> > Signed-off-by: Isaku Yamahata 
> > ---
> >  accel/kvm/kvm-all.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index 47516913b7..351c25a5cb 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -2164,7 +2164,7 @@ static int kvm_init(MachineState *ms)
> >  }
> >  
> >  kvm_readonly_mem_allowed =
> > -(kvm_check_extension(s, KVM_CAP_READONLY_MEM) > 0);
> > +(kvm_vm_check_extension(s, KVM_CAP_READONLY_MEM) > 0);
> 
> Can this check with "recent KVM" be a problem with older ones?
> 
> Maybe for backward compatibility we need:
> 
>   = (kvm_vm_check_extension(s, KVM_CAP_READONLY_MEM) > 0) ||
> (kvm_check_extension(s, KVM_CAP_READONLY_MEM) > 0);

Agreed. That's safer and it's difficult to check the very old version of kenel
and non-x86 arch.

Thanks,

-- 
Isaku Yamahata

Re: [PATCH v2 01/11] accel/kvm: Check MachineClass kvm_type() return value

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 06:50:44PM +0100, Cornelia Huck wrote:
> On Mon, 22 Feb 2021 18:41:07 +0100
> Philippe Mathieu-Daudé  wrote:
> 
> > On 2/22/21 6:24 PM, Cornelia Huck wrote:
> > > On Fri, 19 Feb 2021 18:38:37 +0100
> > > Philippe Mathieu-Daudé  wrote:
> > >   
> > >> MachineClass::kvm_type() can return -1 on failure.
> > >> Document it, and add a check in kvm_init().
> > >>
> > >> Signed-off-by: Philippe Mathieu-Daudé 
> > >> ---
> > >>  include/hw/boards.h | 3 ++-
> > >>  accel/kvm/kvm-all.c | 6 ++
> > >>  2 files changed, 8 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/include/hw/boards.h b/include/hw/boards.h
> > >> index a46dfe5d1a6..68d3d10f6b0 100644
> > >> --- a/include/hw/boards.h
> > >> +++ b/include/hw/boards.h
> > >> @@ -127,7 +127,8 @@ typedef struct {
> > >>   *implement and a stub device is required.
> > >>   * @kvm_type:
> > >>   *Return the type of KVM corresponding to the kvm-type string 
> > >> option or
> > >> - *computed based on other criteria such as the host kernel 
> > >> capabilities.
> > >> + *computed based on other criteria such as the host kernel 
> > >> capabilities
> > >> + *(which can't be negative), or -1 on error.
> > >>   * @numa_mem_supported:
> > >>   *true if '--numa node.mem' option is supported and false otherwise
> > >>   * @smp_parse:
> > >> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > >> index 84c943fcdb2..b069938d881 100644
> > >> --- a/accel/kvm/kvm-all.c
> > >> +++ b/accel/kvm/kvm-all.c
> > >> @@ -2057,6 +2057,12 @@ static int kvm_init(MachineState *ms)
> > >>  "kvm-type",
> > >>  
> > >> _abort);
> > >>  type = mc->kvm_type(ms, kvm_type);
> > >> +if (type < 0) {
> > >> +ret = -EINVAL;
> > >> +fprintf(stderr, "Failed to detect kvm-type for machine 
> > >> '%s'\n",
> > >> +mc->name);
> > >> +goto err;
> > >> +}
> > >>  }
> > >>  
> > >>  do {  
> > > 
> > > No objection to this patch; but I'm wondering why some non-pseries
> > > machines implement the kvm_type callback, when I see the kvm-type
> > > property only for pseries? Am I holding my git grep wrong?  
> > 
> > Can it be what David commented here?
> > https://www.mail-archive.com/qemu-devel@nongnu.org/msg784508.html
> > 
> 
> Ok, I might be confused about the other ppc machines; but I'm wondering
> about the kvm_type callback for mips and arm/virt. Maybe I'm just
> confused by the whole mechanism?

For ppc at least, not sure about in general, pseries is the only
machine type that can possibly work under more than one KVM flavour
(HV or PR).  So, it's the only one where it's actually useful to be
able to configure this.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH qemu v13] spapr: Implement Open Firmware client interface

2021-02-22 Thread David Gibson

On Tue, Feb 09, 2021 at 10:02:52PM +1100, Alexey Kardashevskiy wrote:
> The PAPR platform which describes an OS environment that's presented by
> a combination of a hypervisor and firmware. The features it specifies
> require collaboration between the firmware and the hypervisor.
> 
> Since the beginning, the runtime component of the firmware (RTAS) has
> been implemented as a 20 byte shim which simply forwards it to
> a hypercall implemented in qemu. The boot time firmware component is
> SLOF - but a build that's specific to qemu, and has always needed to be
> updated in sync with it. Even though we've managed to limit the amount
> of runtime communication we need between qemu and SLOF, there's some,
> and it has become increasingly awkward to handle as we've implemented
> new features.
> 
> This implements a boot time OF client interface (CI) which is
> enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
> Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
> which implements Open Firmware Client Interface (OF CI). This allows
> using a smaller stateless firmware which does not have to manage
> the device tree.
> 
> The new "vof.bin" firmware image is included with source code under
> pc-bios/. It also includes RTAS blob.
> 
> This implements a handful of CI methods just to get -kernel/-initrd
> working. In particular, this implements the device tree fetching and
> simple memory allocator - "claim" (an OF CI memory allocator) and updates
> "/memory@0/available" to report the client about available memory.
> 
> This implements changing some device tree properties which we know how
> to deal with, the rest is ignored. To allow changes, this skips
> fdt_pack() when x-vof=on as not packing the blob leaves some room for
> appending.
> 
> In absence of SLOF, this assigns phandles to device tree nodes to make
> device tree traversing work.
> 
> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
> 
> This adds basic instances support which are managed by a hash map
> ihandle -> [phandle].
> 
> Before the guest started, the used memory is:
> 0..4000 - the initial firmware
> 1..18 - stack
> 
> This OF CI does not implement "interpret".
> 
> Unlike SLOF, this does not format uninitialized nvram. Instead, this
> includes a disk image with pre-formatted nvram.
> 
> With this basic support, this can only boot into kernel directly.
> However this is just enough for the petitboot kernel and initradmdisk to
> boot from any possible source. Note this requires reasonably recent guest
> kernel with:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
> 
> This does not use the QEMU coding style for the firmware as using
> it for assemler is rather strange in the POWERPC world, according
> to POWERPC veterans; mixing styles in the firmware's .c and .s is
> weird too IMO.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> The example command line is:
> 
> -c 0 /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
> -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> -mon id=MON0,chardev=STDIO0,mode=readline \
> -nographic \
> -vga none \
> -enable-kvm \
> -m 2G \
> -machine 
> pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off
>  \
> -kernel pbuild/kernel-le-guest/vmlinux \
> -initrd t/le.cpio \
> -drive 
> id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof/nvram.bin,format=raw \
> -global spapr-nvram.drive=DRIVE0 \
> -snapshot \
> -smp 8,threads=8 \
> -L /home/aik/t/qemu-ppc64-bios/ \
> -trace events=qemu_trace_events \
> -d guest_errors \
> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
> -mon chardev=SOCKET0,mode=control
> 
> 
> 
> ---
> Changes:
> v13:
> * rebase on latest ppc-for-6.0
> * shuffled code around to touch spapr.c less
> 
> v12:
> * split VOF and SPAPR
> 
> v11:
> * added g_autofree
> * fixed gcc warnings
> * fixed few leaks
> * added nvram image to make "nvram --print-config" not crash;
> Note that contrary to  MIN_NVRAM_SIZE (8 * KiB), the actual minimum size
> is 16K, or it just does not work (empty output from "nvram")
> 
> v10:
> * now rebased to compile with meson
> 
> v9:
> * remove special handling of /rtas/rtas-size as now we always add it in QEMU
> * removed leftovers from scsi/grub/stdout/stdin/...
> 
> v8:
> * no read/write/seek
> * no @dev in instances
> * the machine flag is "x-vof" for now
> 
> v7:
> * now we have a small firmware which loads at 0 as SLOF and starts from
> 0x100 as SLOF
> * no MBR/ELF/GRUB business in QEMU anymore
> * blockdev is a separate patch
> * networking is a separate patch
> 
> v6:
> * borrowed a big chunk of commit log introduction from David
> * fixed initial stack pointer (points to the highest address of stack)
> * traces for "interpret" and others
> * disabled  translate_kernel_address() hack so grub can load (work in
>

Re: [PATCH v4 2/5] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 04:45:28PM -0300, Daniel Henrique Barboza wrote:
> spapr_drc_detach() is not the best name for what the function does. The
> function does not detach the DRC, it makes an uncommited attempt to do
> it.  It'll mark the DRC as pending unplug, via the 'unplug_request'
> flag, and only if the DRC state is drck->empty_state it will detach the
> DRC, via spapr_drc_release().
> 
> This is a contrast with its pair spapr_drc_attach(), where the function
> is indeed creating the DRC QOM object. If you know what
> spapr_drc_attach() does, you can be misled into thinking that
> spapr_drc_detach() is removing the DRC from QEMU internal state, which
> isn't true.
> 
> The current role of this function is better described as a request for
> detach, since there's no guarantee that we're going to detach the DRC in
> the end.  Rename the function to spapr_drc_unplug_request to reflect
> what is is doing.
> 
> The initial idea was to change the name to spapr_drc_detach_request(),
> and later on change the unplug_request flag to detach_request. However,
> unplug_request is a migratable boolean for a long time now and renaming
> it is not worth the trouble. spapr_drc_unplug_request() setting
> drc->unplug_request is more natural than spapr_drc_detach_request
> setting drc->unplug_request.
> 
> Reviewed-by: Greg Kurz 
> Reviewed-by: David Gibson 
> Signed-off-by: Daniel Henrique Barboza 

Applied to ppc-for-6.0, thanks.

> ---
>  hw/ppc/spapr.c | 6 +++---
>  hw/ppc/spapr_drc.c | 4 ++--
>  hw/ppc/spapr_pci.c | 4 ++--
>  hw/ppc/trace-events| 2 +-
>  include/hw/ppc/spapr_drc.h | 2 +-
>  5 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 85fe65f894..b066df68cb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3654,7 +3654,7 @@ static void spapr_memory_unplug_request(HotplugHandler 
> *hotplug_dev,
>addr / SPAPR_MEMORY_BLOCK_SIZE);
>  g_assert(drc);
>  
> -spapr_drc_detach(drc);
> +spapr_drc_unplug_request(drc);
>  addr += SPAPR_MEMORY_BLOCK_SIZE;
>  }
>  
> @@ -3722,7 +3722,7 @@ void spapr_core_unplug_request(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  g_assert(drc);
>  
>  if (!spapr_drc_unplug_requested(drc)) {
> -spapr_drc_detach(drc);
> +spapr_drc_unplug_request(drc);
>  spapr_hotplug_req_remove_by_index(drc);
>  }
>  }
> @@ -3985,7 +3985,7 @@ static void spapr_phb_unplug_request(HotplugHandler 
> *hotplug_dev,
>  assert(drc);
>  
>  if (!spapr_drc_unplug_requested(drc)) {
> -spapr_drc_detach(drc);
> +spapr_drc_unplug_request(drc);
>  spapr_hotplug_req_remove_by_index(drc);
>  }
>  }
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 555a25517d..67041fb212 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -386,11 +386,11 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
>   NULL, 0);
>  }
>  
> -void spapr_drc_detach(SpaprDrc *drc)
> +void spapr_drc_unplug_request(SpaprDrc *drc)
>  {
>  SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>  
> -trace_spapr_drc_detach(spapr_drc_index(drc));
> +trace_spapr_drc_unplug_request(spapr_drc_index(drc));
>  
>  g_assert(drc->dev);
>  
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index f1c7479816..b00e9609ae 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1723,12 +1723,12 @@ static void spapr_pci_unplug_request(HotplugHandler 
> *plug_handler,
>   * functions, even if their unplug weren't requested
>   * beforehand.
>   */
> -spapr_drc_detach(func_drc);
> +spapr_drc_unplug_request(func_drc);
>  }
>  }
>  }
>  
> -spapr_drc_detach(drc);
> +spapr_drc_unplug_request(drc);
>  
>  /* if this isn't func 0, defer unplug event. otherwise signal removal
>   * for all present functions
> diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
> index 1e91984526..b4bbfbb013 100644
> --- a/hw/ppc/trace-events
> +++ b/hw/ppc/trace-events
> @@ -50,7 +50,7 @@ spapr_drc_set_allocation_state(uint32_t index, int state) 
> "drc: 0x%"PRIx32", sta
>  spapr_drc_set_allocation_state_finalizing(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_set_configured(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_attach(uint32_t index) "drc: 0x%"PRIx32
> -spapr_drc_detach(uint32_t index) "drc: 0x%"PRIx32
> +spapr_drc_unplug_request(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_awaiting_quiesce(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_reset(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_realize(uint32_t index) "drc: 0x%"PRIx32
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 8982927d5c..02a63b3666 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++

Re: [PATCH] target/ppc: Fix bcdsub. emulation when result overflows

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 04:40:35PM -0300, Fabiano Rosas wrote:
65;6203;1c> The commit d03b174a83 (target/ppc: simplify bcdadd/sub functions)
> meant to simplify some of the code but it inadvertently altered the
> way the CR6 field is set after the operation has overflowed.
> 
> The CR6 bits are set based on the *unbounded* result of the operation,
> so we need to look at the result before returning from bcd_add_mag,
> otherwise we will look at 0 when it overflows.
> 
> Consider the following subtraction:
> 
> v0 = 0x999c (maximum positive BCD value)
> v1 = 0x001d (negative one BCD value)
> bcdsub. v0,v0,v1,0
> 
> The Power ISA 2.07B says:
> If the unbounded result is greater than zero, do the following.
>   If PS=0, the sign code of the result is set to 0b1100.
>   If PS=1, the sign code of the result is set to 0b.
>   If the operation overflows, CR field 6 is set to 0b0101. Otherwise,
>   CR field 6 is set to 0b0100.
> 
> POWER9 hardware:
> vr0 = 0x000c (positive zero BCD value)
> cr6 = 0b0101 (0x5) (positive, overflow)
> 
> QEMU:
> vr0 = 0x000c (positive zero BCD value)
> cr6 = 0b0011 (0x3) (zero, overflow) <--- wrong
> 
> This patch reverts the part of d03b174a83 that introduced the
> problem and adds a test-case to avoid further regressions:
> 
> before:
> $ make run-tcg-tests-ppc64le-linux-user
> (...)
>   TESTbcdsub on ppc64le
> bcdsub: qemu/tests/tcg/ppc64le/bcdsub.c:58: test_bcdsub_gt:
> Assertion `(cr >> 4) == ((1 << 2) | (1 << 0))' failed.
> 
> Fixes: d03b174a83 (target/ppc: simplify bcdadd/sub functions)
> Reported-by: Paul Clarke 
> Signed-off-by: Fabiano Rosas 

Applied to ppc-for-6.0, thanks.

> ---
>  target/ppc/int_helper.c   |  13 ++-
>  tests/tcg/configure.sh|   6 ++
>  tests/tcg/ppc64/Makefile.target   |  13 +++
>  tests/tcg/ppc64le/Makefile.target |  12 +++
>  tests/tcg/ppc64le/bcdsub.c| 130 ++
>  5 files changed, 171 insertions(+), 3 deletions(-)
>  create mode 100644 tests/tcg/ppc64/Makefile.target
>  create mode 100644 tests/tcg/ppc64le/Makefile.target
>  create mode 100644 tests/tcg/ppc64le/bcdsub.c
> 
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 0b682a1f94..429de28494 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -2175,14 +2175,17 @@ static int bcd_cmp_mag(ppc_avr_t *a, ppc_avr_t *b)
>  return 0;
>  }
>  
> -static void bcd_add_mag(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b, int 
> *invalid,
> +static int bcd_add_mag(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b, int 
> *invalid,
> int *overflow)
>  {
>  int carry = 0;
>  int i;
> +int is_zero = 1;
> +
>  for (i = 1; i <= 31; i++) {
>  uint8_t digit = bcd_get_digit(a, i, invalid) +
>  bcd_get_digit(b, i, invalid) + carry;
> +is_zero &= (digit == 0);
>  if (digit > 9) {
>  carry = 1;
>  digit -= 10;
> @@ -2194,6 +2197,7 @@ static void bcd_add_mag(ppc_avr_t *t, ppc_avr_t *a, 
> ppc_avr_t *b, int *invalid,
>  }
>  
>  *overflow = carry;
> +return is_zero;
>  }
>  
>  static void bcd_sub_mag(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b, int 
> *invalid,
> @@ -2225,14 +2229,15 @@ uint32_t helper_bcdadd(ppc_avr_t *r,  ppc_avr_t *a, 
> ppc_avr_t *b, uint32_t ps)
>  int sgnb = bcd_get_sgn(b);
>  int invalid = (sgna == 0) || (sgnb == 0);
>  int overflow = 0;
> +int zero = 0;
>  uint32_t cr = 0;
>  ppc_avr_t result = { .u64 = { 0, 0 } };
>  
>  if (!invalid) {
>  if (sgna == sgnb) {
>  result.VsrB(BCD_DIG_BYTE(0)) = bcd_preferred_sgn(sgna, ps);
> -bcd_add_mag(, a, b, , );
> -cr = bcd_cmp_zero();
> +zero = bcd_add_mag(, a, b, , );
> +cr = (sgna > 0) ? CRF_GT : CRF_LT;
>  } else {
>  int magnitude = bcd_cmp_mag(a, b);
>  if (magnitude > 0) {
> @@ -2255,6 +2260,8 @@ uint32_t helper_bcdadd(ppc_avr_t *r,  ppc_avr_t *a, 
> ppc_avr_t *b, uint32_t ps)
>  cr = CRF_SO;
>  } else if (overflow) {
>  cr |= CRF_SO;
> +} else if (zero) {
> +cr |= CRF_EQ;
>  }
>  
>  *r = result;
> diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
> index 551c02f469..a0b709948c 100755
> --- a/tests/tcg/configure.sh
> +++ b/tests/tcg/configure.sh
> @@ -251,6 +251,12 @@ for target in $target_list; do
>  echo "CROSS_CC_HAS_ARMV8_MTE=y" >> $config_target_mak
>  fi
>  ;;
> +ppc*)
> +if do_compiler "$target_compiler" $target_compiler_cflags \
> +   -mpower8-vector -o $TMPE $TMPC; then
> +echo "CROSS_CC_HAS_POWER8_VECTOR=y" >> $config_target_mak
> +fi
> +;;
>  esac
>  
>  enabled_cross_compilers="$enabled_cross_compilers $target_compiler"
> diff

Re: [PATCH v2 01/11] accel/kvm: Check MachineClass kvm_type() return value

2021-02-22 Thread David Gibson

On Tue, Feb 23, 2021 at 10:33:55AM +1100, David Gibson wrote:
> On Mon, Feb 22, 2021 at 06:50:44PM +0100, Cornelia Huck wrote:
> > On Mon, 22 Feb 2021 18:41:07 +0100
> > Philippe Mathieu-Daudé  wrote:
> > 
> > > On 2/22/21 6:24 PM, Cornelia Huck wrote:
> > > > On Fri, 19 Feb 2021 18:38:37 +0100
> > > > Philippe Mathieu-Daudé  wrote:
> > > >   
> > > >> MachineClass::kvm_type() can return -1 on failure.
> > > >> Document it, and add a check in kvm_init().
> > > >>
> > > >> Signed-off-by: Philippe Mathieu-Daudé 
> > > >> ---
> > > >>  include/hw/boards.h | 3 ++-
> > > >>  accel/kvm/kvm-all.c | 6 ++
> > > >>  2 files changed, 8 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > >> index a46dfe5d1a6..68d3d10f6b0 100644
> > > >> --- a/include/hw/boards.h
> > > >> +++ b/include/hw/boards.h
> > > >> @@ -127,7 +127,8 @@ typedef struct {
> > > >>   *implement and a stub device is required.
> > > >>   * @kvm_type:
> > > >>   *Return the type of KVM corresponding to the kvm-type string 
> > > >> option or
> > > >> - *computed based on other criteria such as the host kernel 
> > > >> capabilities.
> > > >> + *computed based on other criteria such as the host kernel 
> > > >> capabilities
> > > >> + *(which can't be negative), or -1 on error.
> > > >>   * @numa_mem_supported:
> > > >>   *true if '--numa node.mem' option is supported and false 
> > > >> otherwise
> > > >>   * @smp_parse:
> > > >> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > > >> index 84c943fcdb2..b069938d881 100644
> > > >> --- a/accel/kvm/kvm-all.c
> > > >> +++ b/accel/kvm/kvm-all.c
> > > >> @@ -2057,6 +2057,12 @@ static int kvm_init(MachineState *ms)
> > > >>  
> > > >> "kvm-type",
> > > >>  
> > > >> _abort);
> > > >>  type = mc->kvm_type(ms, kvm_type);
> > > >> +if (type < 0) {
> > > >> +ret = -EINVAL;
> > > >> +fprintf(stderr, "Failed to detect kvm-type for machine 
> > > >> '%s'\n",
> > > >> +mc->name);
> > > >> +goto err;
> > > >> +}
> > > >>  }
> > > >>  
> > > >>  do {  
> > > > 
> > > > No objection to this patch; but I'm wondering why some non-pseries
> > > > machines implement the kvm_type callback, when I see the kvm-type
> > > > property only for pseries? Am I holding my git grep wrong?  
> > > 
> > > Can it be what David commented here?
> > > https://www.mail-archive.com/qemu-devel@nongnu.org/msg784508.html
> > > 
> > 
> > Ok, I might be confused about the other ppc machines; but I'm wondering
> > about the kvm_type callback for mips and arm/virt. Maybe I'm just
> > confused by the whole mechanism?
> 
> For ppc at least, not sure about in general, pseries is the only
> machine type that can possibly work under more than one KVM flavour
> (HV or PR).  So, it's the only one where it's actually useful to be
> able to configure this.

Wait... I'm not sure that's true.  At least theoretically, some of the
Book3E platforms could work with either PR or the Book3E specific
KVM.  Not sure if KVM PR supports all the BookE instructions it would
need to in practice.

Possibly pseries is just the platform where there's been enough people
interested in setting the KVM flavour so far.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH qemu v13] spapr: Implement Open Firmware client interface

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 10:48:51PM +1100, Alexey Kardashevskiy wrote:
> Ping?
> 
> I need community support here :) I am hearing that having this mode helps
> heaps with development in fully emulated environments as this skips SLOF
> entirely, for example. Another rumour I am hearing is that there is interest
> in running grub in the userspace which this VOF thing makes handy
> too.

Yeah, sorry.  I finally allocated time today to go through this in
detail.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v4 1/5] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 04:45:27PM -0300, Daniel Henrique Barboza wrote:
> When moving a physical DRC to "Available", drc_isolate_physical() will
> move the DRC state to STATE_PHYSICAL_POWERON and, if the DRC is marked
> for unplug, call spapr_drc_detach(). For physical DRCs,
> drck->empty_state is STATE_PHYSICAL_POWERON, meaning that we're sure
> that spapr_drc_detach() will end up calling spapr_drc_release() in the
> end.
> 
> Likewise, for logical DRCs, drc_set_unusable will move the DRC to
> "Unusable" state, setting drc->state to STATE_LOGICAL_UNUSABLE, which is
> the drck->empty_state for logical DRCs. spapr_drc_detach() will call
> spapr_drc_release() in this case as well.
> 
> In both scenarios, spapr_drc_detach() is being used as a
> spapr_drc_release(), wrapper, where we also set unplug_requested (which
> is already true, otherwise spapr_drc_detach() wouldn't be called in the
> first place) and check if drc->state == drck->empty_state, which we also
> know it's guaranteed to be true because we just set it.
> 
> Just use spapr_drc_release() in these functions to be clear of our
> intentions in both these functions.
> 
> Reviewed-by: Greg Kurz 
> Reviewed-by: David Gibson 
> Signed-off-by: Daniel Henrique Barboza 

Applied to ppc-for-6.0, thanks.

> ---
>  hw/ppc/spapr_drc.c | 32 
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 84bd3c881f..555a25517d 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -50,6 +50,20 @@ uint32_t spapr_drc_index(SpaprDrc *drc)
>  | (drc->id & DRC_INDEX_ID_MASK);
>  }
>  
> +static void spapr_drc_release(SpaprDrc *drc)
> +{
> +SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +drck->release(drc->dev);
> +
> +drc->unplug_requested = false;
> +g_free(drc->fdt);
> +drc->fdt = NULL;
> +drc->fdt_start_offset = 0;
> +object_property_del(OBJECT(drc), "device");
> +drc->dev = NULL;
> +}
> +
>  static uint32_t drc_isolate_physical(SpaprDrc *drc)
>  {
>  switch (drc->state) {
> @@ -68,7 +82,7 @@ static uint32_t drc_isolate_physical(SpaprDrc *drc)
>  if (drc->unplug_requested) {
>  uint32_t drc_index = spapr_drc_index(drc);
>  trace_spapr_drc_set_isolation_state_finalizing(drc_index);
> -spapr_drc_detach(drc);
> +spapr_drc_release(drc);
>  }
>  
>  return RTAS_OUT_SUCCESS;
> @@ -209,7 +223,7 @@ static uint32_t drc_set_unusable(SpaprDrc *drc)
>  if (drc->unplug_requested) {
>  uint32_t drc_index = spapr_drc_index(drc);
>  trace_spapr_drc_set_allocation_state_finalizing(drc_index);
> -spapr_drc_detach(drc);
> +spapr_drc_release(drc);
>  }
>  
>  return RTAS_OUT_SUCCESS;
> @@ -372,20 +386,6 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
>   NULL, 0);
>  }
>  
> -static void spapr_drc_release(SpaprDrc *drc)
> -{
> -SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> -
> -drck->release(drc->dev);
> -
> -drc->unplug_requested = false;
> -g_free(drc->fdt);
> -drc->fdt = NULL;
> -drc->fdt_start_offset = 0;
> -object_property_del(OBJECT(drc), "device");
> -drc->dev = NULL;
> -}
> -
>  void spapr_drc_detach(SpaprDrc *drc)
>  {
>  SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH] docs/system: Extend PPC section

2021-02-22 Thread David Gibson

On Mon, Feb 22, 2021 at 03:04:41PM +0100, Greg Kurz wrote:
> On Mon, 22 Feb 2021 14:39:56 +0100
> Cédric Le Goater  wrote:
> 
> > This moves the current documentation in files specific to each
> > platform family. PowerNV machine is updated, the other machines need
> > to be done.
> > 
> > Signed-off-by: Cédric Le Goater 
> > ---
> 
> Looks pretty good to me. Just one small nit in docs/system/target-ppc.rst.
> 
> Reviewed-by: Greg Kurz 

Applied to ppc-for-6.0...

[snip]

> > -QEMU emulates the following PowerMac peripherals:
> > +you can get a complete list by running ``qemu-system-ppc64 --machine
> 
> Usual capitalization rules call for s/you/You .

.. and I corrected that inline.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

[PATCH v3 1/3] vfio: Move the saving of the config space to the right place in VFIO migration

2021-02-22 Thread Shenming Lu

On ARM64 the VFIO SET_IRQS ioctl is dependent on the VM interrupt
setup, if the restoring of the VFIO PCI device config space is
before the VGIC, an error might occur in the kernel.

So we move the saving of the config space to the non-iterable
process, thus it will be called after the VGIC according to
their priorities.

As for the possible dependence of the device specific migration
data on it's config space, we can let the vendor driver to
include any config info it needs in its own data stream.

Signed-off-by: Shenming Lu 
---
 hw/vfio/migration.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 00daa50ed8..f5bf67f642 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -575,11 +575,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
 return ret;
 }
 
-ret = vfio_save_device_config_state(f, opaque);
-if (ret) {
-return ret;
-}
-
 ret = vfio_update_pending(vbasedev);
 if (ret) {
 return ret;
@@ -620,6 +615,19 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
 return ret;
 }
 
+static void vfio_save_state(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+int ret;
+
+ret = vfio_save_device_config_state(f, opaque);
+if (ret) {
+error_report("%s: Failed to save device config space",
+ vbasedev->name);
+qemu_file_set_error(f, ret);
+}
+}
+
 static int vfio_load_setup(QEMUFile *f, void *opaque)
 {
 VFIODevice *vbasedev = opaque;
@@ -670,11 +678,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int 
version_id)
 switch (data) {
 case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
 {
-ret = vfio_load_device_config_state(f, opaque);
-if (ret) {
-return ret;
-}
-break;
+return vfio_load_device_config_state(f, opaque);
 }
 case VFIO_MIG_FLAG_DEV_SETUP_STATE:
 {
@@ -720,6 +724,7 @@ static SaveVMHandlers savevm_vfio_handlers = {
 .save_live_pending = vfio_save_pending,
 .save_live_iterate = vfio_save_iterate,
 .save_live_complete_precopy = vfio_save_complete_precopy,
+.save_state = vfio_save_state,
 .load_setup = vfio_load_setup,
 .load_cleanup = vfio_load_cleanup,
 .load_state = vfio_load_state,
-- 
2.19.1

[PATCH v3 2/3] vfio: Set the priority of the VFIO VM state change handler explicitly

2021-02-22 Thread Shenming Lu

In the VFIO VM state change handler when stopping the VM, the _RUNNING
bit in device_state is cleared which makes the VFIO device stop, including
no longer generating interrupts. Then we can save the pending states of
all interrupts in the GIC VM state change handler (on ARM).

So we have to set the priority of the VFIO VM state change handler
explicitly (like virtio devices) to ensure it is called before the
GIC's in saving.

Signed-off-by: Shenming Lu 
Reviewed-by: Kirti Wankhede 
Reviewed-by: Cornelia Huck 
---
 hw/vfio/migration.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index f5bf67f642..b74982e3e6 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -862,7 +862,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, _vfio_handlers,
  vbasedev);
 
-migration->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
+migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
+   vfio_vmstate_change,
vbasedev);
 migration->migration_state.notify = vfio_migration_state_notifier;
 add_migration_state_change_notifier(>migration_state);
-- 
2.19.1

[PATCH v3 0/3] vfio: Some fixes and optimizations for VFIO migration

2021-02-22 Thread Shenming Lu

This patch set includes two fixes and one optimization for VFIO migration
as blew:

Patch 1-2:
- Fix two ordering problems in migration.

Patch 3:
- Optimize the enabling process of the MSI-X vectors in migration.

History:

v2 -> v3:
- Nit fixes.
- Set error in migration stream for migration to fail in Patch 1.
- Tested Patch 3 with a Windows guest.

Thanks,
Shenming


Shenming Lu (3):
  vfio: Move the saving of the config space to the right place in VFIO
migration
  vfio: Set the priority of the VFIO VM state change handler explicitly
  vfio: Avoid disabling and enabling vectors repeatedly in VFIO
migration

 hw/pci/msix.c |  2 +-
 hw/vfio/migration.c   | 28 +---
 hw/vfio/pci.c | 20 +---
 include/hw/pci/msix.h |  1 +
 4 files changed, 36 insertions(+), 15 deletions(-)

-- 
2.19.1

[PATCH v3 3/3] vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration

2021-02-22 Thread Shenming Lu

In VFIO migration resume phase and some guest startups, there are
already unmasked vectors in the vector table when calling
vfio_msix_enable(). So in order to avoid inefficiently disabling
and enabling vectors repeatedly, let's allocate all needed vectors
first and then enable these unmasked vectors one by one without
disabling.

Signed-off-by: Shenming Lu 
---
 hw/pci/msix.c |  2 +-
 hw/vfio/pci.c | 20 +---
 include/hw/pci/msix.h |  1 +
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index ae9331cd0b..e057958fcd 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -131,7 +131,7 @@ static void msix_handle_mask_update(PCIDevice *dev, int 
vector, bool was_masked)
 }
 }
 
-static bool msix_masked(PCIDevice *dev)
+bool msix_masked(PCIDevice *dev)
 {
 return dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] & 
MSIX_MASKALL_MASK;
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index f74be78209..088fd41926 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -569,6 +569,9 @@ static void vfio_msix_vector_release(PCIDevice *pdev, 
unsigned int nr)
 
 static void vfio_msix_enable(VFIOPCIDevice *vdev)
 {
+PCIDevice *pdev = >pdev;
+unsigned int nr, max_vec = 0;
+
 vfio_disable_interrupts(vdev);
 
 vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
@@ -587,11 +590,22 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
  * triggering to userspace, then immediately release the vector, leaving
  * the physical device with no vectors enabled, but MSI-X enabled, just
  * like the guest view.
+ * If there are already unmasked vectors (in migration resume phase and
+ * some guest startups) which will be enabled soon, we can allocate all
+ * of them here to avoid inefficiently disabling and enabling vectors
+ * repeatedly later.
  */
-vfio_msix_vector_do_use(>pdev, 0, NULL, NULL);
-vfio_msix_vector_release(>pdev, 0);
+if (!msix_masked(pdev)) {
+for (nr = 0; nr < msix_nr_vectors_allocated(pdev); nr++) {
+if (!msix_is_masked(pdev, nr)) {
+max_vec = nr;
+}
+}
+}
+vfio_msix_vector_do_use(pdev, max_vec, NULL, NULL);
+vfio_msix_vector_release(pdev, max_vec);
 
-if (msix_set_vector_notifiers(>pdev, vfio_msix_vector_use,
+if (msix_set_vector_notifiers(pdev, vfio_msix_vector_use,
   vfio_msix_vector_release, NULL)) {
 error_report("vfio: msix_set_vector_notifiers failed");
 }
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 4c4a60c739..b3cd88e262 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -28,6 +28,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f);
 
 int msix_enabled(PCIDevice *dev);
 int msix_present(PCIDevice *dev);
+bool msix_masked(PCIDevice *dev);
 
 bool msix_is_masked(PCIDevice *dev, unsigned vector);
 void msix_set_pending(PCIDevice *dev, unsigned vector);
-- 
2.19.1

[PATCH 1/3] migration/ram: Modify the code comment of ram_save_host_page()

2021-02-22 Thread Kunkun Jiang

The ram_save_host_page() has been modified several times
since its birth. But the comment hasn't been modified as it should
be. It'd better to modify the comment to explain ram_save_host_page()
more clearly.

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 migration/ram.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 72143da0ac..fc49c3f898 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1970,15 +1970,16 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss,
 }
 
 /**
- * ram_save_host_page: save a whole host page
+ * ram_save_host_page: save a whole host page or the rest of a block
  *
- * Starting at *offset send pages up to the end of the current host
- * page. It's valid for the initial offset to point into the middle of
- * a host page in which case the remainder of the hostpage is sent.
- * Only dirty target pages are sent. Note that the host page size may
- * be a huge page for this block.
- * The saving stops at the boundary of the used_length of the block
- * if the RAMBlock isn't a multiple of the host page size.
+ * Starting at pss->page send pages up to the end of the current host
+ * page or the boundary of used_length of the block (if the RAMBlock
+ * isn't a multiple of the host page size). The min one is selected.
+ * Only dirty target pages are sent.
+ *
+ * Note that the host page size may be a huge page for this block, it's
+ * valid for the initial offset to point into the middle of a host page
+ * in which case the remainder of the hostpage is sent.
  *
  * Returns the number of pages written or negative on error
  *
-- 
2.23.0

[PATCH 2/3] migration/ram: Modify ram_save_host_page() to match the comment

2021-02-22 Thread Kunkun Jiang

According to the comment, when the host page is a huge page, the
migration_rate_limit() should be executed. If not, this function
can be omitted to save time.

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 migration/ram.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index fc49c3f898..c7e18dc2fc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2017,7 +2017,9 @@ static int ram_save_host_page(RAMState *rs, 
PageSearchStatus *pss,
 pages += tmppages;
 pss->page++;
 /* Allow rate limiting to happen in the middle of huge pages */
-migration_rate_limit();
+if (pagesize_bits > 1) {
+migration_rate_limit();
+}
 } while ((pss->page & (pagesize_bits - 1)) &&
  offset_in_ramblock(pss->block,
 ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
-- 
2.23.0

[PATCH 3/3] migration/ram: Optimize ram_save_host_page()

2021-02-22 Thread Kunkun Jiang

Starting from pss->page, ram_save_host_page() will check every page
and send the dirty pages up to the end of the current host page or
the boundary of used_length of the block. If the host page size is
a huge page, the step "check" will take a lot of time.

This will improve performance to use migration_bitmap_find_dirty().

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 migration/ram.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index c7e18dc2fc..c7a2350198 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1994,6 +1994,8 @@ static int ram_save_host_page(RAMState *rs, 
PageSearchStatus *pss,
 int tmppages, pages = 0;
 size_t pagesize_bits =
 qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
+unsigned long hostpage_boundary =
+QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
 unsigned long start_page = pss->page;
 int res;
 
@@ -2005,8 +2007,7 @@ static int ram_save_host_page(RAMState *rs, 
PageSearchStatus *pss,
 do {
 /* Check the pages is dirty and if it is send it */
 if (!migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
-pss->page++;
-continue;
+goto find_next;
 }
 
 tmppages = ram_save_target_page(rs, pss, last_stage);
@@ -2015,16 +2016,17 @@ static int ram_save_host_page(RAMState *rs, 
PageSearchStatus *pss,
 }
 
 pages += tmppages;
-pss->page++;
 /* Allow rate limiting to happen in the middle of huge pages */
 if (pagesize_bits > 1) {
 migration_rate_limit();
 }
-} while ((pss->page & (pagesize_bits - 1)) &&
+find_next:
+pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
+} while ((pss->page < hostpage_boundary) &&
  offset_in_ramblock(pss->block,
 ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
-/* The offset we leave with is the last one we looked at */
-pss->page--;
+/* The offset we leave with is the min boundary of host page and block */
+pss->page = MIN(pss->page, hostpage_boundary) - 1;
 
 res = ram_save_release_protection(rs, pss, start_page);
 return (res < 0 ? res : pages);
-- 
2.23.0

[PATCH 0/3] migration/ram: Some modifications about ram_save_host_page()

2021-02-22 Thread Kunkun Jiang

Hi,

This series include patches as below:
Patch 1-2:
- modified the comment and code of ram_save_host_page() to make them match each 
other

Patch 3:
- optimized ram_save_host_page() by using migration_bitmap_find_dirty() to find 
dirty
pages

Best Regards

Kunkun Jiang

Kunkun Jiang (3):
  migration/ram: Modify the code comment of ram_save_host_page()
  migration/ram: Modify ram_save_host_page() to match the comment
  migration/ram: Optimize ram_save_host_page()

 migration/ram.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

-- 
2.23.0

[PATCH v3 08/16] qapi/expr.py: add type hint annotations

2021-02-22 Thread John Snow

Annotations do not change runtime behavior.
This commit *only* adds annotations.

Signed-off-by: John Snow 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Cleber Rosa 
---
 scripts/qapi/expr.py  | 71 ---
 scripts/qapi/mypy.ini |  5 ---
 2 files changed, 46 insertions(+), 30 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index f45d6be1f4c..df6c64950fa 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -15,7 +15,14 @@
 # See the COPYING file in the top-level directory.
 
 import re
-from typing import MutableMapping, Optional, cast
+from typing import (
+Iterable,
+List,
+MutableMapping,
+Optional,
+Union,
+cast,
+)
 
 from .common import c_name
 from .error import QAPISemError
@@ -23,9 +30,10 @@
 from .source import QAPISourceInfo
 
 
-# Expressions in their raw form are JSON-like structures with arbitrary forms.
-# Minimally, their top-level form must be a mapping of strings to values.
-Expression = MutableMapping[str, object]
+# Arbitrary form for a JSON-like object.
+_JSObject = MutableMapping[str, object]
+# Expressions in their raw form are (just) JSON-like objects.
+Expression = _JSObject
 
 
 # Names must be letters, numbers, -, and _.  They must start with letter,
@@ -35,14 +43,19 @@
 '[a-zA-Z][a-zA-Z0-9_-]*$')
 
 
-def check_name_is_str(name, info, source):
+def check_name_is_str(name: object,
+  info: QAPISourceInfo,
+  source: str) -> None:
 if not isinstance(name, str):
 raise QAPISemError(info, "%s requires a string name" % source)
 
 
-def check_name_str(name, info, source,
-   allow_optional=False, enum_member=False,
-   permit_upper=False):
+def check_name_str(name: str,
+   info: QAPISourceInfo,
+   source: str,
+   allow_optional: bool = False,
+   enum_member: bool = False,
+   permit_upper: bool = False) -> None:
 membername = name
 
 if allow_optional and name.startswith('*'):
@@ -62,16 +75,20 @@ def check_name_str(name, info, source,
 assert not membername.startswith('*')
 
 
-def check_defn_name_str(name, info, meta):
+def check_defn_name_str(name: str, info: QAPISourceInfo, meta: str) -> None:
 check_name_str(name, info, meta, permit_upper=True)
 if name.endswith('Kind') or name.endswith('List'):
 raise QAPISemError(
 info, "%s name should not end in '%s'" % (meta, name[-4:]))
 
 
-def check_keys(value, info, source, required, optional):
+def check_keys(value: _JSObject,
+   info: QAPISourceInfo,
+   source: str,
+   required: List[str],
+   optional: List[str]) -> None:
 
-def pprint(elems):
+def pprint(elems: Iterable[str]) -> str:
 return ', '.join("'" + e + "'" for e in sorted(elems))
 
 missing = set(required) - set(value)
@@ -91,7 +108,7 @@ def pprint(elems):
pprint(unknown), pprint(allowed)))
 
 
-def check_flags(expr, info):
+def check_flags(expr: Expression, info: QAPISourceInfo) -> None:
 for key in ['gen', 'success-response']:
 if key in expr and expr[key] is not False:
 raise QAPISemError(
@@ -109,9 +126,9 @@ def check_flags(expr, info):
  "are incompatible")
 
 
-def check_if(expr, info, source):
+def check_if(expr: _JSObject, info: QAPISourceInfo, source: str) -> None:
 
-def check_if_str(ifcond):
+def check_if_str(ifcond: object) -> None:
 if not isinstance(ifcond, str):
 raise QAPISemError(
 info,
@@ -137,7 +154,7 @@ def check_if_str(ifcond):
 expr['if'] = [ifcond]
 
 
-def normalize_members(members):
+def normalize_members(members: object) -> None:
 if isinstance(members, dict):
 for key, arg in members.items():
 if isinstance(arg, dict):
@@ -145,8 +162,11 @@ def normalize_members(members):
 members[key] = {'type': arg}
 
 
-def check_type(value, info, source,
-   allow_array=False, allow_dict=False):
+def check_type(value: Optional[object],
+   info: QAPISourceInfo,
+   source: str,
+   allow_array: bool = False,
+   allow_dict: Union[bool, str] = False) -> None:
 if value is None:
 return
 
@@ -190,7 +210,8 @@ def check_type(value, info, source,
 check_type(arg['type'], info, key_source, allow_array=True)
 
 
-def check_features(features, info):
+def check_features(features: Optional[object],
+   info: QAPISourceInfo) -> None:
 if features is None:
 return
 if not isinstance(features, list):
@@ -207,7 +228,7 @@ def check_features(features, info):
 check_if(f, info, source)
 
 
-def check_enum(expr, info):
+def check_enum(expr: Expression, info: QAPISourceInfo) -> None:
 name = expr['enum']

[PATCH v3 16/16] qapi/expr.py: Use an expression checker dispatch table

2021-02-22 Thread John Snow

This enforces a type signature against all of the top-level expression
check routines without necessarily needing to create some
overcomplicated class hierarchy for them.

Signed-off-by: John Snow 
---
 scripts/qapi/expr.py | 64 +++-
 1 file changed, 34 insertions(+), 30 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 3672637487b..f1c58483915 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -31,9 +31,12 @@
 structures and contextual semantic validation.
 """
 
+from enum import Enum
 import re
 from typing import (
+Callable,
 Collection,
+Dict,
 Iterable,
 List,
 MutableMapping,
@@ -505,6 +508,29 @@ def check_event(expr: Expression, info: QAPISourceInfo) -> 
None:
 check_type(args, info, "'data'", allow_dict=not boxed)
 
 
+class ExpressionType(str, Enum):
+INCLUDE = 'include'
+ENUM = 'enum'
+UNION = 'union'
+ALTERNATE = 'alternate'
+STRUCT = 'struct'
+COMMAND = 'command'
+EVENT = 'event'
+
+def __str__(self) -> str:
+return str(self.value)
+
+
+_CHECK_FN: Dict[str, Callable[[Expression, QAPISourceInfo], None]] = {
+'enum': check_enum,
+'union': check_union,
+'alternate': check_alternate,
+'struct': check_struct,
+'command': check_command,
+'event': check_event,
+}
+
+
 def check_exprs(exprs: List[_JSObject]) -> List[_JSObject]:
 """
 Validate and normalize a list of parsed QAPI schema expressions. [RW]
@@ -531,24 +557,16 @@ def check_exprs(exprs: List[_JSObject]) -> 
List[_JSObject]:
 assert tmp is None or isinstance(tmp, QAPIDoc)
 doc: Optional[QAPIDoc] = tmp
 
-if 'include' in expr:
-continue
-
-if 'enum' in expr:
-meta = 'enum'
-elif 'union' in expr:
-meta = 'union'
-elif 'alternate' in expr:
-meta = 'alternate'
-elif 'struct' in expr:
-meta = 'struct'
-elif 'command' in expr:
-meta = 'command'
-elif 'event' in expr:
-meta = 'event'
+for kind in ExpressionType:
+if kind in expr:
+meta = kind
+break
 else:
 raise QAPISemError(info, "expression is missing metatype")
 
+if meta == ExpressionType.INCLUDE:
+continue
+
 name = cast(str, expr[meta])  # Asserted right below:
 check_name_is_str(name, info, "'%s'" % meta)
 info.set_defn(meta, name)
@@ -563,21 +581,7 @@ def check_exprs(exprs: List[_JSObject]) -> List[_JSObject]:
 raise QAPISemError(info,
"documentation comment required")
 
-if meta == 'enum':
-check_enum(expr, info)
-elif meta == 'union':
-check_union(expr, info)
-elif meta == 'alternate':
-check_alternate(expr, info)
-elif meta == 'struct':
-check_struct(expr, info)
-elif meta == 'command':
-check_command(expr, info)
-elif meta == 'event':
-check_event(expr, info)
-else:
-assert False, 'unexpected meta type'
-
+_CHECK_FN[meta](expr, info)
 check_if(expr, info, meta)
 check_features(expr.get('features'), info)
 check_flags(expr, info)
-- 
2.29.2

[PATCH v3 15/16] qapi/expr.py: move related checks inside check_xxx functions

2021-02-22 Thread John Snow

There's not a big obvious difference between the types of checks that
happen in the main function versus the kind that happen in the
functions. Now they're in one place for each of the main types.

As part of the move, spell out the required and optional keywords so
they're obvious at a glance. Use tuples instead of lists for immutable
data, too.

Signed-off-by: John Snow 
Reviewed-by: Cleber Rosa 
---
 scripts/qapi/expr.py | 55 ++--
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 61699de8cd5..3672637487b 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -344,6 +344,10 @@ def check_enum(expr: Expression, info: QAPISourceInfo) -> 
None:
 :param expr: `Expression` to validate.
 :param info: QAPI source file information.
 """
+check_keys(expr, info, 'enum',
+   required=('enum', 'data'),
+   optional=('if', 'features', 'prefix'))
+
 name = expr['enum']
 members = expr['data']
 prefix = expr.get('prefix')
@@ -374,6 +378,11 @@ def check_struct(expr: Expression, info: QAPISourceInfo) 
-> None:
 :param expr: `Expression` to validate.
 :param info: QAPI source file information.
 """
+check_keys(expr, info, 'struct',
+   required=('struct', 'data'),
+   optional=('base', 'if', 'features'))
+normalize_members(expr['data'])
+
 name = cast(str, expr['struct'])  # Asserted in check_exprs
 members = expr['data']
 
@@ -388,6 +397,13 @@ def check_union(expr: Expression, info: QAPISourceInfo) -> 
None:
 :param expr: `Expression` to validate.
 :param info: QAPI source file information.
 """
+check_keys(expr, info, 'union',
+   required=('union', 'data'),
+   optional=('base', 'discriminator', 'if', 'features'))
+
+normalize_members(expr.get('base'))
+normalize_members(expr['data'])
+
 name = cast(str, expr['union'])  # Asserted in check_exprs
 base = expr.get('base')
 discriminator = expr.get('discriminator')
@@ -420,6 +436,11 @@ def check_alternate(expr: Expression, info: 
QAPISourceInfo) -> None:
 :param expr: Expression to validate.
 :param info: QAPI source file information.
 """
+check_keys(expr, info, 'alternate',
+   required=('alternate', 'data'),
+   optional=('if', 'features'))
+normalize_members(expr['data'])
+
 members = expr['data']
 
 if not members:
@@ -443,6 +464,13 @@ def check_command(expr: Expression, info: QAPISourceInfo) 
-> None:
 :param expr: `Expression` to validate.
 :param info: QAPI source file information.
 """
+check_keys(expr, info, 'command',
+   required=('command',),
+   optional=('data', 'returns', 'boxed', 'if', 'features',
+ 'gen', 'success-response', 'allow-oob',
+ 'allow-preconfig', 'coroutine'))
+normalize_members(expr.get('data'))
+
 args = expr.get('data')
 rets = expr.get('returns')
 boxed = expr.get('boxed', False)
@@ -464,6 +492,11 @@ def check_event(expr: Expression, info: QAPISourceInfo) -> 
None:
   :if: ``Optional[Ifcond]`` (see: `check_if`)
   :features: ``Optional[Features]`` (see: `check_features`)
 """
+check_keys(expr, info, 'event',
+   required=('event',),
+   optional=('data', 'boxed', 'if', 'features'))
+normalize_members(expr.get('data'))
+
 args = expr.get('data')
 boxed = expr.get('boxed', False)
 
@@ -531,38 +564,16 @@ def check_exprs(exprs: List[_JSObject]) -> 
List[_JSObject]:
"documentation comment required")
 
 if meta == 'enum':
-check_keys(expr, info, meta,
-   ['enum', 'data'], ['if', 'features', 'prefix'])
 check_enum(expr, info)
 elif meta == 'union':
-check_keys(expr, info, meta,
-   ['union', 'data'],
-   ['base', 'discriminator', 'if', 'features'])
-normalize_members(expr.get('base'))
-normalize_members(expr['data'])
 check_union(expr, info)
 elif meta == 'alternate':
-check_keys(expr, info, meta,
-   ['alternate', 'data'], ['if', 'features'])
-normalize_members(expr['data'])
 check_alternate(expr, info)
 elif meta == 'struct':
-check_keys(expr, info, meta,
-   ['struct', 'data'], ['base', 'if', 'features'])
-normalize_members(expr['data'])
 check_struct(expr, info)
 elif meta == 'command':
-check_keys(expr, info, meta,
-   ['command'],
-   ['data', 'returns', 'boxed', 'if', 'features',
-'gen', 'success-response', 'allow-oob',
-'allow-preconfig', 'coroutine'])

[PATCH v3 10/16] qapi/expr.py: Remove single-letter variable

2021-02-22 Thread John Snow

Signed-off-by: John Snow 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Cleber Rosa 
---
 scripts/qapi/expr.py | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 3235a3b809e..473ee4f7f7e 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -214,14 +214,14 @@ def check_features(features: Optional[object],
 raise QAPISemError(info, "'features' must be an array")
 features[:] = [f if isinstance(f, dict) else {'name': f}
for f in features]
-for f in features:
+for feature in features:
 source = "'features' member"
-assert isinstance(f, dict)
-check_keys(f, info, source, ['name'], ['if'])
-check_name_is_str(f['name'], info, source)
-source = "%s '%s'" % (source, f['name'])
-check_name_str(f['name'], info, source)
-check_if(f, info, source)
+assert isinstance(feature, dict)
+check_keys(feature, info, source, ['name'], ['if'])
+check_name_is_str(feature['name'], info, source)
+source = "%s '%s'" % (source, feature['name'])
+check_name_str(feature['name'], info, source)
+check_if(feature, info, source)
 
 
 def check_enum(expr: Expression, info: QAPISourceInfo) -> None:
-- 
2.29.2

[PATCH v3 13/16] qapi/expr.py: Modify check_keys to accept any Collection

2021-02-22 Thread John Snow

This is a minor adjustment that allows the 'required' and 'optional'
keys fields to take a default value of an empty, immutable sequence (the
empty tuple).

This reveals a quirk of this function, which is that "a + b" is
list-specific behavior. We can accept a wider variety of types if we
avoid that behavior. Using Collection allows us to accept things like
lists, tuples, sets, and so on.

(Iterable would also have worked, but Iterable also includes things like
generator expressions which are consumed upon iteration, which would
require a rewrite to make sure that each input was only traversed once.)

Signed-off-by: John Snow 
---
 scripts/qapi/expr.py | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 2b96bec722f..0b841f292d7 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -33,6 +33,7 @@
 
 import re
 from typing import (
+Collection,
 Iterable,
 List,
 MutableMapping,
@@ -133,8 +134,8 @@ def check_defn_name_str(name: str, info: QAPISourceInfo, 
meta: str) -> None:
 def check_keys(value: _JSObject,
info: QAPISourceInfo,
source: str,
-   required: List[str],
-   optional: List[str]) -> None:
+   required: Collection[str] = (),
+   optional: Collection[str] = ()) -> None:
 """
 Ensures an object has a specific set of keys. [Const]
 
@@ -155,7 +156,7 @@ def pprint(elems: Iterable[str]) -> str:
 "%s misses key%s %s"
 % (source, 's' if len(missing) > 1 else '',
pprint(missing)))
-allowed = set(required + optional)
+allowed = set(required) | set(optional)
 unknown = set(value) - allowed
 if unknown:
 raise QAPISemError(
-- 
2.29.2

[PATCH v3 11/16] qapi/expr.py: enable pylint checks

2021-02-22 Thread John Snow

Signed-off-by: John Snow 
Tested-by: Eduardo Habkost 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Cleber Rosa 
Tested-by: Cleber Rosa 
---
 scripts/qapi/pylintrc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/scripts/qapi/pylintrc b/scripts/qapi/pylintrc
index b9e077a1642..fb0386d529a 100644
--- a/scripts/qapi/pylintrc
+++ b/scripts/qapi/pylintrc
@@ -3,7 +3,6 @@
 # Add files or directories matching the regex patterns to the ignore list.
 # The regex matches against base names, not paths.
 ignore-patterns=error.py,
-expr.py,
 parser.py,
 schema.py,
 
-- 
2.29.2

[PATCH v3 14/16] qapi/expr.py: Use tuples instead of lists for static data

2021-02-22 Thread John Snow

It is -- maybe -- possibly -- three nanoseconds faster.

Signed-off-by: John Snow 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Cleber Rosa 

---

This can be dropped if desired; it has no real functional impact I could
defend in code review court. I just happened to write it this way.

Signed-off-by: John Snow 
---
 scripts/qapi/expr.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 0b841f292d7..61699de8cd5 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -173,11 +173,11 @@ def check_flags(expr: Expression, info: QAPISourceInfo) 
-> None:
 :param expr: Expression to validate.
 :param info: QAPI source file information.
 """
-for key in ['gen', 'success-response']:
+for key in ('gen', 'success-response'):
 if key in expr and expr[key] is not False:
 raise QAPISemError(
 info, "flag '%s' may only use false value" % key)
-for key in ['boxed', 'allow-oob', 'allow-preconfig', 'coroutine']:
+for key in ('boxed', 'allow-oob', 'allow-preconfig', 'coroutine'):
 if key in expr and expr[key] is not True:
 raise QAPISemError(
 info, "flag '%s' may only use true value" % key)
-- 
2.29.2

[PATCH v3 12/16] qapi/expr.py: Add docstrings

2021-02-22 Thread John Snow

In this patch, I begin to adopt the idea that some functions can be
marked as "Const" and others "RW" to distinguish between functions that
perform a check-only, and those that perform normilization work and
modify the structure under consideration.

It is not any kind of doc standard, it was for my own benefit.

Signed-off-by: John Snow 
Reviewed-by: Cleber Rosa 
---
 scripts/qapi/expr.py | 167 ++-
 1 file changed, 164 insertions(+), 3 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 473ee4f7f7e..2b96bec722f 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -1,7 +1,5 @@
 # -*- coding: utf-8 -*-
 #
-# Check (context-free) QAPI schema expression structure
-#
 # Copyright IBM, Corp. 2011
 # Copyright (c) 2013-2019 Red Hat Inc.
 #
@@ -14,6 +12,25 @@
 # This work is licensed under the terms of the GNU GPL, version 2.
 # See the COPYING file in the top-level directory.
 
+"""
+Normalize and validate (context-free) QAPI schema expression structures.
+
+After QAPI expressions are parsed from disk, they are stored in
+recursively nested Python data structures using Dict, List, str, bool,
+and int. This module ensures that those nested structures have the
+correct type(s) and key(s) where appropriate for the QAPI context-free
+grammar.
+
+The QAPI schema expression language allows for syntactic sugar; this
+module also handles the normalization process of these nested
+structures.
+
+See `check_exprs` for the main entry point.
+
+See `schema.QAPISchema` for processing into native Python data
+structures and contextual semantic validation.
+"""
+
 import re
 from typing import (
 Iterable,
@@ -32,7 +49,7 @@
 
 # Arbitrary form for a JSON-like object.
 _JSObject = MutableMapping[str, object]
-# Expressions in their raw form are (just) JSON-like objects.
+#: Expressions in their unvalidated form are JSON-like objects.
 Expression = _JSObject
 
 
@@ -46,6 +63,7 @@
 def check_name_is_str(name: object,
   info: QAPISourceInfo,
   source: str) -> None:
+"""Ensures that ``name`` is a string. [Const]"""
 if not isinstance(name, str):
 raise QAPISemError(info, "%s requires a string name" % source)
 
@@ -56,6 +74,25 @@ def check_name_str(name: str,
allow_optional: bool = False,
enum_member: bool = False,
permit_upper: bool = False) -> None:
+"""
+Ensures a string is a legal name. [Const]
+
+A name is legal in the default case when:
+
+- It matches ``(__[a-z0-9.-]+_)?[a-z][a-z0-9_-]*``
+- It does not start with ``q_`` or ``q-``
+
+:param name:   Name to check.
+:param info:   QAPI source file information.
+:param source: Human-readable str describing "what" this name is.
+:param allow_optional: Allow the very first character to be ``*``.
+   (Cannot be used with ``enum_member``)
+:param enum_member:Allow the very first character to be a digit.
+   (Cannot be used with ``allow_optional``)
+:param permit_upper:   Allows upper-case characters wherever
+   lower-case characters are allowed.
+"""
+assert not (allow_optional and enum_member)
 membername = name
 
 if allow_optional and name.startswith('*'):
@@ -76,6 +113,17 @@ def check_name_str(name: str,
 
 
 def check_defn_name_str(name: str, info: QAPISourceInfo, meta: str) -> None:
+"""
+Ensures a name is a legal definition name. [Const]
+
+A legal definition name:
+ - Adheres to the criteria in `check_name_str`, with uppercase permitted
+ - Does not end with ``Kind`` or ``List``.
+
+:param name: Name to check.
+:param info: QAPI source file information.
+:param meta: Type name of the QAPI expression.
+"""
 check_name_str(name, info, meta, permit_upper=True)
 if name.endswith('Kind') or name.endswith('List'):
 raise QAPISemError(
@@ -87,6 +135,15 @@ def check_keys(value: _JSObject,
source: str,
required: List[str],
optional: List[str]) -> None:
+"""
+Ensures an object has a specific set of keys. [Const]
+
+:param value:The object to check.
+:param info: QAPI source file information.
+:param source:   Human-readable str describing "what" this object is.
+:param required: Keys that *must* be present.
+:param optional: Keys that *may* be present.
+"""
 
 def pprint(elems: Iterable[str]) -> str:
 return ', '.join("'" + e + "'" for e in sorted(elems))
@@ -109,6 +166,12 @@ def pprint(elems: Iterable[str]) -> str:
 
 
 def check_flags(expr: Expression, info: QAPISourceInfo) -> None:
+"""
+Ensures common fields in an Expression are correct. [Const]
+
+:param expr: Expression to validate.
+:param info: QAPI source file information.
+"""
 for key in ['gen',

[PATCH v3 09/16] qapi/expr.py: Consolidate check_if_str calls in check_if

2021-02-22 Thread John Snow

This is a small rewrite to address some minor style nits.

Don't compare against the empty list to check for the empty condition, and
move the normalization forward to unify the check on the now-normalized
structure.

With the check unified, the local nested function isn't needed anymore
and can be brought down into the normal flow of the function. With the
nesting level changed, shuffle the error strings around a bit to get
them to fit in 79 columns.

Note: though ifcond is typed as Sequence[str] elsewhere, we *know* that
the parser will produce real, bona-fide lists. It's okay to check
isinstance(ifcond, list) here.

Signed-off-by: John Snow 
---
 scripts/qapi/expr.py | 36 
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index df6c64950fa..3235a3b809e 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -128,30 +128,26 @@ def check_flags(expr: Expression, info: QAPISourceInfo) 
-> None:
 
 def check_if(expr: _JSObject, info: QAPISourceInfo, source: str) -> None:
 
-def check_if_str(ifcond: object) -> None:
-if not isinstance(ifcond, str):
-raise QAPISemError(
-info,
-"'if' condition of %s must be a string or a list of strings"
-% source)
-if ifcond.strip() == '':
-raise QAPISemError(
-info,
-"'if' condition '%s' of %s makes no sense"
-% (ifcond, source))
-
 ifcond = expr.get('if')
 if ifcond is None:
 return
-if isinstance(ifcond, list):
-if ifcond == []:
+
+# Normalize to a list
+if not isinstance(ifcond, list):
+ifcond = [ifcond]
+expr['if'] = ifcond
+
+if not ifcond:
+raise QAPISemError(info, f"'if' condition [] of {source} is useless")
+
+for element in ifcond:
+if not isinstance(element, str):
+raise QAPISemError(info, (
+f"'if' condition of {source}"
+" must be a string or a list of strings"))
+if element.strip() == '':
 raise QAPISemError(
-info, "'if' condition [] of %s is useless" % source)
-for elt in ifcond:
-check_if_str(elt)
-else:
-check_if_str(ifcond)
-expr['if'] = [ifcond]
+info, f"'if' condition '{element}' of {source} makes no sense")
 
 
 def normalize_members(members: object) -> None:
-- 
2.29.2

[PATCH v3 07/16] qapi/expr.py: Add casts in a few select cases

2021-02-22 Thread John Snow

Casts are instructions to the type checker only, they aren't "safe" and
should probably be avoided in general. In this case, when we perform
type checking on a nested structure, the type of each field does not
"stick".

We don't need to assert that something is a str if we've already checked
that it is -- use a cast instead for these cases.

Signed-off-by: John Snow 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Cleber Rosa 
---
 scripts/qapi/expr.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index afa6bd07769..f45d6be1f4c 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -15,7 +15,7 @@
 # See the COPYING file in the top-level directory.
 
 import re
-from typing import MutableMapping, Optional
+from typing import MutableMapping, Optional, cast
 
 from .common import c_name
 from .error import QAPISemError
@@ -232,7 +232,7 @@ def check_enum(expr, info):
 
 
 def check_struct(expr, info):
-name = expr['struct']
+name = cast(str, expr['struct'])  # Asserted in check_exprs
 members = expr['data']
 
 check_type(members, info, "'data'", allow_dict=name)
@@ -240,7 +240,7 @@ def check_struct(expr, info):
 
 
 def check_union(expr, info):
-name = expr['union']
+name = cast(str, expr['union'])  # Asserted in check_exprs
 base = expr.get('base')
 discriminator = expr.get('discriminator')
 members = expr['data']
@@ -337,7 +337,7 @@ def check_exprs(exprs):
 else:
 raise QAPISemError(info, "expression is missing metatype")
 
-name = expr[meta]
+name = cast(str, expr[meta])  # Asserted right below:
 check_name_is_str(name, info, "'%s'" % meta)
 info.set_defn(meta, name)
 check_defn_name_str(name, info, meta)
-- 
2.29.2

[PATCH v3 00/16] qapi: static typing conversion, pt3

2021-02-22 Thread John Snow

Hi, this series adds static types to the QAPI module.
This is part three, and it focuses on expr.py.

This series is applied and hosted here:
https://gitlab.com/jsnow/qemu/-/tree/python-qapi-cleanup-pt3

Environment:
- Python >= 3.6, <= 3.8 *
- mypy >= 0.770
- pylint >= 2.6.0
- flake8
- isort

Every commit should pass with (from ./scripts/):
 - flake8 qapi/
 - pylint --rcfile=qapi/pylintrc qapi/
 - mypy --config-file=qapi/mypy.ini qapi/
 - pushd qapi && isort -c . && popd

Please read the changelog below for some review notes that may be of interest.

V3:

001/16:[] [--] 'qapi/expr.py: Remove 'info' argument from nested 
check_if_str'
002/16:[] [--] 'qapi/expr.py: Check for dict instead of OrderedDict'
003/16:[0004] [FC] 'qapi/expr.py: constrain incoming expression types'
004/16:[] [--] 'qapi/expr.py: Add assertion for union type 'check_dict''
005/16:[] [--] 'qapi/expr.py: move string check upwards in check_type'
006/16:[] [--] 'qapi/expr.py: Check type of 'data' member'
007/16:[0002] [FC] 'qapi/expr.py: Add casts in a few select cases'
008/16:[] [--] 'qapi/expr.py: add type hint annotations'
009/16:[down] 'qapi/expr.py: Consolidate check_if_str calls in check_if'
010/16:[] [--] 'qapi/expr.py: Remove single-letter variable'
011/16:[] [--] 'qapi/expr.py: enable pylint checks'
012/16:[] [-C] 'qapi/expr.py: Add docstrings'
013/16:[down] 'qapi/expr.py: Modify check_keys to accept any Collection'
014/16:[] [--] 'qapi/expr.py: Use tuples instead of lists for static data'
015/16:[0004] [FC] 'qapi/expr.py: move related checks inside check_xxx 
functions'
016/16:[0011] [FC] 'qapi/expr.py: Use an expression checker dispatch table'

 - Some RB-s added, some dropped; see "Review Status" section below.
 - ("pt0" series rebased on latest origin/master.)
 - Rebased on origin/master.
 - 03: Re-ordered the Expression unpacking slightly to match the other stanzas.
   (R-Bs kept.)
 - 07: Changed capitalization of a comment. (R-Bs kept.)
 - 09: Rewritten more aggressively. (R-Bs dropped.)
 - 13: Use "Collection" instead of "Iterable" because of concerns with the
   possibly consumptive nature of Iterable; change commit name & message.
   (R-Bs dropped.)
 - 15: Use tuples everywhere, even for single items. (R-Bs kept.)
 - 16: Update ExpressionType to define a __str__ method, which allows
   the meta variable to be passed and used directly as a string.
   (R-Bs dropped.)

RFCs/notes:

- This series was written long before pt1.5 and pt2. Keep that in mind! (Sorry.)

- I used MutableMapping instead of Dict in patch 8. There's no real
  reason I couldn't have used Dict, both work - this one is more
  abstract.  Both would work for dict/OrderedDict perfectly well.

  (I think I had some reason at one point or another, but I can no
  longer remember what it is, if I am being honest. It might have to do
  with Dict being invariant, but MutableMapping being covariant, which
  might come into play much later in the six-part series. I really
  forget.)

- The dreaded _DObject comes back, this time named _JSObject. It's a bad name.
  It means "Any JSON object deserialized as a Python dict". I didn't rename it
  because I didn't want to shed the R-Bs yet. Please suggest a name.
  (Or a way to avoid needing it at all.)

  You'll probably notice that I keep futzing with the documentation near
  this definition. I opted not to fix it to avoid touching patches that
  were (so far) fully reviewed.

- Patch 12 (the docstring patch) needs to be heavily copy-edited. I
  figured I would simply address it all at once after review from
  Markus. I ask that a review of this patch be exhaustive if at all
  possible.

  I start using [Const] and [RW] markers in the summary string; I think
  I will actually remove these as they are not real markup, but I'd like
  to solicit suggestions on how to differentiate functions that modify
  expr from ones that do not.

  I also start using some fairly arbitrary syntax to try and document
  the syntactic forms being checked and normalized here, but they are
  not very consistent. Suggestions welcome.

- Patch 16 is something I am not sure I really like anymore, it has some
  mild benefit but I don't like how I repeat the expression types twice
  in one file. I consider this patch optional for now. I suspect there's
  a neater way to write it that gives us the same benefit but looks less
  ugly.

Review Status:

[01] qapi-expr-py-remove-info   # [RB] CR,EH [SOB] JS
[02] qapi-expr-py-check-for-dict# [RB] CR,EH [SOB] JS
[03] qapi-expr-py-constrain # [RB] CR,EH [SOB] JS
[04] qapi-expr-py-add-assertion-for # [RB] CR,EH [SOB] JS
[05] qapi-expr-py-move-string-check # [RB] CR,EH [SOB] JS
[06] qapi-expr-py-check-type-of # [RB]EH [SOB] JS
[07] qapi-expr-py-add-casts-in-a# [RB] CR,EH [SOB] JS
[08] qapi-expr-py-add-type-hint # [RB] CR,EH [SOB] JS
[09] qapi-expr-py-consolidate   #[SOB] JS
[10]

[PATCH v3 04/16] qapi/expr.py: Add assertion for union type 'check_dict'

2021-02-22 Thread John Snow

mypy isn't fond of allowing you to check for bool membership in a
collection of str elements. Guard this lookup for precisely when we were
given a name.

Signed-off-by: John Snow 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Cleber Rosa 
---
 scripts/qapi/expr.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 783282b53ce..138fab0711f 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -173,7 +173,9 @@ def check_type(value, info, source,
 raise QAPISemError(info,
"%s should be an object or type name" % source)
 
-permit_upper = allow_dict in info.pragma.name_case_whitelist
+permit_upper = False
+if isinstance(allow_dict, str):
+permit_upper = allow_dict in info.pragma.name_case_whitelist
 
 # value is a dictionary, check that each member is okay
 for (key, arg) in value.items():
-- 
2.29.2

[PATCH v3 06/16] qapi/expr.py: Check type of 'data' member

2021-02-22 Thread John Snow

Iterating over the members of data isn't going to work if it's not some
form of dict anyway, but for the sake of mypy, formalize it.

Signed-off-by: John Snow 
Reviewed-by: Eduardo Habkost 
---
 scripts/qapi/expr.py | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index c97e8ce8a4d..afa6bd07769 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -254,6 +254,9 @@ def check_union(expr, info):
 raise QAPISemError(info, "'discriminator' requires 'base'")
 check_name_is_str(discriminator, info, "'discriminator'")
 
+if not isinstance(members, dict):
+raise QAPISemError(info, "'data' must be an object")
+
 for (key, value) in members.items():
 source = "'data' member '%s'" % key
 check_name_str(key, info, source)
@@ -267,6 +270,10 @@ def check_alternate(expr, info):
 
 if not members:
 raise QAPISemError(info, "'data' must not be empty")
+
+if not isinstance(members, dict):
+raise QAPISemError(info, "'data' must be an object")
+
 for (key, value) in members.items():
 source = "'data' member '%s'" % key
 check_name_str(key, info, source)
-- 
2.29.2

[PATCH v3 05/16] qapi/expr.py: move string check upwards in check_type

2021-02-22 Thread John Snow

For readability purposes only, shimmy the early return upwards to the
top of the function, so cases proceed in order from least to most
complex.

Signed-off-by: John Snow 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Cleber Rosa 
---
 scripts/qapi/expr.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 138fab0711f..c97e8ce8a4d 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -150,6 +150,10 @@ def check_type(value, info, source,
 if value is None:
 return
 
+# Type name
+if isinstance(value, str):
+return
+
 # Array type
 if isinstance(value, list):
 if not allow_array:
@@ -160,10 +164,6 @@ def check_type(value, info, source,
source)
 return
 
-# Type name
-if isinstance(value, str):
-return
-
 # Anonymous type
 
 if not allow_dict:
-- 
2.29.2

[PATCH v3 03/16] qapi/expr.py: constrain incoming expression types

2021-02-22 Thread John Snow

mypy does not know the types of values stored in Dicts that masquerade
as objects. Help the type checker out by constraining the type.

Signed-off-by: John Snow 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Cleber Rosa 
---
 scripts/qapi/expr.py | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 5694c501fa3..783282b53ce 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -15,9 +15,17 @@
 # See the COPYING file in the top-level directory.
 
 import re
+from typing import MutableMapping, Optional
 
 from .common import c_name
 from .error import QAPISemError
+from .parser import QAPIDoc
+from .source import QAPISourceInfo
+
+
+# Expressions in their raw form are JSON-like structures with arbitrary forms.
+# Minimally, their top-level form must be a mapping of strings to values.
+Expression = MutableMapping[str, object]
 
 
 # Names must be letters, numbers, -, and _.  They must start with letter,
@@ -287,9 +295,20 @@ def check_event(expr, info):
 
 def check_exprs(exprs):
 for expr_elem in exprs:
-expr = expr_elem['expr']
-info = expr_elem['info']
-doc = expr_elem.get('doc')
+# Expression
+assert isinstance(expr_elem['expr'], dict)
+for key in expr_elem['expr'].keys():
+assert isinstance(key, str)
+expr: Expression = expr_elem['expr']
+
+# QAPISourceInfo
+assert isinstance(expr_elem['info'], QAPISourceInfo)
+info: QAPISourceInfo = expr_elem['info']
+
+# Optional[QAPIDoc]
+tmp = expr_elem.get('doc')
+assert tmp is None or isinstance(tmp, QAPIDoc)
+doc: Optional[QAPIDoc] = tmp
 
 if 'include' in expr:
 continue
-- 
2.29.2

[PATCH v3 01/16] qapi/expr.py: Remove 'info' argument from nested check_if_str

2021-02-22 Thread John Snow

The function can just use the argument from the scope above. Otherwise,
we get shadowed argument errors because the parameter name clashes with
the name of a variable already in-scope.

Signed-off-by: John Snow 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Cleber Rosa 
---
 scripts/qapi/expr.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 2fcaaa2497a..35695c4c653 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -104,7 +104,7 @@ def check_flags(expr, info):
 
 def check_if(expr, info, source):
 
-def check_if_str(ifcond, info):
+def check_if_str(ifcond):
 if not isinstance(ifcond, str):
 raise QAPISemError(
 info,
@@ -124,9 +124,9 @@ def check_if_str(ifcond, info):
 raise QAPISemError(
 info, "'if' condition [] of %s is useless" % source)
 for elt in ifcond:
-check_if_str(elt, info)
+check_if_str(elt)
 else:
-check_if_str(ifcond, info)
+check_if_str(ifcond)
 expr['if'] = [ifcond]
 
 
-- 
2.29.2

[PATCH v3 02/16] qapi/expr.py: Check for dict instead of OrderedDict

2021-02-22 Thread John Snow

OrderedDict is a subtype of dict, so we can check for a more general
form. These functions do not themselves depend on it being any
particular type.

Signed-off-by: John Snow 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Cleber Rosa 
---
 scripts/qapi/expr.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 35695c4c653..5694c501fa3 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -14,7 +14,6 @@
 # This work is licensed under the terms of the GNU GPL, version 2.
 # See the COPYING file in the top-level directory.
 
-from collections import OrderedDict
 import re
 
 from .common import c_name
@@ -131,7 +130,7 @@ def check_if_str(ifcond):
 
 
 def normalize_members(members):
-if isinstance(members, OrderedDict):
+if isinstance(members, dict):
 for key, arg in members.items():
 if isinstance(arg, dict):
 continue
@@ -162,7 +161,7 @@ def check_type(value, info, source,
 if not allow_dict:
 raise QAPISemError(info, "%s should be a type name" % source)
 
-if not isinstance(value, OrderedDict):
+if not isinstance(value, dict):
 raise QAPISemError(info,
"%s should be an object or type name" % source)
 
-- 
2.29.2

[PATCH 0/2] gitlab-ci.yml: Add jobs to test CFI

2021-02-22 Thread Daniele Buono

For a few months now QEMU has had options to enable compiler-based
control-flow integrity if built with clang.

While this feature has a low maintenance, It's probably still better to
add tests to the CI environment to check that an update doesn't break it.

As an added benefit, this also inherently tests LTO. The patch allow
gitlab testing of:
* --enable-cfi: forward-edge cfi (function pointers)
* --enable-safe-stack: backward-edge cfi (return pointers)

My original intention was to create a single chain of
build -> check -> acceptance, with all the targets compiled by default.
Unfortunately, the resulting artifact is too big and won't be uploaded.

So I split the test in two chains, that should cover all non-deprecated
targets as of today.

I also had to add a small patch to allow a custom selection for make
parallelism. This is because the gitlab runner nodes only have ~3.5GB of
ram, and with the default parallelism (2), in some cases two ld
instances will start working on two binaries and exaust the memory.
By only forcing one make job at a time, this is avoided.

Test runs of the full pipeline are here (cfi-ci branch):
https://gitlab.com/dbuono/qemu/-/pipelines/259931154

Daniele Buono (2):
  gitlab-ci.yml: Allow custom make parallelism
  gitlab-ci.yml: Add jobs to test CFI flags

 .gitlab-ci.yml | 94 +-
 1 file changed, 93 insertions(+), 1 deletion(-)

-- 
2.30.0

[PATCH 2/2] gitlab-ci.yml: Add jobs to test CFI flags

2021-02-22 Thread Daniele Buono

QEMU has had options to enable control-flow integrity features
for a few months now. Add two sets of build/check/acceptance
jobs to ensure the binary produced is working fine.

The two sets allow testing of x86_64 binaries for every target
that is not deprecated.

Signed-off-by: Daniele Buono 
---
 .gitlab-ci.yml | 92 ++
 1 file changed, 92 insertions(+)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 5c198f05d4..f2fea8e2eb 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -479,6 +479,98 @@ clang-user:
   --extra-cflags=-fsanitize=undefined 
--extra-cflags=-fno-sanitize-recover=undefined
 MAKE_CHECK_ARGS: check-unit check-tcg
 
+# Set JOBS=1 because this requires LTO and ld consumes a large amount of 
memory.
+# On gitlab runners, default JOBS of 2 sometimes end up calling 2 lds 
concurrently
+# and triggers an Out-Of-Memory error
+#
+# Because of how slirp is used in QEMU, we need to have CFI also on libslirp.
+# System-wide version in fedora is not compiled with CFI so we recompile it 
using
+# -enable-slirp=git
+#
+# Split in two sets of build/check/acceptance because a single build job for 
every
+# target creates an artifact archive too big to be uploaded
+build-cfi-set1:
+  <<: *native_build_job_definition
+  needs:
+  - job: amd64-fedora-container
+  variables:
+JOBS: 1
+AR: llvm-ar
+IMAGE: fedora
+CONFIGURE_ARGS: --cc=clang --cxx=clang++ --enable-cfi --enable-cfi-debug
+  --enable-safe-stack --enable-slirp=git
+TARGETS: aarch64-softmmu arm-softmmu alpha-softmmu i386-softmmu ppc-softmmu
+  ppc64-softmmu riscv32-softmmu riscv64-softmmu s390x-softmmu sparc-softmmu
+  sparc64-softmmu x86_64-softmmu
+  aarch64-linux-user aarch64_be-linux-user arm-linux-user i386-linux-user
+  ppc64-linux-user ppc64le-linux-user s390x-linux-user x86_64-linux-user
+MAKE_CHECK_ARGS: check-build
+  timeout: 3h
+  artifacts:
+expire_in: 2 days
+paths:
+  - build
+
+check-cfi-set1:
+  <<: *native_test_job_definition
+  needs:
+- job: build-cfi-set1
+  artifacts: true
+  variables:
+IMAGE: fedora
+MAKE_CHECK_ARGS: check
+
+acceptance-cfi-set1:
+  <<: *native_test_job_definition
+  needs:
+- job: build-cfi-set1
+  artifacts: true
+  variables:
+IMAGE: fedora
+MAKE_CHECK_ARGS: check-acceptance
+  <<: *acceptance_definition
+
+build-cfi-set2:
+  <<: *native_build_job_definition
+  needs:
+  - job: amd64-fedora-container
+  variables:
+JOBS: 1
+AR: llvm-ar
+IMAGE: fedora
+CONFIGURE_ARGS: --cc=clang --cxx=clang++ --enable-cfi --enable-cfi-debug
+  --enable-safe-stack --enable-slirp=git
+TARGETS: avr-softmmu cris-softmmu hppa-softmmu m68k-softmmu
+  microblaze-softmmu microblazeel-softmmu mips-softmmu mips64-softmmu
+  mips64el-softmmu mipsel-softmmu moxie-softmmu nios2-softmmu or1k-softmmu
+  rx-softmmu sh4-softmmu sh4eb-softmmu tricore-softmmu xtensa-softmmu
+  xtensaeb-softmmu
+MAKE_CHECK_ARGS: check-build
+  timeout: 3h
+  artifacts:
+expire_in: 2 days
+paths:
+  - build
+
+check-cfi-set2:
+  <<: *native_test_job_definition
+  needs:
+- job: build-cfi-set2
+  artifacts: true
+  variables:
+IMAGE: fedora
+MAKE_CHECK_ARGS: check
+
+acceptance-cfi-set2:
+  <<: *native_test_job_definition
+  needs:
+- job: build-cfi-set2
+  artifacts: true
+  variables:
+IMAGE: fedora
+MAKE_CHECK_ARGS: check-acceptance
+  <<: *acceptance_definition
+
 tsan-build:
   <<: *native_build_job_definition
   variables:
-- 
2.30.0

[PATCH 1/2] gitlab-ci.yml: Allow custom make parallelism

2021-02-22 Thread Daniele Buono

Currently, make parallelism at build time is defined as #cpus+1.
Some build jobs may need (or benefit from) a different number.

An example is builds with LTO where, because of the huge demand
of memory at link time, gitlab runners fails if two linkers are
run concurrently

This patch retains the default value of #cpus+1 but allows setting
the "JOBS" variable to a different number when applying the template

Signed-off-by: Daniele Buono 
---
 .gitlab-ci.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 8b6d495288..5c198f05d4 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -17,7 +17,7 @@ include:
   stage: build
   image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest
   before_script:
-- JOBS=$(expr $(nproc) + 1)
+- JOBS=${JOBS:-$(expr $(nproc) + 1)}
   script:
 - mkdir build
 - cd build
-- 
2.30.0

[PATCH v3 10/10] target/mips: Extract MXU code to new mxu_translate.c file

2021-02-22 Thread Philippe Mathieu-Daudé

Extract 1600+ lines from the big translate.c into a new file.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/mxu_translate.c | 1625 +++
 target/mips/translate.c | 1613 --
 target/mips/meson.build |1 +
 3 files changed, 1626 insertions(+), 1613 deletions(-)
 create mode 100644 target/mips/mxu_translate.c

diff --git a/target/mips/mxu_translate.c b/target/mips/mxu_translate.c
new file mode 100644
index 000..a3a95b50c42
--- /dev/null
+++ b/target/mips/mxu_translate.c
@@ -0,0 +1,1625 @@
+/*
+ *  Ingenic XBurst Media eXtension Unit (MXU) translation routines.
+ *
+ *  Copyright (c) 2004-2005 Jocelyn Mayer
+ *  Copyright (c) 2006 Marius Groeger (FPU operations)
+ *  Copyright (c) 2006 Thiemo Seufer (MIPS32R2 support)
+ *  Copyright (c) 2009 CodeSourcery (MIPS16 and microMIPS support)
+ *  Copyright (c) 2012 Jia Liu & Dongxue Zhang (MIPS ASE DSP support)
+ *
+ * SPDX-License-Identifier: LGPL-2.1-or-later
+ *
+ * Datasheet:
+ *
+ *   "XBurst® Instruction Set Architecture MIPS eXtension/enhanced Unit
+ *   Programming Manual", Ingenic Semiconductor Co, Ltd., revision June 2, 2017
+ */
+
+#include "qemu/osdep.h"
+#include "tcg/tcg-op.h"
+#include "exec/helper-gen.h"
+#include "translate.h"
+
+#if !defined(TARGET_MIPS64)
+
+/*
+ *
+ *   AN OVERVIEW OF MXU EXTENSION INSTRUCTION SET
+ *   
+ *
+ *
+ * MXU (full name: MIPS eXtension/enhanced Unit) is a SIMD extension of MIPS32
+ * instructions set. It is designed to fit the needs of signal, graphical and
+ * video processing applications. MXU instruction set is used in Xburst family
+ * of microprocessors by Ingenic.
+ *
+ * MXU unit contains 17 registers called X0-X16. X0 is always zero, and X16 is
+ * the control register.
+ *
+ *
+ * The notation used in MXU assembler mnemonics
+ * 
+ *
+ *  Register operands:
+ *
+ *   XRa, XRb, XRc, XRd - MXU registers
+ *   Rb, Rc, Rd, Rs, Rt - general purpose MIPS registers
+ *
+ *  Non-register operands:
+ *
+ *   aptn1 - 1-bit accumulate add/subtract pattern
+ *   aptn2 - 2-bit accumulate add/subtract pattern
+ *   eptn2 - 2-bit execute add/subtract pattern
+ *   optn2 - 2-bit operand pattern
+ *   optn3 - 3-bit operand pattern
+ *   sft4  - 4-bit shift amount
+ *   strd2 - 2-bit stride amount
+ *
+ *  Prefixes:
+ *
+ *   Level of parallelism:Operand size:
+ *S - single operation at a time   32 - word
+ *D - two operations in parallel   16 - half word
+ *Q - four operations in parallel   8 - byte
+ *
+ *  Operations:
+ *
+ *   ADD   - Add or subtract
+ *   ADDC  - Add with carry-in
+ *   ACC   - Accumulate
+ *   ASUM  - Sum together then accumulate (add or subtract)
+ *   ASUMC - Sum together then accumulate (add or subtract) with carry-in
+ *   AVG   - Average between 2 operands
+ *   ABD   - Absolute difference
+ *   ALN   - Align data
+ *   AND   - Logical bitwise 'and' operation
+ *   CPS   - Copy sign
+ *   EXTR  - Extract bits
+ *   I2M   - Move from GPR register to MXU register
+ *   LDD   - Load data from memory to XRF
+ *   LDI   - Load data from memory to XRF (and increase the address base)
+ *   LUI   - Load unsigned immediate
+ *   MUL   - Multiply
+ *   MULU  - Unsigned multiply
+ *   MADD  - 64-bit operand add 32x32 product
+ *   MSUB  - 64-bit operand subtract 32x32 product
+ *   MAC   - Multiply and accumulate (add or subtract)
+ *   MAD   - Multiply and add or subtract
+ *   MAX   - Maximum between 2 operands
+ *   MIN   - Minimum between 2 operands
+ *   M2I   - Move from MXU register to GPR register
+ *   MOVZ  - Move if zero
+ *   MOVN  - Move if non-zero
+ *   NOR   - Logical bitwise 'nor' operation
+ *   OR- Logical bitwise 'or' operation
+ *   STD   - Store data from XRF to memory
+ *   SDI   - Store data from XRF to memory (and increase the address base)
+ *   SLT   - Set of less than comparison
+ *   SAD   - Sum of absolute differences
+ *   SLL   - Logical shift left
+ *   SLR   - Logical shift right
+ *   SAR   - Arithmetic shift right
+ *   SAT   - Saturation
+ *   SFL   - Shuffle
+ *   SCOP  - Calculate x’s scope (-1, means x<0; 0, means x==0; 1, means x>0)
+ *   XOR   - Logical bitwise 'exclusive or' operation
+ *
+ *  Suffixes:
+ *
+ *   E - Expand results
+ *   F - Fixed point multiplication
+ *   L - Low part result
+ *   R - Doing rounding
+ *   V - Variable instead of immediate
+ *   W - Combine above L and V
+ *
+ *
+ * The list of MXU instructions grouped by functionality
+ * ~
+ *
+ * Load/Store instructions   Multiplication instructions
+ * ---   ---
+ *
+ *  S32LDD XRa, Rb, s12   S32MADD XRa, XRd, Rs, Rt
+ *  S32STD XRa, Rb, s12   S32MADDU XRa, XRd, Rs, Rt
+ *  S32LDDV XRa, Rb, rc, strd2S32MSUB XRa,

[PATCH v3 08/10] target/mips: Make mxu_translate_init() / decode_ase_mxu() proto public

2021-02-22 Thread Philippe Mathieu-Daudé

To be able to move these functions out of the big translate.c,
make their prototype public.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate.h | 6 ++
 target/mips/translate.c | 9 +++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/target/mips/translate.h b/target/mips/translate.h
index 468e29d7578..1801e7f819e 100644
--- a/target/mips/translate.h
+++ b/target/mips/translate.h
@@ -178,6 +178,12 @@ extern TCGv bcond;
 /* MSA */
 void msa_translate_init(void);
 
+/* MXU */
+#if !defined(TARGET_MIPS64)
+void mxu_translate_init(void);
+bool decode_ase_mxu(DisasContext *ctx, uint32_t insn);
+#endif /* !TARGET_MIPS64 */
+
 /* decodetree generated */
 bool decode_isa_rel6(DisasContext *ctx, uint32_t insn);
 bool decode_ase_msa(DisasContext *ctx, uint32_t insn);
diff --git a/target/mips/translate.c b/target/mips/translate.c
index 52a7005e18f..609798a0bee 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -2046,7 +2046,7 @@ static const char * const mxuregnames[] = {
 "XR9",  "XR10", "XR11", "XR12", "XR13", "XR14", "XR15", "MXU_CR",
 };
 
-static void mxu_translate_init(void)
+void mxu_translate_init(void)
 {
 for (unsigned i = 0; i < NUMBER_OF_MXU_REGISTERS - 1; i++) {
 mxu_gpr[i] = tcg_global_mem_new(cpu_env,
@@ -2058,6 +2058,11 @@ static void mxu_translate_init(void)
 offsetof(CPUMIPSState, active_tc.mxu_cr),
 mxuregnames[NUMBER_OF_MXU_REGISTERS - 1]);
 }
+#else /* !defined(TARGET_MIPS64) */
+void mxu_translate_init(void)
+{
+g_assert_not_reached();
+}
 #endif /* defined(TARGET_MIPS64) */
 
 /* General purpose registers moves. */
@@ -25789,7 +25794,7 @@ static void decode_opc_mxu__pool19(DisasContext *ctx)
 }
 }
 
-static bool decode_ase_mxu(DisasContext *ctx, uint32_t insn)
+bool decode_ase_mxu(DisasContext *ctx, uint32_t insn)
 {
 uint32_t opcode = extract32(insn, 0, 6);
 
-- 
2.26.2

[PATCH v3 06/10] target/mips: Use OPC_MUL instead of OPC__MXU_MUL

2021-02-22 Thread Philippe Mathieu-Daudé

We already have a macro and definition to extract / check
the Special2 MUL opcode. Use it instead of the unnecessary
OPC__MXU_MUL macro.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 6f853fcdcce..c897f3900d8 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -1464,7 +1464,6 @@ enum {
  */
 
 enum {
-OPC__MXU_MUL = 0x02,
 OPC_MXU__POOL00  = 0x03,
 OPC_MXU_D16MUL   = 0x08,
 OPC_MXU_D16MAC   = 0x0A,
@@ -25838,7 +25837,7 @@ static bool decode_ase_mxu(DisasContext *ctx, uint32_t 
insn)
  */
 static void decode_opc_mxu(DisasContext *ctx, uint32_t insn)
 {
-if (extract32(insn, 0, 6) == OPC__MXU_MUL) {
+if (MASK_SPECIAL2(insn) == OPC_MUL) {
 uint32_t  rs, rt, rd, op1;
 
 rs = extract32(insn, 21, 5);
-- 
2.26.2

[PATCH v3 05/10] target/mips: Extract decode_ase_mxu() from decode_opc_mxu()

2021-02-22 Thread Philippe Mathieu-Daudé

To easily convert MXU code to decodetree, extract decode_ase_mxu()
from decode_opc_mxu(), making it return a boolean.
We will keep decode_opc_mxu() in the translate.c unit because it
calls gen_arith().

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate.c | 45 -
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 9e875fa4a25..6f853fcdcce 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -25777,34 +25777,18 @@ static void decode_opc_mxu__pool19(DisasContext *ctx)
 }
 }
 
-/*
- * Main MXU decoding function
- */
-static void decode_opc_mxu(DisasContext *ctx, uint32_t insn)
+static bool decode_ase_mxu(DisasContext *ctx, uint32_t insn)
 {
 uint32_t opcode = extract32(insn, 0, 6);
 
-if (opcode == OPC__MXU_MUL) {
-uint32_t  rs, rt, rd, op1;
-
-rs = extract32(insn, 21, 5);
-rt = extract32(insn, 16, 5);
-rd = extract32(insn, 11, 5);
-op1 = MASK_SPECIAL2(insn);
-
-gen_arith(ctx, op1, rd, rs, rt);
-
-return;
-}
-
 if (opcode == OPC_MXU_S32M2I) {
 gen_mxu_s32m2i(ctx);
-return;
+return true;
 }
 
 if (opcode == OPC_MXU_S32I2M) {
 gen_mxu_s32i2m(ctx);
-return;
+return true;
 }
 
 {
@@ -25845,6 +25829,29 @@ static void decode_opc_mxu(DisasContext *ctx, uint32_t 
insn)
 gen_set_label(l_exit);
 tcg_temp_free(t_mxu_cr);
 }
+
+return true;
+}
+
+/*
+ * Main MXU decoding function
+ */
+static void decode_opc_mxu(DisasContext *ctx, uint32_t insn)
+{
+if (extract32(insn, 0, 6) == OPC__MXU_MUL) {
+uint32_t  rs, rt, rd, op1;
+
+rs = extract32(insn, 21, 5);
+rt = extract32(insn, 16, 5);
+rd = extract32(insn, 11, 5);
+op1 = MASK_SPECIAL2(insn);
+
+gen_arith(ctx, op1, rd, rs, rt);
+
+return;
+}
+
+decode_ase_mxu(ctx, insn);
 }
 
 #endif /* !defined(TARGET_MIPS64) */
-- 
2.26.2

Re: [RFC PATCH 4/5] Add migration support for KVM guest with MTE

2021-02-22 Thread Richard Henderson

On 2/22/21 1:46 AM, Haibo Xu wrote:
> As I mentioned in the cover later, the reason to let the tag go with the
> memory data together is to make it easier to sync with each other. I think
> if we migratie them separately, it would be hard to keep the tags to sync
> with the data.
Well, maybe, maybe not.  See below.

> Saying if we migration all the data first, then the tags. If the data got
> dirty during the migration of the tag memory, we may need to send the data
> again, or freeze the source VM after data migration? What's more, the
> KVM_GET_DIRTY_LOG API may not be able to differentiate between a tag and
> data changes.
I would certainly expect KVM_GET_DIRTY_LOG to only care about the normal
memory.  That is, pages as viewed by the guest.

I would expect the separate tag_memory block to be private to the host.  If a
normal page is dirty, then we would read the tags into the tag_memory and
manually mark that dirty.  Migration would continue as normal, and eventually
both normal and tag memory would all be clean and migrated.

But I'll admit that it does require that we retain a buffer 1/16 the size of
main memory, which is otherwise unused, and thus this is less than ideal.  So
if we do it your way, we should arrange for tcg to migrate the tag data in the
same way.

I'll still wait for migration experts, of which I am not one.

r~

[PATCH v3 03/10] target/mips: Remove unused CPUMIPSState* from MXU functions

2021-02-22 Thread Philippe Mathieu-Daudé

None of these MXU functions use their CPUMIPSState* env argument,
remove it.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index a53ce6adb9a..6f5ccd667da 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -25694,7 +25694,7 @@ static void gen_mxu_S32ALNI(DisasContext *ctx)
  * ===
  */
 
-static void decode_opc_mxu__pool00(CPUMIPSState *env, DisasContext *ctx)
+static void decode_opc_mxu__pool00(DisasContext *ctx)
 {
 uint32_t opcode = extract32(ctx->opcode, 18, 3);
 
@@ -25718,7 +25718,7 @@ static void decode_opc_mxu__pool00(CPUMIPSState *env, 
DisasContext *ctx)
 }
 }
 
-static void decode_opc_mxu__pool04(CPUMIPSState *env, DisasContext *ctx)
+static void decode_opc_mxu__pool04(DisasContext *ctx)
 {
 uint32_t opcode = extract32(ctx->opcode, 20, 1);
 
@@ -25734,7 +25734,7 @@ static void decode_opc_mxu__pool04(CPUMIPSState *env, 
DisasContext *ctx)
 }
 }
 
-static void decode_opc_mxu__pool16(CPUMIPSState *env, DisasContext *ctx)
+static void decode_opc_mxu__pool16(DisasContext *ctx)
 {
 uint32_t opcode = extract32(ctx->opcode, 18, 3);
 
@@ -25761,7 +25761,7 @@ static void decode_opc_mxu__pool16(CPUMIPSState *env, 
DisasContext *ctx)
 }
 }
 
-static void decode_opc_mxu__pool19(CPUMIPSState *env, DisasContext *ctx)
+static void decode_opc_mxu__pool19(DisasContext *ctx)
 {
 uint32_t opcode = extract32(ctx->opcode, 22, 2);
 
@@ -25780,7 +25780,7 @@ static void decode_opc_mxu__pool19(CPUMIPSState *env, 
DisasContext *ctx)
 /*
  * Main MXU decoding function
  */
-static void decode_opc_mxu(CPUMIPSState *env, DisasContext *ctx)
+static void decode_opc_mxu(DisasContext *ctx)
 {
 uint32_t opcode = extract32(ctx->opcode, 0, 6);
 
@@ -25817,7 +25817,7 @@ static void decode_opc_mxu(CPUMIPSState *env, 
DisasContext *ctx)
 
 switch (opcode) {
 case OPC_MXU__POOL00:
-decode_opc_mxu__pool00(env, ctx);
+decode_opc_mxu__pool00(ctx);
 break;
 case OPC_MXU_D16MUL:
 gen_mxu_d16mul(ctx);
@@ -25826,16 +25826,16 @@ static void decode_opc_mxu(CPUMIPSState *env, 
DisasContext *ctx)
 gen_mxu_d16mac(ctx);
 break;
 case OPC_MXU__POOL04:
-decode_opc_mxu__pool04(env, ctx);
+decode_opc_mxu__pool04(ctx);
 break;
 case OPC_MXU_S8LDD:
 gen_mxu_s8ldd(ctx);
 break;
 case OPC_MXU__POOL16:
-decode_opc_mxu__pool16(env, ctx);
+decode_opc_mxu__pool16(ctx);
 break;
 case OPC_MXU__POOL19:
-decode_opc_mxu__pool19(env, ctx);
+decode_opc_mxu__pool19(ctx);
 break;
 default:
 MIPS_INVAL("decode_opc_mxu");
@@ -26995,7 +26995,7 @@ static bool decode_opc_legacy(CPUMIPSState *env, 
DisasContext *ctx)
 #endif
 #if !defined(TARGET_MIPS64)
 if (ctx->insn_flags & ASE_MXU) {
-decode_opc_mxu(env, ctx);
+decode_opc_mxu(ctx);
 break;
 }
 #endif
-- 
2.26.2

[PATCH v3 02/10] target/mips: Remove XBurst Media eXtension Unit dead code

2021-02-22 Thread Philippe Mathieu-Daudé

All these unimplemented MXU opcodes end up calling
gen_reserved_instruction() which is the default switch
case in decode_opc_mxu().

The translate.c file is already big enough and hard to maintain,
remove 1300 lines of unnecessary code and /* TODO */ comments.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate.c | 1286 ---
 1 file changed, 1286 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 1f1c5f33c87..a53ce6adb9a 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -1464,70 +1464,16 @@ enum {
  */
 
 enum {
-OPC_MXU_S32MADD  = 0x00,
-OPC_MXU_S32MADDU = 0x01,
 OPC__MXU_MUL = 0x02,
 OPC_MXU__POOL00  = 0x03,
-OPC_MXU_S32MSUB  = 0x04,
-OPC_MXU_S32MSUBU = 0x05,
-OPC_MXU__POOL01  = 0x06,
-OPC_MXU__POOL02  = 0x07,
 OPC_MXU_D16MUL   = 0x08,
-OPC_MXU__POOL03  = 0x09,
 OPC_MXU_D16MAC   = 0x0A,
-OPC_MXU_D16MACF  = 0x0B,
-OPC_MXU_D16MADL  = 0x0C,
-OPC_MXU_S16MAD   = 0x0D,
-OPC_MXU_Q16ADD   = 0x0E,
-OPC_MXU_D16MACE  = 0x0F,
 OPC_MXU__POOL04  = 0x10,
-OPC_MXU__POOL05  = 0x11,
-OPC_MXU__POOL06  = 0x12,
-OPC_MXU__POOL07  = 0x13,
-OPC_MXU__POOL08  = 0x14,
-OPC_MXU__POOL09  = 0x15,
-OPC_MXU__POOL10  = 0x16,
-OPC_MXU__POOL11  = 0x17,
-OPC_MXU_D32ADD   = 0x18,
-OPC_MXU__POOL12  = 0x19,
-/* not assigned 0x1A */
-OPC_MXU__POOL13  = 0x1B,
-OPC_MXU__POOL14  = 0x1C,
-OPC_MXU_Q8ACCE   = 0x1D,
-/* not assigned 0x1E */
-/* not assigned 0x1F */
-/* not assigned 0x20 */
-/* not assigned 0x21 */
 OPC_MXU_S8LDD= 0x22,
-OPC_MXU_S8STD= 0x23,
-OPC_MXU_S8LDI= 0x24,
-OPC_MXU_S8SDI= 0x25,
-OPC_MXU__POOL15  = 0x26,
 OPC_MXU__POOL16  = 0x27,
-OPC_MXU__POOL17  = 0x28,
-/* not assigned 0x29 */
-OPC_MXU_S16LDD   = 0x2A,
-OPC_MXU_S16STD   = 0x2B,
-OPC_MXU_S16LDI   = 0x2C,
-OPC_MXU_S16SDI   = 0x2D,
 OPC_MXU_S32M2I   = 0x2E,
 OPC_MXU_S32I2M   = 0x2F,
-OPC_MXU_D32SLL   = 0x30,
-OPC_MXU_D32SLR   = 0x31,
-OPC_MXU_D32SARL  = 0x32,
-OPC_MXU_D32SAR   = 0x33,
-OPC_MXU_Q16SLL   = 0x34,
-OPC_MXU_Q16SLR   = 0x35,
-OPC_MXU__POOL18  = 0x36,
-OPC_MXU_Q16SAR   = 0x37,
 OPC_MXU__POOL19  = 0x38,
-OPC_MXU__POOL20  = 0x39,
-OPC_MXU__POOL21  = 0x3A,
-OPC_MXU_Q16SCOP  = 0x3B,
-OPC_MXU_Q8MADL   = 0x3C,
-OPC_MXU_S32SFL   = 0x3D,
-OPC_MXU_Q8SAD= 0x3E,
-/* not assigned 0x3F */
 };
 
 
@@ -1541,39 +1487,6 @@ enum {
 OPC_MXU_D16MIN   = 0x03,
 OPC_MXU_Q8MAX= 0x04,
 OPC_MXU_Q8MIN= 0x05,
-OPC_MXU_Q8SLT= 0x06,
-OPC_MXU_Q8SLTU   = 0x07,
-};
-
-/*
- * MXU pool 01
- */
-enum {
-OPC_MXU_S32SLT   = 0x00,
-OPC_MXU_D16SLT   = 0x01,
-OPC_MXU_D16AVG   = 0x02,
-OPC_MXU_D16AVGR  = 0x03,
-OPC_MXU_Q8AVG= 0x04,
-OPC_MXU_Q8AVGR   = 0x05,
-OPC_MXU_Q8ADD= 0x07,
-};
-
-/*
- * MXU pool 02
- */
-enum {
-OPC_MXU_S32CPS   = 0x00,
-OPC_MXU_D16CPS   = 0x02,
-OPC_MXU_Q8ABD= 0x04,
-OPC_MXU_Q16SAT   = 0x06,
-};
-
-/*
- * MXU pool 03
- */
-enum {
-OPC_MXU_D16MULF  = 0x00,
-OPC_MXU_D16MULE  = 0x01,
 };
 
 /*
@@ -1584,136 +1497,17 @@ enum {
 OPC_MXU_S32LDDR  = 0x01,
 };
 
-/*
- * MXU pool 05
- */
-enum {
-OPC_MXU_S32STD   = 0x00,
-OPC_MXU_S32STDR  = 0x01,
-};
-
-/*
- * MXU pool 06
- */
-enum {
-OPC_MXU_S32LDDV  = 0x00,
-OPC_MXU_S32LDDVR = 0x01,
-};
-
-/*
- * MXU pool 07
- */
-enum {
-OPC_MXU_S32STDV  = 0x00,
-OPC_MXU_S32STDVR = 0x01,
-};
-
-/*
- * MXU pool 08
- */
-enum {
-OPC_MXU_S32LDI   = 0x00,
-OPC_MXU_S32LDIR  = 0x01,
-};
-
-/*
- * MXU pool 09
- */
-enum {
-OPC_MXU_S32SDI   = 0x00,
-OPC_MXU_S32SDIR  = 0x01,
-};
-
-/*
- * MXU pool 10
- */
-enum {
-OPC_MXU_S32LDIV  = 0x00,
-OPC_MXU_S32LDIVR = 0x01,
-};
-
-/*
- * MXU pool 11
- */
-enum {
-OPC_MXU_S32SDIV  = 0x00,
-OPC_MXU_S32SDIVR = 0x01,
-};
-
-/*
- * MXU pool 12
- */
-enum {
-OPC_MXU_D32ACC   = 0x00,
-OPC_MXU_D32ACCM  = 0x01,
-OPC_MXU_D32ASUM  = 0x02,
-};
-
-/*
- * MXU pool 13
- */
-enum {
-OPC_MXU_Q16ACC   = 0x00,
-OPC_MXU_Q16ACCM  = 0x01,
-OPC_MXU_Q16ASUM  = 0x02,
-};
-
-/*
- * MXU pool 14
- */
-enum {
-OPC_MXU_Q8ADDE   = 0x00,
-OPC_MXU_D8SUM= 0x01,
-OPC_MXU_D8SUMC   = 0x02,
-};
-
-/*
- * MXU pool 15
- */
-enum {
-OPC_MXU_S32MUL   = 0x00,
-OPC_MXU_S32MULU  = 0x01,
-OPC_MXU_S32EXTR  = 0x02,
-OPC_MXU_S32EXTRV = 0x03,
-};
-
 /*
  * MXU pool 16
  */
 enum {
-OPC_MXU_D32SARW  = 0x00,
-OPC_MXU_S32ALN   = 0x01,
 OPC_MXU_S32ALNI  = 0x02,
-OPC_MXU_S32LUI   = 0x03,
 OPC_MXU_S32NOR   = 0x04,
 OPC_MXU_S32AND   = 0x05,
 OPC_MXU_S32OR= 0x06,
 OPC_MXU_S32XOR   = 0x07,
 };
 
-/*
- * MXU pool 17
- */
-enum {
-OPC_MXU_LXB  = 0x00,
-OPC_MXU_LXH  = 0x01,
-OPC_MXU_LXW  = 0x03,
-OPC_MXU_LXBU

[PATCH v3 01/10] target/mips: Rewrite complex ifdef'ry

2021-02-22 Thread Philippe Mathieu-Daudé

No need for this obfuscated ifdef'ry, KISS.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 70891c37cdd..1f1c5f33c87 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28276,13 +28276,16 @@ static bool decode_opc_legacy(CPUMIPSState *env, 
DisasContext *ctx)
 #if defined(TARGET_MIPS64)
 if ((ctx->insn_flags & INSN_R5900) && (ctx->insn_flags & ASE_MMI)) {
 decode_mmi(env, ctx);
-#else
+break;
+}
+#endif
+#if !defined(TARGET_MIPS64)
 if (ctx->insn_flags & ASE_MXU) {
 decode_opc_mxu(env, ctx);
-#endif
-} else {
-decode_opc_special2_legacy(env, ctx);
+break;
 }
+#endif
+decode_opc_special2_legacy(env, ctx);
 break;
 case OPC_SPECIAL3:
 #if defined(TARGET_MIPS64)
-- 
2.26.2

[PATCH v3 04/10] target/mips: Pass instruction opcode to decode_opc_mxu()

2021-02-22 Thread Philippe Mathieu-Daudé

In the next commit we'll make decode_opc_mxu() match decodetree
prototype by returning a boolean. First pass ctx->opcode as an
argument.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 6f5ccd667da..9e875fa4a25 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -25780,17 +25780,17 @@ static void decode_opc_mxu__pool19(DisasContext *ctx)
 /*
  * Main MXU decoding function
  */
-static void decode_opc_mxu(DisasContext *ctx)
+static void decode_opc_mxu(DisasContext *ctx, uint32_t insn)
 {
-uint32_t opcode = extract32(ctx->opcode, 0, 6);
+uint32_t opcode = extract32(insn, 0, 6);
 
 if (opcode == OPC__MXU_MUL) {
 uint32_t  rs, rt, rd, op1;
 
-rs = extract32(ctx->opcode, 21, 5);
-rt = extract32(ctx->opcode, 16, 5);
-rd = extract32(ctx->opcode, 11, 5);
-op1 = MASK_SPECIAL2(ctx->opcode);
+rs = extract32(insn, 21, 5);
+rt = extract32(insn, 16, 5);
+rd = extract32(insn, 11, 5);
+op1 = MASK_SPECIAL2(insn);
 
 gen_arith(ctx, op1, rd, rs, rt);
 
@@ -26995,7 +26995,7 @@ static bool decode_opc_legacy(CPUMIPSState *env, 
DisasContext *ctx)
 #endif
 #if !defined(TARGET_MIPS64)
 if (ctx->insn_flags & ASE_MXU) {
-decode_opc_mxu(ctx);
+decode_opc_mxu(ctx, ctx->opcode);
 break;
 }
 #endif
-- 
2.26.2

[PATCH v3 07/10] target/mips: Introduce mxu_translate_init() helper

2021-02-22 Thread Philippe Mathieu-Daudé

Extract the MXU register initialization code from mips_tcg_init()
as a new mxu_translate_init() helper

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index c897f3900d8..52a7005e18f 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -2045,7 +2045,20 @@ static const char * const mxuregnames[] = {
 "XR1",  "XR2",  "XR3",  "XR4",  "XR5",  "XR6",  "XR7",  "XR8",
 "XR9",  "XR10", "XR11", "XR12", "XR13", "XR14", "XR15", "MXU_CR",
 };
-#endif
+
+static void mxu_translate_init(void)
+{
+for (unsigned i = 0; i < NUMBER_OF_MXU_REGISTERS - 1; i++) {
+mxu_gpr[i] = tcg_global_mem_new(cpu_env,
+offsetof(CPUMIPSState, 
active_tc.mxu_gpr[i]),
+mxuregnames[i]);
+}
+
+mxu_CR = tcg_global_mem_new(cpu_env,
+offsetof(CPUMIPSState, active_tc.mxu_cr),
+mxuregnames[NUMBER_OF_MXU_REGISTERS - 1]);
+}
+#endif /* defined(TARGET_MIPS64) */
 
 /* General purpose registers moves. */
 void gen_load_gpr(TCGv t, int reg)
@@ -28064,16 +28077,7 @@ void mips_tcg_init(void)
"llval");
 
 #if !defined(TARGET_MIPS64)
-for (i = 0; i < NUMBER_OF_MXU_REGISTERS - 1; i++) {
-mxu_gpr[i] = tcg_global_mem_new(cpu_env,
-offsetof(CPUMIPSState,
- active_tc.mxu_gpr[i]),
-mxuregnames[i]);
-}
-
-mxu_CR = tcg_global_mem_new(cpu_env,
-offsetof(CPUMIPSState, active_tc.mxu_cr),
-mxuregnames[NUMBER_OF_MXU_REGISTERS - 1]);
+mxu_translate_init();
 #endif /* !TARGET_MIPS64 */
 }
 
-- 
2.26.2

[PATCH v3 09/10] target/mips: Simplify 64-bit ifdef'ry of MXU code

2021-02-22 Thread Philippe Mathieu-Daudé

Check for 'TARGET_LONG_BITS == 32' and simplify 64-bit ifdef'ry.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate.h |  2 --
 target/mips/translate.c | 18 ++
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/mips/translate.h b/target/mips/translate.h
index 1801e7f819e..a807b3d2566 100644
--- a/target/mips/translate.h
+++ b/target/mips/translate.h
@@ -179,10 +179,8 @@ extern TCGv bcond;
 void msa_translate_init(void);
 
 /* MXU */
-#if !defined(TARGET_MIPS64)
 void mxu_translate_init(void);
 bool decode_ase_mxu(DisasContext *ctx, uint32_t insn);
-#endif /* !TARGET_MIPS64 */
 
 /* decodetree generated */
 bool decode_isa_rel6(DisasContext *ctx, uint32_t insn);
diff --git a/target/mips/translate.c b/target/mips/translate.c
index 609798a0bee..68b5dee4bab 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -25850,6 +25850,15 @@ bool decode_ase_mxu(DisasContext *ctx, uint32_t insn)
 return true;
 }
 
+#else /* !defined(TARGET_MIPS64) */
+
+bool decode_ase_mxu(DisasContext *ctx, uint32_t insn)
+{
+return false;
+}
+
+#endif /* defined(TARGET_MIPS64) */
+
 /*
  * Main MXU decoding function
  */
@@ -25871,9 +25880,6 @@ static void decode_opc_mxu(DisasContext *ctx, uint32_t 
insn)
 decode_ase_mxu(ctx, insn);
 }
 
-#endif /* !defined(TARGET_MIPS64) */
-
-
 static void decode_opc_special2_legacy(CPUMIPSState *env, DisasContext *ctx)
 {
 int rs, rt, rd;
@@ -27017,12 +27023,10 @@ static bool decode_opc_legacy(CPUMIPSState *env, 
DisasContext *ctx)
 break;
 }
 #endif
-#if !defined(TARGET_MIPS64)
-if (ctx->insn_flags & ASE_MXU) {
+if ((TARGET_LONG_BITS == 32) && (ctx->insn_flags & ASE_MXU)) {
 decode_opc_mxu(ctx, ctx->opcode);
 break;
 }
-#endif
 decode_opc_special2_legacy(env, ctx);
 break;
 case OPC_SPECIAL3:
@@ -28081,9 +28085,7 @@ void mips_tcg_init(void)
 cpu_llval = tcg_global_mem_new(cpu_env, offsetof(CPUMIPSState, llval),
"llval");
 
-#if !defined(TARGET_MIPS64)
 mxu_translate_init();
-#endif /* !TARGET_MIPS64 */
 }
 
 void restore_state_to_opc(CPUMIPSState *env, TranslationBlock *tb,
-- 
2.26.2

[PATCH v3 00/10] target/mips: Extract MXU code to new mxu_translate.c file

2021-02-22 Thread Philippe Mathieu-Daudé

Hi,

This is a respin of "Extract XBurst Media eXtension Unit
translation routines" v2:
https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05889.html

But instead of an included C file (.c.inc) we now have an
independent C unit. We gain faster recompilation time when
hacking translate.c or mxu_translate.c, and we also gain
in code maintainability.

Review should be trivial, with almost no logical code change.

Regards,

Phil.

Philippe Mathieu-Daudé (10):
  target/mips: Rewrite complex ifdef'ry
  target/mips: Remove XBurst Media eXtension Unit dead code
  target/mips: Remove unused CPUMIPSState* from MXU functions
  target/mips: Pass instruction opcode to decode_opc_mxu()
  target/mips: Extract decode_ase_mxu() from decode_opc_mxu()
  target/mips: Use OPC_MUL instead of OPC__MXU_MUL
  target/mips: Introduce mxu_translate_init() helper
  target/mips: Make mxu_translate_init() / decode_ase_mxu() proto public
  target/mips: Simplify 64-bit ifdef'ry of MXU code
  target/mips: Extract MXU code to new mxu_translate.c file

 target/mips/translate.h |4 +
 target/mips/mxu_translate.c | 1625 +++
 target/mips/translate.c | 2909 +--
 target/mips/meson.build |1 +
 4 files changed, 1645 insertions(+), 2894 deletions(-)
 create mode 100644 target/mips/mxu_translate.c

-- 
2.26.2

[Bug 1916394] Re: [git] Cannot build qemu: FAILED: target/hexagon/semantics_generated.pyinc

2021-02-22 Thread briancain

I can't seem to reproduce this locally, I will try to see if I can get
my configuration to match yours more closely.

Does the attached patch mitigate the build issue?  0001-target-hexagon-
fix-meson-build-failure.patch

** Patch added: "0001-target-hexagon-fix-meson-build-failure.patch"
   
https://bugs.launchpad.net/qemu/+bug/1916394/+attachment/5466027/+files/0001-target-hexagon-fix-meson-build-failure.patch

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1916394

Title:
  [git] Cannot build qemu: FAILED:
  target/hexagon/semantics_generated.pyinc

Status in QEMU:
  New

Bug description:
  Hello.

  I'm using Archlinux and I maintain qemu-git AUR package.

  I tried to build Qemu at commit
  4115aec9af2a3de5fa89a0b1daa12debcd7741ff but it stops with this error
  message:

  Found ninja-1.10.2 at /usr/bin/ninja
  [632/9068] Generating semantics_generated.pyinc with a custom command
  FAILED: target/hexagon/semantics_generated.pyinc
  @INPUT@ target/hexagon/semantics_generated.pyinc
  /bin/sh: line 1: @INPUT@: command not found
  [637/9068] Compiling C object 
fsdev/vi...proxy-helper.p/virtfs-proxy-helper.c.o
  ninja: build stopped: subcommand failed.

  ninja version: 1.10.2
  meson version: 0.57.1

  Downgrading meson doesn't change anything.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1916394/+subscriptions

Re: [PATCH V2 1/7] block/rbd: bump librbd requirement to luminous release

2021-02-22 Thread Jason Dillaman

On Mon, Feb 15, 2021 at 8:29 AM Peter Lieven  wrote:
>
> Am 15.02.21 um 13:13 schrieb Kevin Wolf:
> > Am 15.02.2021 um 12:45 hat Peter Lieven geschrieben:
> >> Am 15.02.21 um 12:41 schrieb Daniel P. Berrangé:
> >>> On Mon, Feb 15, 2021 at 12:32:24PM +0100, Peter Lieven wrote:
>  Am 15.02.21 um 11:24 schrieb Daniel P. Berrangé:
> > On Tue, Jan 26, 2021 at 12:25:34PM +0100, Peter Lieven wrote:
> >> even luminous (version 12.2) is unmaintained for over 3 years now.
> >> Bump the requirement to get rid of the ifdef'ry in the code.
> > We have clear rules on when we bump minimum versions, determined by
> > the OS platforms we target:
> >
> >  https://qemu.readthedocs.io/en/latest/system/build-platforms.html
> >
> > At this time RHEL-7 is usually the oldest platform, and it
> > builds with RBD 10.2.5, so we can't bump the version to 12.2.
> >
> > I'm afraid this patch has to be dropped.
>  I have asked exactly this question before I started work on this series 
>  and got reply
> 
>  from Jason that he sees no problem in bumping to a release which is 
>  already unmaintained
> 
>  for 3 years.
> >>> I'm afraid Jason is wrong here.  It doesn't matter what the upstream
> >>> consider the support status to be. QEMU targets what the OS vendors
> >>> ship, and they still consider this to be a supported version.
> >>
> >> Okay, but the whole coroutine stuff would get a total mess with all
> >> the ifdef'ry.
> > Hm, but how are these ifdefs even related to the coroutine conversation?
> > It's a bit more code that you're moving around, but shouldn't it be
> > unchanged from the old code, just moving from an AIO callback to a
> > coroutine? Or am I missing some complications?
>
>
> No, the ifdef's only come back in for the write zeroes part.
>
>
> >
> >> Would it be an option to make a big ifdef in the rbd driver? One with
> >> old code for < 12.0.0 and one
> >>
> >> with new code for >= 12.0.0?
> > I don't think this is a good idea, this would be a huge mess to
> > maintain.
> >
> > The conversion is probably a good idea in general, simply because it's
> > more in line with the rest of the block layer, but I don't think it adds
> > anything per se, so it's hard to justify such duplication with the
> > benefits it brings.
>
>
> I would wait for Jasons comment on the rbd part of the series and then spin a 
> V3
>
> with a for-6.1 tag.

Sorry for the long delay -- I was delayed from being out-of-town. I've
reviewed and play-tested the patches and it looks good for me. I'll
wait for V3 before adding my official review.

>
>
> Peter
>


-- 
Jason

Re: What prevents discarding a cluster during rewrite?

2021-02-22 Thread Vladimir Sementsov-Ogievskiy


23.02.2021 00:30, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

Thinking of how to prevent dereferencing to zero (and discard) of host cluster 
during flush of compressed cache (which I'm working on now), I have a 
question.. What prevents it for normal writes?



I have no idea about why didn't it fail for years.. May be, I'm missing 
something?

I have idea of fixing: increase the refcount of host cluster before write to 
data_file (it's important to increase refacount in same s->lock critical 
section where we get host_offset) and dereference it after write.. It should help. 
Any thoughts?

--
Best regards,
Vladimir

What prevents discarding a cluster during rewrite?

2021-02-22 Thread Vladimir Sementsov-Ogievskiy


Hi all!

Thinking of how to prevent dereferencing to zero (and discard) of host cluster 
during flush of compressed cache (which I'm working on now), I have a 
question.. What prevents it for normal writes?

A simple interactive qemu-io session on master branch:

./qemu-img create -f qcow2 x 1M

[root@kvm build]# ./qemu-io blkdebug::x


do initial write:

qemu-io> write -P 1 0 64K
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.12 sec (556.453 KiB/sec and 8.6946 ops/sec)


rewrite, and break before write (assume long write by fs or hardware for some 
reason)

qemu-io> break write_aio A
qemu-io> aio_write -P 2 0 64K
blkdebug: Suspended request 'A'


OK, we stopped before write. Everything is already allocated on initial write, 
mutex now resumed.. And suddenly we do discard:

qemu-io> discard 0 64K
discard 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (146.034 MiB/sec and 2336.5414 ops/sec)


Now, start another write, to another place.. But it will allocate same host 
cluster!!!

qemu-io> write -P 3 128K 64K
wrote 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.08 sec (787.122 KiB/sec and 12.2988 ops/sec)

Check it:

qemu-io> read -P 3 128K 64K
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (188.238 MiB/sec and 3011.8033 ops/sec)


resume our old write:

qemu-io> resume A
blkdebug: Resuming request 'A'
qemu-io> wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0:05:07.10 (213.400382 bytes/sec and 0.0033 ops/sec)


of course it doesn't influence first cluster, as it is discarded:

qemu-io> read -P 2 0 64K
Pattern verification failed at offset 0, 65536 bytes
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (726.246 MiB/sec and 11619.9352 ops/sec)
qemu-io> read -P 0 0 64K
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (632.348 MiB/sec and 10117.5661 ops/sec)


But in 3rd cluster data is corrupted now:

qemu-io> read -P 3 128K 64K
Pattern verification failed at offset 131072, 65536 bytes
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (163.922 MiB/sec and 2622.7444 ops/sec)
qemu-io> read -P 2 128K 64K
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (257.058 MiB/sec and 4112.9245 ops/sec


So, that's a classical use-after-free... For user it looks like racy 
write/discard to one cluster may corrupt another cluster... It may be even 
worse, if use-after-free corrupts metadata.

Note, that initial write is significant, as when we do allocate cluster we 
write L2 entry after data write (as I understand), so the race doesn't happen. 
But, if consider compressed writes, they allocate everything before write.. 
Let's check:

[root@kvm build]# ./qemu-img create -f qcow2 x 1M; ./qemu-io blkdebug::x
Formatting 'x', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
qemu-io> break write_compressed A
qemu-io> aio_write -c -P 1 0 64K
qemu-io> compressed: 327680 79
blkdebug: Suspended request 'A'

qemu-io> discard 0 64K
discarded: 327680
discard 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.01 sec (7.102 MiB/sec and 113.6297 ops/sec)
qemu-io> write -P 3 128K 64K
normal cluster alloc: 327680
wrote 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.06 sec (1.005 MiB/sec and 16.0774 ops/sec)
qemu-io> resume A
blkdebug: Resuming request 'A'
qemu-io> wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0:00:15.90 (4.026 KiB/sec and 0.0629 ops/sec)

qemu-io> read -P 3 128K 64K
Pattern verification failed at offset 131072, 65536 bytes
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (237.791 MiB/sec and 3804.6539 ops/sec)


(strange, but seems it didn't fail several times for me.. But now it fails 
several times... Anyway, it's all not good).


--
Best regards,
Vladimir

Re: Editing QEMU POWER Platform wiki page

2021-02-22 Thread Leonardo Augusto Guimarães Garcia


On 2/22/21 8:01 AM, Greg Kurz wrote:

On Thu, 18 Feb 2021 10:16:25 -0300
Leonardo Augusto Guimarães Garcia  wrote:


Hi there,

I would like to edit the wiki page at [0] as it contains some outdated
information. Could anyone that has access to the wiki please help me
create a user so that I can edit it?

0. https://wiki.qemu.org/Documentation/Platforms/POWER


Hi Leo,

User creation isn't publicly available to avoid spam : only an existing
user can create a new account.



Yeah, I saw that. That's why I asked here.



This being said, wiki isn't the preferred
way to expose documentation since there's no review and things ultimately
bitrot. Page [0] you want to update is a perfect example of the mess :
not only it contains irrelevant data but also stuff that is definitely
wrong (e.g. 'compat' cpu property was deprecated in QEMU 5.0 and will
be removed in QEMU 6.0).

Ideally we'd want everything to be in the main QEMU doc and don't
even need a wiki.

On the PowerPC front, the most up-to-date docs are in the QEMU tree:

docs/system/ppc/embedded.rst
docs/system/ppc/powermac.rst
docs/system/ppc/powernv.rst
docs/system/ppc/prep.rst
docs/system/ppc/pseries.rst
docs/system/target-ppc.rst

So I don't know exactly what changes you had in mind, but maybe first
consider to update the main documentation.



I got here because someone pointed to me the wiki is saying that nested 
virtualization is not supported on Power, which is wrong. But I saw many 
other outdated information on the wiki as you pointed out.





On my side, I think I want do ditch all the current content and just put
links to https://qemu.readthedocs.io/en/latest/ instead. I can take care
of that, in which case you wouldn't need an account.



I agree this would be the preferable way. Could you go ahead and do 
that, please, if others agree as well?


Cheers,

Leo




Cheers,

--
Greg

PS:

Cedric reported that we also have a page for non-pseries
platforms:

https://wiki.qemu.org/Documentation/Platforms/PowerPC

I'm Cc'ing some regular contributors for those platforms so
they can  evaluate the bitrotting status of this wiki.


Cheers,

Leo

Re: [PATCH v2 5/5] hw/block/nvme: report non-mdts command size limit for dsm

2021-02-22 Thread Klaus Jensen

On Feb 23 05:55, Keith Busch wrote:
> On Mon, Feb 22, 2021 at 07:47:59PM +0100, Klaus Jensen wrote:
> > +typedef struct NvmeIdCtrlNvm {
> > +uint8_t vsl;
> > +uint8_t wzsl;
> > +uint8_t wusl;
> > +uint8_t dmrl;
> > +uint32_tdmrsl;
> > +uint64_tdmsl;
> > +uint8_t rsvd16[4080];
> > +} NvmeIdCtrlNvm;
> 
> TP 4040a still displays these fields with preceding '...' indicating
> something comes before this. Is that just left-over from the integration
> for TBD offsets, or is there something that still hasn't been accounted
> for?

Good question.

But since the TBDs have been assigned I believe it is just a left-over.
I must admit that I have not cross checked this with all other TPs, but
AFAIK this is the only ratified TP that adds something to the
NVM-specific identify controller data structure.

signature.asc
Description: PGP signature

Re: [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup

2021-02-22 Thread Keith Busch

These look good.

Reviewed-by: Keith Busch

Re: [PATCH v2 5/5] hw/block/nvme: report non-mdts command size limit for dsm

2021-02-22 Thread Keith Busch

On Mon, Feb 22, 2021 at 07:47:59PM +0100, Klaus Jensen wrote:
> +typedef struct NvmeIdCtrlNvm {
> +uint8_t vsl;
> +uint8_t wzsl;
> +uint8_t wusl;
> +uint8_t dmrl;
> +uint32_tdmrsl;
> +uint64_tdmsl;
> +uint8_t rsvd16[4080];
> +} NvmeIdCtrlNvm;

TP 4040a still displays these fields with preceding '...' indicating
something comes before this. Is that just left-over from the integration
for TBD offsets, or is there something that still hasn't been accounted
for?

Re: [PATCH V2 0/6] hw/block/nvme: support namespace attachment

2021-02-22 Thread Klaus Jensen

On Feb 11 01:09, Minwoo Im wrote:
> Hello,
> 
> This series supports namespace attachment: attach and detach.  This is
> the second version series with a fix a bug on choosing a controller to
> attach for a namespace in the attach command handler.
> 
> Since V1:
>   - Fix to take 'ctrl' which is given from the command rather than 'n'.
> (Klaus)
>   - Add a [7/7] patch to support CNS 12h Identify command (Namespace
> Attached Controller list).
> 

Good stuff Minwoo!

For the lot,

Tested-by: Klaus Jensen 
Reviewed-by: Klaus Jensen 


signature.asc
Description: PGP signature

Re: Plugin Address Translations Inconsistent/Incorrect?

2021-02-22 Thread Aaron Lindsay via

On Feb 22 19:30, Alex Bennée wrote:
> Aaron Lindsay  writes:
> > If I call (inside a memory callback):
> >
> > `uint64_t pa = qemu_plugin_hwaddr_device_offset(hwaddr);`
> >
> > I see that `pa` takes the value 0xe0e58760. If, however, I plumb
> > `cpu_get_phys_page_debug` through to the plugin interface and call it
> > like:
> >
> > `pa = cpu_get_phys_page_debug(current_cpu, va);`
> >
> > I see it takes the value 0x120e58760.
> >
> > I notice that 0x120e58760-0xe0e58760 is exactly one gigabyte, which is
> > also the offset of the beginning of RAM for the 'virt' AArch64 machine
> > I'm using. Furthermore, I see the name of the plugin function includes
> > "device_offset", so perhaps this discrepancy is by design. However, it
> > seems awkward to not be able to get a true physical address.
> 
> It certainly is by design. The comment for the helper states:
> 
>   /*
>* The following additional queries can be run on the hwaddr structure
>* to return information about it. For non-IO accesses the device
>* offset will be into the appropriate block of RAM.
>*/
> 
> > I've done some digging and found that inside `qemu_ram_addr_from_host`
> > (called by `qemu_plugin_hwaddr_device_offset`), `block->mr->addr`
> > appears to hold the offset of the beginning of RAM. 
> >
> > Do you think it would be reasonable to modify
> > `qemu_plugin_hwaddr_device_offset` to add the beginning of the RAM block
> > or otherwise return the true physical address (or at least expose a way
> > to find the beginning of it through the plugin interface)?
> 
> Well the problem here is what is the address map? For example if you
> have a secure block of RAM you might have two physical addresses which
> are the same. That said with the current qemu_plugin_hwaddr_device_name
> helper both will get reported as "RAM" so maybe it's not that helpful
> yet.

I don't think I yet understand why this is a problem. It seems to me
that the current implementation of `qemu_plugin_hwaddr_device_offset`
returns offsets which may already be ambiguous without additional
information about the underlying device/memory, and I'm not sure why
translating to full physical addresses would make that worse. It's
possible I'm not correctly interpreting your concern.

> I also worry about what happens if devices get moved around. Do you end
> up with aliasing of address space have a remap of the HW.

Would the `block->mr->addr` field I mentioned above be updated in such a
case?

> That said I think we could add an additional helper to translate a
> hwaddr to a global address space address. I'm open to suggestions of the
> best way to structure this.

Haven't put a ton of thought into it, but what about something like this
(untested):

uint64_t qemu_plugin_hwaddr_phys_addr(const struct qemu_plugin_hwaddr *haddr)
{
#ifdef CONFIG_SOFTMMU
if (haddr) {
if (!haddr->is_io) {
RAMBlock *block;
ram_addr_t offset;

block = qemu_ram_block_from_host((void *) haddr->v.ram.hostaddr, 
false, );
if (!block) {
error_report("Bad ram pointer %"PRIx64"", 
haddr->v.ram.hostaddr);
abort();
}

return block->offset + offset + block->mr->addr;
} else {
MemoryRegionSection *mrs = haddr->v.io.section;
return haddr->v.io.offset + mrs->mr->addr;
}
}
#endif
return 0;
}

The key differences from `qemu_plugin_hwaddr_device_offset` are using
`qemu_ram_block_from_host` directly instead of `qemu_ram_addr_from_host` (to
get a pointer to the RAMBlock), and adding `block->mr->addr` and
`mrs->mr->addr` to the returns for RAM and IO, respectively.

-Aaron

Re: [PATCH v5 3/4] Jobs based on custom runners: docs and gitlab-runner setup playbook

2021-02-22 Thread Wainer dos Santos Moschetta


Hi,

On 2/19/21 6:58 PM, Cleber Rosa wrote:

To have the jobs dispatched to custom runners, gitlab-runner must
be installed, active as a service and properly configured.  The
variables file and playbook introduced here should help with those
steps.

The playbook introduced here covers a number of different Linux
distributions and FreeBSD, and are intended to provide a reproducible
environment.

Signed-off-by: Cleber Rosa 
Reviewed-by: Daniel P. Berrangé 
---
  docs/devel/ci.rst  | 58 ++
  scripts/ci/setup/.gitignore|  1 +
  scripts/ci/setup/gitlab-runner.yml | 65 ++
  scripts/ci/setup/vars.yml.template | 13 ++
  4 files changed, 137 insertions(+)
  create mode 100644 scripts/ci/setup/.gitignore
  create mode 100644 scripts/ci/setup/gitlab-runner.yml
  create mode 100644 scripts/ci/setup/vars.yml.template

diff --git a/docs/devel/ci.rst b/docs/devel/ci.rst
index a556558435..9f9c4bd3f9 100644
--- a/docs/devel/ci.rst
+++ b/docs/devel/ci.rst
@@ -56,3 +56,61 @@ To run the playbook, execute::
  
cd scripts/ci/setup

ansible-playbook -i inventory build-environment.yml
+
+gitlab-runner setup and registration
+
+
+The gitlab-runner agent needs to be installed on each machine that
+will run jobs.  The association between a machine and a GitLab project
+happens with a registration token.  To find the registration token for
+your repository/project, navigate on GitLab's web UI to:
+
+ * Settings (the gears like icon), then
+ * CI/CD, then
+ * Runners, and click on the "Expand" button, then
+ * Under "Set up a specific Runner manually", look for the value under
+   "Use the following registration token during setup"
+
+Copy the ``scripts/ci/setup/vars.yml.template`` file to
+``scripts/ci/setup/vars.yml``.  Then, set the
+``gitlab_runner_registration_token`` variable to the value obtained
+earlier.
+
+.. note:: gitlab-runner is not available from the standard location
+  for all OS and architectures combinations.  For some systems,
+  a custom build may be necessary.  Some builds are avaiable
+  at https://cleber.fedorapeople.org/gitlab-runner/ and this
+  URI may be used as a value on ``vars.yml``

FYI the latest version (13.8.0) provides a s390x build.

+
+To run the playbook, execute::
+
+  cd scripts/ci/setup
+  ansible-playbook -i inventory gitlab-runner.yml
+
+Following the registration, it's necessary to configure the runner tags,
+and optionally other configurations on the GitLab UI.  Navigate to:
+
+ * Settings (the gears like icon), then
+ * CI/CD, then
+ * Runners, and click on the "Expand" button, then
+ * "Runners activated for this project", then
+ * Click on the "Edit" icon (next to the "Lock" Icon)
+
+Under tags, add values matching the jobs a runner should run.  For a
+Ubuntu 20.04 aarch64 system, the tags should be set as::
+
+  ubuntu_20.04,aarch64
+
+Because the job definition at ``.gitlab-ci.d/custom-runners.yml``
+would contain::
+
+  ubuntu-20.04-aarch64-all:
+   tags:
+   - ubuntu_20.04
+   - aarch64
+
+It's also recommended to:
+
+ * increase the "Maximum job timeout" to something like ``2h``
+ * uncheck the "Run untagged jobs" check box
+ * give it a better Description
diff --git a/scripts/ci/setup/.gitignore b/scripts/ci/setup/.gitignore
new file mode 100644
index 00..f112d05dd0
--- /dev/null
+++ b/scripts/ci/setup/.gitignore
@@ -0,0 +1 @@
+vars.yml
\ No newline at end of file
diff --git a/scripts/ci/setup/gitlab-runner.yml 
b/scripts/ci/setup/gitlab-runner.yml
new file mode 100644
index 00..ab1944965f
--- /dev/null
+++ b/scripts/ci/setup/gitlab-runner.yml
@@ -0,0 +1,65 @@
+---
+- name: Installation of gitlab-runner
+  hosts: all
+  vars_files:
+- vars.yml
+  tasks:
+- debug:
+msg: 'Checking for a valid GitLab registration token'
+  failed_when: "gitlab_runner_registration_token == 
'PLEASE_PROVIDE_A_VALID_TOKEN'"
+
+- name: Checks the availability of official gitlab-runner builds in the 
archive
+  uri:
+url: https://s3.amazonaws.com/gitlab-runner-downloads/v{{ 
gitlab_runner_version  }}/binaries/gitlab-runner-linux-386


Where it checks for 386 then later it uses gitlab_runner_arch (amd64 by 
default). It is not consistent.


Also, why not use ansible_machine + jinja2 to convert x86_64 -> amd64, 
aarch64 -> arm64...etc?



+method: HEAD
+status_code:
+  - 200
+  - 403
+  register: gitlab_runner_available_archive
+
+- name: Update base url
+  set_fact:
+gitlab_runner_base_url: 
https://s3.amazonaws.com/gitlab-runner-downloads/v{{ gitlab_runner_version  
}}/binaries/gitlab-runner-
+  when: gitlab_runner_available_archive.status == 200
+- debug:
+msg: Base gitlab-runner url is {{ gitlab_runner_base_url  }}
+
+- name: Create a group for the gitlab-runner service
+  group:
+name: gitlab-runner
+
+- name:

Re: [PATCH V2 6/7] hw/block/nvme: support namespace attachment command

2021-02-22 Thread Klaus Jensen

On Feb 11 01:09, Minwoo Im wrote:
> This patch supports Namespace Attachment command for the pre-defined
> nvme-ns device nodes.  Of course, attach/detach namespace should only be
> supported in case 'subsys' is given.  This is because if we detach a
> namespace from a controller, somebody needs to manage the detached, but
> allocated namespace in the NVMe subsystem.
> 
> Signed-off-by: Minwoo Im 
> ---
>  hw/block/nvme-subsys.h | 10 +++
>  hw/block/nvme.c| 59 ++
>  hw/block/nvme.h|  5 
>  hw/block/trace-events  |  2 ++
>  include/block/nvme.h   |  5 
>  5 files changed, 81 insertions(+)
> 
> diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
> index 14627f9ccb41..ef4bec928eae 100644
> --- a/hw/block/nvme-subsys.h
> +++ b/hw/block/nvme-subsys.h
> @@ -30,6 +30,16 @@ typedef struct NvmeSubsystem {
>  int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
>  int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
>  
> +static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
> +uint32_t cntlid)
> +{
> +if (!subsys) {
> +return NULL;
> +}
> +
> +return subsys->ctrls[cntlid];
> +}
> +
>  /*
>   * Return allocated namespace of the specified nsid in the subsystem.
>   */
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 697368a6ae0c..71bcd66f1956 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -183,6 +183,7 @@ static const uint32_t nvme_cse_acs[256] = {
>  [NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP,
>  [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
>  [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
> +[NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP,
>  };
>  
>  static const uint32_t nvme_cse_iocs_none[256];
> @@ -3766,6 +3767,62 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
>  return NVME_NO_COMPLETE;
>  }
>  
> +static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns);
> +static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
> +{
> +NvmeNamespace *ns;
> +NvmeCtrl *ctrl;
> +uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {};
> +uint32_t nsid = le32_to_cpu(req->cmd.nsid);
> +uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
> +bool attach = !(dw10 & 0xf);
> +uint16_t *nr_ids = [0];
> +uint16_t *ids = [1];
> +uint16_t ret;
> +int i;
> +
> +trace_pci_nvme_ns_attachment(nvme_cid(req), dw10 & 0xf);
> +
> +ns = nvme_subsys_ns(n->subsys, nsid);
> +if (!ns) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +ret = nvme_dma(n, (uint8_t *)list, 4096,
> +   DMA_DIRECTION_TO_DEVICE, req);
> +if (ret) {
> +return ret;
> +}
> +
> +if (!*nr_ids) {
> +return NVME_NS_CTRL_LIST_INVALID | NVME_DNR;
> +}
> +
> +for (i = 0; i < *nr_ids; i++) {
> +ctrl = nvme_subsys_ctrl(n->subsys, ids[i]);
> +if (!ctrl) {
> +return NVME_NS_CTRL_LIST_INVALID | NVME_DNR;
> +}
> +
> +if (attach) {
> +if (nvme_ns_is_attached(ctrl, ns)) {
> +return NVME_NS_ALREADY_ATTACHED | NVME_DNR;
> +}
> +
> +nvme_ns_attach(ctrl, ns);
> +__nvme_select_ns_iocs(ctrl, ns);
> +} else {
> +if (!nvme_ns_is_attached(ctrl, ns)) {
> +return NVME_NS_NOT_ATTACHED | NVME_DNR;
> +}
> +
> +nvme_ns_detach(ctrl, ns);
> +}
> +}
> +
> +return NVME_SUCCESS;
> +}
> +
>  static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req)
>  {
>  trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode,
> @@ -3797,6 +3854,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest 
> *req)
>  return nvme_get_feature(n, req);
>  case NVME_ADM_CMD_ASYNC_EV_REQ:
>  return nvme_aer(n, req);
> +case NVME_ADM_CMD_NS_ATTACHMENT:
> +return nvme_ns_attachment(n, req);
>  default:
>  assert(false);
>  }
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index 1c7796b20996..5a1ab857d166 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -222,6 +222,11 @@ static inline void nvme_ns_attach(NvmeCtrl *n, 
> NvmeNamespace *ns)
>  n->namespaces[nvme_nsid(ns) - 1] = ns;
>  }
>  
> +static inline void nvme_ns_detach(NvmeCtrl *n, NvmeNamespace *ns)
> +{
> +n->namespaces[nvme_nsid(ns) - 1] = NULL;
> +}
> +
>  static inline NvmeCQueue *nvme_cq(NvmeRequest *req)
>  {
>  NvmeSQueue *sq = req->sq;
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index b6e972d733a6..bf67fe7873d2 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -80,6 +80,8 @@ pci_nvme_aer(uint16_t cid) "cid %"PRIu16""
>  pci_nvme_aer_aerl_exceeded(void) "aerl exceeded"
>  pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 
> 0x%"PRIx8""
>  pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t

[PATCH 2/3] hw/block/nvme: deduplicate bad mdts trace event

2021-02-22 Thread Klaus Jensen

From: Klaus Jensen 

If mdts is exceeded, trace it from a single place.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c   | 6 +-
 hw/block/trace-events | 2 +-
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6a27b28f2c2d..25a7726ca05b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1075,6 +1075,7 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, 
size_t len)
 uint8_t mdts = n->params.mdts;
 
 if (mdts && len > n->page_size << mdts) {
+trace_pci_nvme_err_mdts(len);
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
@@ -1945,7 +1946,6 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest 
*req)
 
 status = nvme_check_mdts(n, len);
 if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), len);
 return status;
 }
 
@@ -2048,7 +2048,6 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
 
 status = nvme_check_mdts(n, data_size);
 if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
 goto invalid;
 }
 
@@ -2116,7 +2115,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
 if (!wrz) {
 status = nvme_check_mdts(n, data_size);
 if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
 goto invalid;
 }
 }
@@ -2610,7 +2608,6 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, 
NvmeRequest *req)
 
 status = nvme_check_mdts(n, data_size);
 if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
 return status;
 }
 
@@ -3052,7 +3049,6 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest 
*req)
 
 status = nvme_check_mdts(n, len);
 if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), len);
 return status;
 }
 
diff --git a/hw/block/trace-events b/hw/block/trace-events
index b04f7a3e1890..e1a85661cf3f 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -114,7 +114,7 @@ pci_nvme_clear_ns_close(uint32_t state, uint64_t slba) 
"zone state=%"PRIu32", sl
 pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", 
slba=%"PRIu64" transitioned to Empty state"
 
 # nvme traces for error conditions
-pci_nvme_err_mdts(uint16_t cid, size_t len) "cid %"PRIu16" len %zu"
+pci_nvme_err_mdts(size_t len) "len %zu"
 pci_nvme_err_req_status(uint16_t cid, uint32_t nsid, uint16_t status, uint8_t 
opc) "cid %"PRIu16" nsid %"PRIu32" status 0x%"PRIx16" opc 0x%"PRIx8""
 pci_nvme_err_addr_read(uint64_t addr) "addr 0x%"PRIx64""
 pci_nvme_err_addr_write(uint64_t addr) "addr 0x%"PRIx64""
-- 
2.30.1

[PATCH 0/3] hw/block/nvme: mdts/zasl cleanup

2021-02-22 Thread Klaus Jensen

From: Klaus Jensen 

The gist of this series is about aligning the zoned.zasl parameter with
the mdts parameter. I complained about this back when I was reviewing
the zoned series but was shot down. I relented on the size/capacity
debate (and still fully support that), but I never really liked that
ZASL is different from MDTS. Changing the definition makes the
validation code much simpler and, well, it aligns perfectly with the
existing mdts parameter, which is the goal here.

While the current definition of zasl is in master, it has not yet been
released, so this is sort of our last chance to change this before v6.0.

I'll repeat the commit message of [3/3] here for context:

ZASL (Zone Append Size Limit) is defined exactly like MDTS (Maximum Data
Transfer Size), that is, it is a value in units of the minimum memory
page size (CAP.MPSMIN) and is reported as a power of two.

The 'mdts' nvme device parameter is specified as in the spec, but the
'zoned.append_size_limit' parameter is specified in bytes. This is
suboptimal for a number of reasons:

  1. It is just plain confusing wrt. the definition of mdts.
  2. There is a lot of complexity involved in validating the value; it
 must be a power of two, it should be larger than 4k, if it is zero
 we set it internally to mdts, but still report it as zero.
  3. While "hw/block/nvme: improve invalid zasl value reporting"
 slightly improved the handling of the parameter, the validation is
 still wrong; it does not depend on CC.MPS, it depends on
 CAP.MPSMIN. And we are not even checking that it is actually less
 than or equal to MDTS, which is kinda the *one* condition it must
 satisfy.

Fix this by defining zasl exactly like mdts and checking the one thing
that it must satisfy (that it is less than or equal to mdts). Also,
change the default value from 128KiB to 0 (aka, whatever mdts is).

Klaus Jensen (3):
  hw/block/nvme: document 'mdts' nvme device parameter
  hw/block/nvme: deduplicate bad mdts trace event
  hw/block/nvme: align zoned.zasl with mdts

 hw/block/nvme.h   |  4 +--
 hw/block/nvme.c   | 67 ++-
 hw/block/trace-events |  4 +--
 3 files changed, 25 insertions(+), 50 deletions(-)

-- 
2.30.1

[PATCH 1/3] hw/block/nvme: document 'mdts' nvme device parameter

2021-02-22 Thread Klaus Jensen

From: Klaus Jensen 

Document the 'mdts' nvme device parameter.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 1cd82fa3c9fe..6a27b28f2c2d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -63,6 +63,12 @@
  *   completion when there are no outstanding AERs. When the maximum number of
  *   enqueued events are reached, subsequent events will be dropped.
  *
+ * - `mdts`
+ *   Indicates the maximum data transfer size for a command that transfers data
+ *   between host-accessible memory and the controller. The value is specified
+ *   as a power of two (2^n) and is in units of the minimum memory page size
+ *   (CAP.MPSMIN). The default value is 7 (i.e. 512 KiB).
+ *
  * - `zoned.append_size_limit`
  *   The maximum I/O size in bytes that is allowed in Zone Append command.
  *   The default is 128KiB. Since internally this this value is maintained as
-- 
2.30.1

[PATCH 3/3] hw/block/nvme: align zoned.zasl with mdts

2021-02-22 Thread Klaus Jensen

From: Klaus Jensen 

ZASL (Zone Append Size Limit) is defined exactly like MDTS (Maximum Data
Transfer Size), that is, it is a value in units of the minimum memory
page size (CAP.MPSMIN) and is reported as a power of two.

The 'mdts' nvme device parameter is specified as in the spec, but the
'zoned.append_size_limit' parameter is specified in bytes. This is
suboptimal for a number of reasons:

  1. It is just plain confusing wrt. the definition of mdts.
  2. There is a lot of complexity involved in validating the value; it
 must be a power of two, it should be larger than 4k, if it is zero
 we set it internally to mdts, but still report it as zero.
  3. While "hw/block/nvme: improve invalid zasl value reporting"
 slightly improved the handling of the parameter, the validation is
 still wrong; it does not depend on CC.MPS, it depends on
 CAP.MPSMIN. And we are not even checking that it is actually less
 than or equal to MDTS, which is kinda the *one* condition it must
 satisfy.

Fix this by defining zasl exactly like mdts and checking the one thing
that it must satisfy (that it is less than or equal to mdts). Also,
change the default value from 128KiB to 0 (aka, whatever mdts is).

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.h   |  4 +---
 hw/block/nvme.c   | 55 ---
 hw/block/trace-events |  2 +-
 3 files changed, 17 insertions(+), 44 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index cb2b5175f1a1..f45ace0cff5b 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -20,7 +20,7 @@ typedef struct NvmeParams {
 uint32_t aer_max_queued;
 uint8_t  mdts;
 bool use_intel_id;
-uint32_t zasl_bs;
+uint8_t  zasl;
 bool legacy_cmb;
 } NvmeParams;
 
@@ -171,8 +171,6 @@ typedef struct NvmeCtrl {
 QTAILQ_HEAD(, NvmeAsyncEvent) aer_queue;
 int aer_queued;
 
-uint8_t zasl;
-
 NvmeSubsystem   *subsys;
 
 NvmeNamespace   namespace;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 25a7726ca05b..edd0b85c10ce 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -69,13 +69,11 @@
  *   as a power of two (2^n) and is in units of the minimum memory page size
  *   (CAP.MPSMIN). The default value is 7 (i.e. 512 KiB).
  *
- * - `zoned.append_size_limit`
- *   The maximum I/O size in bytes that is allowed in Zone Append command.
- *   The default is 128KiB. Since internally this this value is maintained as
- *   ZASL = log2( / ), some values assigned
- *   to this property may be rounded down and result in a lower maximum ZA
- *   data size being in effect. By setting this property to 0, users can make
- *   ZASL to be equal to MDTS. This property only affects zoned namespaces.
+ * - `zoned.zasl`
+ *   Indicates the maximum data transfer size for the Zone Append command. Like
+ *   `mdts`, the value is specified as a power of two (2^n) and is in units of
+ *   the minimum memory page size (CAP.MPSMIN). The default value is 0 (i.e.
+ *   defaulting to the value of `mdts`).
  *
  * nvme namespace device parameters
  * 
@@ -2135,10 +2133,9 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
 goto invalid;
 }
 
-if (nvme_l2b(ns, nlb) > (n->page_size << n->zasl)) {
-trace_pci_nvme_err_append_too_large(slba, nlb, n->zasl);
-status = NVME_INVALID_FIELD;
-goto invalid;
+if (n->params.zasl && data_size > n->page_size << n->params.zasl) {
+trace_pci_nvme_err_zasl(data_size);
+return NVME_INVALID_FIELD | NVME_DNR;
 }
 
 slba = zone->w_ptr;
@@ -3212,9 +3209,8 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, 
NvmeRequest *req)
 if (c->csi == NVME_CSI_NVM) {
 return nvme_rpt_empty_id_struct(n, req);
 } else if (c->csi == NVME_CSI_ZONED) {
-if (n->params.zasl_bs) {
-id.zasl = n->zasl;
-}
+id.zasl = n->params.zasl;
+
 return nvme_dma(n, (uint8_t *), sizeof(id),
 DMA_DIRECTION_FROM_DEVICE, req);
 }
@@ -4088,19 +4084,6 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 nvme_init_sq(>admin_sq, n, n->bar.asq, 0, 0,
  NVME_AQA_ASQS(n->bar.aqa) + 1);
 
-if (!n->params.zasl_bs) {
-n->zasl = n->params.mdts;
-} else {
-if (n->params.zasl_bs < n->page_size) {
-NVME_GUEST_ERR(pci_nvme_err_startfail_zasl_too_small,
-   "Zone Append Size Limit (ZASL) of %d bytes is too "
-   "small; must be at least %d bytes",
-   n->params.zasl_bs, n->page_size);
-return -1;
-}
-n->zasl = 31 - clz32(n->params.zasl_bs / n->page_size);
-}
-
 nvme_set_timestamp(n, 0ULL);
 
 QTAILQ_INIT(>aer_queue);
@@ -4609,17 +4592,10 @@ static void

[PATCH] target/ppc: Fix bcdsub. emulation when result overflows

2021-02-22 Thread Fabiano Rosas

The commit d03b174a83 (target/ppc: simplify bcdadd/sub functions)
meant to simplify some of the code but it inadvertently altered the
way the CR6 field is set after the operation has overflowed.

The CR6 bits are set based on the *unbounded* result of the operation,
so we need to look at the result before returning from bcd_add_mag,
otherwise we will look at 0 when it overflows.

Consider the following subtraction:

v0 = 0x999c (maximum positive BCD value)
v1 = 0x001d (negative one BCD value)
bcdsub. v0,v0,v1,0

The Power ISA 2.07B says:
If the unbounded result is greater than zero, do the following.
  If PS=0, the sign code of the result is set to 0b1100.
  If PS=1, the sign code of the result is set to 0b.
  If the operation overflows, CR field 6 is set to 0b0101. Otherwise,
  CR field 6 is set to 0b0100.

POWER9 hardware:
vr0 = 0x000c (positive zero BCD value)
cr6 = 0b0101 (0x5) (positive, overflow)

QEMU:
vr0 = 0x000c (positive zero BCD value)
cr6 = 0b0011 (0x3) (zero, overflow) <--- wrong

This patch reverts the part of d03b174a83 that introduced the
problem and adds a test-case to avoid further regressions:

before:
$ make run-tcg-tests-ppc64le-linux-user
(...)
  TESTbcdsub on ppc64le
bcdsub: qemu/tests/tcg/ppc64le/bcdsub.c:58: test_bcdsub_gt:
Assertion `(cr >> 4) == ((1 << 2) | (1 << 0))' failed.

Fixes: d03b174a83 (target/ppc: simplify bcdadd/sub functions)
Reported-by: Paul Clarke 
Signed-off-by: Fabiano Rosas 
---
 target/ppc/int_helper.c   |  13 ++-
 tests/tcg/configure.sh|   6 ++
 tests/tcg/ppc64/Makefile.target   |  13 +++
 tests/tcg/ppc64le/Makefile.target |  12 +++
 tests/tcg/ppc64le/bcdsub.c| 130 ++
 5 files changed, 171 insertions(+), 3 deletions(-)
 create mode 100644 tests/tcg/ppc64/Makefile.target
 create mode 100644 tests/tcg/ppc64le/Makefile.target
 create mode 100644 tests/tcg/ppc64le/bcdsub.c

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 0b682a1f94..429de28494 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -2175,14 +2175,17 @@ static int bcd_cmp_mag(ppc_avr_t *a, ppc_avr_t *b)
 return 0;
 }
 
-static void bcd_add_mag(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b, int *invalid,
+static int bcd_add_mag(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b, int *invalid,
int *overflow)
 {
 int carry = 0;
 int i;
+int is_zero = 1;
+
 for (i = 1; i <= 31; i++) {
 uint8_t digit = bcd_get_digit(a, i, invalid) +
 bcd_get_digit(b, i, invalid) + carry;
+is_zero &= (digit == 0);
 if (digit > 9) {
 carry = 1;
 digit -= 10;
@@ -2194,6 +2197,7 @@ static void bcd_add_mag(ppc_avr_t *t, ppc_avr_t *a, 
ppc_avr_t *b, int *invalid,
 }
 
 *overflow = carry;
+return is_zero;
 }
 
 static void bcd_sub_mag(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b, int *invalid,
@@ -2225,14 +2229,15 @@ uint32_t helper_bcdadd(ppc_avr_t *r,  ppc_avr_t *a, 
ppc_avr_t *b, uint32_t ps)
 int sgnb = bcd_get_sgn(b);
 int invalid = (sgna == 0) || (sgnb == 0);
 int overflow = 0;
+int zero = 0;
 uint32_t cr = 0;
 ppc_avr_t result = { .u64 = { 0, 0 } };
 
 if (!invalid) {
 if (sgna == sgnb) {
 result.VsrB(BCD_DIG_BYTE(0)) = bcd_preferred_sgn(sgna, ps);
-bcd_add_mag(, a, b, , );
-cr = bcd_cmp_zero();
+zero = bcd_add_mag(, a, b, , );
+cr = (sgna > 0) ? CRF_GT : CRF_LT;
 } else {
 int magnitude = bcd_cmp_mag(a, b);
 if (magnitude > 0) {
@@ -2255,6 +2260,8 @@ uint32_t helper_bcdadd(ppc_avr_t *r,  ppc_avr_t *a, 
ppc_avr_t *b, uint32_t ps)
 cr = CRF_SO;
 } else if (overflow) {
 cr |= CRF_SO;
+} else if (zero) {
+cr |= CRF_EQ;
 }
 
 *r = result;
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 551c02f469..a0b709948c 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -251,6 +251,12 @@ for target in $target_list; do
 echo "CROSS_CC_HAS_ARMV8_MTE=y" >> $config_target_mak
 fi
 ;;
+ppc*)
+if do_compiler "$target_compiler" $target_compiler_cflags \
+   -mpower8-vector -o $TMPE $TMPC; then
+echo "CROSS_CC_HAS_POWER8_VECTOR=y" >> $config_target_mak
+fi
+;;
 esac
 
 enabled_cross_compilers="$enabled_cross_compilers $target_compiler"
diff --git a/tests/tcg/ppc64/Makefile.target b/tests/tcg/ppc64/Makefile.target
new file mode 100644
index 00..0c6a4585fc
--- /dev/null
+++ b/tests/tcg/ppc64/Makefile.target
@@ -0,0 +1,13 @@
+# -*- Mode: makefile -*-
+#
+# ppc64 specific tweaks
+
+VPATH += $(SRC_PATH)/tests/tcg/ppc64
+VPATH += $(SRC_PATH)/tests/tcg/ppc64le
+
+ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_POWER8_VECTOR),)

Re: [PATCH v2 0/4] improve do_strtosz precision

2021-02-22 Thread Eric Blake

On 2/11/21 2:44 PM, Eric Blake wrote:
> Parsing sizes with only 53 bits of precision is surprising; it's time
> to fix it to use a full 64 bits of precision.
> 
> v1 was here:
> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01800.html
> 
> Since then:
> - split testsuite improvements from code changes [Vladimir]
> - more tests for more corner cases [Vladimir, Rich, Dan]
> - fix handling of '123-45' when endptr is non-NULL [Vladimir]
> - fix handling of '1.k'
> - actually enable deprecation of '0x1k' [Vladimir]
> - include missing deprecation text for rounded fractions
> - improved commit messages
> 
> I'm still not sure I like patch 4, but it's at least worth considering.

Ping. I've also just realized that this series will fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1909185
"The error message of "qemu-img convert -r" should advertise the correct
maximum number"

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: Plugin Address Translations Inconsistent/Incorrect?

2021-02-22 Thread Peter Maydell

On Mon, 22 Feb 2021 at 19:53, Alex Bennée  wrote:
> It certainly is by design. The comment for the helper states:
>
>   /*
>* The following additional queries can be run on the hwaddr structure
>* to return information about it. For non-IO accesses the device
>* offset will be into the appropriate block of RAM.
>*/

That sounds like we're exposing ram_addrs to the plugin. Are we?
I'm not sure that's a good idea, as they're not a guest-relevant
construct.

thanks
-- PMM

Re: [PATCH] linux-user: manage binfmt-misc preserve-arg[0] flag

2021-02-22 Thread Michael Tokarev


22.02.2021 20:09, Laurent Vivier wrote:


Here it is: https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg04639.html


In this case, we don't want to modify QEMU to manage special case based on the 
binary name but
instead use a wrapper:


A wrapper immediately defeats the purpose of the fix-binary flag, unfortunately,
requiring the quemu-foo binary within the chroot again.  Such a wrapper has been
used for quite some time by Suse and others before the fix-binary flag appeared
in kernel. It was the first approach I considered and rejected.

Thanks,

/mjt

Re: [PATCH v2 04/11] hw/arm: Restrit KVM to the virt & versal machines

2021-02-22 Thread BALATON Zoltan


On Fri, 19 Feb 2021, Philippe Mathieu-Daudé wrote:

Restrit KVM to the following ARM machines:


Typo: "Restrict" (also in patch title).

Regards,
BALATON Zoltan


- virt
- xlnx-versal-virt

Signed-off-by: Philippe Mathieu-Daudé 
---
hw/arm/virt.c | 5 +
hw/arm/xlnx-versal-virt.c | 5 +
2 files changed, 10 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 371147f3ae9..8e9861b61a9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2527,6 +2527,10 @@ static HotplugHandler 
*virt_machine_get_hotplug_handler(MachineState *machine,
return NULL;
}

+static const char *const valid_accels[] = {
+"tcg", "kvm", "hvf", NULL
+};
+
/*
 * for arm64 kvm_type [7-0] encodes the requested number of bits
 * in the IPA address space
@@ -2582,6 +2586,7 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
+mc->valid_accelerators = valid_accels;
mc->kvm_type = virt_kvm_type;
assert(!mc->get_hotplug_handler);
mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index 8482cd61960..d424813cae1 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -610,6 +610,10 @@ static void versal_virt_machine_instance_init(Object *obj)
{
}

+static const char *const valid_accels[] = {
+"tcg", "kvm", NULL
+};
+
static void versal_virt_machine_class_init(ObjectClass *oc, void *data)
{
MachineClass *mc = MACHINE_CLASS(oc);
@@ -621,6 +625,7 @@ static void versal_virt_machine_class_init(ObjectClass *oc, 
void *data)
mc->default_cpus = XLNX_VERSAL_NR_ACPUS;
mc->no_cdrom = true;
mc->default_ram_id = "ddr";
+mc->valid_accelerators = valid_accels;
}

static const TypeInfo versal_virt_machine_init_typeinfo = {

Re: Plugin Address Translations Inconsistent/Incorrect?

2021-02-22 Thread Alex Bennée

Aaron Lindsay  writes:

> Hello,
>
> I've been doing some more work with plugins and found something I didn't
> expect with regards to address translation.
>
> If I call (inside a memory callback):
>
> `uint64_t pa = qemu_plugin_hwaddr_device_offset(hwaddr);`
>
> I see that `pa` takes the value 0xe0e58760. If, however, I plumb
> `cpu_get_phys_page_debug` through to the plugin interface and call it
> like:
>
> `pa = cpu_get_phys_page_debug(current_cpu, va);`
>
> I see it takes the value 0x120e58760.
>
> I notice that 0x120e58760-0xe0e58760 is exactly one gigabyte, which is
> also the offset of the beginning of RAM for the 'virt' AArch64 machine
> I'm using. Furthermore, I see the name of the plugin function includes
> "device_offset", so perhaps this discrepancy is by design. However, it
> seems awkward to not be able to get a true physical address.

It certainly is by design. The comment for the helper states:

  /*
   * The following additional queries can be run on the hwaddr structure
   * to return information about it. For non-IO accesses the device
   * offset will be into the appropriate block of RAM.
   */

> I've done some digging and found that inside `qemu_ram_addr_from_host`
> (called by `qemu_plugin_hwaddr_device_offset`), `block->mr->addr`
> appears to hold the offset of the beginning of RAM. 
>
> Do you think it would be reasonable to modify
> `qemu_plugin_hwaddr_device_offset` to add the beginning of the RAM block
> or otherwise return the true physical address (or at least expose a way
> to find the beginning of it through the plugin interface)?

Well the problem here is what is the address map? For example if you
have a secure block of RAM you might have two physical addresses which
are the same. That said with the current qemu_plugin_hwaddr_device_name
helper both will get reported as "RAM" so maybe it's not that helpful
yet.

I also worry about what happens if devices get moved around. Do you end
up with aliasing of address space have a remap of the HW.

That said I think we could add an additional helper to translate a
hwaddr to a global address space address. I'm open to suggestions of the
best way to structure this.

>
> Thanks!
>
> -Aaron

-- 
Alex Bennée

[PATCH v4 5/5] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state

2021-02-22 Thread Daniel Henrique Barboza

Handling errors in memory hotunplug in the pSeries machine is more
complex than any other device type, because there are all the
complications that other devices has, and more.

For instance, determining a timeout for a DIMM hotunplug must consider
if it's a Hash-MMU or a Radix-MMU guest, because Hash guests takes
longer to hotunplug DIMMs. The size of the DIMM is also a factor, given
that longer DIMMs naturally takes longer to be hotunplugged from the
kernel. And there's also the guest memory usage to be considered: if
there's a process that is consuming memory that would be lost by the
DIMM unplug, the kernel will postpone the unplug process until the
process finishes, and then initiate the regular hotunplug process. The
first two considerations are manageable, but the last one is a deal
breaker.

There is no sane way for the pSeries machine to determine the memory
load in the guest when attempting a DIMM hotunplug - and even if there
was a way, the guest can start using all the RAM in the middle of the
unplug process and invalidate our previous assumptions - and in result
we can't even begin to calculate a timeout for the operation. This means
that we can't implement a viable timeout mechanism for memory unplug in
pSeries.

Going back to why we would consider an unplug timeout, the reason is
that we can't know if the kernel is giving up the unplug. Turns out
that, sometimes, we can. Consider a failed memory hotunplug attempt
where the kernel will error out with the following message:

'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any
removed LMBs'

This happens when there is a LMB that the kernel gave up in removing,
and the LMBs previously marked for removal are now being added back.
This happens in the pseries kernel in [1], dlpar_memory_remove_by_ic()
into dlpar_add_lmb(), and after that update_lmb_associativity_index().
In this function, the kernel is configuring the LMB DRC connector again.
Note that this is a valid usage in LOPAR, as stated in section
"ibm,configure-connector RTAS Call":

'A subsequent sequence of calls to ibm,configure-connector with the same
entry from the “ibm,drc-indexes” or “ibm,drc-info” property will restart
the configuration of devices which were not completely configured.'

We can use this kernel behavior in our favor. If a DRC connector
reconfiguration for a LMB that we marked as unplug pending happens, this
indicates that the kernel changed its mind about the unplug and is
reasserting that it will keep using all the LMBs of the DIMM. In this
case, it's safe to assume that the whole DIMM device unplug was
cancelled.

This patch hops into rtas_ibm_configure_connector() and, in the scenario
described above, clear the unplug state for the DIMM device. This will
not solve all the problems we still have with memory unplug, but it will
cover this case where the kernel reconfigures LMBs after a failed
unplug. We are a bit more resilient, without using an unreliable
timeout, and we didn't make the remaining error cases any worse.

[1] arch/powerpc/platforms/pseries/hotplug-memory.c

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c | 43 ++
 hw/ppc/spapr_drc.c | 10 ++
 include/hw/ppc/spapr.h |  2 ++
 3 files changed, 55 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ecce8abf14..6eaddb12cb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3575,6 +3575,49 @@ static SpaprDimmState 
*spapr_recover_pending_dimm_state(SpaprMachineState *ms,
 return spapr_pending_dimm_unplugs_add(ms, avail_lmbs, dimm);
 }
 
+void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
+   DeviceState *dev)
+{
+SpaprDimmState *ds;
+PCDIMMDevice *dimm;
+SpaprDrc *drc;
+uint32_t nr_lmbs;
+uint64_t size, addr_start, addr;
+int i;
+
+if (!dev) {
+return;
+}
+
+dimm = PC_DIMM(dev);
+ds = spapr_pending_dimm_unplugs_find(spapr, dimm);
+
+/*
+ * 'ds == NULL' would mean that the DIMM doesn't have a pending
+ * unplug state, but one of its DRC is marked as unplug_requested.
+ * This is bad and weird enough to g_assert() out.
+ */
+g_assert(ds);
+
+spapr_pending_dimm_unplugs_remove(spapr, ds);
+
+size = memory_device_get_region_size(MEMORY_DEVICE(dimm), _abort);
+nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
+
+addr_start = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
+  _abort);
+
+addr = addr_start;
+for (i = 0; i < nr_lmbs; i++) {
+drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB,
+  addr / SPAPR_MEMORY_BLOCK_SIZE);
+g_assert(drc);
+
+drc->unplug_requested = false;
+addr += SPAPR_MEMORY_BLOCK_SIZE;
+}
+}
+
 /* Callback to be called during DRC release. */
 void spapr_lmb_release(DeviceState *dev)
 {
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c

[PATCH v4 3/5] spapr_drc.c: introduce unplug_timeout_timer

2021-02-22 Thread Daniel Henrique Barboza

The LoPAR spec provides no way for the guest kernel to report failure of
hotplug/hotunplug events. This wouldn't be bad if those operations were
granted to always succeed, but that's far for the reality.

What ends up happening is that, in the case of a failed hotunplug,
regardless of whether it was a QEMU error or a guest misbehavior, the
pSeries machine is retaining the unplug state of the device in the
running guest.  This state is cleanup in machine reset, where it is
assumed that this state represents a device that is pending unplug, and
the device is hotunpluged from the board. Until the reset occurs, any
hotunplug operation of the same device is forbid because there is a
pending unplug state.

This behavior has at least one undesirable side effect. A long standing
pending unplug state is, more often than not, the result of a hotunplug
error. The user had to dealt with it, since retrying to unplug the
device is noy allowed, and then in the machine reset we're removing the
device from the guest. This means that we're failing the user twice -
failed to hotunplug when asked, then hotunplugged without notice.

Solutions to this problem range between trying to predict when the
hotunplug will fail and forbid the operation from the QEMU layer, from
opening up the IRQ queue to allow for multiple hotunplug attempts, from
telling the users to 'reboot the machine if something goes wrong'. The
first solution is flawed because we can't fully predict guest behavior
from QEMU, the second solution is a trial and error remediation that
counts on a hope that the unplug will eventually succeed, and the third
is ... well.

This patch introduces a crude, but effective solution to hotunplug
errors in the pSeries machine. For each unplug done, we'll timeout after
some time. If a certain amount of time passes, we'll cleanup the
hotunplug state from the machine.  During the timeout period, any unplug
operations in the same device will still be blocked. After that, we'll
assume that the guest failed the operation, and allow the user to try
again. If the timeout is too short we'll prevent legitimate hotunplug
situations to occur, so we'll need to overestimate the regular time an
unplug operation takes to succeed to account that.

The true solution for the hotunplug errors in the pSeries machines is a
PAPR change to allow for the guest to warn the platform about it. For
now, the work done in this timeout design can be used for the new PAPR
'abort hcall' in the future, given that for both cases we'll need code
to cleanup the existing unplug states of the DRCs.

At this moment we're adding the basic wiring of the timer into the DRC.
Next patch will use the timer to timeout failed CPU hotunplugs.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_drc.c | 40 ++
 include/hw/ppc/spapr_drc.h |  4 
 2 files changed, 44 insertions(+)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 67041fb212..27adbc5c30 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -57,6 +57,8 @@ static void spapr_drc_release(SpaprDrc *drc)
 drck->release(drc->dev);
 
 drc->unplug_requested = false;
+timer_del(drc->unplug_timeout_timer);
+
 g_free(drc->fdt);
 drc->fdt = NULL;
 drc->fdt_start_offset = 0;
@@ -370,6 +372,17 @@ static void prop_get_fdt(Object *obj, Visitor *v, const 
char *name,
 } while (fdt_depth != 0);
 }
 
+static void spapr_drc_start_unplug_timeout_timer(SpaprDrc *drc)
+{
+SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+if (drck->unplug_timeout_seconds != 0) {
+timer_mod(drc->unplug_timeout_timer,
+  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+  drck->unplug_timeout_seconds * 1000);
+}
+}
+
 void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
 {
 trace_spapr_drc_attach(spapr_drc_index(drc));
@@ -475,11 +488,23 @@ static bool spapr_drc_needed(void *opaque)
 spapr_drc_unplug_requested(drc);
 }
 
+static int spapr_drc_post_load(void *opaque, int version_id)
+{
+SpaprDrc *drc = opaque;
+
+if (drc->unplug_requested) {
+spapr_drc_start_unplug_timeout_timer(drc);
+}
+
+return 0;
+}
+
 static const VMStateDescription vmstate_spapr_drc = {
 .name = "spapr_drc",
 .version_id = 1,
 .minimum_version_id = 1,
 .needed = spapr_drc_needed,
+.post_load = spapr_drc_post_load,
 .fields  = (VMStateField []) {
 VMSTATE_UINT32(state, SpaprDrc),
 VMSTATE_END_OF_LIST()
@@ -490,6 +515,15 @@ static const VMStateDescription vmstate_spapr_drc = {
 }
 };
 
+static void drc_unplug_timeout_cb(void *opaque)
+{
+SpaprDrc *drc = opaque;
+
+if (drc->unplug_requested) {
+drc->unplug_requested = false;
+}
+}
+
 static void drc_realize(DeviceState *d, Error **errp)
 {
 SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
@@ -512,6 +546,11 @@ static void drc_realize(DeviceState *d, Error **errp)

[PATCH v4 4/5] spapr_drc.c: add hotunplug timeout for CPUs

2021-02-22 Thread Daniel Henrique Barboza

There is a reliable way to make a CPU hotunplug fail in the pseries
machine. Hotplug a CPU A, then offline all other CPUs inside the guest
but A. When trying to hotunplug A the guest kernel will refuse to do it,
because A is now the last online CPU of the guest. PAPR has no 'error
callback' in this situation to report back to the platform, so the guest
kernel will deny the unplug in silent and QEMU will never know what
happened. The unplug pending state of A will remain until the guest is
shutdown or rebooted.

Previous attempts of fixing it (see [1] and [2]) were aimed at trying to
mitigate the effects of the problem. In [1] we were trying to guess
which guest CPUs were online to forbid hotunplug of the last online CPU
in the QEMU layer, avoiding the scenario described above because QEMU is
now failing in behalf of the guest. This is not robust because the last
online CPU of the guest can change while we're in the middle of the
unplug process, and our initial assumptions are now invalid. In [2] we
were accepting that our unplug process is uncertain and the user should
be allowed to spam the IRQ hotunplug queue of the guest in case the CPU
hotunplug fails.

This patch presents another alternative, using the timeout
infrastructure introduced in the previous patch. CPU hotunplugs in the
pSeries machine will now timeout after 15 seconds. This is a long time
for a single CPU unplug to occur, regardless of guest load - although
the user is *strongly* encouraged to *not* hotunplug devices from a
guest under high load - and we can be sure that something went wrong if
it takes longer than that for the guest to release the CPU (the same
can't be said about memory hotunplug - more on that in the next patch).

Timing out the unplug operation will reset the unplug state of the CPU
and allow the user to try it again, regardless of the error situation
that prevented the hotunplug to occur. Of all the not so pretty
fixes/mitigations for CPU hotunplug errors in pSeries, timing out the
operation is an admission that we have no control in the process, and
must assume the worst case if the operation doesn't succeed in a
sensible time frame.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg03353.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html

Reported-by: Xujun Ma 
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414
Reviewed-by: David Gibson 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c |  4 
 hw/ppc/spapr_drc.c | 13 +
 include/hw/ppc/spapr_drc.h |  1 +
 3 files changed, 18 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b066df68cb..ecce8abf14 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3724,6 +3724,10 @@ void spapr_core_unplug_request(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 if (!spapr_drc_unplug_requested(drc)) {
 spapr_drc_unplug_request(drc);
 spapr_hotplug_req_remove_by_index(drc);
+} else {
+error_setg(errp, "core-id %d unplug is still pending, %d seconds "
+   "timeout remaining",
+   cc->core_id, spapr_drc_unplug_timeout_remaining_sec(drc));
 }
 }
 
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 27adbc5c30..fd2e45640f 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -409,6 +409,8 @@ void spapr_drc_unplug_request(SpaprDrc *drc)
 
 drc->unplug_requested = true;
 
+spapr_drc_start_unplug_timeout_timer(drc);
+
 if (drc->state != drck->empty_state) {
 trace_spapr_drc_awaiting_quiesce(spapr_drc_index(drc));
 return;
@@ -417,6 +419,16 @@ void spapr_drc_unplug_request(SpaprDrc *drc)
 spapr_drc_release(drc);
 }
 
+int spapr_drc_unplug_timeout_remaining_sec(SpaprDrc *drc)
+{
+if (drc->unplug_requested && timer_pending(drc->unplug_timeout_timer)) {
+return (qemu_timeout_ns_to_ms(drc->unplug_timeout_timer->expire_time) -
+qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL)) / 1000;
+}
+
+return 0;
+}
+
 bool spapr_drc_reset(SpaprDrc *drc)
 {
 SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
@@ -710,6 +722,7 @@ static void spapr_drc_cpu_class_init(ObjectClass *k, void 
*data)
 drck->drc_name_prefix = "CPU ";
 drck->release = spapr_core_release;
 drck->dt_populate = spapr_core_dt_populate;
+drck->unplug_timeout_seconds = 15;
 }
 
 static void spapr_drc_pci_class_init(ObjectClass *k, void *data)
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index 38ec4c8091..26599c385a 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -248,6 +248,7 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, 
uint32_t drc_type_mask);
  */
 void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
 void spapr_drc_unplug_request(SpaprDrc *drc);
+int spapr_drc_unplug_timeout_remaining_sec(SpaprDrc *drc);
 
 /*
  * Reset all DRCs, causing pending hot-plug/unplug requests to complete.
-- 
2.29.2

[PATCH v4 2/5] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()

2021-02-22 Thread Daniel Henrique Barboza

spapr_drc_detach() is not the best name for what the function does. The
function does not detach the DRC, it makes an uncommited attempt to do
it.  It'll mark the DRC as pending unplug, via the 'unplug_request'
flag, and only if the DRC state is drck->empty_state it will detach the
DRC, via spapr_drc_release().

This is a contrast with its pair spapr_drc_attach(), where the function
is indeed creating the DRC QOM object. If you know what
spapr_drc_attach() does, you can be misled into thinking that
spapr_drc_detach() is removing the DRC from QEMU internal state, which
isn't true.

The current role of this function is better described as a request for
detach, since there's no guarantee that we're going to detach the DRC in
the end.  Rename the function to spapr_drc_unplug_request to reflect
what is is doing.

The initial idea was to change the name to spapr_drc_detach_request(),
and later on change the unplug_request flag to detach_request. However,
unplug_request is a migratable boolean for a long time now and renaming
it is not worth the trouble. spapr_drc_unplug_request() setting
drc->unplug_request is more natural than spapr_drc_detach_request
setting drc->unplug_request.

Reviewed-by: Greg Kurz 
Reviewed-by: David Gibson 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c | 6 +++---
 hw/ppc/spapr_drc.c | 4 ++--
 hw/ppc/spapr_pci.c | 4 ++--
 hw/ppc/trace-events| 2 +-
 include/hw/ppc/spapr_drc.h | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 85fe65f894..b066df68cb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3654,7 +3654,7 @@ static void spapr_memory_unplug_request(HotplugHandler 
*hotplug_dev,
   addr / SPAPR_MEMORY_BLOCK_SIZE);
 g_assert(drc);
 
-spapr_drc_detach(drc);
+spapr_drc_unplug_request(drc);
 addr += SPAPR_MEMORY_BLOCK_SIZE;
 }
 
@@ -3722,7 +3722,7 @@ void spapr_core_unplug_request(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 g_assert(drc);
 
 if (!spapr_drc_unplug_requested(drc)) {
-spapr_drc_detach(drc);
+spapr_drc_unplug_request(drc);
 spapr_hotplug_req_remove_by_index(drc);
 }
 }
@@ -3985,7 +3985,7 @@ static void spapr_phb_unplug_request(HotplugHandler 
*hotplug_dev,
 assert(drc);
 
 if (!spapr_drc_unplug_requested(drc)) {
-spapr_drc_detach(drc);
+spapr_drc_unplug_request(drc);
 spapr_hotplug_req_remove_by_index(drc);
 }
 }
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 555a25517d..67041fb212 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -386,11 +386,11 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
  NULL, 0);
 }
 
-void spapr_drc_detach(SpaprDrc *drc)
+void spapr_drc_unplug_request(SpaprDrc *drc)
 {
 SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 
-trace_spapr_drc_detach(spapr_drc_index(drc));
+trace_spapr_drc_unplug_request(spapr_drc_index(drc));
 
 g_assert(drc->dev);
 
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index f1c7479816..b00e9609ae 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1723,12 +1723,12 @@ static void spapr_pci_unplug_request(HotplugHandler 
*plug_handler,
  * functions, even if their unplug weren't requested
  * beforehand.
  */
-spapr_drc_detach(func_drc);
+spapr_drc_unplug_request(func_drc);
 }
 }
 }
 
-spapr_drc_detach(drc);
+spapr_drc_unplug_request(drc);
 
 /* if this isn't func 0, defer unplug event. otherwise signal removal
  * for all present functions
diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
index 1e91984526..b4bbfbb013 100644
--- a/hw/ppc/trace-events
+++ b/hw/ppc/trace-events
@@ -50,7 +50,7 @@ spapr_drc_set_allocation_state(uint32_t index, int state) 
"drc: 0x%"PRIx32", sta
 spapr_drc_set_allocation_state_finalizing(uint32_t index) "drc: 0x%"PRIx32
 spapr_drc_set_configured(uint32_t index) "drc: 0x%"PRIx32
 spapr_drc_attach(uint32_t index) "drc: 0x%"PRIx32
-spapr_drc_detach(uint32_t index) "drc: 0x%"PRIx32
+spapr_drc_unplug_request(uint32_t index) "drc: 0x%"PRIx32
 spapr_drc_awaiting_quiesce(uint32_t index) "drc: 0x%"PRIx32
 spapr_drc_reset(uint32_t index) "drc: 0x%"PRIx32
 spapr_drc_realize(uint32_t index) "drc: 0x%"PRIx32
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index 8982927d5c..02a63b3666 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -243,7 +243,7 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, 
uint32_t drc_type_mask);
  * beforehand (eg. check drc->dev at pre-plug).
  */
 void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
-void spapr_drc_detach(SpaprDrc *drc);
+void spapr_drc_unplug_request(SpaprDrc *drc);
 
 /*
  * Reset all DRCs, causing

[PATCH v4 0/5] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration

2021-02-22 Thread Daniel Henrique Barboza

Hi,

This new version contains fixes proposed during the review of v3.
Patches were rebased on top of David's ppc-for-6.0.


changes from v3:
- former patch 1: already pushed to ppc-for-6.0
- former patch 2: dropped
- all patches: commit message trimmed to < 76 chars per line
- all patches: added R-bs from previous review
- patch 3:
* removed the migratable state of the unplug timer
* added a 'spapr_drc_start_unplug_timeout_timer()' helper to start the timer
* added a .post_load implementation to vmstate_spapr_drc, pointed to
  a new spapr_drc_post_load() function
* spapr_drc_post_load() starts the DRC unplug timer from the beginning using
  spapr_drc_start_unplug_timeout_timer()

- patch 4:
* use spapr_drc_start_unplug_timeout_timer() to start the timer in 
spapr_drc_unplug_request()
(To David: I kept your Reviewed-by in this patch despite this change - feel 
free
 to review it again)

- patch 5:
* removed the 'DIMM' wording when referring to kernel internals
* move the g_assert() to spapr_clear_pending_dimm_unplug_state()
* do not g_assert(dev), but g_assert(ds) if dev != NULL inside
  spapr_clear_pending_dimm_unplug_state()

- v3 link: https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg04196.html

Daniel Henrique Barboza (5):
  spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable
  spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
  spapr_drc.c: introduce unplug_timeout_timer
  spapr_drc.c: add hotunplug timeout for CPUs
  spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state

 hw/ppc/spapr.c | 53 ++--
 hw/ppc/spapr_drc.c | 99 +++---
 hw/ppc/spapr_pci.c |  4 +-
 hw/ppc/trace-events|  2 +-
 include/hw/ppc/spapr.h |  2 +
 include/hw/ppc/spapr_drc.h |  7 ++-
 6 files changed, 142 insertions(+), 25 deletions(-)

-- 
2.29.2

[PATCH v4 1/5] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable

2021-02-22 Thread Daniel Henrique Barboza

When moving a physical DRC to "Available", drc_isolate_physical() will
move the DRC state to STATE_PHYSICAL_POWERON and, if the DRC is marked
for unplug, call spapr_drc_detach(). For physical DRCs,
drck->empty_state is STATE_PHYSICAL_POWERON, meaning that we're sure
that spapr_drc_detach() will end up calling spapr_drc_release() in the
end.

Likewise, for logical DRCs, drc_set_unusable will move the DRC to
"Unusable" state, setting drc->state to STATE_LOGICAL_UNUSABLE, which is
the drck->empty_state for logical DRCs. spapr_drc_detach() will call
spapr_drc_release() in this case as well.

In both scenarios, spapr_drc_detach() is being used as a
spapr_drc_release(), wrapper, where we also set unplug_requested (which
is already true, otherwise spapr_drc_detach() wouldn't be called in the
first place) and check if drc->state == drck->empty_state, which we also
know it's guaranteed to be true because we just set it.

Just use spapr_drc_release() in these functions to be clear of our
intentions in both these functions.

Reviewed-by: Greg Kurz 
Reviewed-by: David Gibson 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_drc.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 84bd3c881f..555a25517d 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -50,6 +50,20 @@ uint32_t spapr_drc_index(SpaprDrc *drc)
 | (drc->id & DRC_INDEX_ID_MASK);
 }
 
+static void spapr_drc_release(SpaprDrc *drc)
+{
+SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+drck->release(drc->dev);
+
+drc->unplug_requested = false;
+g_free(drc->fdt);
+drc->fdt = NULL;
+drc->fdt_start_offset = 0;
+object_property_del(OBJECT(drc), "device");
+drc->dev = NULL;
+}
+
 static uint32_t drc_isolate_physical(SpaprDrc *drc)
 {
 switch (drc->state) {
@@ -68,7 +82,7 @@ static uint32_t drc_isolate_physical(SpaprDrc *drc)
 if (drc->unplug_requested) {
 uint32_t drc_index = spapr_drc_index(drc);
 trace_spapr_drc_set_isolation_state_finalizing(drc_index);
-spapr_drc_detach(drc);
+spapr_drc_release(drc);
 }
 
 return RTAS_OUT_SUCCESS;
@@ -209,7 +223,7 @@ static uint32_t drc_set_unusable(SpaprDrc *drc)
 if (drc->unplug_requested) {
 uint32_t drc_index = spapr_drc_index(drc);
 trace_spapr_drc_set_allocation_state_finalizing(drc_index);
-spapr_drc_detach(drc);
+spapr_drc_release(drc);
 }
 
 return RTAS_OUT_SUCCESS;
@@ -372,20 +386,6 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
  NULL, 0);
 }
 
-static void spapr_drc_release(SpaprDrc *drc)
-{
-SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
-
-drck->release(drc->dev);
-
-drc->unplug_requested = false;
-g_free(drc->fdt);
-drc->fdt = NULL;
-drc->fdt_start_offset = 0;
-object_property_del(OBJECT(drc), "device");
-drc->dev = NULL;
-}
-
 void spapr_drc_detach(SpaprDrc *drc)
 {
 SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
-- 
2.29.2

Re: [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions

2021-02-22 Thread David Hildenbrand


The main motivation is to let listener decide how it wants to handle the
memory region. For example, for vhost, vdpa, kvm, ... I only want a
single region, not separate ones for each and every populated range,
punching out discarded ranges. Note that there are cases (i.e.,
anonymous memory), where it's even valid for the guest to read discarded
memory.


Yes, I agree with that.  You would still have the same
region-add/region_nop/region_del callbacks for KVM and friends; on top
of that you would have region_populate/region_discard callbacks for VFIO.


I think instead of region_populate/region_discard we would want
individual region_add/region_del when populating/discarding for all
MemoryListeners that opt-in somehow (e.g., VFIO, dump-guest-memory,
...). Similarly, we would want to call log_sync()/log_clear() then only
for these parts.

But what happens when I populate/discard some memory? I don't want to
trigger an address space transaction (begin()...region_nop()...commit())
- whenever I populate/discard memory (e.g., in 2 MB granularity).
Especially not, if nothing might have changed for most other
MemoryListeners.


Right, that was the reason why I was suggesting different callbacks.
For the VFIO listener, which doesn't have begin or commit callbacks, I
think you could just rename region_add to region_populate, and point
both region_del and region_discard to the existing region_del commit.

Calling log_sync/log_clear only for populated parts also makes sense.
log_sync and log_clear do not have to be within begin/commit, so you can
change the semantics to call them more than once.


So I looked at the simplest of all cases (discard) and I am not convinced yet
that this is the right approach. I can understand why it looks like this fits
into the MemoryListener, but I am not sure if gives us any real benefits or
makes the code any clearer (I'd even say it's the contrary).


+void memory_region_notify_discard(MemoryRegion *mr, hwaddr offset,
+  hwaddr size)
+{
+hwaddr mr_start, mr_end;
+MemoryRegionSection mrs;
+MemoryListener *listener;
+AddressSpace *as;
+FlatView *view;
+FlatRange *fr;
+
+QTAILQ_FOREACH(listener, _listeners, link) {
+if (!listener->region_discard) {
+continue;
+}
+as = listener->address_space;
+view = address_space_get_flatview(as);
+FOR_EACH_FLAT_RANGE(fr, view) {
+if (fr->mr != mr) {
+continue;
+}
+
+mrs = section_from_flat_range(fr, view);
+
+mr_start = MAX(mrs.offset_within_region, offset);
+mr_end = MIN(offset + size,
+  mrs.offset_within_region + int128_get64(mrs.size));
+mr_end = MIN(mr_end, offset + size);
+
+if (mr_start >= mr_end) {
+continue;
+}
+
+mrs.offset_within_address_space += mr_start -
+   mrs.offset_within_region;
+mrs.offset_within_region = mr_start;
+mrs.size = int128_make64(mr_end - mr_start);
+listener->region_discard(listener, );
+}
+flatview_unref(view);
+}
+}

Maybe I am missing something important. This looks highly inefficient.

1. Although we know the memory region we have to walk over the whole address
   space ... over and over again for each potential listener.

2. Even without any applicable listeners (=> ! VFIO) we loop over all listeners.
   There are ways around that but it doesn't make the code nicer IMHO.

3. In the future I am planning on sending populate/discard events
   without the BQL (in my approach, synchronizing internally against
   register/unregister/populate/discard ...). I don't see an easy way
   to achieve that here. I think we are required to hold the BQL on any
   updates.

memory_region_notify_populate() gets quite ugly when we realize halfway that
we have to revert what we already did by notifying about already populated
pieces ...

--
Thanks,

David / dhildenb

[PATCH 0/3] gitlab-pipeline-status script: provide more information on errors

2021-02-22 Thread Cleber Rosa

When things go wrong with the GitLab API requests, it's useful to give
users more information about the possible causes.

Cleber Rosa (3):
  scripts/ci/gitlab-pipeline-status: split utlity function for HTTP GET
  scripts/ci/gitlab-pipeline-status: give more information on failures
  scripts/ci/gitlab-pipeline-status: give more info when pipeline not
found

 scripts/ci/gitlab-pipeline-status | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

-- 
2.25.4

1 2 3 4 5 >

1 - 100 of 430 matches

Mail list logo