[PATCH v2 1/2] vnc: fix VNC artifacts

2020-01-20 Thread Cameron Esfahani via
Patch de3f7de7f4e257ce44cdabb90f5f17ee99624557 was too simplistic in its
implementation: it didn't account for the ZLIB z_stream mutating with
each compression.  Because of the mutation, simply resetting the output
buffer's offset wasn't sufficient to "rewind" the operation.  The mutated
z_stream would generate future zlib blocks which referred to symbols in
past blocks which weren't sent.  This would lead to artifacting.

This reverts commit de3f7de7f4e257ce44cdabb90f5f17ee99624557.

Fixes:  ("vnc: allow fall back to RAW encoding")
Signed-off-by: Cameron Esfahani 
---
 ui/vnc.c | 20 ++--
 1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 4100d6e404..3e8d1f1207 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -898,8 +898,6 @@ int vnc_raw_send_framebuffer_update(VncState *vs, int x, 
int y, int w, int h)
 int vnc_send_framebuffer_update(VncState *vs, int x, int y, int w, int h)
 {
 int n = 0;
-bool encode_raw = false;
-size_t saved_offs = vs->output.offset;
 
 switch(vs->vnc_encoding) {
 case VNC_ENCODING_ZLIB:
@@ -922,24 +920,10 @@ int vnc_send_framebuffer_update(VncState *vs, int x, int 
y, int w, int h)
 n = vnc_zywrle_send_framebuffer_update(vs, x, y, w, h);
 break;
 default:
-encode_raw = true;
+vnc_framebuffer_update(vs, x, y, w, h, VNC_ENCODING_RAW);
+n = vnc_raw_send_framebuffer_update(vs, x, y, w, h);
 break;
 }
-
-/* If the client has the same pixel format as our internal buffer and
- * a RAW encoding would need less space fall back to RAW encoding to
- * save bandwidth and processing power in the client. */
-if (!encode_raw && vs->write_pixels == vnc_write_pixels_copy &&
-12 + h * w * VNC_SERVER_FB_BYTES <= (vs->output.offset - saved_offs)) {
-vs->output.offset = saved_offs;
-encode_raw = true;
-}
-
-if (encode_raw) {
-vnc_framebuffer_update(vs, x, y, w, h, VNC_ENCODING_RAW);
-n = vnc_raw_send_framebuffer_update(vs, x, y, w, h);
-}
-
 return n;
 }
 
-- 
2.24.0




Re: [PATCH qemu v5] spapr: Kill SLOF

2020-01-20 Thread David Gibson
On Fri, Jan 10, 2020 at 01:09:25PM +1100, Alexey Kardashevskiy wrote:
> The Petitboot bootloader is way more advanced than SLOF is ever going to
> be as Petitboot comes with the full-featured Linux kernel with all
> the drivers, and initramdisk with quite user friendly interface.
> The problem with ditching SLOF is that an unmodified pseries kernel can
> either start via:
> 1. kexec, this requires presence of RTAS and skips
> ibm,client-architecture-support entirely;
> 2. normal boot, this heavily relies on the OF1275 client interface to
> fetch the device tree and do early setup (claim memory).
> 
> This adds a new bios-less mode to the pseries machine: "bios=on|off".
> When enabled, QEMU does not load SLOF and jumps to the kernel from
> "-kernel".

I don't love the name "bios" for this flag, since BIOS tends to refer
to old-school x86 firmware.  Given the various plans we're considering
the future, I'd suggest "firmware=slof" for the current in-guest SLOF
mode, and say "firmware=vof" (Virtual Open Firmware) for the new
model.  We can consider firmware=petitboot or firmware=none (for
direct kexec-style boot into -kernel) or whatever in the future

> The client interface is implemented exactly as RTAS - a 20 bytes blob,
> right next after the RTAS blob. The entry point is passed to the kernel
> via GPR5.
> 
> This implements a handful of client interface methods just to get going.
> In particular, this implements the device tree fetching,
> ibm,client-architecture-support and instantiate-rtas.
> 
> This implements changing FDT properties for RTAS (for vmlinux and zImage)
> and initramdisk location (for zImage). To make this work, this skips
> fdt_pack() when bios=off as not packing the blob leaves some room for
> appending.
> 
> This assigns "phandles" to device tree nodes as there is no more SLOF
> and OF nodes addresses of which served as phandle values.
> This keeps predefined nodes (such as XICS/NVLINK/...) unchanged.
> phandles are regenerated at every FDT rebuild.
> 
> When bios=off, this adds "/chosen" every time QEMU builds a tree.
> 
> This implements "claim" which the client (Linux) uses for memory
> allocation; this is also  used by QEMU for claiming kernel/initrd images,
> client interface entry point, RTAS and the initial stack.
> 
> While at this, add a "kernel-addr" machine parameter to allow moving
> the kernel in memory. This is useful for debugging if the kernel is
> loaded at @0, although not necessary.
> 
> This adds basic instances support which are managed by a hashmap
> ihandle->[phandle, DeviceState, Chardev].
> 
> Note that a 64bit PCI fix is required for Linux:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735e
> 
> The test command line:
> 
> qemu-system-ppc64 \
> -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> -mon id=MON0,chardev=STDIO0,mode=readline \
> -nographic \
> -vga none \
> -kernel pbuild/kernel-le-guest/arch/powerpc/boot/zImage.pseries \
> -machine pseries,bios=off,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken \
> -m 4G \
> -enable-kvm \
> -initrd pb/rootfs.cpio.xz \
> -device nec-usb-xhci,id=nec-usb-xhci0 \
> -netdev tap,id=TAP0,helper=/home/aik/qemu-bridge-helper --br=br0 \
> -device virtio-net-pci,id=vnet0,netdev=TAP0 img/f30le.qcow2 \
> -snapshot \
> -smp 8,threads=8 \
> -trace events=qemu_trace_events \
> -d guest_errors \
> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh54088 \
> -mon chardev=SOCKET0,mode=control
> 
> Signed-off-by: Alexey Kardashevskiy 

It'd be nice to split this patch up a bit, though I'll admit it's not
very obvious where to do so.

> ---
> Changes:
> v5:
> * made instances keep device and chardev pointers
> * removed VIO dependencies
> * print error if RTAS memory is not claimed as it should have been
> * pack FDT as "quiesce"
> 
> v4:
> * fixed open
> * validate ihandles in "call-method"
> 
> v3:
> * fixed phandles allocation
> * s/__be32/uint32_t/ as we do not normally have __be32 type in qemu
> * fixed size of /chosen/stdout
> * bunch of renames
> * do not create rtas properties at all, let the client deal with it;
> instead setprop allows changing these in the FDT
> * no more packing FDT when bios=off - nobody needs it and getprop does not
> work otherwise
> * allow updating initramdisk device tree properties (for zImage)
> * added instances
> * fixed stdout on OF's "write"
> * removed special handling for stdout in OF client, spapr-vty handles it
> instead
> 
> v2:
> * fixed claim()
> * added "setprop"
> * cleaner client interface and RTAS blobs management
> * boots to petitboot and further to the target system
> * more trace points
> ---
>  hw/ppc/Makefile.objs |   1 +
>  include/hw/ppc/spapr.h   |  28 +-
>  hw/ppc/spapr.c   | 266 ++--
>  hw/ppc/spapr_hcall.c |  74 +++--
>  hw/ppc/spapr_of_client.c | 633 +++
>  hw/ppc/trace-events  |  12 

[PATCH v2 2/2] vnc: prioritize ZRLE compression over ZLIB

2020-01-20 Thread Cameron Esfahani via
In my investigation, ZRLE always compresses better than ZLIB so
prioritize ZRLE over ZLIB, even if the client hints that ZLIB is
preferred.

zlib buffer is always reset in zrle_compress_data(), so using offset to
calculate next_out and avail_out is useless.

Signed-off-by: Cameron Esfahani 
---
 ui/vnc-enc-zrle.c |  4 ++--
 ui/vnc.c  | 11 +--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/ui/vnc-enc-zrle.c b/ui/vnc-enc-zrle.c
index 17fd28a2e2..b4f71e32cf 100644
--- a/ui/vnc-enc-zrle.c
+++ b/ui/vnc-enc-zrle.c
@@ -98,8 +98,8 @@ static int zrle_compress_data(VncState *vs, int level)
 /* set pointers */
 zstream->next_in = vs->zrle->zrle.buffer;
 zstream->avail_in = vs->zrle->zrle.offset;
-zstream->next_out = vs->zrle->zlib.buffer + vs->zrle->zlib.offset;
-zstream->avail_out = vs->zrle->zlib.capacity - vs->zrle->zlib.offset;
+zstream->next_out = vs->zrle->zlib.buffer;
+zstream->avail_out = vs->zrle->zlib.capacity;
 zstream->data_type = Z_BINARY;
 
 /* start encoding */
diff --git a/ui/vnc.c b/ui/vnc.c
index 3e8d1f1207..1d7138a3a0 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -2071,8 +2071,15 @@ static void set_encodings(VncState *vs, int32_t 
*encodings, size_t n_encodings)
 break;
 #endif
 case VNC_ENCODING_ZLIB:
-vs->features |= VNC_FEATURE_ZLIB_MASK;
-vs->vnc_encoding = enc;
+/*
+ * VNC_ENCODING_ZRLE compresses better than VNC_ENCODING_ZLIB.
+ * So prioritize ZRLE, even if the client hints that it prefers
+ * ZLIB.
+ */
+if ((vs->features & VNC_FEATURE_ZRLE_MASK) == 0) {
+vs->features |= VNC_FEATURE_ZLIB_MASK;
+vs->vnc_encoding = enc;
+}
 break;
 case VNC_ENCODING_ZRLE:
 vs->features |= VNC_FEATURE_ZRLE_MASK;
-- 
2.24.0




[PATCH v2 0/2] vnc: fix VNC artifacts

2020-01-20 Thread Cameron Esfahani via
Remove VNC optimization to reencode framebuffer update as raw if it's
smaller than the default encoding.  QEMU's implementation was naive and
didn't account for the ZLIB z_stream mutating with each compression.  Just
saving and restoring the output buffer offset wasn't sufficient to "rewind"
the previous encoding.  Considering that ZRLE is never larger than raw and
even though ZLIB can occasionally be fractionally larger than raw, the
overhead of implementing this optimization correctly isn't worth it.

While debugging this, I noticed ZRLE always compresses better than ZLIB.
Prioritize ZRLE over ZLIB, even if the client hints that ZLIB is preferred.

Cameron Esfahani (2):
  vnc: fix VNC artifacts
  vnc: prioritize ZRLE compression over ZLIB

 ui/vnc-enc-zrle.c |  4 ++--
 ui/vnc.c  | 31 +++
 2 files changed, 13 insertions(+), 22 deletions(-)

-- 
2.24.0




Re: Making QEMU easier for management tools and applications

2020-01-20 Thread Markus Armbruster
Stefan Hajnoczi  writes:

> On Wed, Jan 15, 2020 at 01:15:17PM +0100, Markus Armbruster wrote:
>> Christophe de Dinechin  writes:
>> >> On 15 Jan 2020, at 10:20, Markus Armbruster  wrote:
>> * qemuMonitorJSONSetIOThread() uses it to control iothread's properties
>>   poll-max-ns, poll-grow, poll-shrink.  Their use with -object is
>>   documented (in qemu-options.hx), their use with qom-set is not.
>
> I'm happy to use a different interface.
>
> Writing a boilerplate "iothread-set-poll-params" QMP command in C would
> be a step backwards.

No argument.

> Maybe the QAPI code generator could map something like this:
>
>   { 'command': 'iothread-set-poll-params',
> 'data': {
> 'id': 'str',
>   '*max-ns': 'uint64',
>   '*grow': 'uint64',
>   '*shrink': 'uint64'
> },
> 'map-to-qom-set': 'IOThread'
>   }
>
> And turn it into QOM accessors on the IOThread object.

I think a generic "set this configuration to that value" command is just
fine.  qom-set fails on several counts, though:

* Tolerable: qom-set is not actually generic, it applies only to QOM.

* qom-set lets you set tons of stuff that is not meant to be changed at
  run time.  If it breaks your guest, you get to keep the pieces.

* There is virtually no documentation on what can be set to what values,
  and their semantics.

In its current state, QOM is a user interface superfund site.




[PATCH v1 1/1] target/riscv: Correctly implement TSR trap

2020-01-20 Thread Alistair Francis
As reported in: https://bugs.launchpad.net/qemu/+bug/1851939 we weren't
correctly handling illegal instructions based on the value of MSTATUS_TSR
and the current privledge level.

This patch fixes the issue raised in the bug by raising an illegal
instruction if TSR is set and we are in S-Mode.

Signed-off-by: Alistair Francis 
---
 target/riscv/op_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
index 331cc36232..eed8eea6f2 100644
--- a/target/riscv/op_helper.c
+++ b/target/riscv/op_helper.c
@@ -83,7 +83,7 @@ target_ulong helper_sret(CPURISCVState *env, target_ulong 
cpu_pc_deb)
 }
 
 if (env->priv_ver >= PRIV_VERSION_1_10_0 &&
-get_field(env->mstatus, MSTATUS_TSR)) {
+get_field(env->mstatus, MSTATUS_TSR) && !(env->priv >= PRV_M)) {
 riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
 }
 
-- 
2.24.1




Re: [RFC PATCH] qapi: Incorrect attempt to fix building with MC146818RTC=n

2020-01-20 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 13/01/20 15:01, Markus Armbruster wrote:
>> Philippe Mathieu-Daudé  writes:
>> 
>>> When configured with --without-default-devices and setting
>>> MC146818RTC=n, the build fails:
>>>
>>> LINKx86_64-softmmu/qemu-system-x86_64
>>>   /usr/bin/ld: qapi/qapi-commands-misc-target.o: in function 
>>> `qmp_marshal_rtc_reset_reinjection':
>>>   qapi/qapi-commands-misc-target.c:46: undefined reference to 
>>> `qmp_rtc_reset_reinjection'
>>>   /usr/bin/ld: qapi/qapi-commands-misc-target.c:46: undefined reference to 
>>> `qmp_rtc_reset_reinjection'
>>>   collect2: error: ld returned 1 exit status
>>>   make[1]: *** [Makefile:206: qemu-system-x86_64] Error 1
>>>   make: *** [Makefile:483: x86_64-softmmu/all] Error 2
>>>
>>> This patch tries to fix this, but this is incorrect because QAPI
>>> scripts only provide TARGET definitions, so with MC146818RTC=y we
>>> get:
>>>
>>>   hw/rtc/mc146818rtc.c:113:6: error: no previous prototype for 
>>> ‘qmp_rtc_reset_reinjection’ [-Werror=missing-prototypes]
>>> 113 | void qmp_rtc_reset_reinjection(Error **errp)
>>> |  ^
>>>   cc1: all warnings being treated as errors
>>>   make[1]: *** [rules.mak:69: hw/rtc/mc146818rtc.o] Error 1
>>>
>>> Any idea? :)
>>>
>>> Signed-off-by: Philippe Mathieu-Daudé 
>>> ---
>>>  qapi/misc-target.json | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/qapi/misc-target.json b/qapi/misc-target.json
>>> index a00fd821eb..8e49c113d1 100644
>>> --- a/qapi/misc-target.json
>>> +++ b/qapi/misc-target.json
>>> @@ -41,7 +41,7 @@
>>>  #
>>>  ##
>>>  { 'command': 'rtc-reset-reinjection',
>>> -  'if': 'defined(TARGET_I386)' }
>>> +  'if': 'defined(TARGET_I386) && defined(CONFIG_MC146818RTC)' }
>>>  
>>>  
>>>  ##
>> 
>> The generated qapi-commands-misc-target.h duly has
>> 
>> #if defined(TARGET_I386) && defined(CONFIG_MC146818RTC)
>> void qmp_rtc_reset_reinjection(Error **errp);
>> void qmp_marshal_rtc_reset_reinjection(QDict *args, QObject **ret, Error 
>> **errp);
>> #endif /* defined(TARGET_I386) && defined(CONFIG_MC146818RTC) */
>> 
>> mc146818rtc.c includes it.  But since it doesn't include
>> config-devices.h, CONFIG_MC146818RTC remains undefined, and the
>> prototype gets suppressed.
>> 
>> Crude fix: make mc146818rtc.c #include "config-devices.h".
>
> Can we modify the code generator to leave out the #if from the header,
> and only include it in the .c file?  An extra prototype is harmless.

Is *everything* we generate into headers just as harmless?

Another idea: provide a hook to make the generator insert the #include
necessary to get the macros used in 'if'.

Yet another idea: include it always in target-specific code, just like
config-target.h.




Re: [PATCH 3/3] hw/display/qxl.c: Use trace_event_get_state_backends()

2020-01-20 Thread Gerd Hoffmann
On Mon, Jan 20, 2020 at 03:11:42PM +, Peter Maydell wrote:
> The preferred way to test whether a trace event is enabled is to
> use trace_event_get_state_backends(), because this will give the
> correct answer (allowing expensive computations to be skipped)
> whether the trace event is compile-time or run-time disabled.
> Convert the old-style direct use of TRACE_FOO_ENABLED.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/display/qxl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/display/qxl.c b/hw/display/qxl.c
> index 6d43b7433cf..80a4dcc40e4 100644
> --- a/hw/display/qxl.c
> +++ b/hw/display/qxl.c
> @@ -1764,7 +1764,7 @@ async_common:
>  qxl_set_mode(d, val, 0);
>  break;
>  case QXL_IO_LOG:
> -if (TRACE_QXL_IO_LOG_ENABLED || d->guestdebug) {
> +if (trace_event_get_state_backends(TRACE_QXL_IO_LOG) || 
> d->guestdebug) {

Acked-by: Gerd Hoffmann 

>  /* We cannot trust the guest to NUL terminate d->ram->log_buf */
>  char *log_buf = g_strndup((const char *)d->ram->log_buf,
>sizeof(d->ram->log_buf));
> -- 
> 2.20.1
> 




Re: [PATCH v4 15/18] tests/boot-serial-test: Test some Arduino boards (AVR based)

2020-01-20 Thread Thomas Huth
On 20/01/2020 23.01, Philippe Mathieu-Daudé wrote:
> The Arduino Duemilanove is based on a AVR5 CPU, while the
> Arduino MEGA2560 on a AVR6 CPU.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  tests/qtest/boot-serial-test.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
> index e556f09db8..582a497963 100644
> --- a/tests/qtest/boot-serial-test.c
> +++ b/tests/qtest/boot-serial-test.c
> @@ -113,6 +113,8 @@ typedef struct testdef {
>  static testdef_t tests[] = {
>  { "alpha", "clipper", "", "PCI:" },
>  { "avr", "sample", "", "T", sizeof(bios_avr), NULL, bios_avr },
> +{ "avr", "arduino-duemilanove", "", "T", sizeof(bios_avr), NULL, 
> bios_avr },
> +{ "avr", "arduino-mega-2560-v3", "", "T", sizeof(bios_avr), NULL, 
> bios_avr},
>  { "ppc", "ppce500", "", "U-Boot" },
>  { "ppc", "40p", "-vga none -boot d", "Trying cd:," },
>  { "ppc", "g3beige", "", "PowerPC,750" },
> 

Acked-by: Thomas Huth 




[virtio-dev] [PATCH v2 0/5] virtio-mmio enhancement

2020-01-20 Thread Jing Liu
The current virtio over MMIO has some limitations that impact the performance.
It only supports single legacy, dedicated interrupt and one virtqueue
notification register for all virtqueues which cause performance penalties.

To address such limitations, we proposed to update virtio-mmio spec with
two new feature bits to support MSI interrupt and enhancing notification 
mechanism.

For keeping virtio-mmio simple as it was, and considering practical usages, this
provides two kinds of mapping mode for device: MSI non-sharing mode and MSI 
sharing mode.
MSI non-sharng mode indicates a fixed static vector and event relationship 
specified
in spec, which can simplify the setup process and reduce vmexit, fitting for
a high interrupt rate request.
MSI sharing mode indicates a dynamic mapping, which is more like PCI does, 
fitting
for a non-high interrupt rate request.

Change Log:
v1->v2:
* Change version update to feature bit
* Add mask/unmask support
* Add two MSI sharing/non-sharing modes
* Change MSI registers layout and bits

Jing Liu (5):
  virtio-mmio: Add feature bit for MMIO notification
  virtio-mmio: Enhance queue notification support
  virtio-mmio: Add feature bit for MMIO MSI
  virtio-mmio: Introduce MSI details
  virtio-mmio: MSI vector and event mapping

 content.tex | 286 ++--
 msi-state.c |   5 ++
 2 files changed, 262 insertions(+), 29 deletions(-)
 create mode 100644 msi-state.c

-- 
2.7.4




[virtio-dev] [PATCH v2 1/5] virtio-mmio: Add feature bit for MMIO notification

2020-01-20 Thread Jing Liu
All the queues notifications use the same register on MMIO transport
layer. Add a feature bit (39) for enhancing the notification capability.
The detailed mechanism would be in next patch.

Co-developed-by: Chao Peng 
Signed-off-by: Chao Peng 
Co-developed-by: Liu Jiang 
Signed-off-by: Liu Jiang 
Co-developed-by: Zha Bin 
Signed-off-by: Zha Bin 
Signed-off-by: Jing Liu 
---
 content.tex | 9 +
 1 file changed, 9 insertions(+)

diff --git a/content.tex b/content.tex
index d68cfaf..826bc7d 100644
--- a/content.tex
+++ b/content.tex
@@ -5810,6 +5810,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved 
Feature Bits}
   in its device notifications.
   See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / 
Driver notifications}.
 \end{description}
+  \item[VIRTIO_F_MMIO_NOTIFICATION(39)] This feature indicates
+  that the device supports enhanced notification mechanism on
+  MMIO transport layer.
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
 
@@ -5843,6 +5846,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved 
Feature Bits}
 or partially reset, and even without re-negotiating
 VIRTIO_F_SR_IOV after the reset.
 
+A driver SHOULD accept VIRTIO_F_MMIO_NOTIFICATION if it is offered.
+
 \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
 
 A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
@@ -5872,6 +5877,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved 
Feature Bits}
 and presents a PCI SR-IOV capability structure, otherwise
 it MUST NOT offer VIRTIO_F_SR_IOV.
 
+If VIRTIO_F_MMIO_NOTIFICATION has been negotiated, a device
+MUST support handling the notification from driver at the
+calculated location.
+
 \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature 
Bits / Legacy Interface: Reserved Feature Bits}
 
 Transitional devices MAY offer the following:
-- 
2.7.4




[virtio-dev] [PATCH v2 3/5] virtio-mmio: Add feature bit for MMIO MSI

2020-01-20 Thread Jing Liu
The current MMIO transport layer uses a single, dedicated interrupt
signal, which brings performance penalty. Add a feature bit (40)
for introducing MSI capability.

Co-developed-by: Chao Peng 
Signed-off-by: Chao Peng 
Co-developed-by: Liu Jiang 
Signed-off-by: Liu Jiang 
Co-developed-by: Zha Bin 
Signed-off-by: Zha Bin 
Signed-off-by: Jing Liu 
---
 content.tex | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/content.tex b/content.tex
index 5881253..ff151ba 100644
--- a/content.tex
+++ b/content.tex
@@ -5840,6 +5840,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved 
Feature Bits}
   \item[VIRTIO_F_MMIO_NOTIFICATION(39)] This feature indicates
   that the device supports enhanced notification mechanism on
   MMIO transport layer.
+  \item[VIRTIO_F_MMIO_MSI(40)] This feature indicates that the
+  device supports Message Signal Interrupts (MSI) mechanism on
+  MMIO transport layer.
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
 
@@ -5875,6 +5878,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved 
Feature Bits}
 
 A driver SHOULD accept VIRTIO_F_MMIO_NOTIFICATION if it is offered.
 
+A driver SHOULD accept VIRTIO_F_MMIO_MSI if it is offered.
+If VIRTIO_F_MMIO_MSI has been negotiated, a driver MUST try to
+set up MSI at first priority.
+
 \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
 
 A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
-- 
2.7.4




[virtio-dev] [PATCH v2 2/5] virtio-mmio: Enhance queue notification support

2020-01-20 Thread Jing Liu
With VIRTIO_F_MMIO_NOTIFICATION feature bit offered, the notification
mechanism is enhanced. Driver reads QueueNotify register to get
notification structure and calculate notification addresses of
each virtqueue.

Co-developed-by: Chao Peng 
Signed-off-by: Chao Peng 
Co-developed-by: Liu Jiang 
Signed-off-by: Liu Jiang 
Co-developed-by: Zha Bin 
Signed-off-by: Zha Bin 
Signed-off-by: Jing Liu 
---
 content.tex | 53 -
 1 file changed, 40 insertions(+), 13 deletions(-)

diff --git a/content.tex b/content.tex
index 826bc7d..5881253 100644
--- a/content.tex
+++ b/content.tex
@@ -1671,20 +1671,18 @@ \subsection{MMIO Device Register 
Layout}\label{sec:Virtio Transport Options / Vi
 accesses apply to the queue selected by writing to \field{QueueSel}.
   }
   \hline 
-  \mmioreg{QueueNotify}{Queue notifier}{0x050}{W}{%
-Writing a value to this register notifies the device that
-there are new buffers to process in a queue.
+  \mmioreg{QueueNotify}{Queue notifier}{0x050}{RW}{%
+When VIRTIO_F_MMIO_NOTIFICATION has not been negotiated, writing to this
+register notifies the device that there are new buffers to process in a 
queue.
 
-When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
-the value written is the queue index.
+When VIRTIO_F_MMIO_NOTIFICATION has been negotiated, reading this register
+returns the virtqueue notification structure for calculating notification 
location.
 
-When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
-the \field{Notification data} value has the following format:
+See \ref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific 
Initialization And Device Operation / Notification Structure Layout}
+for the notification structure format.
 
-\lstinputlisting{notifications-le.c}
-
-See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / 
Driver notifications}
-for the definition of the components.
+See \ref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific 
Initialization And Device Operation / Available Buffer Notifications}
+for the notification data format.
   }
   \hline 
   \mmioreg{InterruptStatus}{Interrupt status}{0x60}{R}{%
@@ -1858,6 +1856,31 @@ \subsubsection{Device Initialization}\label{sec:Virtio 
Transport Options / Virti
 Further initialization MUST follow the procedure described in
 \ref{sec:General Initialization And Device Operation / Device 
Initialization}~\nameref{sec:General Initialization And Device Operation / 
Device Initialization}.
 
+\subsubsection{Notification Structure Layout}\label{sec:Virtio Transport 
Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation 
/ Notification Structure Layout}
+
+When VIRTIO_F_MMIO_NOTIFICATION has been negotiated, the notification location 
is calculated
+by notification structure. Driver reads \field{QueueNotify} to get this 
structure formatted
+as follows.
+
+\begin{lstlisting}
+le32 {
+notify_base : 16;
+notify_multiplier : 16;
+};
+\end{lstlisting}
+
+\field{notify_multiplier} is combined with virtqueue index to derive the Queue 
Notify address
+within a memory mapped control registers for a virtqueue:
+
+\begin{lstlisting}
+notify_base + queue_index * notify_multiplier
+\end{lstlisting}
+
+\begin{note}
+For example, if notify_multiplier is 0, the device uses the same Queue Notify 
address for all
+queues.
+\end{note}
+
 \subsubsection{Virtqueue Configuration}\label{sec:Virtio Transport Options / 
Virtio Over MMIO / MMIO-specific Initialization And Device Operation / 
Virtqueue Configuration}
 
 The driver will typically initialize the virtual queue in the following way:
@@ -1893,16 +1916,20 @@ \subsubsection{Available Buffer 
Notifications}\label{sec:Virtio Transport Option
 When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
 the driver sends an available buffer notification to the device by writing
 the 16-bit virtqueue index
-of the queue to be notified to \field{QueueNotify}.
+of the queue to be notified to Queue Notify address.
 
 When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
 the driver sends an available buffer notification to the device by writing
-the following 32-bit value to \field{QueueNotify}:
+the following 32-bit value to Queue Notify address:
 \lstinputlisting{notifications-le.c}
 
 See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / 
Driver notifications}
 for the definition of the components.
 
+For device not offering VIRTIO_F_MMIO_NOTIFICATION, the Queue Notify address 
is \field{QueueNotify}.
+For device offering VIRTIO_F_MMIO_NOTIFICATION, see \ref{sec:Virtio Transport 
Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation 
/ Notification Structure Layout}
+for how to calculate the Queue Notify address.
+
 \subsubsection{Notifications From The Device}\label{sec:Virtio Transport 
Options / Virtio Over MMIO / MMIO-specific 

[virtio-dev] [PATCH v2 5/5] virtio-mmio: MSI vector and event mapping

2020-01-20 Thread Jing Liu
Bit 1 msi_sharing reported in the MsiState register indicates the mapping mode
device uses.

Bit 1 is 0 - device uses MSI non-sharing mode. This indicates vector per event 
and
fixed static vectors and events relationship. This fits for devices with a high 
interrupt
rate and best performance;
Bit 1 is 1 - device uses MSI sharing mode. This indicates vectors and events
dynamic mapping and fits for devices not requiring a high interrupt rate.

Co-developed-by: Chao Peng 
Signed-off-by: Chao Peng 
Co-developed-by: Liu Jiang 
Signed-off-by: Liu Jiang 
Co-developed-by: Zha Bin 
Signed-off-by: Zha Bin 
Signed-off-by: Jing Liu 
---
 content.tex | 48 +++-
 msi-state.c |  3 ++-
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/content.tex b/content.tex
index dcf6c71..2fd1686 100644
--- a/content.tex
+++ b/content.tex
@@ -1770,7 +1770,8 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio 
Transport Options / Vi
   \hline
   \mmioreg{MsiState}{MSI state}{0x0c4}{R}{%
 When VIRTIO_F_MMIO_MSI has been negotiated, reading
-from this register returns the global MSI enable/disable status.
+from this register returns the global MSI enable/disable status
+and whether device uses MSI sharing mode.
 \lstinputlisting{msi-state.c}
   }
   \hline
@@ -1926,12 +1927,18 @@ \subsubsection{Device Initialization}\label{sec:Virtio 
Transport Options / Virti
 mask and unmask the MSI vector applying to the one selected by writing
 to \field{MsiVecSel}.
 
+VIRTIO_MMIO_MSI_CMD_MAP_CONFIG command is to set the configuration event and 
MSI vector
+mapping. VIRTIO_MMIO_MSI_CMD_MAP_QUEUE is to set the queue event and MSI vector
+mapping. They SHOULD only be used in MSI sharing mode.
+
 \begin{lstlisting}
 #define  VIRTIO_MMIO_MSI_CMD_ENABLE   0x1
 #define  VIRTIO_MMIO_MSI_CMD_DISABLE  0x2
 #define  VIRTIO_MMIO_MSI_CMD_CONFIGURE0x3
 #define  VIRTIO_MMIO_MSI_CMD_MASK 0x4
 #define  VIRTIO_MMIO_MSI_CMD_UNMASK   0x5
+#define  VIRTIO_MMIO_MSI_CMD_MAP_CONFIG   0x6
+#define  VIRTIO_MMIO_MSI_CMD_MAP_QUEUE0x7
 \end{lstlisting}
 
 Setting a special NO_VECTOR value means disabling an interrupt for an event 
type.
@@ -1941,10 +1948,49 @@ \subsubsection{Device Initialization}\label{sec:Virtio 
Transport Options / Virti
 #define VIRTIO_MMIO_MSI_NO_VECTOR 0x
 \end{lstlisting}
 
+\subparagraph{MSI Vector and Event Mapping}\label{sec:Virtio Transport Options 
/ Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device 
Initialization / MSI Vector Configuration}
+The reported \field{msi_sharing} bit in the \field{MsiState} return value shows
+the MSI sharing mode that device uses.
+
+When \field{msi_sharing} bit is 0, it indicates the device uses non-sharing 
mode
+and vector per event fixed static relationship is used. The first vector is 
for device
+configuraiton change event, the second vector is for virtqueue 1, the third 
vector
+is for virtqueue 2 and so on.
+
+When \field{msi_sharing} bit is 1, it indicates the device uses MSI sharing 
mode,
+and the vector and event mapping is dynamic. Writing \field{MsiVecSel}
+followed by writing 
VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE command
+maps interrupts triggered by the configuration change/selected queue events 
respectively
+to the corresponding MSI vector.
+
+\devicenormative{\subparagraph}{MSI Vector Configuration}{Virtio Transport 
Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation 
/ MSI Vector Configuration}
+
+When the device reports \field{msi_sharing} bit as 0, it SHOULD support a 
number of
+vectors that greater than the maximum number of virtqueues.
+Device MUST report the number of vectors supported in \field{MsiVecNum}.
+
+When the device reports \field{msi_sharing} bit as 1, it SHOULD support at 
least
+2 MSI vectors and MUST report in \field{MsiVecNum}. Device SHOULD support 
mapping any
+event type to any vector under \field{MsiVecNum}.
+
+Device MUST support unmapping any event type (NO_VECTOR).
+
+The device SHOULD restrict the reported \field{msi_sharing} and 
\field{MsiVecNum}
+to a value that might benefit system performance.
+
+\begin{note}
+For example, a device which does not expect to send interrupts at a high rate 
might
+return \field{msi_sharing} bit as 1.
+\end{note}
+
 \drivernormative{\subparagraph}{MSI Vector Configuration}{Virtio Transport 
Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation 
/ MSI Vector Configuration}
 When VIRTIO_F_MMIO_MSI has been negotiated, driver should try to configure
 and enable MSI.
 
+To set up the event and vector mapping for MSI sharing mode, driver SHOULD
+write a valid \field{MsiVecSel} followed by 
VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE
+command to map the configuration change/selected queue events respectively.
+
 To configure MSI vector, driver SHOULD firstly 

[virtio-dev] [PATCH v2 4/5] virtio-mmio: Introduce MSI details

2020-01-20 Thread Jing Liu
With VIRTIO_F_MMIO_MSI feature bit offered, the Message Signal
Interrupts (MSI) is supported as first priority. For any reason it
fails to use MSI, it need use the single dedicated interrupt as before.

For MSI vectors and events mapping relationship, introduce in next patch.

Co-developed-by: Chao Peng 
Signed-off-by: Chao Peng 
Co-developed-by: Liu Jiang 
Signed-off-by: Liu Jiang 
Co-developed-by: Zha Bin 
Signed-off-by: Zha Bin 
Signed-off-by: Jing Liu 
---
 content.tex | 171 ++--
 msi-state.c |   4 ++
 2 files changed, 159 insertions(+), 16 deletions(-)
 create mode 100644 msi-state.c

diff --git a/content.tex b/content.tex
index ff151ba..dcf6c71 100644
--- a/content.tex
+++ b/content.tex
@@ -1687,7 +1687,8 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio 
Transport Options / Vi
   \hline 
   \mmioreg{InterruptStatus}{Interrupt status}{0x60}{R}{%
 Reading from this register returns a bit mask of events that
-caused the device interrupt to be asserted.
+caused the device interrupt to be asserted. This is only used
+when MSI is not enabled.
 The following events are possible:
 \begin{description}
   \item[Used Buffer Notification] - bit 0 - the interrupt was asserted
@@ -1701,7 +1702,7 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio 
Transport Options / Vi
   \mmioreg{InterruptACK}{Interrupt acknowledge}{0x064}{W}{%
 Writing a value with bits set as defined in \field{InterruptStatus}
 to this register notifies the device that events causing
-the interrupt have been handled.
+the interrupt have been handled. This is only used when MSI is not enabled.
   }
   \hline 
   \mmioreg{Status}{Device status}{0x070}{RW}{%
@@ -1760,6 +1761,47 @@ \subsection{MMIO Device Register 
Layout}\label{sec:Virtio Transport Options / Vi
 \field{SHMSel} is unused) results in a base address of
 0x.
   }
+  \hline
+  \mmioreg{MsiVecNum}{MSI max vector number}{0x0c0}{R}{%
+When VIRTIO_F_MMIO_MSI has been negotiated, reading
+from this register returns the maximum MSI vector number
+that device supports.
+  }
+  \hline
+  \mmioreg{MsiState}{MSI state}{0x0c4}{R}{%
+When VIRTIO_F_MMIO_MSI has been negotiated, reading
+from this register returns the global MSI enable/disable status.
+\lstinputlisting{msi-state.c}
+  }
+  \hline
+  \mmioreg{MsiCmd}{MSI command}{0x0c8}{W}{%
+When VIRTIO_F_MMIO_MSI has been negotiated, writing
+to this register executes the corresponding command to device.
+Part of this applies to the MSI vector selected by writing to 
\field{MsiVecSel}.
+See \ref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific 
Initialization And Device Operation / Device Initialization / MSI Vector 
Configuration}
+for using details.
+  }
+  \hline
+  \mmioreg{MsiVecSel}{MSI vector index}{0x0d0}{W}{%
+When VIRTIO_F_MMIO_MSI has been negotiated, writing
+to this register selects the MSI vector index that the following operations
+on \field{MsiAddrLow}, \field{MsiAddrHigh}, \field{MsiData} and part of
+\field{MsiCmd} commands specified in \ref{sec:Virtio Transport Options / 
Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device 
Initialization / MSI Vector Configuration}
+apply to. The index number of the first vector is zero (0x0).
+  }
+  \hline
+  \mmiodreg{MsiAddrLow}{MsiAddrHigh}{MSI 64 bit address}{0x0d4}{0x0d8}{W}{%
+When VIRTIO_F_MMIO_MSI has been negotiated, writing
+to these two registers (lower 32 bits of the address to \field{MsiAddrLow},
+higher 32 bits to \field{MsiAddrHigh}) notifies the device about the
+MSI address. This applies to the MSI vector selected by writing to 
\field{MsiVecSel}.
+  }
+  \hline
+  \mmioreg{MsiData}{MSI 32 bit data}{0x0dc}{W}{%
+When VIRTIO_F_MMIO_MSI has been negotiated, writing
+to this register notifies the device about the MSI data.
+This applies to the MSI vector selected by writing to \field{MsiVecSel}.
+  }
   \hline 
   \mmioreg{ConfigGeneration}{Configuration atomicity value}{0x0fc}{R}{
 Reading from this register returns a value describing a version of the 
device-specific configuration space (see \field{Config}).
@@ -1783,10 +1825,16 @@ \subsection{MMIO Device Register 
Layout}\label{sec:Virtio Transport Options / Vi
 
 The device MUST return value 0x2 in \field{Version}.
 
-The device MUST present each event by setting the corresponding bit in 
\field{InterruptStatus} from the
+When MSI is disabled, the device MUST present each event by setting the
+corresponding bit in \field{InterruptStatus} from the
 moment it takes place, until the driver acknowledges the interrupt
-by writing a corresponding bit mask to the \field{InterruptACK} register.  
Bits which
-do not represent events which took place MUST be zero.
+by writing a corresponding bit mask to the \field{InterruptACK} register.
+Bits which do not 

Re: [PATCH v2 1/2] vnc: fix VNC artifacts

2020-01-20 Thread Gerd Hoffmann
On Mon, Jan 20, 2020 at 09:00:51PM -0800, Cameron Esfahani wrote:
> Patch de3f7de7f4e257ce44cdabb90f5f17ee99624557 was too simplistic in its
> implementation: it didn't account for the ZLIB z_stream mutating with
> each compression.  Because of the mutation, simply resetting the output
> buffer's offset wasn't sufficient to "rewind" the operation.  The mutated
> z_stream would generate future zlib blocks which referred to symbols in
> past blocks which weren't sent.  This would lead to artifacting.
> 
> This reverts commit de3f7de7f4e257ce44cdabb90f5f17ee99624557.
> 
> Fixes:  ("vnc: allow fall back to RAW encoding")
> Signed-off-by: Cameron Esfahani 

Looks like you didn't realize that "revert" was meant literally.  Git has a
revert subcommand, i.e. you can simply run "git revert de3f7de7f4e257" to
create a commit undoing the changes, with a commit message saying so.  The
generated text should be left intact, to make the job for tools analyzing git
commits easier.  The commit message (for reverts typically explaining why the
reverted commit was buggy) can go below the generated text.

Also note that only the patch commit messages end up in the commit log, the
cover letter text doesn't.  So any important details should (also) be in the
commit messages so they are recorded in the log.

Reworked the commit message, looks like this now:

-
commit 0780ec7be82dd4781e9fd216b5d99a125882ff5a (HEAD -> queue/ui)
Author: Gerd Hoffmann 
Date:   Tue Jan 21 07:02:10 2020 +0100

Revert "vnc: allow fall back to RAW encoding"

This reverts commit de3f7de7f4e257ce44cdabb90f5f17ee99624557.

Remove VNC optimization to reencode framebuffer update as raw if it's
smaller than the default encoding.

QEMU's implementation was naive and didn't account for the ZLIB z_stream
mutating with each compression.  Because of the mutation, simply
resetting the output buffer's offset wasn't sufficient to "rewind" the
operation.  The mutated z_stream would generate future zlib blocks which
referred to symbols in past blocks which weren't sent.  This would lead
to artifacting.

Considering that ZRLE is never larger than raw and even though ZLIB can
occasionally be fractionally larger than raw, the overhead of
implementing this optimization correctly isn't worth it.

Signed-off-by: Cameron Esfahani 
Signed-off-by: Gerd Hoffmann 
-

Modified patch queued up.
Patch 2/2 is fine as-is.

thanks,
  Gerd




Re: [PATCH] ui/console: Display the 'none' backend in '-display help'

2020-01-20 Thread Gerd Hoffmann
On Mon, Jan 20, 2020 at 08:29:47PM +0100, Philippe Mathieu-Daudé wrote:
> Commit c388f408b5 added the possibility to list the display
> backends using '-display help'. Since the 'none' backend is
> is not implemented as a DisplayChangeListenerOps, it is not
> registered to the dpys[] array with qemu_display_register(),
> and is not listed in the help output.
> 
> This might be confusing, as we list it in the man page:
> 
>   -display type
>   Select type of display to use. This option is a replacement for
>   the old style -sdl/-curses/... options. Valid values for type are
> 
>   none
>   Do not display video output. The guest will still see an
>   emulated graphics card, but its output will not be displayed
>   to the QEMU user. This option differs from the -nographic
>   option in that it only affects what is done with video
>   output; -nographic also changes the destination of the serial
>   and parallel port data.
> 
> Fix by manually listing the special 'none' backend in the help.
> 
> Suggested-by: Thomas Huth 
> Signed-off-by: Philippe Mathieu-Daudé 

Added to ui queue.

thanks,
  Gerd




Re: [PATCH 1/2] arm/virt/acpi: remove meaningless sub device "PR0" from PCI0

2020-01-20 Thread Michael S. Tsirkin
On Mon, Jan 13, 2020 at 01:37:02PM +0100, Igor Mammedov wrote:
> On Thu, 19 Dec 2019 14:47:58 +0800
> Heyi Guo  wrote:
> 
> > The sub device "PR0" under PCI0 in ACPI/DSDT does not make any sense,
> > so simply remote it.
> Could you make commit message more concrete so it would say
> why it doesn't make any sense.
> 
> It seems to be there to describe root port,
> I'd rather have PCI folk ack if it's ok to remove it.

An empty device like this doesn't really do anything useful I think.
commit log needs to be fixed up though.


> > 
> > Signed-off-by: Heyi Guo 
> > 
> > ---
> > Cc: Peter Maydell 
> > Cc: "Michael S. Tsirkin" 
> > Cc: Igor Mammedov 
> > Cc: Shannon Zhao 
> > Cc: qemu-...@nongnu.org
> > Cc: qemu-devel@nongnu.org
> > ---
> >  hw/arm/virt-acpi-build.c  |   4 
> >  tests/data/acpi/virt/DSDT | Bin 18462 -> 18449 bytes
> >  tests/data/acpi/virt/DSDT.memhp   | Bin 19799 -> 19786 bytes
> >  tests/data/acpi/virt/DSDT.numamem | Bin 18462 -> 18449 bytes
> >  4 files changed, 4 deletions(-)
> > 
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index bd5f771e9b..9f4c7d1889 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -317,10 +317,6 @@ static void acpi_dsdt_add_pci(Aml *scope, const 
> > MemMapEntry *memmap,
> >  aml_append(method, aml_return(buf));
> >  aml_append(dev, method);
> >  
> > -Aml *dev_rp0 = aml_device("%s", "RP0");
> > -aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0)));
> > -aml_append(dev, dev_rp0);
> > -
> >  Aml *dev_res0 = aml_device("%s", "RES0");
> >  aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02")));
> >  crs = aml_resource_template();
> > diff --git a/tests/data/acpi/virt/DSDT b/tests/data/acpi/virt/DSDT
> > index 
> > d0f3afeb134fdf1c11f64cd06dbcdd30be603b80..b5895cb22446860a0b9be3d32ec856feb388be4c
> >  100644
> > GIT binary patch
> > delta 39
> > vcmbO?fpOvlMlP3Nmk>b@1_q`B6S<_Bdg?Z+cXBfI+}XT|v(|R9jr$`2@RSW)  
> > 
> > delta 50
> > zcmbO@fpOjhMlP3Nmk>D*1_q{tiCof5o%I{lJ2{y;?{412S!>J19TZ>?^tF5;R%I  
> > G{V4!>hYx%J  
> > 
> > diff --git a/tests/data/acpi/virt/DSDT.memhp 
> > b/tests/data/acpi/virt/DSDT.memhp
> > index 
> > 41ccc6431b917252bcbaac86c33b340c796be5ce..69ad844f65d047973a3e55198beecd45a35b8fce
> >  100644
> > GIT binary patch
> > delta 40
> > wcmcaUi}BPfMlP3Nmk=*s1_q}3iCof5t(P{ccXBfI+}XT|v(|RAjk`1(02g)*ivR!s
> > 
> > delta 51
> > zcmX>#i}Cs_MlP3NmymE@1_mbiiCof5O_w*ScXBdy-rc;3v(}c2J1D>)o+IATC1|sb  
> > HyBr$;t7;Fc
> > 
> > diff --git a/tests/data/acpi/virt/DSDT.numamem 
> > b/tests/data/acpi/virt/DSDT.numamem
> > index 
> > d0f3afeb134fdf1c11f64cd06dbcdd30be603b80..b5895cb22446860a0b9be3d32ec856feb388be4c
> >  100644
> > GIT binary patch
> > delta 39
> > vcmbO?fpOvlMlP3Nmk>b@1_q`B6S<_Bdg?Z+cXBfI+}XT|v(|R9jr$`2@RSW)  
> > 
> > delta 50
> > zcmbO@fpOjhMlP3Nmk>D*1_q{tiCof5o%I{lJ2{y;?{412S!>J19TZ>?^tF5;R%I  
> > G{V4!>hYx%J  
> > 




Re: Proposal for handling .hx files with Sphinx

2020-01-20 Thread Markus Armbruster
John Snow  writes:

> On 1/17/20 12:30 PM, Peter Maydell wrote:
>> Currently our manual creation includes some .texi files which
>> are autogenerated from .hx files by running scripts/hxtool.
>> .hx files are a simple format, where where a line is either a
>> directive or literal text to be output:
>> 
>> HXCOMM
>>  -- comment lines, ignored
>> STEXI/ETEXI
>>  -- mark start/end of chunks of text to put into the texinfo output only
>> DEFHEADING/ARCHHEADING
>>  -- appear in the .h file output verbatim (they are defined as C macros);
>> for texi output they are parsed to add in header sections
>> 
>> For Sphinx, rather than creating a file to include, the most
>> natural way to handle this is to have a small custom Sphinx
>> extension which will read the .hx file and process it. So
>> instead of "makefile produces foo.texi from foo.hx, qemu-doc.texi
>> says '@include foo.texi'", we have "qemu-doc.rst says
>> 'hxtool-doc:: foo.hx', the Sphinx extension for hxtool has
>> code that runs to handle that Sphinx directive, it reads the .hx
>> file and emits the appropriate documentation contents". (This is
>> pretty much the same way the kerneldoc extension works right now.
>> It also has the advantage that it should work for third-party
>> services like readthedocs that expect to build the docs directly
>> with sphinx rather than by invoking our makefiles.)
>> 
>> We'll need to update what the markup is to handle having rST
>> fragments in it. A very minimalist approach to this would
>> simply define a new pair of SRST/ERST directives marking the
>> start/end of chunks of rST text to go into the rST only.
>> (We might be able to do better than that later, as there's
>> some repetition currently going on. But we'll probably get
>> a better idea of how easy it is to avoid the repetition if
>> we start with a simple conversion.)
>> 
>> Here's what we do with hx files at the moment. We have four:
>> 
>>  hmp-commands.hx
>>-- defines monitor commands used by monitor.c; generates
>>   qemu-monitor.texi, used by qemu-doc.texi
>>  hmp-commands-info.hx
>>-- ditto, for the "info" command's subcommand;
>>   generates qemu-monitor-info.texi, used by qemu-doc.texi
>> 
>> These two use only the "put this in the texi or in the .h file"
>> functionality, alternating "raw C code defining an entry for the
>> monitor command array" with "lump of raw texi for the docs".
>> 
>>  qemu-img-cmds.hx
>>-- defines options for qemu-img, used by qemu-img.texi
>> 
>> This uses the STEXI/ETEXI directives to alternate C and texi.
>> In the for-the-h-file section the only content is always a DEF()
>> macro invocation defining the option; in the texi is only the
>> synopsis of the command. This means there's a lot of repetition,
>> as the DEF macro includes an argument giving the text of the
>> option synopsis, and then the texi also has that synopsis with
>> some extra markup. Finally the main qemu-img.texi repeats the
>> marked-up synopsis later on when it has the actual main documentation
>> of each command.
>> 
>>  qemu-options.hx
>>-- options for qemu proper, used by qemu-doc.texi
>> 
>> This uses only the DEF, DEFHEADING, ARCHHEADING macros
>> in the for-the-h-file sections (and the DEFHEADING/ARCHHEADING
>> are read also for texi generation). This also repeats the
>> synopsis in the DEF macro and in the texi fragment.
>> 
>> So I think my current view is that we should do the very
>> simple "add SRST/ERST directives" to start with:
>>  * scripts/hxtool needs to recognize them and just ignore
>>the text inside them
>>  * write the hxtool sphinx extension (shouldn't be too hard)
>>  * conversion of any particular .hx file then involves
>>replacing the STEXI ...texi stuff... ETEXI sections with
>>SRST ...rst stuff... ERST. There's no need for any
>>particular .hx file to support both texi and rst output
>>at the same time
>> 
>> I would initially start with qemu-img-cmds.hx, since that's
>> pulled in by qemu-img.texi, which we can convert in the
>> same way I've been doing qemu-nbd and other standalone-ish
>> manpages. The others are part of the big fat qemu-doc.texi,
>> which is probably going to be the very last thing we convert...
>> 
>
> At one point I did a quick mockup of turning qemu-img-cmds.hx into json
> and wrote a small tool I called "pxtool" that was used for generating
> all the rest of the subsequent information -- an attempt at getting rid
> of .hx files *entirely*.
>
> The idea at heart was: "Can we remove .hx files and describe everything
> in terms of the QAPI schema instead?"
>
> I'm still a bit partial to that idea, but realize there are some nasty
> complexities when it comes to describing the QEMU CLI as a schema. One
> of those is that I doubt we even have a full understanding of what the
> CLI syntax is at all.

My CLI QAPIfication prototype[*] gets rid of qemu-options.hx.  Three
more are left: hmp-commands.hx, hmp-commands-info.hx, qemu-img-cmds.hx.
No idea whether 

Re: [PATCH 016/104] virtiofsd: Open vhost connection instead of mounting

2020-01-20 Thread Misono Tomohiro
> From: "Dr. David Alan Gilbert" 
> 
> When run with vhost-user options we conect to the QEMU instead
> via a socket.  Start this off by creating the socket.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---


> +/*
> + * Poison the fuse FD so we spot if we accidentally use it;
> + * DO NOT check for this value, check for fuse_lowlevel_is_virtio()
> + */
> +se->fd = 0xdaff0d11;


As a result of this, se->fd now holds dummy fd.
So we should remove close(se->fd) in fuse_session_destroy():
https://gitlab.com/virtio-fs/qemu/blob/virtio-fs-dev/tools/virtiofsd/fuse_lowlevel.c#L2663

Reviewed-by: Misono Tomohiro 



Re: [PATCH v4 5/7] tests/boot_linux_console: Test booting U-Boot on the Raspberry Pi 2

2020-01-20 Thread Gerd Hoffmann
On Tue, Jan 21, 2020 at 12:51:57AM +0100, Philippe Mathieu-Daudé wrote:
> This test runs U-Boot on the Raspberry Pi 2.

> U-Boot is built by the Debian project, see:
> https://wiki.debian.org/InstallingDebianOn/Allwinner#Creating_a_bootable_SD_Card_with_u-boot

We already have a u-boot submodule in roms/

I guess it makes sense to just build & ship our own binaries
instead of downloading them from Debian?

cheers,
  Gerd




Re: [PATCH] spapr: Migrate CAS reboot flag

2020-01-20 Thread Cédric Le Goater
On 1/21/20 4:41 AM, David Gibson wrote:
> On Wed, Jan 15, 2020 at 07:10:47PM +0100, Cédric Le Goater wrote:
>> On 1/15/20 6:48 PM, Greg Kurz wrote:
>>> Migration can potentially race with CAS reboot. If the migration thread
>>> completes migration after CAS has set spapr->cas_reboot but before the
>>> mainloop could pick up the reset request and reset the machine, the
>>> guest is migrated unrebooted and the destination doesn't reboot it
>>> either because it isn't aware a CAS reboot was needed (eg, because a
>>> device was added before CAS). This likely result in a broken or hung
>>> guest.
>>>
>>> Even if it is small, the window between CAS and CAS reboot is enough to
>>> re-qualify spapr->cas_reboot as state that we should migrate. Add a new
>>> subsection for that and always send it when a CAS reboot is pending.
>>> This may cause migration to older QEMUs to fail but it is still better
>>> than end up with a broken guest.
>>>
>>> The destination cannot honour the CAS reboot request from a post load
>>> handler because this must be done after the guest is fully restored.
>>> It is thus done from a VM change state handler.
>>>
>>> Reported-by: Lukáš Doktor 
>>> Signed-off-by: Greg Kurz 
>>
>> Cédric Le Goater 
>>
>> Nice work ! That was quite complex to catch !
> 
> It is a very nice analysis.  However, I'm disinclined to merge this
> for the time being.
> 
> My preferred approach would be to just eliminate CAS reboots
> altogether, since that has other benefits.  I'm feeling like this
> isn't super-urgent, since CAS reboots are extremely rare in practice,
> now that we've eliminated the one for the irq switchover.

Yes. The possibility of a migration in the window between CAS and 
CAS reboot must be even more rare.

C.

 
> However, if it's not looking like we'll be ready to do that as the
> qemu-5.0 release approaches, then I'll be more than willing to
> reconsider this.






Re: [PATCH 2/2] aspeed/scu: Implement chip ID register

2020-01-20 Thread Cédric Le Goater
On 1/21/20 2:33 AM, Joel Stanley wrote:
> This returns a fixed but non-zero value for the chip id.
> 
> Signed-off-by: Joel Stanley 

Reviewed-by: Cédric Le Goater 

> ---
>  hw/misc/aspeed_scu.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
> index 7108cad8c6a7..19d1780a40da 100644
> --- a/hw/misc/aspeed_scu.c
> +++ b/hw/misc/aspeed_scu.c
> @@ -77,6 +77,8 @@
>  #define CPU2_BASE_SEG4   TO_REG(0x110)
>  #define CPU2_BASE_SEG5   TO_REG(0x114)
>  #define CPU2_CACHE_CTRL  TO_REG(0x118)
> +#define CHIP_ID0 TO_REG(0x150)
> +#define CHIP_ID1 TO_REG(0x154)
>  #define UART_HPLL_CLKTO_REG(0x160)
>  #define PCIE_CTRLTO_REG(0x180)
>  #define BMC_MMIO_CTRLTO_REG(0x184)
> @@ -115,6 +117,8 @@
>  #define AST2600_HW_STRAP2_PROTTO_REG(0x518)
>  #define AST2600_RNG_CTRL  TO_REG(0x524)
>  #define AST2600_RNG_DATA  TO_REG(0x540)
> +#define AST2600_CHIP_ID0  TO_REG(0x5B0)
> +#define AST2600_CHIP_ID1  TO_REG(0x5B4)
>  
>  #define AST2600_CLK TO_REG(0x40)
>  
> @@ -182,6 +186,8 @@ static const uint32_t 
> ast2500_a1_resets[ASPEED_SCU_NR_REGS] = {
>   [CPU2_BASE_SEG1]  = 0x8000U,
>   [CPU2_BASE_SEG4]  = 0x1E60U,
>   [CPU2_BASE_SEG5]  = 0xC000U,
> + [CHIP_ID0]= 0x1234ABCDU,
> + [CHIP_ID1]= 0xU,
>   [UART_HPLL_CLK]   = 0x1903U,
>   [PCIE_CTRL]   = 0x007BU,
>   [BMC_DEV_ID]  = 0x2402U
> @@ -307,6 +313,8 @@ static void aspeed_ast2500_scu_write(void *opaque, hwaddr 
> offset,
>  case RNG_DATA:
>  case FREE_CNTR4:
>  case FREE_CNTR4_EXT:
> +case CHIP_ID0:
> +case CHIP_ID1:
>  qemu_log_mask(LOG_GUEST_ERROR,
>"%s: Write to read-only offset 0x%" HWADDR_PRIx "\n",
>__func__, offset);
> @@ -620,6 +628,8 @@ static void aspeed_ast2600_scu_write(void *opaque, hwaddr 
> offset,
>  case AST2600_RNG_DATA:
>  case AST2600_SILICON_REV:
>  case AST2600_SILICON_REV2:
> +case AST2600_CHIP_ID0:
> +case AST2600_CHIP_ID1:
>  /* Add read only registers here */
>  qemu_log_mask(LOG_GUEST_ERROR,
>"%s: Write to read-only offset 0x%" HWADDR_PRIx "\n",
> @@ -648,6 +658,9 @@ static const uint32_t 
> ast2600_a0_resets[ASPEED_AST2600_SCU_NR_REGS] = {
>  [AST2600_CLK_STOP_CTRL2]= 0xFFF0FFF0,
>  [AST2600_SDRAM_HANDSHAKE]   = 0x0040,  /* SoC completed DRAM init */
>  [AST2600_HPLL_PARAM]= 0x1000405F,
> +[AST2600_CHIP_ID0]  = 0x1234ABCD,
> +[AST2600_CHIP_ID1]  = 0x,
> +
>  };
>  
>  static void aspeed_ast2600_scu_reset(DeviceState *dev)
> 




Re: [PATCH 0/2] aspeed/scu: Implement chip id register

2020-01-20 Thread Cédric Le Goater
On 1/21/20 2:33 AM, Joel Stanley wrote:
> This implements the chip id register in the SCU for the ast2500 and
> ast2600. The first patch is a cleanup to separate out ast2400 and
> ast2500 functionality.

These patches apply cleanly on top of :

[v3,0/5] aspeed: extensions and fixes 
http://patchwork.ozlabs.org/cover/1222667/

Thanks,

C.

> 
> Joel Stanley (2):
>   aspeed/scu: Create separate write callbacks
>   aspeed/scu: Implement chip ID register
> 
>  hw/misc/aspeed_scu.c | 93 +---
>  1 file changed, 70 insertions(+), 23 deletions(-)
> 




Re: [PATCH 1/2] aspeed/scu: Create separate write callbacks

2020-01-20 Thread Cédric Le Goater
On 1/21/20 2:33 AM, Joel Stanley wrote:
> This splits the common write callback into separate ast2400 and ast2500
> implementations. This makes it clearer when implementing differing
> behaviour.
> 
> Signed-off-by: Joel Stanley 

Reviewed-by: Cédric Le Goater 

> ---
>  hw/misc/aspeed_scu.c | 80 +++-
>  1 file changed, 57 insertions(+), 23 deletions(-)
> 
> diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
> index f62fa25e3474..7108cad8c6a7 100644
> --- a/hw/misc/aspeed_scu.c
> +++ b/hw/misc/aspeed_scu.c
> @@ -232,8 +232,47 @@ static uint64_t aspeed_scu_read(void *opaque, hwaddr 
> offset, unsigned size)
>  return s->regs[reg];
>  }
>  
> -static void aspeed_scu_write(void *opaque, hwaddr offset, uint64_t data,
> - unsigned size)
> +static void aspeed_ast2400_scu_write(void *opaque, hwaddr offset,
> + uint64_t data, unsigned size)
> +{
> +AspeedSCUState *s = ASPEED_SCU(opaque);
> +int reg = TO_REG(offset);
> +
> +if (reg >= ASPEED_SCU_NR_REGS) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "%s: Out-of-bounds write at offset 0x%" HWADDR_PRIx 
> "\n",
> +  __func__, offset);
> +return;
> +}
> +
> +if (reg > PROT_KEY && reg < CPU2_BASE_SEG1 &&
> +!s->regs[PROT_KEY]) {
> +qemu_log_mask(LOG_GUEST_ERROR, "%s: SCU is locked!\n", __func__);
> +}
> +
> +trace_aspeed_scu_write(offset, size, data);
> +
> +switch (reg) {
> +case PROT_KEY:
> +s->regs[reg] = (data == ASPEED_SCU_PROT_KEY) ? 1 : 0;
> +return;
> +case SILICON_REV:
> +case FREQ_CNTR_EVAL:
> +case VGA_SCRATCH1 ... VGA_SCRATCH8:
> +case RNG_DATA:
> +case FREE_CNTR4:
> +case FREE_CNTR4_EXT:
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "%s: Write to read-only offset 0x%" HWADDR_PRIx "\n",
> +  __func__, offset);
> +return;
> +}
> +
> +s->regs[reg] = data;
> +}
> +
> +static void aspeed_ast2500_scu_write(void *opaque, hwaddr offset,
> + uint64_t data, unsigned size)
>  {
>  AspeedSCUState *s = ASPEED_SCU(opaque);
>  int reg = TO_REG(offset);
> @@ -257,25 +296,11 @@ static void aspeed_scu_write(void *opaque, hwaddr 
> offset, uint64_t data,
>  case PROT_KEY:
>  s->regs[reg] = (data == ASPEED_SCU_PROT_KEY) ? 1 : 0;
>  return;
> -case CLK_SEL:
> -s->regs[reg] = data;
> -break;
>  case HW_STRAP1:
> -if (ASPEED_IS_AST2500(s->regs[SILICON_REV])) {
> -s->regs[HW_STRAP1] |= data;
> -return;
> -}
> -/* Jump to assignment below */
> -break;
> +s->regs[HW_STRAP1] |= data;
> +return;
>  case SILICON_REV:
> -if (ASPEED_IS_AST2500(s->regs[SILICON_REV])) {
> -s->regs[HW_STRAP1] &= ~data;
> -} else {
> -qemu_log_mask(LOG_GUEST_ERROR,
> -  "%s: Write to read-only offset 0x%" HWADDR_PRIx 
> "\n",
> -  __func__, offset);
> -}
> -/* Avoid assignment below, we've handled everything */
> +s->regs[HW_STRAP1] &= ~data;
>  return;
>  case FREQ_CNTR_EVAL:
>  case VGA_SCRATCH1 ... VGA_SCRATCH8:
> @@ -291,9 +316,18 @@ static void aspeed_scu_write(void *opaque, hwaddr 
> offset, uint64_t data,
>  s->regs[reg] = data;
>  }
>  
> -static const MemoryRegionOps aspeed_scu_ops = {
> +static const MemoryRegionOps aspeed_ast2400_scu_ops = {
> +.read = aspeed_scu_read,
> +.write = aspeed_ast2400_scu_write,
> +.endianness = DEVICE_LITTLE_ENDIAN,
> +.valid.min_access_size = 4,
> +.valid.max_access_size = 4,
> +.valid.unaligned = false,
> +};
> +
> +static const MemoryRegionOps aspeed_ast2500_scu_ops = {
>  .read = aspeed_scu_read,
> -.write = aspeed_scu_write,
> +.write = aspeed_ast2500_scu_write,
>  .endianness = DEVICE_LITTLE_ENDIAN,
>  .valid.min_access_size = 4,
>  .valid.max_access_size = 4,
> @@ -469,7 +503,7 @@ static void aspeed_2400_scu_class_init(ObjectClass 
> *klass, void *data)
>  asc->calc_hpll = aspeed_2400_scu_calc_hpll;
>  asc->apb_divider = 2;
>  asc->nr_regs = ASPEED_SCU_NR_REGS;
> -asc->ops = _scu_ops;
> +asc->ops = _ast2400_scu_ops;
>  }
>  
>  static const TypeInfo aspeed_2400_scu_info = {
> @@ -489,7 +523,7 @@ static void aspeed_2500_scu_class_init(ObjectClass 
> *klass, void *data)
>  asc->calc_hpll = aspeed_2500_scu_calc_hpll;
>  asc->apb_divider = 4;
>  asc->nr_regs = ASPEED_SCU_NR_REGS;
> -asc->ops = _scu_ops;
> +asc->ops = _ast2500_scu_ops;
>  }
>  
>  static const TypeInfo aspeed_2500_scu_info = {
> 




Re: [PATCH qemu v5] spapr: Kill SLOF

2020-01-20 Thread Alexey Kardashevskiy



On 21/01/2020 16:11, David Gibson wrote:
> On Fri, Jan 10, 2020 at 01:09:25PM +1100, Alexey Kardashevskiy wrote:
>> The Petitboot bootloader is way more advanced than SLOF is ever going to
>> be as Petitboot comes with the full-featured Linux kernel with all
>> the drivers, and initramdisk with quite user friendly interface.
>> The problem with ditching SLOF is that an unmodified pseries kernel can
>> either start via:
>> 1. kexec, this requires presence of RTAS and skips
>> ibm,client-architecture-support entirely;
>> 2. normal boot, this heavily relies on the OF1275 client interface to
>> fetch the device tree and do early setup (claim memory).
>>
>> This adds a new bios-less mode to the pseries machine: "bios=on|off".
>> When enabled, QEMU does not load SLOF and jumps to the kernel from
>> "-kernel".
> 
> I don't love the name "bios" for this flag, since BIOS tends to refer
> to old-school x86 firmware.  Given the various plans we're considering
> the future, I'd suggest "firmware=slof" for the current in-guest SLOF
> mode, and say "firmware=vof" (Virtual Open Firmware) for the new
> model.  We can consider firmware=petitboot or firmware=none (for
> direct kexec-style boot into -kernel) or whatever in the future

Ok. We could also enforce default loading addresses for SLOF/kernel/grub
and drop "kernel-addr", although it is going to be confusing if it
changes in not so obvious way...

In fact, I will ideally need 3 flags:
-bios: on|off to stop loading SLOF;
-kernel-addr: 0x0 for slof/kernel; 0x2 for grub;
-kernel-translate-hack: on|off - as grub is linked to run from 0x2
and it only works when placed there, the hack breaks it.

Or we can pass grub via -bios and not via -kernel but strictly speaking
there is still a firmware - that new 20 bytes blob so it would not be
accurate.

We can put this all into a single
-firmware slof|vof|grub|linux. Not sure.


>> The client interface is implemented exactly as RTAS - a 20 bytes blob,
>> right next after the RTAS blob. The entry point is passed to the kernel
>> via GPR5.
>>
>> This implements a handful of client interface methods just to get going.
>> In particular, this implements the device tree fetching,
>> ibm,client-architecture-support and instantiate-rtas.
>>
>> This implements changing FDT properties for RTAS (for vmlinux and zImage)
>> and initramdisk location (for zImage). To make this work, this skips
>> fdt_pack() when bios=off as not packing the blob leaves some room for
>> appending.
>>
>> This assigns "phandles" to device tree nodes as there is no more SLOF
>> and OF nodes addresses of which served as phandle values.
>> This keeps predefined nodes (such as XICS/NVLINK/...) unchanged.
>> phandles are regenerated at every FDT rebuild.
>>
>> When bios=off, this adds "/chosen" every time QEMU builds a tree.
>>
>> This implements "claim" which the client (Linux) uses for memory
>> allocation; this is also  used by QEMU for claiming kernel/initrd images,
>> client interface entry point, RTAS and the initial stack.
>>
>> While at this, add a "kernel-addr" machine parameter to allow moving
>> the kernel in memory. This is useful for debugging if the kernel is
>> loaded at @0, although not necessary.
>>
>> This adds basic instances support which are managed by a hashmap
>> ihandle->[phandle, DeviceState, Chardev].
>>
>> Note that a 64bit PCI fix is required for Linux:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735e
>>
>> The test command line:
>>
>> qemu-system-ppc64 \
>> -nodefaults \
>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>> -mon id=MON0,chardev=STDIO0,mode=readline \
>> -nographic \
>> -vga none \
>> -kernel pbuild/kernel-le-guest/arch/powerpc/boot/zImage.pseries \
>> -machine pseries,bios=off,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken \
>> -m 4G \
>> -enable-kvm \
>> -initrd pb/rootfs.cpio.xz \
>> -device nec-usb-xhci,id=nec-usb-xhci0 \
>> -netdev tap,id=TAP0,helper=/home/aik/qemu-bridge-helper --br=br0 \
>> -device virtio-net-pci,id=vnet0,netdev=TAP0 img/f30le.qcow2 \
>> -snapshot \
>> -smp 8,threads=8 \
>> -trace events=qemu_trace_events \
>> -d guest_errors \
>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh54088 \
>> -mon chardev=SOCKET0,mode=control
>>
>> Signed-off-by: Alexey Kardashevskiy 
> 
> It'd be nice to split this patch up a bit, though I'll admit it's not
> very obvious where to do so.


v6 is a patchset.

>> ---
>> Changes:
>> v5:
>> * made instances keep device and chardev pointers
>> * removed VIO dependencies
>> * print error if RTAS memory is not claimed as it should have been
>> * pack FDT as "quiesce"
>>
>> v4:
>> * fixed open
>> * validate ihandles in "call-method"
>>
>> v3:
>> * fixed phandles allocation
>> * s/__be32/uint32_t/ as we do not normally have __be32 type in qemu
>> * fixed size of /chosen/stdout
>> * bunch of renames
>> * do not create rtas properties at all, let the client 

Re: [PATCH 1/2] qdev: Introduce qdev_get_bus_device

2020-01-20 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> Hi Julia,
>
> Cc'ing Markus for the qdev/qbus analysis.
>
> On 1/15/20 11:40 PM, Julia Suvorova wrote:
>> For bus devices, it is useful to be able to handle the parent device.
>>
>> Signed-off-by: Julia Suvorova 
>> ---
>>   hw/core/qdev.c  |  5 +
>>   hw/pci-bridge/pci_expander_bridge.c |  4 +++-
>>   hw/scsi/scsi-bus.c  |  4 +++-
>>   hw/usb/bus.c|  4 +++-
>>   hw/usb/dev-smartcard-reader.c   | 32 +
>>   hw/virtio/virtio-pci.c  | 16 +--
>>   include/hw/qdev-core.h  |  2 ++
>
> Please consider using the scripts/git.orderfile config.
>
>>   7 files changed, 54 insertions(+), 13 deletions(-)
>>
>> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
>> index 9f1753f5cf..ad8226e240 100644
>> --- a/hw/core/qdev.c
>> +++ b/hw/core/qdev.c
>> @@ -114,6 +114,11 @@ void qdev_set_parent_bus(DeviceState *dev, BusState 
>> *bus)
>>   }
>>   }
>>   +DeviceState *qdev_get_bus_device(const DeviceState *dev)
>
> We have qdev_get_bus_hotplug_handler(), this follow the naming, OK.
>
>> +{
>> +return dev->parent_bus ? dev->parent_bus->parent : NULL;
>> +}
>> +
>>   /* Create a new device.  This only initializes the device state
>>  structure and allows properties to be set.  The device still needs
>>  to be realized.  See qdev-core.h.  */
>> diff --git a/hw/pci-bridge/pci_expander_bridge.c 
>> b/hw/pci-bridge/pci_expander_bridge.c
>> index 0592818447..63a6c07406 100644
>> --- a/hw/pci-bridge/pci_expander_bridge.c
>> +++ b/hw/pci-bridge/pci_expander_bridge.c
>> @@ -125,9 +125,11 @@ static char *pxb_host_ofw_unit_address(const 
>> SysBusDevice *dev)
>>   assert(position >= 0);
>> pxb_dev_base = DEVICE(pxb_dev);
>> -main_host = PCI_HOST_BRIDGE(pxb_dev_base->parent_bus->parent);
>> +main_host = PCI_HOST_BRIDGE(qdev_get_bus_device(pxb_dev_base));
>>   main_host_sbd = SYS_BUS_DEVICE(main_host);
>>   +g_assert(main_host);
>
> I found myself stuck reviewing this patch for 25min, I'm not sure
> what's bugging me yet, so I'll take notes a-la-Markus-style.
>
> We have the qdev API, with DeviceState.
>
>
> We have the qbus API, with BusState.
>
> A BusState is not a DeviceState but a raw Object.

It's a completely separate kind of Object.

> It keeps a pointer to the a DeviceState parent, a HotplugHandler, and
> a list of BusChild.
>
>
> BusChild are neither DeviceState nor Object, but keep a pointer the a
> DeviceState.

It's a thin wrapper around DeviceState to support collecting the
DeviceState into a list.

> TYPE_HOTPLUG_HANDLER is an interface. It can be implemented by any
> object, but its API seems expects a DeviceState as argument.

What do you mean by "interface expects an argument"?

The interface methods all take a HotplugHandler * and a DeviceState *.
The latter is the device being plugged / unplugged, the former is its
hotplug handler.  In the generic case, @dev's hotplug handler is
qdev_get_hotplug_handler(dev).

> Looking at examples implementing TYPE_HOTPLUG_HANDLER:
>
> - TYPE_USB_BUS. It inherits TYPE_BUS. Handlers will be called with
> USBDevice as argument (TYPE_USB_DEVICE -> TYPE_DEVICE).
>
> - TYPE_PCI_BRIDGE_DEV. Inherits TYPE_PCI_BRIDGE -> TYPE_PCI_DEVICE -> 
> TYPE_DEVICE. Handlers expects PCIDevice (TYPE_PCI_DEVICE).
>
> - TYPE_PC_MACHINE. It inherits TYPE_X86_MACHINE -> TYPE_MACHINE -> 
> TYPE_OBJECT. Not a TYPE_BUS. Handlers for TYPE_PC_DIMM, TYPE_CPU and
> TYPE_VIRTIO_PMEM_PCI. Complex... TYPE_PC_DIMM/TYPE_CPU are
> TYPE_DEVICE.
> For TYPE_VIRTIO_PMEM_PCI we have VirtIOPMEMPCI -> VirtIOPCIProxy -> 
> PCIDevice.
>
> - USB_CCID_DEV. Inherits TYPE_USB_DEVICE -> TYPE_DEVICE. Only one
> 'unplug' handler which likely expects USBCCIDState.
>
> - TYPE_SCSI_BUS. Inherits TYPE_BUS. Also a single 'unplug' handler
> expecting SCSIDevice.
>
> - TYPE_VIRTIO_SCSI. Inherits TYPE_VIRTIO_SCSI_COMMON -> 
> TYPE_VIRTIO_DEVICE -> TYPE_DEVICE. Handlers expect VirtIOSCSI.
>
>
> No simple pattern so far.
>
>
> Looking back at qbus. qbus_initfn() enforces a TYPE_HOTPLUG_HANDLER
> property on BusState (which is not a DeviceState). So IIUC TYPE_BUS
> also implements TYPE_HOTPLUG_HANDLER.

I think this merely creates a reference to the concrete bus's hotplug
handler.  TYPE_BUS is abstract, and doesn't implement any interfaces
(its .interfaces is empty).

Anything else you'd like me to check for you?

[...]




Re: [PATCH] spapr: Migrate CAS reboot flag

2020-01-20 Thread Greg Kurz
On Tue, 21 Jan 2020 14:41:26 +1100
David Gibson  wrote:

> On Wed, Jan 15, 2020 at 07:10:47PM +0100, Cédric Le Goater wrote:
> > On 1/15/20 6:48 PM, Greg Kurz wrote:
> > > Migration can potentially race with CAS reboot. If the migration thread
> > > completes migration after CAS has set spapr->cas_reboot but before the
> > > mainloop could pick up the reset request and reset the machine, the
> > > guest is migrated unrebooted and the destination doesn't reboot it
> > > either because it isn't aware a CAS reboot was needed (eg, because a
> > > device was added before CAS). This likely result in a broken or hung
> > > guest.
> > > 
> > > Even if it is small, the window between CAS and CAS reboot is enough to
> > > re-qualify spapr->cas_reboot as state that we should migrate. Add a new
> > > subsection for that and always send it when a CAS reboot is pending.
> > > This may cause migration to older QEMUs to fail but it is still better
> > > than end up with a broken guest.
> > > 
> > > The destination cannot honour the CAS reboot request from a post load
> > > handler because this must be done after the guest is fully restored.
> > > It is thus done from a VM change state handler.
> > > 
> > > Reported-by: Lukáš Doktor 
> > > Signed-off-by: Greg Kurz 
> > 
> > Cédric Le Goater 
> > 
> > Nice work ! That was quite complex to catch !
> 
> It is a very nice analysis.  However, I'm disinclined to merge this
> for the time being.
> 
> My preferred approach would be to just eliminate CAS reboots
> altogether, since that has other benefits.  I'm feeling like this
> isn't super-urgent, since CAS reboots are extremely rare in practice,
> now that we've eliminated the one for the irq switchover.
> 

Yeah. The only _true_ need for CAS rebooting now seems to be hotplug
before CAS, which is likely not something frequent.

> However, if it's not looking like we'll be ready to do that as the
> qemu-5.0 release approaches, then I'll be more than willing to
> reconsider this.
> 

I hope we can drop CAS reboot in time.


pgp3KJKQ24sp6.pgp
Description: OpenPGP digital signature


Re: [PATCH 02/13] qcrypto-luks: implement encryption key management

2020-01-20 Thread Markus Armbruster
Reviewing just the QAPI schema.

Maxim Levitsky  writes:

> Next few patches will expose that functionality
> to the user.
>
> Signed-off-by: Maxim Levitsky 
> ---
>  crypto/block-luks.c | 374 +++-
>  qapi/crypto.json|  50 +-
>  2 files changed, 421 insertions(+), 3 deletions(-)
>
> diff --git a/crypto/block-luks.c b/crypto/block-luks.c
> index 4861db810c..349e95fed3 100644
> --- a/crypto/block-luks.c
> +++ b/crypto/block-luks.c
> @@ -32,6 +32,7 @@
>  #include "qemu/uuid.h"
>  
>  #include "qemu/coroutine.h"
> +#include "qemu/bitmap.h"
>  
>  /*
>   * Reference for the LUKS format implemented here is
> @@ -70,6 +71,9 @@ typedef struct QCryptoBlockLUKSKeySlot 
> QCryptoBlockLUKSKeySlot;
>  
>  #define QCRYPTO_BLOCK_LUKS_SECTOR_SIZE 512LL
>  
> +#define QCRYPTO_BLOCK_LUKS_DEFAULT_ITER_TIME_MS 2000
> +#define QCRYPTO_BLOCK_LUKS_ERASE_ITERATIONS 40
> +
>  static const char qcrypto_block_luks_magic[QCRYPTO_BLOCK_LUKS_MAGIC_LEN] = {
>  'L', 'U', 'K', 'S', 0xBA, 0xBE
>  };
> @@ -219,6 +223,9 @@ struct QCryptoBlockLUKS {
>  
>  /* Hash algorithm used in pbkdf2 function */
>  QCryptoHashAlgorithm hash_alg;
> +
> +/* Name of the secret that was used to open the image */
> +char *secret;
>  };
>  
>  
> @@ -1069,6 +1076,112 @@ qcrypto_block_luks_find_key(QCryptoBlock *block,
>  return -1;
>  }
>  
> +/*
> + * Returns true if a slot i is marked as active
> + * (contains encrypted copy of the master key)
> + */
> +static bool
> +qcrypto_block_luks_slot_active(const QCryptoBlockLUKS *luks,
> +   unsigned int slot_idx)
> +{
> +uint32_t val = luks->header.key_slots[slot_idx].active;
> +return val ==  QCRYPTO_BLOCK_LUKS_KEY_SLOT_ENABLED;
> +}
> +
> +/*
> + * Returns the number of slots that are marked as active
> + * (slots that contain encrypted copy of the master key)
> + */
> +static unsigned int
> +qcrypto_block_luks_count_active_slots(const QCryptoBlockLUKS *luks)
> +{
> +size_t i = 0;
> +unsigned int ret = 0;
> +
> +for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
> +if (qcrypto_block_luks_slot_active(luks, i)) {
> +ret++;
> +}
> +}
> +return ret;
> +}
> +
> +/*
> + * Finds first key slot which is not active
> + * Returns the key slot index, or -1 if it doesn't exist
> + */
> +static int
> +qcrypto_block_luks_find_free_keyslot(const QCryptoBlockLUKS *luks)
> +{
> +size_t i;
> +
> +for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
> +if (!qcrypto_block_luks_slot_active(luks, i)) {
> +return i;
> +}
> +}
> +return -1;
> +
> +}
> +
> +/*
> + * Erases an keyslot given its index
> + * Returns:
> + *0 if the keyslot was erased successfully
> + *   -1 if a error occurred while erasing the keyslot
> + *
> + */
> +static int
> +qcrypto_block_luks_erase_key(QCryptoBlock *block,
> + unsigned int slot_idx,
> + QCryptoBlockWriteFunc writefunc,
> + void *opaque,
> + Error **errp)
> +{
> +QCryptoBlockLUKS *luks = block->opaque;
> +QCryptoBlockLUKSKeySlot *slot = >header.key_slots[slot_idx];
> +g_autofree uint8_t *garbagesplitkey = NULL;
> +size_t splitkeylen = luks->header.master_key_len * slot->stripes;
> +size_t i;
> +
> +assert(slot_idx < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS);
> +assert(splitkeylen > 0);
> +
> +garbagesplitkey = g_new0(uint8_t, splitkeylen);
> +
> +/* Reset the key slot header */
> +memset(slot->salt, 0, QCRYPTO_BLOCK_LUKS_SALT_LEN);
> +slot->iterations = 0;
> +slot->active = QCRYPTO_BLOCK_LUKS_KEY_SLOT_DISABLED;
> +
> +qcrypto_block_luks_store_header(block,  writefunc, opaque, errp);
> +
> +/*
> + * Now try to erase the key material, even if the header
> + * update failed
> + */
> +for (i = 0; i < QCRYPTO_BLOCK_LUKS_ERASE_ITERATIONS; i++) {
> +if (qcrypto_random_bytes(garbagesplitkey, splitkeylen, errp) < 0) {
> +/*
> + * If we failed to get the random data, still write
> + * at least zeros to the key slot at least once
> + */
> +if (i > 0) {
> +return -1;
> +}
> +}
> +
> +if (writefunc(block,
> +  slot->key_offset_sector * 
> QCRYPTO_BLOCK_LUKS_SECTOR_SIZE,
> +  garbagesplitkey,
> +  splitkeylen,
> +  opaque,
> +  errp) != splitkeylen) {
> +return -1;
> +}
> +}
> +return 0;
> +}
>  
>  static int
>  qcrypto_block_luks_open(QCryptoBlock *block,
> @@ -1099,6 +1212,7 @@ qcrypto_block_luks_open(QCryptoBlock *block,
>  
>  luks = g_new0(QCryptoBlockLUKS, 1);
>  block->opaque = luks;
> +luks->secret = g_strdup(options->u.luks.key_secret);
>  
>  if 

Re: [PATCH] spapr: Migrate CAS reboot flag

2020-01-20 Thread Greg Kurz
On Fri, 17 Jan 2020 16:44:27 +0100
Greg Kurz  wrote:

> On Fri, 17 Jan 2020 19:16:08 +1000
> David Gibson  wrote:
> 
> > On Thu, Jan 16, 2020 at 07:29:02PM +0100, Greg Kurz wrote:
> > > On Thu, 16 Jan 2020 13:14:35 +0100
> > > Greg Kurz  wrote:
> > > 
> > > > On Thu, 16 Jan 2020 11:37:24 +0100
> > > > Laurent Vivier  wrote:
> > > > 
> > > > > On 16/01/2020 09:48, Greg Kurz wrote:
> > > > > > On Wed, 15 Jan 2020 19:10:37 +0100
> > > > > > Laurent Vivier  wrote:
> > > > > > 
> > > > > >> Hi,
> > > > > >>
> > > > > >> On 15/01/2020 18:48, Greg Kurz wrote:
> > > > > >>> Migration can potentially race with CAS reboot. If the migration 
> > > > > >>> thread
> > > > > >>> completes migration after CAS has set spapr->cas_reboot but 
> > > > > >>> before the
> > > > > >>> mainloop could pick up the reset request and reset the machine, 
> > > > > >>> the
> > > > > >>> guest is migrated unrebooted and the destination doesn't reboot it
> > > > > >>> either because it isn't aware a CAS reboot was needed (eg, 
> > > > > >>> because a
> > > > > >>> device was added before CAS). This likely result in a broken or 
> > > > > >>> hung
> > > > > >>> guest.
> > > > > >>>
> > > > > >>> Even if it is small, the window between CAS and CAS reboot is 
> > > > > >>> enough to
> > > > > >>> re-qualify spapr->cas_reboot as state that we should migrate. Add 
> > > > > >>> a new
> > > > > >>> subsection for that and always send it when a CAS reboot is 
> > > > > >>> pending.
> > > > > >>> This may cause migration to older QEMUs to fail but it is still 
> > > > > >>> better
> > > > > >>> than end up with a broken guest.
> > > > > >>>
> > > > > >>> The destination cannot honour the CAS reboot request from a post 
> > > > > >>> load
> > > > > >>> handler because this must be done after the guest is fully 
> > > > > >>> restored.
> > > > > >>> It is thus done from a VM change state handler.
> > > > > >>>
> > > > > >>> Reported-by: Lukáš Doktor 
> > > > > >>> Signed-off-by: Greg Kurz 
> > > > > >>> ---
> > > > > >>>
> > > > > >>
> > > > > >> I'm wondering if the problem can be related with the fact that
> > > > > >> main_loop_should_exit() could release qemu_global_mutex in
> > > > > >> pause_all_vcpus() in the reset case?
> > > > > >>
> > > > > >> 1602 static bool main_loop_should_exit(void)
> > > > > >> 1603 {
> > > > > >> ...
> > > > > >> 1633 request = qemu_reset_requested();
> > > > > >> 1634 if (request) {
> > > > > >> 1635 pause_all_vcpus();
> > > > > >> 1636 qemu_system_reset(request);
> > > > > >> 1637 resume_all_vcpus();
> > > > > >> 1638 if (!runstate_check(RUN_STATE_RUNNING) &&
> > > > > >> 1639 !runstate_check(RUN_STATE_INMIGRATE)) {
> > > > > >> 1640 runstate_set(RUN_STATE_PRELAUNCH);
> > > > > >> 1641 }
> > > > > >> 1642 }
> > > > > >> ...
> > > > > >>
> > > > > >> I already sent a patch for this kind of problem (in current Juan 
> > > > > >> pull
> > > > > >> request):
> > > > > >>
> > > > > >> "runstate: ignore finishmigrate -> prelaunch transition"
> > > > > >>
> > > > > > 
> > > > > > IIUC your patch avoids an invalid 'prelaunch' -> 'postmigrate' 
> > > > > > runstate
> > > > > > transition that can happen if the migration thread sets the 
> > > > > > runstate to
> > > > > > 'finishmigrate' when pause_all_vcpus() releases the main loop mutex.
> > > > > > 
> > > > > > ie. symptom of the problem is QEMU aborting, correct ? The issue I'm
> > > > > > trying to fix is a guest breakage caused by a discrepancy between
> > > > > > QEMU and the guest after migration has succeeded.
> > > > > > 
> > > > > >> but I don't know if it could fix this one.
> > > > > >>
> > > > > > 
> > > > > > I don't think so and your patch kinda illustrates it. If the 
> > > > > > runstate
> > > > > > is 'finishmigrate' when returning from pause_all_vcpus(), this means
> > > > > > that state was sent to the destination before we could actually 
> > > > > > reset
> > > > > > the machine.
> > > > > 
> > > > > Yes, you're right.
> > > > > 
> > > > > But the question behind my comment was: is it expected to have a 
> > > > > pending
> > > > > reset while we are migrating?
> > > > > 
> > > > 
> > > > Nothing prevents qemu_system_reset_request() to be called when migration
> > > > is active. 
> > > > 
> > > > > Perhaps H_CAS can return H_BUSY and wait the end of the migration and
> > > > > then be fully executed on destination?
> > > > > 
> > > > 
> > > > And so we would need to teach SLOF to try H_CAS again until it stops
> > > > returning H_BUSY ? It seems safer to migrate the CAS reboot flag IMHO.
> > > > 
> > > 
> > > Ok I've tried that with a patched SLOF that sleeps 500ms and tries CAS
> > > again if H_BUSY was returned. It fixes the issue but it looks a bit
> > > ugly because of the polling with an arbitrary timeout in SLOF... I'm
> > > not very comfortable either with calling migration_is_active() from
> > > the CAS code in QEMU.
> > > 
> > > David,
> > > 
> > > Any 

Re: [PATCH] riscv: Fix defination of TW bits in mstatus CSR

2020-01-20 Thread Alistair Francis
On Mon, Jan 20, 2020 at 6:59 PM Ian Jiang  wrote:
>
> The origin defination of TW bits in mstatus is not correct.
> This patch fixes the problem.
>
> Signed-off-by: Ian Jiang 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu_bits.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index e99834856c..fb2e0b340e 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -349,7 +349,7 @@
>  #define MSTATUS_MXR 0x0008
>  #define MSTATUS_VM  0x1F00 /* until: priv-1.9.1 */
>  #define MSTATUS_TVM 0x0010 /* since: priv-1.10 */
> -#define MSTATUS_TW  0x2000 /* since: priv-1.10 */
> +#define MSTATUS_TW  0x0020 /* since: priv-1.10 */
>  #define MSTATUS_TSR 0x4000 /* since: priv-1.10 */
>  #define MSTATUS_MTL 0x40ULL
>  #define MSTATUS_MPV 0x80ULL
> --
> 2.17.1
>
>



Re: [Qemu-devel] What should a virtual board emulate?

2020-01-20 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 04/01/20 22:16, Philippe Mathieu-Daudé wrote:
>> 1/ the Radeon chip is soldered on the motherboard,
>> 
>> 2/ the default BIOS expects the Radeon chip to be
>>    unconditionally present,
>> 
>> I insist this patch is incorrect for the particular case of the
>> Fuloong2e board. I plan to revert it when I post the test.
>> 
>> BTW I'm not using --nodefault, I'm running default ./configure:
>> 
>> qemu-system-mips64el -M fulong2e -bios pmon_2e.bin \
>> -display none -vga none -serial stdio
>
> But if you're not specifying -nodefaults, why are you specifying a
> configuration that your BIOS does not support?  You should just remove
> -vga none and leave in -display none.

Is there any use for -vga none with this machine?  If no, then rejecting
it cleanly would be nicer than having the machine hang.




Re: [PATCH] python: Treat None-return of greeting cmd

2020-01-20 Thread Philippe Mathieu-Daudé




On 1/20/20 8:12 AM, Lukáš Doktor wrote:

In case qemu process dies the "monitor.cmd" returns None which gets
passed to the "__negotiate_capabilities" and leads to unhandled
exception. Let's only check the resp in case it has a value.

Signed-off-by: Lukáš Doktor 
---
  python/qemu/qmp.py | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/qemu/qmp.py b/python/qemu/qmp.py
index 5c8cf6a056..a3e5de718a 100644
--- a/python/qemu/qmp.py
+++ b/python/qemu/qmp.py
@@ -73,7 +73,7 @@ class QEMUMonitorProtocol(object):
  raise QMPConnectError
  # Greeting seems ok, negotiate capabilities
  resp = self.cmd('qmp_capabilities')
-if "return" in resp:
+if resp and "return" in resp:
  return greeting
  raise QMPCapabilitiesError
  



Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] qom/object: Display more helpful message when an interface is missing

2020-01-20 Thread Cornelia Huck
On Sat, 18 Jan 2020 17:23:48 +0100
Philippe Mathieu-Daudé  wrote:

> When adding new devices implementing QOM interfaces, we might
> forgot to add the Kconfig dependency that pulls the required
> objects in when building.
> 
> Since QOM dependencies are resolved at runtime, we don't get any
> link-time failures, and QEMU aborts while starting:
> 
>   $ qemu ...
>   Segmentation fault (core dumped)
> 
>   (gdb) bt
>   #0  0x7ff6e96b1e35 in raise () from /lib64/libc.so.6
>   #1  0x7ff6e969c895 in abort () from /lib64/libc.so.6
>   #2  0x5572bc5051cf in type_initialize (ti=0x5572be6f1200) at 
> qom/object.c:323
>   #3  0x5572bc505074 in type_initialize (ti=0x5572be6f1800) at 
> qom/object.c:301
>   #4  0x5572bc505074 in type_initialize (ti=0x5572be6e48e0) at 
> qom/object.c:301
>   #5  0x5572bc506939 in object_class_by_name (typename=0x5572bc56109a) at 
> qom/object.c:959
>   #6  0x5572bc503dd5 in cpu_class_by_name (typename=0x5572bc56109a, 
> cpu_model=0x5572be6d9930) at hw/core/cpu.c:286
> 
> Since the caller has access to the qdev parent/interface names,
> we can simply display them to avoid starting a debugger:
> 
>   $ qemu ...
>   qemu: missing interface 'fancy-if' for object 'fancy-dev'
>   Aborted (core dumped)
> 
> This commit is similar to e02bdf1cecd2 ("Display more helpful message
> when an object type is missing").
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Cornelia Huck 
> Cc: Stefano Garzarella 
> Cc: Daniel P. Berrangé 
> ---
>  qom/object.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/qom/object.c b/qom/object.c
> index 0d971ca897..36123fb330 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -317,6 +317,11 @@ static void type_initialize(TypeImpl *ti)
>  
>  for (i = 0; i < ti->num_interfaces; i++) {
>  TypeImpl *t = type_get_by_name(ti->interfaces[i].typename);
> +if (!t) {
> +error_report("missing interface '%s' for object '%s'",
> + ti->interfaces[i].typename, parent->name);
> +abort();
> +}
>  for (e = ti->class->interfaces; e; e = e->next) {
>  TypeImpl *target_type = OBJECT_CLASS(e->data)->type;
>  

Reviewed-by: Cornelia Huck 

...but I'm wondering if there are more cases like this? Just to avoid
playing whack-a-mole.




Re: [RFC PATCH v2 3/5] target/arm/kvm: Implement virtual time adjustment

2020-01-20 Thread Andrew Jones
On Thu, Dec 19, 2019 at 03:30:12PM +0100, Andrew Jones wrote:
> On Mon, Dec 16, 2019 at 06:06:30PM +, Peter Maydell wrote:
> > Your approach in this patchset reads and writes on vm-paused,
> > so it won't have the pre-2015 problems.
> > 
> > It still feels odd that we're storing this bit of guest state
> > in two places now though -- in kvm_vtime, and also in its usual
> > place in the cpreg_array data structures (we write back the
> > value from kvm_vtime when the VM starts running, and we write
> > back the value from the cpreg_array for a PUT_FULL_STATE, which
> > the comments claim is only on startup or when we just loaded
> > migration state (and also undocumentedly but reasonably on
> > cpu-hotplug, which arm doesn't have yet).

I tried to get rid of the extra state location (kvm_vtime), but we still
need it because kvm_arch_get_registers() doesn't take 'level', like
kvm_arch_put_registers() does. Maybe it should? Without being able to
filter out TIMER_CNT at get time too, then if we migrate a paused guest
we'll resume with vtime including the ticks between the pause and the
start of the migration. Adding the additional state (kvm_vtime) and a
cpu_pre_save() hook to fixup the cpreg value is a possible way to resolve
that. That's what I've done for v3, which I'll post shortly.

> > 
> > I've just spent a little while digging through code, and
> > haven't been able to satisfy myself on the ordering of which
> > writeback wins: for a loadvm I think we first do a
> > cpu_synchronize_all_post_init() (writing back the counter
> > value from the migration data) and then after than we will
> > unpause the VM -- why doesn't this overwrite the correct
> > value with the wrong value from kvm_vtime ?

It wasn't overwriting because we weren't detecting a runstate
transition from paused to running. However, for v3, I've dropped
the explicit running/pause transition checking and now ensured
we get the right value with a cpu_post_load() hook.

> 
> > I just noticed also that the logic used in this patch
> > doesn't match what other architectures do in their vm_state_change
> > function -- eg cpu_ppc_clock_vm_state_change() has an
> > "if (running) { load } else { save }", and kvmclock_vm_state_change()
> > for i386 also has "if (running) { ... } else { ... }", though
> > it has an extra wrinkle where it captures "are we PAUSED?"
> > to use in the pre_save function; the comment above
> > kvmclock_pre_save() suggests maybe that would be useful for other
> > than x86, too. kvm_s390_tod_vm_state_change() has
> > logic that's a slightly more complicated variation on just
> > testing the 'running' flag, but it doesn't look at the
> > specific new state.

I think I've mimicked this logic now for arm in v3.

Thanks,
drew




Re: qemu-4.0.1: vhost_region_add_section:Section rounded to 0 prior to previous a0000

2020-01-20 Thread Dr. David Alan Gilbert
* Peter Lieven (p...@kamp.de) wrote:
> Am 17.01.20 um 16:59 schrieb Dr. David Alan Gilbert:
> > * Peter Lieven (p...@kamp.de) wrote:
> > > Am 16.01.20 um 21:26 schrieb Dr. David Alan Gilbert:
> > > > * Peter Lieven (p...@kamp.de) wrote:
> > > > > Am 16.01.20 um 13:47 schrieb Peter Lieven:
> > > > > > Am 13.01.20 um 17:25 schrieb Peter Lieven:
> > > > > > > Am 09.01.20 um 19:44 schrieb Dr. David Alan Gilbert:
> > > > > > > > * Peter Lieven (p...@kamp.de) wrote:
> > > > > > > > > Am 08.01.20 um 16:04 schrieb Dr. David Alan Gilbert:
> > > > > > > > > > * Peter Lieven (p...@kamp.de) wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > I have a Qemu 4.0.1 machine with vhost-net network 
> > > > > > > > > > > adapter, thats polluting the log with the above message.
> > > > > > > > > > > 
> > > > > > > > > > > Is this something known? Googling revealed the following 
> > > > > > > > > > > patch in Nemu (with seems to be a Qemu fork from Intel):
> > > > > > > > > > > 
> > > > > > > > > > > https://github.com/intel/nemu/commit/03940ded7f5370ce7492c619dccced114ef7f56e
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > The network stopped functioning. After a live-migration 
> > > > > > > > > > > the vServer is reachable again.
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Any ideas?
> > > > > > > > > > What guest are you running and what does your qemu 
> > > > > > > > > > commandline look
> > > > > > > > > > like?
> > > > > > > > > Its running debian9. We have hundreds of other VMs with 
> > > > > > > > > identical setup. Do not know why this one makes trouble.
> > > > > > > > Could you extract an 'info mtree' from it - particularly the
> > > > > > > > 'address-space: memory' near the top.
> > > > > > > Here we go:
> > > > > > > 
> > > > > > > 
> > > > > > > address-space: memory
> > > > > > >    - (prio 0, i/o): system
> > > > > > >      -3fff (prio 0, i/o): alias 
> > > > > > > ram-below-4g @pc.ram -3fff
> > > > > > >      - (prio -1, i/o): pci
> > > > > > >    000a-000a (prio 2, i/o): alias 
> > > > > > > vga.chain4 @vga.vram -
> > > > > > >    000a-000b (prio 1, i/o): vga-lowmem
> > > > > > What seems special is that the RAM area is prio2. Any idea if this 
> > > > > > makes trouble?
> > > > > Update from my side. This happens when I have Debian 10 with XFCE 
> > > > > when the Graphical User Interface is initialized.
> > > > > 
> > > > > I see the log message when I specify -M pc-i440fx-2.9. If I obmit the 
> > > > > machine type the error does not appear.
> > > > I can't persuade this to reproduce here on the images I currently have;
> > > > but if you can rebuild, can you try the v3 of 'Fix hyperv synic on
> > > > vhost' I've just posted?  It turns off the alignment code that's
> > > > spitting that error in vhost-kernel cases, so should go away.
> > > Your patch also seems to fix also my issue. No more errors and the 
> > > network keeps responding.
> > Great, can you reply to that post with a Tested-by ?
> 
> 
> I was not sure if I should reply since the patch is for a total different 
> issue.
> 
> Anyway, please feel free to add a
> 
> 
> Tested-by: Peter Lieven 
> 
> 
> I am currently not subscribed to qemu-devel so I can't reply directly.
> 

OK, thanks!

Dave

> Best,
> 
> Peter
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH] audio/oss: fix buffer pos calculation

2020-01-20 Thread Gerd Hoffmann
Fixes: 3ba4066d085f ("ossaudio: port to the new audio backend api")
Reported-by: ziming zhang 
Signed-off-by: Gerd Hoffmann 
---
 audio/ossaudio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/audio/ossaudio.c b/audio/ossaudio.c
index c43faeeea4aa..94564916fbf0 100644
--- a/audio/ossaudio.c
+++ b/audio/ossaudio.c
@@ -420,7 +420,7 @@ static size_t oss_write(HWVoiceOut *hw, void *buf, size_t 
len)
 size_t to_copy = MIN(len, hw->size_emul - hw->pos_emul);
 memcpy(hw->buf_emul + hw->pos_emul, buf, to_copy);
 
-hw->pos_emul = (hw->pos_emul + to_copy) % hw->pos_emul;
+hw->pos_emul = (hw->pos_emul + to_copy) % hw->size_emul;
 buf += to_copy;
 len -= to_copy;
 }
-- 
2.18.1




[PULL 02/29] migration-test: Add migration multifd test

2020-01-20 Thread Juan Quintela
We set multifd-channels.

Signed-off-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Thomas Huth 
Tested-by: Wei Yang 
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 56 
 1 file changed, 56 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 341d190922..81f65602de 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1202,6 +1202,61 @@ static void test_migrate_auto_converge(void)
 test_migrate_end(from, to, true);
 }
 
+static void test_multifd_tcp(void)
+{
+MigrateStart *args = migrate_start_new();
+QTestState *from, *to;
+QDict *rsp;
+char *uri;
+
+if (test_migrate_start(, , "defer", args)) {
+return;
+}
+
+/*
+ * We want to pick a speed slow enough that the test completes
+ * quickly, but that it doesn't complete precopy even on a slow
+ * machine, so also set the downtime.
+ */
+/* 1 ms should make it not converge*/
+migrate_set_parameter_int(from, "downtime-limit", 1);
+/* 1GB/s */
+migrate_set_parameter_int(from, "max-bandwidth", 10);
+
+migrate_set_parameter_int(from, "multifd-channels", 16);
+migrate_set_parameter_int(to, "multifd-channels", 16);
+
+migrate_set_capability(from, "multifd", "true");
+migrate_set_capability(to, "multifd", "true");
+
+/* Start incoming migration from the 1st socket */
+rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
+   "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+qobject_unref(rsp);
+
+/* Wait for the first serial output from the source */
+wait_for_serial("src_serial");
+
+uri = migrate_get_socket_address(to, "socket-address");
+
+migrate_qmp(from, uri, "{}");
+
+wait_for_migration_pass(from);
+
+/* 300ms it should converge */
+migrate_set_parameter_int(from, "downtime-limit", 300);
+
+if (!got_stop) {
+qtest_qmp_eventwait(from, "STOP");
+}
+qtest_qmp_eventwait(to, "RESUME");
+
+wait_for_serial("dest_serial");
+wait_for_migration_complete(from);
+test_migrate_end(from, to, true);
+free(uri);
+}
+
 int main(int argc, char **argv)
 {
 char template[] = "/tmp/migration-test-XX";
@@ -1266,6 +1321,7 @@ int main(int argc, char **argv)
test_validate_uuid_dst_not_set);
 
 qtest_add_func("/migration/auto_converge", test_migrate_auto_converge);
+qtest_add_func("/migration/multifd/tcp", test_multifd_tcp);
 
 ret = g_test_run();
 
-- 
2.24.1




[PULL 00/29] Migration pull patches

2020-01-20 Thread Juan Quintela
The following changes since commit 7fb38daf256bd1bcbcb5ea556422283d0d55a1b1:

  Merge remote-tracking branch 
'remotes/pmaydell/tags/pull-target-arm-20200117-1' into staging (2020-01-17 
17:27:20 +)

are available in the Git repository at:

  https://github.com/juanquintela/qemu.git tags/migration-pull-pull-request

for you to fetch changes up to ddac5cb2d95774cd019bfaf93c54ffd921095fea:

  multifd: Be consistent about using uint64_t (2020-01-20 09:17:07 +0100)


Migration pull request (take 5)

Making history short:

* having your machine named x32 to really be a 32bit guest helps for
  testing 32bits
* disabling CONFIG_XEN on i686 makes rdma_addr_t be a 32bit value

After this, and patch sent on Friday, I got this pull request to:
* compile on x86_64
* pass tests on x86_64
* compile on i686
* pass tests on i686 (with rdma_dma_t 32 bits)
* cross-compile for windows 32bits
* cross-compile for windows 64bits

Please apply, Juan.



Alexey Romko (1):
  Bug #1829242 correction.

Daniel Henrique Barboza (1):
  ram.c: remove unneeded labels

Dr. David Alan Gilbert (1):
  migration: Rate limit inside host pages

Eric Auger (1):
  migration: Support QLIST migration

Fangrui Song (1):
  migration: Fix incorrect integer->float conversion caught by clang

Jiahui Cen (2):
  migration/multifd: fix nullptr access in terminating multifd threads
  migration/multifd: fix destroyed mutex access in terminating multifd
threads

Juan Quintela (5):
  multifd: Initialize local variable
  migration-test: Add migration multifd test
  migration: Make sure that we don't call write() in case of error
  migration-test: introduce functions to handle string parameters
  multifd: Be consistent about using uint64_t

Laurent Vivier (1):
  runstate: ignore finishmigrate -> prelaunch transition

Marc-André Lureau (1):
  misc: use QEMU_IS_ALIGNED

Peter Xu (3):
  migration: Define VMSTATE_INSTANCE_ID_ANY
  migration: Change SaveStateEntry.instance_id into uint32_t
  apic: Use 32bit APIC ID for migration instance ID

Scott Cheloha (2):
  migration: add savevm_state_handler_remove()
  migration: savevm_state_handler_insert: constant-time element
insertion

Wei Yang (8):
  migration/postcopy: reduce memset when it is zero page and
matches_target_page_size
  migration/postcopy: wait for decompress thread in precopy
  migration/postcopy: count target page number to decide the
place_needed
  migration/postcopy: set all_zero to true on the first target page
  migration/postcopy: enable random order target page arrival
  migration/postcopy: enable compress during postcopy
  migration/multifd: clean pages after filling packet
  migration/multifd: not use multifd during postcopy

Yury Kotov (2):
  migration: Fix the re-run check of the migrate-incoming command
  migration/ram: Yield periodically to the main loop

 backends/dbus-vmstate.c  |   3 +-
 exec.c   |   4 +-
 hw/arm/stellaris.c   |   2 +-
 hw/core/qdev.c   |   3 +-
 hw/display/ads7846.c |   2 +-
 hw/i2c/core.c|   2 +-
 hw/input/stellaris_input.c   |   3 +-
 hw/intc/apic_common.c|   7 +-
 hw/misc/max111x.c|   3 +-
 hw/net/eepro100.c|   3 +-
 hw/pci/pci.c |   2 +-
 hw/ppc/spapr.c   |   2 +-
 hw/timer/arm_timer.c |   2 +-
 hw/tpm/tpm_emulator.c|   3 +-
 include/migration/register.h |   2 +-
 include/migration/vmstate.h  |  25 -
 include/qemu/queue.h |  39 +++
 migration/migration.c|  72 ++---
 migration/migration.h|   1 +
 migration/ram.c  | 196 +--
 migration/savevm.c   |  61 ---
 migration/trace-events   |   9 +-
 migration/vmstate-types.c|  70 +
 stubs/vmstate.c  |   2 +-
 tests/qtest/migration-test.c |  93 +
 tests/test-vmstate.c | 170 ++
 vl.c |  10 +-
 27 files changed, 659 insertions(+), 132 deletions(-)

-- 
2.24.1




[PULL 10/29] misc: use QEMU_IS_ALIGNED

2020-01-20 Thread Juan Quintela
From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
Reviewed-by: Juan Quintela 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefan Berger 
Reviewed-by: Paolo Bonzini 
Signed-off-by: Juan Quintela 
---
 exec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 0f6b087f57..67e520d18e 100644
--- a/exec.c
+++ b/exec.c
@@ -3895,7 +3895,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, 
size_t length)
 
 uint8_t *host_startaddr = rb->host + start;
 
-if ((uintptr_t)host_startaddr & (rb->page_size - 1)) {
+if (!QEMU_PTR_IS_ALIGNED(host_startaddr, rb->page_size)) {
 error_report("ram_block_discard_range: Unaligned start address: %p",
  host_startaddr);
 goto err;
@@ -3903,7 +3903,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, 
size_t length)
 
 if ((start + length) <= rb->used_length) {
 bool need_madvise, need_fallocate;
-if (length & (rb->page_size - 1)) {
+if (!QEMU_IS_ALIGNED(length, rb->page_size)) {
 error_report("ram_block_discard_range: Unaligned length: %zx",
  length);
 goto err;
-- 
2.24.1




[PULL 13/29] migration/ram: Yield periodically to the main loop

2020-01-20 Thread Juan Quintela
From: Yury Kotov 

Usually, incoming migration coroutine yields to the main loop
while its IO-channel is waiting for data to receive. But there is a case
when RAM migration and data receive have the same speed: VM with huge
zeroed RAM. In this case, IO-channel won't read and thus the main loop
is stuck and for instance, it doesn't respond to QMP commands.

For this case, yield periodically, but not too often, so as not to
affect the speed of migration.

Signed-off-by: Yury Kotov 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 1ec5c10561..5cd066467c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4246,7 +4246,7 @@ static void colo_flush_ram_cache(void)
  */
 static int ram_load_precopy(QEMUFile *f)
 {
-int flags = 0, ret = 0, invalid_flags = 0, len = 0;
+int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
 /* ADVISE is earlier, it shows the source has the postcopy capability on */
 bool postcopy_advised = postcopy_is_advised();
 if (!migrate_use_compression()) {
@@ -4258,6 +4258,17 @@ static int ram_load_precopy(QEMUFile *f)
 void *host = NULL;
 uint8_t ch;
 
+/*
+ * Yield periodically to let main loop run, but an iteration of
+ * the main loop is expensive, so do it each some iterations
+ */
+if ((i & 32767) == 0 && qemu_in_coroutine()) {
+aio_co_schedule(qemu_get_current_aio_context(),
+qemu_coroutine_self());
+qemu_coroutine_yield();
+}
+i++;
+
 addr = qemu_get_be64(f);
 flags = addr & ~TARGET_PAGE_MASK;
 addr &= TARGET_PAGE_MASK;
-- 
2.24.1




[PULL 17/29] migration/postcopy: set all_zero to true on the first target page

2020-01-20 Thread Juan Quintela
From: Wei Yang 

For the first target page, all_zero is set to true for this round check.

After target_pages introduced, we could leverage this variable instead
of checking the address offset.

Signed-off-by: Wei Yang 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 8ebaea255e..460abfa2c3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4102,7 +4102,7 @@ static int ram_load_postcopy(QEMUFile *f)
 page_buffer = postcopy_host_page +
   ((uintptr_t)host & (block->page_size - 1));
 /* If all TP are zero then we can optimise the place */
-if (!((uintptr_t)host & (block->page_size - 1))) {
+if (target_pages == 1) {
 all_zero = true;
 } else {
 /* not the 1st TP within the HP */
-- 
2.24.1




[PULL 27/29] apic: Use 32bit APIC ID for migration instance ID

2020-01-20 Thread Juan Quintela
From: Peter Xu 

Migration is silently broken now with x2apic config like this:

 -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
 -device intel-iommu,intremap=on,eim=on

After migration, the guest kernel could hang at anything, due to
x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
any operations related to x2apic could be broken then (e.g., RDMSR on
x2apic MSRs could fail because KVM would think that the vcpu hasn't
enabled x2apic at all).

The issue is that the x2apic bit was never applied correctly for vcpus
whose ID > 255 when migrate completes, and that's because when we
migrate APIC we use the APICCommonState.id as instance ID of the
migration stream, while that's too short for x2apic.

Let's use the newly introduced initial_apic_id for that.

Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Reviewed-by: Eduardo Habkost 
Signed-off-by: Juan Quintela 
---
 hw/intc/apic_common.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 54b8731fca..b5dbeb6206 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -268,7 +268,10 @@ static void apic_common_realize(DeviceState *dev, Error 
**errp)
 APICCommonState *s = APIC_COMMON(dev);
 APICCommonClass *info;
 static DeviceState *vapic;
-uint32_t instance_id = s->id;
+uint32_t instance_id = s->initial_apic_id;
+
+/* Normally initial APIC ID should be no more than hundreds */
+assert(instance_id != VMSTATE_INSTANCE_ID_ANY);
 
 info = APIC_COMMON_GET_CLASS(s);
 info->realize(dev, errp);
-- 
2.24.1




Re: [PATCH v2 6/6] hw/core/Makefile: Group generic objects versus system-mode objects

2020-01-20 Thread Alex Bennée


Philippe Mathieu-Daudé  writes:

> To ease review/modifications of this Makefile, group generic
> objects first, then system-mode specific ones, and finally
> peripherals (which are only used in system-mode).
>
> No logical changes introduced here.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/core/Makefile.objs | 28 ++--
>  1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
> index 2fea68ccf7..a522b7297d 100644
> --- a/hw/core/Makefile.objs
> +++ b/hw/core/Makefile.objs
> @@ -1,32 +1,32 @@
>  # core qdev-related obj files, also used by *-user:
>  common-obj-y += qdev.o qdev-properties.o
>  common-obj-y += bus.o
> +common-obj-y += cpu.o
> +common-obj-y += hotplug.o
> +common-obj-y += vmstate-if.o
> +# irq.o needed for qdev GPIO handling:

how are IRQs relevant to linux-user? or vmstate and hotplug?

> +common-obj-y += irq.o
> +
>  common-obj-$(CONFIG_SOFTMMU) += reset.o
>  common-obj-$(CONFIG_SOFTMMU) += qdev-fw.o
>  common-obj-$(CONFIG_SOFTMMU) += fw-path-provider.o
> -# irq.o needed for qdev GPIO handling:
> -common-obj-y += irq.o
> -common-obj-y += hotplug.o
>  common-obj-$(CONFIG_SOFTMMU) += nmi.o
>  common-obj-$(CONFIG_SOFTMMU) += vm-change-state-handler.o
> -common-obj-y += cpu.o
> -common-obj-y += vmstate-if.o
> +common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
> +common-obj-$(CONFIG_SOFTMMU) += sysbus.o
> +common-obj-$(CONFIG_SOFTMMU) += machine.o
> +common-obj-$(CONFIG_SOFTMMU) += null-machine.o
> +common-obj-$(CONFIG_SOFTMMU) += loader.o
> +common-obj-$(CONFIG_SOFTMMU) += machine-hmp-cmds.o
> +obj-$(CONFIG_SOFTMMU) += machine-qmp-cmds.o
> +obj-$(CONFIG_SOFTMMU) += numa.o
>  
>  common-obj-$(CONFIG_EMPTY_SLOT) += empty_slot.o
>  common-obj-$(CONFIG_XILINX_AXI) += stream.o
>  common-obj-$(CONFIG_PTIMER) += ptimer.o
> -common-obj-$(CONFIG_SOFTMMU) += sysbus.o
> -common-obj-$(CONFIG_SOFTMMU) += machine.o
> -common-obj-$(CONFIG_SOFTMMU) += loader.o
>  common-obj-$(CONFIG_FITLOADER) += loader-fit.o
> -common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
>  common-obj-$(CONFIG_REGISTER) += register.o
>  common-obj-$(CONFIG_OR_IRQ) += or-irq.o
>  common-obj-$(CONFIG_SPLIT_IRQ) += split-irq.o
>  common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
>  common-obj-$(CONFIG_GENERIC_LOADER) += generic-loader.o
> -common-obj-$(CONFIG_SOFTMMU) += null-machine.o
> -
> -obj-$(CONFIG_SOFTMMU) += machine-qmp-cmds.o
> -obj-$(CONFIG_SOFTMMU) += numa.o
> -common-obj-$(CONFIG_SOFTMMU) += machine-hmp-cmds.o


-- 
Alex Bennée



Re: [PATCH] audio/oss: fix buffer pos calculation

2020-01-20 Thread Philippe Mathieu-Daudé

On 1/20/20 11:18 AM, Gerd Hoffmann wrote:

Fixes: 3ba4066d085f ("ossaudio: port to the new audio backend api")
Reported-by: ziming zhang 
Signed-off-by: Gerd Hoffmann 
---
  audio/ossaudio.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/audio/ossaudio.c b/audio/ossaudio.c
index c43faeeea4aa..94564916fbf0 100644
--- a/audio/ossaudio.c
+++ b/audio/ossaudio.c
@@ -420,7 +420,7 @@ static size_t oss_write(HWVoiceOut *hw, void *buf, size_t 
len)
  size_t to_copy = MIN(len, hw->size_emul - hw->pos_emul);
  memcpy(hw->buf_emul + hw->pos_emul, buf, to_copy);
  
-hw->pos_emul = (hw->pos_emul + to_copy) % hw->pos_emul;

+hw->pos_emul = (hw->pos_emul + to_copy) % hw->size_emul;


Interestingly oss_put_buffer_out() is correct.

Reviewed-by: Philippe Mathieu-Daudé 


  buf += to_copy;
  len -= to_copy;
  }






Re: [PATCH v3 03/10] hbitmap: unpublish hbitmap_iter_skip_words

2020-01-20 Thread Max Reitz
On 19.12.19 11:03, Vladimir Sementsov-Ogievskiy wrote:
> Function is internal and even commented as internal. Drop its
> definition from .h file.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/qemu/hbitmap.h | 7 ---
>  util/hbitmap.c | 2 +-
>  2 files changed, 1 insertion(+), 8 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 0/4] migration: Replace gemu_log with qemu_log

2020-01-20 Thread Alex Bennée


Josh Kunz  writes:

> On Tue, Jan 14, 2020 at 3:02 AM Alex Bennée  wrote:
>
>>
>> Josh Kunz  writes:
>>
>> 
>> >
>> > Not tested:
>> >   * Build/logging with bsd-user. I do not have easy access to a BSD
>> > system.
>>
>> If you have the time we have vm-build-netbsd:
>>
>>   make vm-build-netbsd EXTRA_CONFIGURE_OPTS="--disable-system"
>>
>> Which will create a NetBSD image for you and run the build through it.
>> Other images are available ;-)
>>
>
> This works, but it looks like it only runs the tests on BSD, even with
> `configure --enable-bsd-user` first. I don't see the bsd-user binaries
> being produced in the output of this command.

Ahh the default build target for the BSDs is "check" but as bsd-user
doesn't have any checks it doesn't end up building. You can force it
with

  make vm-build-netbsd EXTRA_CONFIGURE_OPTS="--disable-system" 
BUILD_TARGET="all"

It would be worth plumbing in the tests/tcg tests at some point. I
suspect most of the user-mode tests are more POSIX than Linux.

-- 
Alex Bennée



Re: [PATCH v2 3/5] linux-user/i386: Emulate x86_64 vsyscalls

2020-01-20 Thread Alex Bennée


Richard Henderson  writes:

> Notice the magic page during translate, much like we already
> do for the arm32 commpage.  At runtime, raise an exception to
> return cpu_loop for emulation.
>
> Reviewed-by: Paolo Bonzini 
> Signed-off-by: Richard Henderson 
> ---
>  target/i386/cpu.h  |   1 +
>  linux-user/i386/cpu_loop.c | 105 +
>  target/i386/translate.c|  16 +-
>  3 files changed, 121 insertions(+), 1 deletion(-)
>
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 164d038d1f..3fb2d2a986 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1000,6 +1000,7 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
>  
>  #define EXCP_VMEXIT 0x100 /* only for system emulation */
>  #define EXCP_SYSCALL0x101 /* only for user emulation */
> +#define EXCP_VSYSCALL   0x102 /* only for user emulation */
>  
>  /* i386-specific interrupt pending bits.  */
>  #define CPU_INTERRUPT_POLL  CPU_INTERRUPT_TGT_EXT_1
> diff --git a/linux-user/i386/cpu_loop.c b/linux-user/i386/cpu_loop.c
> index e217cca5ee..f9bf6cec27 100644
> --- a/linux-user/i386/cpu_loop.c
> +++ b/linux-user/i386/cpu_loop.c
> @@ -92,6 +92,106 @@ static void gen_signal(CPUX86State *env, int sig, int 
> code, abi_ptr addr)
>  queue_signal(env, info.si_signo, QEMU_SI_FAULT, );
>  }
>  
> +#ifdef TARGET_X86_64
> +static bool write_ok_or_segv(CPUX86State *env, abi_ptr addr, size_t len)
> +{
> +/*
> + * For all the vsyscalls, NULL means "don't write anything" not
> + * "write it at address 0".
> + */
> +if (addr == 0 || access_ok(VERIFY_WRITE, addr, len)) {
> +return true;
> +}
> +
> +env->error_code = PG_ERROR_W_MASK | PG_ERROR_U_MASK;
> +gen_signal(env, TARGET_SIGSEGV, TARGET_SEGV_MAPERR, addr);
> +return false;
> +}
> +
> +/*
> + * Since v3.1, the kernel traps and emulates the vsyscall page.
> + * Entry points other than the official generate SIGSEGV.
> + */
> +static void emulate_vsyscall(CPUX86State *env)
> +{
> +int syscall;
> +abi_ulong ret;
> +uint64_t caller;
> +
> +/*
> + * Validate the entry point.  We have already validated the page
> + * during translation, now verify the offset.
> + */
> +switch (env->eip & ~TARGET_PAGE_MASK) {
> +case 0x000:
> +syscall = TARGET_NR_gettimeofday;
> +break;
> +case 0x400:
> +syscall = TARGET_NR_time;
> +break;
> +case 0x800:
> +syscall = TARGET_NR_getcpu;
> +break;
> +default:
> +sigsegv:

this label looks a little extraneous.

Otherwise:

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [PATCH 100/104] virtiofsd: process requests in a thread pool

2020-01-20 Thread Misono Tomohiro
> From: Stefan Hajnoczi 
> 
> Introduce a thread pool so that fv_queue_thread() just pops
> VuVirtqElements and hands them to the thread pool.  For the time being
> only one worker thread is allowed since passthrough_ll.c is not
> thread-safe yet.  Future patches will lift this restriction so that
> multiple FUSE requests can be processed in parallel.
> 
> The main new concept is struct FVRequest, which contains both
> VuVirtqElement and struct fuse_chan.  We now have fv_VuDev for a device,
> fv_QueueInfo for a virtqueue, and FVRequest for a request.  Some of
> fv_QueueInfo's fields are moved into FVRequest because they are
> per-request.  The name FVRequest conforms to QEMU coding style and I
> expect the struct fv_* types will be renamed in a future refactoring.
> 
> This patch series is not optimal.  fbuf reuse is dropped so each request
> does malloc(se->bufsize), but there is no clean and cheap way to keep
> this with a thread pool.  The vq_lock mutex is held for longer than
> necessary, especially during the eventfd_write() syscall.  Performance
> can be improved in the future.
> 
> prctl(2) had to be added to the seccomp whitelist because glib invokes
> it.
> 
> Signed-off-by: Stefan Hajnoczi 

Looks good to me.
 Reviewed-by: Misono Tomohiro 



Re: [PATCH 2/9] cputlb: Make tlb_n_entries private to cputlb.c

2020-01-20 Thread Philippe Mathieu-Daudé

On 1/9/20 3:49 AM, Richard Henderson wrote:

There are no users of this function outside cputlb.c,
and its interface will change in the next patch.

Signed-off-by: Richard Henderson 


Reviewed-by: Philippe Mathieu-Daudé 


---
  include/exec/cpu_ldst.h | 5 -
  accel/tcg/cputlb.c  | 5 +
  2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index a46116167c..53de19753a 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -234,11 +234,6 @@ static inline uintptr_t tlb_index(CPUArchState *env, 
uintptr_t mmu_idx,
  return (addr >> TARGET_PAGE_BITS) & size_mask;
  }
  
-static inline size_t tlb_n_entries(CPUArchState *env, uintptr_t mmu_idx)

-{
-return (env_tlb(env)->f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS) + 1;
-}
-
  /* Find the TLB entry corresponding to the mmu_idx + address pair.  */
  static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
   target_ulong addr)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 1a81886e58..e4a8ed9534 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -80,6 +80,11 @@ QEMU_BUILD_BUG_ON(sizeof(target_ulong) > 
sizeof(run_on_cpu_data));
  QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
  #define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
  
+static inline size_t tlb_n_entries(CPUArchState *env, uintptr_t mmu_idx)

+{
+return (env_tlb(env)->f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS) + 1;
+}
+
  static inline size_t sizeof_tlb(CPUArchState *env, uintptr_t mmu_idx)
  {
  return env_tlb(env)->f[mmu_idx].mask + (1 << CPU_TLB_ENTRY_BITS);






Re: [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA

2020-01-20 Thread Dr. David Alan Gilbert
* fengzhimin (fengzhim...@huawei.com) wrote:
> OK, I will modify it.
> 
> Due to the mach-virt.ram is sent by the multiRDMA channels instead of the 
> main channel, it don't to register on the main channel.

You might be OK if instead  of using the name, you use a size threshold;
e.g. you use the multirdma threads for any RAM block larger than say
128MB.

> It takes a long time to register the mach-virt.ram for VM with large capacity 
> memory, so we shall try our best not to register it.

I'm curious why, I know it's expensive to register RAM blocks with rdma
code; but I thought that would just be the first time; it surprises me
that registering it with a 2nd RDMA channel is as expensive.

But then that makes me ask a 2nd question; is your performance increase
due to multiple threads or is it due to the multiple RDMA channels?
COuld you have multiple threads but still a single RDMA channel
(and with sufficient locking) still get the performance?

Dave

> Thanks for your review.
> 
> Zhimin Feng
> 
> -Original Message-
> From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com] 
> Sent: Saturday, January 18, 2020 2:52 AM
> To: fengzhimin 
> Cc: quint...@redhat.com; arm...@redhat.com; ebl...@redhat.com; 
> qemu-devel@nongnu.org; Zhanghailiang ; 
> jemmy858...@gmail.com
> Subject: Re: [PATCH RFC 12/12] migration/rdma: only register the virt-ram 
> block for MultiRDMA
> 
> * Zhimin Feng (fengzhim...@huawei.com) wrote:
> > From: fengzhimin 
> > 
> > The virt-ram block is sent by MultiRDMA, so we only register it for 
> > MultiRDMA channels and main channel don't register the virt-ram block.
> > 
> > Signed-off-by: fengzhimin 
> 
> You can't specialise on the name of the RAMBlock like that.
> 'mach-virt.ram' is the name specific to just the main ram on just aarch's 
> machine type;  for example the name on x86 is completely different and if you 
> use NUMA or hotplug etc it would also be different on aarch.
> 
> Is there a downside to also registering the mach-virt.ram on the main channel?
> 
> Dave
> 
> > ---
> >  migration/rdma.c | 140 
> > +--
> >  1 file changed, 112 insertions(+), 28 deletions(-)
> > 
> > diff --git a/migration/rdma.c b/migration/rdma.c index 
> > 0a150099e2..1477fd509b 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -618,7 +618,9 @@ const char *print_wrid(int wrid);  static int 
> > qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
> > uint8_t *data, RDMAControlHeader *resp,
> > int *resp_idx,
> > -   int (*callback)(RDMAContext *rdma));
> > +   int (*callback)(RDMAContext *rdma,
> > +   uint8_t id),
> > +   uint8_t id);
> >  
> >  static inline uint64_t ram_chunk_index(const uint8_t *start,
> > const uint8_t *host) @@ 
> > -1198,24 +1200,81 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
> >  return 0;
> >  }
> >  
> > -static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
> > +/*
> > + * Parameters:
> > + *@id == UNUSED_ID :
> > + *This means that we register memory for the main RDMA channel,
> > + *the main RDMA channel don't register the mach-virt.ram block
> > + *when we use multiRDMA method to migrate.
> > + *
> > + *@id == 0 or id == 1 or ... :
> > + *This means that we register memory for the multiRDMA channels,
> > + *the multiRDMA channels only register memory for the mach-virt.ram
> > + *block when we use multiRDAM method to migrate.
> > + */
> > +static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma, uint8_t 
> > +id)
> >  {
> >  int i;
> >  RDMALocalBlocks *local = >local_ram_blocks;
> >  
> > -for (i = 0; i < local->nb_blocks; i++) {
> > -local->block[i].mr =
> > -ibv_reg_mr(rdma->pd,
> > -local->block[i].local_host_addr,
> > -local->block[i].length,
> > -IBV_ACCESS_LOCAL_WRITE |
> > -IBV_ACCESS_REMOTE_WRITE
> > -);
> > -if (!local->block[i].mr) {
> > -perror("Failed to register local dest ram block!\n");
> > -break;
> > +if (migrate_use_multiRDMA()) {
> > +if (id == UNUSED_ID) {
> > +for (i = 0; i < local->nb_blocks; i++) {
> > +/* main RDMA channel don't register the mach-virt.ram 
> > block */
> > +if (strcmp(local->block[i].block_name, "mach-virt.ram") == 
> > 0) {
> > +continue;
> > +}
> > +
> > +local->block[i].mr =
> > +ibv_reg_mr(rdma->pd,
> > +local->block[i].local_host_addr,
> > +local->block[i].length,
> > +

Re: [PATCH v2 0/5] fix migration with bitmaps and mirror

2020-01-20 Thread Vladimir Sementsov-Ogievskiy
John, I don't quite follow discussion in bugzilla. Do we need these series
as at least temporary workaround, or not? Should I resend?


19.12.2019 11:51, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> It's a continuation for
> "bitmap migration bug with -drive while block mirror runs"
> <315cff78-dcdb-a3ce-2742-da3cc9f0c...@redhat.com>
> https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg07241.html
> 
> The problem is that bitmaps migrated to node with same node-name or
> blk-parent name. And currently only the latter actually work in libvirt.
> And with mirror-top filter it doesn't work, because
> bdrv_get_device_or_node_name don't go through filters.
> 
> Fix this by handling filtered children of block backends in separate.
> 
> v2: rebase on current master
> 
> Max Reitz (1):
>block: Mark commit and mirror as filter drivers
> 
> Vladimir Sementsov-Ogievskiy (4):
>migretion/block-dirty-bitmap: refactor init_dirty_bitmap_migration
>block/dirty-bitmap: add bdrv_has_named_bitmaps helper
>migration/block-dirty-bitmap: fix bitmaps migration during mirror job
>iotests: 194: test also migration of dirty bitmap
> 
>   include/block/block_int.h  |   8 ++-
>   include/block/dirty-bitmap.h   |   1 +
>   block/commit.c |   2 +
>   block/dirty-bitmap.c   |  13 
>   block/mirror.c |   2 +
>   migration/block-dirty-bitmap.c | 114 +++--
>   tests/qemu-iotests/194 |  14 ++--
>   tests/qemu-iotests/194.out |   6 ++
>   8 files changed, 119 insertions(+), 41 deletions(-)
> 


-- 
Best regards,
Vladimir


Re: [PATCH v7 03/11] hw/core: create Resettable QOM interface

2020-01-20 Thread Philippe Mathieu-Daudé

On 1/20/20 10:08 AM, Damien Hedde wrote:

On 1/18/20 7:42 AM, Philippe Mathieu-Daudé wrote:

On 1/15/20 1:36 PM, Damien Hedde wrote:

This commit defines an interface allowing multi-phase reset. This aims
to solve a problem of the actual single-phase reset (built in
DeviceClass and BusClass): reset behavior is dependent on the order
in which reset handlers are called. In particular doing external
side-effect (like setting an qemu_irq) is problematic because receiving
object may not be reset yet.

The Resettable interface divides the reset in 3 well defined phases.
To reset an object tree, all 1st phases are executed then all 2nd then
all 3rd. See the comments in include/hw/resettable.h for a more complete
description. The interface defines 3 phases to let the future
possibility of holding an object into reset for some time.

The qdev/qbus reset in DeviceClass and BusClass will be modified in
following commits to use this interface. A mechanism is provided
to allow executing a transitional reset handler in place of the 2nd
phase which is executed in children-then-parent order inside a tree.
This will allow to transition devices and buses smoothly while
keeping the exact current qdev/qbus reset behavior for now.

Documentation will be added in a following commit.

Signed-off-by: Damien Hedde 
Reviewed-by: Richard Henderson 
---

v7 update: un-nest struct ResettablePhases
---
   Makefile.objs   |   1 +
   include/hw/resettable.h | 211 +++
   hw/core/resettable.c    | 238 
   hw/core/Makefile.objs   |   1 +
   hw/core/trace-events    |  17 +++
   5 files changed, 468 insertions(+)
   create mode 100644 include/hw/resettable.h
   create mode 100644 hw/core/resettable.c

diff --git a/Makefile.objs b/Makefile.objs
index 7c1e50f9d6..9752d549b4 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -191,6 +191,7 @@ trace-events-subdirs += migration
   trace-events-subdirs += net
   trace-events-subdirs += ui
   endif
+trace-events-subdirs += hw/core
   trace-events-subdirs += hw/display
   trace-events-subdirs += qapi
   trace-events-subdirs += qom
diff --git a/include/hw/resettable.h b/include/hw/resettable.h
new file mode 100644
index 00..58b3df4c22
--- /dev/null
+++ b/include/hw/resettable.h
@@ -0,0 +1,211 @@
+/*
+ * Resettable interface header.
+ *
+ * Copyright (c) 2019 GreenSocs SAS
+ *
+ * Authors:
+ *   Damien Hedde
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_RESETTABLE_H
+#define HW_RESETTABLE_H
+
+#include "qom/object.h"
+
+#define TYPE_RESETTABLE_INTERFACE "resettable"
+
+#define RESETTABLE_CLASS(class) \
+    OBJECT_CLASS_CHECK(ResettableClass, (class),
TYPE_RESETTABLE_INTERFACE)
+
+#define RESETTABLE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(ResettableClass, (obj), TYPE_RESETTABLE_INTERFACE)
+
+typedef struct ResettableState ResettableState;
+
+/**
+ * ResetType:
+ * Types of reset.
+ *
+ * + Cold: reset resulting from a power cycle of the object.
+ *
+ * TODO: Support has to be added to handle more types. In particular,
+ * ResettableState structure needs to be expanded.
+ */
+typedef enum ResetType {
+    RESET_TYPE_COLD,
+} ResetType;
+
+/*
+ * ResettableClass:
+ * Interface for resettable objects.
+ *
+ * See docs/devel/reset.rst for more detailed information about how
QEMU models
+ * reset. This whole API must only be used when holding the iothread
mutex.
+ *
+ * All objects which can be reset must implement this interface;
+ * it is usually provided by a base class such as DeviceClass or
BusClass.
+ * Every Resettable object must maintain some state tracking the
+ * progress of a reset operation by providing a ResettableState
structure.
+ * The functions defined in this module take care of updating the
+ * state of the reset.
+ * The base class implementation of the interface provides this
+ * state and implements the associated method: get_state.
+ *
+ * Concrete object implementations (typically specific devices
+ * such as a UART model) should provide the functions
+ * for the phases.enter, phases.hold and phases.exit methods, which
+ * they can set in their class init function, either directly or
+ * by calling resettable_class_set_parent_phases().
+ * The phase methods are guaranteed to only only ever be called once
+ * for any reset event, in the order 'enter', 'hold', 'exit'.
+ * An object will always move quickly from 'enter' to 'hold'
+ * but might remain in 'hold' for an arbitrary period of time
+ * before eventually reset is deasserted and the 'exit' phase is called.
+ * Object implementations should be prepared for functions handling
+ * inbound connections from other devices (such as qemu_irq handler
+ * functions) to be called at any point during reset after their
+ * 'enter' method has been called.
+ *
+ * Users of a resettable object should not call these methods
+ * directly, but instead use 

[PULL 01/29] multifd: Initialize local variable

2020-01-20 Thread Juan Quintela
Fill everything with zero, so the padding fields are also initialized.

Signed-off-by: Juan Quintela 
Reviewed-by: Daniel P. Berrangé 
---
 migration/ram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 96feb4062c..b9147bcca3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -703,7 +703,7 @@ typedef struct {
 
 static int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp)
 {
-MultiFDInit_t msg;
+MultiFDInit_t msg = {};
 int ret;
 
 msg.magic = cpu_to_be32(MULTIFD_MAGIC);
-- 
2.24.1




Re: [PATCH] block/backup: fix memory leak in bdrv_backup_top_append()

2020-01-20 Thread Kevin Wolf
Am 23.12.2019 um 10:06 hat Eiichi Tsukata geschrieben:
> bdrv_open_driver() allocates bs->opaque according to drv->instance_size.
> There is no need to allocate it and overwrite opaque in
> bdrv_backup_top_append().
> 
> Reproducer:
> 
>   $ QTEST_QEMU_BINARY=./x86_64-softmmu/qemu-system-x86_64 valgrind -q 
> --leak-check=full tests/test-replication -p /replication/secondary/start
>   ==29792== 24 bytes in 1 blocks are definitely lost in loss record 52 of 226
>   ==29792==at 0x483AB1A: calloc (vg_replace_malloc.c:762)
>   ==29792==by 0x4B07CE0: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.7)
>   ==29792==by 0x12BAB9: bdrv_open_driver (block.c:1289)
>   ==29792==by 0x12BEA9: bdrv_new_open_driver (block.c:1359)
>   ==29792==by 0x1D15CB: bdrv_backup_top_append (backup-top.c:190)
>   ==29792==by 0x1CC11A: backup_job_create (backup.c:439)
>   ==29792==by 0x1CD542: replication_start (replication.c:544)
>   ==29792==by 0x1401B9: replication_start_all (replication.c:52)
>   ==29792==by 0x128B50: test_secondary_start (test-replication.c:427)
>   ...
> 
> Fixes: 7df7868b9640 ("block: introduce backup-top filter driver")
> Signed-off-by: Eiichi Tsukata 

Thanks, applied to the block layer.

Kevin




[PULL 03/29] migration: Make sure that we don't call write() in case of error

2020-01-20 Thread Juan Quintela
If we are exiting due to an error/finish/ Just don't try to even
touch the channel with one IO operation.

Signed-off-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index b9147bcca3..f946282adb 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -900,6 +900,12 @@ struct {
 uint64_t packet_num;
 /* send channels ready */
 QemuSemaphore channels_ready;
+/*
+ * Have we already run terminate threads.  There is a race when it
+ * happens that we got one error while we are exiting.
+ * We will use atomic operations.  Only valid values are 0 and 1.
+ */
+int exiting;
 } *multifd_send_state;
 
 /*
@@ -928,6 +934,10 @@ static int multifd_send_pages(RAMState *rs)
 MultiFDPages_t *pages = multifd_send_state->pages;
 uint64_t transferred;
 
+if (atomic_read(_send_state->exiting)) {
+return -1;
+}
+
 qemu_sem_wait(_send_state->channels_ready);
 for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
 p = _send_state->params[i];
@@ -1009,6 +1019,16 @@ static void multifd_send_terminate_threads(Error *err)
 }
 }
 
+/*
+ * We don't want to exit each threads twice.  Depending on where
+ * we get the error, or if there are two independent errors in two
+ * threads at the same time, we can end calling this function
+ * twice.
+ */
+if (atomic_xchg(_send_state->exiting, 1)) {
+return;
+}
+
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = _send_state->params[i];
 
@@ -1118,6 +1138,10 @@ static void *multifd_send_thread(void *opaque)
 
 while (true) {
 qemu_sem_wait(>sem);
+
+if (atomic_read(_send_state->exiting)) {
+break;
+}
 qemu_mutex_lock(>mutex);
 
 if (p->pending_job) {
@@ -1224,6 +1248,7 @@ int multifd_save_setup(void)
 multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
 multifd_send_state->pages = multifd_pages_init(page_count);
 qemu_sem_init(_send_state->channels_ready, 0);
+atomic_set(_send_state->exiting, 0);
 
 for (i = 0; i < thread_count; i++) {
 MultiFDSendParams *p = _send_state->params[i];
-- 
2.24.1




[PULL 28/29] migration: Support QLIST migration

2020-01-20 Thread Juan Quintela
From: Eric Auger 

Support QLIST migration using the same principle as QTAILQ:
94869d5c52 ("migration: migrate QTAILQ").

The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V.
The change mainly resides in QLIST RAW macros: QLIST_RAW_INSERT_HEAD
and QLIST_RAW_REVERSE.

Tests also are provided.

Signed-off-by: Eric Auger 
Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 include/migration/vmstate.h |  21 +
 include/qemu/queue.h|  39 +
 migration/trace-events  |   5 ++
 migration/vmstate-types.c   |  70 +++
 tests/test-vmstate.c| 170 
 5 files changed, 305 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 01790b8d9b..30667631bc 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -229,6 +229,7 @@ extern const VMStateInfo vmstate_info_tmp;
 extern const VMStateInfo vmstate_info_bitmap;
 extern const VMStateInfo vmstate_info_qtailq;
 extern const VMStateInfo vmstate_info_gtree;
+extern const VMStateInfo vmstate_info_qlist;
 
 #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
 /*
@@ -798,6 +799,26 @@ extern const VMStateInfo vmstate_info_gtree;
 .offset   = offsetof(_state, _field),  
\
 }
 
+/*
+ * For migrating a QLIST
+ * Target QLIST needs be properly initialized.
+ * _type: type of QLIST element
+ * _next: name of QLIST_ENTRY entry field in QLIST element
+ * _vmsd: VMSD for QLIST element
+ * size: size of QLIST element
+ * start: offset of QLIST_ENTRY in QTAILQ element
+ */
+#define VMSTATE_QLIST_V(_field, _state, _version, _vmsd, _type, _next)  \
+{\
+.name = (stringify(_field)), \
+.version_id   = (_version),  \
+.vmsd = &(_vmsd),\
+.size = sizeof(_type),   \
+.info = _info_qlist, \
+.offset   = offsetof(_state, _field),\
+.start= offsetof(_type, _next),  \
+}
+
 /* _f : field name
_f_n : num of elements field_name
_n : num of elements
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 4764d93ea3..4d4554a7ce 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -501,4 +501,43 @@ union {
 \
 QTAILQ_RAW_TQH_CIRC(head)->tql_prev = QTAILQ_RAW_TQE_CIRC(elm, entry); 
 \
 } while (/*CONSTCOND*/0)
 
+#define QLIST_RAW_FIRST(head)  
\
+field_at_offset(head, 0, void *)
+
+#define QLIST_RAW_NEXT(elm, entry) 
\
+field_at_offset(elm, entry, void *)
+
+#define QLIST_RAW_PREVIOUS(elm, entry) 
\
+field_at_offset(elm, entry + sizeof(void *), void *)
+
+#define QLIST_RAW_FOREACH(elm, head, entry)
\
+for ((elm) = *QLIST_RAW_FIRST(head);   
\
+ (elm);
\
+ (elm) = *QLIST_RAW_NEXT(elm, entry))
+
+#define QLIST_RAW_INSERT_HEAD(head, elm, entry) do {   
\
+void *first = *QLIST_RAW_FIRST(head);  
\
+*QLIST_RAW_FIRST(head) = elm;  
\
+*QLIST_RAW_PREVIOUS(elm, entry) = QLIST_RAW_FIRST(head);   
\
+if (first) {   
\
+*QLIST_RAW_NEXT(elm, entry) = first;   
\
+*QLIST_RAW_PREVIOUS(first, entry) = QLIST_RAW_NEXT(elm, entry);
\
+} else {   
\
+*QLIST_RAW_NEXT(elm, entry) = NULL;
\
+}  
\
+} while (0)
+
+#define QLIST_RAW_REVERSE(head, elm, entry) do {   
\
+void *iter = *QLIST_RAW_FIRST(head), *prev = NULL, *next;  
\
+while (iter) { 
\
+next = *QLIST_RAW_NEXT(iter, entry);   
\
+*QLIST_RAW_PREVIOUS(iter, entry) = QLIST_RAW_NEXT(next, entry);
\
+*QLIST_RAW_NEXT(iter, entry) = prev;   
\
+prev = iter;   
\
+iter = next;   
\
+}   

[PULL 19/29] migration/postcopy: enable compress during postcopy

2020-01-20 Thread Juan Quintela
From: Wei Yang 

postcopy requires to place a whole host page, while migration thread
migrate memory in target page size. This makes postcopy need to collect
all target pages in one host page before placing via userfaultfd.

To enable compress during postcopy, there are two problems to solve:

1. Random order for target page arrival
2. Target pages in one host page arrives without interrupt by target
   page from other host page

The first one is handled by previous cleanup patch.

This patch handles the second one by:

1. Flush compress thread for each host page
2. Wait for decompress thread for before placing host page

Signed-off-by: Wei Yang 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/migration.c | 11 ---
 migration/ram.c   | 28 +++-
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index e55edee606..990bff00c0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1005,17 +1005,6 @@ static bool migrate_caps_check(bool *cap_list,
 #endif
 
 if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
-if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
-/* The decompression threads asynchronously write into RAM
- * rather than use the atomic copies needed to avoid
- * userfaulting.  It should be possible to fix the decompression
- * threads for compatibility in future.
- */
-error_setg(errp, "Postcopy is not currently compatible "
-   "with compression");
-return false;
-}
-
 /* This check is reasonably expensive, so only when it's being
  * set the first time, also it's only the destination that needs
  * special support.
diff --git a/migration/ram.c b/migration/ram.c
index a7414170e5..5f20c3d15d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3469,6 +3469,14 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 
 rs->target_page_count += pages;
 
+/*
+ * During postcopy, it is necessary to make sure one whole host
+ * page is sent in one chunk.
+ */
+if (migrate_postcopy_ram()) {
+flush_compressed_data(rs);
+}
+
 /*
  * we want to check in the 1st loop, just in case it was the 1st
  * time and we had to sync the dirty bitmap.
@@ -4061,6 +4069,7 @@ static int ram_load_postcopy(QEMUFile *f)
 void *place_source = NULL;
 RAMBlock *block = NULL;
 uint8_t ch;
+int len;
 
 addr = qemu_get_be64(f);
 
@@ -4078,7 +4087,8 @@ static int ram_load_postcopy(QEMUFile *f)
 
 trace_ram_load_postcopy_loop((uint64_t)addr, flags);
 place_needed = false;
-if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE)) {
+if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
+ RAM_SAVE_FLAG_COMPRESS_PAGE)) {
 block = ram_block_from_stream(f, flags);
 
 host = host_from_ram_block_offset(block, addr);
@@ -4161,6 +4171,17 @@ static int ram_load_postcopy(QEMUFile *f)
  TARGET_PAGE_SIZE);
 }
 break;
+case RAM_SAVE_FLAG_COMPRESS_PAGE:
+all_zero = false;
+len = qemu_get_be32(f);
+if (len < 0 || len > compressBound(TARGET_PAGE_SIZE)) {
+error_report("Invalid compressed data length: %d", len);
+ret = -EINVAL;
+break;
+}
+decompress_data_with_multi_threads(f, page_buffer, len);
+break;
+
 case RAM_SAVE_FLAG_EOS:
 /* normal exit */
 multifd_recv_sync_main();
@@ -4172,6 +4193,11 @@ static int ram_load_postcopy(QEMUFile *f)
 break;
 }
 
+/* Got the whole host page, wait for decompress before placing. */
+if (place_needed) {
+ret |= wait_for_decompress_done();
+}
+
 /* Detect for any possible file errors */
 if (!ret && qemu_file_get_error(f)) {
 ret = qemu_file_get_error(f);
-- 
2.24.1




[PULL 20/29] migration/multifd: clean pages after filling packet

2020-01-20 Thread Juan Quintela
From: Wei Yang 

This is a preparation for the next patch:

not use multifd during postcopy.

Without enabling postcopy, everything looks good. While after enabling
postcopy, migration may fail even not use multifd during postcopy. The
reason is the pages is not properly cleared and *old* target page will
continue to be transferred.

After clean pages, migration succeeds.

Signed-off-by: Wei Yang 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5f20c3d15d..a05448c0c9 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -955,10 +955,10 @@ static int multifd_send_pages(RAMState *rs)
 }
 qemu_mutex_unlock(>mutex);
 }
-p->pages->used = 0;
+assert(!p->pages->used);
+assert(!p->pages->block);
 
 p->packet_num = multifd_send_state->packet_num++;
-p->pages->block = NULL;
 multifd_send_state->pages = p->pages;
 p->pages = pages;
 transferred = ((uint64_t) pages->used) * TARGET_PAGE_SIZE + p->packet_len;
@@ -1154,6 +1154,8 @@ static void *multifd_send_thread(void *opaque)
 p->flags = 0;
 p->num_packets++;
 p->num_pages += used;
+p->pages->used = 0;
+p->pages->block = NULL;
 qemu_mutex_unlock(>mutex);
 
 trace_multifd_send(p->id, packet_num, used, flags,
-- 
2.24.1




[PULL 14/29] migration/postcopy: reduce memset when it is zero page and matches_target_page_size

2020-01-20 Thread Juan Quintela
From: Wei Yang 

In this case, page_buffer content would not be used.

Skip this to save some time.

Signed-off-by: Wei Yang 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5cd066467c..bdb0316892 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4126,7 +4126,13 @@ static int ram_load_postcopy(QEMUFile *f)
 switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
 case RAM_SAVE_FLAG_ZERO:
 ch = qemu_get_byte(f);
-memset(page_buffer, ch, TARGET_PAGE_SIZE);
+/*
+ * Can skip to set page_buffer when
+ * this is a zero page and (block->page_size == TARGET_PAGE_SIZE).
+ */
+if (ch || !matches_target_page_size) {
+memset(page_buffer, ch, TARGET_PAGE_SIZE);
+}
 if (ch) {
 all_zero = false;
 }
-- 
2.24.1




Re: [PATCH] block/backup: fix memory leak in bdrv_backup_top_append()

2020-01-20 Thread Kevin Wolf
Am 23.12.2019 um 14:40 hat Eiichi Tsukata geschrieben:
> 
> 
> On 2019/12/23 21:40, Vladimir Sementsov-Ogievskiy wrote:
> > 23.12.2019 12:06, Eiichi Tsukata wrote:
> >> bdrv_open_driver() allocates bs->opaque according to drv->instance_size.
> >> There is no need to allocate it and overwrite opaque in
> >> bdrv_backup_top_append().
> >>
> >> Reproducer:
> >>
> >>$ QTEST_QEMU_BINARY=./x86_64-softmmu/qemu-system-x86_64 valgrind -q 
> >> --leak-check=full tests/test-replication -p /replication/secondary/start
> >>==29792== 24 bytes in 1 blocks are definitely lost in loss record 52 of 
> >> 226
> >>==29792==at 0x483AB1A: calloc (vg_replace_malloc.c:762)
> >>==29792==by 0x4B07CE0: g_malloc0 (in 
> >> /usr/lib64/libglib-2.0.so.0.6000.7)
> >>==29792==by 0x12BAB9: bdrv_open_driver (block.c:1289)
> >>==29792==by 0x12BEA9: bdrv_new_open_driver (block.c:1359)
> >>==29792==by 0x1D15CB: bdrv_backup_top_append (backup-top.c:190)
> >>==29792==by 0x1CC11A: backup_job_create (backup.c:439)
> >>==29792==by 0x1CD542: replication_start (replication.c:544)
> >>==29792==by 0x1401B9: replication_start_all (replication.c:52)
> >>==29792==by 0x128B50: test_secondary_start (test-replication.c:427)
> >>...
> >>
> >> Fixes: 7df7868b9640 ("block: introduce backup-top filter driver")
> >> Signed-off-by: Eiichi Tsukata 
> >> ---
> >>   block/backup-top.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/block/backup-top.c b/block/backup-top.c
> >> index 7cdb1f8eba..617217374d 100644
> >> --- a/block/backup-top.c
> >> +++ b/block/backup-top.c
> >> @@ -196,7 +196,7 @@ BlockDriverState 
> >> *bdrv_backup_top_append(BlockDriverState *source,
> >>   }
> >>   
> >>   top->total_sectors = source->total_sectors;
> >> -top->opaque = state = g_new0(BDRVBackupTopState, 1);
> >> +state = top->opaque;
> >>   
> >>   bdrv_ref(target);
> >>   state->target = bdrv_attach_child(top, target, "target", 
> >> _file, errp);
> >>
> > 
> > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > 
> > Hmm, it was not my idea, I just copied it from mirror.. And there should be 
> > the same leak. and
> > may be in other places:
> > 
> > # git grep 'opaque =.*g_new'
> > block/backup-top.c:top->opaque = state = g_new0(BDRVBackupTopState, 1);

Fixed by this patch.

> > block/file-posix.c:state->opaque = g_new0(BDRVRawReopenState, 1);
> > block/gluster.c:state->opaque = g_new0(BDRVGlusterReopenState, 1);
> > block/raw-format.c:reopen_state->opaque = g_new0(BDRVRawState, 1);
> > block/sheepdog.c:re_s = state->opaque = g_new0(BDRVSheepdogReopenState, 
> > 1);

Doing this for reopen state is fine.

> > block/iscsi.c:bs->opaque = g_new0(struct IscsiLun, 1);

This one looks kind of questionable. It basically builds its
BlockDriveState manually without using any of the block layer open
functions.

> > block/mirror.c:bs_opaque = g_new0(MirrorBDSOpaque, 1);

Harmless as Eiichi explained below, but not nice either.

> Thanks for reviewing.
> As you say, block/mirror.c has similar code. But it does not cause the leak.
> The difference is bdrv_mirror_top BlockDriver does not have .instance_size
> whereas bdrv_backup_top_filter BlockDriver has .instance_size = 
> sizeof(BDRVBackupTopState).
> So when bdrv_open_driver() is called from mirror.c, g_malloc0(0) is
> called allocating nothing.

I think it should still be changed just because it would make the code
cleaner. It's always better to use common infrastructure than
reimplementing it locally.

Kevin




[PATCH v2 2/2] target/ppc: add support for Hypervisor Facility Unavailable Exception

2020-01-20 Thread Cédric Le Goater
The privileged message send and clear instructions (msgsndp & msgclrp)
are privileged, but will generate a hypervisor facility unavailable
exception if not enabled in the HFSCR and executed in privileged
non-hypervisor state.

Add checks when accessing the DPDES register and when using the
msgsndp and msgclrp isntructions.

Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Cédric Le Goater 
---
 target/ppc/cpu.h |  6 ++
 target/ppc/excp_helper.c | 13 +
 target/ppc/misc_helper.c | 27 +++
 3 files changed, 46 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 8ebeaba649d0..96aeea1934d8 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -397,6 +397,10 @@ typedef struct ppc_v3_pate_t {
 #define PSSCR_ESL PPC_BIT(42) /* Enable State Loss */
 #define PSSCR_EC  PPC_BIT(43) /* Exit Criterion */
 
+/* HFSCR bits */
+#define HFSCR_MSGP PPC_BIT(53) /* Privileged Message Send Facilities */
+#define HFSCR_IC_MSGP  0xA
+
 #define msr_sf   ((env->msr >> MSR_SF)   & 1)
 #define msr_isf  ((env->msr >> MSR_ISF)  & 1)
 #define msr_shv  ((env->msr >> MSR_SHV)  & 1)
@@ -1329,6 +1333,8 @@ void cpu_ppc_set_vhyp(PowerPCCPU *cpu, 
PPCVirtualHypervisor *vhyp);
 #endif
 
 void store_fpscr(CPUPPCState *env, uint64_t arg, uint32_t mask);
+void helper_hfscr_facility_check(CPUPPCState *env, uint32_t bit,
+ const char *caller, uint32_t cause);
 
 static inline uint64_t ppc_dump_gpr(CPUPPCState *env, int gprn)
 {
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 1b07c3ed561e..027f54c0ed5f 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -471,6 +471,15 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
excp_model, int excp)
 case POWERPC_EXCP_FU: /* Facility unavailable exception  */
 #ifdef TARGET_PPC64
 env->spr[SPR_FSCR] |= ((target_ulong)env->error_code << 56);
+#endif
+break;
+case POWERPC_EXCP_HV_FU: /* Hypervisor Facility Unavailable Exception 
*/
+#ifdef TARGET_PPC64
+env->spr[SPR_HFSCR] |= ((target_ulong)env->error_code << FSCR_IC_POS);
+srr0 = SPR_HSRR0;
+srr1 = SPR_HSRR1;
+new_msr |= (target_ulong)MSR_HVB;
+new_msr |= env->msr & ((target_ulong)1 << MSR_RI);
 #endif
 break;
 case POWERPC_EXCP_PIT:   /* Programmable interval timer interrupt*/
@@ -1277,6 +1286,8 @@ void helper_book3s_msgsnd(target_ulong rb)
 #if defined(TARGET_PPC64)
 void helper_book3s_msgclrp(CPUPPCState *env, target_ulong rb)
 {
+helper_hfscr_facility_check(env, HFSCR_MSGP, "msgclrp", HFSCR_IC_MSGP);
+
 if (!dbell_type_server(rb)) {
 return;
 }
@@ -1292,6 +1303,8 @@ void helper_book3s_msgsndp(CPUPPCState *env, target_ulong 
rb)
 {
 int pir = env->spr_cb[SPR_PIR].default_value;
 
+helper_hfscr_facility_check(env, HFSCR_MSGP, "msgsndp", HFSCR_IC_MSGP);
+
 if (!dbell_type_server(rb)) {
 return;
 }
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index 0c5919ff08b0..55b68d1246e4 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -41,6 +41,18 @@ void helper_store_dump_spr(CPUPPCState *env, uint32_t sprn)
 }
 
 #ifdef TARGET_PPC64
+static void raise_hv_fu_exception(CPUPPCState *env, uint32_t bit,
+  const char *caller, uint32_t cause,
+  uintptr_t raddr)
+{
+qemu_log_mask(CPU_LOG_INT, "HV Facility %d is unavailable (%s)\n",
+  bit, caller);
+
+env->spr[SPR_HFSCR] &= ~((target_ulong)FSCR_IC_MASK << FSCR_IC_POS);
+
+raise_exception_err_ra(env, POWERPC_EXCP_HV_FU, cause, raddr);
+}
+
 static void raise_fu_exception(CPUPPCState *env, uint32_t bit,
uint32_t sprn, uint32_t cause,
uintptr_t raddr)
@@ -55,6 +67,17 @@ static void raise_fu_exception(CPUPPCState *env, uint32_t 
bit,
 }
 #endif
 
+void helper_hfscr_facility_check(CPUPPCState *env, uint32_t bit,
+ const char *caller, uint32_t cause)
+{
+#ifdef TARGET_PPC64
+if ((env->msr_mask & MSR_HVB) && !msr_hv &&
+ !(env->spr[SPR_HFSCR] & (1UL << bit))) {
+raise_hv_fu_exception(env, bit, caller, cause, GETPC());
+}
+#endif
+}
+
 void helper_fscr_facility_check(CPUPPCState *env, uint32_t bit,
 uint32_t sprn, uint32_t cause)
 {
@@ -114,6 +137,8 @@ target_ulong helper_load_dpdes(CPUPPCState *env)
 {
 target_ulong dpdes = 0;
 
+helper_hfscr_facility_check(env, HFSCR_MSGP, "load DPDES", HFSCR_IC_MSGP);
+
 /* TODO: TCG supports only one thread */
 if (env->pending_interrupts & (1 << PPC_INTERRUPT_DOORBELL)) {
 dpdes = 1;
@@ -127,6 +152,8 @@ void helper_store_dpdes(CPUPPCState *env, target_ulong val)
 PowerPCCPU *cpu = env_archcpu(env);
 CPUState *cs = CPU(cpu);
 
+

Re: [PATCH v2] vhost-vsock: delete vqs in vhost_vsock_unrealize to avoid memleaks

2020-01-20 Thread Stefan Hajnoczi
On Wed, Jan 15, 2020 at 02:25:35PM +0800, pannengy...@huawei.com wrote:
> From: Pan Nengyuan 
> 
> Receive/transmit/event vqs forgot to cleanup in vhost_vsock_unrealize. This
> patch save receive/transmit vq pointer in realize() and cleanup vqs
> through those vq pointers in unrealize(). The leak stack is as follow:
> 
> Direct leak of 21504 byte(s) in 3 object(s) allocated from:
>   #0 0x7f86a1356970 (/lib64/libasan.so.5+0xef970)  ??:?
>   #1 0x7f86a09aa49d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
>   #2 0x5604852f85ca (./x86_64-softmmu/qemu-system-x86_64+0x2c3e5ca)  
> /mnt/sdb/qemu/hw/virtio/virtio.c:2333
>   #3 0x560485356208 (./x86_64-softmmu/qemu-system-x86_64+0x2c9c208)  
> /mnt/sdb/qemu/hw/virtio/vhost-vsock.c:339
>   #4 0x560485305a17 (./x86_64-softmmu/qemu-system-x86_64+0x2c4ba17)  
> /mnt/sdb/qemu/hw/virtio/virtio.c:3531
>   #5 0x5604858e6b65 (./x86_64-softmmu/qemu-system-x86_64+0x322cb65)  
> /mnt/sdb/qemu/hw/core/qdev.c:865
>   #6 0x5604861e6c41 (./x86_64-softmmu/qemu-system-x86_64+0x3b2cc41)  
> /mnt/sdb/qemu/qom/object.c:2102
> 
> Reported-by: Euler Robot 
> Signed-off-by: Pan Nengyuan 
> ---
> Changes V2 to V1:
> - delete virtqueues after vhost cleanup to avoid use-after-free
> - aslo delete virtqueues in the error path of realize()
> ---
>  hw/virtio/vhost-vsock.c | 12 ++--
>  include/hw/virtio/vhost-vsock.h |  2 ++
>  2 files changed, 12 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH] qapi: Fix code generation with Python 3.5

2020-01-20 Thread Alex Bennée


Markus Armbruster  writes:

> Recent commit 3e7fb5811b "qapi: Fix code generation for empty modules"
> modules" switched QAPISchema.visit() from
>
> for entity in self._entity_list:
>
> effectively to
>
> for mod in self._module_dict.values():
> for entity in mod._entity_list:
>
> Visits in the same order as long as .values() is in insertion order.
> That's the case only for Python 3.6 and later.  Before, it's in some
> arbitrary order, which results in broken generated code.
>
> Fix by making self._module_dict an OrderedDict rather than a dict.
>
> Fixes: 3e7fb5811baab213dcc7149c3aa69442d683c26c
> Signed-off-by: Markus Armbruster 

Reviewed-by: Alex Bennée 
Tested-by: Alex Bennée 

Well that certainly clears up a bunch of red. Can we apply it directly
as a build fix?

> ---
>  scripts/qapi/schema.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
> index 0bfc5256fb..5100110fa2 100644
> --- a/scripts/qapi/schema.py
> +++ b/scripts/qapi/schema.py
> @@ -795,7 +795,7 @@ class QAPISchema(object):
>  self.docs = parser.docs
>  self._entity_list = []
>  self._entity_dict = {}
> -self._module_dict = {}
> +self._module_dict = OrderedDict()
>  self._schema_dir = os.path.dirname(fname)
>  self._make_module(None) # built-ins
>  self._make_module(fname)


-- 
Alex Bennée



Re: [PATCH v2 2/6] Makefile: Clarify all the codebase requires qom/ objects

2020-01-20 Thread Alex Bennée


Philippe Mathieu-Daudé  writes:

> QEMU user-mode also requires the qom/ objects, it is not only
> used by "system emulation and qemu-img". As we will use a big
> if() block, move it upper in the "Common libraries for tools
> and emulators" section.
>
> Reviewed-by: Thomas Huth 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alex Bennée 

> ---
>  Makefile.objs | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 7c1e50f9d6..5aae561984 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -2,6 +2,7 @@
>  # Common libraries for tools and emulators
>  stub-obj-y = stubs/
>  util-obj-y = crypto/ util/ qobject/ qapi/
> +qom-obj-y = qom/
>  
>  chardev-obj-y = chardev/
>  
> @@ -26,11 +27,6 @@ block-obj-m = block/
>  
>  crypto-obj-y = crypto/
>  
> -###
> -# qom-obj-y is code used by both qemu system emulation and qemu-img
> -
> -qom-obj-y = qom/
> -
>  ###
>  # io-obj-y is code used by both qemu system emulation and qemu-img


-- 
Alex Bennée



Re: [PATCH v2 5/6] hw/core: Restrict reset handlers API to system-mode

2020-01-20 Thread Alex Bennée


Philippe Mathieu-Daudé  writes:

> The user-mode code does not use this API, restrict it
> to the system-mode.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/core/Makefile.objs | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
> index 0edd9e635d..2fea68ccf7 100644
> --- a/hw/core/Makefile.objs
> +++ b/hw/core/Makefile.objs
> @@ -1,6 +1,7 @@
>  # core qdev-related obj files, also used by *-user:
>  common-obj-y += qdev.o qdev-properties.o
> -common-obj-y += bus.o reset.o
> +common-obj-y += bus.o
> +common-obj-$(CONFIG_SOFTMMU) += reset.o

This seems a very minor tweaks as far as it goes. I though the only
thing needed in hw was hw/core/cpu and everything else was system
emulation?

However it at least moves the needle in the right direction:

Reviewed-by: Alex Bennée 

>  common-obj-$(CONFIG_SOFTMMU) += qdev-fw.o
>  common-obj-$(CONFIG_SOFTMMU) += fw-path-provider.o
>  # irq.o needed for qdev GPIO handling:


-- 
Alex Bennée



Re: audio: audio input with new backend makes qemu to freeze

2020-01-20 Thread Michael Tokarev
20.01.2020 14:27, Michael Tokarev wrote:
> Hello.
> 
> There's a bugreport about audio subsystem being unable to use microphone
> (audio input) with new backend, since 286a5d201e432ed2963e5d860f239bb276edffeb
> See full details with stack traces at http://bugs.debian.org/948658

Hmm. I wonder if this is fixed by a recent patch from Volker Rümelin titled
"audio: fix audio recording"...

/mjt



Re: [PATCH v3 05/10] block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t

2020-01-20 Thread Max Reitz
On 19.12.19 11:03, Vladimir Sementsov-Ogievskiy wrote:
> We are going to introduce bdrv_dirty_bitmap_next_dirty so that same
> variable may be used to store its return value and to be its parameter,
> so it would int64_t.
> 
> Similarly, we are going to refactor hbitmap_next_dirty_area to use
> hbitmap_next_dirty together with hbitmap_next_zero, therefore we want
> hbitmap_next_zero parameter type to be int64_t too.
> 
> So, for convenience update all parameters of *_next_zero and
> *_next_dirty_area to be int64_t.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/dirty-bitmap.h |  6 +++---
>  include/qemu/hbitmap.h   |  7 +++
>  block/dirty-bitmap.c |  6 +++---
>  nbd/server.c |  2 +-
>  tests/test-hbitmap.c | 32 
>  util/hbitmap.c   | 13 -
>  6 files changed, 34 insertions(+), 32 deletions(-)

[...]

> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index b6d4b99a06..df22f06be6 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -193,7 +193,7 @@ void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap 
> *hb, uint64_t first)
>  }
>  }
>  
> -int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start, uint64_t count)
> +int64_t hbitmap_next_zero(const HBitmap *hb, int64_t start, int64_t count)
>  {
>  size_t pos = (start >> hb->granularity) >> BITS_PER_LEVEL;
>  unsigned long *last_lev = hb->levels[HBITMAP_LEVELS - 1];
> @@ -202,6 +202,8 @@ int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t 
> start, uint64_t count)
>  uint64_t end_bit, sz;
>  int64_t res;
>  
> +assert(start >= 0 && count >= 0);
> +
>  if (start >= hb->orig_size || count == 0) {
>  return -1;
>  }
As far as I can see, NBD just passes NBDRequest.from (which is a
uint64_t) to this function (on NBD_CMD_BLOCK_STATUS).  Would this allow
a malicious client to send a value > INT64_MAX, thus provoking an
overflow and killing the server with this new assertion?

On second thought, we have this problem already everywhere in
nbd_handle_request().  I don’t see it or its caller ever checking
whether the received values are in bounds, it just passes them to all
kind of block layer functions that sometimes even just accept plain
ints.  Well, I suppose all other functions just error out, so it
probably isn’t an actual problem in practice so far...

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v3 0/7] gpio: Add GPIO Aggregator/Repeater

2020-01-20 Thread Eugeniu Rosca
Hi Geert,

On Mon, Jan 20, 2020 at 10:33:53AM +0100, Geert Uytterhoeven wrote:
> On Sat, Jan 18, 2020 at 2:46 AM Eugeniu Rosca  wrote:
> > The only unexpected thing is seeing below messages (where gpiochip99 and
> > gpiochip22 are inexisting gpiochip names, mistakenly provided on command
> > line prior to passing the correct name):
> >
> > root@rcar-gen3:~# echo gpiochip6 12-13 > 
> > /sys/bus/platform/drivers/gpio-aggregator/new_device
> > [  915.572905] gpio-aggregator gpio-aggregator.0: cannot find GPIO chip 
> > gpiochip99, deferring
> > [  915.584224] gpio-aggregator gpio-aggregator.2: cannot find GPIO chip 
> > gpiochip99, deferring
> > [  915.865281] gpio-aggregator gpio-aggregator.29: cannot find GPIO chip 
> > gpiochip22, deferring
> >
> > Obviously, in the above case, due to a typo in the names, the gpio
> > chips will never be found, no matter how long gpio-aggregator defers
> 
> Indeed, that is expected behavior: you have created platform devices
> referring to resources that are not available.

Got it. Sounds reasonable to me.

> 
> > their probing. Unfortunately, the driver will continuously emit those
> > messages, upon each successfully created/aggregated gpiochip. I built
> 
> That is expected behavior, too: every time the driver core manages to
> bind a device to a driver, it will retry all previously deferred probes,
> in the hope they can be satisfied by the just bound device.
> 
> Note that you can destroy these bogus devices, using e.g.
> 
> # echo gpio-aggregator.0 > \
> /sys/bus/platform/drivers/gpio-aggregator/delete_device

Yep, I can get rid of the bogus devices this way. Thanks!

> 
> > gpio-aggregator as a loadable module, if that's relevant.
> 
> Modular or non-modular shouldn't matter w.r.t. this behavior.
> Although unloading the module should get rid of the cruft.

Yes, indeed!

> 
> > Another comment is that, while the series _does_ allow specifying
> > gpio lines in the DTS (this would require a common compatible string
> > in gpio_aggregator_dt_ids[] and in the DTS node) and while those lines
> > are indeed exposed to userspace, based on my testing, these same gpio
> > lines are marked as "used/reserved" by the kernel. This means that
> > operating on those gpio pins from userspace will not be possible.
> > For instance, gpioget/gpioset return "Device or resource busy":
> >
> > gpioget: error reading GPIO values: Device or resource busy
> > gpioset: error setting the GPIO line values: Device or resource busy
> >
> > I guess Harish will be unhappy about that, as his expectation was that
> > upon merging gpio-aggregator with gpio-inverter, he will be able to
> > describe GPIO polarity and names in DTS without "hogging" the pins.
> > Perhaps this can be supplemented via an add-on patch later on?
> 
> When aggregating GPIO lines, the original GPIO lines are indeed marked
> used/reserved, so you cannot use them from userspace.
> However, you are expected to use them through the newly created virtual
> gpiochip representing the aggregated GPIO lines.
> 
> You can try this using the "door" example in
> Documentation/admin-guide/gpio/gpio-aggregator.rst, after replacing
> gpio2 {19,20} by gpio6 {12,13} to suit your H3ULCB.

Confirmed. The example works like a charm. One difference between
the runtime-created and DTS-created gpiochips is the name:
 - gpio-aggregator., for the ones created via sysfs
 - , for the ones created via DTS

Seeing this behavior on my target, I believe the expectations of
Harish should be met w/o any major limitations.

> 
> > For the whole series (leaving the above findings to your discretion):
> >
> > Reviewed-by: Eugeniu Rosca 
> > Tested-by: Eugeniu Rosca 

The recent [v3] discussion actually applies to [v4], for which I did
review and testing. Will relay the signatures to the latest version.

Thank you very much.

-- 
Best Regards,
Eugeniu



Re: [PATCH v22 8/9] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2020-01-20 Thread Peter Maydell
On Fri, 17 Jan 2020 at 10:05, gengdongjiu  wrote:
>
> On 2020/1/17 0:28, Peter Maydell wrote:
> > On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng  wrote:
> >
> >> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >> +{
> >> +ram_addr_t ram_addr;
> >> +hwaddr paddr;
> >> +
> >> +assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> >> +
> >> +if (acpi_enabled && addr &&
> >> +object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> >> +ram_addr = qemu_ram_addr_from_host(addr);
> >> +if (ram_addr != RAM_ADDR_INVALID &&
> >> +kvm_physical_memory_addr_from_host(c->kvm_state, addr, 
> >> )) {
> >> +kvm_hwpoison_page_add(ram_addr);
> >> +/*
> >> + * Asynchronous signal will be masked by main thread, so
> >> + * only handle synchronous signal.
> >> + */
> >
> > I don't understand this comment. (I think we've had discussions
> > about it before, but it's still not clear to me.)
> >
> > This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
> >
> > (1) in the vcpu thread:
> >   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
> > for an immediate vcpu exit
> >   * the vcpu thread reads the flag on exit from KVM_RUN and
> > calls kvm_arch_on_sigbus_vcpu() directly
> >   * the error could be MCEERR_AR or MCEERR_AOFor the vcpu thread, the error 
> > can be MCEERR_AR or MCEERR_AO,
> but kernel/KVM usually uses MCEERR_AR(action required) instead of MCEERR_AO, 
> because it needs do action immediately. For MCEERR_AO error, the action is 
> optional and the error can be ignored.
> At least I do not find Linux kernel/KVM deliver MCEERR_AO in the vcpu threads.
>
> > (2) MCE errors on other threads:
> >   * here SIGBUS is blocked, so MCEERR_AR (action-required)
> > errors will cause the kernel to just kill the QEMU process
> >   * MCEERR_AO errors will be handled via the iothread's use
> > of signalfd(), so kvm_on_sigbus() will get called from
> > the main thread, and it will call kvm_arch_on_sigbus_vcpu()
> >   * in this case the passed in CPUState will (arbitrarily) be that
> > for the first vCPU
>
> For the MCE errors on other threads, it can only handle MCEERR_AO. If it is 
> MCEERR_AR, the QEMU will assert and exit[2].
>
> Case1: Other APP indeed can send MCEERR_AO to QEMU, QEMU handle it via the 
> iothread's use of signalfd() through above path.
> Case2: But if the MCEERR_AO is delivered by kernel, I see QEMU ignore it 
> because SIGBUS is masked in main thread[3], for this case, I do not see QEMU 
> handle it via signalfd() for MCEERR_AO errors from my test.

SIGBUS is blocked in the main thread because we use signalfd().
The function sigfd_handler() should be called and it will then
manually invoke the correct function for the signal.

> For Case1,I think we should not let guest know it, because it is not 
> triggered by guest. only other APP send SIGBUS to tell QEMU do somethings.

I don't understand what you mean here by "other app" or
"guest" triggering of MCEERR. I thought that an MCEERR meant
"the hardware has detected that there is a problem with the
RAM". If there's a problem with the RAM and it's the RAM that's
being used as guest RAM, we need to tell the guest, surely ?

> For Case2,it does not call call kvm_arch_on_sigbus_vcpu().

It should do. The code you quote calls that function
for that case:

> [1]:
> /* Called synchronously (via signalfd) in main thread.  */
> int kvm_on_sigbus(int code, void *addr)
> {
> #ifdef KVM_HAVE_MCE_INJECTION
> /* Action required MCE kills the process if SIGBUS is blocked.  Because
>  * that's what happens in the I/O thread, where we handle MCE via 
> signalfd,
>  * we can only get action optional here.
>  */
> [2]: assert(code != BUS_MCEERR_AR);
> kvm_arch_on_sigbus_vcpu(first_cpu, code, addr);
> return 0;
> #else
> return 1;
> #endif
> }


> Above all, from my test, for MCEERR_AO error which is triggered by guest, it 
> not call
kvm_arch_on_sigbus_vcpu().

I'm not sure what you mean by "triggered by guest". I assume that
exactly what kind of errors the kernel can report and when will
depend to some extent on the underlying hardware/firmware
implementation of reporting of memory errors, but in principle
the ABI allows the kernel to send SIGBUS_(BUS_MCEERR_AO) to the
main thread, the signal should be handled by signalfd, our code
for working with multiple fds should mean that the main thread
calls sigfd_handler() to deal with reading bytes from the signalfd
fd, and that function should then call sigbus_handler(), which
calls kvm_on_sigbus(), which calls kvm_arch_on_sigbus_vcpu().
If something in that code path is not working then we need to
find out what it is.

thanks
-- PMM



Re: [PATCH v7 03/11] hw/core: create Resettable QOM interface

2020-01-20 Thread Damien Hedde



On 1/18/20 7:35 AM, Philippe Mathieu-Daudé wrote:
> On 1/15/20 1:36 PM, Damien Hedde wrote:
>> This commit defines an interface allowing multi-phase reset. This aims
>> to solve a problem of the actual single-phase reset (built in
>> DeviceClass and BusClass): reset behavior is dependent on the order
>> in which reset handlers are called. In particular doing external
>> side-effect (like setting an qemu_irq) is problematic because receiving
>> object may not be reset yet.
>>
>> The Resettable interface divides the reset in 3 well defined phases.
>> To reset an object tree, all 1st phases are executed then all 2nd then
>> all 3rd. See the comments in include/hw/resettable.h for a more complete
>> description. The interface defines 3 phases to let the future
>> possibility of holding an object into reset for some time.
>>
>> The qdev/qbus reset in DeviceClass and BusClass will be modified in
>> following commits to use this interface. A mechanism is provided
>> to allow executing a transitional reset handler in place of the 2nd
>> phase which is executed in children-then-parent order inside a tree.
>> This will allow to transition devices and buses smoothly while
>> keeping the exact current qdev/qbus reset behavior for now.
>>
>> Documentation will be added in a following commit.
>>
>> Signed-off-by: Damien Hedde 
>> Reviewed-by: Richard Henderson 
>> ---
>>
>> v7 update: un-nest struct ResettablePhases
>> ---
>>   Makefile.objs   |   1 +
>>   include/hw/resettable.h | 211 +++
>>   hw/core/resettable.c    | 238 
>>   hw/core/Makefile.objs   |   1 +
>>   hw/core/trace-events    |  17 +++
>>   5 files changed, 468 insertions(+)
>>   create mode 100644 include/hw/resettable.h
>>   create mode 100644 hw/core/resettable.c
>>
>> diff --git a/Makefile.objs b/Makefile.objs
>> index 7c1e50f9d6..9752d549b4 100644
>> --- a/Makefile.objs
>> +++ b/Makefile.objs
>> @@ -191,6 +191,7 @@ trace-events-subdirs += migration
>>   trace-events-subdirs += net
>>   trace-events-subdirs += ui
>>   endif
>> +trace-events-subdirs += hw/core
>>   trace-events-subdirs += hw/display
>>   trace-events-subdirs += qapi
>>   trace-events-subdirs += qom
>> diff --git a/include/hw/resettable.h b/include/hw/resettable.h
>> new file mode 100644
>> index 00..58b3df4c22
>> --- /dev/null
>> +++ b/include/hw/resettable.h
>> @@ -0,0 +1,211 @@
>> +/*
>> + * Resettable interface header.
>> + *
>> + * Copyright (c) 2019 GreenSocs SAS
>> + *
>> + * Authors:
>> + *   Damien Hedde
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef HW_RESETTABLE_H
>> +#define HW_RESETTABLE_H
>> +
>> +#include "qom/object.h"
>> +
>> +#define TYPE_RESETTABLE_INTERFACE "resettable"
>> +
>> +#define RESETTABLE_CLASS(class) \
>> +    OBJECT_CLASS_CHECK(ResettableClass, (class),
>> TYPE_RESETTABLE_INTERFACE)
>> +
>> +#define RESETTABLE_GET_CLASS(obj) \
>> +    OBJECT_GET_CLASS(ResettableClass, (obj), TYPE_RESETTABLE_INTERFACE)
>> +
>> +typedef struct ResettableState ResettableState;
>> +
>> +/**
>> + * ResetType:
>> + * Types of reset.
>> + *
>> + * + Cold: reset resulting from a power cycle of the object.
>> + *
>> + * TODO: Support has to be added to handle more types. In particular,
>> + * ResettableState structure needs to be expanded.
>> + */
>> +typedef enum ResetType {
>> +    RESET_TYPE_COLD,
>> +} ResetType;
>> +
>> +/*
>> + * ResettableClass:
>> + * Interface for resettable objects.
>> + *
>> + * See docs/devel/reset.rst for more detailed information about how
>> QEMU models
>> + * reset. This whole API must only be used when holding the iothread
>> mutex.
>> + *
>> + * All objects which can be reset must implement this interface;
>> + * it is usually provided by a base class such as DeviceClass or
>> BusClass.
>> + * Every Resettable object must maintain some state tracking the
>> + * progress of a reset operation by providing a ResettableState
>> structure.
>> + * The functions defined in this module take care of updating the
>> + * state of the reset.
>> + * The base class implementation of the interface provides this
>> + * state and implements the associated method: get_state.
>> + *
>> + * Concrete object implementations (typically specific devices
>> + * such as a UART model) should provide the functions
>> + * for the phases.enter, phases.hold and phases.exit methods, which
>> + * they can set in their class init function, either directly or
>> + * by calling resettable_class_set_parent_phases().
>> + * The phase methods are guaranteed to only only ever be called once
>> + * for any reset event, in the order 'enter', 'hold', 'exit'.
>> + * An object will always move quickly from 'enter' to 'hold'
>> + * but might remain in 'hold' for an arbitrary period of time
>> + * before eventually reset is deasserted and the 'exit' phase is called.
>> + * Object 

ping Re: [PATCH for-5.0 v2 0/3] benchmark util

2020-01-20 Thread Vladimir Sementsov-Ogievskiy
ping

26.11.2019 18:48, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here is simple benchmarking utility, to generate performance
> comparison tables, like the following:
> 
> --  -  -  -
>  backup-1   backup-2   mirror
> ssd -> ssd  0.43 +- 0.00   4.48 +- 0.06   4.38 +- 0.02
> ssd -> hdd  10.60 +- 0.08  10.69 +- 0.18  10.57 +- 0.05
> ssd -> nbd  33.81 +- 0.37  10.67 +- 0.17  10.07 +- 0.07
> --  -  -  -
> 
> This is a v2, as v1 was inside
>   "[RFC 00/24] backup performance: block_status + async"
> 
> I'll use this benchmark in other series, hope someone
> will like it.
> 
> Vladimir Sementsov-Ogievskiy (3):
>python: add simplebench.py
>python: add qemu/bench_block_job.py
>python: add example usage of simplebench
> 
>   python/bench-example.py|  80 +
>   python/qemu/bench_block_job.py | 115 +
>   python/simplebench.py  | 128 +
>   3 files changed, 323 insertions(+)
>   create mode 100644 python/bench-example.py
>   create mode 100755 python/qemu/bench_block_job.py
>   create mode 100644 python/simplebench.py
> 


-- 
Best regards,
Vladimir


Re: Making QEMU easier for management tools and applications

2020-01-20 Thread Stefan Hajnoczi
On Thu, Jan 16, 2020 at 12:03:14PM +0100, Kashyap Chamarthy wrote:
> On Thu, Jan 02, 2020 at 02:47:22PM +, Stefan Hajnoczi wrote:
> > On Sat, Dec 21, 2019 at 10:02:23AM +0100, Markus Armbruster wrote:
> > > Stefan Hajnoczi  writes:
> 
> [...]
> 
> > > > 2. scripts/qmp/ contains command-line tools for QMP communication.
> > > > They could use some polish and then be shipped.
> > > 
> > > MAINTAINERS blames them on me, but they're effectively unmaintained.
> > > Prerequisite for shipping: having a maintainer who actually gives a
> > > damn.
> > ...
> > > * scripts/qmp/qmp-shell
> > > 
> > >   Half-hearted attempt at a human-friendly wrapper around the JSON
> > >   syntax.  I have no use for this myself.
> > 
> > I think this one is used by people.  John Snow comes to mind.
> 
> FWIW I too frequently use 'qmp-shell'.  And some of the examples in this
> document[1] are demonstrated with it.
> 
> I'm reasonably happy with it (particularly the persistent history
> captured in ~/.qmp-shell_history), and it has some "known issues" that
> can trip up a new user.  The one that immediately jumps to mind:
> asynchronous events won't be printed without a prompt from the user --
> e.g. after a `blockdev-commit`, you won't see BLOCK_JOB_{READY,
> COMPLETED} events printed unless you manually hit enter from the
> 'qmp-shell'.
> 
> (Not complaining; I have a long-standing TODO to make time to
> investigate this.)
> 
> [1] https://qemu.readthedocs.io/en/latest/interop/live-block-operations.html

John and I discussed async events in the past.  qmp-shell currently uses
the input() built-in function.  If we modify it with a
select(2)/poll(2)-style function that monitors both stdin and the QMP
socket then it could print QMP events as soon as they are received.

There might be a nicer way of doing it, but pseudo-code for the idea is:

  def input_with_events(prompt):
  while True:
  print(prompt, end='', flush=True)
  readable_files = select([sys.stdin, qmp_socket])
  if qmp_socket in readable_files:
  print_qmp_events()

  # stdin is ready, read a line
  return input()

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 070/104] virtiofsd: fail when parent inode isn't known in lo_do_lookup()

2020-01-20 Thread Sergio Lopez

Dr. David Alan Gilbert (git)  writes:

> From: Miklos Szeredi 
>
> The Linux file handle APIs (struct export_operations) can access inodes
> that are not attached to parents because path name traversal is not
> performed.  Refuse if there is no parent in lo_do_lookup().
>
> Also clean up lo_do_lookup() while we're here.
>
> Signed-off-by: Miklos Szeredi 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  tools/virtiofsd/passthrough_ll.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)

Reviewed-by: Sergio Lopez 


signature.asc
Description: PGP signature


Re: Making QEMU easier for management tools and applications

2020-01-20 Thread Stefan Hajnoczi
On Wed, Jan 15, 2020 at 01:15:17PM +0100, Markus Armbruster wrote:
> Christophe de Dinechin  writes:
> >> On 15 Jan 2020, at 10:20, Markus Armbruster  wrote:
> * qemuMonitorJSONSetIOThread() uses it to control iothread's properties
>   poll-max-ns, poll-grow, poll-shrink.  Their use with -object is
>   documented (in qemu-options.hx), their use with qom-set is not.

I'm happy to use a different interface.

Writing a boilerplate "iothread-set-poll-params" QMP command in C would
be a step backwards.

Maybe the QAPI code generator could map something like this:

  { 'command': 'iothread-set-poll-params',
'data': {
'id': 'str',
'*max-ns': 'uint64',
'*grow': 'uint64',
'*shrink': 'uint64'
},
'map-to-qom-set': 'IOThread'
  }

And turn it into QOM accessors on the IOThread object.

Stefan


signature.asc
Description: PGP signature


[RFC PATCH v3 0/6] target/arm/kvm: Adjust virtual time

2020-01-20 Thread Andrew Jones
v3:
 - Added a target/arm/kvm_arm.h comment cleanup patch (1/6)
 - Minor refactoring of assert_has_feature_enabled/disabled in 4/6,
   kept Richard's r-b.
 - Rewrote kvm-no-adjvtime documentation in 6/6.
 - Reworked approach in 5/6 to properly deal with migration and to
   track running vs. !running, rather than running vs. paused states.

v2:
 - Reworked it enough that I brought back the RFC tag and retitled the
   series. Also had to drop r-b's from a couple of patches, and even
   drop patches.
 - Changed approach from writing the QEMU virtual time to the guest
   vtime counter to saving and restoring the guest vtime counter.
 - Changed the kvm-adjvtime property, which was off by default, to a
   kvm-no-adjvtime property, which is also off by default, meaning the
   effective "adjust vtime" property is now on by default (but only
   for 5.0 virt machine types and later)

v1:
 - move from RFC status to v1
 - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c
 - add r-b's from Richard


This series is inspired by a series[1] posted by Bijan Mottahedeh over
a year ago and by the patch[2] posted by Heyi Guo almost a year ago.
The problem described in the cover letter of [1] is easily reproducible
and some users would like to have the option to avoid it. However the
solution, which is to adjust the virtual counter each time the VM
transitions to the running state, introduces a different problem, which
is that the virtual and physical counters diverge. As described in the
cover letter of [1] this divergence is easily observed when comparing
the output of `date` and `hwclock` after suspending the guest, waiting
a while, and then resuming it. Because this different problem may actually
be worse for some users, unlike [1], the series posted here makes the
virtual counter adjustment optional. Besides the adjustment being
optional, this series approaches the needed changes differently to apply
them in more appropriate locations.

Additional notes


Note 1
--

As described above, when running a guest with kvm-no-adjtime disabled
it will be less likely the guest OS and guest applications get surprise
time jumps when they use the virtual counter.  However the counter will
no longer reflect real time.  It will lag behind.  If this is a problem
then the guest can resynchronize its time from an external source or
even from its physical counter.  If the suspend/resume is done with
libvirt's virsh, and the guest is running the guest agent, then it's
also possible to use a sequence like this

 $ virsh suspend $GUEST
 $ virsh resume $GUEST
 $ virsh domtime --sync $GUEST

in order to resynchronize a guest right after the resume.  Of course
there will still be time when the clock is not right, possibly creating
confusing timestamps in logs, for example, and the guest must still be
tolerant to the time synchronizations.

Note 2
--

Userspace that wants to set KVM_REG_ARM_TIMER_CNT should beware that
the KVM register ID is not correct.  This cannot be fixed because it's
UAPI and if the UAPI headers are used then it can't be a problem.
However, if a userspace attempts to create the ID themselves from the
register's specification, then they will get KVM_REG_ARM_TIMER_CVAL
instead, as the _CNT and _CVAL definitions have their register
parameters swapped.

Note 3
--

I didn't test this with a 32-bit KVM host, but the changes to kvm32.c
are the same as kvm64.c. So what could go wrong? Test results would be
appreciated.
 

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05713.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg03695.html

Thanks,
drew


Andrew Jones (6):
  target/arm/kvm: trivial: Clean up header documentation
  hw/arm/virt: Add missing 5.0 options call to 4.2 options
  target/arm/kvm64: kvm64 cpus have timer registers
  tests/arm-cpu-features: Check feature default values
  target/arm/kvm: Implement virtual time adjustment
  target/arm/cpu: Add the kvm-no-adjvtime CPU property

 docs/arm-cpu-features.rst  |  37 +-
 hw/arm/virt.c  |   9 +++
 include/hw/arm/virt.h  |   1 +
 target/arm/cpu.c   |   2 +
 target/arm/cpu.h   |   7 ++
 target/arm/cpu64.c |   1 +
 target/arm/kvm.c   | 120 +
 target/arm/kvm32.c |   3 +
 target/arm/kvm64.c |   4 ++
 target/arm/kvm_arm.h   |  95 --
 target/arm/machine.c   |   7 ++
 target/arm/monitor.c   |   1 +
 tests/qtest/arm-cpu-features.c |  41 ---
 13 files changed, 299 insertions(+), 29 deletions(-)

-- 
2.21.1




[RFC PATCH v3 5/6] target/arm/kvm: Implement virtual time adjustment

2020-01-20 Thread Andrew Jones
When a VM is stopped (such as when it's paused) guest virtual time
should stop counting. Otherwise, when the VM is resumed it will
experience time jumps and its kernel may report soft lockups. Not
counting virtual time while the VM is stopped has the side effect
of making the guest's time appear to lag when compared with real
time, and even with time derived from the physical counter. For
this reason, this change, which is enabled by default, comes with
a KVM CPU feature allowing it to be disabled, restoring legacy
behavior.

This patch only provides the implementation of the virtual time
adjustment. A subsequent patch will provide the CPU property
allowing the change to be enabled and disabled.

Reported-by: Bijan Mottahedeh 
Signed-off-by: Andrew Jones 
---
 target/arm/cpu.h |  7 
 target/arm/kvm.c | 92 
 target/arm/kvm32.c   |  3 ++
 target/arm/kvm64.c   |  3 ++
 target/arm/kvm_arm.h | 38 ++
 target/arm/machine.c |  7 
 6 files changed, 150 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 40f2c45e17e3..2e6477c92c9c 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -821,6 +821,13 @@ struct ARMCPU {
 /* KVM init features for this CPU */
 uint32_t kvm_init_features[7];
 
+/* KVM CPU state */
+
+/* KVM virtual time adjustment */
+bool kvm_adjvtime;
+bool kvm_vtime_dirty;
+uint64_t kvm_vtime;
+
 /* Uniprocessor system with MP extensions */
 bool mp_is_up;
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index b87b59a02ad8..91e38def6b29 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -359,6 +359,22 @@ static int compare_u64(const void *a, const void *b)
 return 0;
 }
 
+/*
+ * cpreg_values are sorted in ascending order by KVM register ID
+ * (see kvm_arm_init_cpreg_list). This allows us to cheaply find
+ * the storage for a KVM register by ID with a binary search.
+ */
+static uint64_t *kvm_arm_get_cpreg_ptr(ARMCPU *cpu, uint64_t regidx)
+{
+uint64_t *res;
+
+res = bsearch(, cpu->cpreg_indexes, cpu->cpreg_array_len,
+  sizeof(uint64_t), compare_u64);
+assert(res);
+
+return >cpreg_values[res - cpu->cpreg_indexes];
+}
+
 /* Initialize the ARMCPU cpreg list according to the kernel's
  * definition of what CPU registers it knows about (and throw away
  * the previous TCG-created cpreg list).
@@ -512,6 +528,23 @@ bool write_list_to_kvmstate(ARMCPU *cpu, int level)
 return ok;
 }
 
+void kvm_arm_cpu_pre_save(ARMCPU *cpu)
+{
+/* KVM virtual time adjustment */
+if (cpu->kvm_vtime_dirty) {
+*kvm_arm_get_cpreg_ptr(cpu, KVM_REG_ARM_TIMER_CNT) = cpu->kvm_vtime;
+}
+}
+
+void kvm_arm_cpu_post_load(ARMCPU *cpu)
+{
+/* KVM virtual time adjustment */
+if (cpu->kvm_adjvtime) {
+cpu->kvm_vtime = *kvm_arm_get_cpreg_ptr(cpu, KVM_REG_ARM_TIMER_CNT);
+cpu->kvm_vtime_dirty = true;
+}
+}
+
 void kvm_arm_reset_vcpu(ARMCPU *cpu)
 {
 int ret;
@@ -579,6 +612,50 @@ int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu)
 return 0;
 }
 
+void kvm_arm_get_virtual_time(CPUState *cs)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+struct kvm_one_reg reg = {
+.id = KVM_REG_ARM_TIMER_CNT,
+.addr = (uintptr_t)>kvm_vtime,
+};
+int ret;
+
+if (cpu->kvm_vtime_dirty) {
+return;
+}
+
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, );
+if (ret) {
+error_report("Failed to get KVM_REG_ARM_TIMER_CNT");
+abort();
+}
+
+cpu->kvm_vtime_dirty = true;
+}
+
+void kvm_arm_put_virtual_time(CPUState *cs)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+struct kvm_one_reg reg = {
+.id = KVM_REG_ARM_TIMER_CNT,
+.addr = (uintptr_t)>kvm_vtime,
+};
+int ret;
+
+if (!cpu->kvm_vtime_dirty) {
+return;
+}
+
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, );
+if (ret) {
+error_report("Failed to set KVM_REG_ARM_TIMER_CNT");
+abort();
+}
+
+cpu->kvm_vtime_dirty = false;
+}
+
 int kvm_put_vcpu_events(ARMCPU *cpu)
 {
 CPUARMState *env = >env;
@@ -690,6 +767,21 @@ MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run 
*run)
 return MEMTXATTRS_UNSPECIFIED;
 }
 
+void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
+{
+CPUState *cs = opaque;
+ARMCPU *cpu = ARM_CPU(cs);
+
+if (running) {
+if (cpu->kvm_adjvtime) {
+kvm_arm_put_virtual_time(cs);
+}
+} else {
+if (cpu->kvm_adjvtime) {
+kvm_arm_get_virtual_time(cs);
+}
+}
+}
 
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 {
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 32bf8d6757c4..3a8b437eef0b 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -16,6 +16,7 @@
 #include "qemu-common.h"
 #include "cpu.h"
 #include "qemu/timer.h"
+#include "sysemu/runstate.h"
 #include "sysemu/kvm.h"
 #include "kvm_arm.h"
 #include "internals.h"
@@ -198,6 +199,8 

Re: [PATCH v3 01/10] hbitmap: assert that we don't create bitmap larger than INT64_MAX

2020-01-20 Thread Max Reitz
On 19.12.19 11:03, Vladimir Sementsov-Ogievskiy wrote:
> We have APIs which returns signed int64_t, to be able to return error.
> Therefore we can't handle bitmaps with absolute size larger than
> (INT64_MAX+1). Still, keep maximum to be INT64_MAX which is a bit
> safer.
> 
> Note, that bitmaps are used to represent disk images, which can't
> exceed INT64_MAX anyway.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  util/hbitmap.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: Feedback on multi-process QEMU muser prototype

2020-01-20 Thread Stefan Hajnoczi
On Wed, Jan 15, 2020 at 10:58:38AM +, Felipe Franciosi wrote:
> > On Jan 14, 2020, at 3:22 PM, Stefan Hajnoczi  wrote:
> > 
> > I haven't seen the link to the muser prototype shared on the list yet,
> > so I'm taking the liberty of posting it for discussion:
> > https://github.com/oracle/qemu/tree/multi-process-qemu-v0.4.1-muser
> > 
> > Great that a lot of the multi-process patch series is no longer
> > necessary.  The muser approach requires less code in QEMU.
> > 
> > The following points came to mind:
> > 
> > 1. Configure PCI configuration space, BARs, and MSI/IRQs based on the 
> > PCIDevice
> >   instead of hard-coding the LSI SCSI controller's specifics.  That way any
> >   PCIDevice can run as an muser device.
> > 
> > 2. Integrate with QEMU's event loop instead of spawning threads and calling
> >   lm_ctx_run().  The event loop should monitor the muser fd for activity 
> > using
> >   aio_set_fd_handler() and then call into libmuser to handle the event.  
> > This
> >   will avoid thread model problems in the future and also allow true
> >   multi-threading (IOThreads).
> 
> Allowing muser to be used like that is in our to-do list.
> 
> (+ Thanos / Swapnil).
> 
> We have to extend muser.ko to allow the device file descriptor to be
> "pollable".  Let me know how soon you want to see that so we can
> prioritise accordingly or assist someone in doing the work.

Last I talked with the multi-process QEMU team the discussion was about
moving forward with the existing QEMU patches and then moving to
VFIO-over-socket.  libmuser makes a nice library API for device emulator
programs and I think it will be revisited when VFIO-over-socket
integration begins.

This makes the pollable fd less of a priority for QEMU at the moment.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v2 4/6] Makefile: Remove unhelpful comment

2020-01-20 Thread Alex Bennée


Philippe Mathieu-Daudé  writes:

> It is pointless to keep qapi/ object separate from the other
> common-objects. Drop the comment.
>
> Reviewed-by: Thomas Huth 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alex Bennée 

> ---
>  Makefile.objs | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 395dd1e670..c6321d0465 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -71,11 +71,9 @@ qemu-seccomp.o-libs := $(SECCOMP_LIBS)
>  
>  common-obj-$(CONFIG_FDT) += device_tree.o
>  
> -##
> -# qapi
> -
>  common-obj-y += qapi/
> -endif
> +
> +endif # CONFIG_SOFTMMU
>  
>  ###
>  # Target-independent parts used in system and user emulation


-- 
Alex Bennée



[PATCH v2 0/2] ppc: add support for Directed Privileged Doorbell (non-hypervisor)

2020-01-20 Thread Cédric Le Goater
Hello,

The Processor Control facility for POWER8 processors and later
provides a mechanism for the hypervisor to send messages to other
threads in the system (msgsnd instruction) and cause hypervisor-level
exceptions.

Privileged non-hypervisor programs can also send messages (msgsndp
instruction) but are restricted to the threads of the same
subprocessor and cause privileged-level exceptions. The Directed
Privileged Doorbell Exception State (DPDES) register reflects the
state of pending privileged-level doorbell exceptions for all threads
and can be used to modify that state.

If the MSGP facility is not in the HFSCR, a hypervisor facility
unavailable exception is generated when these instructions are used or
when the DPDES register is accessed by the supervisor.

Based on previous work from Suraj Jitindar Singh. 

Thanks,

C.

Changes since v1:

 - removed DBELL_TIRTAG_MASK and simplified helpers as QEMU TCG
   doesn't support more than on thread per core   
 - simplified book3s_dbell2irq() and renamed it to dbell_type_server() 
 - replaced mask LOG_GUEST_ERROR by CPU_LOG_INT to track HV Facility
   errors
 
Cédric Le Goater (2):
  target/ppc: Add privileged message send facilities
  target/ppc: add support for Hypervisor Facility Unavailable Exception

 target/ppc/cpu.h|  6 +++
 target/ppc/helper.h |  4 ++
 target/ppc/excp_helper.c| 79 ++---
 target/ppc/misc_helper.c| 63 ++
 target/ppc/translate.c  | 26 +++
 target/ppc/translate_init.inc.c | 20 +++--
 6 files changed, 178 insertions(+), 20 deletions(-)

-- 
2.21.1




Re: [PATCH] target/arm: add PMU feature to cortex-r5 and cortex-r5f

2020-01-20 Thread Peter Maydell
On Tue, 14 Jan 2020 at 12:44, Philippe Mathieu-Daudé  wrote:
>
> On 1/14/20 11:59 AM, Clement Deschamps wrote:
>
> Maybe describe here:
>
> The PMU is not optional on cortex-r5 and cortex-r5f (see
> the "Features" chapter of the Technical Reference Manual).
>
> > Signed-off-by: Clement Deschamps 
>
> Reviewed-by: Philippe Mathieu-Daudé 



Applied to target-arm.next, thanks (with the suggested
improvement to the commit message).

-- PMM



Re: [PATCH 105/104] virtiofsd: Unref old/new inodes with the same mutex lock in lo_rename()

2020-01-20 Thread Philippe Mathieu-Daudé

On 1/19/20 9:35 AM, Xiao Yang wrote:

On 2020/1/17 21:32, Philippe Mathieu-Daudé wrote:

We can unref both old/new inodes with the same mutex lock.

Signed-off-by: Philippe Mathieu-Daudé
---
Based-on:<20191212163904.159893-1-dgilb...@redhat.com>
"virtiofs daemon"
https://www.mail-archive.com/qemu-devel@nongnu.org/msg664652.html

  tools/virtiofsd/passthrough_ll.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c 
b/tools/virtiofsd/passthrough_ll.c

index 57f58aef26..5c717cb5a1 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1461,8 +1461,10 @@ static void lo_rename(fuse_req_t req, 
fuse_ino_t parent, const char *name,

  }

  out:
-    unref_inode_lolocked(lo, oldinode, 1);
-    unref_inode_lolocked(lo, newinode, 1);
+    pthread_mutex_lock(>mutex);
+    unref_inode(lo, oldinode, 1);
+    unref_inode(lo, newinode, 1);
+    pthread_mutex_unlock(>mutex);

Hi,

It seems to avoid calling pthread_mutex_lock and pthread_mutex_unlock 
twice.

Does the change fix some issues or improve the performance?


No issue, simply intend to improve the performance.


Best Regards,
Xiao Yang

  lo_inode_put(lo,);
  lo_inode_put(lo,);
  lo_inode_put(lo,_inode);









Re: [PATCH 6/9] cputlb: Split out tlb_mmu_flush_locked

2020-01-20 Thread Philippe Mathieu-Daudé

On 1/9/20 3:49 AM, Richard Henderson wrote:

We will want to be able to flush a tlb without resizing.

Signed-off-by: Richard Henderson 


Reviewed-by: Philippe Mathieu-Daudé 


---
  accel/tcg/cputlb.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index eff427f137..e60e501334 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -228,12 +228,8 @@ static void tlb_mmu_resize_locked(CPUTLBDesc *desc, 
CPUTLBDescFast *fast)
  }
  }
  
-static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)

+static void tlb_mmu_flush_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast)
  {
-CPUTLBDesc *desc = _tlb(env)->d[mmu_idx];
-CPUTLBDescFast *fast = _tlb(env)->f[mmu_idx];
-
-tlb_mmu_resize_locked(desc, fast);
  desc->n_used_entries = 0;
  desc->large_page_addr = -1;
  desc->large_page_mask = -1;
@@ -242,6 +238,15 @@ static void tlb_flush_one_mmuidx_locked(CPUArchState *env, 
int mmu_idx)
  memset(desc->vtable, -1, sizeof(desc->vtable));
  }
  
+static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)

+{
+CPUTLBDesc *desc = _tlb(env)->d[mmu_idx];
+CPUTLBDescFast *fast = _tlb(env)->f[mmu_idx];
+
+tlb_mmu_resize_locked(desc, fast);
+tlb_mmu_flush_locked(desc, fast);
+}
+
  static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t 
mmu_idx)
  {
  env_tlb(env)->d[mmu_idx].n_used_entries++;






Re: [PATCH 5/9] cputlb: Hoist tlb portions in tlb_flush_one_mmuidx_locked

2020-01-20 Thread Philippe Mathieu-Daudé

On 1/9/20 3:49 AM, Richard Henderson wrote:

No functional change, but the smaller expressions make
the code easier to read.

Signed-off-by: Richard Henderson 


Reviewed-by: Philippe Mathieu-Daudé 


---
  accel/tcg/cputlb.c | 19 ++-
  1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index c7dc1dc85a..eff427f137 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -230,15 +230,16 @@ static void tlb_mmu_resize_locked(CPUTLBDesc *desc, 
CPUTLBDescFast *fast)
  
  static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)

  {
-tlb_mmu_resize_locked(_tlb(env)->d[mmu_idx], 
_tlb(env)->f[mmu_idx]);
-env_tlb(env)->d[mmu_idx].n_used_entries = 0;
-env_tlb(env)->d[mmu_idx].large_page_addr = -1;
-env_tlb(env)->d[mmu_idx].large_page_mask = -1;
-env_tlb(env)->d[mmu_idx].vindex = 0;
-memset(env_tlb(env)->f[mmu_idx].table, -1,
-   sizeof_tlb(_tlb(env)->f[mmu_idx]));
-memset(env_tlb(env)->d[mmu_idx].vtable, -1,
-   sizeof(env_tlb(env)->d[0].vtable));
+CPUTLBDesc *desc = _tlb(env)->d[mmu_idx];
+CPUTLBDescFast *fast = _tlb(env)->f[mmu_idx];
+
+tlb_mmu_resize_locked(desc, fast);
+desc->n_used_entries = 0;
+desc->large_page_addr = -1;
+desc->large_page_mask = -1;
+desc->vindex = 0;
+memset(fast->table, -1, sizeof_tlb(fast));
+memset(desc->vtable, -1, sizeof(desc->vtable));
  }
  
  static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t mmu_idx)







Re: [PATCH v3 00/10] Further bitmaps improvements

2020-01-20 Thread Vladimir Sementsov-Ogievskiy
ping

19.12.2019 13:03, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> The main feature here is improvement of _next_dirty_area API, which I'm
> going to use then for backup / block-copy.
> 
> v3: rebase on current master. Mirror don't use _next_dirty_area any
> more, so mirror chunks dropped (patches 05 and 07) and 07 commit message
> changed.
> 
> Vladimir Sementsov-Ogievskiy (10):
>hbitmap: assert that we don't create bitmap larger than INT64_MAX
>hbitmap: move hbitmap_iter_next_word to hbitmap.c
>hbitmap: unpublish hbitmap_iter_skip_words
>hbitmap: drop meta bitmaps as they are unused
>block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t
>block/dirty-bitmap: add _next_dirty API
>block/dirty-bitmap: improve _next_dirty_area API
>nbd/server: introduce NBDExtentArray
>nbd/server: use bdrv_dirty_bitmap_next_dirty_area
>block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty
> 
>   include/block/dirty-bitmap.h |   9 +-
>   include/qemu/hbitmap.h   |  97 +++
>   block/dirty-bitmap.c |  16 +-
>   block/qcow2-bitmap.c |  11 +-
>   nbd/server.c | 242 +--
>   tests/test-hbitmap.c | 314 +--
>   util/hbitmap.c   | 133 +--
>   7 files changed, 366 insertions(+), 456 deletions(-)
> 


-- 
Best regards,
Vladimir


Re: [PATCH v3 0/7] gpio: Add GPIO Aggregator/Repeater

2020-01-20 Thread Geert Uytterhoeven
Hi Eugeniu,

On Sat, Jan 18, 2020 at 2:46 AM Eugeniu Rosca  wrote:
> On Wed, Nov 27, 2019 at 09:42:46AM +0100, Geert Uytterhoeven wrote:
> >   - Create aggregators:
> >
> > $ echo e6052000.gpio 19,20 \
> > > /sys/bus/platform/drivers/gpio-aggregator/new_device

> The only unexpected thing is seeing below messages (where gpiochip99 and
> gpiochip22 are inexisting gpiochip names, mistakenly provided on command
> line prior to passing the correct name):
>
> root@rcar-gen3:~# echo gpiochip6 12-13 > 
> /sys/bus/platform/drivers/gpio-aggregator/new_device
> [  915.572905] gpio-aggregator gpio-aggregator.0: cannot find GPIO chip 
> gpiochip99, deferring
> [  915.584224] gpio-aggregator gpio-aggregator.2: cannot find GPIO chip 
> gpiochip99, deferring
> [  915.865281] gpio-aggregator gpio-aggregator.29: cannot find GPIO chip 
> gpiochip22, deferring
>
> Obviously, in the above case, due to a typo in the names, the gpio
> chips will never be found, no matter how long gpio-aggregator defers

Indeed, that is expected behavior: you have created platform devices
referring to resources that are not available.

> their probing. Unfortunately, the driver will continuously emit those
> messages, upon each successfully created/aggregated gpiochip. I built

That is expected behavior, too: every time the driver core manages to
bind a device to a driver, it will retry all previously deferred probes,
in the hope they can be satisfied by the just bound device.

Note that you can destroy these bogus devices, using e.g.

# echo gpio-aggregator.0 > \
/sys/bus/platform/drivers/gpio-aggregator/delete_device

> gpio-aggregator as a loadable module, if that's relevant.

Modular or non-modular shouldn't matter w.r.t. this behavior.
Although unloading the module should get rid of the cruft.

> Another comment is that, while the series _does_ allow specifying
> gpio lines in the DTS (this would require a common compatible string
> in gpio_aggregator_dt_ids[] and in the DTS node) and while those lines
> are indeed exposed to userspace, based on my testing, these same gpio
> lines are marked as "used/reserved" by the kernel. This means that
> operating on those gpio pins from userspace will not be possible.
> For instance, gpioget/gpioset return "Device or resource busy":
>
> gpioget: error reading GPIO values: Device or resource busy
> gpioset: error setting the GPIO line values: Device or resource busy
>
> I guess Harish will be unhappy about that, as his expectation was that
> upon merging gpio-aggregator with gpio-inverter, he will be able to
> describe GPIO polarity and names in DTS without "hogging" the pins.
> Perhaps this can be supplemented via an add-on patch later on?

When aggregating GPIO lines, the original GPIO lines are indeed marked
used/reserved, so you cannot use them from userspace.
However, you are expected to use them through the newly created virtual
gpiochip representing the aggregated GPIO lines.

You can try this using the "door" example in
Documentation/admin-guide/gpio/gpio-aggregator.rst, after replacing
gpio2 {19,20} by gpio6 {12,13} to suit your H3ULCB.

> For the whole series (leaving the above findings to your discretion):
>
> Reviewed-by: Eugeniu Rosca 
> Tested-by: Eugeniu Rosca 

Thanks!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds



[RFC PATCH v3 6/6] target/arm/cpu: Add the kvm-no-adjvtime CPU property

2020-01-20 Thread Andrew Jones
kvm-no-adjvtime is a KVM specific CPU property and a first of its
kind. To accommodate it we also add kvm_arm_add_vcpu_properties()
and a KVM specific CPU properties description to the CPU features
document.

Signed-off-by: Andrew Jones 
---
 docs/arm-cpu-features.rst  | 37 +-
 hw/arm/virt.c  |  8 
 include/hw/arm/virt.h  |  1 +
 target/arm/cpu.c   |  2 ++
 target/arm/cpu64.c |  1 +
 target/arm/kvm.c   | 28 +
 target/arm/kvm_arm.h   | 11 ++
 target/arm/monitor.c   |  1 +
 tests/qtest/arm-cpu-features.c |  4 
 9 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
index 9b537a75e695..dbf3b7cf42c0 100644
--- a/docs/arm-cpu-features.rst
+++ b/docs/arm-cpu-features.rst
@@ -31,7 +31,9 @@ supporting the feature or only supporting the feature under 
certain
 configurations.  For example, the `aarch64` CPU feature, which, when
 disabled, enables the optional AArch32 CPU feature, is only supported
 when using the KVM accelerator and when running on a host CPU type that
-supports the feature.
+supports the feature.  While `aarch64` currently only works with KVM,
+it could work with TCG.  CPU features that are specific to KVM are
+prefixed with "kvm-" and are described in "KVM VCPU Features".
 
 CPU Feature Probing
 ===
@@ -171,6 +173,39 @@ disabling many SVE vector lengths would be quite verbose, 
the `sve` CPU
 properties have special semantics (see "SVE CPU Property Parsing
 Semantics").
 
+KVM VCPU Features
+=
+
+KVM VCPU features are CPU features that are specific to KVM, such as
+paravirt features or features that enable CPU virtualization extensions.
+The features' CPU properties are only available when KVM is enabled and
+are named with the prefix "kvm-".  KVM VCPU features may be probed,
+enabled, and disabled in the same way as other CPU features.  Below is
+the list of KVM VCPU features and their descriptions.
+
+  kvm-no-adjvtime  By default kvm-no-adjvtime is disabled.  This
+   means that by default the virtual time
+   adjustment is enabled (vtime is *not not*
+   adjusted).
+
+   When virtual time adjustment is enabled each
+   time the VM transitions back to running state
+   the VCPU's virtual counter is updated to ensure
+   stopped time is not counted.  This avoids time
+   jumps surprising guest OSes and applications,
+   as long as they use the virtual counter for
+   timekeeping.  However it has the side effect of
+   the virtual and physical counters diverging.
+   All timekeeping based on the virtual counter
+   will appear to lag behind any timekeeping that
+   does not subtract VM stopped time.  The guest
+   may resynchronize its virtual counter with
+   other time sources as needed.
+
+   Enable kvm-no-adjvtime to disable virtual time
+   adjustment, also restoring the legacy (pre-5.0)
+   behavior.
+
 SVE CPU Properties
 ==
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 27ca26b05e27..08b3c34c345c 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1663,6 +1663,11 @@ static void machvirt_init(MachineState *machine)
 }
 }
 
+if (vmc->kvm_no_adjvtime &&
+object_property_find(cpuobj, "kvm-no-adjvtime", NULL)) {
+object_property_set_bool(cpuobj, true, "kvm-no-adjvtime", NULL);
+}
+
 if (vmc->no_pmu && object_property_find(cpuobj, "pmu", NULL)) {
 object_property_set_bool(cpuobj, false, "pmu", NULL);
 }
@@ -2153,8 +2158,11 @@ DEFINE_VIRT_MACHINE_AS_LATEST(5, 0)
 
 static void virt_machine_4_2_options(MachineClass *mc)
 {
+VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
 virt_machine_5_0_options(mc);
 compat_props_add(mc->compat_props, hw_compat_4_2, hw_compat_4_2_len);
+vmc->kvm_no_adjvtime = true;
 }
 DEFINE_VIRT_MACHINE(4, 2)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 38f0c33c77c4..71508bf40c34 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -109,6 +109,7 @@ typedef struct {
 bool smbios_old_sys_ver;
 bool no_highmem_ecam;
 bool no_ged;   /* Machines < 4.2 has no support for ACPI GED device */
+bool kvm_no_adjvtime;
 } VirtMachineClass;
 
 typedef struct {
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index d62fd5fdc644..c2347f1cd335 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ 

[RFC PATCH v3 4/6] tests/arm-cpu-features: Check feature default values

2020-01-20 Thread Andrew Jones
If we know what the default value should be then we can test for
that as well as the feature existence.

Signed-off-by: Andrew Jones 
Reviewed-by: Richard Henderson 
---
 tests/qtest/arm-cpu-features.c | 37 +-
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/arm-cpu-features.c b/tests/qtest/arm-cpu-features.c
index bef3ed24b604..a039e3c8d724 100644
--- a/tests/qtest/arm-cpu-features.c
+++ b/tests/qtest/arm-cpu-features.c
@@ -159,6 +159,25 @@ static bool resp_get_feature(QDict *resp, const char 
*feature)
 qobject_unref(_resp);  \
 })
 
+#define assert_feature(qts, cpu_type, feature, expected_value) \
+({ \
+QDict *_resp, *_props; \
+   \
+_resp = do_query_no_props(qts, cpu_type);  \
+g_assert(_resp);   \
+g_assert(resp_has_props(_resp));   \
+_props = resp_get_props(_resp);\
+g_assert(qdict_get(_props, feature));  \
+g_assert(qdict_get_bool(_props, feature) == (expected_value)); \
+qobject_unref(_resp);  \
+})
+
+#define assert_has_feature_enabled(qts, cpu_type, feature) \
+assert_feature(qts, cpu_type, feature, true)
+
+#define assert_has_feature_disabled(qts, cpu_type, feature)\
+assert_feature(qts, cpu_type, feature, false)
+
 static void assert_type_full(QTestState *qts)
 {
 const char *error;
@@ -405,16 +424,16 @@ static void test_query_cpu_model_expansion(const void 
*data)
 assert_error(qts, "host", "The CPU type 'host' requires KVM", NULL);
 
 /* Test expected feature presence/absence for some cpu types */
-assert_has_feature(qts, "max", "pmu");
-assert_has_feature(qts, "cortex-a15", "pmu");
+assert_has_feature_enabled(qts, "max", "pmu");
+assert_has_feature_enabled(qts, "cortex-a15", "pmu");
 assert_has_not_feature(qts, "cortex-a15", "aarch64");
 
 if (g_str_equal(qtest_get_arch(), "aarch64")) {
-assert_has_feature(qts, "max", "aarch64");
-assert_has_feature(qts, "max", "sve");
-assert_has_feature(qts, "max", "sve128");
-assert_has_feature(qts, "cortex-a57", "pmu");
-assert_has_feature(qts, "cortex-a57", "aarch64");
+assert_has_feature_enabled(qts, "max", "aarch64");
+assert_has_feature_enabled(qts, "max", "sve");
+assert_has_feature_enabled(qts, "max", "sve128");
+assert_has_feature_enabled(qts, "cortex-a57", "pmu");
+assert_has_feature_enabled(qts, "cortex-a57", "aarch64");
 
 sve_tests_default(qts, "max");
 
@@ -451,8 +470,8 @@ static void test_query_cpu_model_expansion_kvm(const void 
*data)
 QDict *resp;
 char *error;
 
-assert_has_feature(qts, "host", "aarch64");
-assert_has_feature(qts, "host", "pmu");
+assert_has_feature_enabled(qts, "host", "aarch64");
+assert_has_feature_enabled(qts, "host", "pmu");
 
 assert_error(qts, "cortex-a15",
 "We cannot guarantee the CPU type 'cortex-a15' works "
-- 
2.21.1




Re: [RFC PATCH v2 5/5] target/arm/cpu: Add the kvm-no-adjvtime CPU property

2020-01-20 Thread Andrew Jones
On Mon, Dec 16, 2019 at 03:06:57PM +, Peter Maydell wrote:
> Incidentally, if I understand things correctly, for TCG the
> behaviour is (and has always been) that VM-stopped time is
> not counted, because we run the emulated versions of these counters
> off QEMU_CLOCK_VIRTUAL. So having the KVM default be the same as
> the TCG default is nicely consistent.

It's correct that the vtimer has always been this way for TCG.
However TCG and KVM still won't be the same with the adjvtime
patches. The reason is that TCG also bases the ptimer off
QEMU_CLOCK_VIRTUAL. This means on TCG ptimer == vtimer and
both will lag a clocksource outside the VM.

I'm not sure if TCG people want to change that to match KVM.
If so, then I guess the ptimer should be based off 
QEMU_CLOCK_REALTIME.

Thanks,
drew




Re: [PATCH v2] target/s390x/kvm: Enable adapter interruption suppression again

2020-01-20 Thread Cornelia Huck
On Mon, 20 Jan 2020 10:49:01 +0100
Thomas Huth  wrote:

> The AIS feature has been disabled late in the v2.10 development cycle since
> there were some issues with migration (see commit 3f2d07b3b01ea61126b -
> "s390x/ais: for 2.10 stable: disable ais facility"). We originally wanted
> to enable it again for newer machine types, but apparently we forgot to do
> this so far. Let's do it for the new s390-ccw-virtio-5.0 machine now.
> 
> While at it, also add a more verbose comment why we need the *_allowed()
> wrappers in s390-virtio-ccw.c.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1756946
> Signed-off-by: Thomas Huth 
> ---
>  Matthew, could you please give this another try on your system? Thanks!
> 
>  hw/s390x/s390-virtio-ccw.c | 20 +---
>  include/hw/s390x/s390-virtio-ccw.h |  3 +++
>  target/s390x/kvm.c |  9 ++---
>  3 files changed, 26 insertions(+), 6 deletions(-)
> 

> @@ -658,6 +669,9 @@ static void ccw_machine_4_2_instance_options(MachineState 
> *machine)
>  
>  static void ccw_machine_4_2_class_options(MachineClass *mc)
>  {
> +S390CcwMachineClass *s390mc = S390_MACHINE_CLASS(mc);
> +
> +s390mc->kvm_ais_allowed = false;
>  ccw_machine_5_0_class_options(mc);
>  compat_props_add(mc->compat_props, hw_compat_4_2, hw_compat_4_2_len);

One thing I've noticed: You set the _allowed value to false, and
afterwards apply the options from any 'later' class; this is the same
order as for the other _allowed values. There's also
css_migration_enabled, which is set to false _after_ the class options
from 'later' classes have been applied.

Both variants end up the same, as we only ever set the value to true in
the base class and to false just in a single class option callback; but
I wonder whether it would be cleaner to set it to false after the other
options have been applied. Or am I thinking backwards here?

>  }




[PULL 21/29] migration/multifd: not use multifd during postcopy

2020-01-20 Thread Juan Quintela
From: Wei Yang 

We don't support multifd during postcopy, but user still could enable
both multifd and postcopy. This leads to migration failure.

Skip multifd during postcopy.

Signed-off-by: Wei Yang 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index a05448c0c9..d4f33a4f2f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2587,10 +2587,13 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss,
 }
 
 /*
- * do not use multifd for compression as the first page in the new
- * block should be posted out before sending the compressed page
+ * Do not use multifd for:
+ * 1. Compression as the first page in the new block should be posted out
+ *before sending the compressed page
+ * 2. In postcopy as one whole host page should be placed
  */
-if (!save_page_use_compression(rs) && migrate_use_multifd()) {
+if (!save_page_use_compression(rs) && migrate_use_multifd()
+&& !migration_in_postcopy()) {
 return ram_save_multifd_page(rs, block, offset);
 }
 
-- 
2.24.1




[PULL 23/29] migration/multifd: fix destroyed mutex access in terminating multifd threads

2020-01-20 Thread Juan Quintela
From: Jiahui Cen 

One multifd will lock all the other multifds' IOChannel mutex to inform them
to quit by setting p->quit or shutting down p->c. In this senario, if some
multifds had already been terminated and 
multifd_load_cleanup/multifd_save_cleanup
had destroyed their mutex, it could cause destroyed mutex access when trying
lock their mutex.

Here is the coredump stack:
#0  0x7f81a2794437 in raise () from /usr/lib64/libc.so.6
#1  0x7f81a2795b28 in abort () from /usr/lib64/libc.so.6
#2  0x7f81a278d1b6 in __assert_fail_base () from /usr/lib64/libc.so.6
#3  0x7f81a278d262 in __assert_fail () from /usr/lib64/libc.so.6
#4  0x55eb1bfadbd3 in qemu_mutex_lock_impl (mutex=0x55eb1e2d1988, 
file=, line=) at util/qemu-thread-posix.c:64
#5  0x55eb1bb4564a in multifd_send_terminate_threads (err=) at migration/ram.c:1015
#6  0x55eb1bb4bb7f in multifd_send_thread (opaque=0x55eb1e2d19f8) at 
migration/ram.c:1171
#7  0x55eb1bfad628 in qemu_thread_start (args=0x55eb1e170450) at 
util/qemu-thread-posix.c:502
#8  0x7f81a2b36df5 in start_thread () from /usr/lib64/libpthread.so.0
#9  0x7f81a286048d in clone () from /usr/lib64/libc.so.6

To fix it up, let's destroy the mutex after all the other multifd threads had
been terminated.

Signed-off-by: Jiahui Cen 
Signed-off-by: Ying Fang 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 278b2ff87a..7221f54139 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1053,6 +1053,10 @@ void multifd_save_cleanup(void)
 if (p->running) {
 qemu_thread_join(>thread);
 }
+}
+for (i = 0; i < migrate_multifd_channels(); i++) {
+MultiFDSendParams *p = _send_state->params[i];
+
 socket_send_channel_destroy(p->c);
 p->c = NULL;
 qemu_mutex_destroy(>mutex);
@@ -1336,6 +1340,10 @@ int multifd_load_cleanup(Error **errp)
 qemu_sem_post(>sem_sync);
 qemu_thread_join(>thread);
 }
+}
+for (i = 0; i < migrate_multifd_channels(); i++) {
+MultiFDRecvParams *p = _recv_state->params[i];
+
 object_unref(OBJECT(p->c));
 p->c = NULL;
 qemu_mutex_destroy(>mutex);
-- 
2.24.1




Re: [PATCH REPOST v3] target/arm/arch_dump: Add SVE notes

2020-01-20 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200120101832.18781-1-drjo...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200120101832.18781-1-drjo...@redhat.com
Type: series
Subject: [PATCH REPOST v3] target/arm/arch_dump: Add SVE notes

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag] patchew/20200120101832.18781-1-drjo...@redhat.com -> 
patchew/20200120101832.18781-1-drjo...@redhat.com
Switched to a new branch 'test'
3df154f target/arm/arch_dump: Add SVE notes

=== OUTPUT BEGIN ===
ERROR: code indent should never use tabs
#22: FILE: include/elf.h:1653:
+#define NT_ARM_SVE^I0x405^I^I/* ARM Scalable Vector Extension$

WARNING: Block comments use a leading /* on a separate line
#22: FILE: include/elf.h:1653:
+#define NT_ARM_SVE 0x405   /* ARM Scalable Vector Extension

ERROR: code indent should never use tabs
#23: FILE: include/elf.h:1654:
+^I^I^I^I^I   registers */$

WARNING: Block comments use * on subsequent lines
#23: FILE: include/elf.h:1654:
+#define NT_ARM_SVE 0x405   /* ARM Scalable Vector Extension
+  registers */

WARNING: Block comments use a trailing */ on a separate line
#23: FILE: include/elf.h:1654:
+  registers */

total: 2 errors, 3 warnings, 233 lines checked

Commit 3df154fd119b (target/arm/arch_dump: Add SVE notes) has style problems, 
please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200120101832.18781-1-drjo...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v3 4/8] hw/avr: Add some Arduino boards

2020-01-20 Thread Philippe Mathieu-Daudé

On 1/20/20 11:09 AM, Igor Mammedov wrote:

On Mon, 20 Jan 2020 10:21:52 +0100
Philippe Mathieu-Daudé  wrote:


On 1/20/20 10:03 AM, Igor Mammedov wrote:

On Sun, 29 Dec 2019 23:45:01 +0100
Philippe Mathieu-Daudé  wrote:
   

Arduino boards are build with AVR chipsets.
Add some of the popular boards:

- Arduino Duemilanove
- Arduino Uno
- Arduino Mega

For more information:
https://www.arduino.cc/en/Main/Products
https://store.arduino.cc/arduino-genuino/most-popular

Signed-off-by: Philippe Mathieu-Daudé 
---
v2:
- Reword description adding more information (Aleksandar)
- Use DEFINE_TYPES (Igor)

Cc: Phillip Stevens 
Cc: Igor Mammedov 
---
   hw/avr/arduino.c | 177 +++
   hw/avr/Makefile.objs |   1 +
   2 files changed, 178 insertions(+)
   create mode 100644 hw/avr/arduino.c

diff --git a/hw/avr/arduino.c b/hw/avr/arduino.c
new file mode 100644
index 00..ecaaa295d8
--- /dev/null
+++ b/hw/avr/arduino.c
@@ -0,0 +1,177 @@
+/*
+ * QEMU Arduino boards
+ *
+ * Copyright (c) 2019 Philippe Mathieu-Daudé
+ *
+ * This work is licensed under the terms of the GNU GPLv2 or later.
+ * See the COPYING file in the top-level directory.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+/* TODO: Implement the use of EXTRAM */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "hw/boards.h"
+#include "hw/loader.h"
+#include "elf.h"
+#include "atmega.h"
+
+typedef struct ArduinoMachineState {

[...]

+MemoryRegion extram;
+} ArduinoMachineState;
+
+typedef struct ArduinoMachineClass {

[...]

+size_t extram_size;


extfoo doesn't seem to be used in this patch


Ah leftover from adapting from the 'sample' board which has SIZE_EXMEM=0
so I ended removing a chunk and forgot this field.

Can I add your R-b tag when respining this patch without the field?

sure


Thanks!


later on we need to make up some generic way for machine to say that
-m is not supported to avoid useless/confusing option where board
doesn't care about it.


Does that means that when running this machine with '-m 3G' on top your 
memdev series, QEMU will still allocate 3G of unnecessary RAM?



+} ArduinoMachineClass;
+
+#define TYPE_ARDUINO_MACHINE \
+MACHINE_TYPE_NAME("arduino")
+#define ARDUINO_MACHINE(obj) \
+OBJECT_CHECK(ArduinoMachineState, (obj), TYPE_ARDUINO_MACHINE)
+#define ARDUINO_MACHINE_CLASS(klass) \
+OBJECT_CLASS_CHECK(ArduinoMachineClass, (klass), TYPE_ARDUINO_MACHINE)
+#define ARDUINO_MACHINE_GET_CLASS(obj) \
+OBJECT_GET_CLASS(ArduinoMachineClass, (obj), TYPE_ARDUINO_MACHINE)
+
+static void load_firmware(const char *firmware, uint64_t flash_size)
+{
+const char *filename;
+int bytes_loaded;
+
+/* Load firmware (contents of flash) trying to auto-detect format */
+filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, firmware);
+if (filename == NULL) {
+error_report("Unable to find %s", firmware);
+exit(1);
+}
+
+bytes_loaded = load_elf(filename, NULL, NULL, NULL, NULL, NULL, NULL,
+0, EM_NONE, 0, 0);
+if (bytes_loaded < 0) {
+bytes_loaded = load_image_targphys(filename, OFFSET_CODE, flash_size);
+}
+if (bytes_loaded < 0) {
+error_report("Unable to load firmware image %s as ELF or raw binary",
+ firmware);
+exit(1);
+}
+}
+
+static void arduino_machine_init(MachineState *machine)
+{
+ArduinoMachineClass *amc = ARDUINO_MACHINE_GET_CLASS(machine);
+ArduinoMachineState *ams = ARDUINO_MACHINE(machine);
+
+sysbus_init_child_obj(OBJECT(machine), "mcu", >mcu, sizeof(ams->mcu),
+  amc->mcu_type);
+object_property_set_uint(OBJECT(>mcu), amc->xtal_hz,
+ "xtal-frequency-hz", _abort);
+object_property_set_bool(OBJECT(>mcu), true, "realized",
+ _abort);
+
+if (machine->firmware) {
+load_firmware(machine->firmware, memory_region_size(>mcu.flash));
+}
+}
+
+static void arduino_machine_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+
+mc->init = arduino_machine_init;
+mc->default_cpus = 1;
+mc->min_cpus = mc->default_cpus;
+mc->max_cpus = mc->default_cpus;
+mc->no_floppy = 1;
+mc->no_cdrom = 1;
+mc->no_parallel = 1;
+}
+
+static void arduino_duemilanove_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+ArduinoMachineClass *amc = ARDUINO_MACHINE_CLASS(oc);
+
+/* https://www.arduino.cc/en/Main/ArduinoBoardDuemilanove */
+mc->desc= "Arduino Duemilanove (ATmega168)",
+mc->alias   = "2009";
+amc->mcu_type   = TYPE_ATMEGA168_MCU;
+amc->xtal_hz= 16 * 1000 * 1000;
+};
+
+static void arduino_uno_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+ArduinoMachineClass *amc = ARDUINO_MACHINE_CLASS(oc);
+
+/* 

Re: [PATCH v3 02/10] hbitmap: move hbitmap_iter_next_word to hbitmap.c

2020-01-20 Thread Max Reitz
On 19.12.19 11:03, Vladimir Sementsov-Ogievskiy wrote:
> The function is definitely internal (it's not used by third party and
> it has complicated interface). Move it to .c file.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/qemu/hbitmap.h | 30 --
>  util/hbitmap.c | 29 +
>  2 files changed, 29 insertions(+), 30 deletions(-)

I wonder why you removed the “inline”, but then again we should probably
trust the compiler more than our intuition anyway.

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 4/9] cputlb: Hoist tlb portions in tlb_mmu_resize_locked

2020-01-20 Thread Alistair Francis
On Thu, Jan 9, 2020 at 12:52 PM Richard Henderson
 wrote:
>
> No functional change, but the smaller expressions make
> the code easier to read.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  accel/tcg/cputlb.c | 35 +--
>  1 file changed, 17 insertions(+), 18 deletions(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 49c605b6d8..c7dc1dc85a 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -115,8 +115,8 @@ static void tlb_dyn_init(CPUArchState *env)
>
>  /**
>   * tlb_mmu_resize_locked() - perform TLB resize bookkeeping; resize if 
> necessary
> - * @env: CPU that owns the TLB
> - * @mmu_idx: MMU index of the TLB
> + * @desc: The CPUTLBDesc portion of the TLB
> + * @fast: The CPUTLBDescFast portion of the same TLB
>   *
>   * Called with tlb_lock_held.
>   *
> @@ -153,10 +153,9 @@ static void tlb_dyn_init(CPUArchState *env)
>   * high), since otherwise we are likely to have a significant amount of
>   * conflict misses.
>   */
> -static void tlb_mmu_resize_locked(CPUArchState *env, int mmu_idx)
> +static void tlb_mmu_resize_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast)
>  {
> -CPUTLBDesc *desc = _tlb(env)->d[mmu_idx];
> -size_t old_size = tlb_n_entries(_tlb(env)->f[mmu_idx]);
> +size_t old_size = tlb_n_entries(fast);
>  size_t rate;
>  size_t new_size = old_size;
>  int64_t now = get_clock_realtime();
> @@ -198,14 +197,15 @@ static void tlb_mmu_resize_locked(CPUArchState *env, 
> int mmu_idx)
>  return;
>  }
>
> -g_free(env_tlb(env)->f[mmu_idx].table);
> -g_free(env_tlb(env)->d[mmu_idx].iotlb);
> +g_free(fast->table);
> +g_free(desc->iotlb);
>
>  tlb_window_reset(desc, now, 0);
>  /* desc->n_used_entries is cleared by the caller */
> -env_tlb(env)->f[mmu_idx].mask = (new_size - 1) << CPU_TLB_ENTRY_BITS;
> -env_tlb(env)->f[mmu_idx].table = g_try_new(CPUTLBEntry, new_size);
> -env_tlb(env)->d[mmu_idx].iotlb = g_try_new(CPUIOTLBEntry, new_size);
> +fast->mask = (new_size - 1) << CPU_TLB_ENTRY_BITS;
> +fast->table = g_try_new(CPUTLBEntry, new_size);
> +desc->iotlb = g_try_new(CPUIOTLBEntry, new_size);
> +
>  /*
>   * If the allocations fail, try smaller sizes. We just freed some
>   * memory, so going back to half of new_size has a good chance of 
> working.
> @@ -213,25 +213,24 @@ static void tlb_mmu_resize_locked(CPUArchState *env, 
> int mmu_idx)
>   * allocations to fail though, so we progressively reduce the allocation
>   * size, aborting if we cannot even allocate the smallest TLB we support.
>   */
> -while (env_tlb(env)->f[mmu_idx].table == NULL ||
> -   env_tlb(env)->d[mmu_idx].iotlb == NULL) {
> +while (fast->table == NULL || desc->iotlb == NULL) {
>  if (new_size == (1 << CPU_TLB_DYN_MIN_BITS)) {
>  error_report("%s: %s", __func__, strerror(errno));
>  abort();
>  }
>  new_size = MAX(new_size >> 1, 1 << CPU_TLB_DYN_MIN_BITS);
> -env_tlb(env)->f[mmu_idx].mask = (new_size - 1) << CPU_TLB_ENTRY_BITS;
> +fast->mask = (new_size - 1) << CPU_TLB_ENTRY_BITS;
>
> -g_free(env_tlb(env)->f[mmu_idx].table);
> -g_free(env_tlb(env)->d[mmu_idx].iotlb);
> -env_tlb(env)->f[mmu_idx].table = g_try_new(CPUTLBEntry, new_size);
> -env_tlb(env)->d[mmu_idx].iotlb = g_try_new(CPUIOTLBEntry, new_size);
> +g_free(fast->table);
> +g_free(desc->iotlb);
> +fast->table = g_try_new(CPUTLBEntry, new_size);
> +desc->iotlb = g_try_new(CPUIOTLBEntry, new_size);
>  }
>  }
>
>  static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
>  {
> -tlb_mmu_resize_locked(env, mmu_idx);
> +tlb_mmu_resize_locked(_tlb(env)->d[mmu_idx], 
> _tlb(env)->f[mmu_idx]);
>  env_tlb(env)->d[mmu_idx].n_used_entries = 0;
>  env_tlb(env)->d[mmu_idx].large_page_addr = -1;
>  env_tlb(env)->d[mmu_idx].large_page_mask = -1;
> --
> 2.20.1
>
>



Re: [PATCH 026/104] virtiofsd: Fast path for virtio read

2020-01-20 Thread Dr. David Alan Gilbert
* Masayoshi Mizuma (msys.miz...@gmail.com) wrote:
> On Thu, Dec 12, 2019 at 04:37:46PM +, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > Readv the data straight into the guests buffer.
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> > With fix by:
> > Signed-off-by: Eryu Guan 
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c |   5 +
> >  tools/virtiofsd/fuse_virtio.c   | 159 
> >  tools/virtiofsd/fuse_virtio.h   |   4 +
> >  3 files changed, 168 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c 
> > b/tools/virtiofsd/fuse_lowlevel.c
> > index c2b114cf5b..5f80625652 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -475,6 +475,11 @@ static int fuse_send_data_iov_fallback(struct 
> > fuse_session *se,
> >  return fuse_send_msg(se, ch, iov, iov_count);
> >  }
> >  
> > +if (fuse_lowlevel_is_virtio(se) && buf->count == 1 &&
> > +buf->buf[0].flags == (FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK)) {
> > +return virtio_send_data_iov(se, ch, iov, iov_count, buf, len);
> > +}
> > +
> >  abort(); /* Will have taken vhost path */
> >  return 0;
> >  }
> > diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
> > index c33e0f7e8c..146cd3f702 100644
> > --- a/tools/virtiofsd/fuse_virtio.c
> > +++ b/tools/virtiofsd/fuse_virtio.c
> > @@ -230,6 +230,165 @@ err:
> >  return ret;
> >  }
> >  
> > +/*
> > + * Callback from fuse_send_data_iov_* when it's virtio and the buffer
> > + * is a single FD with FUSE_BUF_IS_FD | FUSE_BUF_FD_SEEK
> > + * We need send the iov and then the buffer.
> > + * Return 0 on success
> > + */
> > +int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
> > + struct iovec *iov, int count, struct fuse_bufvec 
> > *buf,
> > + size_t len)
> > +{
> > +int ret = 0;
> > +VuVirtqElement *elem;
> > +VuVirtq *q;
> > +
> > +assert(count >= 1);
> > +assert(iov[0].iov_len >= sizeof(struct fuse_out_header));
> > +
> > +struct fuse_out_header *out = iov[0].iov_base;
> > +/* TODO: Endianness! */
> > +
> > +size_t iov_len = iov_size(iov, count);
> > +size_t tosend_len = iov_len + len;
> > +
> > +out->len = tosend_len;
> > +
> > +fuse_log(FUSE_LOG_DEBUG, "%s: count=%d len=%zd iov_len=%zd\n", 
> > __func__,
> > + count, len, iov_len);
> > +
> > +/* unique == 0 is notification which we don't support */
> > +assert(out->unique);
> > +
> > +/* For virtio we always have ch */
> > +assert(ch);
> > +assert(!ch->qi->reply_sent);
> > +elem = ch->qi->qe;
> > +q = >qi->virtio_dev->dev.vq[ch->qi->qidx];
> > +
> > +/* The 'in' part of the elem is to qemu */
> > +unsigned int in_num = elem->in_num;
> > +struct iovec *in_sg = elem->in_sg;
> > +size_t in_len = iov_size(in_sg, in_num);
> > +fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length 
> > %zd\n",
> > + __func__, elem->index, in_num, in_len);
> > +
> > +/*
> > + * The elem should have room for a 'fuse_out_header' (out from fuse)
> > + * plus the data based on the len in the header.
> > + */
> > +if (in_len < sizeof(struct fuse_out_header)) {
> > +fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
> > + __func__, elem->index);
> 
> > +ret = -E2BIG;
> 
> The ret should be positive value, right?
> 
>ret = E2BIG;

Yes, I think so.

> > +goto err;
> > +}
> > +if (in_len < tosend_len) {
> > +fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len %zd\n",
> > + __func__, elem->index, tosend_len);
> 
> > +ret = -E2BIG;
> 
>ret = E2BIG;
> 
> > +goto err;
> > +}
> > +
> > +/* TODO: Limit to 'len' */
> > +
> > +/* First copy the header data from iov->in_sg */
> > +copy_iov(iov, count, in_sg, in_num, iov_len);
> > +
> > +/*
> > + * Build a copy of the the in_sg iov so we can skip bits in it,
> > + * including changing the offsets
> > + */
> 
> > +struct iovec *in_sg_cpy = calloc(sizeof(struct iovec), in_num);
> 
>assert(in_sg_cpy) should be here? in case calloc() fails...

Thanks, added.

> > +memcpy(in_sg_cpy, in_sg, sizeof(struct iovec) * in_num);
> > +/* These get updated as we skip */
> > +struct iovec *in_sg_ptr = in_sg_cpy;
> > +int in_sg_cpy_count = in_num;
> > +
> > +/* skip over parts of in_sg that contained the header iov */
> > +size_t skip_size = iov_len;
> > +
> > +size_t in_sg_left = 0;
> > +do {
> > +while (skip_size != 0 && in_sg_cpy_count) {
> > +if (skip_size >= in_sg_ptr[0].iov_len) {
> > +skip_size -= in_sg_ptr[0].iov_len;
> > +in_sg_ptr++;
> > +in_sg_cpy_count--;
> > +} else {
> > + 

Re: [PATCH 040/104] virtiofsd: Pass write iov's all the way through

2020-01-20 Thread Philippe Mathieu-Daudé

On 1/19/20 9:08 AM, Xiao Yang wrote:

On 2019/12/13 0:38, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert"

Pass the write iov pointing to guest RAM all the way through rather
than copying the data.

Signed-off-by: Dr. David Alan Gilbert
---
  tools/virtiofsd/fuse_virtio.c | 79 ---
  1 file changed, 73 insertions(+), 6 deletions(-)


[...]

Hi,

Tested the patch and got the correct data written by guest, so it looks 
fine to me.

Reviewed-by: Xiao Yang 


So also:
Tested-by: Xiao Yang 



Best Regards,
Xiao Yang





Re: [PATCH v7 03/11] hw/core: create Resettable QOM interface

2020-01-20 Thread Damien Hedde



On 1/18/20 7:42 AM, Philippe Mathieu-Daudé wrote:
> On 1/15/20 1:36 PM, Damien Hedde wrote:
>> This commit defines an interface allowing multi-phase reset. This aims
>> to solve a problem of the actual single-phase reset (built in
>> DeviceClass and BusClass): reset behavior is dependent on the order
>> in which reset handlers are called. In particular doing external
>> side-effect (like setting an qemu_irq) is problematic because receiving
>> object may not be reset yet.
>>
>> The Resettable interface divides the reset in 3 well defined phases.
>> To reset an object tree, all 1st phases are executed then all 2nd then
>> all 3rd. See the comments in include/hw/resettable.h for a more complete
>> description. The interface defines 3 phases to let the future
>> possibility of holding an object into reset for some time.
>>
>> The qdev/qbus reset in DeviceClass and BusClass will be modified in
>> following commits to use this interface. A mechanism is provided
>> to allow executing a transitional reset handler in place of the 2nd
>> phase which is executed in children-then-parent order inside a tree.
>> This will allow to transition devices and buses smoothly while
>> keeping the exact current qdev/qbus reset behavior for now.
>>
>> Documentation will be added in a following commit.
>>
>> Signed-off-by: Damien Hedde 
>> Reviewed-by: Richard Henderson 
>> ---
>>
>> v7 update: un-nest struct ResettablePhases
>> ---
>>   Makefile.objs   |   1 +
>>   include/hw/resettable.h | 211 +++
>>   hw/core/resettable.c    | 238 
>>   hw/core/Makefile.objs   |   1 +
>>   hw/core/trace-events    |  17 +++
>>   5 files changed, 468 insertions(+)
>>   create mode 100644 include/hw/resettable.h
>>   create mode 100644 hw/core/resettable.c
>>
>> diff --git a/Makefile.objs b/Makefile.objs
>> index 7c1e50f9d6..9752d549b4 100644
>> --- a/Makefile.objs
>> +++ b/Makefile.objs
>> @@ -191,6 +191,7 @@ trace-events-subdirs += migration
>>   trace-events-subdirs += net
>>   trace-events-subdirs += ui
>>   endif
>> +trace-events-subdirs += hw/core
>>   trace-events-subdirs += hw/display
>>   trace-events-subdirs += qapi
>>   trace-events-subdirs += qom
>> diff --git a/include/hw/resettable.h b/include/hw/resettable.h
>> new file mode 100644
>> index 00..58b3df4c22
>> --- /dev/null
>> +++ b/include/hw/resettable.h
>> @@ -0,0 +1,211 @@
>> +/*
>> + * Resettable interface header.
>> + *
>> + * Copyright (c) 2019 GreenSocs SAS
>> + *
>> + * Authors:
>> + *   Damien Hedde
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef HW_RESETTABLE_H
>> +#define HW_RESETTABLE_H
>> +
>> +#include "qom/object.h"
>> +
>> +#define TYPE_RESETTABLE_INTERFACE "resettable"
>> +
>> +#define RESETTABLE_CLASS(class) \
>> +    OBJECT_CLASS_CHECK(ResettableClass, (class),
>> TYPE_RESETTABLE_INTERFACE)
>> +
>> +#define RESETTABLE_GET_CLASS(obj) \
>> +    OBJECT_GET_CLASS(ResettableClass, (obj), TYPE_RESETTABLE_INTERFACE)
>> +
>> +typedef struct ResettableState ResettableState;
>> +
>> +/**
>> + * ResetType:
>> + * Types of reset.
>> + *
>> + * + Cold: reset resulting from a power cycle of the object.
>> + *
>> + * TODO: Support has to be added to handle more types. In particular,
>> + * ResettableState structure needs to be expanded.
>> + */
>> +typedef enum ResetType {
>> +    RESET_TYPE_COLD,
>> +} ResetType;
>> +
>> +/*
>> + * ResettableClass:
>> + * Interface for resettable objects.
>> + *
>> + * See docs/devel/reset.rst for more detailed information about how
>> QEMU models
>> + * reset. This whole API must only be used when holding the iothread
>> mutex.
>> + *
>> + * All objects which can be reset must implement this interface;
>> + * it is usually provided by a base class such as DeviceClass or
>> BusClass.
>> + * Every Resettable object must maintain some state tracking the
>> + * progress of a reset operation by providing a ResettableState
>> structure.
>> + * The functions defined in this module take care of updating the
>> + * state of the reset.
>> + * The base class implementation of the interface provides this
>> + * state and implements the associated method: get_state.
>> + *
>> + * Concrete object implementations (typically specific devices
>> + * such as a UART model) should provide the functions
>> + * for the phases.enter, phases.hold and phases.exit methods, which
>> + * they can set in their class init function, either directly or
>> + * by calling resettable_class_set_parent_phases().
>> + * The phase methods are guaranteed to only only ever be called once
>> + * for any reset event, in the order 'enter', 'hold', 'exit'.
>> + * An object will always move quickly from 'enter' to 'hold'
>> + * but might remain in 'hold' for an arbitrary period of time
>> + * before eventually reset is deasserted and the 'exit' phase is called.
>> + * Object 

  1   2   3   4   >