date:20200519

Re: [PATCH 2/2] linux-user: Adjust guest page protection for the host

2020-05-19 Thread Philippe Mathieu-Daudé


On 5/19/20 8:56 PM, Richard Henderson wrote:

Executable guest pages are never directly executed by
the host, but do need to be readable for translation.

Signed-off-by: Richard Henderson 
---
  linux-user/mmap.c | 6 +++++-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 36fd1e2250..84662c3311 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -76,8 +76,12 @@ static int validate_prot_to_pageflags(int *host_prot, int prot)
      * don't bother transforming guest bit to host bit.  Any other
      * target-specific prot bits will not be understood by the host
      * and will need to be encoded into page_flags for qemu emulation.
+     *
+     * Pages that are executable by the guest will never be executed
+     * by the host, but the host will need to be able to read them.
      */
-    *host_prot = prot & (PROT_READ | PROT_WRITE | PROT_EXEC);
+    *host_prot = (prot & (PROT_READ | PROT_WRITE))
+               | (prot & PROT_EXEC ? PROT_READ : 0);
 
     return prot & ~valid ? 0 : page_flags;
 }



Reviewed-by: Philippe Mathieu-Daudé
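
The mapping in the hunk above can be tried in isolation. The following is a minimal sketch, not the QEMU function: host_prot_for_guest() is a hypothetical name, and it assumes only the standard PROT_* constants from <sys/mman.h>. It shows that a guest PROT_EXEC request turns into PROT_READ on the host and never into PROT_EXEC.

```c
#include <assert.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map the protection bits requested by the guest
 * to the bits used for the host mapping.
 */
int host_prot_for_guest(int guest_prot)
{
    /* Read and write carry over unchanged. */
    int host_prot = guest_prot & (PROT_READ | PROT_WRITE);

    /* Guest code is only read by the translator, so grant read access. */
    if (guest_prot & PROT_EXEC) {
        host_prot |= PROT_READ;
    }

    /* The host mapping is never executable. */
    assert((host_prot & PROT_EXEC) == 0);
    return host_prot;
}
```

With this sketch an exec-only guest page ends up readable on the host, and a read/write/exec page ends up read/write.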

Re: [PATCH 1/7] colo-compare: Fix memory leak in packet_enqueue()

2020-05-19 Thread Philippe Mathieu-Daudé


On 5/19/20 10:02 PM, Zhang Chen wrote:

From: Derek Su 

The patch is to fix the "pkt" memory leak in packet_enqueue().
The allocated "pkt" needs to be freed if the colo compare
primary or secondary queue is too big.

Replace the error_report of full queue with a trace event.

Signed-off-by: Derek Su 
Reviewed-by: Zhang Chen 
Signed-off-by: Zhang Chen 
---
  net/colo-compare.c | 23 +++
  net/trace-events   |  1 +
  2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index c07e7c1c09..56d8976537 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -122,6 +122,10 @@ enum {
     SECONDARY_IN,
 };
 
+static const char *colo_mode[] = {
+    [PRIMARY_IN] = "primary",
+    [SECONDARY_IN] = "secondary",
+};
 
 static int compare_chr_send(CompareState *s,
                             const uint8_t *buf,
@@ -217,6 +221,7 @@ static int packet_enqueue(CompareState *s, int mode, Connection **con)
     ConnectionKey key;
     Packet *pkt = NULL;
     Connection *conn;
+    int ret;
 
     if (mode == PRIMARY_IN) {
         pkt = packet_new(s->pri_rs.buf,
@@ -245,16 +250,18 @@ static int packet_enqueue(CompareState *s, int mode, Connection **con)
     }
 
     if (mode == PRIMARY_IN) {
-        if (!colo_insert_packet(&conn->primary_list, pkt, &conn->pack)) {
-            error_report("colo compare primary queue size too big,"
-                         "drop packet");
-        }
+        ret = colo_insert_packet(&conn->primary_list, pkt, &conn->pack);
     } else {
-        if (!colo_insert_packet(&conn->secondary_list, pkt, &conn->sack)) {
-            error_report("colo compare secondary queue size too big,"
-                         "drop packet");
-        }
+        ret = colo_insert_packet(&conn->secondary_list, pkt, &conn->sack);
     }
+
+    if (!ret) {
+        trace_colo_compare_drop_packet(colo_mode[mode],
+            "queue size too big, drop packet");
+        packet_destroy(pkt, NULL);
+        pkt = NULL;
+    }
+
     *con = conn;
 
     return 0;

diff --git a/net/trace-events b/net/trace-events
index 02c13fd0ba..fa49c71533 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -12,6 +12,7 @@ colo_proxy_main(const char *chr) ": %s"
  
  # colo-compare.c

  colo_compare_main(const char *chr) ": %s"
+colo_compare_drop_packet(const char *queue, const char *chr) ": %s: %s"
  colo_compare_udp_miscompare(const char *sta, int size) ": %s = %d"
  colo_compare_icmp_miscompare(const char *sta, int size) ": %s = %d"
  colo_compare_ip_info(int psize, const char *sta, const char *stb, int ssize, const char *stc, const char *std) "ppkt size = %d, ip_src = %s, ip_dst = %s, spkt size = %d, ip_src = %s, ip_dst = %s"



Reviewed-by: Philippe Mathieu-Daudé
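
The ownership rule behind this fix can be shown with a small stand-alone model. This is only a sketch under assumptions: ToyPacket, ToyQueue, TOY_QUEUE_MAX and the toy_* functions are hypothetical stand-ins and not the colo-compare code. The point is that when the insert fails, the queue never took ownership, so the caller has to free the packet itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_QUEUE_MAX 2   /* hypothetical, deliberately tiny limit */

typedef struct ToyPacket { int id; } ToyPacket;
typedef struct ToyQueue { ToyPacket *slots[TOY_QUEUE_MAX]; int len; } ToyQueue;

int toy_live_packets;     /* allocation counter, makes a leak observable */

ToyPacket *toy_packet_new(int id)
{
    ToyPacket *p = malloc(sizeof(*p));
    assert(p != NULL);
    p->id = id;
    toy_live_packets++;
    return p;
}

void toy_packet_destroy(ToyPacket *p)
{
    free(p);
    toy_live_packets--;
}

/* Returns false when the queue is full; the packet is then NOT owned. */
bool toy_queue_insert(ToyQueue *q, ToyPacket *p)
{
    if (q->len == TOY_QUEUE_MAX) {
        return false;
    }
    q->slots[q->len++] = p;
    return true;
}

/* Returns true if the packet was queued, false if it was dropped. */
bool toy_packet_enqueue(ToyQueue *q, int id)
{
    ToyPacket *pkt = toy_packet_new(id);

    if (!toy_queue_insert(q, pkt)) {
        /* Without this call the rejected packet would leak. */
        toy_packet_destroy(pkt);
        return false;
    }
    return true;
}

/* Free everything the queue still owns. */
void toy_queue_drain(ToyQueue *q)
{
    while (q->len > 0) {
        toy_packet_destroy(q->slots[--q->len]);
    }
}
```

After a dropped enqueue the live-packet counter equals the queue length, which is the invariant the patch restores.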

Re: [PATCH v5 4/7] dwc-hsotg (dwc2) USB host controller emulation

2020-05-19 Thread Paul Zimmerman

On Mon, May 18, 2020 at 8:34 AM Peter Maydell 
wrote:

> On Tue, 12 May 2020 at 07:50, Paul Zimmerman  wrote:
> >
>
> > +static void dwc2_reset(DeviceState *dev)
> > +{
> > +DWC2State *s = DWC2_USB(dev);
> > +int i;
> > +
> > +trace_usb_dwc2_reset();
> > +timer_del(s->frame_timer);
> > +qemu_bh_cancel(s->async_bh);
> > +
> > +if (s->uport.dev && s->uport.dev->attached) {
> > +usb_detach(&s->uport);
> > +}
> > +
> > +dwc2_bus_stop(s);
>
>
> > +dwc2_update_irq(s);
>
> A device that uses single-phase reset shouldn't try to change
> outbound IRQ lines from its reset function (because the device
> on the other end might have already reset before this device,
> or might reset after this device, and it doesn't necessarily
> handle the irq line change correctly). If you need to
> update IRQ lines in reset, you can use three-phase-reset
> (see docs/devel/reset.rst).
>

Hi Peter,

Is there a tree somewhere that has a working example of a
three-phase reset? I did a 'git grep' on the master branch and didn't
find any code that is actually using it. I tried to implement it from
the example in reset.rst, but I'm getting a segfault on the first line in
resettable_class_set_parent_phases() that I'm having trouble figuring
out.

Thanks,
Paul

> thanks
> -- PMM
>
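
For orientation only, the phase ordering Peter refers to can be pictured with a toy model. This is not QEMU's Resettable API and does not address the segfault; every name below (ToyDev, ToyResetPhases, toy_reset_all and so on) is made up for illustration. The sketch assumes the split described in docs/devel/reset.rst: the first phase clears local state only, and actions that affect other devices, such as changing an IRQ line, wait until every device in the group has finished that first phase.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyDev ToyDev;

/* Hypothetical per-device phase hooks; any of them may be NULL. */
typedef struct ToyResetPhases {
    void (*on_enter)(ToyDev *d);
    void (*on_hold)(ToyDev *d);
    void (*on_exit)(ToyDev *d);
} ToyResetPhases;

struct ToyDev {
    const ToyResetPhases *phases;
    unsigned pending;   /* local interrupt status bits */
    int irq_in;         /* level currently seen on the input line */
    ToyDev *irq_sink;   /* device the output line is wired to, or NULL */
};

/* Drive the output line from local state: a side effect on another device. */
void toy_update_irq(ToyDev *d)
{
    if (d->irq_sink) {
        d->irq_sink->irq_in = (d->pending != 0);
    }
}

/* Enter phase: clear local state only, touch nothing else. */
void toy_enter(ToyDev *d)
{
    d->pending = 0;
    d->irq_in = 0;
}

/* Hold phase: every device has finished enter, so lines may change. */
void toy_hold(ToyDev *d)
{
    toy_update_irq(d);
}

const ToyResetPhases toy_phases = { toy_enter, toy_hold, NULL };

/* Run each phase over the whole group before starting the next one. */
void toy_reset_all(ToyDev **devs, size_t n)
{
    size_t i;

    assert(devs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (devs[i]->phases->on_enter) {
            devs[i]->phases->on_enter(devs[i]);
        }
    }
    for (i = 0; i < n; i++) {
        if (devs[i]->phases->on_hold) {
            devs[i]->phases->on_hold(devs[i]);
        }
    }
    for (i = 0; i < n; i++) {
        if (devs[i]->phases->on_exit) {
            devs[i]->phases->on_exit(devs[i]);
        }
    }
}
```

Because the line update happens in the second loop, the result does not depend on the order in which the two devices appear in the group.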

Re: [RFC PATCH 2/8] riscv: Generate payload scripts

2020-05-19 Thread Richard Henderson

On 5/19/20 7:37 PM, LIU Zhiwei wrote:
> On 2020/5/12 1:40, Richard Henderson wrote:
>> On 4/30/20 12:21 AM, LIU Zhiwei wrote:
>>> +    # sequence of li rd, 0x1234567887654321
>>> +    #
>>> +    #  0:   002471b7    lui rd,0x247
>>> +    #  4:   8ad1819b    addiw   rd,rd,-1875
>>> +    #  8:   00c19193    slli    rd,rd,0xc
>>> +    #  c:   f1118193    addi    rd,rd,-239 # 0x246f11
>>> +    # 10:   00d19193    slli    rd,rd,0xd
>>> +    # 14:   d9518193    addi    rd,rd,-619
>>> +    # 18:   00e19193    slli    rd,rd,0xe
>>> +    # 1c:   32118193    addi    rd,rd,801
>> You don't really need to use addiw.  Removing that special case would really
>> simplify this.
> I think I don't get it. Do you mean that the immediate will not be 64 bit?

Well, mostly the immediate will be small, actually.  But the interface must
support 64-bit immediates.

I'm saying that for this computation,

lui
addi
slli
addi
...

is the same.  You don't *have* to use addiw.


r~
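
The claim is easy to check by emulating the quoted sequence. The sketch below is an illustration, not risu code: the rv_* helpers and li_sequence() are hypothetical names, and only the four instructions that appear in the listing are modelled. It runs the sequence once with addiw and once with plain addi in the second slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* lui: place imm20 in bits 31:12, then sign-extend to 64 bits. */
uint64_t rv_lui(uint32_t imm20)
{
    return (uint64_t)(int64_t)(int32_t)(imm20 << 12);
}

/* addi: 64-bit add of a sign-extended 12-bit immediate. */
uint64_t rv_addi(uint64_t rd, int32_t imm)
{
    assert(imm >= -2048 && imm <= 2047);
    return rd + (uint64_t)(int64_t)imm;
}

/* addiw: same add, but keep the low 32 bits and sign-extend them. */
uint64_t rv_addiw(uint64_t rd, int32_t imm)
{
    uint32_t low = (uint32_t)rv_addi(rd, imm);
    return (uint64_t)(int64_t)(int32_t)low;
}

/* slli: logical shift left. */
uint64_t rv_slli(uint64_t rd, unsigned shamt)
{
    assert(shamt < 64);
    return rd << shamt;
}

/* The sequence from the quoted comment, with a switchable second step. */
uint64_t li_sequence(bool use_addiw)
{
    uint64_t rd = rv_lui(0x247);

    rd = use_addiw ? rv_addiw(rd, -1875) : rv_addi(rd, -1875);
    rd = rv_slli(rd, 12);
    rd = rv_addi(rd, -239);
    rd = rv_slli(rd, 13);
    rd = rv_addi(rd, -619);
    rd = rv_slli(rd, 14);
    rd = rv_addi(rd, 801);
    return rd;
}
```

For this constant both variants produce 0x1234567887654321. The two instructions only disagree when the 32-bit result of the add wraps, for example lui 0x80000 followed by an add of -1.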

Re: [RFC PATCH 1/8] riscv: Add RV64I instructions description

2020-05-19 Thread Richard Henderson

On 5/19/20 7:41 PM, LIU Zhiwei wrote:
>> Since all of sp, gp, tp are not in risu's control, why is rs1 only excluding
>> sp, and not gp and tp as well?
> When I test the patch set, I find gp and tp will be the same in slave and master,
> so they can be used as source register.

Ah, try again with different builds of risu, e.g. one with -O2 and one with
-O0.  I think you will find that these values are set by the linker for the 
image.


r~

Re: [PATCH 3/7] chardev/char.c: Use qemu_co_sleep_ns if in coroutine

2020-05-19 Thread Philippe Mathieu-Daudé


On 5/19/20 10:02 PM, Zhang Chen wrote:

From: Lukas Straub 

This will be needed in the next patch.


Can you reword to something clearer, maybe:

"To be able to convert compare_chr_send to a coroutine in the
next commit, use qemu_co_sleep_ns if in coroutine."

Reviewed-by: Philippe Mathieu-Daudé 



Signed-off-by: Lukas Straub 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Zhang Chen 
Signed-off-by: Zhang Chen 
---
  chardev/char.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/chardev/char.c b/chardev/char.c
index 0196e2887b..4c58ea1836 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -38,6 +38,7 @@
  #include "qemu/module.h"
  #include "qemu/option.h"
  #include "qemu/id.h"
+#include "qemu/coroutine.h"
  
  #include "chardev/char-mux.h"
  
@@ -119,7 +120,11 @@ static int qemu_chr_write_buffer(Chardev *s,

  retry:
  res = cc->chr_write(s, buf + *offset, len - *offset);
  if (res < 0 && errno == EAGAIN && write_all) {
-g_usleep(100);
+if (qemu_in_coroutine()) {
+qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 10);
+} else {
+g_usleep(100);
+}
  goto retry;
  }
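
The shape of this retry loop can be sketched without QEMU. In the model below the sleep is passed in as a callback, which plays the role of the qemu_in_coroutine() check: a blocking caller would pass a blocking sleep, a coroutine caller a yielding one. All names (write_all_retry, ToyFlaky, toy_flaky_write, toy_flaky_sleep) are hypothetical, and the fake backend exists only so the loop can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*toy_write_fn)(void *opaque, const char *buf, size_t len);
typedef void (*toy_sleep_fn)(void *opaque);

/*
 * Write all of buf. On EAGAIN, call the supplied sleep hook and retry.
 * Returns 0 on success, -1 on any other error.
 */
int write_all_retry(toy_write_fn do_write, toy_sleep_fn do_sleep,
                    void *opaque, const char *buf, size_t len)
{
    size_t off = 0;

    assert(do_write != NULL && do_sleep != NULL);
    while (off < len) {
        long res = do_write(opaque, buf + off, len - off);

        if (res < 0) {
            if (errno != EAGAIN) {
                return -1;
            }
            do_sleep(opaque);   /* blocking or yielding, caller's choice */
            continue;
        }
        off += (size_t)res;
    }
    return 0;
}

/* Hypothetical backend: fails with EAGAIN a few times, then accepts
 * one byte per call. */
typedef struct ToyFlaky {
    int eagain_left;
    int sleeps;
    size_t written;
} ToyFlaky;

long toy_flaky_write(void *opaque, const char *buf, size_t len)
{
    ToyFlaky *f = opaque;

    (void)buf;
    (void)len;
    if (f->eagain_left > 0) {
        f->eagain_left--;
        errno = EAGAIN;
        return -1;
    }
    f->written++;
    return 1;
}

void toy_flaky_sleep(void *opaque)
{
    ToyFlaky *f = opaque;

    f->sleeps++;
}
```

Keeping the sleep behind a hook means the loop itself stays identical in both contexts, which is the property the patch relies on.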

Re: [PATCH 03/55] qdev: New qdev_new(), qdev_realize(), etc.

2020-05-19 Thread Alistair Francis

On Tue, May 19, 2020 at 9:26 PM Markus Armbruster  wrote:
>
> Alistair Francis  writes:
>
> > On Tue, May 19, 2020 at 8:11 AM Markus Armbruster  wrote:
> >>
> >> We commonly plug devices into their bus right when we create them,
> >> like this:
> >>
> >> dev = qdev_create(bus, type_name);
> >>
> >> Note that @dev is a weak reference.  The reference from @bus to @dev
> >> is the only strong one.
> >>
> >> We realize at some later time, either with
> >>
> >> object_property_set_bool(OBJECT(dev), true, "realized", errp);
> >>
> >> or its convenience wrapper
> >>
> >> qdev_init_nofail(dev);
> >>
> >> If @dev still has no QOM parent then, realizing makes the
> >> /machine/unattached/ orphanage its QOM parent.
> >>
> >> Note that the device returned by qdev_create() is plugged into a bus,
> >> but doesn't have a QOM parent, yet.  Until it acquires one,
> >> unrealizing the bus will hang in bus_unparent():
> >>
> >> while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
> >> DeviceState *dev = kid->child;
> >> object_unparent(OBJECT(dev));
> >> }
> >>
> >> object_unparent() does nothing when its argument has no QOM parent,
> >> and the loop spins forever.
> >>
> >> Device state "no QOM parent, but plugged into bus" is dangerous.
> >>
> >> Paolo suggested to delay plugging into the bus until realize.  We need
> >> to plug into the parent bus before we call the device's realize
> >> method, in case it uses the parent bus.  So the dangerous state still
> >> exists, but only within realization, where we can manage it safely.
> >>
> >> This commit creates infrastructure to do this:
> >>
> >> dev = qdev_new(type_name);
> >> ...
> >> qdev_realize_and_unref(dev, bus, errp)
> >>
> >> Note that @dev becomes a strong reference here.
> >> qdev_realize_and_unref() drops it.  There is also plain
> >> qdev_realize(), which doesn't drop it.
> >>
> >> The remainder of this series will convert all users to this new
> >> interface.
> >>
> >> Cc: Michael S. Tsirkin 
> >> Cc: Marcel Apfelbaum 
> >> Cc: Alistair Francis 
> >> Cc: Gerd Hoffmann 
> >> Cc: Mark Cave-Ayland 
> >> Cc: David Gibson 
> >> Signed-off-by: Markus Armbruster 
> >> ---
> >>  include/hw/qdev-core.h | 11 -
> >>  hw/core/bus.c  | 14 +++
> >>  hw/core/qdev.c | 94 ++
> >>  3 files changed, 118 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> >> index b870b27966..fba29308f7 100644
> >> --- a/include/hw/qdev-core.h
> >> +++ b/include/hw/qdev-core.h
> >> @@ -57,7 +57,7 @@ typedef void (*BusUnrealize)(BusState *bus);
> >>   * After successful realization, setting static properties will fail.
> >>   *
> >>   * As an interim step, the #DeviceState:realized property can also be
> >> - * set with qdev_init_nofail().
> >> + * set with qdev_realize() or qdev_init_nofail().
> >>   * In the future, devices will propagate this state change to their children
> >>   * and along busses they expose.
> >>   * The point in time will be deferred to machine creation, so that values
> >> @@ -322,7 +322,13 @@ compat_props_add(GPtrArray *arr,
> >>
> >>  DeviceState *qdev_create(BusState *bus, const char *name);
> >>  DeviceState *qdev_try_create(BusState *bus, const char *name);
> >> +DeviceState *qdev_new(const char *name);
> >> +DeviceState *qdev_try_new(const char *name);
> >>  void qdev_init_nofail(DeviceState *dev);
> >> +bool qdev_realize(DeviceState *dev, BusState *bus, Error **errp);
> >> +bool qdev_realize_and_unref(DeviceState *dev, BusState *bus, Error **errp);
> >> +void qdev_unrealize(DeviceState *dev);
> >> +
> >>  void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
> >>   int required_for_version);
> >>  HotplugHandler *qdev_get_bus_hotplug_handler(DeviceState *dev);
> >> @@ -411,6 +417,9 @@ typedef int (qdev_walkerfn)(DeviceState *dev, void *opaque);
> >>  void qbus_create_inplace(void *bus, size_t size, const char *typename,
> >>   DeviceState *parent, const char *name);
> >>  BusState *qbus_create(const char *typename, DeviceState *parent, const char *name);
> >> +bool qbus_realize(BusState *bus, Error **errp);
> >> +void qbus_unrealize(BusState *bus);
> >> +
> >>  /* Returns > 0 if either devfn or busfn skip walk somewhere in cursion,
> >>   * < 0 if either devfn or busfn terminate walk somewhere in cursion,
> >>   *   0 otherwise. */
> >> diff --git a/hw/core/bus.c b/hw/core/bus.c
> >> index 08c5eab24a..bf622604a3 100644
> >> --- a/hw/core/bus.c
> >> +++ b/hw/core/bus.c
> >> @@ -169,6 +169,20 @@ BusState *qbus_create(const char *typename, DeviceState *parent, const char *nam
> >>  return bus;
> >>  }
> >>
> >> +bool qbus_realize(BusState *bus, Error **errp)
> >> +{
> >> +Error *err = NULL;
> >> +
> >> +object_property_set_bool(OBJECT(bus), true, "realized", &err);
> >> +

Re: [PATCH v8 74/74] cputlb: queue async flush jobs without the BQL

2020-05-19 Thread Emilio G. Cota

On Mon, May 18, 2020 at 09:46:36 -0400, Robert Foley wrote:
> We re-ran the numbers with the latest re-based series.
> 
> We used an aarch64 ubuntu VM image with a host CPU:
> Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz, 2 CPUs, 10 cores/CPU,
> 20 Threads/CPU.  40 cores total.
> 
> For the bare hardware and kvm tests (first chart) the host CPU was:
> HiSilicon 1620 CPU 2600 Mhz,  2 CPUs, 64 Cores per CPU, 128 CPUs total.
> 
> First, we ran a test of building the kernel in the VM.
> We did not see any major improvements nor major regressions.
> We show the results of the Speedup of building the kernel
> on bare hardware compared with kvm and QEMU (both the baseline and cpu locks).
> 
> 
> [Figure: "Speedup vs a single thread for kernel build". y-axis: speedup (0 to 40); x-axis: Guest vCPUs (0 to 70). Series: bare hardware, kvm, baseline, cpu lock.]
> 
> 
> After seeing these results and the scaling limits inherent in the build itself,
> we decided to run a test which might show the scaling improvements more clearly.

Thanks for doing these tests. I know from experience that benchmarking
is hard and incredibly time consuming, so please do not be discouraged by
my comments below.

A couple of points:

1. I am not familiar with aarch64 KVM but I'd expect it to scale almost
like the native run. Are you assigning enough RAM to the guest? Also,
it can help to run the kernel build in a ramfs in the guest.

2. The build itself does not seem to impose a scaling limit, since
it scales very well when run natively (per-thread I presume aarch64 TCG is
still slower than native, even if TCG is run on a faster x86 machine).
The limit here is probably aarch64 TCG. In particular, last time I
checked aarch64 TCG has room for improvement scalability-wise handling
interrupts and some TLB operations; this is likely to explain why we
see no benefit with per-CPU locks, i.e. the bottleneck is elsewhere.
This can be confirmed with the sync profiler.

IIRC I originally used ppc64 for this test because ppc64 TCG does not
have any other big bottlenecks scalability-wise. I just checked but
unfortunately I can't find the ppc64 image I used :( What I can offer
is the script I used to run these benchmarks; see the appended.

Thanks,

Re: [PATCH 0/7] Latest COLO tree queued patches

2020-05-19 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200519200207.17773-1-chen.zh...@intel.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

PASS 1 fdc-test /x86_64/fdc/cmos
PASS 2 fdc-test /x86_64/fdc/no_media_on_start
PASS 3 fdc-test /x86_64/fdc/read_without_media
==6214==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 fdc-test /x86_64/fdc/media_change
PASS 5 fdc-test /x86_64/fdc/sense_interrupt
PASS 6 fdc-test /x86_64/fdc/relative_seek
---
PASS 32 test-opts-visitor /visitor/opts/range/beyond
PASS 33 test-opts-visitor /visitor/opts/dict/unvisited
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-coroutine" 
==6253==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
==6253==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 
0x7ffcb42bb000; bottom 0x7f9c45e2; size: 0x00606e49b000 (414167183360)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-coroutine /basic/no-dangling-access
---
PASS 13 test-aio /aio/event/wait/no-flush-cb
PASS 11 fdc-test /x86_64/fdc/read_no_dma_18
PASS 14 test-aio /aio/timer/schedule
==6268==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 15 test-aio /aio/coroutine/queue-chaining
PASS 16 test-aio /aio-gsource/flush
PASS 17 test-aio /aio-gsource/bh/schedule
---
PASS 27 test-aio /aio-gsource/event/wait/no-flush-cb
PASS 28 test-aio /aio-gsource/timer/schedule
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-aio-multithread -m=quick -k --tap < /dev/null | 
./scripts/tap-driver.pl --test-name="test-aio-multithread" 
==6273==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-aio-multithread /aio/multi/lifecycle
PASS 2 test-aio-multithread /aio/multi/schedule
PASS 12 fdc-test /x86_64/fdc/read_no_dma_19
PASS 13 fdc-test /x86_64/fdc/fuzz-registers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img 
tests/qtest/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="ide-test" 
PASS 3 test-aio-multithread /aio/multi/mutex/contended
==6295==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 ide-test /x86_64/ide/identify
==6306==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 2 ide-test /x86_64/ide/flush
==6312==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 3 ide-test /x86_64/ide/bmdma/simple_rw
==6318==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 test-aio-multithread /aio/multi/mutex/handoff
PASS 4 ide-test /x86_64/ide/bmdma/trim
==6329==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 5 test-aio-multithread /aio/multi/mutex/mcs
PASS 6 test-aio-multithread /aio/multi/mutex/pthread
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-throttle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-throttle" 
==6341==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-throttle /throttle/leak_bucket
PASS 2 test-throttle /throttle/compute_wait
PASS 3 test-throttle /throttle/init
---
PASS 14 test-throttle /throttle/config/max
PASS 15 test-throttle /throttle/config/iops_size
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-thread-pool -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-thread-pool" 
==6345==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-thread-pool /thread-pool/submit
PASS 2 test-thread-pool /thread-pool/submit-aio
PASS 3 test-thread-pool /thread-pool/submit-co
---
PASS 5 test-thread-pool /thread-pool/cancel
PASS 6 test-thread-pool /thread-pool/cancel-async
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-hbitmap -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl

Re: [PATCH 03/55] qdev: New qdev_new(), qdev_realize(), etc.

2020-05-19 Thread Markus Armbruster

Alistair Francis  writes:

> On Tue, May 19, 2020 at 8:11 AM Markus Armbruster  wrote:
>>
>> We commonly plug devices into their bus right when we create them,
>> like this:
>>
>> dev = qdev_create(bus, type_name);
>>
>> Note that @dev is a weak reference.  The reference from @bus to @dev
>> is the only strong one.
>>
>> We realize at some later time, either with
>>
>> object_property_set_bool(OBJECT(dev), true, "realized", errp);
>>
>> or its convenience wrapper
>>
>> qdev_init_nofail(dev);
>>
>> If @dev still has no QOM parent then, realizing makes the
>> /machine/unattached/ orphanage its QOM parent.
>>
>> Note that the device returned by qdev_create() is plugged into a bus,
>> but doesn't have a QOM parent, yet.  Until it acquires one,
>> unrealizing the bus will hang in bus_unparent():
>>
>> while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
>> DeviceState *dev = kid->child;
>> object_unparent(OBJECT(dev));
>> }
>>
>> object_unparent() does nothing when its argument has no QOM parent,
>> and the loop spins forever.
>>
>> Device state "no QOM parent, but plugged into bus" is dangerous.
>>
>> Paolo suggested to delay plugging into the bus until realize.  We need
>> to plug into the parent bus before we call the device's realize
>> method, in case it uses the parent bus.  So the dangerous state still
>> exists, but only within realization, where we can manage it safely.
>>
>> This commit creates infrastructure to do this:
>>
>> dev = qdev_new(type_name);
>> ...
>> qdev_realize_and_unref(dev, bus, errp)
>>
>> Note that @dev becomes a strong reference here.
>> qdev_realize_and_unref() drops it.  There is also plain
>> qdev_realize(), which doesn't drop it.
>>
>> The remainder of this series will convert all users to this new
>> interface.
>>
>> Cc: Michael S. Tsirkin 
>> Cc: Marcel Apfelbaum 
>> Cc: Alistair Francis 
>> Cc: Gerd Hoffmann 
>> Cc: Mark Cave-Ayland 
>> Cc: David Gibson 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  include/hw/qdev-core.h | 11 -
>>  hw/core/bus.c  | 14 +++
>>  hw/core/qdev.c | 94 ++
>>  3 files changed, 118 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
>> index b870b27966..fba29308f7 100644
>> --- a/include/hw/qdev-core.h
>> +++ b/include/hw/qdev-core.h
>> @@ -57,7 +57,7 @@ typedef void (*BusUnrealize)(BusState *bus);
>>   * After successful realization, setting static properties will fail.
>>   *
>>   * As an interim step, the #DeviceState:realized property can also be
>> - * set with qdev_init_nofail().
>> + * set with qdev_realize() or qdev_init_nofail().
>>   * In the future, devices will propagate this state change to their children
>>   * and along busses they expose.
>>   * The point in time will be deferred to machine creation, so that values
>> @@ -322,7 +322,13 @@ compat_props_add(GPtrArray *arr,
>>
>>  DeviceState *qdev_create(BusState *bus, const char *name);
>>  DeviceState *qdev_try_create(BusState *bus, const char *name);
>> +DeviceState *qdev_new(const char *name);
>> +DeviceState *qdev_try_new(const char *name);
>>  void qdev_init_nofail(DeviceState *dev);
>> +bool qdev_realize(DeviceState *dev, BusState *bus, Error **errp);
>> +bool qdev_realize_and_unref(DeviceState *dev, BusState *bus, Error **errp);
>> +void qdev_unrealize(DeviceState *dev);
>> +
>>  void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
>>   int required_for_version);
>>  HotplugHandler *qdev_get_bus_hotplug_handler(DeviceState *dev);
>> @@ -411,6 +417,9 @@ typedef int (qdev_walkerfn)(DeviceState *dev, void *opaque);
>>  void qbus_create_inplace(void *bus, size_t size, const char *typename,
>>   DeviceState *parent, const char *name);
>>  BusState *qbus_create(const char *typename, DeviceState *parent, const char *name);
>> +bool qbus_realize(BusState *bus, Error **errp);
>> +void qbus_unrealize(BusState *bus);
>> +
>>  /* Returns > 0 if either devfn or busfn skip walk somewhere in cursion,
>>   * < 0 if either devfn or busfn terminate walk somewhere in cursion,
>>   *   0 otherwise. */
>> diff --git a/hw/core/bus.c b/hw/core/bus.c
>> index 08c5eab24a..bf622604a3 100644
>> --- a/hw/core/bus.c
>> +++ b/hw/core/bus.c
>> @@ -169,6 +169,20 @@ BusState *qbus_create(const char *typename, DeviceState *parent, const char *nam
>>  return bus;
>>  }
>>
>> +bool qbus_realize(BusState *bus, Error **errp)
>> +{
>> +Error *err = NULL;
>> +
>> +object_property_set_bool(OBJECT(bus), true, "realized", &err);
>> +error_propagate(errp, err);
>> +return !err;
>> +}
>> +
>> +void qbus_unrealize(BusState *bus)
>> +{
>> +object_property_set_bool(OBJECT(bus), true, "realized", &error_abort);
>
> Not false?
>
> Alistair

Reasons it's &error_abort:

1. PATCH 06 and 07 transform variations of

  object_property_set_bool(..., false, "realized",

Re: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

2020-05-19 Thread Li Feng

Hi, is there any update on this issue?

Thanks,
Feng Li

On Thu, May 14, 2020 at 11:49 PM Li Feng  wrote:
>
> On Thu, May 14, 2020 at 11:31 PM Dr. David Alan Gilbert  wrote:
> >
> > * Li Feng (fen...@smartx.com) wrote:
> > > On Thu, May 14, 2020 at 11:16 PM Dr. David Alan Gilbert  wrote:
> > > >
> > > > * Li Feng (fen...@smartx.com) wrote:
> > > > > ESXi CPU is: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
> > > > > This is my vm, I run qemu in it.
> > > >
> > > > Do you know what the real hardware is?
> > > What information do you need? I could send it out.
> > > The ESXi version: VMware ESXi, 6.5.0, 5969303
> >
> > VMWare is saying to the guest it's an E5-2640 v3; is that what
> > your real CPU is?
>
> Yes, I confirm that the real CPU is indeed this version and VMWare is right.
>
> >
> > Dave
> >
> > > >
> > > > Dave
> > > >
> > > > > (base) 20-05-14 15:32:50 root@31_216:~  lscpu
> > > > > Architecture:  x86_64
> > > > > CPU op-mode(s):32-bit, 64-bit
> > > > > Byte Order:Little Endian
> > > > > CPU(s):16
> > > > > On-line CPU(s) list:   0-15
> > > > > Thread(s) per core:1
> > > > > Core(s) per socket:1
> > > > > Socket(s): 16
> > > > > NUMA node(s):  1
> > > > > Vendor ID: GenuineIntel
> > > > > CPU family:6
> > > > > Model: 63
> > > > > Model name:Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
> > > > > Stepping:  2
> > > > > CPU MHz:   2599.998
> > > > > BogoMIPS:  5199.99
> > > > > Virtualization:VT-x
> > > > > Hypervisor vendor: VMware
> > > > > Virtualization type:   full
> > > > > L1d cache: 32K
> > > > > L1i cache: 32K
> > > > > L2 cache:  256K
> > > > > L3 cache:  20480K
> > > > > NUMA node0 CPU(s): 0-15
> > > > > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
> > > > > mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall
> > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology
> > > > > tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid
> > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx
> > > > > f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single
> > > > > tpr_shadow vnmi ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2
> > > > > invpcid xsaveopt arat
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Feng Li
> > > > >
> > > > > On Thu, May 14, 2020 at 8:52 PM Dr. David Alan Gilbert  wrote:
> > > > > >
> > > > > > * Philippe Mathieu-Daudé (phi...@redhat.com) wrote:
> > > > > > > Cc'ing David/Paolo in case they have a clue...
> > > > > > >
> > > > > > > On 5/14/20 1:27 PM, Li Feng wrote:
> > > > > > > > Dear all,
> > > > > > > >
> > > > > > > > I have encountered a weird crash.
> > > > > > > > It worked well a few days ago; since then I have rebased my code
> > > > > > > > on upstream.
> > > > > > > >
> > > > > > > > This is the command:
> > > > > > > > /root/qemu-master/x86_64-softmmu/qemu-system-x86_64 -enable-kvm
> > > > > > > > -device virtio-balloon -cpu host -smp 4 -m 2G -drive
> > > > > > > > file=/root/html/fedora-10g.img,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk1
> > > > > > > > -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1
> > > > > > > > -device virtio-net,netdev=nw1,mac=00:11:22:EE:EE:10 -netdev
> > > > > > > > tap,id=nw1,script=no,downscript=no,ifname=tap0 -serial mon:stdio
> > > > > > > > -nographic -object
> > > > > > > > memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on
> > > > > > > > -numa node,memdev=mem0 -vnc 0.0.0.0:100 -machine usb=on,nvdimm -device
> > > > > > > > usb-tablet -monitor unix:///tmp/a.socket,server,nowait -device
> > > > > > > > virtio-serial-pci,id=virtio-serial0,max_ports=16 -chardev
> > > > > > > > socket,id=channel1,path=/tmp/helloworld1,server,nowait -device
> > > > > > > > virtserialport,chardev=channel1,name=com.redhat.rhevm.vdsm1,bus=virtio-serial0.0,id=port1
> > > > > > > > -qmp tcp:0.0.0.0:2234,server,nowait
> > > > > > > > qemu-system-x86_64: error: failed to set MSR 0x48f to 0x7fefff00036dfb
> > > > > > > > qemu-system-x86_64: /root/qemu-master/target/i386/kvm.c:2695:
> > > > > > > > kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
> > > > > >
> > > > > > 48f is MSR_IA32_VMX_TRUE_EXIT_CTLS
> > > > > > I've not got a note of seeing that one before.
> > > > > >
> > > > > > > > This is the commit record:
> > > > > > > > *   c88f1ffc19 - (origin/master, origin/HEAD) Merge remote-tracking
> > > > > > > > branch 'remotes/kevin/tags/for-upstream' into staging (3 days ago)
> > > > > > > > |\
> > > > > > > > | * 47e0b38a13 - block: Drop unused .bdrv_has_zero_init_truncate (3
> > > > > > > > days ago)
> > > > > > > > | * dbc636e791 - vhdx: Rework truncation logic (3

Re: [PATCH V2] Add a new PIIX option to control global PCI hot-plugging

2020-05-19 Thread Ani Sinha

@igor Did you get a chance to look?
On May 15, 2020, 22:57 +0530, Ani Sinha , wrote:
> A new option "acpi-pci-hotplug" is introduced for PIIX which will
> globally disable hot-plugging of both hot plugged and
> cold plugged PCI devices. This will prevent
> hot-plugging and hot un-plugging of devices from within Windows based
> guests using system tray.
>
> The patch has been tested on Windows 2016.
>
> Signed-off-by: Ani Sinha 
> ---
> hw/acpi/piix4.c | 18 --
> hw/i386/acpi-build.c | 46 ++
> 2 files changed, 42 insertions(+), 22 deletions(-)
>
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index 964d6f5..91b7e86 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -78,6 +78,7 @@ typedef struct PIIX4PMState {
>
> AcpiPciHpState acpi_pci_hotplug;
> bool use_acpi_pci_hotplug;
> + bool use_acpi_hotplug_bridge;
>
> uint8_t disable_s3;
> uint8_t disable_s4;
> @@ -207,13 +208,13 @@ static const VMStateDescription vmstate_pci_status = {
> static bool vmstate_test_use_acpi_pci_hotplug(void *opaque, int version_id)
> {
> PIIX4PMState *s = opaque;
> - return s->use_acpi_pci_hotplug;
> + return s->use_acpi_hotplug_bridge;
> }
>
> static bool vmstate_test_no_use_acpi_pci_hotplug(void *opaque, int version_id)
> {
> PIIX4PMState *s = opaque;
> - return !s->use_acpi_pci_hotplug;
> + return !s->use_acpi_hotplug_bridge;
> }
>
> static bool vmstate_test_use_memhp(void *opaque)
> @@ -505,7 +506,10 @@ static void piix4_pm_realize(PCIDevice *dev, Error 
> **errp)
>
> piix4_acpi_system_hot_add_init(pci_address_space_io(dev),
> pci_get_bus(dev), s);
> - qbus_set_hotplug_handler(BUS(pci_get_bus(dev)), OBJECT(s), &error_abort);
> + if (s->use_acpi_pci_hotplug) {
> + qbus_set_hotplug_handler(BUS(pci_get_bus(dev)),
> + OBJECT(s), &error_abort);
> + }
>
> piix4_pm_add_propeties(s);
> }
> @@ -528,7 +532,7 @@ I2CBus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t 
> smb_io_base,
> s->smi_irq = smi_irq;
> s->smm_enabled = smm_enabled;
> if (xen_enabled()) {
> - s->use_acpi_pci_hotplug = false;
> + s->use_acpi_hotplug_bridge = false;
> }
>
> qdev_init_nofail(dev);
> @@ -593,7 +597,7 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion 
> *parent,
> memory_region_add_subregion(parent, GPE_BASE, &s->io_gpe);
>
> acpi_pcihp_init(OBJECT(s), &s->acpi_pci_hotplug, bus, parent,
> - s->use_acpi_pci_hotplug);
> + s->use_acpi_hotplug_bridge);
>
> s->cpu_hotplug_legacy = true;
> object_property_add_bool(OBJECT(s), "cpu-hotplug-legacy",
> @@ -631,8 +635,10 @@ static Property piix4_pm_properties[] = {
> DEFINE_PROP_UINT8(ACPI_PM_PROP_S3_DISABLED, PIIX4PMState, disable_s3, 0),
> DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_DISABLED, PIIX4PMState, disable_s4, 0),
> DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VAL, PIIX4PMState, s4_val, 2),
> - DEFINE_PROP_BOOL("acpi-pci-hotplug-with-bridge-support", PIIX4PMState,
> + DEFINE_PROP_BOOL("acpi-pci-hotplug", PIIX4PMState,
> use_acpi_pci_hotplug, true),
> + DEFINE_PROP_BOOL("acpi-pci-hotplug-with-bridge-support", PIIX4PMState,
> + use_acpi_hotplug_bridge, true),
> DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
> acpi_memory_hotplug.is_enabled, true),
> DEFINE_PROP_END_OF_LIST(),
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 2e15f68..1737378 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> bool s3_disabled;
> bool s4_disabled;
> bool pcihp_bridge_en;
> + bool acpi_pcihp_en;
> uint8_t s4_val;
> AcpiFadtData fadt;
> uint16_t cpu_hp_io_base;
> @@ -246,6 +247,9 @@ static void acpi_get_pm_info(MachineState *machine, 
> AcpiPmInfo *pm)
> pm->pcihp_bridge_en =
> object_property_get_bool(obj, "acpi-pci-hotplug-with-bridge-support",
> NULL);
> + pm->acpi_pcihp_en =
> + object_property_get_bool(obj, "acpi-pci-hotplug",
> + NULL);
> }
>
> static void acpi_get_misc_info(AcpiMiscInfo *info)
> @@ -457,7 +461,8 @@ static void build_append_pcihp_notify_entry(Aml *method, 
> int slot)
> }
>
> static void build_append_pci_bus_devices(Aml *parent_scope, PCIBus *bus,
> - bool pcihp_bridge_en)
> + bool pcihp_bridge_en,
> + bool acpi_pcihp_en)
> {
> Aml *dev, *notify_method = NULL, *method;
> QObject *bsel;
> @@ -481,18 +486,25 @@ static void build_append_pci_bus_devices(Aml 
> *parent_scope, PCIBus *bus,
> bool bridge_in_acpi;
>
> if (!pdev) {
> - if (bsel) { /* add hotplug slots for non present devices */
> - dev = aml_device("S%.02X", PCI_DEVFN(slot, 0));
> - aml_append(dev, aml_name_decl("_SUN", aml_int(slot)));
> - aml_append(dev, aml_name_decl("_ADR", aml_int(slot << 16)));
> - method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> - aml_append(method,
> - aml_call2("PCEJ", aml_name("BSEL"), aml_name("_SUN"))
> - );
> - aml_append(dev, method);
> - aml_append(parent_scope, dev);
> -
> - build_append_pcihp_notify_entry(notify_method, slot);
> + if (bsel) {
> + /*
> + * add hotplug slots for non present devices when
> + * acpi hotplug is enabled.
> + */
> + if (acpi_pcihp_en) {
> + dev =

Re: [PATCH QEMU v22 09/18] vfio: Add save state functions to SaveVMHandlers

2020-05-19 Thread Yan Zhao

On Mon, May 18, 2020 at 11:43:09AM +0530, Kirti Wankhede wrote:

<...>
> +
> +static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
> +{
> +VFIOMigration *migration = vbasedev->migration;
> +VFIORegion *region = &migration->region;
> +uint64_t data_offset = 0, data_size = 0;
> +int ret;
> +
> +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
> +region->fd_offset + offsetof(struct 
> vfio_device_migration_info,
> + data_offset));
> +if (ret != sizeof(data_offset)) {
> +error_report("%s: Failed to get migration buffer data offset %d",
> + vbasedev->name, ret);
> +return -EINVAL;
> +}
> +
> +ret = pread(vbasedev->fd, &data_size, sizeof(data_size),
> +region->fd_offset + offsetof(struct 
> vfio_device_migration_info,
> + data_size));
> +if (ret != sizeof(data_size)) {
> +error_report("%s: Failed to get migration buffer data size %d",
> + vbasedev->name, ret);
> +return -EINVAL;
> +}
> +
> +if (data_size > 0) {
> +void *buf = NULL;
> +bool buffer_mmaped;
> +
> +if (region->mmaps) {
> +buf = find_data_region(region, data_offset, data_size);
> +}
> +
> +buffer_mmaped = (buf != NULL);
> +
> +if (!buffer_mmaped) {
> +buf = g_try_malloc(data_size);
> +if (!buf) {
> +error_report("%s: Error allocating buffer ", __func__);
> +return -ENOMEM;
> +}
> +
> +ret = pread(vbasedev->fd, buf, data_size,
> +region->fd_offset + data_offset);
> +if (ret != data_size) {
> +error_report("%s: Failed to get migration data %d",
> + vbasedev->name, ret);
> +g_free(buf);
> +return -EINVAL;
> +}
> +}
> +
> +qemu_put_be64(f, data_size);
> +qemu_put_buffer(f, buf, data_size);
> +
> +if (!buffer_mmaped) {
> +g_free(buf);
> +}
> +} else {
> +qemu_put_be64(f, data_size);
> +}
> +
> +trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
> +   migration->pending_bytes);
> +
> +ret = qemu_file_get_error(f);
> +if (ret) {
> +return ret;
> +}
> +
> +return data_size;
> +}
> +
> +static int vfio_update_pending(VFIODevice *vbasedev)
> +{
> +VFIOMigration *migration = vbasedev->migration;
> +VFIORegion *region = &migration->region;
> +uint64_t pending_bytes = 0;
> +int ret;
> +
> +ret = pread(vbasedev->fd, &pending_bytes, sizeof(pending_bytes),
> +region->fd_offset + offsetof(struct 
> vfio_device_migration_info,
> + pending_bytes));
> +if ((ret < 0) || (ret != sizeof(pending_bytes))) {
> +error_report("%s: Failed to get pending bytes %d",
> + vbasedev->name, ret);
> +migration->pending_bytes = 0;
> +return (ret < 0) ? ret : -EINVAL;
> +}
> +
> +migration->pending_bytes = pending_bytes;
> +trace_vfio_update_pending(vbasedev->name, pending_bytes);
> +return 0;
> +}
> +
<...>
>  
> +static void vfio_save_pending(QEMUFile *f, void *opaque,
> +  uint64_t threshold_size,
> +  uint64_t *res_precopy_only,
> +  uint64_t *res_compatible,
> +  uint64_t *res_postcopy_only)
> +{
> +VFIODevice *vbasedev = opaque;
> +VFIOMigration *migration = vbasedev->migration;
> +int ret;
> +
> +ret = vfio_update_pending(vbasedev);
> +if (ret) {
> +return;
> +}
> +
> +*res_precopy_only += migration->pending_bytes;
> +
> +trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
> +*res_postcopy_only, *res_compatible);
> +}
> +
> +static int vfio_save_iterate(QEMUFile *f, void *opaque)
> +{
> +VFIODevice *vbasedev = opaque;
> +VFIOMigration *migration = vbasedev->migration;
> +int ret, data_size;
> +
> +qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
> +
Hi Kirti,
It seems you also didn't address my previous comments:
https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg02795.html
https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg02796.html


> +if (migration->pending_bytes == 0) {
> +ret = vfio_update_pending(vbasedev);
Repeatedly fetching pending_bytes here means the vmstates following vfio-pci
never get a chance to be called.

Thanks
Yan

> +if (ret) {
> +return ret;
> +}
> +
> +if (migration->pending_bytes == 0) {
> +/* indicates data finished, goto complete phase */
> +return 1;
> +}
> +}
> +
> +data_size = vfio_save_buffer(f, vbasedev);
> +
> +if

Re: [PATCH Kernel v22 0/8] Add UAPIs to support migration for VFIO devices

2020-05-19 Thread Yan Zhao

On Tue, May 19, 2020 at 10:58:04AM -0600, Alex Williamson wrote:
> Hi folks,
> 
> My impression is that we're getting pretty close to a workable
> implementation here with v22 plus respins of patches 5, 6, and 8.  We
> also have a matching QEMU series and a proposal for a new i40e
> consumer, as well as I assume GVT-g updates happening internally at
> Intel.  I expect all of the latter needs further review and discussion,
> but we should be at the point where we can validate these proposed
> kernel interfaces.  Therefore I'd like to make a call for reviews so
> that we can get this wrapped up for the v5.8 merge window.  I know
> Connie has some outstanding documentation comments and I'd like to make
> sure everyone has an opportunity to check that their comments have been
> addressed and we don't discover any new blocking issues.  Please send
> your Acked-by/Reviewed-by/Tested-by tags if you're satisfied with this
> interface and implementation.  Thanks!
> 
Hi Alex and Kirti,
After porting to QEMU v22 and kernel v22, I found that it cannot pass even a
basic live migration test; it fails with an error like

"Failed to get dirty bitmap for iova: 0xca000 size: 0x3000 err: 22"

Thanks
Yan

> 
> On Mon, 18 May 2020 11:26:29 +0530
> Kirti Wankhede  wrote:
> 
> > Hi,
> > 
> > This patch set adds:
> > * IOCTL VFIO_IOMMU_DIRTY_PAGES to get dirty pages bitmap with
> >   respect to IOMMU container rather than per device. All pages pinned by
> >   vendor driver through vfio_pin_pages external API has to be marked as
> >   dirty during  migration. When IOMMU capable device is present in the
> >   container and all pages are pinned and mapped, then all pages are marked
> >   dirty.
> >   When there are CPU writes, CPU dirty page tracking can identify dirtied
> >   pages, but any page pinned by vendor driver can also be written by
> >   device. As of now there is no device which has hardware support for
> >   dirty page tracking. So all pages which are pinned should be considered
> >   as dirty.
> >   This ioctl is also used to start/stop dirty pages tracking for pinned and
> >   unpinned pages while migration is active.
> > 
> > * Updated IOCTL VFIO_IOMMU_UNMAP_DMA to get dirty pages bitmap before
> >   unmapping IO virtual address range.
> >   With vIOMMU, during pre-copy phase of migration, while CPUs are still
> >   running, IO virtual address unmap can happen while device still keeping
> >   reference of guest pfns. Those pages should be reported as dirty before
> >   unmap, so that VFIO user space application can copy content of those
> >   pages from source to destination.
> > 
> > * Patch 8 detects whether an IOMMU-capable device driver is smart enough
> >   to report pages to be marked dirty by pinning pages using the
> >   vfio_pin_pages() API.
> > 
> > 
> > Yet TODO:
> > Since there is no device which has hardware support for system memory
> > dirty bitmap tracking, right now there is no other API from vendor driver
> > to VFIO IOMMU module to report dirty pages. In future, when such hardware
> > support will be implemented, an API will be required such that vendor
> > driver could report dirty pages to VFIO module during migration phases.
> > 
> > Adding revision history from previous QEMU patch set to understand KABI
> > changes done till now
> > 
> > v21 -> v22
> > - Fixed issue raised by Alex :
> > https://lore.kernel.org/kvm/20200515163307.72951...@w520.home/
> > 
> > v20 -> v21
> > - Added checkin for GET_BITMAP ioctl for vfio_dma boundaries.
> > - Updated unmap ioctl function - as suggested by Alex.
> > - Updated comments in DIRTY_TRACKING ioctl definition - as suggested by
> >   Cornelia.
> > 
> > v19 -> v20
> > - Fixed ioctl to get dirty bitmap to get bitmap of multiple vfio_dmas
> > - Fixed unmap ioctl to get dirty bitmap of multiple vfio_dmas.
> > - Removed flag definition from migration capability.
> > 
> > v18 -> v19
> > - Updated migration capability with supported page sizes bitmap for dirty
> >   page tracking and  maximum bitmap size supported by kernel module.
> > - Added patch to calculate and cache pgsize_bitmap when iommu->domain_list
> >   is updated.
> > - Removed extra buffers added in previous version for bitmap manipulation
> >   and optimised the code.
> > 
> > v17 -> v18
> > - Add migration capability to the capability chain for VFIO_IOMMU_GET_INFO
> >   ioctl
> > - Updated UNMAP_DMA ioctl to return bitmap of multiple vfio_dma
> > 
> > v16 -> v17
> > - Fixed errors reported by kbuild test robot  on i386
> > 
> > v15 -> v16
> > - Minor edits and nit picks (Auger Eric)
> > - On copying bitmap to user, re-populated bitmap only for pinned pages,
> >   excluding unmapped pages and CPU dirtied pages.
> > - Patches are on tag: next-20200318 and 1-3 patches from Yan's series
> >   https://lkml.org/lkml/2020/3/12/1255
> > 
> > v14 -> v15
> > - Minor edits and nit picks.
> > - In the verification of user allocated bitmap memory, added check of
> >maximum size.
> > - Patches are on tag: next-20200318 and 1-3 patches

Re: [PATCH 0/7] Latest COLO tree queued patches

2020-05-19 Thread Jason Wang




On 2020/5/20 4:02 AM, Zhang Chen wrote:

From: Zhang Chen 

Hi Jason, this series includes the latest COLO-related patches.
I have finished basic testing and review.
If there are no other comments, please check and merge this series.



Applied.

Thanks




Derek Su (1):
   colo-compare: Fix memory leak in packet_enqueue()

Lukas Straub (6):
   net/colo-compare.c: Create event_bh with the right AioContext
   chardev/char.c: Use qemu_co_sleep_ns if in coroutine
   net/colo-compare.c: Fix deadlock in compare_chr_send
   net/colo-compare.c: Only hexdump packets if tracing is enabled
   net/colo-compare.c, softmmu/vl.c: Check that colo-compare is active
   net/colo-compare.c: Correct ordering in complete and finalize

  chardev/char.c |   7 +-
  net/colo-compare.c | 277 +
  net/colo-compare.h |   1 +
  net/colo.c |   7 ++
  net/colo.h |   1 +
  net/trace-events   |   1 +
  softmmu/vl.c   |   2 +
  7 files changed, 225 insertions(+), 71 deletions(-)

Re: [RFC PATCH 1/8] riscv: Add RV64I instructions description

2020-05-19 Thread LIU Zhiwei





On 2020/5/12 0:39, Richard Henderson wrote:

On 4/30/20 12:21 AM, LIU Zhiwei wrote:

+LUI RISCV imm:20 rd:5 0110111 \
+!constraints { $rd != 2 && $rd != 3 && $rd != 4 }

I think it would be helpful to add a function for this.  e.g. greg($rd) and
gbase($rs1) (including $0).  It would keep the constraints smaller, and avoid
mistakes.

These functions would go into risugen_riscv.pm.

Good idea. I will adopt it in the next patch set.
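
The suggested helpers could look roughly like the sketch below. It is written in C only to illustrate the predicate logic; the real helpers would be Perl subs in risugen_riscv.pm. The exact meaning of gbase() (assumed here to exclude x0 as well as sp, gp and tp) is a guess based on the "(including $0)" remark, not something the patch defines.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: true if the register may be used as a destination,
 * i.e. it is not sp (x2), gp (x3) or tp (x4), which risu does not control.
 */
static bool greg(int reg)
{
    return reg != 2 && reg != 3 && reg != 4;
}

/*
 * Hypothetical helper: true if the register may be used as a base register.
 * Assumption: x0 is excluded in addition to sp, gp and tp.
 */
static bool gbase(int reg)
{
    return reg != 0 && greg(reg);
}
```

With helpers like these, a constraint such as `$rd != 2 && $rd != 3 && $rd != 4` would shrink to a single call.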

+ADDI RISCV imm:12 rs1:5 000 rd:5 0010011 \
+!constraints { $rd != 2 && $rd != 3 && $rd != 4 && $rs1 != 2 }

Since all of sp, gp, tp are not in risu's control, why is rs1 only excluding
sp, and not gp and tp as well?
When I tested the patch set, I found that gp and tp are the same in slave
and master, so they can be used as source registers.

I will check this again when testing the next patch set.

Zhiwei


r~

Re: [RFC PATCH 2/8] riscv: Generate payload scripts

2020-05-19 Thread LIU Zhiwei


On 2020/5/12 1:40, Richard Henderson wrote:

On 4/30/20 12:21 AM, LIU Zhiwei wrote:

+# sequence of li rd, 0x1234567887654321
+#
+#  0:   002471b7    lui     rd,0x247
+#  4:   8ad1819b    addiw   rd,rd,-1875
+#  8:   00c19193    slli    rd,rd,0xc
+#  c:   f1118193    addi    rd,rd,-239 # 0x246f11
+# 10:   00d19193    slli    rd,rd,0xd
+# 14:   d9518193    addi    rd,rd,-619
+# 18:   00e19193    slli    rd,rd,0xe
+# 1c:   32118193    addi    rd,rd,801

You don't really need to use addiw.  Removing that special case would really
simplify this.

I think I don't get it. Do you mean that the immediate will not be 64 bit?
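
To make the quoted sequence easier to follow, here is a small sketch in C that models the four instructions with their RV64 semantics and replays the eight steps. The helper names (rv_lui, rv_addiw, rv_addi, rv_slli, li_sequence) are made up for this illustration and do not come from the patch.

```c
#include <stdint.h>

/* lui: place the 20-bit immediate in bits 31..12, sign-extend to 64 bits. */
static uint64_t rv_lui(uint32_t imm20)
{
    return (uint64_t)(int64_t)(int32_t)(imm20 << 12);
}

/* addiw: 32-bit addition, result sign-extended to 64 bits. */
static uint64_t rv_addiw(uint64_t rs, int32_t imm)
{
    return (uint64_t)(int64_t)(int32_t)((uint32_t)rs + (uint32_t)imm);
}

/* addi: 64-bit addition of a sign-extended 12-bit immediate. */
static uint64_t rv_addi(uint64_t rs, int32_t imm)
{
    return rs + (uint64_t)(int64_t)imm;
}

/* slli: logical left shift. */
static uint64_t rv_slli(uint64_t rs, unsigned shamt)
{
    return rs << shamt;
}

/* Replays the sequence quoted above, one call per instruction. */
static uint64_t li_sequence(void)
{
    uint64_t rd = rv_lui(0x247);    /* 0x247000 */
    rd = rv_addiw(rd, -1875);       /* 0x2468ad */
    rd = rv_slli(rd, 12);
    rd = rv_addi(rd, -239);
    rd = rv_slli(rd, 13);
    rd = rv_addi(rd, -619);
    rd = rv_slli(rd, 14);
    rd = rv_addi(rd, 801);
    return rd;                      /* 0x1234567887654321 */
}
```

For this particular constant the value after the second step is 0x2468ad, which fits in 31 bits, so a plain addi in place of addiw would give the same result here.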

+sub write_memblock_setup()
+{
+# Write code which sets up the memory block for loads and stores.
+# We set r0 to point to a block of 16K length
+# of random data, aligned to the maximum desired alignment.
+
+my $align = $MAXALIGN;
+my $datalen = 16384 + $align;

risu.h:#define MEMBLOCKLEN 8192

Why are you using 16384?

It's a bug.

At one point I thought I should make it bigger to support vector in the
future. Even so, 8K bytes is enough, as the largest number of bytes a single
instruction operates on is LMUL * RV_VLEN_MAX / 8 = 512 bytes.
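
The arithmetic can be checked with a short sketch in C. MEMBLOCKLEN is the value quoted from risu.h above; the values LMUL = 8 and RV_VLEN_MAX = 512 bits are assumptions chosen because they reproduce the stated 512-byte result, and the function name is made up for this example.

```c
#include <stddef.h>

#define MEMBLOCKLEN 8192   /* from risu.h, as quoted above */
#define LMUL_MAX    8      /* assumption: largest register group multiplier */
#define RV_VLEN_MAX 512    /* assumption: maximum vector length in bits */

/* Largest number of bytes one vector instruction could touch. */
static size_t max_vector_access_bytes(void)
{
    return (size_t)LMUL_MAX * RV_VLEN_MAX / 8;
}
```

Under these assumptions the result is 512 bytes, well below the 8192-byte block.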

Zhiwei


Also, typo -- you're setting r10 not r0, obviously.

The rest looks fine.


r~

Re: [RFC PATCH 6/8] riscv: Add configure script

2020-05-19 Thread LIU Zhiwei





On 2020/5/20 9:45, LIU Zhiwei wrote:


On 2020/5/12 2:06, Richard Henderson wrote:

On 4/30/20 12:21 AM, LIU Zhiwei wrote:

+++ b/configure
@@ -58,6 +58,8 @@ guess_arch() {
  ARCH="m68k"
  elif check_define __powerpc64__ ; then
  ARCH="ppc64"
+    elif check_define __riscv ; then
+    ARCH="riscv64"
  else
  echo "This cpu is not supported by risu. Try -h. " >&2
  exit 1

Why "riscv64" and not "riscv"?

You can't really say more without checking __riscv_xlen.

Thanks for pointing it out. I will add support for RV32 in the next patch set.
On second thought, perhaps I will not support RV32, because I don't have RV32
hardware that can run Linux.


So the next patch set will focus on rv64gc and add more checks for
__riscv_xlen and __riscv_flen.


Zhiwei



Zhiwei


r~

[PATCH v4 2/5] target/i386: add fast short REP MOV support

2020-05-19 Thread Chenyi Qiang

For CPUs that support fast short REP MOV [CPUID.(EAX=7,ECX=0):EDX(bit4)], e.g.
Icelake and Tigerlake, expose it to the guest VM.

Signed-off-by: Chenyi Qiang 
---
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2b653a0161..52f5aa5418 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -984,7 +984,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
 NULL, NULL, "avx512-4vnniw", "avx512-4fmaps",
-NULL, NULL, NULL, NULL,
+"fsrm", NULL, NULL, NULL,
 NULL, NULL, "md-clear", NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL /* pconfig */, NULL,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 408392dbf6..142256017b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -772,6 +772,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EDX_AVX512_4VNNIW (1U << 2)
 /* AVX512 Multiply Accumulation Single Precision */
 #define CPUID_7_0_EDX_AVX512_4FMAPS (1U << 3)
+/* Fast Short Rep Mov */
+#define CPUID_7_0_EDX_FSRM  (1U << 4)
 /* Speculation Control */
 #define CPUID_7_0_EDX_SPEC_CTRL (1U << 26)
 /* Single Thread Indirect Branch Predictors */
-- 
2.17.1

[PATCH v4 1/5] target/i386: add missing vmx features for several CPU models

2020-05-19 Thread Chenyi Qiang

Add some missing VMX features in Skylake-Server, Cascadelake-Server and
Icelake-Server CPU models based on the output of Paolo's script.

Signed-off-by: Chenyi Qiang 
---
 target/i386/cpu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7a4a8e3847..2b653a0161 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2982,6 +2982,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
  VMX_SECONDARY_EXEC_RDRAND_EXITING | 
VMX_SECONDARY_EXEC_ENABLE_INVPCID |
  VMX_SECONDARY_EXEC_ENABLE_VMFUNC | VMX_SECONDARY_EXEC_SHADOW_VMCS 
|
  VMX_SECONDARY_EXEC_RDSEED_EXITING | VMX_SECONDARY_EXEC_ENABLE_PML,
+.features[FEAT_VMX_VMFUNC] = MSR_VMX_VMFUNC_EPT_SWITCHING,
 .xlevel = 0x8008,
 .model_id = "Intel Xeon Processor (Skylake)",
 .versions = (X86CPUVersionDefinition[]) {
@@ -3110,6 +3111,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
  VMX_SECONDARY_EXEC_RDRAND_EXITING | 
VMX_SECONDARY_EXEC_ENABLE_INVPCID |
  VMX_SECONDARY_EXEC_ENABLE_VMFUNC | VMX_SECONDARY_EXEC_SHADOW_VMCS 
|
  VMX_SECONDARY_EXEC_RDSEED_EXITING | VMX_SECONDARY_EXEC_ENABLE_PML,
+.features[FEAT_VMX_VMFUNC] = MSR_VMX_VMFUNC_EPT_SWITCHING,
 .xlevel = 0x8008,
 .model_id = "Intel Xeon Processor (Cascadelake)",
 .versions = (X86CPUVersionDefinition[]) {
@@ -3457,7 +3459,9 @@ static X86CPUDefinition builtin_x86_defs[] = {
  VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT |
  VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
  VMX_SECONDARY_EXEC_RDRAND_EXITING | 
VMX_SECONDARY_EXEC_ENABLE_INVPCID |
- VMX_SECONDARY_EXEC_ENABLE_VMFUNC | VMX_SECONDARY_EXEC_SHADOW_VMCS,
+ VMX_SECONDARY_EXEC_ENABLE_VMFUNC | VMX_SECONDARY_EXEC_SHADOW_VMCS 
|
+ VMX_SECONDARY_EXEC_RDSEED_EXITING | VMX_SECONDARY_EXEC_ENABLE_PML,
+.features[FEAT_VMX_VMFUNC] = MSR_VMX_VMFUNC_EPT_SWITCHING,
 .xlevel = 0x8008,
 .model_id = "Intel Xeon Processor (Icelake)",
 .versions = (X86CPUVersionDefinition[]) {
-- 
2.17.1

[PATCH v4 0/5] modify CPU model info

2020-05-19 Thread Chenyi Qiang

Add the missing VMX features in Skylake-Server, Cascadelake-Server and
Icelake-Server CPU models. The Icelake-Server CPU model also lacks sha-ni,
avx512ifma, rdpid and fsrm. The model number of Icelake-Server also needs
to be fixed.
Remove the Icelake-Client CPU model because there are no Icelake Desktop
products on the market.

Changes in v4:
- remove the Icelake-Client CPU model

Changes in v3:
- change the missing features of Icelake-Server from v3 to v4

Changes in v2:
- add missing features as a new version of CPU model
- add the support of FSRM
- add New CPUID of FSRM and RDPID in Icelake-Server CPU model

Chenyi Qiang (5):
  target/i386: add missing vmx features for several CPU models
  target/i386: add fast short REP MOV support
  target/i386: add the missing features for Icelake-Server CPU model
  target/i386: modify Icelake-Server CPU model number
  target/i386: remove Icelake-Client CPU model

 hw/i386/pc.c  |   1 -
 target/i386/cpu.c | 133 ++
 target/i386/cpu.h |   2 +
 3 files changed, 19 insertions(+), 117 deletions(-)

-- 
2.17.1

[PATCH v4 5/5] target/i386: remove Icelake-Client CPU model

2020-05-19 Thread Chenyi Qiang

There are no Icelake Desktop products on the market. Remove the
Icelake-Client CPU model.

Signed-off-by: Chenyi Qiang 
---
 hw/i386/pc.c  |   1 -
 target/i386/cpu.c | 113 --
 2 files changed, 114 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2128f3d6fe..ecc5ab022b 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -127,7 +127,6 @@ GlobalProperty pc_compat_3_1[] = {
 { "Skylake-Server" "-" TYPE_X86_CPU,  "mpx", "on" },
 { "Skylake-Server-IBRS" "-" TYPE_X86_CPU, "mpx", "on" },
 { "Cascadelake-Server" "-" TYPE_X86_CPU,  "mpx", "on" },
-{ "Icelake-Client" "-" TYPE_X86_CPU,  "mpx", "on" },
 { "Icelake-Server" "-" TYPE_X86_CPU,  "mpx", "on" },
 { "Cascadelake-Server" "-" TYPE_X86_CPU, "stepping", "5" },
 { TYPE_X86_CPU, "x-intel-pt-auto-level", "off" },
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d59698710e..33c0fdc23f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3246,119 +3246,6 @@ static X86CPUDefinition builtin_x86_defs[] = {
 .xlevel = 0x8008,
 .model_id = "Intel Xeon Processor (Cooperlake)",
 },
-{
-.name = "Icelake-Client",
-.level = 0xd,
-.vendor = CPUID_VENDOR_INTEL,
-.family = 6,
-.model = 126,
-.stepping = 0,
-.features[FEAT_1_EDX] =
-CPUID_VME | CPUID_SSE2 | CPUID_SSE | CPUID_FXSR | CPUID_MMX |
-CPUID_CLFLUSH | CPUID_PSE36 | CPUID_PAT | CPUID_CMOV | CPUID_MCA |
-CPUID_PGE | CPUID_MTRR | CPUID_SEP | CPUID_APIC | CPUID_CX8 |
-CPUID_MCE | CPUID_PAE | CPUID_MSR | CPUID_TSC | CPUID_PSE |
-CPUID_DE | CPUID_FP87,
-.features[FEAT_1_ECX] =
-CPUID_EXT_AVX | CPUID_EXT_XSAVE | CPUID_EXT_AES |
-CPUID_EXT_POPCNT | CPUID_EXT_X2APIC | CPUID_EXT_SSE42 |
-CPUID_EXT_SSE41 | CPUID_EXT_CX16 | CPUID_EXT_SSSE3 |
-CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSE3 |
-CPUID_EXT_TSC_DEADLINE_TIMER | CPUID_EXT_FMA | CPUID_EXT_MOVBE |
-CPUID_EXT_PCID | CPUID_EXT_F16C | CPUID_EXT_RDRAND,
-.features[FEAT_8000_0001_EDX] =
-CPUID_EXT2_LM | CPUID_EXT2_RDTSCP | CPUID_EXT2_NX |
-CPUID_EXT2_SYSCALL,
-.features[FEAT_8000_0001_ECX] =
-CPUID_EXT3_ABM | CPUID_EXT3_LAHF_LM | CPUID_EXT3_3DNOWPREFETCH,
-.features[FEAT_8000_0008_EBX] =
-CPUID_8000_0008_EBX_WBNOINVD,
-.features[FEAT_7_0_EBX] =
-CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 |
-CPUID_7_0_EBX_HLE | CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP |
-CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_INVPCID |
-CPUID_7_0_EBX_RTM | CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_ADX |
-CPUID_7_0_EBX_SMAP,
-.features[FEAT_7_0_ECX] =
-CPUID_7_0_ECX_AVX512_VBMI | CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU 
|
-CPUID_7_0_ECX_AVX512_VBMI2 | CPUID_7_0_ECX_GFNI |
-CPUID_7_0_ECX_VAES | CPUID_7_0_ECX_VPCLMULQDQ |
-CPUID_7_0_ECX_AVX512VNNI | CPUID_7_0_ECX_AVX512BITALG |
-CPUID_7_0_ECX_AVX512_VPOPCNTDQ,
-.features[FEAT_7_0_EDX] =
-CPUID_7_0_EDX_SPEC_CTRL | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
-/* Missing: XSAVES (not supported by some Linux versions,
-* including v4.1 to v4.12).
-* KVM doesn't yet expose any XSAVES state save component,
-* and the only one defined in Skylake (processor tracing)
-* probably will block migration anyway.
-*/
-.features[FEAT_XSAVE] =
-CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
-CPUID_XSAVE_XGETBV1,
-.features[FEAT_6_EAX] =
-CPUID_6_EAX_ARAT,
-/* Missing: Mode-based execute control (XS/XU), processor tracing, TSC 
scaling */
-.features[FEAT_VMX_BASIC] = MSR_VMX_BASIC_INS_OUTS |
- MSR_VMX_BASIC_TRUE_CTLS,
-.features[FEAT_VMX_ENTRY_CTLS] = VMX_VM_ENTRY_IA32E_MODE |
- VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | 
VMX_VM_ENTRY_LOAD_IA32_PAT |
- VMX_VM_ENTRY_LOAD_DEBUG_CONTROLS | VMX_VM_ENTRY_LOAD_IA32_EFER,
-.features[FEAT_VMX_EPT_VPID_CAPS] = MSR_VMX_EPT_EXECONLY |
- MSR_VMX_EPT_PAGE_WALK_LENGTH_4 | MSR_VMX_EPT_WB | MSR_VMX_EPT_2MB 
|
- MSR_VMX_EPT_1GB | MSR_VMX_EPT_INVEPT |
- MSR_VMX_EPT_INVEPT_SINGLE_CONTEXT | 
MSR_VMX_EPT_INVEPT_ALL_CONTEXT |
- MSR_VMX_EPT_INVVPID | MSR_VMX_EPT_INVVPID_SINGLE_ADDR |
- MSR_VMX_EPT_INVVPID_SINGLE_CONTEXT | 
MSR_VMX_EPT_INVVPID_ALL_CONTEXT |
- MSR_VMX_EPT_INVVPID_SINGLE_CONTEXT_NOGLOBALS | 
MSR_VMX_EPT_AD_BITS,
-.features[FEAT_VMX_EXIT_CTLS] =
- VMX_VM_EXIT_ACK_INTR_ON_EXIT | VMX_VM_EXIT_SAVE_DEBUG_CONTROLS |
- VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
- VMX_VM_EXIT_LOAD_IA32_PAT |

[PATCH v4 4/5] target/i386: modify Icelake-Server CPU model number

2020-05-19 Thread Chenyi Qiang

According to the Intel Icelake family list, Icelake-Server uses model
number 106(0x6A).

Signed-off-by: Chenyi Qiang 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b4697b0148..d59698710e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3364,7 +3364,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
 .level = 0xd,
 .vendor = CPUID_VENDOR_INTEL,
 .family = 6,
-.model = 134,
+.model = 106,
 .stepping = 0,
 .features[FEAT_1_EDX] =
 CPUID_VME | CPUID_SSE2 | CPUID_SSE | CPUID_FXSR | CPUID_MMX |
-- 
2.17.1
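
The relation between the decimal model number and the CPUID encoding can be sketched in C as follows. The decoding rule (extended model is prepended when the base family is 6 or 15) follows the standard CPUID leaf 1 EAX layout; the function names and the EAX values in the usage note are constructed for illustration, not taken from a hardware dump.

```c
#include <stdint.h>

/* Display family from CPUID.1:EAX: add the extended family only for 0xF. */
static unsigned cpuid_family(uint32_t eax)
{
    unsigned base = (eax >> 8) & 0xf;
    unsigned ext = (eax >> 20) & 0xff;

    return base == 0xf ? base + ext : base;
}

/* Display model from CPUID.1:EAX: prepend the extended model for 6 and 0xF. */
static unsigned cpuid_model(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned model = (eax >> 4) & 0xf;
    unsigned ext_model = (eax >> 16) & 0xf;

    if (base_family == 0x6 || base_family == 0xf) {
        model |= ext_model << 4;
    }
    return model;
}
```

For example, a constructed EAX of 0x000606A0 decodes to family 6, model 106 (0x6A), while 0x00080660 decodes to model 134 (0x86), the value being replaced.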

[PATCH v4 3/5] target/i386: add the missing features for Icelake-Server CPU model

2020-05-19 Thread Chenyi Qiang

Add the missing features(sha-ni, avx512ifma, rdpid, fsrm) in the
Icelake-Server CPU model.

Signed-off-by: Chenyi Qiang 
---
 target/i386/cpu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 52f5aa5418..b4697b0148 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3488,6 +3488,16 @@ static X86CPUDefinition builtin_x86_defs[] = {
 { /* end of list */ }
 },
 },
+{
+.version = 4,
+.props = (PropValue[]) {
+{ "sha-ni", "on" },
+{ "avx512ifma", "on" },
+{ "rdpid", "on" },
+{ "fsrm", "on" },
+{ /* end of list */ }
+},
+},
 { /* end of list */ }
 }
 },
-- 
2.17.1

[Bug 1879425] Re: The thread of "CPU 0 /KVM" keeping 99.9%CPU

2020-05-19 Thread cliff chen

One update:
The guest VM is Red Hat Enterprise Linux 8.1 (Ootpa).
There is no issue when the guest VM is RHEL 7.6.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1879425

Title:
  The thread of "CPU 0 /KVM" keeping 99.9%CPU

Status in QEMU:
  New

Bug description:
  Hi Expert:

  The VM hangs under qemu-kvm after running for 2, 3, or 5 hours (10 hours at 
the longest).
  Notes: 
  for VM:
OS: RHEL 7.6
CPU: 1
MEM:4G
  For qemu-kvm:
1) version:
   /usr/libexec/qemu-kvm -version
   QEMU emulator version 2.10.0(qemu-kvm-ev-2.10.0-21.el7_5.4.1)
2) once the issue has occurred, the CPU usage of the "CPU 0/KVM" thread is more 
than 99%, as shown by the command "top -p VM_pro_ID"
  PID      USER   PR NI RES   S  %CPU  %MEM  TIME+     COMMAND
  872067   qemu   20 0  1.6g  R  99.9  0.6   37:08.87  CPU 0/KVM
3) use "pstack 493307" and below is function trace
  Thread 1 (Thread 0x7f2572e73040 (LWP 872067)):
  #0  0x7f256cad8fcf in ppoll () from /lib64/libc.so.6
  #1  0x55ff34bdf4a9 in qemu_poll_ns ()
  #2  0x55ff34be02a8 in main_loop_wait ()
  #3  0x55ff348bfb1a in main ()
4) running "strace -tt -ff -p 872067 -o cfx" shows the log below being printed continuously
  21:24:02.977833 ppoll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, 
events=POLLIN}, {fd=9, events=POLLIN}, {fd=80, events=POLLIN}, {fd=82, 
events=POLLIN}, {fd=84, events=POLLIN}, {fd=115, events=POLLIN}, {fd=121, 
events=POLLIN}], 9, {0, 0}, NULL, 8) = 0 (Timeout)
  21:24:02.977918 ppoll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, 
events=POLLIN}, {fd=9, events=POLLIN}, {fd=80, events=POLLIN}, {fd=82, 
events=POLLIN}, {fd=84, events=POLLIN}, {fd=115, events=POLLIN}, {fd=121, 
events=POLLIN}], 9, {0, 911447}, NULL, 8) = 0 (Timeout)
  21:24:02.978945 ppoll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, 
events=POLLIN}, {fd=9, events=POLLIN}, {fd=80, events=POLLIN}, {fd=82, 
events=POLLIN}, {fd=84, events=POLLIN}, {fd=115, events=POLLIN}, {fd=121, 
events=POLLIN}], 9, {0, 0}, NULL, 8) = 0 (Timeout)
  Therefore, I think the thread "CPU 0/KVM" is in a tight loop.
5) a reset recovers from this issue; however, it then occurs again.
  The current workaround is to add one more CPU to this VM, after which the issue is gone.

  thanks
  Cliff

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1879425/+subscriptions

[PATCH 1/2] Revert "9p: init_in_iov_from_pdu can truncate the size"

2020-05-19 Thread Stefano Stabellini

From: Stefano Stabellini 

This reverts commit 16724a173049ac29c7b5ade741da93a0f46edff7.
It causes https://bugs.launchpad.net/bugs/1877688.

Signed-off-by: Stefano Stabellini 
---
 hw/9pfs/9p.c   | 33 +++--
 hw/9pfs/9p.h   |  2 +-
 hw/9pfs/virtio-9p-device.c | 11 ---
 hw/9pfs/xen-9p-backend.c   | 15 ++-
 4 files changed, 22 insertions(+), 39 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index a2a14b5979..d39bfee462 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -2102,29 +2102,22 @@ out_nofid:
  * with qemu_iovec_destroy().
  */
 static void v9fs_init_qiov_from_pdu(QEMUIOVector *qiov, V9fsPDU *pdu,
-size_t skip, size_t *size,
+size_t skip, size_t size,
 bool is_write)
 {
 QEMUIOVector elem;
 struct iovec *iov;
 unsigned int niov;
-size_t alloc_size = *size + skip;
 
 if (is_write) {
-pdu->s->transport->init_out_iov_from_pdu(pdu, &iov, &niov, alloc_size);
+pdu->s->transport->init_out_iov_from_pdu(pdu, &iov, &niov, size + 
skip);
 } else {
-pdu->s->transport->init_in_iov_from_pdu(pdu, &iov, &niov, &alloc_size);
-}
-
-if (alloc_size < skip) {
-*size = 0;
-} else {
-*size = alloc_size - skip;
+pdu->s->transport->init_in_iov_from_pdu(pdu, &iov, &niov, size + skip);
 }
 
 qemu_iovec_init_external(&elem, iov, niov);
 qemu_iovec_init(qiov, niov);
-qemu_iovec_concat(qiov, &elem, skip, *size);
+qemu_iovec_concat(qiov, &elem, skip, size);
 }
 
 static int v9fs_xattr_read(V9fsState *s, V9fsPDU *pdu, V9fsFidState *fidp,
@@ -2132,14 +2125,15 @@ static int v9fs_xattr_read(V9fsState *s, V9fsPDU *pdu, 
V9fsFidState *fidp,
 {
 ssize_t err;
 size_t offset = 7;
-size_t read_count;
+uint64_t read_count;
 QEMUIOVector qiov_full;
 
 if (fidp->fs.xattr.len < off) {
 read_count = 0;
-} else if (fidp->fs.xattr.len - off < max_count) {
-read_count = fidp->fs.xattr.len - off;
 } else {
+read_count = fidp->fs.xattr.len - off;
+}
+if (read_count > max_count) {
 read_count = max_count;
 }
 err = pdu_marshal(pdu, offset, "d", read_count);
@@ -2148,7 +2142,7 @@ static int v9fs_xattr_read(V9fsState *s, V9fsPDU *pdu, 
V9fsFidState *fidp,
 }
 offset += err;
 
-v9fs_init_qiov_from_pdu(&qiov_full, pdu, offset, &read_count, false);
+v9fs_init_qiov_from_pdu(&qiov_full, pdu, offset, read_count, false);
 err = v9fs_pack(qiov_full.iov, qiov_full.niov, 0,
 ((char *)fidp->fs.xattr.value) + off,
 read_count);
@@ -2277,11 +2271,9 @@ static void coroutine_fn v9fs_read(void *opaque)
 QEMUIOVector qiov_full;
 QEMUIOVector qiov;
 int32_t len;
-size_t size = max_count;
 
-v9fs_init_qiov_from_pdu(&qiov_full, pdu, offset + 4, &size, false);
+v9fs_init_qiov_from_pdu(&qiov_full, pdu, offset + 4, max_count, false);
 qemu_iovec_init(&qiov, qiov_full.niov);
-max_count = size;
 do {
 qemu_iovec_reset(&qiov);
 qemu_iovec_concat(&qiov, &qiov_full, count, qiov_full.size - 
count);
@@ -2532,7 +2524,6 @@ static void coroutine_fn v9fs_write(void *opaque)
 int32_t len = 0;
 int32_t total = 0;
 size_t offset = 7;
-size_t size;
 V9fsFidState *fidp;
 V9fsPDU *pdu = opaque;
 V9fsState *s = pdu->s;
@@ -2545,9 +2536,7 @@ static void coroutine_fn v9fs_write(void *opaque)
 return;
 }
 offset += err;
-size = count;
-v9fs_init_qiov_from_pdu(&qiov_full, pdu, offset, &size, true);
-count = size;
+v9fs_init_qiov_from_pdu(&qiov_full, pdu, offset, count, true);
 trace_v9fs_write(pdu->tag, pdu->id, fid, off, count, qiov_full.niov);
 
 fidp = get_fid(pdu, fid);
diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
index dd1c6cb8d2..1b9e110605 100644
--- a/hw/9pfs/9p.h
+++ b/hw/9pfs/9p.h
@@ -436,7 +436,7 @@ struct V9fsTransport {
 ssize_t (*pdu_vunmarshal)(V9fsPDU *pdu, size_t offset, const char *fmt,
   va_list ap);
 void(*init_in_iov_from_pdu)(V9fsPDU *pdu, struct iovec **piov,
-unsigned int *pniov, size_t *size);
+unsigned int *pniov, size_t size);
 void(*init_out_iov_from_pdu)(V9fsPDU *pdu, struct iovec **piov,
  unsigned int *pniov, size_t size);
 void(*push_and_notify)(V9fsPDU *pdu);
diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index e5b44977c7..36f3aa9352 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -147,22 +147,19 @@ static ssize_t virtio_pdu_vunmarshal(V9fsPDU *pdu, size_t 
offset,
 }
 
 static void virtio_init_in_iov_from_pdu(V9fsPDU *pdu, struct iovec **piov,
-unsigned int *pniov, size_t *size)
+
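
As an illustration of the v9fs_xattr_read() hunk above, the read_count
computation can be written as a standalone function. This is a minimal
sketch and not code from the patch: the name xattr_read_count and its
parameter names are hypothetical, and only the clamping logic is taken
from the diff.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patched logic:
 *  - nothing to read when the offset is past the end of the xattr value,
 *  - otherwise the remaining bytes, clamped to what the client asked for.
 */
static inline uint64_t xattr_read_count(uint64_t xattr_len, uint64_t off,
                                        uint64_t max_count)
{
    uint64_t read_count;

    if (xattr_len < off) {
        read_count = 0;
    } else {
        read_count = xattr_len - off;
    }
    if (read_count > max_count) {
        read_count = max_count;
    }
    return read_count;
}
```

Doing the subtraction first and clamping afterwards keeps the two
concerns (bounds check, client limit) in separate branches, which is
what the reworked hunk does.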

[PATCH 2/2] xen/9pfs: yield when there isn't enough room on the ring

2020-05-19 Thread Stefano Stabellini

From: Stefano Stabellini 

Instead of truncating replies, which is problematic, wait until the
client reads more data and frees bytes on the reply ring.

Do that by calling qemu_coroutine_yield(). The corresponding
qemu_coroutine_enter_if_inactive() is called from xen_9pfs_bh upon
receiving the next notification from the client.

We need to be careful to avoid races in case xen_9pfs_bh and the
coroutine are both active at the same time. In xen_9pfs_bh, wait until
either the critical section is over (ring->co == NULL) or until the
coroutine becomes inactive (qemu_coroutine_yield() was called) before
continuing. Then, simply wake up the coroutine if it is inactive.

Signed-off-by: Stefano Stabellini 
---
 hw/9pfs/xen-9p-backend.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index fc197f6c8a..3939539028 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -37,6 +37,7 @@ typedef struct Xen9pfsRing {
 
 struct iovec *sg;
 QEMUBH *bh;
+Coroutine *co;
 
 /* local copies, so that we can read/write PDU data directly from
  * the ring */
@@ -198,16 +199,18 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU *pdu,
 g_free(ring->sg);
 
 ring->sg = g_new0(struct iovec, 2);
-xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, size);
+ring->co = qemu_coroutine_self();
+smp_wmb();
 
+again:
+xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, size);
 buf_size = iov_size(ring->sg, num);
 if (buf_size  < size) {
-xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs request type %d"
-"needs %zu bytes, buffer has %zu\n", pdu->id, size,
-buf_size);
-xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
-xen_9pfs_disconnect(&xen_9pfs->xendev);
+qemu_coroutine_yield();
+goto again;
 }
+ring->co = NULL;
+smp_wmb();
 
 *piov = ring->sg;
 *pniov = num;
@@ -292,6 +295,19 @@ static int xen_9pfs_receive(Xen9pfsRing *ring)
 static void xen_9pfs_bh(void *opaque)
 {
 Xen9pfsRing *ring = opaque;
+bool wait;
+
+again:
+wait = ring->co != NULL && qemu_coroutine_entered(ring->co);
+smp_rmb();
+if (wait) {
+cpu_relax();
+goto again;
+}
+
+if (ring->co != NULL) {
+qemu_coroutine_enter_if_inactive(ring->co);
+}
 xen_9pfs_receive(ring);
 }
 
-- 
2.17.1

[PATCH 0/2] revert 9pfs reply truncation, wait for free room to reply

2020-05-19 Thread Stefano Stabellini

Hi all,

This short series reverts commit 16724a173049ac29c7b5ade741da93a0f46edff
because it is the cause of https://bugs.launchpad.net/bugs/1877688.

The original issue addressed by 16724a173049ac29c7b5ade741da93a0f46edff
is solved differently in this series by using qemu_coroutine_yield() to
wait for the client to free more data from the ring before sending the
reply.

Cheers,

Stefano

Re: [RFC PATCH 6/8] riscv: Add configure script

2020-05-19 Thread LIU Zhiwei




On 2020/5/12 2:06, Richard Henderson wrote:

On 4/30/20 12:21 AM, LIU Zhiwei wrote:

+++ b/configure
@@ -58,6 +58,8 @@ guess_arch() {
  ARCH="m68k"
  elif check_define __powerpc64__ ; then
  ARCH="ppc64"
+elif check_define __riscv ; then
+ARCH="riscv64"
  else
  echo "This cpu is not supported by risu. Try -h. " >&2
  exit 1

Why "riscv64" and not "riscv"?

You can't really say more without checking __riscv_xlen.

Thanks for pointing it out. I will add support for RV32 in the next patch set.

Zhiwei


r~
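
To make the __riscv_xlen point concrete, here is a minimal C sketch of
how the register width could be told apart. It is an illustration only,
not part of the risu configure script: the helper names are
hypothetical, and "riscv32" as the name for the 32-bit case is an
assumption.

```c
#include <stddef.h>

/* Hypothetical helper: map an XLEN value to an architecture name. */
static inline const char *riscv_arch_from_xlen(int xlen)
{
    switch (xlen) {
    case 32:
        return "riscv32";   /* assumed name for RV32 */
    case 64:
        return "riscv64";
    default:
        return NULL;        /* unknown width: say nothing */
    }
}

/*
 * Name for the compiler's own target, or NULL when not building for
 * RISC-V. __riscv only says "this is RISC-V"; __riscv_xlen carries
 * the register width.
 */
static inline const char *riscv_guess_arch(void)
{
#if defined(__riscv) && defined(__riscv_xlen)
    return riscv_arch_from_xlen(__riscv_xlen);
#else
    return NULL;
#endif
}
```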

[Bug 1879590] [NEW] Using qemu-system-sparc64 no network interface seems to exist

2020-05-19 Thread chris pugmire

Public bug reported:

Using boot command:

qemu-system-sparc64 -M niagara -L /home/chrisp/sparc/S10image/
-nographic -m 256 -drive
if=pflash,readonly=on,file=/home/chrisp/sparc/S10image/disk.s10hw2

After I log into solaris system I see no network devices other than the 
loopback device.
All the docs I can see suggest it should come up with a default network device 
that allows communication via the hosts network. Host is ubuntu 64bit.  

root@giant:/home/chrisp/sparc# qemu-system-sparc64 -version
QEMU emulator version 5.0.0
Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers


dladm show-link
ifconfig -a


ok boot
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-T2000/ufsboot
Loading: /platform/sun4v/ufsboot
SunOS Release 5.10 Version Generic_118822-23 64-bit
Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: unknown

unknown console login: root
Last login: Wed Feb  8 09:01:28 on console
Sun Microsystems Inc.   SunOS 5.10  Generic January 2005
# dladm show-link
# ifconfig -a
lo0: flags=2001000849 mtu 8232 
index 1
inet 127.0.0.1 netmask ff00

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1879590

Title:
  Using qemu-system-sparc64 no network interface seems to exist

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1879590/+subscriptions

[Bug 1879587] Re: Register number in ESR is incorrect for certain banked registers when switching from AA32 to AA64

2020-05-19 Thread Julien Freche

This is with qemu-system-aarch64 - forgot to mention it explicitly. So,
it will only affect qemu for ARM 64-bit.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1879587

Title:
  Register number in ESR is incorrect for certain banked registers when
  switching from AA32 to AA64

Status in QEMU:
  New

Bug description:
  I am running into a situation where I have:
  - A hypervisor running in EL2, AA64
  - A guest running in EL1, AA32

  We trap certain accesses to special registers such as DACR (via
  HCR.TVM). One instruction that is trapped is:

  ee03ef10  ->mcr 15, 0, lr, cr3, cr0, {0}

  The guest is running in SVC mode. So, LR should refer to LR_svc there.
  LR_svc is mapped to X18 in AA64. So, ESR should reflect that. However,
  the actual ESR value is: 0xfe00dc0

  If we decode the 'rt':
  >>> (0xfe00dc0 >> 5) & 0x1f
  14

  My understanding is that 14 is incorrect in the context of AA64. rt
  should be set to 18. The current mode being SVC, LR refers to LR_svc
  not LR_usr. In other words, the mapping between registers in AA64 and
  AA32 doesn't seem to be accounted for. I've tested this with Qemu
  5.0.0

  Let me know if that makes sense and if you would like more info. I am also 
happy to test patches.
  Thanks for all the great work on Qemu!

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1879587/+subscriptions

[Bug 1879587] [NEW] Register number in ESR is incorrect for certain banked registers when switching from AA32 to AA64

2020-05-19 Thread Julien Freche

Public bug reported:

I am running into a situation where I have:
- A hypervisor running in EL2, AA64
- A guest running in EL1, AA32

We trap certain accesses to special registers such as DACR (via
HCR.TVM). One instruction that is trapped is:

ee03ef10  ->mcr 15, 0, lr, cr3, cr0, {0}

The guest is running in SVC mode. So, LR should refer to LR_svc there.
LR_svc is mapped to X18 in AA64. So, ESR should reflect that. However,
the actual ESR value is: 0xfe00dc0

If we decode the 'rt':
>>> (0xfe00dc0 >> 5) & 0x1f
14

My understanding is that 14 is incorrect in the context of AA64. rt
should be set to 18. The current mode being SVC, LR refers to LR_svc not
LR_usr. In other words, the mapping between registers in AA64 and AA32
doesn't seem to be accounted for. I've tested this with Qemu 5.0.0
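
The decode above, and the mapping the report expects, can be sketched in
C. This is an illustration under stated assumptions, not QEMU code: the
function and enum names are hypothetical, the Rt field position (bits
[9:5]) is the one used in the report, and the banked-LR table reflects
my reading of the AArch32 to AArch64 register mapping in the Arm
architecture manual.

```c
#include <stdint.h>

/* Rt field of the syndrome for a trapped MCR/MRC access: bits [9:5]. */
static inline unsigned esr_cp_rt(uint32_t esr)
{
    return (esr >> 5) & 0x1f;
}

/* AArch32 CPSR.M mode encodings. */
enum {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
    MODE_ABT = 0x17, MODE_UND = 0x1b, MODE_SYS = 0x1f,
};

/*
 * Hypothetical helper: AArch64 X register that holds the AArch32 LR
 * (R14) of the given mode.
 */
static inline unsigned aa64_reg_for_aa32_lr(unsigned mode)
{
    switch (mode) {
    case MODE_IRQ: return 16;   /* LR_irq */
    case MODE_SVC: return 18;   /* LR_svc, the case in this report */
    case MODE_ABT: return 20;   /* LR_abt */
    case MODE_UND: return 22;   /* LR_und */
    case MODE_FIQ: return 30;   /* LR_fiq */
    default:       return 14;   /* LR_usr, shared by User and System */
    }
}
```

With the reported value, esr_cp_rt(0xfe00dc0) yields 14, while
aa64_reg_for_aa32_lr(MODE_SVC) yields 18, which is the mismatch
described here.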

Let me know if that makes sense and if you would like more info. I am also 
happy to test patches.
Thanks for all the great work on Qemu!

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1879587

Title:
  Register number in ESR is incorrect for certain banked registers when
  switching from AA32 to AA64

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1879587/+subscriptions

Re: [PATCH v6 4/5] crypto: Add tls-cipher-suites object

2020-05-19 Thread Laszlo Ersek

On 05/19/20 20:20, Philippe Mathieu-Daudé wrote:
> Example of use to dump:
>
>   $ qemu-system-x86_64 -S \
> -object tls-cipher-suites,id=mysuite,priority=@SYSTEM,verbose=yes
>   Cipher suites for @SYSTEM:
>   - TLS_AES_256_GCM_SHA384              0x13, 0x02  TLS1.3
>   - TLS_CHACHA20_POLY1305_SHA256        0x13, 0x03  TLS1.3
>   - TLS_AES_128_GCM_SHA256              0x13, 0x01  TLS1.3
>   - TLS_AES_128_CCM_SHA256              0x13, 0x04  TLS1.3
>   - TLS_ECDHE_RSA_AES_256_GCM_SHA384    0xc0, 0x30  TLS1.2
>   - TLS_ECDHE_RSA_CHACHA20_POLY1305     0xcc, 0xa8  TLS1.2
>   - TLS_ECDHE_RSA_AES_256_CBC_SHA1      0xc0, 0x14  TLS1.0
>   - TLS_ECDHE_RSA_AES_128_GCM_SHA256    0xc0, 0x2f  TLS1.2
>   - TLS_ECDHE_RSA_AES_128_CBC_SHA1      0xc0, 0x13  TLS1.0
>   - TLS_ECDHE_ECDSA_AES_256_GCM_SHA384  0xc0, 0x2c  TLS1.2
>   - TLS_ECDHE_ECDSA_CHACHA20_POLY1305   0xcc, 0xa9  TLS1.2
>   - TLS_ECDHE_ECDSA_AES_256_CCM         0xc0, 0xad  TLS1.2
>   - TLS_ECDHE_ECDSA_AES_256_CBC_SHA1    0xc0, 0x0a  TLS1.0
>   - TLS_ECDHE_ECDSA_AES_128_GCM_SHA256  0xc0, 0x2b  TLS1.2
>   - TLS_ECDHE_ECDSA_AES_128_CCM         0xc0, 0xac  TLS1.2
>   - TLS_ECDHE_ECDSA_AES_128_CBC_SHA1    0xc0, 0x09  TLS1.0
>   - TLS_RSA_AES_256_GCM_SHA384          0x00, 0x9d  TLS1.2
>   - TLS_RSA_AES_256_CCM                 0xc0, 0x9d  TLS1.2
>   - TLS_RSA_AES_256_CBC_SHA1            0x00, 0x35  TLS1.0
>   - TLS_RSA_AES_128_GCM_SHA256          0x00, 0x9c  TLS1.2
>   - TLS_RSA_AES_128_CCM                 0xc0, 0x9c  TLS1.2
>   - TLS_RSA_AES_128_CBC_SHA1            0x00, 0x2f  TLS1.0
>   - TLS_DHE_RSA_AES_256_GCM_SHA384      0x00, 0x9f  TLS1.2
>   - TLS_DHE_RSA_CHACHA20_POLY1305       0xcc, 0xaa  TLS1.2
>   - TLS_DHE_RSA_AES_256_CCM             0xc0, 0x9f  TLS1.2
>   - TLS_DHE_RSA_AES_256_CBC_SHA1        0x00, 0x39  TLS1.0
>   - TLS_DHE_RSA_AES_128_GCM_SHA256      0x00, 0x9e  TLS1.2
>   - TLS_DHE_RSA_AES_128_CCM             0xc0, 0x9e  TLS1.2
>   - TLS_DHE_RSA_AES_128_CBC_SHA1        0x00, 0x33  TLS1.0
>   total: 29 ciphers
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/crypto/tls-cipher-suites.h |  39 +
>  crypto/tls-cipher-suites.c | 133 +
>  crypto/Makefile.objs   |   1 +
>  3 files changed, 173 insertions(+)
>  create mode 100644 include/crypto/tls-cipher-suites.h
>  create mode 100644 crypto/tls-cipher-suites.c
>
> diff --git a/include/crypto/tls-cipher-suites.h 
> b/include/crypto/tls-cipher-suites.h
> new file mode 100644
> index 00..31e92916e1
> --- /dev/null
> +++ b/include/crypto/tls-cipher-suites.h
> @@ -0,0 +1,39 @@
> +/*
> + * QEMU TLS Cipher Suites
> + *
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * Author: Philippe Mathieu-Daudé 
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef QCRYPTO_TLSCIPHERSUITES_H
> +#define QCRYPTO_TLSCIPHERSUITES_H
> +
> +#include "qom/object.h"
> +#include "crypto/tlscreds.h"
> +
> +#define TYPE_QCRYPTO_TLS_CIPHER_SUITES "tls-cipher-suites"
> +#define QCRYPTO_TLS_CIPHER_SUITES(obj) \
> +OBJECT_CHECK(QCryptoTLSCipherSuites, (obj), 
> TYPE_QCRYPTO_TLS_CIPHER_SUITES)
> +
> +/*
> + * IANA registered TLS ciphers:
> + * 
> https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-4
> + */
> +typedef struct {
> +uint8_t data[2];
> +} IANA_TLS_CIPHER;

(1) I propose marking this as QEMU_PACKED, even if only for
documentation purposes.
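
A minimal sketch of what (1) would look like, outside of QEMU. The type
name IanaTlsCipher and the helper cipher_blob_length are hypothetical,
and the GCC/Clang packed attribute is spelled out here on the
assumption that it is what QEMU_PACKED expands to on those compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Two-byte IANA cipher suite identifier, explicitly packed. */
typedef struct {
    uint8_t data[2];
} __attribute__((packed)) IanaTlsCipher;

/* The blob handed to the guest relies on each entry being exactly 2 bytes. */
_Static_assert(sizeof(IanaTlsCipher) == 2,
               "cipher suite entries must not contain padding");

/* Size in bytes of a blob holding 'count' cipher suite entries. */
static inline size_t cipher_blob_length(unsigned count)
{
    return count * sizeof(IanaTlsCipher);
}
```

A struct of two uint8_t is not padded in practice, so the attribute
mostly documents the intent; the static assertion is what would catch a
layout surprise at build time.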

> +
> +typedef struct QCryptoTLSCipherSuites {
> +/* <private> */
> +QCryptoTLSCreds parent_obj;
> +
> +/* <public> */
> +bool verbose;
> +IANA_TLS_CIPHER *cipher_list;
> +unsigned cipher_count;
> +} QCryptoTLSCipherSuites;
> +
> +#endif /* QCRYPTO_TLSCIPHERSUITES_H */
> diff --git a/crypto/tls-cipher-suites.c b/crypto/tls-cipher-suites.c
> new file mode 100644
> index 00..c6c51359bd
> --- /dev/null
> +++ b/crypto/tls-cipher-suites.c
> @@ -0,0 +1,133 @@
> +/*
> + * QEMU TLS Cipher Suites
> + *
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * Author: Philippe Mathieu-Daudé 
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qom/object_interfaces.h"
> +#include "qemu/error-report.h"
> +#include "crypto/tlscreds.h"
> +#include "crypto/tls-cipher-suites.h"

Re: [PATCH v6 10/16] floppy: move cmos_get_fd_drive_type() from pc

2020-05-19 Thread John Snow




On 5/19/20 10:51 AM, Philippe Mathieu-Daudé wrote:
> Missing "Signed-off-by: Gerd Hoffmann ",
> otherwise:
> 
> Reviewed-by: Philippe Mathieu-Daudé 
> 
> On 5/15/20 5:04 PM, Gerd Hoffmann wrote:

If you add the S-O-B:

Acked-by: John Snow 

>> ---
>>   include/hw/block/fdc.h |  1 +
>>   include/hw/i386/pc.h   |  1 -
>>   hw/block/fdc.c | 26 +-
>>   hw/i386/pc.c   | 25 -
>>   4 files changed, 26 insertions(+), 27 deletions(-)
>>
>> diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
>> index 5d71cf972268..479cebc0a330 100644
>> --- a/include/hw/block/fdc.h
>> +++ b/include/hw/block/fdc.h
>> @@ -16,5 +16,6 @@ void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
>>  DriveInfo **fds, qemu_irq *fdc_tc);
>>     FloppyDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
>> +int cmos_get_fd_drive_type(FloppyDriveType fd0);
>>     #endif
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index 8d764f965cd3..5e3b19ab78fc 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -176,7 +176,6 @@ typedef void (*cpu_set_smm_t)(int smm, void *arg);
>>   void pc_i8259_create(ISABus *isa_bus, qemu_irq *i8259_irqs);
>>     ISADevice *pc_find_fdc0(void);
>> -int cmos_get_fd_drive_type(FloppyDriveType fd0);
>>     /* port92.c */
>>   #define PORT92_A20_LINE "a20"
>> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
>> index 8024c822cea3..ea0fb8ee15b9 100644
>> --- a/hw/block/fdc.c
>> +++ b/hw/block/fdc.c
>> @@ -32,7 +32,6 @@
>>   #include "qapi/error.h"
>>   #include "qemu/error-report.h"
>>   #include "qemu/timer.h"
>> -#include "hw/i386/pc.h"
>>   #include "hw/acpi/aml-build.h"
>>   #include "hw/irq.h"
>>   #include "hw/isa/isa.h"
>> @@ -2809,6 +2808,31 @@ static Aml *build_fdinfo_aml(int idx,
>> FloppyDriveType type)
>>   return dev;
>>   }
>>   +int cmos_get_fd_drive_type(FloppyDriveType fd0)
>> +{
>> +    int val;
>> +
>> +    switch (fd0) {
>> +    case FLOPPY_DRIVE_TYPE_144:
>> +    /* 1.44 Mb 3"5 drive */
>> +    val = 4;
>> +    break;
>> +    case FLOPPY_DRIVE_TYPE_288:
>> +    /* 2.88 Mb 3"5 drive */
>> +    val = 5;
>> +    break;
>> +    case FLOPPY_DRIVE_TYPE_120:
>> +    /* 1.2 Mb 5"5 drive */
>> +    val = 2;
>> +    break;
>> +    case FLOPPY_DRIVE_TYPE_NONE:
>> +    default:
>> +    val = 0;
>> +    break;
>> +    }
>> +    return val;
>> +}
>> +
>>   static void fdc_isa_build_aml(ISADevice *isadev, Aml *scope)
>>   {
>>   Aml *dev;
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 2128f3d6fe8b..c5db7be6d8b1 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -385,31 +385,6 @@ static uint64_t ioportF0_read(void *opaque,
>> hwaddr addr, unsigned size)
>>     #define REG_EQUIPMENT_BYTE  0x14
>>   -int cmos_get_fd_drive_type(FloppyDriveType fd0)
>> -{
>> -    int val;
>> -
>> -    switch (fd0) {
>> -    case FLOPPY_DRIVE_TYPE_144:
>> -    /* 1.44 Mb 3"5 drive */
>> -    val = 4;
>> -    break;
>> -    case FLOPPY_DRIVE_TYPE_288:
>> -    /* 2.88 Mb 3"5 drive */
>> -    val = 5;
>> -    break;
>> -    case FLOPPY_DRIVE_TYPE_120:
>> -    /* 1.2 Mb 5"5 drive */
>> -    val = 2;
>> -    break;
>> -    case FLOPPY_DRIVE_TYPE_NONE:
>> -    default:
>> -    val = 0;
>> -    break;
>> -    }
>> -    return val;
>> -}
>> -
>>   static void cmos_init_hd(ISADevice *s, int type_ofs, int info_ofs,
>>    int16_t cylinders, int8_t heads, int8_t
>> sectors)
>>   {
>>
> 

-- 
—js

Re: QEMU 5.1: Can we require each new device/machine to provided a test?

2020-05-19 Thread John Snow




On 5/19/20 5:04 AM, Daniel P. Berrangé wrote:
> On Mon, May 18, 2020 at 03:56:36PM -0400, John Snow wrote:
>>
>>
>> On 5/15/20 6:23 AM, Daniel P. Berrangé wrote:
>>> On Fri, May 15, 2020 at 12:11:17PM +0200, Thomas Huth wrote:
>>>> On 07/04/2020 12.59, Philippe Mathieu-Daudé wrote:
>>>>> Hello,
>>>>>
>>>>> Following Markus thread on deprecating unmaintained (untested) code
>>>>> (machines) [1] and the effort done to gather the information shared in
>>>>> the replies [2], and the various acceptance tests added, is it
>>>>> feasible to require for the next release that each new device/machine
>>>>> is provided a test covering it?
>>>>>
>>>>> If no, what is missing?
>>>>
>>>> If a qtest is feasible, yes, I think we should require one for new
>>>> devices. But what about machines - you normally need a test image for
>>>> this. In that case, there is still the question where testing images
>>>> could be hosted. Not every developer has a web space where they could
>>>> put their test images onto. And what about images that contain non-free
>>>> code?
>>>
>>> Yep, it isn't feasible to make this a hard rule.
>>>
>>> IMHO this is where a support tier classification comes into play
>>>
>>>  - Tier 1: actively maintained, qtest coverage available. Expected
>>>to work reliably at all times since every commit is CI
>>>tested
>>>
>>>   - Tier 2: actively maintained, no qtest coverage. Should usually
>>>work but regression may creep in due to reliance on the
>>>maintainer to manually test on adhoc basis
>>>
>>>   - Tier 3: not actively maintained, unknown state but liable to
>>> be broken indefinitely
>>>
>>> Tier 1 is obviously the most desirable state we would like everything to
>>> be at. Contributors will have to fix problems their patches cause as
>>> they will be blocked by CI.
>>>
>>> Tier 2 is an admission that reality gets in the way. Ideally stuff in
>>> this tier will graduate to Tier 1 at some point. Even if it doesn't
>>> though, it is still valid to keep it in QEMU long term. Contributors
>>> shouldn't gratuitously break stuff in these boards, but if they do,
>>> then the maintainer is ultimately responsible for fixing it, as the
>>> contributors don't have a test rig for it.
>>>
>>> Tier 3 is abandonware. If a maintainer doesn't appear, users should
>>> not expect it to continue to exist long term. Contributors are free
>>> to send patches which break this, and are under no obligation to
>>> fix problems in these boards. We may deprecate & delete it after a
>>> while
>>>
>>>
>>> Over time we'll likely add more criteria to stuff in Tier 1. This
>>> could lead to some things dropping from Tier 1 to Tier 2. This is
>>> OK, as it doesn't make those things worse than they already were.
>>> We're just saying that Tier 2 isn't as thoroughly tested as we
>>> would like it to be in an ideal world.
>>
>> I really like the idea of device support tiers codified directly in the
>> QEMU codebase, to give upstream users some idea of which devices we
>> expect to work and which we ... don't, really.
>>
>> Not every last device we offer is enterprise production ready, but we
>> don't necessarily do a good job of explaining which devices fall into
>> which categories, and we've got quite a few of them.
>>
>> I wonder if a 2.5th tier would be useful; something like a "hobbyist"
>> tier for pet project SoC boards and the like -- they're not abandoned,
>> but we also don't expect them to work, exactly.
>>
>> Mild semantic difference from Tier 3.
> 
> I guess I was thinking such hobbyist stuff would fall into tier 2  if the
> hobbyist maintainer actually responds to fixing stuff, or tier 3 if they
> largely aren't active on the mailing list responding to issues/questions.
> 
> We could have a 4 tier system overall and put hobbyist stuff at tier 3,
> and abandonware at tier 4.
> 
> Probably shouldn't go beyond 4 tiers though, as the more criteria we add
> the harder it is to clearly decide which tier something should go into.
> 
> The tier 1 vs 2 division is clearly split based on CI which is a simple
> classification to decide on.
> 
> The tier 2 vs 3 division is moderately clearly split based on whether
> there is a frequently active maintainer.
> 
> We can probably squeeze in the 4th tier without too much ambiguity in
> the classification if we think it is adding something worthwhile either
> from our POV as maintainers, or for users consuming it.

Yes, I didn't mean to start watering it down into a 1,380 tier system
that we're never able to properly utilize.

I was thinking more along the lines of:

- Device works and is well loved
- Device works and is well loved (but we have to test manually)
- Device doesn't work, but is well loved
  (Use at your own peril, please file a bug report)
- Device doesn't work, and is unloved

Perhaps it'd be clearer to name these Tier 1A, 1B, 2, and 3; where
things can shift from 1A to 1B as their test coverage allows, but it's
not meant to

Re: [PATCH v6 5/5] crypto/tls-cipher-suites: Product fw_cfg consumable blob

2020-05-19 Thread Laszlo Ersek

On 05/19/20 20:20, Philippe Mathieu-Daudé wrote:
> Since our format is consumable by the fw_cfg device,
> we can implement the FW_CFG_DATA_GENERATOR interface.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  crypto/tls-cipher-suites.c | 19 +++
>  1 file changed, 19 insertions(+)

(1) s/product/produce/ in the subject line.

This patch looks OK to me otherwise (but I don't feel confident enough
to give an R-b):

Acked-by: Laszlo Ersek 

Thanks
Laszlo

> 
> diff --git a/crypto/tls-cipher-suites.c b/crypto/tls-cipher-suites.c
> index c6c51359bd..11872783eb 100644
> --- a/crypto/tls-cipher-suites.c
> +++ b/crypto/tls-cipher-suites.c
> @@ -14,6 +14,7 @@
>  #include "qemu/error-report.h"
>  #include "crypto/tlscreds.h"
>  #include "crypto/tls-cipher-suites.h"
> +#include "hw/nvram/fw_cfg.h"
>  
>  static void parse_cipher_suites(QCryptoTLSCipherSuites *s,
>  const char *priority_name, Error **errp)
> @@ -101,11 +102,28 @@ static void qcrypto_tls_cipher_suites_finalize(Object 
> *obj)
>  g_free(s->cipher_list);
>  }
>  
> +static const void *qcrypto_tls_cipher_suites_get_data(Object *obj)
> +{
> +QCryptoTLSCipherSuites *s = QCRYPTO_TLS_CIPHER_SUITES(obj);
> +
> +return s->cipher_list;
> +}
> +
> +static size_t qcrypto_tls_cipher_suites_get_length(Object *obj)
> +{
> +QCryptoTLSCipherSuites *s = QCRYPTO_TLS_CIPHER_SUITES(obj);
> +
> +return s->cipher_count * sizeof(IANA_TLS_CIPHER);
> +}
> +
>  static void qcrypto_tls_cipher_suites_class_init(ObjectClass *oc, void *data)
>  {
>  UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
> +FWCfgDataGeneratorClass *fwgc = FW_CFG_DATA_GENERATOR_CLASS(oc);
>  
>  ucc->complete = qcrypto_tls_cipher_suites_complete;
> +fwgc->get_data = qcrypto_tls_cipher_suites_get_data;
> +fwgc->get_length = qcrypto_tls_cipher_suites_get_length;
>  
>  object_class_property_add_bool(oc, "verbose",
> qcrypto_tls_cipher_suites_get_verbose,
> @@ -121,6 +139,7 @@ static const TypeInfo qcrypto_tls_cipher_suites_info = {
>  .class_init = qcrypto_tls_cipher_suites_class_init,
>  .interfaces = (InterfaceInfo[]) {
>  { TYPE_USER_CREATABLE },
> +{ TYPE_FW_CFG_DATA_GENERATOR_INTERFACE },
>  { }
>  }
>  };
>

Re: [RFC PATCH v6 3/5] softmmu/vl: Allow -fw_cfg 'blob_id' option to set any file pathname

2020-05-19 Thread Laszlo Ersek

On 05/19/20 20:20, Philippe Mathieu-Daudé wrote:
> This is to silence:
> 
>   $ qemu-system-x86_64 \
> -object tls-cipher-suites,id=ciphersuite0,priority=@SYSTEM \
> -fw_cfg name=etc/edk2/https/ciphers,blob_id=ciphersuite0
>   qemu-system-x86_64: -fw_cfg 
> name=etc/edk2/https/ciphers,blob_id=ciphersuite0: warning: externally 
> provided fw_cfg item names should be prefixed with "opt/"
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  softmmu/vl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index f76c53ad2e..3b77dcc00d 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2052,7 +2052,7 @@ static int parse_fw_cfg(void *opaque, QemuOpts *opts, 
> Error **errp)
> FW_CFG_MAX_FILE_PATH - 1);
>  return -1;
>  }
> -if (strncmp(name, "opt/", 4) != 0) {
> +if (!nonempty_str(blob_id) && strncmp(name, "opt/", 4) != 0) {
>  warn_report("externally provided fw_cfg item names "
>  "should be prefixed with \"opt/\"");
>  }
> 

Hmmm, difficult question! Is "ciphersuite0" now externally provided or not?

Because, ciphersuite0 is populated internally to QEMU alright (and so we
can think it's internal), but its *association with the name* is external.

What if we keep the same "-object" switch, but use a different (bogus)
"name" with "-fw_cfg"? IMO that kind of invalidates "-object" too.

Should the fw_cfg generator interface dictate the fw_cfg filename too?
Because that would eliminate this problem. Put differently, we now have
a possibility to label the "ciphersuite0" object in the fw_cfg file
directory any way we want -- but is that freedom *useful* for anything?

I guess we might want multiple "tls-cipher-suites" objects one day, so
hard-coding the fw_cfg names on that level could cause conflicts. On the
other hand, I wouldn't like "blob_id" to generally circumvent the "etc/"
namespace protection.

I'm leaning towards agreeing with this patch, but I'd appreciate some
convincing arguments.

Thanks
Laszlo
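
For reference, the condition the patch changes can be sketched as a
standalone predicate. This is an illustration, not the code in
softmmu/vl.c: the function name warn_missing_opt_prefix is
hypothetical, and nonempty_str() is reproduced here on the assumption
that it simply tests for a non-NULL, non-empty string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed behaviour of the helper used in parse_fw_cfg(). */
static inline bool nonempty_str(const char *str)
{
    return str && *str;
}

/*
 * Hypothetical predicate: true when the "should be prefixed with opt/"
 * warning would be printed. With the patch, giving a blob_id suppresses
 * the warning regardless of the name.
 */
static inline bool warn_missing_opt_prefix(const char *name,
                                           const char *blob_id)
{
    return !nonempty_str(blob_id) && strncmp(name, "opt/", 4) != 0;
}
```

Written this way, the concern above is easy to see: any name at all,
including one under "etc/", passes silently as soon as blob_id is set.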

Re: [PATCH v6 2/5] softmmu/vl: Let -fw_cfg option take a 'blob_id' argument

2020-05-19 Thread Laszlo Ersek

On 05/19/20 20:20, Philippe Mathieu-Daudé wrote:
> The 'blob_id' argument refers to a QOM object able to produce
> data consumable by the fw_cfg device. The producer object must
> implement the FW_CFG_DATA_GENERATOR interface.

OK, this answers my OBJECT_CHECK() question under patch #1 (in the
negative -- an assert would be wrong).

>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  softmmu/vl.c | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index ae5451bc23..f76c53ad2e 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -489,6 +489,10 @@ static QemuOptsList qemu_fw_cfg_opts = {
>  .name = "string",
>  .type = QEMU_OPT_STRING,
>  .help = "Sets content of the blob to be inserted from a string",
> +}, {
> +.name = "blob_id",
> +.type = QEMU_OPT_STRING,
> +.help = "Sets id of the object generating fw_cfg blob to be 
> used",
>  },
>  { /* end of list */ }
>  },
> @@ -2020,7 +2024,7 @@ static int parse_fw_cfg(void *opaque, QemuOpts *opts, 
> Error **errp)
>  {
>  gchar *buf;
>  size_t size;
> -const char *name, *file, *str;
> +const char *name, *file, *str, *blob_id;
>  FWCfgState *fw_cfg = (FWCfgState *) opaque;
>
>  if (fw_cfg == NULL) {
> @@ -2030,14 +2034,17 @@ static int parse_fw_cfg(void *opaque, QemuOpts *opts, 
> Error **errp)
>  name = qemu_opt_get(opts, "name");
>  file = qemu_opt_get(opts, "file");
>  str = qemu_opt_get(opts, "string");
> +blob_id = qemu_opt_get(opts, "blob_id");
>
>  /* we need name and either a file or the content string */

(1) Please update this comment. If the option is given, we need the
name, and exactly one of: file, content string, blob_id.

> -if (!(nonempty_str(name) && (nonempty_str(file) || nonempty_str(str)))) {
> +if (!(nonempty_str(name)
> +  && (nonempty_str(file) || nonempty_str(str) || 
> nonempty_str(blob_id)))
> + ) {
>  error_setg(errp, "invalid argument(s)");
>  return -1;
>  }

(2) Coding style: does QEMU keep operators on the left or on the right
when breaking subconditions to new lines? (I vaguely recall "to the
right", but I could be wrong... Well, "hw/nvram/fw_cfg.c" has at least 7
examples of the operator being on the right.)

> -if (nonempty_str(file) && nonempty_str(str)) {
> -error_setg(errp, "file and string are mutually exclusive");
> +if (nonempty_str(file) && nonempty_str(str) && nonempty_str(blob_id)) {
> +error_setg(errp, "file, string and blob_id are mutually exclusive");
>  return -1;
>  }

(3) I believe this catches only when all three of file/string/blob_id
are given. But we should continue catching "two given".

How about reworking both "if"s, *and* the comment at (1) at the same
time, into:

if (!nonempty_str(name) ||
nonempty_str(file) + nonempty_str(str) + nonempty_str(blob_id) != 1) {
error_setg(errp, "name, plus exactly one of file, string and blob_id, "
   "are needed");
return -1;
}

(Regarding the addition, nonempty_str() returns a "bool", which is a
macro to _Bool, which is promoted to "int" or "unsigned int".)
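
As a standalone illustration of the counting idiom suggested above (not
QEMU code: fw_cfg_args_valid() is a made-up name, and nonempty_str() is
re-implemented here as a stand-in), a minimal sketch could look like:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for nonempty_str(): true for a non-NULL, non-empty string. */
static bool nonempty_str(const char *str)
{
    return str != NULL && *str != '\0';
}

/*
 * Hypothetical helper: true when name is given and exactly one of
 * file, str and blob_id is given. Each bool promotes to int (0 or 1),
 * so the sum counts how many sources were supplied.
 */
static bool fw_cfg_args_valid(const char *name, const char *file,
                              const char *str, const char *blob_id)
{
    int sources = nonempty_str(file) + nonempty_str(str) +
                  nonempty_str(blob_id);

    return nonempty_str(name) && sources == 1;
}
```

The sum rejects both "none given" and "two or more given" with a single
comparison, which is why one "if" can replace the two existing ones.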

>  if (strlen(name) > FW_CFG_MAX_FILE_PATH - 1) {
> @@ -2052,6 +2059,8 @@ static int parse_fw_cfg(void *opaque, QemuOpts *opts, 
> Error **errp)
>  if (nonempty_str(str)) {
>  size = strlen(str); /* NUL terminator NOT included in fw_cfg blob */
>  buf = g_memdup(str, size);
> +} else if (nonempty_str(blob_id)) {
> +return fw_cfg_add_from_generator(fw_cfg, name, blob_id, errp);
>  } else {
>  GError *err = NULL;
>  if (!g_file_get_contents(file, &buf, &size, &err)) {
>

(4) The "-fw_cfg" command line option is documented in both the qemu(1)
manual, and the "docs/specs/fw_cfg.txt" file.

I think we may have to update those. In particular I mean *where* the
option is documented (in both texts).

In the manual, "-fw_cfg" is currently under "Debug/Expert options", but
that will no longer apply (I think?) after this series.

Similarly, in "docs/specs/fw_cfg.txt", the section is called "Externally
Provided Items" -- but that might not be strictly true any more either.

Maybe leave the current "-fw_cfg" mentions in peace, and document
"-fw_cfg blob_id=..." separately (in different docs sections)? The
"fw_cfg generators" concept could deserve dedicated sections.

Sorry that I can't make a good concrete suggestion. :(

Thanks,
Laszlo

Re: [PATCH 01/24] arm/stm32f405: Fix realization of "stm32f2xx-adc" devices

2020-05-19 Thread Alistair Francis

On Mon, May 18, 2020 at 10:08 PM Markus Armbruster  wrote:
>
> Alistair Francis  writes:
>
> > On Sun, May 17, 2020 at 10:06 PM Markus Armbruster  
> > wrote:
> >>
> >> stm32f405_soc_initfn() creates six such devices, but
> >> stm32f405_soc_realize() realizes only one.  Affects machine
> >> netduinoplus2.
> >>
> >> I wonder how this ever worked.  If the "device becomes real only on
> >> realize" thing actually works, then we've always been missing five of
> >> six such devices, yet nobody noticed.
> >
> > I must have just been testing the first ADC.
> >
> >>
> >> Fix stm32f405_soc_realize() to realize all six.  Visible in "info
> >> qtree":
> >>
> >>  bus: main-system-bus
> >>type System
> >>dev: stm32f405-soc, id ""
> >>  cpu-type = "cortex-m4-arm-cpu"
> >>dev: stm32f2xx-adc, id ""
> >>  gpio-out "sysbus-irq" 1
> >> -mmio /00ff
> >> +mmio 40012000/00ff
> >>dev: stm32f2xx-adc, id ""
> >>  gpio-out "sysbus-irq" 1
> >> -mmio /00ff
> >> +mmio 40012000/00ff
> >>dev: stm32f2xx-adc, id ""
> >>  gpio-out "sysbus-irq" 1
> >> -mmio /00ff
> >> +mmio 40012000/00ff
> >>dev: stm32f2xx-adc, id ""
> >>  gpio-out "sysbus-irq" 1
> >> -mmio /00ff
> >> +mmio 40012000/00ff
> >>dev: stm32f2xx-adc, id ""
> >>  gpio-out "sysbus-irq" 1
> >>  mmio 40012000/00ff
> >>dev: stm32f2xx-adc, id ""
> >>  gpio-out "sysbus-irq" 1
> >> -mmio /00ff
> >> +mmio 40012000/00ff
> >>dev: armv7m, id ""
> >>
> >> The mmio addresses look suspicious.
> >
> > Good catch, thanks :)
>
> I'd love to squash in corrections, but I don't know the correct
> addresses.  Can you help?

Yep, thanks for squashing it in.

The three addresses are:

0x40012000
0x40012100
0x40012200

and they all share interrupt number 18.

Let me know if you want me to do it.

Alistair

>
> >>
> >> Fixes: 529fc5fd3e18ace8f739afd02dc0953354f39442
> >> Cc: Alistair Francis 
> >> Cc: Peter Maydell 
> >> Cc: qemu-...@nongnu.org
> >> Signed-off-by: Markus Armbruster 
> >
> > Reviewed-by: Alistair Francis 
>
> Thanks!
>

Re: [PULL 00/10] softfloat misc cleanups

2020-05-19 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200519164957.26920-1-richard.hender...@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200519164957.26920-1-richard.hender...@linaro.org
Subject: [PULL 00/10] softfloat misc cleanups
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20200515170804.5707-1-phi...@redhat.com -> 
patchew/20200515170804.5707-1-phi...@redhat.com
 - [tag update]  patchew/20200516063746.18296-1-anup.pa...@wdc.com -> 
patchew/20200516063746.18296-1-anup.pa...@wdc.com
 * [new tag] patchew/cover.1589923785.git.alistair.fran...@wdc.com -> 
patchew/cover.1589923785.git.alistair.fran...@wdc.com
Switched to a new branch 'test'
c74e51d softfloat: Return bool from all classification predicates
796da14 softfloat: Inline floatx80 compare specializations
f3197d3 softfloat: Inline float128 compare specializations
c4e06ab softfloat: Inline float64 compare specializations
65adcae softfloat: Inline float32 compare specializations
4afb04e softfloat: Name compare relation enum
5383e5e softfloat: Name rounding mode enum
af9e7fd softfloat: Change tininess_before_rounding to bool
876ddf8 softfloat: Replace flag with bool
802d7a7 softfloat: Use post test for floatN_mul

=== OUTPUT BEGIN ===
1/10 Checking commit 802d7a73a18b (softfloat: Use post test for floatN_mul)
2/10 Checking commit 876ddf8f83ce (softfloat: Replace flag with bool)
3/10 Checking commit af9e7fdfdaed (softfloat: Change tininess_before_rounding 
to bool)
ERROR: space prohibited before that close parenthesis ')'
#68: FILE: fpu/softfloat.c:3877:
+  || (zExp < 0 )

total: 1 errors, 0 warnings, 143 lines checked

Patch 3/10 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/10 Checking commit 5383e5ecf1e3 (softfloat: Name rounding mode enum)
5/10 Checking commit 4afb04e42c21 (softfloat: Name compare relation enum)
6/10 Checking commit 65adcae6a290 (softfloat: Inline float32 compare 
specializations)
7/10 Checking commit c4e06abc4c09 (softfloat: Inline float64 compare 
specializations)
8/10 Checking commit f3197d3ad884 (softfloat: Inline float128 compare 
specializations)
9/10 Checking commit 796da149b721 (softfloat: Inline floatx80 compare 
specializations)
10/10 Checking commit c74e51d0ed0d (softfloat: Return bool from all 
classification predicates)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200519164957.26920-1-richard.hender...@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

RE: [EXTERNAL] Re: [PATCH] WHPX: Assigning maintainer for Windows Hypervisor Platform

2020-05-19 Thread Sunil Muthuswamy



> -Original Message-
> From: Stefan Weil 
> Sent: Thursday, February 20, 2020 11:54 PM
> To: Justin Terry (SF) ; Philippe Mathieu-Daudé 
> ; Sunil Muthuswamy
> ; Eduardo Habkost ; Paolo 
> Bonzini ; Richard
> Henderson 
> Cc: qemu-devel@nongnu.org
> Subject: Re: [EXTERNAL] Re: [PATCH] WHPX: Assigning maintainer for Windows 
> Hypervisor Platform
> 
> Am 19.02.20 um 16:50 schrieb Justin Terry (SF):
> 
> Hello Justin, hello Sunil,
> 
> just a reminder: we still have the problem with the proprietary license
> for the required Microsoft header files.
> 
> Can you estimate when this will be solved?
> 
> Regards,
> Stefan
> 

Adding Mike Battista, who is on the SDK team and can help provide some clarity 
around the questions about SDK licensing.

Re: [PATCH v6 1/5] hw/nvram/fw_cfg: Add the FW_CFG_DATA_GENERATOR interface

2020-05-19 Thread Laszlo Ersek

On 05/19/20 20:20, Philippe Mathieu-Daudé wrote:
> The FW_CFG_DATA_GENERATOR allow any object to product

(1) I suggest:

s/allow/allows/
s/product/produce/

> blob of data consumable by the fw_cfg device.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/hw/nvram/fw_cfg.h | 49 +++
>  hw/nvram/fw_cfg.c | 30 
>  2 files changed, 79 insertions(+)
> 
> diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
> index 25d9307018..74b4790fae 100644
> --- a/include/hw/nvram/fw_cfg.h
> +++ b/include/hw/nvram/fw_cfg.h
> @@ -9,11 +9,40 @@
>  #define TYPE_FW_CFG "fw_cfg"
>  #define TYPE_FW_CFG_IO  "fw_cfg_io"
>  #define TYPE_FW_CFG_MEM "fw_cfg_mem"
> +#define TYPE_FW_CFG_DATA_GENERATOR_INTERFACE "fw_cfg-data-generator"
>  
>  #define FW_CFG(obj) OBJECT_CHECK(FWCfgState,(obj), TYPE_FW_CFG)
>  #define FW_CFG_IO(obj)  OBJECT_CHECK(FWCfgIoState,  (obj), TYPE_FW_CFG_IO)
>  #define FW_CFG_MEM(obj) OBJECT_CHECK(FWCfgMemState, (obj), TYPE_FW_CFG_MEM)
>  
> +#define FW_CFG_DATA_GENERATOR_CLASS(class) \
> +OBJECT_CLASS_CHECK(FWCfgDataGeneratorClass, (class), \
> +   TYPE_FW_CFG_DATA_GENERATOR_INTERFACE)
> +#define FW_CFG_DATA_GENERATOR_GET_CLASS(obj) \
> +OBJECT_GET_CLASS(FWCfgDataGeneratorClass, (obj), \
> + TYPE_FW_CFG_DATA_GENERATOR_INTERFACE)
> +
> +typedef struct FWCfgDataGeneratorClass {
> +/*< private >*/
> +InterfaceClass parent_class;
> +/*< public >*/
> +
> +/**
> + * get_data:
> + * @obj: the object implementing this interface
> + *
> + * Returns: pointer to start of the generated item data
> + */
> +const void *(*get_data)(Object *obj);

I'm not familiar with QOM, so please excuse any dumb questions.

"const" suggests the blob returned remains owned by "obj"; that answers
the question whether the caller should attempt to free the blob. (The
answer is "no".)

(2) However, will this perhaps expose other functions, currently taking
non-const-qualified pointers, to which we'd like to pass the blob
returned by the above member function?

Because, then we'd have to cast away "const", and I find that much
uglier than removing the "const" from *here*, and adding a more verbose
comment as replacement.

Yes, this is clearly speculation -- IOW just a question. If all the
functions we're going to pass the return value to are fine with
pointer-to-const, then this interface should be OK.

(Obviously when I say "cast away const", I think of functions that do
not actually modify the object pointed-to by the non-const-qualified
pointer.)
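
To make the concern concrete, here is a small sketch under invented
names (demo_get_data() and count_nul_bytes() are hypothetical, not part
of the patch): a read-only consumer that happens to take a
non-const-qualified pointer forces a cast at the call site.

```c
#include <stddef.h>

/* Hypothetical consumer: takes a non-const pointer but only reads. */
static size_t count_nul_bytes(void *data, size_t len)
{
    const unsigned char *p = data;
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        n += (p[i] == 0);
    }
    return n;
}

/* Hypothetical generator: returns storage that it continues to own. */
static const void *demo_get_data(void)
{
    static const unsigned char blob[4] = { 1, 0, 2, 0 };

    return blob;
}

static size_t demo_use(void)
{
    const void *blob = demo_get_data();

    /* Without the cast, the compiler diagnoses the dropped qualifier. */
    return count_nul_bytes((void *)blob, 4);
}
```

If every consumer already accepts a pointer-to-const, no such cast is
needed and the "const" in the interface costs nothing.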

> +/**
> + * get_length:
> + * @obj: the object implementing this interface
> + *
> + * Returns: the size of the generated item data in bytes
> + */
> +size_t (*get_length)(Object *obj);
> +} FWCfgDataGeneratorClass;
> +
>  typedef struct fw_cfg_file FWCfgFile;
>  
>  #define FW_CFG_ORDER_OVERRIDE_VGA 70
> @@ -263,6 +292,26 @@ void fw_cfg_add_file_callback(FWCfgState *s, const char 
> *filename,
>  void *fw_cfg_modify_file(FWCfgState *s, const char *filename, void *data,
>   size_t len);
>  
> +/**
> + * fw_cfg_add_from_generator:
> + * @s: fw_cfg device being modified
> + * @filename: name of new fw_cfg file item
> + * @generator_id: name of object implementing FW_CFG_DATA_GENERATOR interface
> + * @errp: pointer to a NULL initialized error object
> + *
> + * Add a new NAMED fw_cfg item with the content generated from the
> + * @generator_id object. The data referenced by the starting pointer is 
> copied

(3) s/referenced by the starting pointer/generated by the @generator_id
object/

> + * into the data structure of the fw_cfg device.
> + * The next available (unused) selector key starting at FW_CFG_FILE_FIRST
> + * will be used; also, a new entry will be added to the file directory
> + * structure residing at key value FW_CFG_FILE_DIR, containing the item name,
> + * data size, and assigned selector key value.
> + *
> + * Returns: the size of the generated item data on success, -1 otherwise.

(4) I don't like ssize_t for a return value like this.

First, get_length() returns size_t, which may not be representable in an
ssize_t.

(Actually, it's worse than that; POSIX says, "the type ssize_t shall be
capable of storing values at least in the range [-1, {SSIZE_MAX}]" --
and if I run "getconf SSIZE_MAX", I get 32767. Indeed, _POSIX_SSIZE_MAX,
which is the minimum for any implementation's SSIZE_MAX, is 32767.)

Second, is a zero-sized blob useful in fw_cfg (from a generator)?

If it is not useful, then this function should return size_t, and use
retval=0 for signaling an error.

If a zero-sized blob is useful, then the function should return a bool
(in addition to producing "errp"), and output the blob size as a
separate parameter.
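
A minimal sketch of that second shape, with invented names
(DemoGenerator and demo_add_from_generator() are hypothetical, and the
"errp" handling is only hinted at in a comment), might be:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a generator: a blob plus its length. */
typedef struct DemoGenerator {
    const void *data;
    size_t len;
} DemoGenerator;

/*
 * Success or failure is the return value; the size is reported through
 * *len_out, so a zero-sized blob is distinguishable from an error.
 */
static bool demo_add_from_generator(const DemoGenerator *gen,
                                    size_t *len_out)
{
    if (gen == NULL || (gen->data == NULL && gen->len != 0)) {
        return false;       /* real code would also set errp here */
    }
    *len_out = gen->len;    /* full size_t range, no ssize_t narrowing */
    return true;
}
```

On failure *len_out is left untouched, so callers should only read it
after checking the return value.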

> + */
> +ssize_t fw_cfg_add_from_generator(FWCfgState *s, const char *filename,
> +  const

Re: [PATCH v2 10/10] hw/semihosting: Make the feature depend of TCG, and allow to disable it

2020-05-19 Thread Richard Henderson

On 5/15/20 10:08 AM, Philippe Mathieu-Daudé wrote:
> +++ b/hw/semihosting/Kconfig
> @@ -1,3 +1,5 @@
>  
> +# default is 'n'
>  config SEMIHOSTING
> -   bool
> +bool
> +depends on TCG
> diff --git a/target/arm/Kconfig b/target/arm/Kconfig
> new file mode 100644
> index 00..3224cac4ad
> --- /dev/null
> +++ b/target/arm/Kconfig
> @@ -0,0 +1,2 @@
> +config SEMIHOSTING
> +    default y if TCG

Do you really have to duplicate the TCG condition?


r~

Re: [PATCH v2 09/10] Makefile: Allow target-specific optional Kconfig

2020-05-19 Thread Richard Henderson

On 5/15/20 10:08 AM, Philippe Mathieu-Daudé wrote:
> Allow use of target-specific Kconfig file.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  Makefile | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson 

r~

Re: [PATCH v2 5/5] iotests: add commit top->base cases to 274

2020-05-19 Thread Eric Blake


On 5/19/20 4:25 PM, Vladimir Sementsov-Ogievskiy wrote:


$ ./qemu-img map --output=json top.qcow2
[{ "start": 0, "length": 1048576, "depth": 2, "zero": false, "data": true, "offset": 327680},
{ "start": 1048576, "length": 1048576, "depth": 0, "zero": true, "data": false}]


I think what we really want is:

[{ "start": 0, "length": 1048576, "depth": 2, "zero": false, "data": true, "offset": 327680},
{ "start": 1048576, "length": 1048576, "depth": 1, "zero": true, "data": false}]


because then we would be _accurately_ reporting that the zeroes that 
we read from 1m-2m come _because_ we read from mid (beyond EOF), which 
is different from our current answer that the zeroes come from top 
(they don't, because top deferred to mid). 


Right. This is exactly the logic which I bring to block_status_above and 
is_allocated_above by this series


If we fix up qemu-img map output to correctly report zeroes beyond EOF 
from the correct layer, will that also fix up the bug we are seeing in 
qemu-img commit?




No it will not fix it, because img_map has own implementation of 
block_status_above - get_block_status function in qemu-img.c, which goes 
through backing chain by itself, and is used only in img_map (not in 
img_convert). But you are right that it should be fixed too.


You are in a maze of twisty passages, all alike ;)

[Hope neither of us is eaten by a grue by the time we get this series in]

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v2 04/10] accel/tcg: Add stub for probe_access()

2020-05-19 Thread Richard Henderson

On 5/15/20 10:07 AM, Philippe Mathieu-Daudé wrote:
> From: Philippe Mathieu-Daudé 
> 
> The TCG helpers were added in b92e5a22ec3 in softmmu_template.h.
> probe_write() was added in there in 3b4afc9e75a to be moved out
> to accel/tcg/cputlb.c in 3b08f0a9254, and was later refactored
> as probe_access() in c25c283df0f.
> Since it is a TCG specific helper, add a stub to avoid failures
> when building without TCG, such as:
> 
>   target/arm/helper.o: In function `probe_read':
>   include/exec/exec-all.h:362: undefined reference to `probe_access'
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Richard Henderson 
> Cc: Emilio G. Cota 
> Cc: Alex Bennée 
> Cc: David Hildenbrand 
> ---
>  accel/stubs/tcg-stub.c | 7 +++
>  1 file changed, 7 insertions(+)

Reviewed-by: Richard Henderson 

r~

Re: [PATCH v2 1/5] block/io: fix bdrv_co_block_status_above

2020-05-19 Thread Eric Blake


On 5/19/20 4:13 PM, Vladimir Sementsov-Ogievskiy wrote:

19.05.2020 23:41, Eric Blake wrote:

On 5/19/20 2:54 PM, Vladimir Sementsov-Ogievskiy wrote:

bdrv_co_block_status_above has several problems with handling short
backing files:

1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but
without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
which produces these after-EOF zeros is inside requested backing
sequence.


That's intentional.  That portion of the guest-visible data reads as 
zero (BDRV_BLOCK_ZERO set) but was NOT read from the top layer, but 
rather synthesized by the block layer because it derived from the 
backing file but was beyond EOF of that backing layer 
(BDRV_BLOCK_ALLOCATED is clear).


Not in top yes. But _inside_ the requested base..top backing-chain-part. 
So it should be considered ALLOCATED, as we should not go to further 
backing.


Yes, I think I figured that out by patch 5.



Assume the following chain:

top    aa--
middle bb
base   

(so, middle is short)

block_status(top, 2) should return ZERO without ALLOCATED, as yes it's 
ZERO and yes, it's from another layer


block_status_above(top, base, 2) should return ZERO with ALLOCATED, as 
it's ZERO, and it's produced inside requested backing-chain-region, 
actually, it's produced because of short middle node. We must report 
ALLOCATED to show that we are not going to read from base.


Yes, that matches my intuition.  allocated_above says "where in the 
chain did we get the data, since it did not come from top", and the 
correct answer is "we got it from middle, due to synthesizing zero 
beyond EOF".  Okay, with that understanding in place, maybe this patch 
is right.  But I'll have to revisit it tomorrow on a fresh mind (it's 
too late in the day for me to be sure that I'm getting it all straight 
right now).
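
The semantics agreed on above can be written down as a toy model. This
is only a sketch of the behaviour described in this exchange, not of
block/io.c: DemoLayer, demo_status_above() and the DEMO_* flags are
invented names, and a "cluster" is just an array index.

```c
#include <stdbool.h>
#include <stddef.h>

enum { DEMO_ZERO = 1, DEMO_ALLOCATED = 2 };

/* Toy layer: len clusters long; alloc[i] says whether cluster i holds data. */
typedef struct DemoLayer {
    size_t len;
    const bool *alloc;
} DemoLayer;

/*
 * Toy status of one cluster. chain[0] is top, chain[n_total - 1] is the
 * deepest layer; only the first n_query layers belong to the request.
 * The caller must pass 1 <= n_query <= n_total and cluster < chain[0].len.
 */
static int demo_status_above(const DemoLayer *chain, size_t n_total,
                             size_t n_query, size_t cluster)
{
    for (size_t i = 0; i < n_query; i++) {
        if (cluster >= chain[i].len) {
            /* Layer i is short: zeroes are produced inside the request. */
            return DEMO_ZERO | DEMO_ALLOCATED;
        }
        if (chain[i].alloc[cluster]) {
            return DEMO_ALLOCATED;      /* data found in layer i */
        }
    }
    if (n_query < n_total && cluster >= chain[n_query].len) {
        /* Zeroes come from a short layer just below the request. */
        return DEMO_ZERO;
    }
    return 0;   /* unallocated here; deeper layers would decide */
}
```

With top four clusters long (first two allocated), middle two clusters
long and a full-length base, querying cluster 2 with n_query = 1 gives
DEMO_ZERO alone, while n_query = 2 (top and middle) gives
DEMO_ZERO | DEMO_ALLOCATED, matching the two cases spelled out above.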








2. With want_zero=false, it may return pnum=0 prior to actual EOF,
because of EOF of short backing file.


Do you have a reproducer for this?


No, I don't have one, but it seems possible at least with 
want_zero=false. I'll think of it tomorrow, too tired now.


In my experience, this is not possible.  Generally, if you request 
status that overlaps EOF of the backing, you get a response truncated 
to the end of the backing, and you are then likely to follow up with a 
subsequent status request starting from the underlying EOF which then 
sees the desired unallocated zeroes:


back 
top  yy--
request    ^^
response   ^^
request  
response 


If we can come up with a reproducer where allocated_above returns 
pnum=0, that would indeed prove my initial hesitation wrong (perhaps by:


back
mid1xx
mid2
mid3xx
top 

for various different start and base points within the chain?)





Fix these things, making logic about short backing files clearer.

Note that 154 output changed, because now bdrv_block_status_above don't


doesn't


merge unallocated zeros with zeros after EOF (which are actually
"allocated" in POV of read from backing-chain top) and is_zero() just
don't understand that the whole head or tail is zero. We may update
is_zero to call bdrv_block_status_above several times, or add flag to
bdrv_block_status_above that we are not interested in ALLOCATED flag,
so ranges with different ALLOCATED status may be merged, but actually,
it seems that we'd better don't care about this corner case.


This actually sounds like an avoidable regression.  :(


I don't see real problem in it. But it seems not hard to avoid it, so I 
will try to.


I guess my real reasoning is: "I spent a lot of time trying to tweak 
that test to not lose the fact that the tail of the image reads as 
zero", because it looks weird if we later resize the image but still 
have a glitch in the middle reporting one non-zero cluster out of a 
larger range all because of the shenanigans that occurred around the 
tail prior to resizing.



+++ b/block/io.c
@@ -2461,25 +2461,45 @@ static int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
  ret = bdrv_co_block_status(p, want_zero, offset, bytes, 
pnum, map,

 file);
  if (ret < 0) {
-    break;
+    return ret;
  }
-    if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+    if (*pnum == 0) {
+    if (first) {
+    return ret;
+    }
+
  /*
- * Reading beyond the end of the file continues to read
- * zeroes, but we can only widen the result to the
- * unallocated length we learned from an earlier
- * iteration.
+ * Reads from bs for the selected region will return 
zeroes,
+ * produced because the current level is short. We 
should consider

+ * it as allocated.


Why?  If we replaced the backing file to something longer (qemu-img 
rebase -u), we would WANT

[PATCH v3 9/9] target/riscv: Use a smaller guess size for no-MMU PMP

2020-05-19 Thread Alistair Francis

Signed-off-by: Alistair Francis 
---
 target/riscv/pmp.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 0e6b640fbd..607a991260 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -233,12 +233,16 @@ bool pmp_hart_has_privs(CPURISCVState *env, target_ulong 
addr,
 return true;
 }
 
-/*
- * if size is unknown (0), assume that all bytes
- * from addr to the end of the page will be accessed.
- */
 if (size == 0) {
-pmp_size = -(addr | TARGET_PAGE_MASK);
+if (!riscv_feature(env, RISCV_FEATURE_MMU)) {
+/*
+ * If size is unknown (0), assume that all bytes
+ * from addr to the end of the page will be accessed.
+ */
+pmp_size = -(addr | TARGET_PAGE_MASK);
+} else {
+pmp_size = sizeof(target_ulong);
+}
 } else {
 pmp_size = size;
 }
-- 
2.26.2
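
For readers unfamiliar with the "-(addr | TARGET_PAGE_MASK)" idiom in
the hunk above, here is a standalone sketch of the arithmetic. The
4 KiB page size, the 32-bit address type and the DEMO_* names are
assumptions made for illustration; they are not taken from the patch.

```c
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages and 32-bit addresses. */
#define DEMO_PAGE_BITS 12
#define DEMO_PAGE_MASK ((uint32_t)~((1u << DEMO_PAGE_BITS) - 1))

/*
 * OR-ing with the mask sets every bit above the page offset, so the
 * value becomes 2^32 - (page_size - offset). Negating modulo 2^32
 * leaves page_size - offset, the bytes remaining in the page.
 */
static uint32_t demo_bytes_to_page_end(uint32_t addr)
{
    return (uint32_t)-(addr | DEMO_PAGE_MASK);
}
```

Note that, as quoted, the page-end guess is taken when
RISCV_FEATURE_MMU is absent, which reads like the reverse of the
subject line; that may be worth double-checking against later
revisions.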

[PATCH v3 8/9] riscv/opentitan: Connect the UART device

2020-05-19 Thread Alistair Francis

Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
---
 include/hw/riscv/opentitan.h | 13 +
 hw/riscv/opentitan.c | 24 ++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/include/hw/riscv/opentitan.h b/include/hw/riscv/opentitan.h
index 8d6a09b696..825a3610bc 100644
--- a/include/hw/riscv/opentitan.h
+++ b/include/hw/riscv/opentitan.h
@@ -21,6 +21,7 @@
 
 #include "hw/riscv/riscv_hart.h"
 #include "hw/intc/ibex_plic.h"
+#include "hw/char/ibex_uart.h"
 
 #define TYPE_RISCV_IBEX_SOC "riscv.lowrisc.ibex.soc"
 #define RISCV_IBEX_SOC(obj) \
@@ -33,6 +34,7 @@ typedef struct LowRISCIbexSoCState {
 /*< public >*/
 RISCVHartArrayState cpus;
 IbexPlicState plic;
+IbexUartState uart;
 
 MemoryRegion flash_mem;
 MemoryRegion rom;
@@ -63,4 +65,15 @@ enum {
 IBEX_USBDEV,
 };
 
+enum {
+IBEX_UART_RX_PARITY_ERR_IRQ = 0x28,
+IBEX_UART_RX_TIMEOUT_IRQ = 0x27,
+IBEX_UART_RX_BREAK_ERR_IRQ = 0x26,
+IBEX_UART_RX_FRAME_ERR_IRQ = 0x25,
+IBEX_UART_RX_OVERFLOW_IRQ = 0x24,
+IBEX_UART_TX_EMPTY_IRQ = 0x23,
+IBEX_UART_RX_WATERMARK_IRQ = 0x22,
+IBEX_UART_TX_WATERMARK_IRQ = 0x21
+};
+
 #endif
diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c
index 3926321d8c..a6c0b949ca 100644
--- a/hw/riscv/opentitan.c
+++ b/hw/riscv/opentitan.c
@@ -96,6 +96,9 @@ static void riscv_lowrisc_ibex_soc_init(Object *obj)
 
 sysbus_init_child_obj(obj, "plic", &s->plic,
   sizeof(s->plic), TYPE_IBEX_PLIC);
+
+sysbus_init_child_obj(obj, "uart", &s->uart,
+  sizeof(s->uart), TYPE_IBEX_UART);
 }
 
 static void riscv_lowrisc_ibex_soc_realize(DeviceState *dev_soc, Error **errp)
@@ -137,8 +140,25 @@ static void riscv_lowrisc_ibex_soc_realize(DeviceState 
*dev_soc, Error **errp)
 busdev = SYS_BUS_DEVICE(dev);
 sysbus_mmio_map(busdev, 0, memmap[IBEX_PLIC].base);
 
-create_unimplemented_device("riscv.lowrisc.ibex.uart",
-memmap[IBEX_UART].base, memmap[IBEX_UART].size);
+/* UART */
+dev = DEVICE(&(s->uart));
+qdev_prop_set_chr(dev, "chardev", serial_hd(0));
+object_property_set_bool(OBJECT(&s->uart), true, "realized", &err);
+if (err != NULL) {
+error_propagate(errp, err);
+return;
+}
+busdev = SYS_BUS_DEVICE(dev);
+sysbus_mmio_map(busdev, 0, memmap[IBEX_UART].base);
+sysbus_connect_irq(busdev, 0, qdev_get_gpio_in(DEVICE(&s->plic),
+   IBEX_UART_TX_WATERMARK_IRQ));
+sysbus_connect_irq(busdev, 1, qdev_get_gpio_in(DEVICE(&s->plic),
+   IBEX_UART_RX_WATERMARK_IRQ));
+sysbus_connect_irq(busdev, 2, qdev_get_gpio_in(DEVICE(&s->plic),
+   IBEX_UART_TX_EMPTY_IRQ));
+sysbus_connect_irq(busdev, 3, qdev_get_gpio_in(DEVICE(&s->plic),
+   IBEX_UART_RX_OVERFLOW_IRQ));
+
 create_unimplemented_device("riscv.lowrisc.ibex.gpio",
 memmap[IBEX_GPIO].base, memmap[IBEX_GPIO].size);
 create_unimplemented_device("riscv.lowrisc.ibex.spi",
-- 
2.26.2

[PATCH v3 7/9] riscv/opentitan: Connect the PLIC device

2020-05-19 Thread Alistair Francis

Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
---
 include/hw/riscv/opentitan.h |  3 +++
 hw/riscv/opentitan.c | 19 +--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/hw/riscv/opentitan.h b/include/hw/riscv/opentitan.h
index 15a3d87ed0..8d6a09b696 100644
--- a/include/hw/riscv/opentitan.h
+++ b/include/hw/riscv/opentitan.h
@@ -20,6 +20,7 @@
 #define HW_OPENTITAN_H
 
 #include "hw/riscv/riscv_hart.h"
+#include "hw/intc/ibex_plic.h"
 
 #define TYPE_RISCV_IBEX_SOC "riscv.lowrisc.ibex.soc"
 #define RISCV_IBEX_SOC(obj) \
@@ -31,6 +32,8 @@ typedef struct LowRISCIbexSoCState {
 
 /*< public >*/
 RISCVHartArrayState cpus;
+IbexPlicState plic;
+
 MemoryRegion flash_mem;
 MemoryRegion rom;
 } LowRISCIbexSoCState;
diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c
index c00f0720ab..3926321d8c 100644
--- a/hw/riscv/opentitan.c
+++ b/hw/riscv/opentitan.c
@@ -25,6 +25,7 @@
 #include "hw/misc/unimp.h"
 #include "hw/riscv/boot.h"
 #include "exec/address-spaces.h"
+#include "sysemu/sysemu.h"
 
 static const struct MemmapEntry {
 hwaddr base;
@@ -92,6 +93,9 @@ static void riscv_lowrisc_ibex_soc_init(Object *obj)
 object_initialize_child(obj, "cpus", &s->cpus,
 sizeof(s->cpus), TYPE_RISCV_HART_ARRAY,
 &error_abort, NULL);
+
+sysbus_init_child_obj(obj, "plic", &s->plic,
+  sizeof(s->plic), TYPE_IBEX_PLIC);
 }
 
 static void riscv_lowrisc_ibex_soc_realize(DeviceState *dev_soc, Error **errp)
@@ -100,6 +104,9 @@ static void riscv_lowrisc_ibex_soc_realize(DeviceState 
*dev_soc, Error **errp)
 MachineState *ms = MACHINE(qdev_get_machine());
 LowRISCIbexSoCState *s = RISCV_IBEX_SOC(dev_soc);
 MemoryRegion *sys_mem = get_system_memory();
+DeviceState *dev;
+SysBusDevice *busdev;
+Error *err = NULL;
 
 object_property_set_str(OBJECT(&s->cpus), ms->cpu_type, "cpu-type",
 &error_abort);
@@ -120,6 +127,16 @@ static void riscv_lowrisc_ibex_soc_realize(DeviceState 
*dev_soc, Error **errp)
 memory_region_add_subregion(sys_mem, memmap[IBEX_FLASH].base,
 &s->flash_mem);
 
+/* PLIC */
+dev = DEVICE(&s->plic);
+object_property_set_bool(OBJECT(&s->plic), true, "realized", &err);
+if (err != NULL) {
+error_propagate(errp, err);
+return;
+}
+busdev = SYS_BUS_DEVICE(dev);
+sysbus_mmio_map(busdev, 0, memmap[IBEX_PLIC].base);
+
 create_unimplemented_device("riscv.lowrisc.ibex.uart",
 memmap[IBEX_UART].base, memmap[IBEX_UART].size);
 create_unimplemented_device("riscv.lowrisc.ibex.gpio",
@@ -134,8 +151,6 @@ static void riscv_lowrisc_ibex_soc_realize(DeviceState 
*dev_soc, Error **errp)
 memmap[IBEX_AES].base, memmap[IBEX_AES].size);
 create_unimplemented_device("riscv.lowrisc.ibex.hmac",
 memmap[IBEX_HMAC].base, memmap[IBEX_HMAC].size);
-create_unimplemented_device("riscv.lowrisc.ibex.plic",
-memmap[IBEX_PLIC].base, memmap[IBEX_PLIC].size);
 create_unimplemented_device("riscv.lowrisc.ibex.pinmux",
 memmap[IBEX_PINMUX].base, memmap[IBEX_PINMUX].size);
 create_unimplemented_device("riscv.lowrisc.ibex.alert_handler",
-- 
2.26.2

[PATCH v3 5/9] hw/char: Initial commit of Ibex UART

2020-05-19 Thread Alistair Francis

This is the initial commit of the Ibex UART device. Serial TX is
working, while RX has been implemented but is untested.

This is based on the documentation from:
https://docs.opentitan.org/hw/ip/uart/doc/

Signed-off-by: Alistair Francis 
---
 include/hw/char/ibex_uart.h | 110 
 hw/char/ibex_uart.c | 492 
 MAINTAINERS |   2 +
 hw/char/Makefile.objs   |   1 +
 hw/riscv/Kconfig|   4 +
 5 files changed, 609 insertions(+)
 create mode 100644 include/hw/char/ibex_uart.h
 create mode 100644 hw/char/ibex_uart.c

diff --git a/include/hw/char/ibex_uart.h b/include/hw/char/ibex_uart.h
new file mode 100644
index 00..2bec772615
--- /dev/null
+++ b/include/hw/char/ibex_uart.h
@@ -0,0 +1,110 @@
+/*
+ * QEMU lowRISC Ibex UART device
+ *
+ * Copyright (c) 2020 Western Digital
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef HW_IBEX_UART_H
+#define HW_IBEX_UART_H
+
+#include "hw/sysbus.h"
+#include "chardev/char-fe.h"
+#include "qemu/timer.h"
+
+#define IBEX_UART_INTR_STATE   0x00
+#define INTR_STATE_TX_WATERMARK (1 << 0)
+#define INTR_STATE_RX_WATERMARK (1 << 1)
+#define INTR_STATE_TX_EMPTY (1 << 2)
+#define INTR_STATE_RX_OVERFLOW  (1 << 3)
+#define IBEX_UART_INTR_ENABLE  0x04
+#define IBEX_UART_INTR_TEST    0x08
+
+#define IBEX_UART_CTRL 0x0c
+#define UART_CTRL_TX_ENABLE (1 << 0)
+#define UART_CTRL_RX_ENABLE (1 << 1)
+#define UART_CTRL_NF (1 << 2)
+#define UART_CTRL_SLPBK (1 << 4)
+#define UART_CTRL_LLPBK (1 << 5)
+#define UART_CTRL_PARITY_EN (1 << 6)
+#define UART_CTRL_PARITY_ODD (1 << 7)
+#define UART_CTRL_RXBLVL (3 << 8)
+#define UART_CTRL_NCO   (0x << 16)
+
+#define IBEX_UART_STATUS   0x10
+#define UART_STATUS_TXFULL  (1 << 0)
+#define UART_STATUS_RXFULL  (1 << 1)
+#define UART_STATUS_TXEMPTY (1 << 2)
+#define UART_STATUS_RXIDLE  (1 << 4)
+#define UART_STATUS_RXEMPTY (1 << 5)
+
+#define IBEX_UART_RDATA        0x14
+#define IBEX_UART_WDATA        0x18
+
+#define IBEX_UART_FIFO_CTRL    0x1c
+#define FIFO_CTRL_RXRST  (1 << 0)
+#define FIFO_CTRL_TXRST  (1 << 1)
+#define FIFO_CTRL_RXILVL (7 << 2)
+#define FIFO_CTRL_RXILVL_SHIFT   (2)
+#define FIFO_CTRL_TXILVL (3 << 5)
+#define FIFO_CTRL_TXILVL_SHIFT   (5)
+
+#define IBEX_UART_FIFO_STATUS  0x20
+#define IBEX_UART_OVRD 0x24
+#define IBEX_UART_VAL  0x28
+#define IBEX_UART_TIMEOUT_CTRL 0x2c
+
+#define IBEX_UART_TX_FIFO_SIZE 16
+
+#define TYPE_IBEX_UART "ibex-uart"
+#define IBEX_UART(obj) \
+OBJECT_CHECK(IbexUartState, (obj), TYPE_IBEX_UART)
+
+typedef struct {
+/* <private> */
+SysBusDevice parent_obj;
+
+/* <public> */
+MemoryRegion mmio;
+
+uint8_t tx_fifo[IBEX_UART_TX_FIFO_SIZE];
+uint32_t tx_level;
+
+QEMUTimer *fifo_trigger_handle;
+uint64_t char_tx_time;
+
+uint32_t uart_intr_state;
+uint32_t uart_intr_enable;
+uint32_t uart_ctrl;
+uint32_t uart_status;
+uint32_t uart_rdata;
+uint32_t uart_fifo_ctrl;
+uint32_t uart_fifo_status;
+uint32_t uart_ovrd;
+uint32_t uart_val;
+uint32_t uart_timeout_ctrl;
+
+CharBackend chr;
+qemu_irq tx_watermark;
+qemu_irq rx_watermark;
+qemu_irq tx_empty;
+qemu_irq rx_overflow;
+} IbexUartState;
+#endif /* HW_IBEX_UART_H */
diff --git a/hw/char/ibex_uart.c b/hw/char/ibex_uart.c
new file mode 100644
index 00..c416325d73
--- /dev/null
+++ b/hw/char/ibex_uart.c
@@ -0,0 +1,492 @@
+/*
+ * QEMU lowRISC Ibex UART device
+ *
+ * Copyright (c) 2020 Western Digital
+ *
+ * For details check the documentation here:
+ *https://docs.opentitan.org/hw/ip/uart/doc/
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"),

[PATCH v3 3/9] target/riscv: Add the lowRISC Ibex CPU

2020-05-19 Thread Alistair Francis

Ibex is a small and efficient, 32-bit, in-order RISC-V core with
a 2-stage pipeline that implements the RV32IMC instruction set
architecture.

For more details on lowRISC see here:
https://github.com/lowRISC/ibex

Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
---
 target/riscv/cpu.h |  1 +
 target/riscv/cpu.c | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index d0e7f5b9c5..8733d7467f 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -35,6 +35,7 @@
 #define TYPE_RISCV_CPU_ANY  RISCV_CPU_TYPE_NAME("any")
 #define TYPE_RISCV_CPU_BASE32   RISCV_CPU_TYPE_NAME("rv32")
 #define TYPE_RISCV_CPU_BASE64   RISCV_CPU_TYPE_NAME("rv64")
+#define TYPE_RISCV_CPU_IBEX RISCV_CPU_TYPE_NAME("lowrisc-ibex")
 #define TYPE_RISCV_CPU_SIFIVE_E31   RISCV_CPU_TYPE_NAME("sifive-e31")
 #define TYPE_RISCV_CPU_SIFIVE_E34   RISCV_CPU_TYPE_NAME("sifive-e34")
 #define TYPE_RISCV_CPU_SIFIVE_E51   RISCV_CPU_TYPE_NAME("sifive-e51")
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 5eb3c02735..eb2bbc87ae 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -156,6 +156,15 @@ static void rv32gcsu_priv1_10_0_cpu_init(Object *obj)
 set_feature(env, RISCV_FEATURE_PMP);
 }
 
+static void rv32imcu_nommu_cpu_init(Object *obj)
+{
+CPURISCVState *env = &RISCV_CPU(obj)->env;
+set_misa(env, RV32 | RVI | RVM | RVC | RVU);
+set_priv_version(env, PRIV_VERSION_1_10_0);
+set_resetvec(env, 0x8090);
+set_feature(env, RISCV_FEATURE_PMP);
+}
+
 static void rv32imacu_nommu_cpu_init(Object *obj)
 {
 CPURISCVState *env = &RISCV_CPU(obj)->env;
@@ -619,6 +628,7 @@ static const TypeInfo riscv_cpu_type_infos[] = {
 DEFINE_CPU(TYPE_RISCV_CPU_ANY,  riscv_any_cpu_init),
 #if defined(TARGET_RISCV32)
 DEFINE_CPU(TYPE_RISCV_CPU_BASE32,   riscv_base32_cpu_init),
+DEFINE_CPU(TYPE_RISCV_CPU_IBEX, rv32imcu_nommu_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_SIFIVE_E31,   rv32imacu_nommu_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_SIFIVE_E34,   rv32imafcu_nommu_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_SIFIVE_U34,   rv32gcsu_priv1_10_0_cpu_init),
-- 
2.26.2
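
For readers following the set_misa() call in this patch: the misa value is a
bitmask with one bit per extension letter, plus an XLEN field in its top two
bits. The block below is a minimal standalone sketch of that encoding for a
32-bit hart. The names ext_bit(), MISA_RV32 and ibex_like_misa() are
illustrative assumptions and are not taken from the patch or from QEMU.

```c
#include <stdint.h>

/* Hypothetical helper: bit for a single-letter ISA extension ('A' is bit 0). */
static uint32_t ext_bit(char letter)
{
    return (uint32_t)1 << (letter - 'A');
}

/* MXL field for a 32-bit hart: value 1 stored in the top two bits of misa. */
#define MISA_RV32 ((uint32_t)1 << 30)

/* Compose a misa value for an RV32IMC core that also implements user mode. */
static uint32_t ibex_like_misa(void)
{
    return MISA_RV32 | ext_bit('I') | ext_bit('M') | ext_bit('C') | ext_bit('U');
}
```

Calling ibex_like_misa() yields a value with bits 2 (C), 8 (I), 12 (M), 20 (U)
and 30 (MXL) set; the A, F and D bits stay clear, which is the difference from
the existing rv32imacu and rv32imafcu variants.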

[PATCH v3 6/9] hw/intc: Initial commit of lowRISC Ibex PLIC

2020-05-19 Thread Alistair Francis

The Ibex core contains a PLIC that, although similar to the one in the
RISC-V spec, is not RISC-V spec compliant.

This patch implements an Ibex PLIC in a somewhat generic way.

As the current RISC-V PLIC needs tidying up, my hope is that as the Ibex
PLIC moves towards spec compliance this PLIC implementation can be
updated until it can replace the current PLIC.

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/hw/intc/ibex_plic.h |  63 +
 hw/intc/ibex_plic.c | 261 
 MAINTAINERS |   2 +
 hw/intc/Makefile.objs   |   1 +
 4 files changed, 327 insertions(+)
 create mode 100644 include/hw/intc/ibex_plic.h
 create mode 100644 hw/intc/ibex_plic.c

diff --git a/include/hw/intc/ibex_plic.h b/include/hw/intc/ibex_plic.h
new file mode 100644
index 00..ddc7909903
--- /dev/null
+++ b/include/hw/intc/ibex_plic.h
@@ -0,0 +1,63 @@
+/*
+ * QEMU RISC-V lowRISC Ibex PLIC
+ *
+ * Copyright (c) 2020 Western Digital
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_IBEX_PLIC_H
+#define HW_IBEX_PLIC_H
+
+#include "hw/sysbus.h"
+
+#define TYPE_IBEX_PLIC "ibex-plic"
+#define IBEX_PLIC(obj) \
+OBJECT_CHECK(IbexPlicState, (obj), TYPE_IBEX_PLIC)
+
+typedef struct IbexPlicState {
+/*< private >*/
+SysBusDevice parent_obj;
+
+/*< public >*/
+MemoryRegion mmio;
+
+uint32_t *pending;
+uint32_t *source;
+uint32_t *priority;
+uint32_t *enable;
+uint32_t threshold;
+uint32_t claim;
+
+/* config */
+uint32_t num_cpus;
+uint32_t num_sources;
+
+uint32_t pending_base;
+uint32_t pending_num;
+
+uint32_t source_base;
+uint32_t source_num;
+
+uint32_t priority_base;
+uint32_t priority_num;
+
+uint32_t enable_base;
+uint32_t enable_num;
+
+uint32_t threshold_base;
+
+uint32_t claim_base;
+} IbexPlicState;
+
+#endif /* HW_IBEX_PLIC_H */
diff --git a/hw/intc/ibex_plic.c b/hw/intc/ibex_plic.c
new file mode 100644
index 00..41079518c6
--- /dev/null
+++ b/hw/intc/ibex_plic.c
@@ -0,0 +1,261 @@
+/*
+ * QEMU RISC-V lowRISC Ibex PLIC
+ *
+ * Copyright (c) 2020 Western Digital
+ *
+ * Documentation available: https://docs.opentitan.org/hw/ip/rv_plic/doc/
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+#include "hw/core/cpu.h"
+#include "hw/boards.h"
+#include "hw/pci/msi.h"
+#include "target/riscv/cpu_bits.h"
+#include "target/riscv/cpu.h"
+#include "hw/intc/ibex_plic.h"
+
+static bool addr_between(uint32_t addr, uint32_t base, uint32_t num)
+{
+uint32_t end = base + (num * 0x04);
+
+if (addr >= base && addr < end) {
+return true;
+}
+
+return false;
+}
+
+static void ibex_plic_irqs_set_pending(IbexPlicState *s, int irq, bool level)
+{
+int pending_num = irq / 32;
+
+s->pending[pending_num] |= level << (irq % 32);
+}
+
+static bool ibex_plic_irqs_pending(IbexPlicState *s, uint32_t context)
+{
+int i;
+
+for (i = 0; i < s->pending_num; i++) {
+uint32_t irq_num = ctz64(s->pending[i]) + (i * 32);
+
+if (!(s->pending[i] & s->enable[i])) {
+/* No pending and enabled IRQ */
+continue;
+}
+
+if (s->priority[irq_num] > s->threshold) {
+if (!s->claim) {
+s->claim = irq_num;
+}
+return true;
+}
+}
+
+return false;
+}
+
+static void ibex_plic_update(IbexPlicState *s)
+{
+CPUState *cpu;
+int level, i;
+
+for (i = 0; i < s->num_cpus; i++) {
+cpu = qemu_get_cpu(i);
+
+if (!cpu) {
+continue;
+}
+
+level = ibex_plic_irqs_pending(s, 0);
+
+riscv_cpu_update_mip(RISCV_CPU(cpu), MIP_MEIP, BOOL_TO_MASK(level));
+}
+}
+
+static

[PATCH v3 1/9] riscv/boot: Add a missing header include

2020-05-19 Thread Alistair Francis

Currently every C file that includes boot.h also includes loader.h
before it, which is why the build works fine. We should be able to
include just boot.h, though, so this is a small fixup to allow that.

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Bin Meng 
---
 include/hw/riscv/boot.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/hw/riscv/boot.h b/include/hw/riscv/boot.h
index 474a940ad5..9daa98da08 100644
--- a/include/hw/riscv/boot.h
+++ b/include/hw/riscv/boot.h
@@ -21,6 +21,7 @@
 #define RISCV_BOOT_H
 
 #include "exec/cpu-defs.h"
+#include "hw/loader.h"
 
 void riscv_find_and_load_firmware(MachineState *machine,
   const char *default_machine_firmware,
-- 
2.26.2

[PATCH v3 4/9] riscv: Initial commit of OpenTitan machine

2020-05-19 Thread Alistair Francis

This adds a barebone OpenTitan machine to QEMU.

Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
---
 default-configs/riscv32-softmmu.mak |   1 +
 default-configs/riscv64-softmmu.mak |  11 +-
 include/hw/riscv/opentitan.h|  63 +++
 hw/riscv/opentitan.c| 169 
 MAINTAINERS |   9 ++
 hw/riscv/Kconfig|   5 +
 hw/riscv/Makefile.objs  |   1 +
 7 files changed, 258 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/riscv/opentitan.h
 create mode 100644 hw/riscv/opentitan.c

diff --git a/default-configs/riscv32-softmmu.mak 
b/default-configs/riscv32-softmmu.mak
index 1ae077ed87..94a236c9c2 100644
--- a/default-configs/riscv32-softmmu.mak
+++ b/default-configs/riscv32-softmmu.mak
@@ -10,3 +10,4 @@ CONFIG_SPIKE=y
 CONFIG_SIFIVE_E=y
 CONFIG_SIFIVE_U=y
 CONFIG_RISCV_VIRT=y
+CONFIG_OPENTITAN=y
diff --git a/default-configs/riscv64-softmmu.mak 
b/default-configs/riscv64-softmmu.mak
index 235c6f473f..aaf6d735bb 100644
--- a/default-configs/riscv64-softmmu.mak
+++ b/default-configs/riscv64-softmmu.mak
@@ -1,3 +1,12 @@
 # Default configuration for riscv64-softmmu
 
-include riscv32-softmmu.mak
+# Uncomment the following lines to disable these optional devices:
+#
+#CONFIG_PCI_DEVICES=n
+
+# Boards:
+#
+CONFIG_SPIKE=y
+CONFIG_SIFIVE_E=y
+CONFIG_SIFIVE_U=y
+CONFIG_RISCV_VIRT=y
diff --git a/include/hw/riscv/opentitan.h b/include/hw/riscv/opentitan.h
new file mode 100644
index 00..15a3d87ed0
--- /dev/null
+++ b/include/hw/riscv/opentitan.h
@@ -0,0 +1,63 @@
+/*
+ * QEMU RISC-V Board Compatible with OpenTitan FPGA platform
+ *
+ * Copyright (c) 2020 Western Digital
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_OPENTITAN_H
+#define HW_OPENTITAN_H
+
+#include "hw/riscv/riscv_hart.h"
+
+#define TYPE_RISCV_IBEX_SOC "riscv.lowrisc.ibex.soc"
+#define RISCV_IBEX_SOC(obj) \
+OBJECT_CHECK(LowRISCIbexSoCState, (obj), TYPE_RISCV_IBEX_SOC)
+
+typedef struct LowRISCIbexSoCState {
+/*< private >*/
+SysBusDevice parent_obj;
+
+/*< public >*/
+RISCVHartArrayState cpus;
+MemoryRegion flash_mem;
+MemoryRegion rom;
+} LowRISCIbexSoCState;
+
+typedef struct OpenTitanState {
+/*< private >*/
+SysBusDevice parent_obj;
+
+/*< public >*/
+LowRISCIbexSoCState soc;
+} OpenTitanState;
+
+enum {
+IBEX_ROM,
+IBEX_RAM,
+IBEX_FLASH,
+IBEX_UART,
+IBEX_GPIO,
+IBEX_SPI,
+IBEX_FLASH_CTRL,
+IBEX_RV_TIMER,
+IBEX_AES,
+IBEX_HMAC,
+IBEX_PLIC,
+IBEX_PINMUX,
+IBEX_ALERT_HANDLER,
+IBEX_USBDEV,
+};
+
+#endif
diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c
new file mode 100644
index 00..c00f0720ab
--- /dev/null
+++ b/hw/riscv/opentitan.c
@@ -0,0 +1,169 @@
+/*
+ * QEMU RISC-V Board Compatible with OpenTitan FPGA platform
+ *
+ * Copyright (c) 2020 Western Digital
+ *
+ * Provides a board compatible with the OpenTitan FPGA platform:
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/riscv/opentitan.h"
+#include "qapi/error.h"
+#include "hw/boards.h"
+#include "hw/misc/unimp.h"
+#include "hw/riscv/boot.h"
+#include "exec/address-spaces.h"
+
+static const struct MemmapEntry {
+hwaddr base;
+hwaddr size;
+} ibex_memmap[] = {
+[IBEX_ROM] ={  0x8000,   0xc000 },
+[IBEX_RAM] ={  0x1000,  0x1 },
+[IBEX_FLASH] =  {  0x2000,  0x8 },
+[IBEX_UART] =   {  0x4000,  0x1 },
+[IBEX_GPIO] =   {  0x4001,  0x1 },
+[IBEX_SPI] ={  0x4002,  0x1 },
+[IBEX_FLASH_CTRL] = {  0x4003,  0x1 },
+[IBEX_PINMUX] = {  0x4007,  0x1 },
+[IBEX_RV_TIMER] =   {  0x4008,  0x1 },
+[IBEX_PLIC] =

[PATCH v3 2/9] target/riscv: Don't overwrite the reset vector

2020-05-19 Thread Alistair Francis

The reset vector is set in the init function, so don't set it again in
realize.

Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 059d71f2c7..5eb3c02735 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -133,6 +133,7 @@ static void riscv_base32_cpu_init(Object *obj)
 CPURISCVState *env = &RISCV_CPU(obj)->env;
 /* We set this in the realise function */
 set_misa(env, 0);
+set_resetvec(env, DEFAULT_RSTVEC);
 }
 
 static void rv32gcsu_priv1_09_1_cpu_init(Object *obj)
@@ -180,6 +181,7 @@ static void riscv_base64_cpu_init(Object *obj)
 CPURISCVState *env = &RISCV_CPU(obj)->env;
 /* We set this in the realise function */
 set_misa(env, 0);
+set_resetvec(env, DEFAULT_RSTVEC);
 }
 
 static void rv64gcsu_priv1_09_1_cpu_init(Object *obj)
@@ -399,7 +401,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 }
 
 set_priv_version(env, priv_version);
-set_resetvec(env, DEFAULT_RSTVEC);
 
 if (cpu->cfg.mmu) {
 set_feature(env, RISCV_FEATURE_MMU);
-- 
2.26.2
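
The ordering problem this patch addresses can be shown with a toy model: a
per-model init function chooses a reset vector, and a realize step that
unconditionally writes the default would discard that choice. This is only a
sketch. ToyCpu and the toy_* functions are hypothetical, and the value given
to DEFAULT_RSTVEC here is assumed for illustration. The 0x8090 value is the
one shown for the Ibex CPU in patch 3/9.

```c
#include <stdint.h>

#define DEFAULT_RSTVEC 0x1000u  /* assumed value, for illustration only */

typedef struct {
    uint32_t resetvec;
} ToyCpu;

/* Per-model init: each CPU type picks its own reset vector. */
static void toy_base_init(ToyCpu *cpu)   { cpu->resetvec = DEFAULT_RSTVEC; }
static void toy_custom_init(ToyCpu *cpu) { cpu->resetvec = 0x8090u; }

/* Old behaviour: realize overwrites whatever init selected. */
static void toy_realize_old(ToyCpu *cpu) { cpu->resetvec = DEFAULT_RSTVEC; }

/* New behaviour: realize leaves the vector chosen at init untouched. */
static void toy_realize_new(ToyCpu *cpu) { (void)cpu; }
```

With toy_realize_old() a CPU set up by toy_custom_init() ends up at the
default vector, while toy_realize_new() preserves 0x8090. The base CPU types
still get the default because their init functions now set it explicitly.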

[PATCH v3 0/9] RISC-V Add the OpenTitan Machine

2020-05-19 Thread Alistair Francis

OpenTitan is an open source silicon Root of Trust (RoT) project. This
series adds initial support for the OpenTitan machine to QEMU.

This series adds the Ibex CPU to the QEMU RISC-V target. It then adds the
OpenTitan machine, the Ibex UART and the Ibex PLIC.

The UART has been tested sending and receiving data.

With this series QEMU can boot the OpenTitan ROM, Tock OS and a Tock
userspace app.

The Ibex PLIC is similar to the RISC-V PLIC (and is based on the QEMU
implementation) with some differences. The hope is that the Ibex PLIC
will converge to follow the RISC-V spec. As that happens I want to
update the QEMU Ibex PLIC and hopefully eventually replace the current
PLIC as the implementation is a little overly complex.

For more details on OpenTitan, see here: https://docs.opentitan.org/

v3:
 - Don't set the reset vector in realise
 - Small fixes pointed out in review
v2:
 - Rebase on master
 - Get uart receive working



Alistair Francis (9):
  riscv/boot: Add a missing header include
  target/riscv: Don't overwrite the reset vector
  target/riscv: Add the lowRISC Ibex CPU
  riscv: Initial commit of OpenTitan machine
  hw/char: Initial commit of Ibex UART
  hw/intc: Initial commit of lowRISC Ibex PLIC
  riscv/opentitan: Connect the PLIC device
  riscv/opentitan: Connect the UART device
  target/riscv: Use a smaller guess size for no-MMU PMP

 default-configs/riscv32-softmmu.mak |   1 +
 default-configs/riscv64-softmmu.mak |  11 +-
 include/hw/char/ibex_uart.h | 110 +++
 include/hw/intc/ibex_plic.h |  63 
 include/hw/riscv/boot.h |   1 +
 include/hw/riscv/opentitan.h|  79 +
 target/riscv/cpu.h  |   1 +
 hw/char/ibex_uart.c | 492 
 hw/intc/ibex_plic.c | 261 +++
 hw/riscv/opentitan.c| 204 
 target/riscv/cpu.c  |  13 +-
 target/riscv/pmp.c  |  14 +-
 MAINTAINERS |  13 +
 hw/char/Makefile.objs   |   1 +
 hw/intc/Makefile.objs   |   1 +
 hw/riscv/Kconfig|   9 +
 hw/riscv/Makefile.objs  |   1 +
 17 files changed, 1268 insertions(+), 7 deletions(-)
 create mode 100644 include/hw/char/ibex_uart.h
 create mode 100644 include/hw/intc/ibex_plic.h
 create mode 100644 include/hw/riscv/opentitan.h
 create mode 100644 hw/char/ibex_uart.c
 create mode 100644 hw/intc/ibex_plic.c
 create mode 100644 hw/riscv/opentitan.c

-- 
2.26.2

Re: [PATCH 53/55] qdev: Convert bus-less devices to qdev_realize() with Coccinelle

2020-05-19 Thread Alistair Francis

On Tue, May 19, 2020 at 8:29 AM Markus Armbruster  wrote:
>
> All remaining conversions to qdev_realize() are for bus-less devices.
> Coccinelle script:
>
> // only correct for bus-less @dev!
>
> @@
> expression errp;
> expression dev;
> @@
> -qdev_init_nofail(dev);
> +qdev_realize(dev, NULL, &error_fatal);
>
> @ depends on !(file in "hw/core/qdev.c") && !(file in "hw/core/bus.c")@
> expression errp;
> expression dev;
> symbol true;
> @@
> -object_property_set_bool(OBJECT(dev), true, "realized", errp);
> +qdev_realize(DEVICE(dev), NULL, errp);
>
> @ depends on !(file in "hw/core/qdev.c") && !(file in "hw/core/bus.c")@
> expression errp;
> expression dev;
> symbol true;
> @@
> -object_property_set_bool(dev, true, "realized", errp);
> +qdev_realize(DEVICE(dev), NULL, errp);
>
> Note that Coccinelle chokes on ARMSSE typedef vs. macro in
> hw/arm/armsse.c.  Worked around by temporarily renaming the macro for
> the spatch run.
>
> Signed-off-by: Markus Armbruster 

Acked-by: Alistair Francis 

Alistair

> --
> 2.21.1
>
>

Re: [PATCH 45/55] sysbus: Convert qdev_set_parent_bus() use with Coccinelle, part 1

2020-05-19 Thread Alistair Francis

On Tue, May 19, 2020 at 8:22 AM Markus Armbruster  wrote:
>
> I'm converting from qdev_set_parent_bus()/realize to qdev_realize();
> recent commit "qdev: Convert uses of qdev_set_parent_bus() with
> Coccinelle" explains why.
>
> sysbus_init_child_obj() is a wrapper around
> object_initialize_child_with_props() and qdev_set_parent_bus().  It
> passes no properties.
>
> Convert sysbus_init_child_obj()/realize to object_initialize_child()/
> qdev_realize().
>
> Coccinelle script:
>
> @@
> expression parent, name, size, type, errp;
> expression child;
> symbol true;
> @@
> -sysbus_init_child_obj(parent, name, &child, size, type);
> +sysbus_init_child_XXX(parent, name, &child, size, type);
>  ...
> -object_property_set_bool(OBJECT(&child), true, "realized", errp);
> +sysbus_realize(&child.parent_obj, errp);
>
> @@
> expression parent, name, size, type, errp;
> expression child;
> symbol true;
> @@
> -sysbus_init_child_obj(parent, name, child, size, type);
> +sysbus_init_child_XXX(parent, name, child, size, type);
>  ...
> -object_property_set_bool(OBJECT(child), true, "realized", errp);
> +sysbus_realize(&child->parent_obj, errp);
>
> @@
> expression parent, name, size, type;
> expression child;
> expression dev;
> expression expr;
> @@
> -sysbus_init_child_obj(parent, name, child, size, type);
> +sysbus_init_child_XXX(parent, name, child, size, type);
>  ...
>  dev = DEVICE(child);
>  ... when != dev = expr;
> -qdev_init_nofail(dev);
> +sysbus_realize(SYS_BUS_DEVICE(dev), &error_fatal);
>
> @@
> expression parent, propname, type;
> expression child;
> @@
> -sysbus_init_child_XXX(parent, propname, child, sizeof(*child), type)
> +object_initialize_child(parent, propname, child, type)
>
> @@
> expression parent, propname, type;
> expression child;
> @@
> -sysbus_init_child_XXX(parent, propname, &child, sizeof(child), type)
> +object_initialize_child(parent, propname, &child, type)
>
> Signed-off-by: Markus Armbruster 

Acked-by: Alistair Francis 

Alistair

> ---
>  hw/arm/bcm2835_peripherals.c |  5 ++--
>  hw/arm/exynos4_boards.c  |  7 +++--
>  hw/arm/mps2-tz.c | 50 
>  hw/arm/mps2.c| 19 +-
>  hw/arm/musca.c   | 37 --
>  hw/arm/xlnx-versal-virt.c|  6 ++---
>  hw/arm/xlnx-versal.c | 36 +++---
>  hw/intc/armv7m_nvic.c|  8 +++---
>  hw/mips/boston.c |  5 ++--
>  hw/mips/cps.c| 20 ++-
>  hw/mips/mips_malta.c |  5 ++--
>  hw/riscv/spike.c | 21 +++
>  hw/riscv/virt.c  |  7 +++--
>  13 files changed, 96 insertions(+), 130 deletions(-)
>
> diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
> index 49bfabee9b..301e8f72c0 100644
> --- a/hw/arm/bcm2835_peripherals.c
> +++ b/hw/arm/bcm2835_peripherals.c
> @@ -27,11 +27,10 @@ static void create_unimp(BCM2835PeripheralState *ps,
>   UnimplementedDeviceState *uds,
>   const char *name, hwaddr ofs, hwaddr size)
>  {
> -sysbus_init_child_obj(OBJECT(ps), name, uds, sizeof(*uds),
> -  TYPE_UNIMPLEMENTED_DEVICE);
> +object_initialize_child(OBJECT(ps), name, uds, 
> TYPE_UNIMPLEMENTED_DEVICE);
>  qdev_prop_set_string(DEVICE(uds), "name", name);
>  qdev_prop_set_uint64(DEVICE(uds), "size", size);
> -object_property_set_bool(OBJECT(uds), true, "realized", &error_fatal);
> +sysbus_realize(&uds->parent_obj, &error_fatal);
>  memory_region_add_subregion_overlap(&ps->peri_mr, ofs,
>  sysbus_mmio_get_region(SYS_BUS_DEVICE(uds), 0), -1000);
>  }
> diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
> index 326122abff..28f37d22cf 100644
> --- a/hw/arm/exynos4_boards.c
> +++ b/hw/arm/exynos4_boards.c
> @@ -128,10 +128,9 @@ exynos4_boards_init_common(MachineState *machine,
>  exynos4_boards_init_ram(s, get_system_memory(),
>  exynos4_board_ram_size[board_type]);
>
> -sysbus_init_child_obj(OBJECT(machine), "soc",
> -  &s->soc, sizeof(s->soc), TYPE_EXYNOS4210_SOC);
> -object_property_set_bool(OBJECT(&s->soc), true, "realized",
> - &error_fatal);
> +object_initialize_child(OBJECT(machine), "soc", &s->soc,
> +TYPE_EXYNOS4210_SOC);
> +sysbus_realize(&s->soc.parent_obj, &error_fatal);
>
>  return s;
>  }
> diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
> index 4c49512e0b..4d917eba47 100644
> --- a/hw/arm/mps2-tz.c
> +++ b/hw/arm/mps2-tz.c
> @@ -174,11 +174,10 @@ static MemoryRegion *make_unimp_dev(MPS2TZMachineState 
> *mms,
>   */
>  UnimplementedDeviceState *uds = opaque;
>
> -

Re: [PATCH 1/4] hw/riscv: Allow creating multiple instances of CLINT

2020-05-19 Thread Alistair Francis

On Fri, May 15, 2020 at 11:39 PM Anup Patel  wrote:
>
> We extend CLINT emulation to allow multiple instances of CLINT in
> a QEMU RISC-V machine. To achieve this, we remove the assumption in the
> CLINT emulation that the first HART id is zero.
>
> Signed-off-by: Anup Patel 

Reviewed-by: Alistair Francis 

Alistair
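
The address-to-hart mapping that the quoted patch changes can be sketched in
isolation. The block below is a simplified standalone model, not the QEMU
code: ToyClint and toy_clint_hartid() are hypothetical names, and only the
field names (hartid_base, num_harts, sip_base, timecmp_base) follow the patch.
Each hart owns a 4-byte slot in the sip range and an 8-byte slot in the
timecmp range, and the slot index is offset by hartid_base so that a second
CLINT instance can serve harts that do not start at zero.

```c
#include <stdint.h>

/* Toy description of one CLINT instance. */
typedef struct {
    uint32_t hartid_base;   /* first global hart id served by this instance */
    uint32_t num_harts;     /* number of harts served by this instance */
    uint32_t sip_base;      /* offset of the per-hart 4-byte sip registers */
    uint32_t timecmp_base;  /* offset of the per-hart 8-byte timecmp registers */
} ToyClint;

/* Returns the global hart id for an access, or -1 if addr hits neither range. */
static int64_t toy_clint_hartid(const ToyClint *c, uint64_t addr)
{
    if (addr >= c->sip_base &&
        addr < c->sip_base + ((uint64_t)c->num_harts << 2)) {
        return (int64_t)(c->hartid_base + ((addr - c->sip_base) >> 2));
    }
    if (addr >= c->timecmp_base &&
        addr < c->timecmp_base + ((uint64_t)c->num_harts << 3)) {
        return (int64_t)(c->hartid_base + ((addr - c->timecmp_base) >> 3));
    }
    return -1;
}
```

For an instance with hartid_base = 4 and num_harts = 4, an access to the third
sip slot resolves to hart 6 rather than hart 2, which is the behaviour the
added hartid_base term provides.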

> ---
>  hw/riscv/sifive_clint.c | 20 
>  hw/riscv/sifive_e.c |  2 +-
>  hw/riscv/sifive_u.c |  2 +-
>  hw/riscv/spike.c|  6 +++---
>  hw/riscv/virt.c |  2 +-
>  include/hw/riscv/sifive_clint.h |  7 ---
>  6 files changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/hw/riscv/sifive_clint.c b/hw/riscv/sifive_clint.c
> index e933d35092..7d713fd743 100644
> --- a/hw/riscv/sifive_clint.c
> +++ b/hw/riscv/sifive_clint.c
> @@ -78,7 +78,7 @@ static uint64_t sifive_clint_read(void *opaque, hwaddr 
> addr, unsigned size)
>  SiFiveCLINTState *clint = opaque;
>  if (addr >= clint->sip_base &&
>  addr < clint->sip_base + (clint->num_harts << 2)) {
> -size_t hartid = (addr - clint->sip_base) >> 2;
> +size_t hartid = clint->hartid_base + ((addr - clint->sip_base) >> 2);
>  CPUState *cpu = qemu_get_cpu(hartid);
>  CPURISCVState *env = cpu ? cpu->env_ptr : NULL;
>  if (!env) {
> @@ -91,7 +91,8 @@ static uint64_t sifive_clint_read(void *opaque, hwaddr 
> addr, unsigned size)
>  }
>  } else if (addr >= clint->timecmp_base &&
>  addr < clint->timecmp_base + (clint->num_harts << 3)) {
> -size_t hartid = (addr - clint->timecmp_base) >> 3;
> +size_t hartid = clint->hartid_base +
> +((addr - clint->timecmp_base) >> 3);
>  CPUState *cpu = qemu_get_cpu(hartid);
>  CPURISCVState *env = cpu ? cpu->env_ptr : NULL;
>  if (!env) {
> @@ -128,7 +129,7 @@ static void sifive_clint_write(void *opaque, hwaddr addr, 
> uint64_t value,
>
>  if (addr >= clint->sip_base &&
>  addr < clint->sip_base + (clint->num_harts << 2)) {
> -size_t hartid = (addr - clint->sip_base) >> 2;
> +size_t hartid = clint->hartid_base + ((addr - clint->sip_base) >> 2);
>  CPUState *cpu = qemu_get_cpu(hartid);
>  CPURISCVState *env = cpu ? cpu->env_ptr : NULL;
>  if (!env) {
> @@ -141,7 +142,8 @@ static void sifive_clint_write(void *opaque, hwaddr addr, 
> uint64_t value,
>  return;
>  } else if (addr >= clint->timecmp_base &&
>  addr < clint->timecmp_base + (clint->num_harts << 3)) {
> -size_t hartid = (addr - clint->timecmp_base) >> 3;
> +size_t hartid = clint->hartid_base +
> +((addr - clint->timecmp_base) >> 3);
>  CPUState *cpu = qemu_get_cpu(hartid);
>  CPURISCVState *env = cpu ? cpu->env_ptr : NULL;
>  if (!env) {
> @@ -185,6 +187,7 @@ static const MemoryRegionOps sifive_clint_ops = {
>  };
>
>  static Property sifive_clint_properties[] = {
> +DEFINE_PROP_UINT32("hartid-base", SiFiveCLINTState, hartid_base, 0),
>  DEFINE_PROP_UINT32("num-harts", SiFiveCLINTState, num_harts, 0),
>  DEFINE_PROP_UINT32("sip-base", SiFiveCLINTState, sip_base, 0),
>  DEFINE_PROP_UINT32("timecmp-base", SiFiveCLINTState, timecmp_base, 0),
> @@ -226,13 +229,13 @@ type_init(sifive_clint_register_types)
>  /*
>   * Create CLINT device.
>   */
> -DeviceState *sifive_clint_create(hwaddr addr, hwaddr size, uint32_t 
> num_harts,
> -uint32_t sip_base, uint32_t timecmp_base, uint32_t time_base,
> -bool provide_rdtime)
> +DeviceState *sifive_clint_create(hwaddr addr, hwaddr size,
> +uint32_t hartid_base, uint32_t num_harts, uint32_t sip_base,
> +uint32_t timecmp_base, uint32_t time_base, bool provide_rdtime)
>  {
>  int i;
>  for (i = 0; i < num_harts; i++) {
> -CPUState *cpu = qemu_get_cpu(i);
> +CPUState *cpu = qemu_get_cpu(hartid_base + i);
>  CPURISCVState *env = cpu ? cpu->env_ptr : NULL;
>  if (!env) {
>  continue;
> @@ -246,6 +249,7 @@ DeviceState *sifive_clint_create(hwaddr addr, hwaddr 
> size, uint32_t num_harts,
>  }
>
>  DeviceState *dev = qdev_create(NULL, TYPE_SIFIVE_CLINT);
> +qdev_prop_set_uint32(dev, "hartid-base", hartid_base);
>  qdev_prop_set_uint32(dev, "num-harts", num_harts);
>  qdev_prop_set_uint32(dev, "sip-base", sip_base);
>  qdev_prop_set_uint32(dev, "timecmp-base", timecmp_base);
> diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c
> index b53109521e..1c3b37d0ba 100644
> --- a/hw/riscv/sifive_e.c
> +++ b/hw/riscv/sifive_e.c
> @@ -163,7 +163,7 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, 
> Error **errp)
>  SIFIVE_E_PLIC_CONTEXT_STRIDE,
>  memmap[SIFIVE_E_PLIC].size);
>  sifive_clint_create(memmap[SIFIVE_E_CLINT].base,
> -memmap[SIFIVE_E_CLINT].size, ms->smp.cpus,
> +memmap[SIFIVE_E_CLINT].size, 0, ms->smp.cpus,
>  SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE,

Re: [PATCH 0/4] RISC-V multi-socket support

2020-05-19 Thread Alistair Francis

On Fri, May 15, 2020 at 11:40 PM Anup Patel  wrote:
>
> This series adds multi-socket support for RISC-V virt machine and
> RISC-V spike machine. The multi-socket support will help us improve
> various RISC-V operating systems, firmwares, and bootloader to
> support RISC-V NUMA systems.
>
> These patches can be found in riscv_multi_socket_v1 branch at:
> https://github.com/avpatel/qemu.git
>
> To try these patches, we will need:
> 1. OpenSBI multi-PLIC and multi-CLINT support which can be found in
>multi_plic_clint_v1 branch at:
>https://github.com/avpatel/opensbi.git
> 2. Linux multi-PLIC improvements support which can be found in
>plic_imp_v1 branch at:
>https://github.com/avpatel/linux.git
>
> Anup Patel (4):
>   hw/riscv: Allow creating multiple instances of CLINT
>   hw/riscv: spike: Allow creating multiple sockets
>   hw/riscv: Allow creating multiple instances of PLIC
>   hw/riscv: virt: Allow creating multiple sockets

Can you make sure all the patches pass checkpatch?

Alistair

>
>  hw/riscv/sifive_clint.c |  20 +-
>  hw/riscv/sifive_e.c |   4 +-
>  hw/riscv/sifive_plic.c  |  24 +-
>  hw/riscv/sifive_u.c |   4 +-
>  hw/riscv/spike.c| 210 --
>  hw/riscv/virt.c | 495 ++--
>  include/hw/riscv/sifive_clint.h |   7 +-
>  include/hw/riscv/sifive_plic.h  |  12 +-
>  include/hw/riscv/spike.h|   8 +-
>  include/hw/riscv/virt.h |  12 +-
>  10 files changed, 458 insertions(+), 338 deletions(-)
>
> --
> 2.25.1
>
>

Re: [PATCH v2 5/5] iotests: add commit top->base cases to 274

2020-05-19 Thread Vladimir Sementsov-Ogievskiy


20.05.2020 00:13, Eric Blake wrote:

On 5/19/20 2:55 PM, Vladimir Sementsov-Ogievskiy wrote:

These cases are fixed by previous patches around block_status and
is_allocated.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/274 | 20 
  tests/qemu-iotests/274.out | 65 ++
  2 files changed, 85 insertions(+)


Okay, so this test fails when applied in isolation without the rest of your 
series.



diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
index 5d1bf34dff..e910455f13 100755
--- a/tests/qemu-iotests/274
+++ b/tests/qemu-iotests/274
@@ -115,6 +115,26 @@ with iotests.FilePath('base') as base, \
  iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
  iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), 
mid)
+    iotests.log('=== Testing qemu-img commit (top -> base) ===')
+
+    create_chain()
+    iotests.qemu_img_log('commit', '-b', base, top)
+    iotests.img_info_log(base)
+    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
+    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), 
base)


So if I understand it, we are going from:

base    
mid 
top 
guest   

and we want to go to:

base    

except that we are not properly writing the zeroes into base, because we 
grabbed the wrong status, ending up with:

base    

The status of top from 1M onwards is unallocated, and if we were to commit to 
just mid, Kevin's truncate fixes solve that (we now zero out the tail of mid as 
part of resizing it to be large enough).  But you are instead skipping mid, and 
committing all the way to base.  So we need _something_ that can tell qemu-img 
commit that even though the region 1m-2m is unallocated in top, we must behave 
as though the status of mid reports it as allocated (because when reading 
beyond EOF in mid, we DO read zero).  Since the data is allocated not in top, 
but acts as though it was allocated in mid, which is above base, then the 
commit operation has to do something to preserve that allocation.

Okay, you've convinced me we have a bug. However, I'm still not sold that 
patches 1 and 4 are quite the right fix.  Going back to the original setup, 
unpatched qemu.git head reports:

$ ./qemu-img map --output=json top.qcow2
[{ "start": 0, "length": 1048576, "depth": 2, "zero": false, "data": true, 
"offset": 327680},
{ "start": 1048576, "length": 1048576, "depth": 0, "zero": true, "data": false}]

I think what we really want is:

[{ "start": 0, "length": 1048576, "depth": 2, "zero": false, "data": true, 
"offset": 327680},
{ "start": 1048576, "length": 1048576, "depth": 1, "zero": true, "data": false}]

because then we would be _accurately_ reporting that the zeroes that we read from 1m-2m come _because_ we read from mid (beyond EOF), which is different from our current answer that the zeroes come from top (they don't, because top deferred to mid). 


Right. This is exactly the logic that this series brings to
block_status_above and is_allocated_above.

If we fix up qemu-img map output to correctly report zeroes beyond EOF from the 
correct layer, will that also fix up the bug we are seeing in qemu-img commit?




No, it will not fix it, because img_map has its own implementation of 
block_status_above: the get_block_status function in qemu-img.c, which walks 
the backing chain by itself and is used only in img_map (not in 
img_convert). But you are right that it should be fixed too.

--
Best regards,
Vladimir
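
The depth reporting discussed in this thread can be illustrated with a toy
model of a backing chain. This is a sketch under simplifying assumptions and
is not the QEMU implementation: ToyLayer, ToyStatus and toy_status_above() are
hypothetical, and allocation is reduced to a single prefix [0, alloc_len) per
layer. The point it shows is that an offset beyond the end of a shorter
backing layer reads as zero because of that layer, so the zeroes should be
attributed to its depth rather than to the top layer.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t size;       /* virtual size of this layer in bytes */
    uint64_t alloc_len;  /* toy model: bytes [0, alloc_len) are allocated here */
} ToyLayer;

typedef struct {
    int depth;  /* index of the layer that determines the content (0 = top) */
    int zero;   /* 1 if the guest reads zeroes at this offset */
} ToyStatus;

/* Walk the chain from the top and report which layer decides the content. */
static ToyStatus toy_status_above(const ToyLayer *chain, size_t n,
                                  uint64_t offset)
{
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && offset >= chain[i].size) {
            /* Beyond EOF of a backing layer: zero, and deeper layers are hidden. */
            return (ToyStatus){ .depth = (int)i, .zero = 1 };
        }
        if (offset < chain[i].alloc_len) {
            /* Allocated in this layer: the data comes from here. */
            return (ToyStatus){ .depth = (int)i, .zero = 0 };
        }
    }
    /* Unallocated in every layer: zero, attributed to the deepest layer. */
    return (ToyStatus){ .depth = (int)n - 1, .zero = 1 };
}
```

With a chain of top (2 MiB, nothing allocated), mid (1 MiB, nothing allocated)
and base (2 MiB, fully allocated), offsets in the first MiB resolve to data at
depth 2, and offsets in the second MiB resolve to zeroes at depth 1, matching
the map output that Eric describes as the desired result.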

Re: [PATCH 43/55] sysbus: Convert to sysbus_realize() etc. with Coccinelle

2020-05-19 Thread Alistair Francis

On Tue, May 19, 2020 at 8:25 AM Markus Armbruster  wrote:
>
> Convert from qdev_realize(), qdev_realize_and_unref() with null @bus
> argument to sysbus_realize(), sysbus_realize_and_unref().
>
> Coccinelle script:
>
> @@
> expression dev, errp;
> @@
> -qdev_realize(DEVICE(dev), NULL, errp);
> +sysbus_realize(SYS_BUS_DEVICE(dev), errp);
>
> @@
> expression sysbus_dev, dev, errp;
> @@
> +sysbus_dev = SYS_BUS_DEVICE(dev);
> -qdev_realize_and_unref(dev, NULL, errp);
> +sysbus_realize_and_unref(sysbus_dev, errp);
> -sysbus_dev = SYS_BUS_DEVICE(dev);
>
> @@
> expression sysbus_dev, dev, errp;
> expression expr;
> @@
>  sysbus_dev = SYS_BUS_DEVICE(dev);
>  ... when != dev = expr;
> -qdev_realize_and_unref(dev, NULL, errp);
> +sysbus_realize_and_unref(sysbus_dev, errp);
>
> @@
> expression dev, errp;
> @@
> -qdev_realize_and_unref(DEVICE(dev), NULL, errp);
> +sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), errp);
>
> @@
> expression dev, errp;
> @@
> -qdev_realize_and_unref(dev, NULL, errp);
> +sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), errp);
>
> Whitespace changes minimized manually.
>
> Signed-off-by: Markus Armbruster 

Acked-by: Alistair Francis 

Alistair

> ---
>  hw/lm32/lm32.h   |  6 ++---
>  hw/lm32/milkymist-hw.h   | 18 ++---
>  include/hw/char/cadence_uart.h   |  2 +-
>  include/hw/char/cmsdk-apb-uart.h |  2 +-
>  include/hw/char/pl011.h  |  4 +--
>  include/hw/char/xilinx_uartlite.h|  2 +-
>  include/hw/cris/etraxfs.h|  2 +-
>  include/hw/misc/unimp.h  |  2 +-
>  include/hw/timer/cmsdk-apb-timer.h   |  2 +-
>  hw/alpha/typhoon.c   |  2 +-
>  hw/arm/exynos4210.c  | 18 ++---
>  hw/arm/exynos4_boards.c  |  2 +-
>  hw/arm/highbank.c| 12 -
>  hw/arm/integratorcp.c|  2 +-
>  hw/arm/mps2-tz.c |  2 +-
>  hw/arm/msf2-som.c|  2 +-
>  hw/arm/musicpal.c|  4 +--
>  hw/arm/netduino2.c   |  2 +-
>  hw/arm/netduinoplus2.c   |  2 +-
>  hw/arm/nseries.c |  4 +--
>  hw/arm/omap1.c   |  8 +++---
>  hw/arm/omap2.c   |  8 +++---
>  hw/arm/pxa2xx.c  |  4 +--
>  hw/arm/pxa2xx_gpio.c |  2 +-
>  hw/arm/pxa2xx_pic.c  |  2 +-
>  hw/arm/realview.c| 10 
>  hw/arm/sbsa-ref.c| 12 -
>  hw/arm/spitz.c   |  2 +-
>  hw/arm/stellaris.c   |  6 ++---
>  hw/arm/strongarm.c   |  4 +--
>  hw/arm/versatilepb.c |  8 +++---
>  hw/arm/vexpress.c|  8 +++---
>  hw/arm/virt.c| 18 ++---
>  hw/arm/xilinx_zynq.c | 16 ++--
>  hw/arm/xlnx-versal-virt.c|  2 +-
>  hw/arm/xlnx-versal.c |  2 +-
>  hw/block/fdc.c   |  4 +--
>  hw/block/pflash_cfi01.c  |  2 +-
>  hw/block/pflash_cfi02.c  |  2 +-
>  hw/char/exynos4210_uart.c|  2 +-
>  hw/char/mcf_uart.c   |  2 +-
>  hw/char/serial.c |  2 +-
>  hw/core/empty_slot.c |  2 +-
>  hw/core/sysbus.c |  2 +-
>  hw/cris/axis_dev88.c |  2 +-
>  hw/display/milkymist-tmu2.c  |  2 +-
>  hw/display/sm501.c   |  2 +-
>  hw/dma/pxa2xx_dma.c  |  4 +--
>  hw/dma/rc4030.c  |  2 +-
>  hw/dma/sparc32_dma.c |  8 +++---
>  hw/hppa/dino.c   |  2 +-
>  hw/hppa/lasi.c   |  2 +-
>  hw/hppa/machine.c|  2 +-
>  hw/i386/pc.c |  2 +-
>  hw/i386/pc_q35.c |  2 +-
>  hw/i386/pc_sysfw.c   |  2 +-
>  hw/i386/x86.c|  2 +-
>  hw/intc/exynos4210_gic.c |  2 +-
>  hw/intc/s390_flic.c  |  2 +-
>  hw/isa/isa-bus.c |  2 +-
>  hw/m68k/mcf5208.c|  2 +-
>  hw/m68k/mcf_intc.c   |  2 +-
>  hw/m68k/next-cube.c  |  6 ++---
>  hw/m68k/q800.c   | 12 -
>  hw/microblaze/petalogix_ml605_mmu.c  | 10 
>  hw/microblaze/petalogix_s3adsp1800_mmu.c |  6 ++---
>  hw/mips/boston.c |  4 +--
>  hw/mips/gt64xxx_pci.c

[PATCH v3 2/2] target/arm: Use clear_vec_high more effectively

2020-05-19 Thread Richard Henderson

Do not explicitly store zero to the NEON high part
when we can pass !is_q to clear_vec_high.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 53 +++---
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 4f6edb2892..874f3eb4f9 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -900,11 +900,10 @@ static void do_fp_ld(DisasContext *s, int destidx, 
TCGv_i64 tcg_addr, int size)
 {
 /* This always zero-extends and writes to a full 128 bit wide vector */
 TCGv_i64 tmplo = tcg_temp_new_i64();
-TCGv_i64 tmphi;
+TCGv_i64 tmphi = NULL;
 
 if (size < 4) {
 MemOp memop = s->be_data + size;
-tmphi = tcg_const_i64(0);
 tcg_gen_qemu_ld_i64(tmplo, tcg_addr, get_mem_index(s), memop);
 } else {
 bool be = s->be_data == MO_BE;
@@ -922,12 +921,13 @@ static void do_fp_ld(DisasContext *s, int destidx, 
TCGv_i64 tcg_addr, int size)
 }
 
 tcg_gen_st_i64(tmplo, cpu_env, fp_reg_offset(s, destidx, MO_64));
-tcg_gen_st_i64(tmphi, cpu_env, fp_reg_hi_offset(s, destidx));
-
 tcg_temp_free_i64(tmplo);
-tcg_temp_free_i64(tmphi);
 
-clear_vec_high(s, true, destidx);
+if (tmphi) {
+tcg_gen_st_i64(tmphi, cpu_env, fp_reg_hi_offset(s, destidx));
+tcg_temp_free_i64(tmphi);
+}
+clear_vec_high(s, tmphi != NULL, destidx);
 }
 
 /*
@@ -6934,7 +6934,6 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn)
 read_vec_element(s, tcg_resh, rm, 0, MO_64);
 do_ext64(s, tcg_resh, tcg_resl, pos);
 }
-tcg_gen_movi_i64(tcg_resh, 0);
 } else {
 TCGv_i64 tcg_hh;
 typedef struct {
@@ -6964,9 +6963,11 @@ static void disas_simd_ext(DisasContext *s, uint32_t 
insn)
 
 write_vec_element(s, tcg_resl, rd, 0, MO_64);
 tcg_temp_free_i64(tcg_resl);
-write_vec_element(s, tcg_resh, rd, 1, MO_64);
+if (is_q) {
+write_vec_element(s, tcg_resh, rd, 1, MO_64);
+}
 tcg_temp_free_i64(tcg_resh);
-clear_vec_high(s, true, rd);
+clear_vec_high(s, is_q, rd);
 }
 
 /* TBL/TBX
@@ -7003,17 +7004,21 @@ static void disas_simd_tb(DisasContext *s, uint32_t 
insn)
  * the input.
  */
 tcg_resl = tcg_temp_new_i64();
-tcg_resh = tcg_temp_new_i64();
+tcg_resh = NULL;
 
 if (is_tblx) {
 read_vec_element(s, tcg_resl, rd, 0, MO_64);
 } else {
 tcg_gen_movi_i64(tcg_resl, 0);
 }
-if (is_tblx && is_q) {
-read_vec_element(s, tcg_resh, rd, 1, MO_64);
-} else {
-tcg_gen_movi_i64(tcg_resh, 0);
+
+if (is_q) {
+tcg_resh = tcg_temp_new_i64();
+if (is_tblx) {
+read_vec_element(s, tcg_resh, rd, 1, MO_64);
+} else {
+tcg_gen_movi_i64(tcg_resh, 0);
+}
 }
 
 tcg_idx = tcg_temp_new_i64();
@@ -7033,9 +7038,12 @@ static void disas_simd_tb(DisasContext *s, uint32_t insn)
 
 write_vec_element(s, tcg_resl, rd, 0, MO_64);
 tcg_temp_free_i64(tcg_resl);
-write_vec_element(s, tcg_resh, rd, 1, MO_64);
-tcg_temp_free_i64(tcg_resh);
-clear_vec_high(s, true, rd);
+
+if (is_q) {
+write_vec_element(s, tcg_resh, rd, 1, MO_64);
+tcg_temp_free_i64(tcg_resh);
+}
+clear_vec_high(s, is_q, rd);
 }
 
 /* ZIP/UZP/TRN
@@ -7072,7 +7080,7 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t 
insn)
 }
 
 tcg_resl = tcg_const_i64(0);
-tcg_resh = tcg_const_i64(0);
+tcg_resh = is_q ? tcg_const_i64(0) : NULL;
 tcg_res = tcg_temp_new_i64();
 
 for (i = 0; i < elements; i++) {
@@ -7123,9 +7131,12 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t 
insn)
 
 write_vec_element(s, tcg_resl, rd, 0, MO_64);
 tcg_temp_free_i64(tcg_resl);
-write_vec_element(s, tcg_resh, rd, 1, MO_64);
-tcg_temp_free_i64(tcg_resh);
-clear_vec_high(s, true, rd);
+
+if (is_q) {
+write_vec_element(s, tcg_resh, rd, 1, MO_64);
+tcg_temp_free_i64(tcg_resh);
+}
+clear_vec_high(s, is_q, rd);
 }
 
 /*
-- 
2.20.1

Re: [PATCH 34/55] qom: Less verbose object_initialize_child()

2020-05-19 Thread Alistair Francis

On Tue, May 19, 2020 at 8:04 AM Markus Armbruster wrote:
>
> All users of object_initialize_child() pass the obvious child size
> argument.  Almost all pass &error_abort and no properties.  Tiresome.
>
> Rename object_initialize_child() to
> object_initialize_child_with_props() to free the name.  New
> convenience wrapper object_initialize_child() automates the size
> argument, and passes &error_abort and no properties.
>
> Rename object_initialize_childv() to
> object_initialize_child_with_propsv() for consistency.
>
> Convert callers with this Coccinelle script:
>
> @@
> expression parent, propname, type;
> expression child, size;
> symbol error_abort;
> @@
> -object_initialize_child(parent, propname, OBJECT(child), size, type,
> &error_abort, NULL)
> +object_initialize_child(parent, propname, child, size, type,
> &error_abort, NULL)
>
> @@
> expression parent, propname, type;
> expression child;
> symbol error_abort;
> @@
> -object_initialize_child(parent, propname, child, sizeof(*child),
> type, &error_abort, NULL)
> +object_initialize_child(parent, propname, child, type)
>
> @@
> expression parent, propname, type;
> expression child;
> symbol error_abort;
> @@
> -object_initialize_child(parent, propname, &child, sizeof(child),
> type, &error_abort, NULL)
> +object_initialize_child(parent, propname, &child, type)
>
> @@
> expression parent, propname, type;
> expression child, size, err;
> expression list props;
> @@
> -object_initialize_child(parent, propname, child, size, type, err, 
> props)
> +object_initialize_child_with_props(parent, propname, child, size, 
> type, err, props)
>
> Note that Coccinelle chokes on ARMSSE typedef vs. macro in
> hw/arm/armsse.c.  Worked around by temporarily renaming the macro for
> the spatch run.
>
> Signed-off-by: Markus Armbruster 

Acked-by: Alistair Francis 

Alistair

> ---
>  include/qom/object.h| 30 +++
>  hw/arm/allwinner-a10.c  |  5 ++--
>  hw/arm/allwinner-h3.c   |  5 ++--
>  hw/arm/armsse.c | 26 +++-
>  hw/arm/aspeed.c |  4 +---
>  hw/arm/aspeed_ast2600.c |  4 +---
>  hw/arm/aspeed_soc.c |  4 +---
>  hw/arm/bcm2836.c|  3 +--
>  hw/arm/digic.c  |  4 +---
>  hw/arm/exynos4210.c |  3 +--
>  hw/arm/fsl-imx25.c  |  4 +---
>  hw/arm/fsl-imx31.c  |  4 +---
>  hw/arm/fsl-imx6.c   |  5 ++--
>  hw/arm/fsl-imx6ul.c |  4 ++--
>  hw/arm/fsl-imx7.c   |  5 ++--
>  hw/arm/imx25_pdk.c  |  3 +--
>  hw/arm/kzm.c|  3 +--
>  hw/arm/mps2-tz.c| 14 +--
>  hw/arm/musca.c  | 14 +--
>  hw/arm/raspi.c  |  4 ++--
>  hw/arm/stm32f405_soc.c  |  6 ++---
>  hw/arm/xlnx-versal.c|  5 ++--
>  hw/arm/xlnx-zcu102.c|  3 +--
>  hw/arm/xlnx-zynqmp.c| 16 +
>  hw/char/serial-isa.c|  3 +--
>  hw/char/serial-pci-multi.c  |  4 +---
>  hw/char/serial-pci.c|  3 +--
>  hw/char/serial.c|  6 ++---
>  hw/core/sysbus.c|  4 ++--
>  hw/dma/xilinx_axidma.c  |  9 +++
>  hw/intc/pnv_xive.c  |  6 ++---
>  hw/intc/spapr_xive.c|  6 ++---
>  hw/microblaze/xlnx-zynqmp-pmu.c |  7 +++---
>  hw/misc/macio/macio.c   | 10 
>  hw/net/xilinx_axienet.c |  9 +++
>  hw/pci-host/designware.c|  3 +--
>  hw/pci-host/gpex.c  |  3 +--
>  hw/pci-host/pnv_phb3.c  | 12 --
>  hw/pci-host/pnv_phb4.c  |  6 ++---
>  hw/pci-host/pnv_phb4_pec.c  |  6 ++---
>  hw/pci-host/q35.c   |  3 +--
>  hw/pci-host/xilinx-pcie.c   |  3 +--
>  hw/ppc/pnv.c| 42 -
>  hw/ppc/pnv_psi.c|  6 ++---
>  hw/ppc/spapr.c  |  6 ++---
>  hw/riscv/riscv_hart.c   |  4 +---
>  hw/riscv/sifive_e.c |  4 +---
>  hw/riscv/sifive_u.c | 12 +++---
>  hw/virtio/virtio.c  |  5 ++--
>  qom/object.c| 19 +++
>  50 files changed, 160 insertions(+), 219 deletions(-)
>
> diff --git a/include/qom/object.h b/include/qom/object.h
> index fd453dc8d6..89e67ce82b 100644
> --- a/include/qom/object.h
> +++ b/include/qom/object.h
> @@ -783,7 +783,7 @@ int object_set_propv(Object *obj,
>  void object_initialize(void *obj, size_t size, const char *typename);
>
>  /**
> - * object_initialize_child:
> + * object_initialize_child_with_props:
>   * @parentobj: The parent object to add a property to
>   * @propname: The name of the property
>   * @childobj: A pointer to the memory to be used for the object.
> @@ -803,12 +803,13 @@ void object_initialize(void *obj, size_t size, const 
> char

[PATCH v3 1/2] target/arm: Use tcg_gen_gvec_mov for clear_vec_high

2020-05-19 Thread Richard Henderson

The 8-byte store for the end of a !is_q operation can be
merged with the other stores.  Use a no-op vector move
to trigger the expand_clr portion of tcg_gen_gvec_mov.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 991e451644..4f6edb2892 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -496,14 +496,8 @@ static void clear_vec_high(DisasContext *s, bool is_q, int 
rd)
 unsigned ofs = fp_reg_offset(s, rd, MO_64);
 unsigned vsz = vec_full_reg_size(s);
 
-if (!is_q) {
-TCGv_i64 tcg_zero = tcg_const_i64(0);
-tcg_gen_st_i64(tcg_zero, cpu_env, ofs + 8);
-tcg_temp_free_i64(tcg_zero);
-}
-if (vsz > 16) {
-tcg_gen_gvec_dup_imm(MO_64, ofs + 16, vsz - 16, vsz - 16, 0);
-}
+/* Nop move, with side effect of clearing the tail. */
+tcg_gen_gvec_mov(MO_64, ofs, ofs, is_q ? 16 : 8, vsz);
 }
 
 void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v)
-- 
2.20.1
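
The tail-clearing behaviour that the commit message above relies on can be pictured with a plain byte buffer. The following is a toy Python model, not TCG code: gvec_mov_model and clear_vec_high_model are hypothetical names, and the model assumes only what the patch states, namely that a vector move of oprsz bytes inside a register of maxsz bytes also zeroes the bytes from oprsz up to maxsz.

```python
def gvec_mov_model(regfile: bytearray, dofs: int, aofs: int,
                   oprsz: int, maxsz: int) -> None:
    """Copy oprsz bytes from aofs to dofs, then zero the tail up to maxsz."""
    data = bytes(regfile[aofs:aofs + oprsz])
    regfile[dofs:dofs + oprsz] = data
    # Tail clearing: bytes [oprsz, maxsz) of the destination become zero.
    regfile[dofs + oprsz:dofs + maxsz] = bytes(maxsz - oprsz)


def clear_vec_high_model(regfile: bytearray, ofs: int,
                         is_q: bool, vsz: int) -> None:
    """Model of the patched helper: a move onto itself, kept for its side effect."""
    gvec_mov_model(regfile, ofs, ofs, 16 if is_q else 8, vsz)


if __name__ == "__main__":
    reg = bytearray([0xAA] * 32)        # one 32-byte vector register (assumed size)
    clear_vec_high_model(reg, 0, False, 32)
    print(reg.hex())
```

With a 32-byte register, is_q=False keeps the low 8 bytes and zeroes the remaining 24, while is_q=True keeps the low 16 bytes and zeroes the remaining 16. That is why a single call can stand in for both the explicit 8-byte zero store and the separate clearing of bytes 16 and up.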

[PATCH v3 0/2] target/arm: vector tail cleanups

2020-05-19 Thread Richard Henderson

Version 3 fixes the reported bug in EXT.

I should make sure I have fixed the bug wherein RISU prints a
mismatch and still exits with success, which hid this problem
in the scrollback.


r~


Richard Henderson (2):
  target/arm: Use tcg_gen_gvec_mov for clear_vec_high
  target/arm: Use clear_vec_high more effectively

 target/arm/translate-a64.c | 63 --
 1 file changed, 34 insertions(+), 29 deletions(-)

-- 
2.20.1

Re: [PATCH 33/55] qom: Tidy up a few object_initialize_child() calls

2020-05-19 Thread Alistair Francis

On Tue, May 19, 2020 at 8:04 AM Markus Armbruster  wrote:
>
> The callers of object_initialize_child() commonly pass either
> &child, sizeof(child), or pchild, sizeof(*pchild).  Tidy up the few
> that don't, mostly to keep the next commit simpler.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/arm/aspeed.c | 2 +-
>  hw/microblaze/xlnx-zynqmp-pmu.c | 3 +--
>  hw/pci-host/pnv_phb4.c  | 2 +-
>  hw/riscv/riscv_hart.c   | 2 +-
>  4 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index adbfbbd6b4..eaf50da8df 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -267,7 +267,7 @@ static void aspeed_machine_init(MachineState *machine)
>  memory_region_add_subregion(&bmc->ram_container, 0, machine->ram);
>
>  object_initialize_child(OBJECT(machine), "soc", &bmc->soc,
> -(sizeof(bmc->soc)), amc->soc_name, &error_abort,
> +sizeof(bmc->soc), amc->soc_name, &error_abort,
>  NULL);
>
>  sc = ASPEED_SOC_GET_CLASS(&bmc->soc);
> diff --git a/hw/microblaze/xlnx-zynqmp-pmu.c b/hw/microblaze/xlnx-zynqmp-pmu.c
> index 028f31894d..aa90b9d1be 100644
> --- a/hw/microblaze/xlnx-zynqmp-pmu.c
> +++ b/hw/microblaze/xlnx-zynqmp-pmu.c
> @@ -174,8 +174,7 @@ static void xlnx_zynqmp_pmu_init(MachineState *machine)
>  pmu_ram);
>
>  /* Create the PMU device */
> -object_initialize_child(OBJECT(machine), "pmu", pmu,
> -sizeof(XlnxZynqMPPMUSoCState),
> +object_initialize_child(OBJECT(machine), "pmu", pmu, sizeof(*pmu),
>  TYPE_XLNX_ZYNQMP_PMU_SOC, &error_abort, NULL);
>  object_property_set_bool(OBJECT(pmu), true, "realized", &error_fatal);
>
> diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
> index e30ae9ad5b..aba710fd1f 100644
> --- a/hw/pci-host/pnv_phb4.c
> +++ b/hw/pci-host/pnv_phb4.c
> @@ -1155,7 +1155,7 @@ static void pnv_phb4_instance_init(Object *obj)
>  QLIST_INIT(&phb->dma_spaces);
>
>  /* XIVE interrupt source object */
> -object_initialize_child(obj, "source", &phb->xsrc, sizeof(XiveSource),
> +object_initialize_child(obj, "source", &phb->xsrc, sizeof(phb->xsrc),
>  TYPE_XIVE_SOURCE, &error_abort, NULL);
>
>  /* Root Port */
> diff --git a/hw/riscv/riscv_hart.c b/hw/riscv/riscv_hart.c
> index 276a9baca0..61e88e2e37 100644
> --- a/hw/riscv/riscv_hart.c
> +++ b/hw/riscv/riscv_hart.c
> @@ -46,7 +46,7 @@ static void riscv_hart_realize(RISCVHartArrayState *s, int 
> idx,
>  Error *err = NULL;
>
>  object_initialize_child(OBJECT(s), "harts[*]", &s->harts[idx],
> -sizeof(RISCVCPU), cpu_type,
> +sizeof(s->harts[idx]), cpu_type,
>  &error_abort, NULL);
>  s->harts[idx].env.mhartid = s->hartid_base + idx;
>  qemu_register_reset(riscv_harts_cpu_reset, &s->harts[idx]);
> --
> 2.21.1
>
>

Re: [PATCH v2 0/3] RTISC-V: Remove deprecated ISA, CPUs and machines

2020-05-19 Thread Alistair Francis

On Thu, May 7, 2020 at 12:19 PM Alistair Francis
 wrote:
>
> v2:
>  - Remove the CPUs and ISA separately
>
> Alistair Francis (3):
>   hw/riscv: spike: Remove deprecated ISA specific machines
>   target/riscv: Remove the deprecated CPUs
>   target/riscv: Drop support for ISA spec version 1.09.1

Any more comments?

Alistair

>
>  hw/riscv/spike.c  | 217 --
>  include/hw/riscv/spike.h  |   6 +-
>  target/riscv/cpu.c|  30 ---
>  target/riscv/cpu.h|   8 -
>  target/riscv/csr.c|  82 ++-
>  .../riscv/insn_trans/trans_privileged.inc.c   |   6 -
>  tests/qtest/machine-none-test.c   |   4 +-
>  7 files changed, 21 insertions(+), 332 deletions(-)
>
> --
> 2.26.2
>

Re: [PATCH 24/55] ssi: ssi_create_slave_no_init() is now unused, drop

2020-05-19 Thread Alistair Francis

On Tue, May 19, 2020 at 8:06 AM Markus Armbruster  wrote:
>
> Cc: Alistair Francis 
> Signed-off-by: Markus Armbruster 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  include/hw/ssi/ssi.h | 1 -
>  hw/ssi/ssi.c | 5 -
>  2 files changed, 6 deletions(-)
>
> diff --git a/include/hw/ssi/ssi.h b/include/hw/ssi/ssi.h
> index 1725b13c32..93f2b8b0be 100644
> --- a/include/hw/ssi/ssi.h
> +++ b/include/hw/ssi/ssi.h
> @@ -79,7 +79,6 @@ extern const VMStateDescription vmstate_ssi_slave;
>  }
>
>  DeviceState *ssi_create_slave(SSIBus *bus, const char *name);
> -DeviceState *ssi_create_slave_no_init(SSIBus *bus, const char *name);
>
>  /* Master interface.  */
>  SSIBus *ssi_create_bus(DeviceState *parent, const char *name);
> diff --git a/hw/ssi/ssi.c b/hw/ssi/ssi.c
> index 58e7d904db..67b48c31cd 100644
> --- a/hw/ssi/ssi.c
> +++ b/hw/ssi/ssi.c
> @@ -90,11 +90,6 @@ static const TypeInfo ssi_slave_info = {
>  .abstract = true,
>  };
>
> -DeviceState *ssi_create_slave_no_init(SSIBus *bus, const char *name)
> -{
> -return qdev_create(BUS(bus), name);
> -}
> -
>  DeviceState *ssi_create_slave(SSIBus *bus, const char *name)
>  {
>  DeviceState *dev = qdev_new(name);
> --
> 2.21.1
>
>

Re: [RISU v2 14/17] Add magic and size to the trace header

2020-05-19 Thread Richard Henderson

On 5/18/20 7:53 PM, Richard Henderson wrote:
> +if (master_header.magic != RISU_MAGIC ||
> +master_header.risu_op != op ||
> +master_header.size != extra_size) {
> +res = RES_MISMATCH_HEAD;
> +goto fail_header;
>  }

Hmm.  This isn't ideal.

Consider e.g. an insn being tested that should pass, so master steps past the
insn to the UDF and sends OP_COMPARE.  But there's a bug in the emulator being
tested so the apprentice gets SIGILL on the insn and so op == OP_SIGILL.

So risu_op != op, but we only report the header difference.

Perhaps that's good enough to understand this particular problem, without
the clutter of printing the rest of the reginfo frame -- at least if
report_mismatch_header is improved to print risu_op names instead of numbers.

Consider if master and apprentice are run with different --test-sve=
values.  That will produce a mismatch in size.

Which could be a serious problem, if master_header.size > sizeof(master_ri) --
we can't even receive the data.  In that case, what I'm doing here printing the
size mismatch is all that's possible.

But suppose master_header.size <= sizeof(master_ri), so we can receive the
data.  So long as master_header.size == reginfo_size(&master_ri), then at least
the data is self-consistent, and we *can* print out the difference in
report_mismatch_reg().  Which in this case is going to be the difference in the
two ri->sve_vl values.  That difference is likely to be easiest to understand
for the end user.

I should probably split out this receive logic from
recv_and_compare_register_info so that it can be reused by dump.
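
The cases walked through above can be summarised as a small decision function. This is a sketch in Python rather than the RISU C code: classify_header, the outcome strings, and the numeric constants are placeholders invented for illustration, and the self-consistency check against reginfo_size() is left out.

```python
from dataclasses import dataclass

MAGIC = 0x1234  # placeholder, not the real RISU_MAGIC value


@dataclass
class Header:
    magic: int
    risu_op: int
    size: int


def classify_header(hdr: Header, expected_op: int,
                    expected_size: int, buffer_size: int) -> str:
    """Decide how much can be reported for a received trace header."""
    if hdr.magic != MAGIC:
        return "bad-magic"
    if hdr.size > buffer_size:
        # The payload cannot even be received, so only the header
        # mismatch can be printed.
        return "header-only"
    if hdr.risu_op != expected_op or hdr.size != expected_size:
        # The payload fits in the buffer: it can be received first and
        # the register-level difference reported afterwards.
        return "receive-then-report"
    return "match"


if __name__ == "__main__":
    print(classify_header(Header(MAGIC, 1, 64), 1, 32, 128))
```

The point of the middle branch is the one made in the message: when the payload still fits, receiving it first lets the user see the differing register state (for example the two sve_vl values) rather than only a size mismatch.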

r~

Re: [PATCH 21/55] ssi: ssi_auto_connect_slaves() never does anything, drop

2020-05-19 Thread Alistair Francis

On Tue, May 19, 2020 at 8:14 AM Markus Armbruster  wrote:
>
> ssi_auto_connect_slaves(parent, cs_line, bus) iterates over @parent's
> QOM children @dev of type TYPE_SSI_SLAVE.  It puts these on @bus, and
> sets cs_line[] to qdev_get_gpio_in_named(dev, SSI_GPIO_CS, 0).
>
> Suspicious: there is no protection against overrunning cs_line[].
>
> Turns out it's safe because ssi_auto_connect_slaves() never finds any
> such children.  It's called by realize methods of some (but not all)
> devices providing an SSI bus, and gets passed the device.
>
> SSI slave devices are always created with ssi_create_slave_no_init(),
> optionally via ssi_create_slave().  This adds them to their SSI bus.
> It doesn't set their QOM parent.
>
> ssi_create_slave_no_init() is always immediately followed by
> qdev_init_nofail(), with no QOM parent assigned, so
> device_set_realized() puts the device into the /machine/unattached/
> orphanage.  None become QOM children of a device providing an SSI bus.
>
> ssi_auto_connect_slaves() was added in commit b4ae3cfa57 "ssi: Add
> slave autoconnect helper".  I can't see which slaves it was supposed
> to connect back then.
>
> Cc: Alistair Francis 
> Signed-off-by: Markus Armbruster 

This looks ok. I haven't tested it though.

Acked-by: Alistair Francis 

Alistair

> ---
>  include/hw/ssi/ssi.h  |  4 
>  hw/ssi/aspeed_smc.c   |  1 -
>  hw/ssi/imx_spi.c  |  2 --
>  hw/ssi/mss-spi.c  |  1 -
>  hw/ssi/ssi.c  | 33 -
>  hw/ssi/xilinx_spi.c   |  1 -
>  hw/ssi/xilinx_spips.c |  4 
>  7 files changed, 46 deletions(-)
>
> diff --git a/include/hw/ssi/ssi.h b/include/hw/ssi/ssi.h
> index 1107cb89ee..1725b13c32 100644
> --- a/include/hw/ssi/ssi.h
> +++ b/include/hw/ssi/ssi.h
> @@ -86,10 +86,6 @@ SSIBus *ssi_create_bus(DeviceState *parent, const char 
> *name);
>
>  uint32_t ssi_transfer(SSIBus *bus, uint32_t val);
>
> -/* Automatically connect all children nodes a spi controller as slaves */
> -void ssi_auto_connect_slaves(DeviceState *parent, qemu_irq *cs_lines,
> - SSIBus *bus);
> -
>  /* max111x.c */
>  void max111x_set_input(DeviceState *dev, int line, uint8_t value);
>
> diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c
> index 2edccef2d5..4fab1f5f85 100644
> --- a/hw/ssi/aspeed_smc.c
> +++ b/hw/ssi/aspeed_smc.c
> @@ -1356,7 +1356,6 @@ static void aspeed_smc_realize(DeviceState *dev, Error 
> **errp)
>
>  /* Setup cs_lines for slaves */
>  s->cs_lines = g_new0(qemu_irq, s->num_cs);
> -ssi_auto_connect_slaves(dev, s->cs_lines, s->spi);
>
>  for (i = 0; i < s->num_cs; ++i) {
>  sysbus_init_irq(sbd, &s->cs_lines[i]);
> diff --git a/hw/ssi/imx_spi.c b/hw/ssi/imx_spi.c
> index 2dd9a631e1..2f09f15892 100644
> --- a/hw/ssi/imx_spi.c
> +++ b/hw/ssi/imx_spi.c
> @@ -424,8 +424,6 @@ static void imx_spi_realize(DeviceState *dev, Error 
> **errp)
>  sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
>  sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq);
>
> -ssi_auto_connect_slaves(dev, s->cs_lines, s->bus);
> -
>  for (i = 0; i < 4; ++i) {
>  sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->cs_lines[i]);
>  }
> diff --git a/hw/ssi/mss-spi.c b/hw/ssi/mss-spi.c
> index 3050fabb69..b2432c5a13 100644
> --- a/hw/ssi/mss-spi.c
> +++ b/hw/ssi/mss-spi.c
> @@ -376,7 +376,6 @@ static void mss_spi_realize(DeviceState *dev, Error 
> **errp)
>  s->spi = ssi_create_bus(dev, "spi");
>
>  sysbus_init_irq(sbd, &s->irq);
> -ssi_auto_connect_slaves(dev, &s->cs_line, s->spi);
>  sysbus_init_irq(sbd, &s->cs_line);
>
>  memory_region_init_io(>mmio, OBJECT(s), _ops, s,
> diff --git a/hw/ssi/ssi.c b/hw/ssi/ssi.c
> index c6415eb6e3..54106f5ef8 100644
> --- a/hw/ssi/ssi.c
> +++ b/hw/ssi/ssi.c
> @@ -142,36 +142,3 @@ static void ssi_slave_register_types(void)
>  }
>
>  type_init(ssi_slave_register_types)
> -
> -typedef struct SSIAutoConnectArg {
> -qemu_irq **cs_linep;
> -SSIBus *bus;
> -} SSIAutoConnectArg;
> -
> -static int ssi_auto_connect_slave(Object *child, void *opaque)
> -{
> -SSIAutoConnectArg *arg = opaque;
> -SSISlave *dev = (SSISlave *)object_dynamic_cast(child, TYPE_SSI_SLAVE);
> -qemu_irq cs_line;
> -
> -if (!dev) {
> -return 0;
> -}
> -
> -cs_line = qdev_get_gpio_in_named(DEVICE(dev), SSI_GPIO_CS, 0);
> -qdev_set_parent_bus(DEVICE(dev), BUS(arg->bus));
> -**arg->cs_linep = cs_line;
> -(*arg->cs_linep)++;
> -return 0;
> -}
> -
> -void ssi_auto_connect_slaves(DeviceState *parent, qemu_irq *cs_line,
> - SSIBus *bus)
> -{
> -SSIAutoConnectArg arg = {
> -.cs_linep = &cs_line,
> -.bus = bus
> -};
> -
> -object_child_foreach(OBJECT(parent), ssi_auto_connect_slave, &arg);
> -}
> diff --git a/hw/ssi/xilinx_spi.c b/hw/ssi/xilinx_spi.c
> index eba7ccd46a..80d1488dc7 100644
> --- a/hw/ssi/xilinx_spi.c
> +++ b/hw/ssi/xilinx_spi.c
> @@ -334,7 +334,6 @@ static void xilinx_spi_realize(DeviceState *dev, Error 
>

Re: [PATCH 22/55] ssi: Convert uses of ssi_create_slave_no_init() with Coccinelle

2020-05-19 Thread Alistair Francis

On Tue, May 19, 2020 at 8:19 AM Markus Armbruster  wrote:
>
> Replace
>
> dev = ssi_create_slave_no_init(bus, type_name);
> ...
> qdev_init_nofail(dev);
>
> by
>
> dev = qdev_new(type_name);
> ...
> qdev_realize_and_unref(dev, bus, &error_fatal);
>
> Recent commit "qdev: New qdev_new(), qdev_realize(), etc." explains
> why.
>
> @@
> type SSIBus;
> identifier bus;
> expression dev, qbus, expr;
> expression list args;
> @@
> -bus = (SSIBus *)qbus;
> +bus = qbus; // TODO fix up decl
>  ...
> -dev = ssi_create_slave_no_init(bus, args);
> +dev = qdev_new(args);
>  ... when != dev = expr
> -qdev_init_nofail(dev);
> +qdev_realize_and_unref(dev, bus, &error_fatal);
>
> @@
> expression dev, bus, expr;
> expression list args;
> @@
> -dev = ssi_create_slave_no_init(bus, args);
> +dev = qdev_new(args);
>  ... when != dev = expr
> -qdev_init_nofail(dev);
> +qdev_realize_and_unref(dev, BUS(bus), &error_fatal);
>
> Bus declarations fixed up manually.
>
> Cc: Alistair Francis 
> Signed-off-by: Markus Armbruster 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/arm/aspeed.c |  4 ++--
>  hw/arm/msf2-som.c   |  8 
>  hw/arm/sabrelite.c  |  4 ++--
>  hw/arm/xilinx_zynq.c|  4 ++--
>  hw/arm/xlnx-zcu102.c| 16 
>  hw/microblaze/petalogix_ml605_mmu.c |  4 ++--
>  6 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index c425c01e06..adbfbbd6b4 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -225,12 +225,12 @@ static void aspeed_board_init_flashes(AspeedSMCState 
> *s, const char *flashtype,
>  DriveInfo *dinfo = drive_get_next(IF_MTD);
>  qemu_irq cs_line;
>
> -fl->flash = ssi_create_slave_no_init(s->spi, flashtype);
> +fl->flash = qdev_new(flashtype);
>  if (dinfo) {
>  qdev_prop_set_drive(fl->flash, "drive", 
> blk_by_legacy_dinfo(dinfo),
>  errp);
>  }
> -qdev_init_nofail(fl->flash);
> +qdev_realize_and_unref(fl->flash, BUS(s->spi), &error_fatal);
>
>  cs_line = qdev_get_gpio_in_named(fl->flash, SSI_GPIO_CS, 0);
>  sysbus_connect_irq(SYS_BUS_DEVICE(s), i + 1, cs_line);
> diff --git a/hw/arm/msf2-som.c b/hw/arm/msf2-som.c
> index e398703742..ca9cbe1acb 100644
> --- a/hw/arm/msf2-som.c
> +++ b/hw/arm/msf2-som.c
> @@ -47,7 +47,7 @@ static void emcraft_sf2_s2s010_init(MachineState *machine)
>  MachineClass *mc = MACHINE_GET_CLASS(machine);
>  DriveInfo *dinfo = drive_get_next(IF_MTD);
>  qemu_irq cs_line;
> -SSIBus *spi_bus;
> +BusState *spi_bus;
>  MemoryRegion *sysmem = get_system_memory();
>  MemoryRegion *ddr = g_new(MemoryRegion, 1);
>
> @@ -82,14 +82,14 @@ static void emcraft_sf2_s2s010_init(MachineState *machine)
>  soc = MSF2_SOC(dev);
>
>  /* Attach SPI flash to SPI0 controller */
> -spi_bus = (SSIBus *)qdev_get_child_bus(dev, "spi0");
> -spi_flash = ssi_create_slave_no_init(spi_bus, "s25sl12801");
> +spi_bus = qdev_get_child_bus(dev, "spi0");
> +spi_flash = qdev_new("s25sl12801");
>  qdev_prop_set_uint8(spi_flash, "spansion-cr2nv", 1);
>  if (dinfo) {
>  qdev_prop_set_drive(spi_flash, "drive", blk_by_legacy_dinfo(dinfo),
>  &error_fatal);
>  }
> -qdev_init_nofail(spi_flash);
> +qdev_realize_and_unref(spi_flash, spi_bus, &error_fatal);
>  cs_line = qdev_get_gpio_in_named(spi_flash, SSI_GPIO_CS, 0);
>  sysbus_connect_irq(SYS_BUS_DEVICE(&soc->spi[0]), 1, cs_line);
>
> diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
> index 6f0e233d77..dfd6643822 100644
> --- a/hw/arm/sabrelite.c
> +++ b/hw/arm/sabrelite.c
> @@ -80,13 +80,13 @@ static void sabrelite_init(MachineState *machine)
>  qemu_irq cs_line;
>  DriveInfo *dinfo = drive_get_next(IF_MTD);
>
> -flash_dev = ssi_create_slave_no_init(spi_bus, "sst25vf016b");
> +flash_dev = qdev_new("sst25vf016b");
>  if (dinfo) {
>  qdev_prop_set_drive(flash_dev, "drive",
>  blk_by_legacy_dinfo(dinfo),
>  &error_fatal);
>  }
> -qdev_init_nofail(flash_dev);
> +qdev_realize_and_unref(flash_dev, BUS(spi_bus),
> &error_fatal);
>
>  cs_line = qdev_get_gpio_in_named(flash_dev, SSI_GPIO_CS, 0);
>  sysbus_connect_irq(SYS_BUS_DEVICE(spi_dev), 1, cs_line);
> diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
> index 5fbd2b2e31..0e0f0976c4 100644
> --- a/hw/arm/xilinx_zynq.c
> +++ b/hw/arm/xilinx_zynq.c
> @@ -157,12 +157,12 @@ static inline void zynq_init_spi_flashes(uint32_t 
> base_addr, qemu_irq irq,
>

Re: [PATCH v2 5/5] iotests: add commit top->base cases to 274

2020-05-19 Thread Eric Blake


On 5/19/20 2:55 PM, Vladimir Sementsov-Ogievskiy wrote:

These cases are fixed by previous patches around block_status and
is_allocated.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/274 | 20 
  tests/qemu-iotests/274.out | 65 ++
  2 files changed, 85 insertions(+)


Okay, so this test fails when applied in isolation without the rest of 
your series.




diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
index 5d1bf34dff..e910455f13 100755
--- a/tests/qemu-iotests/274
+++ b/tests/qemu-iotests/274
@@ -115,6 +115,26 @@ with iotests.FilePath('base') as base, \
  iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
  iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), 
mid)
  
+iotests.log('=== Testing qemu-img commit (top -> base) ===')

+
+create_chain()
+iotests.qemu_img_log('commit', '-b', base, top)
+iotests.img_info_log(base)
+iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
+iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), 
base)


So if I understand it, we are going from:

base
mid 
top 
guest   

and we want to go to:

base

except that we are not properly writing the zeroes into base, because we 
grabbed the wrong status, ending up with:


base

The status of top from 1M onwards is unallocated, and if we were to 
commit to just mid, Kevin's truncate fixes solve that (we now zero out 
the tail of mid as part of resizing it to be large enough).  But you are 
instead skipping mid, and committing all the way to base.  So we need 
_something_ that can tell qemu-img commit that even though the region 
1m-2m is unallocated in top, we must behave as though the status of mid 
reports it as allocated (because when reading beyond EOF in mid, we DO 
read zero).  Since the data is allocated not in top, but acts as though 
it was allocated in mid, which is above base, then the commit operation 
has to do something to preserve that allocation.
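
The problem described in the paragraph above can be reproduced with a toy model in which each layer is a list of 1M clusters. This is an illustrative Python sketch, not QEMU code: guest_read and commit_to_base are hypothetical names, None stands for an unallocated cluster, and the layer contents (a long base full of 1s, a short mid, an empty long top) are an assumption modelled on the test case.

```python
def guest_read(chain):
    """Guest-visible content per cluster; chain[0] is the top layer."""
    out = []
    for i in range(len(chain[0])):
        value = 0  # unallocated everywhere, or past EOF of a backing layer
        for layer in chain:
            if i >= len(layer):
                break  # reads past EOF of this layer are zero; stop here
            if layer[i] is not None:
                value = layer[i]
                break
        out.append(value)
    return out


def commit_to_base(chain, short_backing_counts_as_allocated):
    """Return the new base after committing every layer above it."""
    *upper, base = chain
    new_base = list(base) + [None] * (len(chain[0]) - len(base))
    for i in range(len(chain[0])):
        for layer in upper:
            if i >= len(layer):
                if short_backing_counts_as_allocated:
                    new_base[i] = 0  # preserve the zeroes the guest saw
                break
            if layer[i] is not None:
                new_base[i] = layer[i]
                break
    return new_base


if __name__ == "__main__":
    chain = [[None, None], [None], [1, 1]]  # top, mid (short), base
    print(guest_read(chain))
    print(commit_to_base(chain, False), commit_to_base(chain, True))
```

In this model the guest sees [1, 0] before the commit. Skipping the region that is unallocated in top leaves the base at [1, 1], which exposes stale data from base; treating the region past the end of mid as allocated zeroes yields [1, 0], matching what the guest saw.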


Okay, you've convinced me we have a bug.  However, I'm still not sold 
that patches 1 and 4 are quite the right fix.  Going back to the 
original setup, unpatched qemu.git head reports:


$ ./qemu-img map --output=json top.qcow2
[{ "start": 0, "length": 1048576, "depth": 2, "zero": false, "data": 
true, "offset": 327680},
{ "start": 1048576, "length": 1048576, "depth": 0, "zero": true, "data": 
false}]


I think what we really want is:

[{ "start": 0, "length": 1048576, "depth": 2, "zero": false, "data": 
true, "offset": 327680},
{ "start": 1048576, "length": 1048576, "depth": 1, "zero": true, "data": 
false}]


because then we would be _accurately_ reporting that the zeroes that we 
read from 1m-2m come _because_ we read from mid (beyond EOF), which is 
different from our current answer that the zeroes come from top (they 
don't, because top deferred to mid).  If we fix up qemu-img map output 
to correctly report zeroes beyond EOF from the correct layer, will that 
also fix up the bug we are seeing in qemu-img commit?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v2 1/5] block/io: fix bdrv_co_block_status_above

2020-05-19 Thread Vladimir Sementsov-Ogievskiy


19.05.2020 23:41, Eric Blake wrote:

On 5/19/20 2:54 PM, Vladimir Sementsov-Ogievskiy wrote:

bdrv_co_block_status_above has several problems with handling short
backing files:

1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but
without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
which produces these after-EOF zeros is inside requested backing
sequence.


That's intentional.  That portion of the guest-visible data reads as zero 
(BDRV_BLOCK_ZERO set) but was NOT read from the top layer, but rather 
synthesized by the block layer because it derived from the backing file but was 
beyond EOF of that backing layer (BDRV_BLOCK_ALLOCATED is clear).


Not in top yes. But _inside_ the requested base..top backing-chain-part. So it 
should be considered ALLOCATED, as we should not go to further backing.

Assume the following chain:

top    aa--
middle bb
base   

(so, middle is short)

block_status(top, 2) should return ZERO without ALLOCATED: the data is ZERO, and
it comes from another layer.

block_status_above(top, base, 2) should return ZERO with ALLOCATED: the data is
ZERO, and it is produced inside the requested backing-chain region (specifically,
because the middle node is short). We must report ALLOCATED to show that we
are not going to read from base.
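
The distinction argued above can be sketched as a toy model in C. This is only an illustration of the proposed semantics (which are disputed elsewhere in this thread), not QEMU code: the TOY_* flags, ToyLayer, toy_status() and toy_status_above() are hypothetical names, and the base layer is assumed to be fully allocated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags, loosely mirroring BDRV_BLOCK_DATA/ZERO/ALLOCATED. */
enum {
    TOY_DATA      = 1 << 0,
    TOY_ZERO      = 1 << 1,
    TOY_ALLOCATED = 1 << 2,
};

typedef struct ToyLayer {
    size_t len;             /* layer length, in clusters */
    const bool *allocated;  /* allocated[i] is valid for i < len */
} ToyLayer;

static bool toy_is_allocated(const ToyLayer *l, size_t cluster)
{
    return cluster < l->len && l->allocated[cluster];
}

/*
 * Status of one layer only. chain[0] is top, chain[n - 1] is the deepest
 * backing file. An unallocated cluster beyond EOF of the direct backing
 * file reads as zero, but is not allocated in this layer.
 */
static int toy_status(const ToyLayer *chain, size_t n, size_t idx,
                      size_t cluster)
{
    if (toy_is_allocated(&chain[idx], cluster)) {
        return TOY_DATA | TOY_ALLOCATED;
    }
    if (idx + 1 < n && cluster >= chain[idx + 1].len) {
        return TOY_ZERO;
    }
    return 0;
}

/*
 * Status over the range top (inclusive) .. base (exclusive), following the
 * semantics proposed in this mail: zeroes produced by a short layer inside
 * the range count as allocated, because the read never reaches base.
 */
static int toy_status_above(const ToyLayer *chain, size_t top, size_t base,
                            size_t cluster)
{
    for (size_t i = top; i < base; i++) {
        if (i > top && cluster >= chain[i].len) {
            return TOY_ZERO | TOY_ALLOCATED;
        }
        if (toy_is_allocated(&chain[i], cluster)) {
            return TOY_DATA | TOY_ALLOCATED;
        }
    }
    return 0; /* nothing in the range decides: the read defers to base */
}
```

With top = "aa--" (4 clusters, first two allocated) and middle = "bb" (2 clusters), toy_status() on top for cluster 2 gives TOY_ZERO alone, while toy_status_above() from top down to base gives TOY_ZERO | TOY_ALLOCATED.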





2. With want_zero=false, it may return pnum=0 prior to actual EOF,
because of EOF of short backing file.


Do you have a reproducer for this?


No, I don't have one, but it seems possible at least with want_zero=false. I'll 
think of it tomorrow, too tired now.


In my experience, this is not possible.  Generally, if you request status that 
overlaps EOF of the backing, you get a response truncated to the end of the 
backing, and you are then likely to follow up with a subsequent status request 
starting from the underlying EOF which then sees the desired unallocated zeroes:

back 
top  yy--
request    ^^
response   ^^
request  
response 



Fix these things, making logic about short backing files clearer.

Note that 154 output changed, because now bdrv_block_status_above don't


doesn't


merge unallocated zeros with zeros after EOF (which are actually
"allocated" in POV of read from backing-chain top) and is_zero() just
don't understand that the whole head or tail is zero. We may update
is_zero to call bdrv_block_status_above several times, or add flag to
bdrv_block_status_above that we are not interested in ALLOCATED flag,
so ranges with different ALLOCATED status may be merged, but actually,
it seems that we'd better don't care about this corner case.


This actually sounds like an avoidable regression.  :(


I don't see a real problem in it, but it doesn't seem hard to avoid, so I will
try to.



I argue that if we did not explicitly write data/zero clusters in the tail of 
the top layer, then those clusters are not allocated from the POV of reading 
from the backing-chain top.  Yes, we know what their contents will be, but we 
also know what the contents of unallocated clusters will be when there is no 
backing file at all - basically, after your other patch series to drop 
unallocated_blocks_are_zero:
https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg05429.html
then we know that only format drivers that can support backing files even care 
what allocation means, and 'allocated' strictly means that the data comes from 
the top layer rather than from a backing (whether directly from the backing, or 
synthesized as zero by the block layer because it was beyond EOF of the 
backing).


I agree about allocated in top, returned by block_status. But this patch is for 
allocated_above, and the ALLOCATED status is not about top, but about a set of 
nodes from base (not inclusive) to top.





Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/io.c | 38 +-
  tests/qemu-iotests/154.out |  4 ++--
  2 files changed, 31 insertions(+), 11 deletions(-)



I'm already not a fan of this patch - it adds lines rather than removes, and 
seems to add a regression.


diff --git a/block/io.c b/block/io.c
index 121ce17a49..db990e812b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2461,25 +2461,45 @@ static int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
  ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
 file);
  if (ret < 0) {
-    break;
+    return ret;
  }
-    if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+    if (*pnum == 0) {
+    if (first) {
+    return ret;
+    }
+
  /*
- * Reading beyond the end of the file continues to read
- * zeroes, but we can only widen the result to the
- * unallocated length we learned from an earlier
- * iteration.
+ * Reads from bs for the selected region will return

Re: [PATCH 03/55] qdev: New qdev_new(), qdev_realize(), etc.

2020-05-19 Thread Alistair Francis

On Tue, May 19, 2020 at 8:11 AM Markus Armbruster  wrote:
>
> We commonly plug devices into their bus right when we create them,
> like this:
>
> dev = qdev_create(bus, type_name);
>
> Note that @dev is a weak reference.  The reference from @bus to @dev
> is the only strong one.
>
> We realize at some later time, either with
>
> object_property_set_bool(OBJECT(dev), true, "realized", errp);
>
> or its convenience wrapper
>
> qdev_init_nofail(dev);
>
> If @dev still has no QOM parent then, realizing makes the
> /machine/unattached/ orphanage its QOM parent.
>
> Note that the device returned by qdev_create() is plugged into a bus,
> but doesn't have a QOM parent, yet.  Until it acquires one,
> unrealizing the bus will hang in bus_unparent():
>
> while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
>     DeviceState *dev = kid->child;
>     object_unparent(OBJECT(dev));
> }
>
> object_unparent() does nothing when its argument has no QOM parent,
> and the loop spins forever.
>
> Device state "no QOM parent, but plugged into bus" is dangerous.
>
> Paolo suggested to delay plugging into the bus until realize.  We need
> to plug into the parent bus before we call the device's realize
> method, in case it uses the parent bus.  So the dangerous state still
> exists, but only within realization, where we can manage it safely.
>
> This commit creates infrastructure to do this:
>
> dev = qdev_new(type_name);
> ...
> qdev_realize_and_unref(dev, bus, errp)
>
> Note that @dev becomes a strong reference here.
> qdev_realize_and_unref() drops it.  There is also plain
> qdev_realize(), which doesn't drop it.
>
> The remainder of this series will convert all users to this new
> interface.
>
> Cc: Michael S. Tsirkin 
> Cc: Marcel Apfelbaum 
> Cc: Alistair Francis 
> Cc: Gerd Hoffmann 
> Cc: Mark Cave-Ayland 
> Cc: David Gibson 
> Signed-off-by: Markus Armbruster 
> ---
>  include/hw/qdev-core.h | 11 -
>  hw/core/bus.c  | 14 +++
>  hw/core/qdev.c | 94 ++
>  3 files changed, 118 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index b870b27966..fba29308f7 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -57,7 +57,7 @@ typedef void (*BusUnrealize)(BusState *bus);
>   * After successful realization, setting static properties will fail.
>   *
>   * As an interim step, the #DeviceState:realized property can also be
> - * set with qdev_init_nofail().
> + * set with qdev_realize() or qdev_init_nofail().
>   * In the future, devices will propagate this state change to their children
>   * and along busses they expose.
>   * The point in time will be deferred to machine creation, so that values
> @@ -322,7 +322,13 @@ compat_props_add(GPtrArray *arr,
>
>  DeviceState *qdev_create(BusState *bus, const char *name);
>  DeviceState *qdev_try_create(BusState *bus, const char *name);
> +DeviceState *qdev_new(const char *name);
> +DeviceState *qdev_try_new(const char *name);
>  void qdev_init_nofail(DeviceState *dev);
> +bool qdev_realize(DeviceState *dev, BusState *bus, Error **errp);
> +bool qdev_realize_and_unref(DeviceState *dev, BusState *bus, Error **errp);
> +void qdev_unrealize(DeviceState *dev);
> +
>  void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
>   int required_for_version);
>  HotplugHandler *qdev_get_bus_hotplug_handler(DeviceState *dev);
> @@ -411,6 +417,9 @@ typedef int (qdev_walkerfn)(DeviceState *dev, void 
> *opaque);
>  void qbus_create_inplace(void *bus, size_t size, const char *typename,
>   DeviceState *parent, const char *name);
>  BusState *qbus_create(const char *typename, DeviceState *parent, const char 
> *name);
> +bool qbus_realize(BusState *bus, Error **errp);
> +void qbus_unrealize(BusState *bus);
> +
>  /* Returns > 0 if either devfn or busfn skip walk somewhere in cursion,
>   * < 0 if either devfn or busfn terminate walk somewhere in cursion,
>   *   0 otherwise. */
> diff --git a/hw/core/bus.c b/hw/core/bus.c
> index 08c5eab24a..bf622604a3 100644
> --- a/hw/core/bus.c
> +++ b/hw/core/bus.c
> @@ -169,6 +169,20 @@ BusState *qbus_create(const char *typename, DeviceState 
> *parent, const char *nam
>  return bus;
>  }
>
> +bool qbus_realize(BusState *bus, Error **errp)
> +{
> +Error *err = NULL;
> +
> +object_property_set_bool(OBJECT(bus), true, "realized", &err);
> +error_propagate(errp, err);
> +return !err;
> +}
> +
> +void qbus_unrealize(BusState *bus)
> +{
> +object_property_set_bool(OBJECT(bus), true, "realized", &error_abort);

Not false?

Alistair

> +}
> +
>  static bool bus_get_realized(Object *obj, Error **errp)
>  {
>  BusState *bus = BUS(obj);
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index a68ba674db..82deeb7841 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -176,6 +176,32 @@ DeviceState

Re: [PATCH 23/55] ssi: Convert last use of ssi_create_slave_no_init() manually

2020-05-19 Thread Alistair Francis

On Tue, May 19, 2020 at 8:03 AM Markus Armbruster  wrote:
>
> Same transformation as in the previous commit.  Manual, because
> convincing Coccinelle to transform this case is not worthwhile.
>
> Cc: Alistair Francis 
> Signed-off-by: Markus Armbruster 

Looks sane.

Acked-by: Alistair Francis 

Alistair

> ---
>  hw/ssi/ssi.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ssi/ssi.c b/hw/ssi/ssi.c
> index 54106f5ef8..58e7d904db 100644
> --- a/hw/ssi/ssi.c
> +++ b/hw/ssi/ssi.c
> @@ -16,6 +16,7 @@
>  #include "hw/ssi/ssi.h"
>  #include "migration/vmstate.h"
>  #include "qemu/module.h"
> +#include "qapi/error.h"
>
>  struct SSIBus {
>  BusState parent_obj;
> @@ -96,9 +97,9 @@ DeviceState *ssi_create_slave_no_init(SSIBus *bus, const 
> char *name)
>
>  DeviceState *ssi_create_slave(SSIBus *bus, const char *name)
>  {
> -DeviceState *dev = ssi_create_slave_no_init(bus, name);
> +DeviceState *dev = qdev_new(name);
>
> -qdev_init_nofail(dev);
> +qdev_realize_and_unref(dev, &bus->parent_obj, &error_fatal);
>  return dev;
>  }
>
> --
> 2.21.1
>
>

Re: [PATCH v2 4/5] block/io: fix bdrv_is_allocated_above

2020-05-19 Thread Eric Blake


On 5/19/20 2:55 PM, Vladimir Sementsov-Ogievskiy wrote:

bdrv_is_allocated_above wrongly handles short backing files: it reports
after-EOF space as UNALLOCATED which is wrong,


You haven't convinced me of that claim.


as on read the data is
generated on the level of short backing file (if all overlays has
unallocated area at that place).

Reusing bdrv_common_block_status_above fixes the issue and unifies code
path.


Unifying the code path is admirable, but I'm not sure we have the 
semantics right, yet.




Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/io.c | 43 +--
  1 file changed, 5 insertions(+), 38 deletions(-)


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v2 1/5] block/io: fix bdrv_co_block_status_above

2020-05-19 Thread Eric Blake


On 5/19/20 2:54 PM, Vladimir Sementsov-Ogievskiy wrote:

bdrv_co_block_status_above has several problems with handling short
backing files:

1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but
without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
which produces these after-EOF zeros is inside requested backing
sequence.


That's intentional.  That portion of the guest-visible data reads as 
zero (BDRV_BLOCK_ZERO set) but was NOT read from the top layer, but 
rather synthesized by the block layer because it derived from the 
backing file but was beyond EOF of that backing layer 
(BDRV_BLOCK_ALLOCATED is clear).




2. With want_zero=false, it may return pnum=0 prior to actual EOF,
because of EOF of short backing file.


Do you have a reproducer for this?  In my experience, this is not 
possible.  Generally, if you request status that overlaps EOF of the 
backing, you get a response truncated to the end of the backing, and you 
are then likely to follow up with a subsequent status request starting 
from the underlying EOF which then sees the desired unallocated zeroes:


back 
top  yy--
request^^
response   ^^
request  
response 



Fix these things, making logic about short backing files clearer.

Note that 154 output changed, because now bdrv_block_status_above don't


doesn't


merge unallocated zeros with zeros after EOF (which are actually
"allocated" in POV of read from backing-chain top) and is_zero() just
don't understand that the whole head or tail is zero. We may update
is_zero to call bdrv_block_status_above several times, or add flag to
bdrv_block_status_above that we are not interested in ALLOCATED flag,
so ranges with different ALLOCATED status may be merged, but actually,
it seems that we'd better don't care about this corner case.


This actually sounds like an avoidable regression.  :(

I argue that if we did not explicitly write data/zero clusters in the 
tail of the top layer, then those clusters are not allocated from the 
POV of reading from the backing-chain top.  Yes, we know what their 
contents will be, but we also know what the contents of unallocated 
clusters will be when there is no backing file at all - basically, after 
your other patch series to drop unallocated_blocks_are_zero:

https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg05429.html
then we know that only format drivers that can support backing files 
even care what allocation means, and 'allocated' strictly means that the 
data comes from the top layer rather than from a backing (whether 
directly from the backing, or synthesized as zero by the block layer 
because it was beyond EOF of the backing).




Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/io.c | 38 +-
  tests/qemu-iotests/154.out |  4 ++--
  2 files changed, 31 insertions(+), 11 deletions(-)



I'm already not a fan of this patch - it adds lines rather than removes, 
and seems to add a regression.



diff --git a/block/io.c b/block/io.c
index 121ce17a49..db990e812b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2461,25 +2461,45 @@ static int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
  ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
 file);
  if (ret < 0) {
-break;
+return ret;
  }
-if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+if (*pnum == 0) {
+if (first) {
+return ret;
+}
+
  /*
- * Reading beyond the end of the file continues to read
- * zeroes, but we can only widen the result to the
- * unallocated length we learned from an earlier
- * iteration.
+ * Reads from bs for the selected region will return zeroes,
+ * produced because the current level is short. We should consider
+ * it as allocated.


Why?  If we replaced the backing file to something longer (qemu-img 
rebase -u), we would WANT to read from the backing file.  The only 
reason we read zero is because the block layer synthesized it _while_ 
deferring to the backing layer, not because it was directly allocated in 
the top layer.



+ *
+ * TODO: Should we report p as file here?


No. Reporting 'file' only makes sense if you can point to an offset 
within that file that would read the guest-visible data in question - 
but when the data is synthesized, there is no such offset.



   */
+assert(ret & BDRV_BLOCK_EOF);
  *pnum = bytes;
+return BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
  }
-if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
-break;
+if (ret & BDRV_BLOCK_ALLOCATED) {
+/* We've found the node and the status, we must return. */
+
+if

[Bug 1856335] Re: Cache Layout wrong on many Zen Arch CPUs

2020-05-19 Thread Heiko Sieger

Thanks Jan. I had some new hardware/software issues combined with the
QEMU 5.0.. issues that had my Windows VM crash after some minutes.

I totally overlooked the following:



So I guess you posted to answer to this:
https://www.reddit.com/r/VFIO/comments/erwzrg/think_i_found_a_workaround_to_get_l3_cache_shared/

As it's late, I'll try tomorrow. Sorry for all the confusion but I had a
real tough time with this Ryzen build.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1856335

Title:
  Cache Layout wrong on many Zen Arch CPUs

Status in QEMU:
  New

Bug description:
  AMD CPUs have L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems
  to always map cache as if it were a 4-core-per-CCX CPU, which is
  incorrect, and costs upwards of 30% performance (more realistically 10%)
  in L3 cache layout aware applications.

  Example on a 4-CCX CPU (1950X /w 8 Cores and no SMT):

    
  EPYC-IBPB
  AMD
  

  In windows, coreinfo reports correctly:

    Unified Cache 1, Level 3,8 MB, Assoc  16, LineSize  64
    Unified Cache 6, Level 3,8 MB, Assoc  16, LineSize  64

  On a 3-CCX CPU (3960X /w 6 cores and no SMT):

   
  EPYC-IBPB
  AMD
  

  in windows, coreinfo reports incorrectly:

  --  Unified Cache  1, Level 3,8 MB, Assoc  16, LineSize  64
  **  Unified Cache  6, Level 3,8 MB, Assoc  16, LineSize  64

  Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.

  With newer Qemu there is a fix (that does behave correctly) in using the dies 
parameter:
   

  The problem is that the dies are exposed differently than how AMD does
  it natively: they are exposed to Windows as sockets, which means that
  if you are not a business user, you can't ever have a machine with
  more than two CCX (6 cores), as consumer versions of Windows only
  support two sockets. (Should this be reported as a separate bug?)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1856335/+subscriptions

Re: [PATCH v2 0/5] fix & merge block_status_above and is_allocated_above

2020-05-19 Thread Vladimir Sementsov-Ogievskiy


19.05.2020 23:21, Eric Blake wrote:

On 5/19/20 2:54 PM, Vladimir Sementsov-Ogievskiy wrote:


This leads to the following effect:

./qemu-img create -f qcow2 base.qcow2 2M
./qemu-io -c "write -P 0x1 0 2M" base.qcow2

./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M

Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
./qemu-io -c "read -P 0 1M 1M" top.qcow2
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)

But after commit guest visible state is changed, which seems wrong for me:
./qemu-img commit top.qcow2 -b mid.qcow2

./qemu-io -c "read -P 0 1M 1M" mid.qcow2
Pattern verification failed at offset 1048576, 1048576 bytes
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)


This no longer happens as of commit bf03dede47 and friends.  As such, how much 
of this series is still needed for other reasons?



Oops, sorry. I blindly copied the cover letter of v1 and forgot that it describes
another thing. The test above is unrelated now. The whole series is still valid; it
fixes another problem (see 04 and the new test cases in 05).

--
Best regards,
Vladimir

Re: [PATCH v2 0/5] fix & merge block_status_above and is_allocated_above

2020-05-19 Thread Eric Blake


On 5/19/20 2:54 PM, Vladimir Sementsov-Ogievskiy wrote:


This leads to the following effect:

./qemu-img create -f qcow2 base.qcow2 2M
./qemu-io -c "write -P 0x1 0 2M" base.qcow2

./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M

Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
./qemu-io -c "read -P 0 1M 1M" top.qcow2
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)

But after commit guest visible state is changed, which seems wrong for me:
./qemu-img commit top.qcow2 -b mid.qcow2

./qemu-io -c "read -P 0 1M 1M" mid.qcow2
Pattern verification failed at offset 1048576, 1048576 bytes
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)


This no longer happens as of commit bf03dede47 and friends.  As such, 
how much of this series is still needed for other reasons?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[PATCH 5/7] net/colo-compare.c: Only hexdump packets if tracing is enabled

2020-05-19 Thread Zhang Chen

From: Lukas Straub 

Else the log will be flooded if there is a lot of network
traffic.
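
The idea can be sketched in isolation as follows. This is a minimal stand-alone illustration, not the QEMU tracing API: toy_trace_enabled and toy_hexdump_if_traced() are hypothetical names standing in for the trace-state check and qemu_hexdump() used in the patch below.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical switch standing in for "is this trace event enabled?". */
static bool toy_trace_enabled;

/*
 * Dump a packet as hex only when tracing is enabled.
 * Returns the number of bytes dumped (0 when tracing is off), so the
 * per-packet cost is a single flag check under heavy traffic.
 */
static size_t toy_hexdump_if_traced(FILE *out, const char *prefix,
                                    const uint8_t *buf, size_t len)
{
    if (!toy_trace_enabled) {
        return 0;
    }
    fprintf(out, "%s:", prefix);
    for (size_t i = 0; i < len; i++) {
        fprintf(out, " %02x", (unsigned)buf[i]);
    }
    fputc('\n', out);
    return len;
}
```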

Signed-off-by: Lukas Straub 
Reviewed-by: Zhang Chen 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index fe557b4693..db536c9419 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -490,10 +490,12 @@ sec:
 g_queue_push_head(&conn->primary_list, ppkt);
 g_queue_push_head(&conn->secondary_list, spkt);
 
-qemu_hexdump((char *)ppkt->data, stderr,
- "colo-compare ppkt", ppkt->size);
-qemu_hexdump((char *)spkt->data, stderr,
- "colo-compare spkt", spkt->size);
+if (trace_event_get_state_backends(TRACE_COLO_COMPARE_MISCOMPARE)) {
+qemu_hexdump((char *)ppkt->data, stderr,
+"colo-compare ppkt", ppkt->size);
+qemu_hexdump((char *)spkt->data, stderr,
+"colo-compare spkt", spkt->size);
+}
 
 colo_compare_inconsistency_notify(s);
 }
-- 
2.17.1

Re: Question: How do I discard any changes for the device which is set by blockdev option?

2020-05-19 Thread Masayoshi Mizuma

On Tue, May 19, 2020 at 01:41:08PM -0500, Eric Blake wrote:
> On 5/19/20 12:56 PM, Masayoshi Mizuma wrote:
> > Hello,
> > 
> > I would like to discard any changes once the qemu guest OS is done.
> > I can do that with the snapshot and drive options.
> > However, the snapshot option doesn't work for a device which is set by
> > the blockdev option, like this:
> > 
> > $QEMU --enable-kvm \
> >-m 1024 \
> >-nographic \
> >-serial mon:stdio \
> >-blockdev driver=file,node-name=mydisk,filename=/mnt/fedora.qcow2 \
> >-blockdev driver=qcow2,node-name=vda,file=mydisk \
> >-device virtio-blk-pci,drive=vda,bootindex=1 \
> >-snapshot
> > 
> > I would like to use the blockdev option to set the device because
> > libvirt uses the blockdev option for the disk element.
> > 
> > If there's no way to do so, would it make sense to make the
> > snapshot option available for blockdev as well? If that makes sense,
> > I'll try to implement that.
> > 
> > As for qcow2, I think we can do this with the qemu-img snapshot
> > command, for example by saving the original image and restoring it
> > after the qemu guest OS is shut down. However, that may be complicated
> > for users. I would like a simple way, like the snapshot/drive option...
> > 
> > If I'm missing something, let me know.
> > 
> 
> Sounds like a repeat of this thread:
> https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg06144.html
> 
> where the consensus is yes, -blockdev and -snapshot are incompatible,
> libvirt has plans to use the  tag to behave the same as what
> -snapshot does (but no one has implemented it yet), and in the meantime, it
> is possible to force libvirt to avoid -blockdev if you still need to supply
> -snapshot behind libvirt's back.

Thank you for the info! I hadn't noticed that thread.
I understand that this feature should be implemented on the libvirt side, not in qemu.

Thanks!
Masa

> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>

[PATCH 1/7] colo-compare: Fix memory leak in packet_enqueue()

2020-05-19 Thread Zhang Chen

From: Derek Su 

The patch is to fix the "pkt" memory leak in packet_enqueue().
The allocated "pkt" needs to be freed if the colo compare
primary or secondary queue is too big.

Replace the error_report of full queue with a trace event.
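
The ownership rule behind the fix can be sketched in isolation. This is a toy model with hypothetical names (ToyPacket, ToyQueue, toy_enqueue(), toy_live_packets), not the colo-compare code: the caller keeps ownership of the packet until the insert succeeds, so a rejected packet has to be destroyed on the spot.

```c
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_QUEUE 2

typedef struct ToyPacket {
    int id;
} ToyPacket;

typedef struct ToyQueue {
    ToyPacket *items[TOY_MAX_QUEUE];
    size_t count;
} ToyQueue;

/* Counts packets that are allocated but not yet freed. */
static int toy_live_packets;

static ToyPacket *toy_packet_new(int id)
{
    ToyPacket *pkt = malloc(sizeof(*pkt));
    if (pkt) {
        pkt->id = id;
        toy_live_packets++;
    }
    return pkt;
}

static void toy_packet_destroy(ToyPacket *pkt)
{
    free(pkt);
    toy_live_packets--;
}

/* Returns false when the queue is full; the packet is NOT consumed then. */
static bool toy_insert(ToyQueue *q, ToyPacket *pkt)
{
    if (q->count >= TOY_MAX_QUEUE) {
        return false;
    }
    q->items[q->count++] = pkt;
    return true;
}

/* Allocate a packet and queue it; free it again if the queue rejects it. */
static bool toy_enqueue(ToyQueue *q, int id)
{
    ToyPacket *pkt = toy_packet_new(id);

    if (!pkt) {
        return false;
    }
    if (!toy_insert(q, pkt)) {
        /* Without this destroy, every dropped packet would leak. */
        toy_packet_destroy(pkt);
        return false;
    }
    return true;
}
```

After two successful inserts and one rejected one, toy_live_packets stays at 2, which is the invariant the patch restores for "pkt".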

Signed-off-by: Derek Su 
Reviewed-by: Zhang Chen 
Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 23 +++
 net/trace-events   |  1 +
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index c07e7c1c09..56d8976537 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -122,6 +122,10 @@ enum {
 SECONDARY_IN,
 };
 
+static const char *colo_mode[] = {
+[PRIMARY_IN] = "primary",
+[SECONDARY_IN] = "secondary",
+};
 
 static int compare_chr_send(CompareState *s,
 const uint8_t *buf,
@@ -217,6 +221,7 @@ static int packet_enqueue(CompareState *s, int mode, 
Connection **con)
 ConnectionKey key;
 Packet *pkt = NULL;
 Connection *conn;
+int ret;
 
 if (mode == PRIMARY_IN) {
 pkt = packet_new(s->pri_rs.buf,
@@ -245,16 +250,18 @@ static int packet_enqueue(CompareState *s, int mode, 
Connection **con)
 }
 
 if (mode == PRIMARY_IN) {
-if (!colo_insert_packet(&conn->primary_list, pkt, &conn->pack)) {
-error_report("colo compare primary queue size too big,"
- "drop packet");
-}
+ret = colo_insert_packet(&conn->primary_list, pkt, &conn->pack);
 } else {
-if (!colo_insert_packet(&conn->secondary_list, pkt, &conn->sack)) {
-error_report("colo compare secondary queue size too big,"
- "drop packet");
-}
+ret = colo_insert_packet(&conn->secondary_list, pkt, &conn->sack);
 }
+
+if (!ret) {
+trace_colo_compare_drop_packet(colo_mode[mode],
+"queue size too big, drop packet");
+packet_destroy(pkt, NULL);
+pkt = NULL;
+}
+
 *con = conn;
 
 return 0;
diff --git a/net/trace-events b/net/trace-events
index 02c13fd0ba..fa49c71533 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -12,6 +12,7 @@ colo_proxy_main(const char *chr) ": %s"
 
 # colo-compare.c
 colo_compare_main(const char *chr) ": %s"
+colo_compare_drop_packet(const char *queue, const char *chr) ": %s: %s"
 colo_compare_udp_miscompare(const char *sta, int size) ": %s = %d"
 colo_compare_icmp_miscompare(const char *sta, int size) ": %s = %d"
 colo_compare_ip_info(int psize, const char *sta, const char *stb, int ssize, 
const char *stc, const char *std) "ppkt size = %d, ip_src = %s, ip_dst = %s, 
spkt size = %d, ip_src = %s, ip_dst = %s"
-- 
2.17.1

[PATCH 7/7] net/colo-compare.c: Correct ordering in complete and finalize

2020-05-19 Thread Zhang Chen

From: Lukas Straub 

In colo_compare_complete, insert CompareState into net_compares
only after everything has been initialized.
In colo_compare_finalize, remove CompareState from net_compares
before anything is deinitialized.

Signed-off-by: Lukas Straub 
Reviewed-by: Zhang Chen 
Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 45 +++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 1ee8b9dc3c..8632681229 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1290,15 +1290,6 @@ static void colo_compare_complete(UserCreatable *uc, 
Error **errp)
s->vnet_hdr);
 }
 
-qemu_mutex_lock(&colo_compare_mutex);
-if (!colo_compare_active) {
-qemu_mutex_init(&event_mtx);
-qemu_cond_init(&event_complete_cond);
-colo_compare_active = true;
-}
-QTAILQ_INSERT_TAIL(&net_compares, s, next);
-qemu_mutex_unlock(&colo_compare_mutex);
-
 s->out_sendco.s = s;
 s->out_sendco.chr = >chr_out;
 s->out_sendco.notify_remote_frame = false;
@@ -1321,6 +1312,16 @@ static void colo_compare_complete(UserCreatable *uc, 
Error **errp)
   connection_destroy);
 
 colo_compare_iothread(s);
+
+qemu_mutex_lock(&colo_compare_mutex);
+if (!colo_compare_active) {
+qemu_mutex_init(&event_mtx);
+qemu_cond_init(&event_complete_cond);
+colo_compare_active = true;
+}
+QTAILQ_INSERT_TAIL(&net_compares, s, next);
+qemu_mutex_unlock(&colo_compare_mutex);
+
 return;
 }
 
@@ -1389,19 +1390,6 @@ static void colo_compare_finalize(Object *obj)
 CompareState *s = COLO_COMPARE(obj);
 CompareState *tmp = NULL;
 
-qemu_chr_fe_deinit(&s->chr_pri_in, false);
-qemu_chr_fe_deinit(&s->chr_sec_in, false);
-qemu_chr_fe_deinit(&s->chr_out, false);
-if (s->notify_dev) {
-qemu_chr_fe_deinit(&s->chr_notify_dev, false);
-}
-
-if (s->iothread) {
-colo_compare_timer_del(s);
-}
-
-qemu_bh_delete(s->event_bh);
-
 qemu_mutex_lock(_compare_mutex);
 QTAILQ_FOREACH(tmp, _compares, next) {
 if (tmp == s) {
@@ -1416,6 +1404,19 @@ static void colo_compare_finalize(Object *obj)
 }
 qemu_mutex_unlock(_compare_mutex);
 
+qemu_chr_fe_deinit(&s->chr_pri_in, false);
+qemu_chr_fe_deinit(&s->chr_sec_in, false);
+qemu_chr_fe_deinit(&s->chr_out, false);
+if (s->notify_dev) {
+qemu_chr_fe_deinit(&s->chr_notify_dev, false);
+}
+
+if (s->iothread) {
+colo_compare_timer_del(s);
+}
+
+qemu_bh_delete(s->event_bh);
+
 AioContext *ctx = iothread_get_aio_context(s->iothread);
 aio_context_acquire(ctx);
 AIO_WAIT_WHILE(ctx, !s->out_sendco.done);
-- 
2.17.1

[PATCH 4/7] net/colo-compare.c: Fix deadlock in compare_chr_send

2020-05-19 Thread Zhang Chen

From: Lukas Straub 

The chr_out chardev is connected to a filter-redirector
running in the main loop. qemu_chr_fe_write_all might block
here in compare_chr_send if the (socket-)buffer is full.
If another filter-redirector in the main loop wants to
send data to chr_pri_in, it might also block if the buffer
is full. This leads to a deadlock because both event loops
get blocked.

Fix this by converting compare_chr_send to a coroutine and
putting the packets in a send queue.
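
The queueing half of this fix can be sketched on its own. This is a simplified model, not the patch itself: it replaces the coroutine with a plain drain function that can be called again later, and ToySendQueue, toy_send_enqueue(), toy_send_drain() and ToySink are hypothetical names. The point is that the producer never blocks; entries wait in FIFO order until the sink can take them.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_CAP 8

/* Fixed-size FIFO of pending entries (an int stands in for a packet). */
typedef struct ToySendQueue {
    int entries[TOY_QUEUE_CAP];
    size_t head;    /* index of the next entry to send */
    size_t count;   /* number of queued entries */
} ToySendQueue;

/* Non-blocking write attempt: returns false if the sink would block. */
typedef bool (*ToyTryWrite)(void *opaque, int entry);

/* Producer side: queue an entry without ever waiting on the sink. */
static bool toy_send_enqueue(ToySendQueue *q, int entry)
{
    if (q->count == TOY_QUEUE_CAP) {
        return false;
    }
    q->entries[(q->head + q->count) % TOY_QUEUE_CAP] = entry;
    q->count++;
    return true;
}

/*
 * Consumer side: send queued entries in order until the queue is empty
 * or the sink refuses one. Unsent entries stay queued for the next call.
 * Returns the number of entries sent.
 */
static size_t toy_send_drain(ToySendQueue *q, ToyTryWrite try_write,
                             void *opaque)
{
    size_t sent = 0;

    while (q->count > 0 && try_write(opaque, q->entries[q->head])) {
        q->head = (q->head + 1) % TOY_QUEUE_CAP;
        q->count--;
        sent++;
    }
    return sent;
}

/* Example sink with limited room, standing in for a full socket buffer. */
typedef struct ToySink {
    int room;   /* how many more entries the sink accepts */
    int last;   /* last entry written */
} ToySink;

static bool toy_sink_try_write(void *opaque, int entry)
{
    ToySink *sink = opaque;

    if (sink->room == 0) {
        return false;
    }
    sink->room--;
    sink->last = entry;
    return true;
}
```

In the real patch the drain loop runs inside a coroutine, so a write that would block yields back to the event loop instead of returning to the caller.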

Signed-off-by: Lukas Straub 
Reviewed-by: Zhang Chen 
Tested-by: Zhang Chen 
Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 193 ++---
 net/colo.c |   7 ++
 net/colo.h |   1 +
 3 files changed, 156 insertions(+), 45 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 2edfa31f6a..fe557b4693 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -32,6 +32,9 @@
 #include "migration/migration.h"
 #include "util.h"
 
+#include "block/aio-wait.h"
+#include "qemu/coroutine.h"
+
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
 OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
@@ -77,6 +80,23 @@ static int event_unhandled_count;
  *|packet  |  |packet  +|packet  | |packet  +
  *++  ++++ ++
  */
+
+typedef struct SendCo {
+Coroutine *co;
+struct CompareState *s;
+CharBackend *chr;
+GQueue send_list;
+bool notify_remote_frame;
+bool done;
+int ret;
+} SendCo;
+
+typedef struct SendEntry {
+uint32_t size;
+uint32_t vnet_hdr_len;
+uint8_t *buf;
+} SendEntry;
+
 typedef struct CompareState {
 Object parent;
 
@@ -91,6 +111,8 @@ typedef struct CompareState {
 SocketReadState pri_rs;
 SocketReadState sec_rs;
 SocketReadState notify_rs;
+SendCo out_sendco;
+SendCo notify_sendco;
 bool vnet_hdr;
 uint32_t compare_timeout;
 uint32_t expired_scan_cycle;
@@ -128,10 +150,11 @@ static const char *colo_mode[] = {
 };
 
 static int compare_chr_send(CompareState *s,
-const uint8_t *buf,
+uint8_t *buf,
 uint32_t size,
 uint32_t vnet_hdr_len,
-bool notify_remote_frame);
+bool notify_remote_frame,
+bool zero_copy);
 
 static bool packet_matches_str(const char *str,
const uint8_t *buf,
@@ -149,7 +172,7 @@ static void notify_remote_frame(CompareState *s)
 char msg[] = "DO_CHECKPOINT";
 int ret = 0;
 
-ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true);
+ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true, false);
 if (ret < 0) {
 error_report("Notify Xen COLO-frame failed");
 }
@@ -279,12 +302,13 @@ static void colo_release_primary_pkt(CompareState *s, 
Packet *pkt)
pkt->data,
pkt->size,
pkt->vnet_hdr_len,
-   false);
+   false,
+   true);
 if (ret < 0) {
 error_report("colo send primary packet failed");
 }
 trace_colo_compare_main("packet same and release packet");
-packet_destroy(pkt, NULL);
+packet_destroy_partial(pkt, NULL);
 }
 
 /*
@@ -706,65 +730,115 @@ static void colo_compare_connection(void *opaque, void 
*user_data)
 }
 }
 
-static int compare_chr_send(CompareState *s,
-const uint8_t *buf,
-uint32_t size,
-uint32_t vnet_hdr_len,
-bool notify_remote_frame)
+static void coroutine_fn _compare_chr_send(void *opaque)
 {
+SendCo *sendco = opaque;
+CompareState *s = sendco->s;
 int ret = 0;
-uint32_t len = htonl(size);
 
-if (!size) {
-return 0;
-}
+while (!g_queue_is_empty(&sendco->send_list)) {
+SendEntry *entry = g_queue_pop_tail(&sendco->send_list);
+uint32_t len = htonl(entry->size);
 
-if (notify_remote_frame) {
-ret = qemu_chr_fe_write_all(&s->chr_notify_dev,
-(uint8_t *)&len,
-sizeof(len));
-} else {
-ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
-}
+ret = qemu_chr_fe_write_all(sendco->chr, (uint8_t *)&len, sizeof(len));
 
-if (ret != sizeof(len)) {
-goto err;
-}
+if (ret != sizeof(len)) {
+g_free(entry->buf);
+g_slice_free(SendEntry, entry);
+goto err;
+}
 
-if (s->vnet_hdr) {
-/*
- * We send vnet header len make other module(like filter-redirector)
- * know how to parse net packet correctly.
- */
-len = htonl(vnet_hdr_len);
+if

[PATCH 2/7] net/colo-compare.c: Create event_bh with the right AioContext

2020-05-19 Thread Zhang Chen

From: Lukas Straub 

qemu_bh_new will set the bh to be executed in the main
loop. This causes crashes, as colo_compare_handle_event assumes
that it has exclusive access to the queues, which are also
concurrently accessed in the iothread.

Create the bh with the AioContext of the iothread to fulfill
these assumptions and fix the crashes. This is safe, because
the bh already takes the appropriate locks.

Signed-off-by: Lukas Straub 
Reviewed-by: Zhang Chen 
Reviewed-by: Derek Su 
Tested-by: Derek Su 
Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 56d8976537..2edfa31f6a 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -897,6 +897,7 @@ static void colo_compare_handle_event(void *opaque)
 
 static void colo_compare_iothread(CompareState *s)
 {
+AioContext *ctx = iothread_get_aio_context(s->iothread);
 object_ref(OBJECT(s->iothread));
 s->worker_context = iothread_get_g_main_context(s->iothread);
 
@@ -913,7 +914,7 @@ static void colo_compare_iothread(CompareState *s)
 }
 
 colo_compare_timer_init(s);
-s->event_bh = qemu_bh_new(colo_compare_handle_event, s);
+s->event_bh = aio_bh_new(ctx, colo_compare_handle_event, s);
 }
 
 static char *compare_get_pri_indev(Object *obj, Error **errp)
-- 
2.17.1

[PATCH 3/7] chardev/char.c: Use qemu_co_sleep_ns if in coroutine

2020-05-19 Thread Zhang Chen

From: Lukas Straub 

This will be needed in the next patch.

Signed-off-by: Lukas Straub 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Zhang Chen 
Signed-off-by: Zhang Chen 
---
 chardev/char.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/chardev/char.c b/chardev/char.c
index 0196e2887b..4c58ea1836 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -38,6 +38,7 @@
 #include "qemu/module.h"
 #include "qemu/option.h"
 #include "qemu/id.h"
+#include "qemu/coroutine.h"
 
 #include "chardev/char-mux.h"
 
@@ -119,7 +120,11 @@ static int qemu_chr_write_buffer(Chardev *s,
 retry:
 res = cc->chr_write(s, buf + *offset, len - *offset);
 if (res < 0 && errno == EAGAIN && write_all) {
-g_usleep(100);
+if (qemu_in_coroutine()) {
+qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 10);
+} else {
+g_usleep(100);
+}
 goto retry;
 }
 
-- 
2.17.1
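
The retry logic in the hunk above can be sketched outside QEMU as follows.
This is only a minimal model under stated assumptions: write_fn, sleep_fn,
write_all, flaky_sink, flaky_write and count_sleep are hypothetical names
invented for illustration, not QEMU's chardev API. The point it shows is that
the write loop stays the same and only the way of waiting on EAGAIN is chosen
by the caller (blocking outside a coroutine, yielding inside one).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback types for this sketch; they are not QEMU APIs. */
typedef ssize_t (*write_fn)(void *opaque, const unsigned char *buf, size_t len);
typedef void (*sleep_fn)(void *opaque);

/*
 * Write the whole buffer, retrying whenever the backend reports EAGAIN.
 * sleep_cb is the caller's choice of how to wait before the retry.
 * Returns the number of bytes written, or -1 on a hard error.
 */
static ssize_t write_all(void *opaque, write_fn write_cb, sleep_fn sleep_cb,
                         const unsigned char *buf, size_t len)
{
    size_t offset = 0;

    assert(write_cb && sleep_cb);

    while (offset < len) {
        ssize_t res = write_cb(opaque, buf + offset, len - offset);

        if (res < 0 && errno == EAGAIN) {
            sleep_cb(opaque);   /* back off, then retry at the same offset */
            continue;
        }
        if (res < 0) {
            return -1;          /* any other error is fatal */
        }
        if (res == 0) {
            break;              /* backend made no progress; stop early */
        }
        offset += (size_t)res;
    }
    return (ssize_t)offset;
}

/* Demo backend: fails with EAGAIN a few times, then accepts small chunks. */
struct flaky_sink {
    int eagain_left;    /* how many calls still fail with EAGAIN */
    size_t chunk;       /* maximum bytes accepted per call */
    size_t written;     /* total bytes accepted so far */
    int sleeps;         /* how often the sleep callback ran */
};

static ssize_t flaky_write(void *opaque, const unsigned char *buf, size_t len)
{
    struct flaky_sink *sink = opaque;
    size_t n = len < sink->chunk ? len : sink->chunk;

    (void)buf;
    if (sink->eagain_left > 0) {
        sink->eagain_left--;
        errno = EAGAIN;
        return -1;
    }
    sink->written += n;
    return (ssize_t)n;
}

/* Stand-in for "sleep": it only counts how often a retry was needed. */
static void count_sleep(void *opaque)
{
    struct flaky_sink *sink = opaque;

    sink->sleeps++;
}
```

A caller would pass a blocking sleep or a coroutine-friendly one as sleep_cb,
depending on where it runs; the loop itself does not need to know.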

[PATCH 6/7] net/colo-compare.c, softmmu/vl.c: Check that colo-compare is active

2020-05-19 Thread Zhang Chen

From: Lukas Straub 

If the colo-compare object is removed before failover and a
checkpoint happens, qemu crashes because it tries to lock
the destroyed event_mtx in colo_notify_compares_event.

Fix this by checking if everything is initialized by
introducing a new variable colo_compare_active which
is protected by a new mutex colo_compare_mutex. The new mutex
also protects against concurrent access of the net_compares
list and makes sure that colo_notify_compares_event isn't
active while we destroy event_mtx and event_complete_cond.

With this it also is again possible to use colo without
colo-compare (periodic mode) and to use multiple colo-compare
for multiple network interfaces.

Signed-off-by: Lukas Straub 
Reviewed-by: Zhang Chen 
Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 35 +--
 net/colo-compare.h |  1 +
 softmmu/vl.c   |  2 ++
 3 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index db536c9419..1ee8b9dc3c 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -54,6 +54,8 @@ static NotifierList colo_compare_notifiers =
 #define REGULAR_PACKET_CHECK_MS 3000
 #define DEFAULT_TIME_OUT_MS 3000
 
+static QemuMutex colo_compare_mutex;
+static bool colo_compare_active;
 static QemuMutex event_mtx;
 static QemuCond event_complete_cond;
 static int event_unhandled_count;
@@ -913,6 +915,12 @@ static void check_old_packet_regular(void *opaque)
 void colo_notify_compares_event(void *opaque, int event, Error **errp)
 {
 CompareState *s;
+qemu_mutex_lock(&colo_compare_mutex);
+
+if (!colo_compare_active) {
+qemu_mutex_unlock(&colo_compare_mutex);
+return;
+}
 
 qemu_mutex_lock(&event_mtx);
 QTAILQ_FOREACH(s, &net_compares, next) {
@@ -926,6 +934,7 @@ void colo_notify_compares_event(void *opaque, int event, 
Error **errp)
 }
 
 qemu_mutex_unlock(&event_mtx);
+qemu_mutex_unlock(&colo_compare_mutex);
 }
 
 static void colo_compare_timer_init(CompareState *s)
@@ -1281,7 +1290,14 @@ static void colo_compare_complete(UserCreatable *uc, 
Error **errp)
s->vnet_hdr);
 }
 
+qemu_mutex_lock(&colo_compare_mutex);
+if (!colo_compare_active) {
+qemu_mutex_init(&event_mtx);
+qemu_cond_init(&event_complete_cond);
+colo_compare_active = true;
+}
 QTAILQ_INSERT_TAIL(&net_compares, s, next);
+qemu_mutex_unlock(&colo_compare_mutex);
 
 s->out_sendco.s = s;
 s->out_sendco.chr = &s->chr_out;
@@ -1299,9 +1315,6 @@ static void colo_compare_complete(UserCreatable *uc, 
Error **errp)
 
 g_queue_init(&s->conn_list);
 
-qemu_mutex_init(&event_mtx);
-qemu_cond_init(&event_complete_cond);
-
 s->connection_track_table = g_hash_table_new_full(connection_key_hash,
   connection_key_equal,
   g_free,
@@ -1389,12 +1402,19 @@ static void colo_compare_finalize(Object *obj)
 
 qemu_bh_delete(s->event_bh);
 
+qemu_mutex_lock(&colo_compare_mutex);
 QTAILQ_FOREACH(tmp, &net_compares, next) {
 if (tmp == s) {
 QTAILQ_REMOVE(&net_compares, s, next);
 break;
 }
 }
+if (QTAILQ_EMPTY(&net_compares)) {
+colo_compare_active = false;
+qemu_mutex_destroy(&event_mtx);
+qemu_cond_destroy(&event_complete_cond);
+}
+qemu_mutex_unlock(&colo_compare_mutex);
 
 AioContext *ctx = iothread_get_aio_context(s->iothread);
 aio_context_acquire(ctx);
@@ -1422,15 +1442,18 @@ static void colo_compare_finalize(Object *obj)
 object_unref(OBJECT(s->iothread));
 }
 
-qemu_mutex_destroy(&event_mtx);
-qemu_cond_destroy(&event_complete_cond);
-
 g_free(s->pri_indev);
 g_free(s->sec_indev);
 g_free(s->outdev);
 g_free(s->notify_dev);
 }
 
+void colo_compare_init_globals(void)
+{
+colo_compare_active = false;
+qemu_mutex_init(&colo_compare_mutex);
+}
+
 static const TypeInfo colo_compare_info = {
 .name = TYPE_COLO_COMPARE,
 .parent = TYPE_OBJECT,
diff --git a/net/colo-compare.h b/net/colo-compare.h
index 22ddd512e2..eb483ac586 100644
--- a/net/colo-compare.h
+++ b/net/colo-compare.h
@@ -17,6 +17,7 @@
 #ifndef QEMU_COLO_COMPARE_H
 #define QEMU_COLO_COMPARE_H
 
+void colo_compare_init_globals(void);
 void colo_notify_compares_event(void *opaque, int event, Error **errp);
 void colo_compare_register_notifier(Notifier *notify);
 void colo_compare_unregister_notifier(Notifier *notify);
diff --git a/softmmu/vl.c b/softmmu/vl.c
index ae5451bc23..81602c12b5 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -112,6 +112,7 @@
 #include "qapi/qmp/qerror.h"
 #include "sysemu/iothread.h"
 #include "qemu/guest-random.h"
+#include "net/colo-compare.h"
 
 #define MAX_VIRTIO_CONSOLES 1
 
@@ -2906,6 +2907,7 @@ void qemu_init(int argc, char **argv, char **envp)
 precopy_infrastructure_init();
 postcopy_infrastructure_init();
 monitor_init_globals();
+colo_compare_init_globals();
 
 if (qcrypto_init() < 0)
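
The locking scheme described in the commit message above can be sketched with
plain pthreads. This is a simplified model, not the QEMU code: registry_lock,
registry_active, event_lock, instance_register, instance_unregister and
notify_all are hypothetical stand-ins for colo_compare_mutex,
colo_compare_active, event_mtx, the complete/finalize paths and
colo_notify_compares_event. One difference is assumed for brevity: the outer
lock uses a static initializer here, whereas the patch initializes it from
qemu_init() through colo_compare_init_globals().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Outer lock: always valid, protects everything below. */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static bool registry_active;        /* true while event_lock is initialized */
static pthread_mutex_t event_lock;  /* valid only while registry_active */
static int registered;              /* number of live instances */
static int events_delivered;        /* demo counter */

/* The first instance creates the shared event lock. */
static void instance_register(void)
{
    pthread_mutex_lock(&registry_lock);
    if (!registry_active) {
        pthread_mutex_init(&event_lock, NULL);
        registry_active = true;
    }
    registered++;
    pthread_mutex_unlock(&registry_lock);
}

/* The last instance destroys it, still holding the outer lock. */
static void instance_unregister(void)
{
    pthread_mutex_lock(&registry_lock);
    assert(registered > 0);
    if (--registered == 0) {
        registry_active = false;
        pthread_mutex_destroy(&event_lock);
    }
    pthread_mutex_unlock(&registry_lock);
}

/*
 * Notify every instance. Returns false without touching event_lock when
 * nothing is registered, which is the crash the patch avoids.
 */
static bool notify_all(void)
{
    bool delivered = false;

    pthread_mutex_lock(&registry_lock);
    if (registry_active) {
        pthread_mutex_lock(&event_lock);
        events_delivered += registered;
        pthread_mutex_unlock(&event_lock);
        delivered = true;
    }
    pthread_mutex_unlock(&registry_lock);
    return delivered;
}
```

Because the notifier holds the outer lock for its whole run, teardown cannot
destroy event_lock while a notification is still in progress.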

[PATCH 0/7] Latest COLO tree queued patches

2020-05-19 Thread Zhang Chen

From: Zhang Chen 

Hi Jason, this series includes the latest COLO related patches.
I have finished basic testing and review.
If there are no other comments, please check and merge this series.

Derek Su (1):
  colo-compare: Fix memory leak in packet_enqueue()

Lukas Straub (6):
  net/colo-compare.c: Create event_bh with the right AioContext
  chardev/char.c: Use qemu_co_sleep_ns if in coroutine
  net/colo-compare.c: Fix deadlock in compare_chr_send
  net/colo-compare.c: Only hexdump packets if tracing is enabled
  net/colo-compare.c, softmmu/vl.c: Check that colo-compare is active
  net/colo-compare.c: Correct ordering in complete and finalize

 chardev/char.c |   7 +-
 net/colo-compare.c | 277 +
 net/colo-compare.h |   1 +
 net/colo.c |   7 ++
 net/colo.h |   1 +
 net/trace-events   |   1 +
 softmmu/vl.c   |   2 +
 7 files changed, 225 insertions(+), 71 deletions(-)

-- 
2.17.1

[PATCH v2 5/5] iotests: add commit top->base cases to 274

2020-05-19 Thread Vladimir Sementsov-Ogievskiy

These cases are fixed by previous patches around block_status and
is_allocated.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/274 | 20 
 tests/qemu-iotests/274.out | 65 ++
 2 files changed, 85 insertions(+)

diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
index 5d1bf34dff..e910455f13 100755
--- a/tests/qemu-iotests/274
+++ b/tests/qemu-iotests/274
@@ -115,6 +115,26 @@ with iotests.FilePath('base') as base, \
 iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
 iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
 
+iotests.log('=== Testing qemu-img commit (top -> base) ===')
+
+create_chain()
+iotests.qemu_img_log('commit', '-b', base, top)
+iotests.img_info_log(base)
+iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
+iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), 
base)
+
+iotests.log('=== Testing QMP active commit (top -> base) ===')
+
+create_chain()
+with create_vm() as vm:
+vm.launch()
+vm.qmp_log('block-commit', device='top', base_node='base',
+   job_id='job0', auto_dismiss=False)
+vm.run_job('job0', wait=5)
+
+iotests.img_info_log(mid)
+iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
+iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), 
base)
 
 iotests.log('== Resize tests ==')
 
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
index d24ff681af..9806dea8b6 100644
--- a/tests/qemu-iotests/274.out
+++ b/tests/qemu-iotests/274.out
@@ -129,6 +129,71 @@ read 1048576/1048576 bytes at offset 0
 read 1048576/1048576 bytes at offset 1048576
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+=== Testing qemu-img commit (top -> base) ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=2097152 cluster_size=65536 
lazy_refcounts=off refcount_bits=16 compression_type=zlib
+
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 size=1048576 
backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off 
refcount_bits=16 compression_type=zlib
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=2097152 
backing_file=TEST_DIR/PID-mid cluster_size=65536 lazy_refcounts=off 
refcount_bits=16 compression_type=zlib
+
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Image committed.
+
+image: TEST_IMG
+file format: IMGFMT
+virtual size: 2 MiB (2097152 bytes)
+cluster_size: 65536
+Format specific information:
+compat: 1.1
+compression type: zlib
+lazy refcounts: false
+refcount bits: 16
+corrupt: false
+
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing QMP active commit (top -> base) ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=2097152 cluster_size=65536 
lazy_refcounts=off refcount_bits=16 compression_type=zlib
+
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 size=1048576 
backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off 
refcount_bits=16 compression_type=zlib
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=2097152 
backing_file=TEST_DIR/PID-mid cluster_size=65536 lazy_refcounts=off 
refcount_bits=16 compression_type=zlib
+
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": 
"base", "device": "top", "job-id": "job0"}}
+{"return": {}}
+{"execute": "job-complete", "arguments": {"id": "job0"}}
+{"return": {}}
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, 
"type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": 
"USECS", "seconds": "SECS"}}
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, 
"type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "job-dismiss", "arguments": {"id": "job0"}}
+{"return": {}}
+image: TEST_IMG
+file format: IMGFMT
+virtual size: 1 MiB (1048576 bytes)
+cluster_size: 65536
+backing file: TEST_DIR/PID-base
+Format specific information:
+compat: 1.1
+compression type: zlib
+lazy refcounts: false
+refcount bits: 16
+corrupt: false
+
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 == Resize tests ==
 === preallocation=off ===
 Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=6442450944 cluster_size=65536 
lazy_refcounts=off refcount_bits=16 compression_type=zlib
-- 
2.21.0

[PATCH v2 2/5] block/io: bdrv_common_block_status_above: support include_base

2020-05-19 Thread Vladimir Sementsov-Ogievskiy

In order to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above, let's support include_base parameter.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Kevin Wolf 
---
 block/io.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index db990e812b..cdc0e6663e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2223,6 +2223,7 @@ int bdrv_flush_all(void)
 typedef struct BdrvCoBlockStatusData {
 BlockDriverState *bs;
 BlockDriverState *base;
+bool include_base;
 bool want_zero;
 int64_t offset;
 int64_t bytes;
@@ -2445,6 +2446,7 @@ early_out:
 
 static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
+   bool include_base,
bool want_zero,
int64_t offset,
int64_t bytes,
@@ -2456,8 +2458,8 @@ static int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
 int ret = 0;
 bool first = true;
 
-assert(bs != base);
-for (p = bs; p != base; p = backing_bs(p)) {
+assert(include_base || bs != base);
+for (p = bs; include_base || p != base; p = backing_bs(p)) {
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
file);
 if (ret < 0) {
@@ -2495,6 +2497,11 @@ static int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
 
 /* [offset, pnum] unallocated on this layer, which could be only
  * the first part of [offset, bytes].  */
+
+if (p == base) {
+break;
+}
+
 assert(*pnum <= bytes);
 bytes = *pnum;
 first = false;
@@ -2509,7 +2516,7 @@ static void coroutine_fn 
bdrv_block_status_above_co_entry(void *opaque)
 BdrvCoBlockStatusData *data = opaque;
 
 data->ret = bdrv_co_block_status_above(data->bs, data->base,
-   data->want_zero,
+   data->include_base, data->want_zero,
data->offset, data->bytes,
data->pnum, data->map, data->file);
 data->done = true;
@@ -2523,6 +2530,7 @@ static void coroutine_fn 
bdrv_block_status_above_co_entry(void *opaque)
  */
 static int bdrv_common_block_status_above(BlockDriverState *bs,
   BlockDriverState *base,
+  bool include_base,
   bool want_zero, int64_t offset,
   int64_t bytes, int64_t *pnum,
   int64_t *map,
@@ -2532,6 +2540,7 @@ static int 
bdrv_common_block_status_above(BlockDriverState *bs,
 BdrvCoBlockStatusData data = {
 .bs = bs,
 .base = base,
+.include_base = include_base,
 .want_zero = want_zero,
 .offset = offset,
 .bytes = bytes,
@@ -2556,7 +2565,7 @@ int bdrv_block_status_above(BlockDriverState *bs, 
BlockDriverState *base,
 int64_t offset, int64_t bytes, int64_t *pnum,
 int64_t *map, BlockDriverState **file)
 {
-return bdrv_common_block_status_above(bs, base, true, offset, bytes,
+return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
   pnum, map, file);
 }
 
@@ -2573,7 +2582,7 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, 
int64_t offset,
 int ret;
 int64_t dummy;
 
-ret = bdrv_common_block_status_above(bs, backing_bs(bs), false, offset,
+ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
  bytes, pnum ? pnum : &dummy, NULL,
  NULL);
 if (ret < 0) {
-- 
2.21.0
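
The loop shape introduced by this patch can be modeled in isolation. The
sketch below is an assumption-laden toy, not block-layer code: struct node and
count_visited are hypothetical names, and a node carries nothing but its
backing pointer. It only shows which layers get visited for each combination
of include_base and base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for a node in a backing chain. */
struct node {
    const char *name;
    struct node *backing;
};

/*
 * Walk from bs towards base and count the visited layers.
 * With include_base == false the walk stops before base;
 * with include_base == true base itself is visited last.
 */
static int count_visited(struct node *bs, struct node *base, bool include_base)
{
    int visited = 0;

    assert(bs);
    assert(base || !include_base);  /* including a NULL base makes no sense */

    for (struct node *p = bs; include_base || p != base; p = p->backing) {
        assert(p);                  /* base must be reachable from bs */
        visited++;
        if (p == base) {
            break;                  /* base was the last layer to look at */
        }
    }
    return visited;
}
```

Note that bs == base with include_base == false visits nothing at all, which
is the corner case the following patch in the series handles explicitly.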

[PATCH v2 4/5] block/io: fix bdrv_is_allocated_above

2020-05-19 Thread Vladimir Sementsov-Ogievskiy

bdrv_is_allocated_above wrongly handles short backing files: it reports
after-EOF space as UNALLOCATED, which is wrong, as on read the data is
generated at the level of the short backing file (if all overlays have an
unallocated area at that place).

Reusing bdrv_common_block_status_above fixes the issue and unifies the code
path.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 43 +--
 1 file changed, 5 insertions(+), 38 deletions(-)

diff --git a/block/io.c b/block/io.c
index df44e89b7d..61f0930626 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2610,52 +2610,19 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState 
*bs, int64_t offset,
  * at 'offset + *pnum' may return the same allocation status (in other
  * words, the result is not necessarily the maximum possible range);
  * but 'pnum' will only be 0 when end of file is reached.
- *
  */
 int bdrv_is_allocated_above(BlockDriverState *top,
 BlockDriverState *base,
 bool include_base, int64_t offset,
 int64_t bytes, int64_t *pnum)
 {
-BlockDriverState *intermediate;
-int ret;
-int64_t n = bytes;
-
-assert(base || !include_base);
-
-intermediate = top;
-while (include_base || intermediate != base) {
-int64_t pnum_inter;
-int64_t size_inter;
-
-assert(intermediate);
-ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
-if (ret < 0) {
-return ret;
-}
-if (ret) {
-*pnum = pnum_inter;
-return 1;
-}
-
-size_inter = bdrv_getlength(intermediate);
-if (size_inter < 0) {
-return size_inter;
-}
-if (n > pnum_inter &&
-(intermediate == top || offset + pnum_inter < size_inter)) {
-n = pnum_inter;
-}
-
-if (intermediate == base) {
-break;
-}
-
-intermediate = backing_bs(intermediate);
+int ret = bdrv_common_block_status_above(top, base, include_base, false,
+ offset, bytes, pnum, NULL, NULL);
+if (ret < 0) {
+return ret;
 }
 
-*pnum = n;
-return 0;
+return !!(ret & BDRV_BLOCK_ALLOCATED);
 }
 
 typedef struct BdrvVmstateCo {
-- 
2.21.0
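
The short-backing-file case fixed above can be illustrated with a toy model.
Everything here is a hypothetical simplification for illustration: struct
layer, chain_read and allocated_above are invented names, each layer has a
single allocated extent, and the check is done per byte rather than per range.
The eof_is_allocated switch selects between the old view (after-EOF space is
unallocated) and the view this patch moves to (the short layer produces the
zeroes, so the space counts as allocated).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MiB INT64_C(1048576)

/* Hypothetical, simplified image layer; not QEMU's BlockDriverState. */
struct layer {
    const char *name;
    int64_t length;         /* virtual size in bytes */
    int64_t alloc_start;    /* single allocated extent [alloc_start, alloc_end) */
    int64_t alloc_end;
    uint8_t fill;           /* byte value stored in the allocated extent */
    const struct layer *backing;
};

/* Read one byte as the guest sees it; *provider is the layer that decided. */
static uint8_t chain_read(const struct layer *top, int64_t offset,
                          const struct layer **provider)
{
    assert(top);
    for (const struct layer *p = top; p; p = p->backing) {
        if (offset >= p->length) {
            *provider = p;  /* past EOF: this layer yields zeroes */
            return 0;
        }
        if (offset >= p->alloc_start && offset < p->alloc_end) {
            *provider = p;
            return p->fill;
        }
    }
    *provider = NULL;       /* unallocated everywhere reads as zero */
    return 0;
}

/* Is the byte at offset provided by some layer in [top, base)? */
static bool allocated_above(const struct layer *top, const struct layer *base,
                            int64_t offset, bool eof_is_allocated)
{
    assert(top);
    for (const struct layer *p = top; p && p != base; p = p->backing) {
        if (eof_is_allocated && p != top && offset >= p->length) {
            return true;    /* after-EOF zeroes come from this short layer */
        }
        if (offset >= p->alloc_start && offset < p->alloc_end) {
            return true;
        }
    }
    return false;
}
```

With a 2 MiB base full of 0x01, an empty 1 MiB mid and an empty 2 MiB top, a
guest read at 1 MiB + 1 is decided by mid and returns zero. A commit from top
into base that trusts the old answer (unallocated) would skip that byte and
leave base's 0x01 visible afterwards; with eof_is_allocated the byte is
reported as allocated, so the zero would be copied.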

[PATCH v2 1/5] block/io: fix bdrv_co_block_status_above

2020-05-19 Thread Vladimir Sementsov-Ogievskiy

bdrv_co_block_status_above has several problems with handling short
backing files:

1. With want_zero=true, it may return ret with BDRV_BLOCK_ZERO but
without the BDRV_BLOCK_ALLOCATED flag, even though the short backing file
which produces these after-EOF zeros is inside the requested backing
sequence.

2. With want_zero=false, it may return pnum=0 prior to the actual EOF,
because of the EOF of a short backing file.

Fix these things, making logic about short backing files clearer.

Note that the 154 output changed, because now bdrv_block_status_above doesn't
merge unallocated zeros with zeros after EOF (which are actually
"allocated" from the POV of a read from the backing-chain top), and is_zero()
just doesn't understand that the whole head or tail is zero. We may update
is_zero to call bdrv_block_status_above several times, or add a flag to
bdrv_block_status_above saying that we are not interested in the ALLOCATED
flag, so that ranges with different ALLOCATED status may be merged. But
actually, it seems that we'd better not care about this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 38 +-
 tests/qemu-iotests/154.out |  4 ++--
 2 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/block/io.c b/block/io.c
index 121ce17a49..db990e812b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2461,25 +2461,45 @@ static int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
file);
 if (ret < 0) {
-break;
+return ret;
 }
-if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+if (*pnum == 0) {
+if (first) {
+return ret;
+}
+
 /*
- * Reading beyond the end of the file continues to read
- * zeroes, but we can only widen the result to the
- * unallocated length we learned from an earlier
- * iteration.
+ * Reads from bs for the selected region will return zeroes,
+ * produced because the current level is short. We should consider
+ * it as allocated.
+ *
+ * TODO: Should we report p as file here?
  */
+assert(ret & BDRV_BLOCK_EOF);
 *pnum = bytes;
+return BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
 }
-if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
-break;
+if (ret & BDRV_BLOCK_ALLOCATED) {
+/* We've found the node and the status, we must return. */
+
+if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+/*
+ * This level is also responsible for reads after EOF inside
+ * the unallocated region in the previous level.
+ */
+*pnum = bytes;
+}
+
+return ret;
 }
+
 /* [offset, pnum] unallocated on this layer, which could be only
  * the first part of [offset, bytes].  */
-bytes = MIN(bytes, *pnum);
+assert(*pnum <= bytes);
+bytes = *pnum;
 first = false;
 }
+
 return ret;
 }
 
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index fa3673317f..a203dfcadd 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -310,13 +310,13 @@ wrote 512/512 bytes at offset 134217728
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 
backing_file=TEST_DIR/t.IMGFMT.base
 wrote 512/512 bytes at offset 134219264
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 
backing_file=TEST_DIR/t.IMGFMT.base
 wrote 1024/1024 bytes at offset 134218240
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-- 
2.21.0

[PATCH v2 3/5] block/io: bdrv_common_block_status_above: support bs == base

2020-05-19 Thread Vladimir Sementsov-Ogievskiy

We are going to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
include_base == false and still bs == base (for ex. from img_rebase()).

So, support this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Kevin Wolf 
---
 block/io.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index cdc0e6663e..df44e89b7d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2458,7 +2458,11 @@ static int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
 int ret = 0;
 bool first = true;
 
-assert(include_base || bs != base);
+if (!include_base && bs == base) {
+*pnum = bytes;
+return 0;
+}
+
 for (p = bs; include_base || p != base; p = backing_bs(p)) {
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
file);
-- 
2.21.0

[PATCH v2 0/5] fix & merge block_status_above and is_allocated_above

2020-05-19 Thread Vladimir Sementsov-Ogievskiy

Hi all!

v2:
01: wording, grammar, keep comment
02-03: add Kevin's r-bs
05: test-output rebased on compression type qcow2 extension

=

I wanted to understand the real difference between bdrv_block_status_above
and bdrv_is_allocated_above. IMHO bdrv_is_allocated_above should work through
bdrv_block_status_above.

And I found the problem: bdrv_is_allocated_above considers space after EOF as
UNALLOCATED for intermediate nodes.

UNALLOCATED is not about allocation at the fs level, but about whether we
should go to the backing file or not. So this seems incorrect to me: in the
case of a short backing file, we read zeroes after its EOF instead of going
further down the backing chain.

This leads to the following effect:

./qemu-img create -f qcow2 base.qcow2 2M
./qemu-io -c "write -P 0x1 0 2M" base.qcow2

./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M

Region 1M..2M is shadowed by the short middle image, so the guest sees zeroes:
./qemu-io -c "read -P 0 1M 1M" top.qcow2
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)

But after commit the guest-visible state changes, which seems wrong to me:
./qemu-img commit top.qcow2 -b mid.qcow2

./qemu-io -c "read -P 0 1M 1M" mid.qcow2
Pattern verification failed at offset 1048576, 1048576 bytes
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)

./qemu-io -c "read -P 1 1M 1M" mid.qcow2
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)

=

bdrv_block_status_above behaves strangely too:

with want_zero=true, it may report unallocated zeroes because of short backing
files, although they are actually "allocated" from the POV of the backing
chain. But I see that this may influence only qemu-img compare, and I don't
see how it can trigger a bug.

with want_zero=false, it may make no progress because of a short backing file.
Moreover, it may report EOF in the middle! But want_zero=false is used only in
bdrv_is_allocated, which considers only the top layer, so it seems OK.

Vladimir Sementsov-Ogievskiy (5):
  block/io: fix bdrv_co_block_status_above
  block/io: bdrv_common_block_status_above: support include_base
  block/io: bdrv_common_block_status_above: support bs == base
  block/io: fix bdrv_is_allocated_above
  iotests: add commit top->base cases to 274

 block/io.c | 104 ++---
 tests/qemu-iotests/154.out |   4 +-
 tests/qemu-iotests/274 |  20 +++
 tests/qemu-iotests/274.out |  65 +++
 4 files changed, 139 insertions(+), 54 deletions(-)

-- 
2.21.0

Re: [PATCH QEMU v22 04/18] vfio: Add save and load functions for VFIO PCI devices

2020-05-19 Thread Alex Williamson

On Tue, 19 May 2020 20:28:13 +0100
"Dr. David Alan Gilbert"  wrote:

> * Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
> > * Kirti Wankhede (kwankh...@nvidia.com) wrote:  
> > > These functions save and restore PCI device specific data - config
> > > space of PCI device.
> > > Tested save and restore with MSI and MSIX type.  
> > 
> > I don't think my comments from v16 on 26th March were addressed/replied
> > to:  
> 
> 
> Oops, I've just spotted your reply from earlier this month; so:
> 
> > > Signed-off-by: Kirti Wankhede 
> > > Reviewed-by: Neo Jia 
> > > ---
> > >  hw/vfio/pci.c | 163 
> > > ++
> > >  include/hw/vfio/vfio-common.h |   2 +
> > >  2 files changed, 165 insertions(+)
> > > 
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index 6c77c12e44b9..36b1e08f84d8 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -41,6 +41,7 @@
> > >  #include "trace.h"
> > >  #include "qapi/error.h"
> > >  #include "migration/blocker.h"
> > > +#include "migration/qemu-file.h"
> > >  
> > >  #define TYPE_VFIO_PCI "vfio-pci"
> > >  #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
> > > @@ -1632,6 +1633,50 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
> > >  }
> > >  }
> > >  
> > > +static int vfio_bar_validate(VFIOPCIDevice *vdev, int nr)
> > > +{
> > > +PCIDevice *pdev = &vdev->pdev;
> > > +VFIOBAR *bar = &vdev->bars[nr];
> > > +uint64_t addr;
> > > +uint32_t addr_lo, addr_hi = 0;
> > > +
> > > +/* Skip unimplemented BARs and the upper half of 64bit BARS. */
> > > +if (!bar->size) {
> > > +return 0;
> > > +}
> > > +
> > > +addr_lo = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + nr * 4, 
> > > 4);
> > > +
> > > +addr_lo &= (bar->ioport ? PCI_BASE_ADDRESS_IO_MASK :
> > > +  PCI_BASE_ADDRESS_MEM_MASK);
> > > +if (bar->type == PCI_BASE_ADDRESS_MEM_TYPE_64) {
> > > +addr_hi = pci_default_read_config(pdev,
> > > + PCI_BASE_ADDRESS_0 + (nr + 1) * 
> > > 4, 4);
> > > +}
> > > +
> > > +addr = ((uint64_t)addr_hi << 32) | addr_lo;
> > > +
> > > +if (!QEMU_IS_ALIGNED(addr, bar->size)) {
> > > +return -EINVAL;
> > > +}
> > > +
> > > +return 0;
> > > +}
> > > +
> > > +static int vfio_bars_validate(VFIOPCIDevice *vdev)
> > > +{
> > > +int i, ret;
> > > +
> > > +for (i = 0; i < PCI_ROM_SLOT; i++) {
> > > +ret = vfio_bar_validate(vdev, i);
> > > +if (ret) {
> > > +error_report("vfio: BAR address %d validation failed", i);
> > > +return ret;
> > > +}
> > > +}
> > > +return 0;
> > > +}
> > > +
> > >  static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
> > >  {
> > >  VFIOBAR *bar = &vdev->bars[nr];
> > > @@ -2414,11 +2459,129 @@ static Object *vfio_pci_get_object(VFIODevice 
> > > *vbasedev)
> > >  return OBJECT(vdev);
> > >  }
> > >  
> > > +static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
> > > +{
> > > +VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, 
> > > vbasedev);
> > > +PCIDevice *pdev = &vdev->pdev;
> > > +uint16_t pci_cmd;
> > > +int i;
> > > +
> > > +for (i = 0; i < PCI_ROM_SLOT; i++) {
> > > +uint32_t bar;
> > > +
> > > +bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 
> > > 4);
> > > +qemu_put_be32(f, bar);
> > > +}
> > > +
> > > +qemu_put_be32(f, vdev->interrupt);
> > > +if (vdev->interrupt == VFIO_INT_MSI) {
> > > +uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
> > > +bool msi_64bit;
> > > +
> > > +msi_flags = pci_default_read_config(pdev, pdev->msi_cap + 
> > > PCI_MSI_FLAGS,
> > > +2);
> > > +msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
> > > +
> > > +msi_addr_lo = pci_default_read_config(pdev,
> > > + pdev->msi_cap + 
> > > PCI_MSI_ADDRESS_LO, 4);
> > > +qemu_put_be32(f, msi_addr_lo);
> > > +
> > > +if (msi_64bit) {
> > > +msi_addr_hi = pci_default_read_config(pdev,
> > > + pdev->msi_cap + 
> > > PCI_MSI_ADDRESS_HI,
> > > + 4);
> > > +}
> > > +qemu_put_be32(f, msi_addr_hi);
> > > +
> > > +msi_data = pci_default_read_config(pdev,
> > > +pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : 
> > > PCI_MSI_DATA_32),
> > > +2);
> > > +qemu_put_be16(f, msi_data);
> > > +} else if (vdev->interrupt == VFIO_INT_MSIX) {
> > > +uint16_t offset;
> > > +
> > > +/* save enable bit and maskall bit */
> > > +offset = pci_default_read_config(pdev,
> > > +   pdev->msix_cap + PCI_MSIX_FLAGS + 
> > > 1, 2);
> > > +

[PATCH 1/2] linux-user: Build vdso for x64.

2020-05-19 Thread Richard Henderson

From: Richard Henderson 

... Well, sort of.  The Makefile bits are broken.
Patch to load the vdso into the running program to follow.

Signed-off-by: Richard Henderson 
---
 Makefile  |   4 +-
 pc-bios/Makefile  |   5 ++
 pc-bios/vdso-linux-x64.S  | 115 ++
 pc-bios/vdso-linux-x64.ld |  81 +++
 pc-bios/vdso-linux-x64.so | Bin 0 -> 7500 bytes
 5 files changed, 203 insertions(+), 2 deletions(-)
 create mode 100644 pc-bios/vdso-linux-x64.S
 create mode 100644 pc-bios/vdso-linux-x64.ld
 create mode 100755 pc-bios/vdso-linux-x64.so

diff --git a/Makefile b/Makefile
index 40e4f7677b..73e380ac6a 100644
--- a/Makefile
+++ b/Makefile
@@ -848,8 +848,8 @@ qemu_vga.ndrv \
 edk2-licenses.txt \
 hppa-firmware.img \
 opensbi-riscv32-sifive_u-fw_jump.bin opensbi-riscv32-virt-fw_jump.bin \
-opensbi-riscv64-sifive_u-fw_jump.bin opensbi-riscv64-virt-fw_jump.bin
-
+opensbi-riscv64-sifive_u-fw_jump.bin opensbi-riscv64-virt-fw_jump.bin \
+vdso-linux-x64.so
 
 DESCS=50-edk2-i386-secure.json 50-edk2-x86_64-secure.json \
 60-edk2-aarch64.json 60-edk2-arm.json 60-edk2-i386.json 60-edk2-x86_64.json
diff --git a/pc-bios/Makefile b/pc-bios/Makefile
index 315288df84..70e2485e2e 100644
--- a/pc-bios/Makefile
+++ b/pc-bios/Makefile
@@ -15,5 +15,10 @@ all: $(TARGETS)
 %.dtb: %.dts
dtc -I dts -O dtb -o $@ $<
 
+vdso-linux-x64.so: vdso-linux-x64.o vdso-linux-x64.ld
+   $(CC) -nostdlib -shared -Wl,-T,vdso-linux-x64.ld \
+ -Wl,-h,linux-vdso.so.1 -Wl,--hash-style=both \
+ vdso-linux-x64.o -o $@
+
 clean:
rm -f $(TARGETS) *.o *~
diff --git a/pc-bios/vdso-linux-x64.S b/pc-bios/vdso-linux-x64.S
new file mode 100644
index 00..090d82c26a
--- /dev/null
+++ b/pc-bios/vdso-linux-x64.S
@@ -0,0 +1,115 @@
+/*
+ *  x86-64 linux replacement vdso.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include 
+
+   .globl  __vdso_clock_gettime
+   .type   __vdso_clock_gettime, @function
+   .balign 16
+   .cfi_startproc
+__vdso_clock_gettime:
+   mov $__NR_clock_gettime, %eax
+   syscall
+   ret
+   .cfi_endproc
+   .size   __vdso_clock_gettime, . - __vdso_clock_gettime
+
+clock_gettime = __vdso_clock_gettime
+   .weak   clock_gettime
+
+
+   .globl  __vdso_gettimeofday
+   .type   __vdso_gettimeofday, @function
+   .balign 16
+   .cfi_startproc
+__vdso_gettimeofday:
+   mov $__NR_gettimeofday, %eax
+   syscall
+   ret
+   .cfi_endproc
+   .size   __vdso_gettimeofday, . - __vdso_gettimeofday
+
+gettimeofday = __vdso_gettimeofday
+   .weak   gettimeofday
+
+
+   .globl  __vdso_time
+   .type   __vdso_time, @function
+   .balign 16
+   .cfi_startproc
+__vdso_time:
+   mov $__NR_time, %eax
+   syscall
+   ret
+   .cfi_endproc
+   .size   __vdso_time, . - __vdso_time
+
+time = __vdso_time
+   .weak   time
+
+
+   .globl  __vdso_getcpu
+   .type   __vdso_getcpu, @function
+   .balign 16
+   .cfi_startproc
+__vdso_getcpu:
+   /* ??? There is no syscall number for this allocated on x64.
+  We can handle this several ways:
+
+  (1) Invent a syscall number for use within qemu.
+   It should be easy enough to pick a number that
+   is well out of the way of the kernel numbers.
+
+   (2) Force the emulated cpu to support the rdtscp insn,
+  and initialize the TSC_AUX value to the appropriate value.
+
+  (3) Pretend that we're always running on cpu 0.
+
+  This last is the one that's implemented here, with the
+  tiny bit of extra code to support rdtscp in place.  */
+
+   xor %ecx, %ecx  /* rdtscp w/ tsc_aux = 0 */
+
+   /* if (cpu != NULL) *cpu = (ecx & 0xfff); */
+   test%rdi, %rdi
+   jz  1f
+   mov %ecx, %eax
+   and $0xfff, %eax
+   mov %eax, (%rdi)
+
+   /* if (node != NULL) *node = (ecx >> 12); */
+1: test%rsi, %rsi
+   jz  2f
+   shr $12, %ecx
+   mov %ecx, (%rsi)
+
+2: xor %eax, %eax
+   ret
+   .cfi_endproc
+   .size   __vdso_getcpu, . - __vdso_getcpu
+
+getcpu = __vdso_getcpu
+   .weak   getcpu
+
+/* ??? Perhaps add elf notes.  E.g.
+
+   #include 
+
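
For readers following the __vdso_getcpu comments above, here is a rough C sketch of the same decode step. It assumes the layout stated in those comments: cpu in the low 12 bits, node in the bits above. The names sketch_getcpu and pack_cpu_node are made up for illustration and do not appear in the patch. Option (3) from the patch corresponds to always passing aux = 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: build a TSC_AUX-style value using the layout
   described in the patch comments (cpu in bits 0..11, node above). */
unsigned pack_cpu_node(unsigned cpu, unsigned node)
{
    assert(cpu <= 0xfff);           /* cpu must fit in 12 bits */
    return (node << 12) | cpu;
}

/* Hypothetical C rendering of the assembly body: store cpu and node
   only through non-NULL pointers, then report success. */
int sketch_getcpu(unsigned aux, unsigned *cpu, unsigned *node)
{
    if (cpu != NULL) {
        *cpu = aux & 0xfff;         /* if (cpu != NULL) *cpu = (ecx & 0xfff); */
    }
    if (node != NULL) {
        *node = aux >> 12;          /* if (node != NULL) *node = (ecx >> 12); */
    }
    return 0;                       /* xor %eax, %eax */
}
```

With aux = 0, both outputs come back as 0, which is the "always running on cpu 0" behaviour the patch implements.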

[PATCH 0/2] linux-user: Load a vdso for x86_64

2020-05-19 Thread Richard Henderson

The subject of AT_SYSINFO came up on launchpad recently.

There is definite room for improvement in all of this:

(1) We could build the vdso binary into qemu instead of really
loading it from the file system.  This would obviate the
several problems of locating the .so file.  It would also
mean that --static builds continue to create a standalone
qemu binary.

(2) We could use our cross-build system to build the vdso.
Though we'd still likely want to keep the image in git
alongside the other ROM images for when cross-build is
not available.

(3) There are some ??? comments where some decisions could be made,
and other ??? that are merely commenting on weirdness.

(4) It shouldn't take too much effort to create vdsos for the
other architectures.  But we should get this one as clean
as we can first.
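
As a rough illustration of item (1), the image could live in a const byte array compiled into qemu and be sanity-checked before use. This is only a sketch under assumptions: the array below is a placeholder holding just the 16 ELF identification bytes, not the real vdso, and the names vdso_image and vdso_ident_ok are hypothetical. How the array would be generated from vdso-linux-x64.so is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder only: the 16-byte ELF identification of a 64-bit
   little-endian object.  A real build would generate the full array
   from pc-bios/vdso-linux-x64.so; that step is assumed, not shown. */
static const unsigned char vdso_image[] = {
    0x7f, 'E', 'L', 'F',            /* ELF magic */
    2,                              /* EI_CLASS: ELFCLASS64 */
    1,                              /* EI_DATA: ELFDATA2LSB */
    1,                              /* EI_VERSION: EV_CURRENT */
    0, 0, 0, 0, 0, 0, 0, 0, 0       /* OS ABI, ABI version, padding */
};

/* Hypothetical check: return 1 if the buffer starts with the
   identification bytes of a 64-bit little-endian ELF, else 0. */
int vdso_ident_ok(const unsigned char *img, size_t len)
{
    assert(img != NULL);
    if (len < 16) {
        return 0;                   /* too short for e_ident */
    }
    if (memcmp(img, "\177ELF", 4) != 0) {
        return 0;                   /* wrong magic */
    }
    return img[4] == 2 && img[5] == 1;
}
```

Because the bytes are part of the executable, nothing has to be located on the file system at run time, and a --static build stays standalone.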

Amusingly, this patch set has just turned 10 years old.
First posted April 4, 2010.  I don't recall ever seeing
any review on the several postings over the years.


r~


Richard Henderson (2):
  linux-user: Build vdso for x64.
  linux-user: Load a VDSO for x86-64.

 Makefile  |   4 +-
 linux-user/elfload.c  | 203 +-
 pc-bios/Makefile  |   5 +
 pc-bios/vdso-linux-x64.S  | 115 +
 pc-bios/vdso-linux-x64.ld |  81 +++
 pc-bios/vdso-linux-x64.so | Bin 0 -> 7500 bytes
 6 files changed, 401 insertions(+), 7 deletions(-)
 create mode 100644 pc-bios/vdso-linux-x64.S
 create mode 100644 pc-bios/vdso-linux-x64.ld
 create mode 100755 pc-bios/vdso-linux-x64.so

-- 
2.20.1
