date:20200722

Re: [Bug 1888606] [NEW] Heap-use-after-free in virtio_gpu_ctrl_response

2020-07-22 Thread Li Qiang

Alexander Bulekov <1888...@bugs.launchpad.net> 于2020年7月23日周四 下午1:02写道：
>
> Public bug reported:
>
> Hello,
> Here is a reproducer (build with --enable-sanitizers):
> cat << EOF | ./i386-softmmu/qemu-system-i386 -nographic -M pc -nodefaults -m 
> 512M -device virtio-vga -qtest stdio
> outl 0xcf8 0x80001018
> outl 0xcfc 0xe080
> outl 0xcf8 0x80001020
> outl 0xcf8 0x80001004
> outw 0xcfc 0x7
> writeq 0xe0801024 0x10646c00776c6cff
> writeq 0xe080102d 0xe080100032
> writeq 0xe0801015 0x12b2901ba00
> write 0x10646c02 0x1 0x2c
> write 0x999 0x1 0x25
> write 0x8 0x1 0x78
> write 0x2c7 0x1 0x32
> write 0x2cb 0x1 0xff
> write 0x2cc 0x1 0x7e
> writeq 0xe0803000 0xf2b8f0540ff83
> EOF
>
> The ASAN trace:
> ==29798==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x60d050e8 at pc 0x560629814761 bp 0x7ffe916eb1e0 sp 0x7ffe916eb1d8
> READ of size 8 at 0x60d050e8 thread T0
> #0 0x560629814760 in virtio_gpu_ctrl_response 
> /home/alxndr/Development/qemu/hw/display/virtio-gpu.c:181:42
> #1 0x56062981adc8 in virtio_gpu_ctrl_response_nodata 
> /home/alxndr/Development/qemu/hw/display/virtio-gpu.c:193:5
> #2 0x56062981adc8 in virtio_gpu_simple_process_cmd 
> /home/alxndr/Development/qemu/hw/display/virtio-gpu.c:791:9
> #3 0x5606298175f8 in virtio_gpu_process_cmdq 
> /home/alxndr/Development/qemu/hw/display/virtio-gpu.c:820:9
> #4 0x56062a8f1c96 in aio_bh_poll 
> /home/alxndr/Development/qemu/util/async.c:164:13
> #5 0x56062a887b9d in aio_dispatch 
> /home/alxndr/Development/qemu/util/aio-posix.c:380:5
> #6 0x56062a8f6b1c in aio_ctx_dispatch 
> /home/alxndr/Development/qemu/util/async.c:306:5
> #7 0x7f0d5e1cf9ed in g_main_context_dispatch 
> (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4e9ed)
> #8 0x56062a919571 in glib_pollfds_poll 
> /home/alxndr/Development/qemu/util/main-loop.c:217:9
> #9 0x56062a919571 in os_host_main_loop_wait 
> /home/alxndr/Development/qemu/util/main-loop.c:240:5
> #10 0x56062a919571 in main_loop_wait 
> /home/alxndr/Development/qemu/util/main-loop.c:516:11
> #11 0x560629094a64 in qemu_main_loop 
> /home/alxndr/Development/qemu/softmmu/vl.c:1676:9
> #12 0x56062a749ab5 in main 
> /home/alxndr/Development/qemu/softmmu/main.c:49:5
> #13 0x7f0d5cd55e0a in __libc_start_main 
> (/lib/x86_64-linux-gnu/libc.so.6+0x26e0a)
> #14 0x5606288ba889 in _start 
> (/home/alxndr/Development/qemu/build/i386-softmmu/qemu-system-i386+0x24d0889)
>
> 0x60d050e8 is located 56 bytes inside of 136-byte region 
> [0x60d050b0,0x60d05138)
> freed by thread T0 here:
> #0 0x56062893250d in free 
> (/home/alxndr/Development/qemu/build/i386-softmmu/qemu-system-i386+0x254850d)
> #1 0x560629827730 in virtio_gpu_reset 
> /home/alxndr/Development/qemu/hw/display/virtio-gpu.c:1160:9
> #2 0x560628e81d34 in virtio_reset 
> /home/alxndr/Development/qemu/hw/virtio/virtio.c:1999:9
> #3 0x560629f08773 in virtio_pci_reset 
> /home/alxndr/Development/qemu/hw/virtio/virtio-pci.c:1841:5
> #4 0x560629043ab6 in memory_region_write_accessor 
> /home/alxndr/Development/qemu/softmmu/memory.c:483:5
> #5 0x560629043473 in access_with_adjusted_size 
> /home/alxndr/Development/qemu/softmmu/memory.c:544:18
> #6 0x560629042c99 in memory_region_dispatch_write 
> /home/alxndr/Development/qemu/softmmu/memory.c
> #7 0x560628990a37 in flatview_write_continue 
> /home/alxndr/Development/qemu/exec.c:3176:23
> #8 0x56062899041a in address_space_write_cached_slow 
> /home/alxndr/Development/qemu/exec.c:3789:12
> #9 0x560628e6f9bb in vring_used_write 
> /home/alxndr/Development/qemu/hw/virtio/virtio.c:347:5
> #10 0x560628e6f9bb in virtqueue_split_fill 
> /home/alxndr/Development/qemu/hw/virtio/virtio.c:788:5
> #11 0x560628e6f9bb in virtqueue_fill 
> /home/alxndr/Development/qemu/hw/virtio/virtio.c:852:9
> #12 0x560628e7205e in virtqueue_push 
> /home/alxndr/Development/qemu/hw/virtio/virtio.c:917:5
> #13 0x560629814246 in virtio_gpu_ctrl_response 
> /home/alxndr/Development/qemu/hw/display/virtio-gpu.c:180:5
> #14 0x56062981adc8 in virtio_gpu_ctrl_response_nodata 
> /home/alxndr/Development/qemu/hw/display/virtio-gpu.c:193:5
> #15 0x56062981adc8 in virtio_gpu_simple_process_cmd 
> /home/alxndr/Development/qemu/hw/display/virtio-gpu.c:791:9
> #16 0x5606298175f8 in virtio_gpu_process_cmdq 
> /home/alxndr/Development/qemu/hw/display/virtio-gpu.c:820:9
> #17 0x56062a8f1c96 in aio_bh_poll 
> /home/alxndr/Development/qemu/util/async.c:164:13
> #18 0x56062a887b9d in aio_dispatch 
> /home/alxndr/Development/qemu/util/aio-posix.c:380:5
> #19 0x56062a8f6b1c in aio_ctx_dispatch 
> /home/alxndr/Development/qemu/util/async.c:306:5
> #20 0x7f0d5e1cf9ed in g_main_context_dispatch 
> (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4e9ed)
>

Seems again when we write back to virtio used vring, we write to the
MMIO addresspace.

Thanks,
Li Qiang


> previously allocated by thread T0 here:
> #0

[Bug 1888606] [NEW] Heap-use-after-free in virtio_gpu_ctrl_response

2020-07-22 Thread Alexander Bulekov

Public bug reported:

Hello,
Here is a reproducer (build with --enable-sanitizers):
cat << EOF | ./i386-softmmu/qemu-system-i386 -nographic -M pc -nodefaults -m 
512M -device virtio-vga -qtest stdio
outl 0xcf8 0x80001018
outl 0xcfc 0xe080
outl 0xcf8 0x80001020
outl 0xcf8 0x80001004
outw 0xcfc 0x7
writeq 0xe0801024 0x10646c00776c6cff
writeq 0xe080102d 0xe080100032
writeq 0xe0801015 0x12b2901ba00
write 0x10646c02 0x1 0x2c
write 0x999 0x1 0x25
write 0x8 0x1 0x78
write 0x2c7 0x1 0x32
write 0x2cb 0x1 0xff
write 0x2cc 0x1 0x7e
writeq 0xe0803000 0xf2b8f0540ff83
EOF

The ASAN trace:
==29798==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d050e8 
at pc 0x560629814761 bp 0x7ffe916eb1e0 sp 0x7ffe916eb1d8
READ of size 8 at 0x60d050e8 thread T0
#0 0x560629814760 in virtio_gpu_ctrl_response 
/home/alxndr/Development/qemu/hw/display/virtio-gpu.c:181:42
#1 0x56062981adc8 in virtio_gpu_ctrl_response_nodata 
/home/alxndr/Development/qemu/hw/display/virtio-gpu.c:193:5
#2 0x56062981adc8 in virtio_gpu_simple_process_cmd 
/home/alxndr/Development/qemu/hw/display/virtio-gpu.c:791:9
#3 0x5606298175f8 in virtio_gpu_process_cmdq 
/home/alxndr/Development/qemu/hw/display/virtio-gpu.c:820:9
#4 0x56062a8f1c96 in aio_bh_poll 
/home/alxndr/Development/qemu/util/async.c:164:13
#5 0x56062a887b9d in aio_dispatch 
/home/alxndr/Development/qemu/util/aio-posix.c:380:5
#6 0x56062a8f6b1c in aio_ctx_dispatch 
/home/alxndr/Development/qemu/util/async.c:306:5
#7 0x7f0d5e1cf9ed in g_main_context_dispatch 
(/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4e9ed)
#8 0x56062a919571 in glib_pollfds_poll 
/home/alxndr/Development/qemu/util/main-loop.c:217:9
#9 0x56062a919571 in os_host_main_loop_wait 
/home/alxndr/Development/qemu/util/main-loop.c:240:5
#10 0x56062a919571 in main_loop_wait 
/home/alxndr/Development/qemu/util/main-loop.c:516:11
#11 0x560629094a64 in qemu_main_loop 
/home/alxndr/Development/qemu/softmmu/vl.c:1676:9
#12 0x56062a749ab5 in main /home/alxndr/Development/qemu/softmmu/main.c:49:5
#13 0x7f0d5cd55e0a in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x26e0a)
#14 0x5606288ba889 in _start 
(/home/alxndr/Development/qemu/build/i386-softmmu/qemu-system-i386+0x24d0889)

0x60d050e8 is located 56 bytes inside of 136-byte region 
[0x60d050b0,0x60d05138)
freed by thread T0 here:
#0 0x56062893250d in free 
(/home/alxndr/Development/qemu/build/i386-softmmu/qemu-system-i386+0x254850d)
#1 0x560629827730 in virtio_gpu_reset 
/home/alxndr/Development/qemu/hw/display/virtio-gpu.c:1160:9
#2 0x560628e81d34 in virtio_reset 
/home/alxndr/Development/qemu/hw/virtio/virtio.c:1999:9
#3 0x560629f08773 in virtio_pci_reset 
/home/alxndr/Development/qemu/hw/virtio/virtio-pci.c:1841:5
#4 0x560629043ab6 in memory_region_write_accessor 
/home/alxndr/Development/qemu/softmmu/memory.c:483:5
#5 0x560629043473 in access_with_adjusted_size 
/home/alxndr/Development/qemu/softmmu/memory.c:544:18
#6 0x560629042c99 in memory_region_dispatch_write 
/home/alxndr/Development/qemu/softmmu/memory.c
#7 0x560628990a37 in flatview_write_continue 
/home/alxndr/Development/qemu/exec.c:3176:23
#8 0x56062899041a in address_space_write_cached_slow 
/home/alxndr/Development/qemu/exec.c:3789:12
#9 0x560628e6f9bb in vring_used_write 
/home/alxndr/Development/qemu/hw/virtio/virtio.c:347:5
#10 0x560628e6f9bb in virtqueue_split_fill 
/home/alxndr/Development/qemu/hw/virtio/virtio.c:788:5
#11 0x560628e6f9bb in virtqueue_fill 
/home/alxndr/Development/qemu/hw/virtio/virtio.c:852:9
#12 0x560628e7205e in virtqueue_push 
/home/alxndr/Development/qemu/hw/virtio/virtio.c:917:5
#13 0x560629814246 in virtio_gpu_ctrl_response 
/home/alxndr/Development/qemu/hw/display/virtio-gpu.c:180:5
#14 0x56062981adc8 in virtio_gpu_ctrl_response_nodata 
/home/alxndr/Development/qemu/hw/display/virtio-gpu.c:193:5
#15 0x56062981adc8 in virtio_gpu_simple_process_cmd 
/home/alxndr/Development/qemu/hw/display/virtio-gpu.c:791:9
#16 0x5606298175f8 in virtio_gpu_process_cmdq 
/home/alxndr/Development/qemu/hw/display/virtio-gpu.c:820:9
#17 0x56062a8f1c96 in aio_bh_poll 
/home/alxndr/Development/qemu/util/async.c:164:13
#18 0x56062a887b9d in aio_dispatch 
/home/alxndr/Development/qemu/util/aio-posix.c:380:5
#19 0x56062a8f6b1c in aio_ctx_dispatch 
/home/alxndr/Development/qemu/util/async.c:306:5
#20 0x7f0d5e1cf9ed in g_main_context_dispatch 
(/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4e9ed)

previously allocated by thread T0 here:
#0 0x56062893278d in malloc 
(/home/alxndr/Development/qemu/build/i386-softmmu/qemu-system-i386+0x254878d)
#1 0x7f0d5e1d5500 in g_malloc 
(/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x54500)
#2 0x560628e7844b in virtqueue_split_pop 
/home/alxndr/Development/qemu/hw/virtio/virtio.c:1524:12
#3 0x560628e7844b in virtqueue_pop 
/home/alxndr/Development/qemu/hw/virtio/virtio.c:1693:16
#4

Re: [PATCH 1/2] configure: avx2 and avx512f detection for clang

2020-07-22 Thread Thomas Huth

On 23/07/2020 02.27, Shu-Chun Weng wrote:
> Since clang does not support "#pragma GCC", the instruction sets are
> always disabled. In this change, we
> 
>  1. wrap "#pragma GCC" inside "#ifndef __clang__",
>  2. only retain them around "#include <{e,i,s}mmintrin.h>" to work
> around gcc bug,
>  3. and annotate each function with `__attribute__((target(*)))` which
> is recognized by both gcc and clang.
> 
> Signed-off-by: Shu-Chun Weng 
> ---
>  configure   | 16 ++--
>  util/bufferiszero.c | 33 +++--
>  2 files changed, 37 insertions(+), 12 deletions(-)
> 
> diff --git a/configure b/configure
> index 4bd80ed507..d9ce3aa5db 100755
> --- a/configure
> +++ b/configure
> @@ -5808,10 +5808,16 @@ fi
>  
>  if test "$cpuid_h" = "yes" && test "$avx2_opt" != "no"; then
>cat > $TMPC << EOF
> +#include 
> +#ifndef __clang__
>  #pragma GCC push_options
>  #pragma GCC target("avx2")
> -#include 
> +#endif
>  #include 
> +#ifndef __clang__
> +#pragma GCC pop_options
> +#endif
> +__attribute__((target("avx2")))
>  static int bar(void *a) {
>  __m256i x = *(__m256i *)a;
>  return _mm256_testz_si256(x, x);

I wonder whether it would make more sense to pass "-mavx2" to the
compile_object call afterwards and simply remove the #pragmas here?
Did you try that already?

 Thomas

Re: [PATCH for-5.1] hw: Only compile the usb-dwc2 controller if it is really needed

2020-07-22 Thread Paul Zimmerman

Gerd, it's OK by me if you take Thomas's patch instead, I agree the
changelog is better. It also has a fixes tag.

Reviewed-by: Paul Zimmerman 


On Wed, Jul 22, 2020 at 9:29 PM Thomas Huth  wrote:

> On 22/07/2020 19.23, Philippe Mathieu-Daudé wrote:
> > On 7/22/20 5:47 PM, Thomas Huth wrote:
> >> The USB_DWC2 switch is currently "default y", so it is included in all
> >> qemu-system-* builds, even if it is not needed. Even worse, it does a
> >> "select USB", so USB devices are now showing up as available on targets
> >> that do not support USB at all. This sysbus device should only be
> >> included by the boards that need it, i.e. by the Raspi machines.
> >
> > Paul already sent that patch (your description is better although):
> > https://www.mail-archive.com/qemu-devel@nongnu.org/msg723681.html
>
> Oh, thanks for the pointer, I should have checked my qemu-devel folder
> first... Please disregard my patch!
>
>  Thomas
>
>

Re: [PATCH for-5.1] hw: Only compile the usb-dwc2 controller if it is really needed

2020-07-22 Thread Thomas Huth

On 22/07/2020 19.23, Philippe Mathieu-Daudé wrote:
> On 7/22/20 5:47 PM, Thomas Huth wrote:
>> The USB_DWC2 switch is currently "default y", so it is included in all
>> qemu-system-* builds, even if it is not needed. Even worse, it does a
>> "select USB", so USB devices are now showing up as available on targets
>> that do not support USB at all. This sysbus device should only be
>> included by the boards that need it, i.e. by the Raspi machines.
> 
> Paul already sent that patch (your description is better although):
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg723681.html

Oh, thanks for the pointer, I should have checked my qemu-devel folder
first... Please disregard my patch!

 Thomas

Re: [PATCH v2] virtio-balloon: always indicate S_DONE when migration fails

2020-07-22 Thread Michael S. Tsirkin

On Wed, Jul 22, 2020 at 02:11:52PM +0200, David Hildenbrand wrote:
> On 22.07.20 14:05, David Hildenbrand wrote:
> > On 22.07.20 14:04, Michael S. Tsirkin wrote:
> >> On Mon, Jun 29, 2020 at 10:06:15AM +0200, David Hildenbrand wrote:
> >>> If something goes wrong during precopy, before stopping the VM, we will
> >>> never send a S_DONE indication to the VM, resulting in the hinted pages
> >>> not getting released to be used by the guest OS (e.g., Linux).
> >>>
> >>> Easy to reproduce:
> >>> 1. Start migration (e.g., HMP "migrate -d 'exec:gzip -c > STATEFILE.gz'")
> >>> 2. Cancel migration (e.g., HMP "migrate_cancel")
> >>> 3. Oberve in the guest (e.g., cat /proc/meminfo) that there is basically
> >>>no free memory left.
> >>>
> >>> While at it, add similar locking to virtio_balloon_free_page_done() as
> >>> done in virtio_balloon_free_page_stop. Locking is still weird, but that
> >>> has to be sorted out separately.
> >>>
> >>> There is nothing to do in the PRECOPY_NOTIFY_COMPLETE case. Add some
> >>> comments regarding S_DONE handling.
> >>>
> >>> Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> >>> Reviewed-by: Alexander Duyck 
> >>> Cc: Wei Wang 
> >>> Cc: Alexander Duyck 
> >>> Signed-off-by: David Hildenbrand 
> >>
> >> IIUC this is superceded by Alexander's patches right?
> > 
> > Not that I know ... @Alex?
> > 
> 
> Okay, I'm confused, that patch is already upstream (via your tree)?
> 
> dd8eeb9671fc ("virtio-balloon: always indicate S_DONE when migration fails")
> 
> Did you stumble over this mail by mistake again?
> 
> -- 
> Thanks,
> 
> David / dhildenb

Oh. I guess that's what happened. I saw the code in the tree and thought
it came in from Alex's patch.
Sorry about the noise.

-- 
MST

Re: [PATCH] Fix vhost-user buffer over-read on ram hot-unplug

2020-07-22 Thread Raphael Norwitz

ping


On Thu, Jul 16, 2020 at 10:21 PM Raphael Norwitz
 wrote:
>
> The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS vhost-user protocol
> feature introduced a shadow-table, used by the backend to dynamically
> determine how a vdev's memory regions have changed since the last
> vhost_user_set_mem_table() call. On hot-remove, a memmove() operation
> is used to overwrite the removed shadow region descriptor(s). The size
> parameter of this memmove was off by 1 such that if a VM with a backend
> supporting the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS filled it's
> shadow-table (by performing the maximum number of supported hot-add
> operatons) and attempted to remove the last region, Qemu would read an
> out of bounds value and potentially crash.
>
> This change fixes the memmove() bounds such that this erroneous read can
> never happen.
>
> Signed-off-by: Peter Turschmid 
> Signed-off-by: Raphael Norwitz 
> ---
>  hw/virtio/vhost-user.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 3123121..d7e2423 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -672,7 +672,7 @@ static int send_remove_regions(struct vhost_dev *dev,
>  memmove(>shadow_regions[shadow_reg_idx],
>  >shadow_regions[shadow_reg_idx + 1],
>  sizeof(struct vhost_memory_region) *
> -(u->num_shadow_regions - shadow_reg_idx));
> +(u->num_shadow_regions - shadow_reg_idx - 1));
>  u->num_shadow_regions--;
>  }
>
> --
> 1.8.3.1
>
>

Re: [PATCH v2] virtio-balloon: always indicate S_DONE when migration fails

2020-07-22 Thread Michael S. Tsirkin

On Wed, Jul 22, 2020 at 02:05:19PM +0200, David Hildenbrand wrote:
> On 22.07.20 14:04, Michael S. Tsirkin wrote:
> > On Mon, Jun 29, 2020 at 10:06:15AM +0200, David Hildenbrand wrote:
> >> If something goes wrong during precopy, before stopping the VM, we will
> >> never send a S_DONE indication to the VM, resulting in the hinted pages
> >> not getting released to be used by the guest OS (e.g., Linux).
> >>
> >> Easy to reproduce:
> >> 1. Start migration (e.g., HMP "migrate -d 'exec:gzip -c > STATEFILE.gz'")
> >> 2. Cancel migration (e.g., HMP "migrate_cancel")
> >> 3. Oberve in the guest (e.g., cat /proc/meminfo) that there is basically
> >>no free memory left.
> >>
> >> While at it, add similar locking to virtio_balloon_free_page_done() as
> >> done in virtio_balloon_free_page_stop. Locking is still weird, but that
> >> has to be sorted out separately.
> >>
> >> There is nothing to do in the PRECOPY_NOTIFY_COMPLETE case. Add some
> >> comments regarding S_DONE handling.
> >>
> >> Fixes: c13c4153f76d ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> >> Reviewed-by: Alexander Duyck 
> >> Cc: Wei Wang 
> >> Cc: Alexander Duyck 
> >> Signed-off-by: David Hildenbrand 
> > 
> > IIUC this is superceded by Alexander's patches right?
> 
> Not that I know ... @Alex?
> 
> > If not pls rebase ...
> > 

OK then I guess I was confused. This is older so I guess I should
have applied this and asked Alex to rebase his patches, but I did the
reverse.., Sorry about that. Could you rebase on top of
the pci tree pls?


Thanks and sorry for messing up.

> 
> 
> -- 
> Thanks,
> 
> David / dhildenb

[PATCH 00/12] Add a General Virtual Device Fuzzer

2020-07-22 Thread Alexander Bulekov

This is a general virtual-device fuzzer, designed to fuzz devices over Port IO,
MMIO, and DMA.
To get started with this:
 1. Build the fuzzers (see docs/devel/fuzzing.txt)
Note: Build with --enable-sanitizers, or create a "dictionary file":
echo kw1=\"\x84\x05\x5C\x5E\" > dict
and pass it as an argument to libFuzzer with -dict=./dict
This magic value is a command separator that lets the fuzzer perform
multiple IO actions with a single input.

 2. Pick the qemu arguments you wish to fuzz:
export QEMU_FUZZ_ARGS="-M q35 -device virtio-balloon"

 3. Tell the fuzzer which QOM objects or MemoryRegion names to fuzz. I find the
 "info qom-tree", "info qtree" and "info mtree" commands useful for identifying
 these. Supports globbing. Here I will try to simultaneously fuzz(for no good
 reason) virtio-balloon and e1000e, which is included by default in the q35:
export QEMU_FUZZ_OBJECTS='virtio* e1000*'
You can also try to fuzz the whole machine:
export QEMU_FUZZ_OBJECTS='*'

 4. Run the fuzzer for 0 inputs. The fuzzer should output a list of
 MemoryRegions/PCI Devices it will try to fuzz. Confirm that these match your
 expectations.
./i386-softmmu/qemu-fuzz-i386 --fuzz-target=general-fuzz -runs=0

 5. Run the fuzzer:
./i386-softmmu/qemu-fuzz-i386 --fuzz-target=general-fuzz 


Basically, at the core, this fuzzer is an interpreter that splits the input
into a series of commands, such as mmio_write, pio_write, etc. We structure
these commands to hit only MemoryRegions that are associated with the devices
specified in QEMU_FUZZ_OBJECTS. Additionally, these patches add "hooks" to
functions that are typically used by virtual-devices to read from RAM (DMA).
These hooks attempt to populate these DMA regions with fuzzed data, just in
time.

Patch 1 changes the way we tell QTest to log to stderr (becomes important when
building reproducers with this fuzzer)

Patches 2-6 add the fuzzer and the necessary DMA callbacks

Patches 7-10 are my (very rough) attempt at integrating this into OSS-Fuzz

Patches 11-12 contain the "reordering" and minimization scripts used to
produce a QTest reproducer for a crash.

Additional notes:
 * With the latest changes, the
 fuzzer is quite effective at only targeting the device we care about,
 so it will probably be beneficial to allow reboot() as an option for
 resetting state, rather than fork(), for devices where that works well.

 * We have only scratched the surface for device "backends". I.e. I am using
 fake null-co:// drives for block devices and SLiRP for network devices (see
 scripts/oss-fuzz/general_fuzzer_configs.yml). Using more complex backends will
 likely break due to forking/threading/statefulness related reasons and will
 require more work.

* Because I still can't figure out how to make QOS do what I want, this
  only maps PCI BARs on i386. For other targets, the fuzzer can still
  try to do it on its own :). Only did a very simple test on ppc and arm

 * This is failing on GitLab due to a leak:
 https://gitlab.com/a1xndr/qemu/-/jobs/652179729
 I am not sure how to work around it yet, since I don't think we can
 free what the trace says we should free (argv from wordexp that we pass
 to qemu_main).


Some of the issues I have found or reproduced with this fuzzer:
https://bugs.launchpad.net/bugs/1525123
https://bugs.launchpad.net/bugs/1681439
https://bugs.launchpad.net/bugs/1777315
https://bugs.launchpad.net/bugs/1878034
https://bugs.launchpad.net/bugs/1878043
https://bugs.launchpad.net/bugs/1878054
https://bugs.launchpad.net/bugs/1878057
https://bugs.launchpad.net/bugs/1878067
https://bugs.launchpad.net/bugs/1878134
https://bugs.launchpad.net/bugs/1878136
https://bugs.launchpad.net/bugs/1878253
https://bugs.launchpad.net/bugs/1878255
https://bugs.launchpad.net/bugs/1878259
https://bugs.launchpad.net/bugs/1878263
https://bugs.launchpad.net/bugs/1878323
https://bugs.launchpad.net/bugs/1878641
https://bugs.launchpad.net/bugs/1878642
https://bugs.launchpad.net/bugs/1878645
https://bugs.launchpad.net/bugs/1878651
https://bugs.launchpad.net/bugs/1879223
https://bugs.launchpad.net/bugs/1879227
https://bugs.launchpad.net/bugs/1879531
https://bugs.launchpad.net/bugs/1880355
https://bugs.launchpad.net/bugs/1880539
https://bugs.launchpad.net/bugs/1884693
https://bugs.launchpad.net/bugs/1886362
https://bugs.launchpad.net/bugs/1887303
https://bugs.launchpad.net/bugs/1887309
https://bugs.launchpad.net/bugs/697510

-Alex

Alexander Bulekov (12):
  fuzz: Change the way we write qtest log to stderr
  fuzz: Add general virtual-device fuzzer
  fuzz: Add PCI features to the general fuzzer
  fuzz: Add DMA support to the generic-fuzzer
  fuzz: Declare DMA Read callback function
  fuzz: Add fuzzer callbacks to DMA-read functions
  scripts/oss-fuzz: Add wrapper program for generic fuzzer
  scripts/oss-fuzz: Add general-fuzzer build script
  scripts/oss-fuzz: Add general-fuzzer configs for oss-fuzz
  scripts/oss-fuzz: build the general-fuzzer configs

[Bug 1888601] [NEW] QEMU v5.1.0-rc0/rc1 hang with nested virtualization

2020-07-22 Thread Simon Kaegi

Public bug reported:

We're running Kata Containers using QEMU and with v5.1.0rc0 and rc1 have
noticed a problem at startup where QEMu appears to hang. We are not
seeing this problem on our bare metal nodes and only on a VSI that
supports nested virtualization.

We unfortunately see nothing at all in the QEMU logs to help understand
the problem and a hung process is just a guess at this point.

Using git bisect we first see the problem with...

---

f19bcdfedd53ee93412d535a842a89fa27cae7f2 is the first bad commit
commit f19bcdfedd53ee93412d535a842a89fa27cae7f2
Author: Jason Wang 
Date:   Wed Jul 1 22:55:28 2020 +0800

virtio-pci: implement queue_enabled method

With version 1, we can detect whether a queue is enabled via
queue_enabled.

Signed-off-by: Jason Wang 
Signed-off-by: Cindy Lu 
Message-Id: <20200701145538.22333-5-l...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Acked-by: Jason Wang 

 hw/virtio/virtio-pci.c | 13 +
 1 file changed, 13 insertions(+)

---

Reverting this commit seems to work and prevent the hanging.

---

Here's how kata ends up launching qemu in our environment -- 
/opt/kata/bin/qemu-system-x86_64 -name 
sandbox-849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f -uuid 
6bec458e-1da7-4847-a5d7-5ab31d4d2465 -machine pc,accel=kvm,kernel_irqchip -cpu 
host,pmu=off -qmp 
unix:/run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/qmp.sock,server,nowait
 -m 4096M,slots=10,maxmem=30978M -device 
pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= 
-device virtio-serial-pci,disable-modern=true,id=serial0,romfile= -device 
virtconsole,chardev=charconsole0,id=console0 -chardev 
socket,id=charconsole0,path=/run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/console.sock,server,nowait
 -device virtio-scsi-pci,id=scsi0,disable-modern=true,romfile= -object 
rng-random,id=rng0,filename=/dev/urandom -device 
virtio-rng-pci,rng=rng0,romfile= -device 
virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev 
socket,id=charch0,path=/run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/kata.sock,server,nowait
 -chardev 
socket,id=char-396c5c3e19e29353,path=/run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/vhost-fs.sock
 -device 
vhost-user-fs-pci,chardev=char-396c5c3e19e29353,tag=kataShared,romfile= -netdev 
tap,id=network-0,vhost=on,vhostfds=3:4,fds=5:6 -device 
driver=virtio-net-pci,netdev=network-0,mac=52:ac:2d:02:1f:6f,disable-modern=true,mq=on,vectors=6,romfile=
 -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults 
-nographic -daemonize -object 
memory-backend-file,id=dimm1,size=4096M,mem-path=/dev/shm,share=on -numa 
node,memdev=dimm1 -kernel /opt/kata/share/kata-containers/vmlinuz-5.7.9-74 
-initrd 
/opt/kata/share/kata-containers/kata-containers-initrd_alpine_1.11.2-6_agent.initrd
 -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 
i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 
console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug 
panic=1 nr_cpus=4 agent.use_vsock=false scsi_mod.scan=none 
init=/usr/bin/kata-agent -pidfile 
/run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/pid 
-D 
/run/vc/vm/849df14c6065931adedb9d18bc9260a6d896f1814a8c5cfa239865772f1b7a5f/qemu.log
 -smp 2,cores=1,threads=1,sockets=4,maxcpus=4

---

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1888601

Title:
  QEMU v5.1.0-rc0/rc1 hang with nested virtualization

Status in QEMU:
  New

Bug description:
  We're running Kata Containers using QEMU and with v5.1.0rc0 and rc1
  have noticed a problem at startup where QEMu appears to hang. We are
  not seeing this problem on our bare metal nodes and only on a VSI that
  supports nested virtualization.

  We unfortunately see nothing at all in the QEMU logs to help
  understand the problem and a hung process is just a guess at this
  point.

  Using git bisect we first see the problem with...

  ---

  f19bcdfedd53ee93412d535a842a89fa27cae7f2 is the first bad commit
  commit f19bcdfedd53ee93412d535a842a89fa27cae7f2
  Author: Jason Wang 
  Date:   Wed Jul 1 22:55:28 2020 +0800

  virtio-pci: implement queue_enabled method
  
  With version 1, we can detect whether a queue is enabled via
  queue_enabled.
  
  Signed-off-by: Jason Wang 
  Signed-off-by: Cindy Lu 
  Message-Id: <20200701145538.22333-5-l...@redhat.com>
  Reviewed-by: Michael S. Tsirkin 
  Signed-off-by: Michael S. Tsirkin 
  Acked-by: Jason Wang 

   hw/virtio/virtio-pci.c | 13 +
   1 file changed, 13 insertions(+)

  ---

  Reverting this

[PATCH 12/12] scripts/oss-fuzz: Add crash trace minimization script

2020-07-22 Thread Alexander Bulekov

Once we find a crash, we can convert it into a QTest trace. Usually this
trace will contain many operations that are unneeded to reproduce the
crash. This script tries to minimize the crashing trace, by removing
operations and trimming QTest bufwrite(write addr len data...) commands.

Signed-off-by: Alexander Bulekov 
---
I know its hard to make sense of this patch and the previous one without
a real example, but I didn't want to delay sending this set. I'll try to
find and old crash (maybe for one of the e1000e bugs) and show exactly
how I go from binary libFuzzer blob to qtest reproducer.

 scripts/oss-fuzz/minimize_qtest_trace.py | 117 +++
 1 file changed, 117 insertions(+)
 create mode 100755 scripts/oss-fuzz/minimize_qtest_trace.py

diff --git a/scripts/oss-fuzz/minimize_qtest_trace.py 
b/scripts/oss-fuzz/minimize_qtest_trace.py
new file mode 100755
index 00..c318032049
--- /dev/null
+++ b/scripts/oss-fuzz/minimize_qtest_trace.py
@@ -0,0 +1,117 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+"""
+This takes a crashing qtest trace and tries to remove superflous operations
+"""
+
+import sys
+import os
+import subprocess
+import time
+
+QEMU_ARGS = None
+QEMU_PATH = None
+TIMEOUT = 5
+CRASH_TOKEN = None
+
+
+def usage():
+sys.exit("""\
+Usage: QEMU_PATH="/path/to/qemu" QEMU_ARGS="args" {} input_trace output_trace
+By default, will try to use the second-to-last line in the output to identify
+whether the crash occred. Optionally, manually set a string that idenitifes the
+crash by setting CRASH_TOKEN=
+""".format((sys.argv[0])))
+
+
+def check_if_trace_crashes(trace, path):
+global CRASH_TOKEN
+with open(path, "w") as tracefile:
+tracefile.write("".join(trace))
+rc = subprocess.Popen("timeout -s 9 {}s {} {} 2>&1 < {}".format(TIMEOUT,
+  QEMU_PATH, QEMU_ARGS, path),
+  shell=True, stdin=subprocess.PIPE,
+  stdout=subprocess.PIPE)
+stdo, None = rc.communicate()
+output = stdo.decode('unicode_escape')
+if rc.returncode == 137:# Timed Out
+return False
+if len(output.splitlines()) < 2:
+return False
+
+if CRASH_TOKEN is None:
+CRASH_TOKEN = output.splitlines()[-2]
+
+return CRASH_TOKEN in output
+
+
+def minimize_trace(inpath, outpath):
+global TIMEOUT
+with open(inpath) as f:
+trace = f.readlines()
+start = time.time()
+if not check_if_trace_crashes(trace, outpath):
+sys.exit("The input qtest trace didn't cause a crash...")
+end = time.time()
+print("Crashed in {} seconds".format(end-start))
+TIMEOUT = (end-start)*5
+print("Setting the timeout for {} seconds".format(TIMEOUT))
+print("Identifying Crashes by this string: {}".format(CRASH_TOKEN))
+
+i = 0
+newtrace = trace[:]
+while i < len(newtrace):
+prior = newtrace[i]
+# Try to remove the line completely
+newtrace[i] = ""
+if check_if_trace_crashes(newtrace, outpath):
+i += 1
+continue
+newtrace[i] = prior
+# Try to split up writes into multiple commands, each of which can be
+# removed.
+if newtrace[i].startswith("write "):
+addr = int(newtrace[i].split()[1], 16)
+length = int(newtrace[i].split()[2], 16)
+data = newtrace[i].split()[3][2:]
+if length > 1:
+leftlength = int(length/2)
+rightlength = length - leftlength
+newtrace.insert(i+1, "")
+while leftlength > 0:
+newtrace[i] = "write {} {} 0x{}\n".format(
+hex(addr),
+hex(leftlength),
+data[:leftlength*2])
+newtrace[i+1] = "write {} {} 0x{}\n".format(
+hex(addr+leftlength),
+hex(rightlength),
+data[leftlength*2:])
+if check_if_trace_crashes(newtrace, outpath):
+break
+else:
+leftlength -= 1
+rightlength += 1
+if check_if_trace_crashes(newtrace, outpath):
+i -= 1
+else:
+newtrace[i] = prior
+del newtrace[i+1]
+i += 1
+check_if_trace_crashes(newtrace, outpath)
+
+
+if __name__ == '__main__':
+if len(sys.argv) < 3:
+usage()
+
+QEMU_PATH = os.getenv("QEMU_PATH")
+QEMU_ARGS = os.getenv("QEMU_ARGS")
+if QEMU_PATH is None or QEMU_ARGS is None:
+usage()
+if "accel" not in QEMU_ARGS:
+QEMU_ARGS += " -accel qtest"
+CRASH_TOKEN = os.getenv(CRASH_TOKEN)
+QEMU_ARGS += " -qtest stdio -monitor none -serial none "
+minimize_trace(sys.argv[1], sys.argv[2])
-- 
2.27.0

[PATCH 04/12] fuzz: Add DMA support to the generic-fuzzer

2020-07-22 Thread Alexander Bulekov

When a virtual-device tries to access some buffer in memory over DMA, we
add call-backs into the fuzzer(next commit). The fuzzer checks verifies
that the DMA request maps to a physical RAM address and fills the memory
with fuzzer-provided data. The patterns that we use to fill this memory
are specified using add_dma_pattern and clear_dma_patterns operations.

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/general_fuzz.c | 177 
 1 file changed, 177 insertions(+)

diff --git a/tests/qtest/fuzz/general_fuzz.c b/tests/qtest/fuzz/general_fuzz.c
index e715b77d59..4b6967c5d2 100644
--- a/tests/qtest/fuzz/general_fuzz.c
+++ b/tests/qtest/fuzz/general_fuzz.c
@@ -27,6 +27,7 @@
 #include "tests/qtest/libqos/pci.h"
 #include "tests/qtest/libqos/pci-pc.h"
 #include "hw/pci/pci.h"
+#include "hw/boards.h"
 
 /*
  * CMD_SEP is a random 32-bit value used to separate "commands" in the fuzz
@@ -34,6 +35,7 @@
  */
 #define CMD_SEP "\x84\x05\x5C\x5E"
 #define DEFAULT_TIMEOUT_US 10
+#define MAX_DMA_FILL_SIZE 0x1
 
 #define PCI_HOST_BRIDGE_CFG 0xcf8
 #define PCI_HOST_BRIDGE_DATA 0xcfc
@@ -44,6 +46,24 @@ typedef struct {
 } address_range;
 
 static useconds_t timeout = 10;
+/*
+ * A pattern used to populate a DMA region or perform a memwrite. This is
+ * useful for e.g. populating tables of unique addresses.
+ * Example {.index = 1; .stride = 2; .len = 3; .data = "\x00\x01\x02"}
+ * Renders as: 00 01 02   00 03 03   00 05 03   00 07 03 ...
+ */
+typedef struct {
+uint8_t index;  /* Index of a byte to increment by stride */
+uint8_t stride; /* Increment each index'th byte by this amount */
+size_t len;
+const uint8_t *data;
+} pattern;
+
+/* Avoid filling the same DMA region between MMIO/PIO commands ? */
+static bool avoid_double_fetches;
+
+static QTestState *qts_global; /* Need a global for the DMA callback */
+
 /*
  * List of memory regions that are children of QOM objects specified by the
  * user for fuzzing.
@@ -51,6 +71,122 @@ static useconds_t timeout = 10;
 static GPtrArray *fuzzable_memoryregions;
 static GPtrArray *fuzzable_pci_devices;
 
+/*
+ * List of dma regions populated since the last fuzzing command. Used to ensure
+ * that we only write to each DMA address once, to avoid race conditions when
+ * building reproducers.
+ */
+static GArray *dma_regions;
+
+static GArray *dma_patterns;
+int dma_pattern_index;
+
+/*
+ * Allocate a block of memory and populate it with a pattern.
+ */
+static void *pattern_alloc(pattern p, size_t len)
+{
+int i;
+uint8_t *buf = g_malloc(len);
+uint8_t sum = 0;
+
+for (i = 0; i < len; ++i) {
+buf[i] = p.data[i % p.len];
+if ((i % p.len) == p.index) {
+buf[i] += sum;
+sum += p.stride;
+}
+}
+return buf;
+}
+
+/*
+ * Call-back for functions that perform DMA reads from guest memory. Confirm
+ * that the region has not already been populated since the last loop in
+ * general_fuzz(), avoiding potential race-conditions, which we don't have
+ * a good way for reproducing right now.
+ */
+void fuzz_dma_read_cb(size_t addr, size_t len, MemoryRegion *mr, bool is_write)
+{
+/* Are we in the general-fuzzer or are we using another fuzz-target? */
+if (!qts_global) {
+return;
+}
+
+/*
+ * If the device is trying to read from a ROM, exit early. We do not want
+ * to fuzz devices using data that we have no control over.
+ */
+if (mr->readonly) {
+_Exit(0);
+}
+
+/*
+ * Return immediately if:
+ * - We have no DMA patterns defined
+ * - The length of the DMA read request is zero
+ * - The DMA read is hitting an MR other than the machine's main RAM
+ * - The DMA request is not a read (what happens for a address_space_map
+ *   with is_write=True? Can the device use the same pointer to do reads?)
+ * - The DMA request hits past the bounds of our RAM
+ */
+if (dma_patterns->len == 0
+|| len == 0
+|| mr != MACHINE(qdev_get_machine())->ram
+|| is_write
+|| addr > current_machine->ram_size) {
+return;
+}
+
+/*
+ * If we overlap with any existing dma_regions, split the range and only
+ * populate the non-overlapping parts.
+ */
+for (int i = 0; i < dma_regions->len && !avoid_double_fetches; ++i) {
+address_range region = g_array_index(dma_regions, address_range, i);
+if (addr < region.addr + region.len && addr + len > region.addr) {
+if (addr < region.addr) {
+fuzz_dma_read_cb(addr, region.addr - addr, mr, is_write);
+}
+if (addr + len > region.addr + region.len) {
+fuzz_dma_read_cb(region.addr + region.len,
+addr + len - (region.addr + region.len), mr, is_write);
+}
+return;
+}
+}
+
+/* Cap the length of the DMA access to something reasonable */
+len =

[PATCH 11/12] scripts/oss-fuzz: Add script to reorder a general-fuzzer trace

2020-07-22 Thread Alexander Bulekov

The general-fuzzer uses hooks to fulfill DMA requests just-in-time.
This means that if we try to use QTEST_LOG=1 to build a reproducer, the
DMA writes will be logged _after_ the in/out/read/write that triggered
the DMA read. To work work around this, the general-fuzzer annotates
these just-in time DMA fulfilments with a tag that we can use to
discern them. This script simply iterates over a raw qtest
trace (including log messages, errors, timestamps etc), filters it and
re-orders it so that DMA fulfillments are placed directly _before_ the
qtest command that will cause the DMA access.

Signed-off-by: Alexander Bulekov 
---
 .../oss-fuzz/reorder_fuzzer_qtest_trace.py| 94 +++
 1 file changed, 94 insertions(+)
 create mode 100755 scripts/oss-fuzz/reorder_fuzzer_qtest_trace.py

diff --git a/scripts/oss-fuzz/reorder_fuzzer_qtest_trace.py 
b/scripts/oss-fuzz/reorder_fuzzer_qtest_trace.py
new file mode 100755
index 00..9fb7edb6ee
--- /dev/null
+++ b/scripts/oss-fuzz/reorder_fuzzer_qtest_trace.py
@@ -0,0 +1,94 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+"""
+Use this to convert qtest log info from a generic fuzzer input into a qtest
+trace that you can feed into a standard qemu-system process. Example usage:
+
+QEMU_FUZZ_ARGS="-machine q35,accel=qtest" QEMU_FUZZ_OBJECTS="*" \
+./i386-softmmu/qemu-fuzz-i386 --fuzz-target=general-pci-fuzz
+# .. Finds some crash
+QTEST_LOG=1 FUZZ_SERIALIZE_QTEST=1 \
+QEMU_FUZZ_ARGS="-machine q35,accel=qtest" QEMU_FUZZ_OBJECTS="*" \
+./i386-softmmu/qemu-fuzz-i386 --fuzz-target=general-pci-fuzz
+/path/to/crash 2> qtest_log_output
+scripts/oss-fuzz/reorder_fuzzer_qtest_trace.py qtest_log_output > qtest_trace
+./i386-softmmu/qemu-fuzz-i386 -machine q35,accel=qtest \
+-qtest stdin < qtest_trace
+
+### Details ###
+
+Some fuzzer make use of hooks that allow us to populate some memory range, just
+before a DMA read from that range. This means that the fuzzer can produce
+activity that looks like:
+[start] read from mmio addr
+[end]   read from mmio addr
+[start] write to pio addr
+[start] fill a DMA buffer just in time
+[end]   fill a DMA buffer just in time
+[start] fill a DMA buffer just in time
+[end]   fill a DMA buffer just in time
+[end]   write to pio addr
+[start] read from mmio addr
+[end]   read from mmio addr
+
+We annotate these "nested" DMA writes, so with QTEST_LOG=1 the QTest trace
+might look something like:
+[R +0.028431] readw 0x1
+[R +0.028434] outl 0xc000 0xbeef  # Triggers a DMA read from 0xbeef and 0xbf00
+[DMA][R +0.034639] write 0xbeef 0x2 0x
+[DMA][R +0.034639] write 0xbf00 0x2 0x
+[R +0.028431] readw 0xfc000
+
+This script would reorder the above trace so it becomes:
+readw 0x1
+write 0xbeef 0x2 0x
+write 0xbf00 0x2 0x
+outl 0xc000 0xbeef
+readw 0xfc000
+
+I.e. by the time, 0xc000 tries to read from DMA, those DMA buffers have already
+been set up, removing the need for the DMA hooks. We can simply provide this
+reordered trace via -qtest stdio to reproduce the input
+
+Note: this won't work for traces where the device tries to read from the same
+DMA region twice in between MMIO/PIO commands. E.g:
+[R +0.028434] outl 0xc000 0xbeef
+[DMA][R +0.034639] write 0xbeef 0x2 0x
+[DMA][R +0.034639] write 0xbeef 0x2 0x
+"""
+
+import sys
+
+__author__ = "Alexander Bulekov "
+__copyright__  = "Copyright (C) 2020, Red Hat, Inc."
+__license__= "GPL version 2 or (at your option) any later version"
+
+__maintainer__ = "Alexander Bulekov"
+__email__  = "alx...@bu.edu"
+
+
+def usage():
+sys.exit("Usage: {} /path/to/qtest_log_output".format((sys.argv[0])))
+
+
+def main(filename):
+with open(filename, "r") as f:
+trace = f.readlines()
+
+# Leave only lines that look like logged qtest commands
+trace[:] = [x.strip() for x in trace if "[R +" in x
+or "[S +" in x and "CLOSED" not in x]
+
+for i in range(len(trace)):
+if i+1 < len(trace):
+if "[DMA]" in trace[i+1]:
+trace[i], trace[i+1] = trace[i+1], trace[i]
+for line in trace:
+print(line.split("]")[-1].strip())
+
+
+if __name__ == '__main__':
+if len(sys.argv) == 1:
+usage()
+main(sys.argv[1])
-- 
2.27.0

[PATCH 09/12] scripts/oss-fuzz: Add general-fuzzer configs for oss-fuzz

2020-07-22 Thread Alexander Bulekov

Each of these entries is built into a wrapper binary that sets the
needed environment variables and executes the general virtual-device
fuzzer. In the future, we will need additional fields, such as arch=arm,
timeout_per_testcase=0, reset=reboot, etc...

Signed-off-by: Alexander Bulekov 
---
 scripts/oss-fuzz/general_fuzzer_configs.yml | 103 
 1 file changed, 103 insertions(+)
 create mode 100644 scripts/oss-fuzz/general_fuzzer_configs.yml

diff --git a/scripts/oss-fuzz/general_fuzzer_configs.yml 
b/scripts/oss-fuzz/general_fuzzer_configs.yml
new file mode 100644
index 00..748f4db075
--- /dev/null
+++ b/scripts/oss-fuzz/general_fuzzer_configs.yml
@@ -0,0 +1,103 @@
+configs:
+- name: virtio-net-pci-slirp
+  args: >
+-M q35 -nodefaults
+-device virtio-net,netdev=net0 -netdev user,id=net0
+  objects: virtio*
+
+- name: virtio-blk
+  args: >
+-machine q35 -device virtio-blk,drive=disk0
+-drive file=null-co://,id=disk0,if=none,format=raw
+  objects: virtio*
+
+- name: virtio-scsi
+  args: >
+-machine q35 -device virtio-scsi,num_queues=8
+-device scsi-hd,drive=disk0
+-drive file=null-co://,id=disk0,if=none,format=raw
+  objects: scsi* virtio*
+
+- name: virtio-gpu
+  args: -machine q35 -nodefaults -device virtio-gpu
+  objects: virtio*
+
+- name: virtio-vga
+  args: -machine q35 -nodefaults -device virtio-vga
+  objects: virtio*
+
+- name: virtio-rng
+  args: -machine q35 -nodefaults -device virtio-rng
+  objects: virtio*
+
+- name: virtio-balloon
+  args: -machine q35 -nodefaults -device virtio-balloon
+  objects: virtio*
+
+- name: virtio-serial
+  args: -machine q35 -nodefaults -device virtio-serial
+  objects: virtio*
+
+- name: virtio-mouse
+  args: -machine q35 -nodefaults -device virtio-mouse
+  objects: virtio*
+
+- name: e1000
+  args: >
+-M q35 -nodefaults
+-device e1000,netdev=net0 -netdev user,id=net0
+  objects: e1000
+
+- name: e1000e
+  args: >
+-M q35 -nodefaults
+-device e1000e,netdev=net0 -netdev user,id=net0
+  objects: e1000e
+
+- name: cirrus-vga
+  args: -machine q35 -nodefaults -device cirrus-vga
+  objects: cirrus*
+
+- name: bochs-display
+  args: -machine q35 -nodefaults -device bochs-display
+  objects: bochs*
+
+- name: intel-hda
+  args: >
+-machine q35 -nodefaults -device intel-hda,id=hda0
+-device hda-output,bus=hda0.0 -device hda-micro,bus=hda0.0
+-device hda-duplex,bus=hda0.0
+  objects: intel-hda
+
+- name: ide-hd
+  args: >
+-machine q35 -nodefaults
+-drive file=null-co://,if=none,format=raw,id=disk0
+-device ide-hd,drive=disk0
+  objects: ide
+
+- name: floppy
+  args: >
+-machine pc -nodefaults -device floppy,id=floppy0
+-drive id=disk0,file=null-co://,file.read-zeroes=on,if=none
+-device floppy,drive=disk0,drive-type=288
+  objects: fd floppy*
+
+- name: xhci
+  args: >
+-machine q35 -nodefaults
+-drive file=null-co://,if=none,format=raw,id=disk0
+-device qemu-xhci,id=xhci -device usb-tablet,bus=xhci.0 -device usb-bot
+-device usb-storage,drive=disk0 -chardev null,id=cd0 -chardev 
null,id=cd1
+-device usb-braille,chardev=cd0 -device usb-ccid -device usb-ccid
+-device usb-kbd -device usb-mouse -device usb-serial,chardev=cd1
+-device usb-tablet -device usb-wacom-tablet -device usb-audio
+  objects: "*"
+
+- name: pc-i440fx
+  args: -machine pc
+  objects: "*"
+
+- name: pc-q35
+  args: -machine q35
+  objects: "*"
-- 
2.27.0

[PATCH 10/12] scripts/oss-fuzz: build the general-fuzzer configs

2020-07-22 Thread Alexander Bulekov

Build general-fuzzer wrappers for each configuration defined in
general_fuzzer_configs.yml and move the actual general-fuzzer to a
subdirectory, so oss-fuzz doesn't treat it as a standalone fuzzer.

Signed-off-by: Alexander Bulekov 
---
 scripts/oss-fuzz/build.sh | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/scripts/oss-fuzz/build.sh b/scripts/oss-fuzz/build.sh
index a07b3022e8..2071e77ac2 100755
--- a/scripts/oss-fuzz/build.sh
+++ b/scripts/oss-fuzz/build.sh
@@ -38,7 +38,7 @@ OSS_FUZZ_BUILD_DIR="./build-oss-fuzz/"
 # remove it, resulting in an unresolved reference to qemu_build_not_reached
 # Undefine the __OPTIMIZE__ macro which compiler.h relies on to choose whether
 # to " #define qemu_build_not_reached()  g_assert_not_reached() "
-EXTRA_CFLAGS="$CFLAGS -U __OPTIMIZE__"
+EXTRA_CFLAGS="$CFLAGS -U __OPTIMIZE__ -DCONFIG_FUZZ=y"
 
 if ! { [ -e "./COPYING" ] &&
[ -e "./MAINTAINERS" ] &&
@@ -101,5 +101,11 @@ do
 cp ./i386-softmmu/qemu-fuzz-i386 "$DEST_DIR/qemu-fuzz-i386-target-$target"
 done
 
+mkdir -p "$DEST_DIR/deps"
+mv "$DEST_DIR/qemu-fuzz-i386-target-general-fuzz" "$DEST_DIR/deps/"
+
+./scripts/oss-fuzz/build_general_fuzzers.py \
+"./scripts/oss-fuzz/general_fuzzer_configs.yml" "$DEST_DIR/general-fuzz-"
+
 echo "Done. The fuzzers are located in $DEST_DIR"
 exit 0
-- 
2.27.0

[PATCH 06/12] fuzz: Add fuzzer callbacks to DMA-read functions

2020-07-22 Thread Alexander Bulekov

We should be careful to not call any functions besides fuzz_dma_read_cb.
Without --enable-fuzzing, fuzz_dma_read_cb is an empty inlined function.

Signed-off-by: Alexander Bulekov 
---
I'd appreciate another set of eyes on this. Basically, we only care about
DMA reads to RAM. This is why I assume stuff like "addr" or "cache->xlat
+ addr" is an absolute address.

 exec.c| 2 ++
 include/exec/memory.h | 1 +
 include/exec/memory_ldst_cached.inc.h | 3 +++
 memory_ldst.inc.c | 4 
 softmmu/memory.c  | 1 +
 5 files changed, 11 insertions(+)

diff --git a/exec.c b/exec.c
index 6f381f98e2..c81f41514d 100644
--- a/exec.c
+++ b/exec.c
@@ -3241,6 +3241,7 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr 
addr,
 stn_he_p(buf, l, val);
 } else {
 /* RAM case */
+fuzz_dma_read_cb(addr, len, mr, false);
 ram_ptr = qemu_ram_ptr_length(mr->ram_block, addr1, , false);
 memcpy(buf, ram_ptr, l);
 }
@@ -3601,6 +3602,7 @@ void *address_space_map(AddressSpace *as,
 memory_region_ref(mr);
 *plen = flatview_extend_translation(fv, addr, len, mr, xlat,
 l, is_write, attrs);
+fuzz_dma_read_cb(addr, *plen, mr, is_write);
 ptr = qemu_ram_ptr_length(mr->ram_block, xlat, plen, true);
 
 return ptr;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2ec3b597f1..f8b943521a 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2444,6 +2444,7 @@ address_space_read_cached(MemoryRegionCache *cache, 
hwaddr addr,
   void *buf, hwaddr len)
 {
 assert(addr < cache->len && len <= cache->len - addr);
+fuzz_dma_read_cb(cache->xlat + addr, len, cache->mrs.mr, false);
 if (likely(cache->ptr)) {
 memcpy(buf, cache->ptr + addr, len);
 return MEMTX_OK;
diff --git a/include/exec/memory_ldst_cached.inc.h 
b/include/exec/memory_ldst_cached.inc.h
index fd4bbb40e7..aff574039f 100644
--- a/include/exec/memory_ldst_cached.inc.h
+++ b/include/exec/memory_ldst_cached.inc.h
@@ -28,6 +28,7 @@ static inline uint32_t 
ADDRESS_SPACE_LD_CACHED(l)(MemoryRegionCache *cache,
 hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
 {
 assert(addr < cache->len && 4 <= cache->len - addr);
+fuzz_dma_read_cb(cache->xlat + addr, 4, cache->mrs.mr, false);
 if (likely(cache->ptr)) {
 return LD_P(l)(cache->ptr + addr);
 } else {
@@ -39,6 +40,7 @@ static inline uint64_t 
ADDRESS_SPACE_LD_CACHED(q)(MemoryRegionCache *cache,
 hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
 {
 assert(addr < cache->len && 8 <= cache->len - addr);
+fuzz_dma_read_cb(cache->xlat + addr, 8, cache->mrs.mr, false);
 if (likely(cache->ptr)) {
 return LD_P(q)(cache->ptr + addr);
 } else {
@@ -50,6 +52,7 @@ static inline uint32_t 
ADDRESS_SPACE_LD_CACHED(uw)(MemoryRegionCache *cache,
 hwaddr addr, MemTxAttrs attrs, MemTxResult *result)
 {
 assert(addr < cache->len && 2 <= cache->len - addr);
+fuzz_dma_read_cb(cache->xlat + addr, 2, cache->mrs.mr, false);
 if (likely(cache->ptr)) {
 return LD_P(uw)(cache->ptr + addr);
 } else {
diff --git a/memory_ldst.inc.c b/memory_ldst.inc.c
index c54aee4a95..8d45d2eeff 100644
--- a/memory_ldst.inc.c
+++ b/memory_ldst.inc.c
@@ -42,6 +42,7 @@ static inline uint32_t glue(address_space_ldl_internal, 
SUFFIX)(ARG1_DECL,
 MO_32 | devend_memop(endian), attrs);
 } else {
 /* RAM case */
+fuzz_dma_read_cb(addr, 4, mr, false);
 ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
 switch (endian) {
 case DEVICE_LITTLE_ENDIAN:
@@ -110,6 +111,7 @@ static inline uint64_t glue(address_space_ldq_internal, 
SUFFIX)(ARG1_DECL,
 MO_64 | devend_memop(endian), attrs);
 } else {
 /* RAM case */
+fuzz_dma_read_cb(addr, 8, mr, false);
 ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
 switch (endian) {
 case DEVICE_LITTLE_ENDIAN:
@@ -175,6 +177,7 @@ uint32_t glue(address_space_ldub, SUFFIX)(ARG1_DECL,
 r = memory_region_dispatch_read(mr, addr1, , MO_8, attrs);
 } else {
 /* RAM case */
+fuzz_dma_read_cb(addr, 1, mr, false);
 ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
 val = ldub_p(ptr);
 r = MEMTX_OK;
@@ -212,6 +215,7 @@ static inline uint32_t glue(address_space_lduw_internal, 
SUFFIX)(ARG1_DECL,
 MO_16 | devend_memop(endian), attrs);
 } else {
 /* RAM case */
+fuzz_dma_read_cb(addr, 2, mr, false);
 ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
 switch (endian) {
 case DEVICE_LITTLE_ENDIAN:
diff --git a/softmmu/memory.c b/softmmu/memory.c
index b0c2cf2535..be87044641 100644
--- a/softmmu/memory.c
+++

[PATCH 03/12] fuzz: Add PCI features to the general fuzzer

2020-07-22 Thread Alexander Bulekov

This patch compares TYPE_PCI_DEVICE objects against the user-provided
matching pattern. If there is a match, we use some hacks and leverage
QOS to map each possible BAR for that device. Now fuzzed inputs might be
converted to pci_read/write commands which target specific. This means
that we can fuzz a particular device's PCI configuration space,

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/general_fuzz.c | 114 
 1 file changed, 114 insertions(+)

diff --git a/tests/qtest/fuzz/general_fuzz.c b/tests/qtest/fuzz/general_fuzz.c
index fd92cc5bdf..e715b77d59 100644
--- a/tests/qtest/fuzz/general_fuzz.c
+++ b/tests/qtest/fuzz/general_fuzz.c
@@ -24,6 +24,9 @@
 #include "exec/ramblock.h"
 #include "exec/address-spaces.h"
 #include "hw/qdev-core.h"
+#include "tests/qtest/libqos/pci.h"
+#include "tests/qtest/libqos/pci-pc.h"
+#include "hw/pci/pci.h"
 
 /*
  * CMD_SEP is a random 32-bit value used to separate "commands" in the fuzz
@@ -32,6 +35,9 @@
 #define CMD_SEP "\x84\x05\x5C\x5E"
 #define DEFAULT_TIMEOUT_US 10
 
+#define PCI_HOST_BRIDGE_CFG 0xcf8
+#define PCI_HOST_BRIDGE_DATA 0xcfc
+
 typedef struct {
 size_t addr;
 size_t len; /* The number of bytes until the end of the I/O region */
@@ -43,6 +49,8 @@ static useconds_t timeout = 10;
  * user for fuzzing.
  */
 static GPtrArray *fuzzable_memoryregions;
+static GPtrArray *fuzzable_pci_devices;
+
 /*
  * Here we want to convert a fuzzer-provided [io-region-index, offset] to
  * a physical address. To do this, we iterate over all of the matched
@@ -267,6 +275,65 @@ static void op_write(QTestState *s, const unsigned char * 
data, size_t len)
 break;
 }
 }
+static void op_pci_read(QTestState *s, const unsigned char * data, size_t len)
+{
+enum Sizes {Byte, Word, Long, end_sizes};
+struct {
+uint8_t size;
+uint8_t base;
+uint8_t offset;
+} a;
+if (len < sizeof(a) || fuzzable_pci_devices->len == 0) {
+return;
+}
+memcpy(, data, sizeof(a));
+PCIDevice *dev = g_ptr_array_index(fuzzable_pci_devices,
+  a.base % fuzzable_pci_devices->len);
+int devfn = dev->devfn;
+qtest_outl(s, PCI_HOST_BRIDGE_CFG, (1U << 31) | (devfn << 8) | a.offset);
+switch (a.size %= end_sizes) {
+case Byte:
+qtest_inb(s, PCI_HOST_BRIDGE_DATA);
+break;
+case Word:
+qtest_inw(s, PCI_HOST_BRIDGE_DATA);
+break;
+case Long:
+qtest_inl(s, PCI_HOST_BRIDGE_DATA);
+break;
+}
+}
+
+static void op_pci_write(QTestState *s, const unsigned char * data, size_t len)
+{
+enum Sizes {Byte, Word, Long, end_sizes};
+struct {
+uint8_t size;
+uint8_t base;
+uint8_t offset;
+uint32_t value;
+} a;
+if (len < sizeof(a) || fuzzable_pci_devices->len == 0) {
+return;
+}
+memcpy(, data, sizeof(a));
+PCIDevice *dev = g_ptr_array_index(fuzzable_pci_devices,
+  a.base % fuzzable_pci_devices->len);
+int devfn = dev->devfn;
+qtest_outl(s, PCI_HOST_BRIDGE_CFG, (1U << 31) | (devfn << 8) | a.offset);
+switch (a.size %= end_sizes) {
+case Byte:
+qtest_outb(s, PCI_HOST_BRIDGE_DATA, a.value & 0xFF);
+break;
+case Word:
+qtest_outw(s, PCI_HOST_BRIDGE_DATA, a.value & 0x);
+break;
+case Long:
+qtest_outl(s, PCI_HOST_BRIDGE_DATA, a.value & 0x);
+break;
+}
+}
+
 static void op_clock_step(QTestState *s, const unsigned char *data, size_t len)
 {
 qtest_clock_step_next(s);
@@ -311,6 +378,8 @@ static void general_fuzz(QTestState *s, const unsigned char 
*Data, size_t Size)
 op_out,
 op_read,
 op_write,
+op_pci_read,
+op_pci_write,
 op_clock_step,
 };
 const unsigned char *cmd = Data;
@@ -397,6 +466,19 @@ static int locate_fuzz_objects(Object *child, void *opaque)
 printf("Matched Object by Type: %s\n", object_get_typename(child));
 /* Find and save ptrs to any child MemoryRegions */
 object_child_foreach_recursive(child, locate_fuzz_memory_regions, 
NULL);
+
+/*
+ * We matched an object. If its a PCI device, store a pointer to it so
+ * we can map BARs and fuzz its config space.
+ */
+if (object_dynamic_cast(OBJECT(child), TYPE_PCI_DEVICE)) {
+/*
+ * Don't want duplicate pointers to the same PCIDevice, so remove
+ * copies of the pointer, before adding it.
+ */
+g_ptr_array_remove_fast(fuzzable_pci_devices, PCI_DEVICE(child));
+g_ptr_array_add(fuzzable_pci_devices, PCI_DEVICE(child));
+}
 } else if (object_dynamic_cast(OBJECT(child), TYPE_MEMORY_REGION)) {
 if (g_pattern_match_simple(pattern,
 object_get_canonical_path_component(child))) {
@@ -416,6 +498,7 @@ static int locate_fuzz_objects(Object *child,

[PATCH 07/12] scripts/oss-fuzz: Add wrapper program for generic fuzzer

2020-07-22 Thread Alexander Bulekov

On oss-fuzz we need some sort of wrapper to specify command-line
arguments or environment variables. When we had a similar problem with
other targets that I fixed with
05509c8e6d ("fuzz: select fuzz target using executable name")
by selecting the fuzz target based on the executable's name. In the
future should probably commit to one approach (wrapper binary or
argv0-based target selection).

Signed-off-by: Alexander Bulekov 
---
 scripts/oss-fuzz/target.c | 40 +++
 1 file changed, 40 insertions(+)
 create mode 100644 scripts/oss-fuzz/target.c

diff --git a/scripts/oss-fuzz/target.c b/scripts/oss-fuzz/target.c
new file mode 100644
index 00..4a7257412a
--- /dev/null
+++ b/scripts/oss-fuzz/target.c
@@ -0,0 +1,40 @@
+/*
+ * Copyright Red Hat Inc., 2020
+ *
+ * Authors:
+ *  Alexander Bulekov   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+/* Required for oss-fuzz to consider the binary a target. */
+static const char *magic __attribute__((used)) = "LLVMFuzzerTestOneInput";
+static const char args[] = {QEMU_FUZZ_ARGS, 0x00};
+static const char objects[] = {QEMU_FUZZ_OBJECTS, 0x00};
+
+int main(int argc, char *argv[])
+{
+char path[PATH_MAX] = {0};
+char *dir = dirname(argv[0]);
+strncpy(path, dir, PATH_MAX);
+strcat(path, "/deps/qemu-fuzz-i386-target-general-fuzz");
+
+setenv("QEMU_FUZZ_ARGS", args, 0);
+setenv("QEMU_FUZZ_OBJECTS", objects, 0);
+
+argv[0] = path;
+int ret = execvp(path, argv);
+if (ret) {
+perror("execv");
+}
+return ret;
+}
-- 
2.27.0

[PATCH 08/12] scripts/oss-fuzz: Add general-fuzzer build script

2020-07-22 Thread Alexander Bulekov

This parses a yaml file containing general-fuzzer configs and builds a
separate oss-fuzz wrapper binary for each one, changing some
preprocessor macros for each configuration. To avoid dealing with
escaping and stringifying, convert each string into a byte-array
representation

Signed-off-by: Alexander Bulekov 
---
 scripts/oss-fuzz/build_general_fuzzers.py | 62 +++
 1 file changed, 62 insertions(+)
 create mode 100755 scripts/oss-fuzz/build_general_fuzzers.py

diff --git a/scripts/oss-fuzz/build_general_fuzzers.py 
b/scripts/oss-fuzz/build_general_fuzzers.py
new file mode 100755
index 00..79f4664117
--- /dev/null
+++ b/scripts/oss-fuzz/build_general_fuzzers.py
@@ -0,0 +1,62 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+"""
+This script creates wrapper binaries that invoke the general-device-fuzzer with
+configurations specified in a yaml config file.
+"""
+import sys
+import os
+import yaml
+import tempfile
+
+CC = ""
+TEMPLATE = ""
+
+
+def usage():
+print("Usage: CC=COMPILER {} CONFIG_PATH \
+OUTPUT_PATH_PREFIX".format(sys.argv[0]))
+sys.exit(0)
+
+
+def str_to_c_byte_array(s):
+"""
+Convert strings to byte-arrays so we don't worry about formatting
+strings to play nicely with cc -DQEMU_FUZZARGS etc
+"""
+return ','.join('0x{:02x}'.format(ord(x)) for x in s)
+
+
+def compile_wrapper(cfg, path):
+os.system('$CC -DQEMU_FUZZ_ARGS="{}" -DQEMU_FUZZ_OBJECTS="{}" \
+{} -o {}'.format(
+str_to_c_byte_array(cfg["args"].replace("\n", " ")),
+str_to_c_byte_array(cfg["objects"].replace("\n", " ")),
+TEMPLATE, path))
+
+
+def main():
+global CC
+global TEMPLATE
+
+if len(sys.argv) != 3:
+usage()
+
+cfg_path = sys.argv[1]
+out_path = sys.argv[2]
+
+CC = os.getenv("CC")
+TEMPLATE = os.path.join(os.path.dirname(__file__), "target.c")
+
+with open(cfg_path, "r") as f:
+configs = yaml.load(f)["configs"]
+for cfg in configs:
+assert "name" in cfg
+assert "args" in cfg
+assert "objects" in cfg
+compile_wrapper(cfg, out_path + cfg["name"])
+
+
+if __name__ == '__main__':
+main()
-- 
2.27.0

[PATCH 02/12] fuzz: Add general virtual-device fuzzer

2020-07-22 Thread Alexander Bulekov

This is a generic fuzzer designed to fuzz a virtual device's
MemoryRegions, as long as they exist within the Memory or Port IO (if it
exists) AddressSpaces. The fuzzer's input is interpreted into a sequence
of qtest commands (outb, readw, etc). The interpreted commands are
separated by a magic seaparator, which should be easy for the fuzzer to
guess. Without ASan, the separator can be specified as a "dictionary
value" using the -dict argument (see libFuzzer documentation).

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/Makefile.include |   1 +
 tests/qtest/fuzz/general_fuzz.c   | 467 ++
 2 files changed, 468 insertions(+)
 create mode 100644 tests/qtest/fuzz/general_fuzz.c

diff --git a/tests/qtest/fuzz/Makefile.include 
b/tests/qtest/fuzz/Makefile.include
index 5bde793bf2..854322efb6 100644
--- a/tests/qtest/fuzz/Makefile.include
+++ b/tests/qtest/fuzz/Makefile.include
@@ -11,6 +11,7 @@ fuzz-obj-y += tests/qtest/fuzz/qtest_wrappers.o
 fuzz-obj-$(CONFIG_PCI_I440FX) += tests/qtest/fuzz/i440fx_fuzz.o
 fuzz-obj-$(CONFIG_VIRTIO_NET) += tests/qtest/fuzz/virtio_net_fuzz.o
 fuzz-obj-$(CONFIG_SCSI) += tests/qtest/fuzz/virtio_scsi_fuzz.o
+fuzz-obj-y += tests/qtest/fuzz/general_fuzz.o
 
 FUZZ_CFLAGS += -I$(SRC_PATH)/tests -I$(SRC_PATH)/tests/qtest
 
diff --git a/tests/qtest/fuzz/general_fuzz.c b/tests/qtest/fuzz/general_fuzz.c
new file mode 100644
index 00..fd92cc5bdf
--- /dev/null
+++ b/tests/qtest/fuzz/general_fuzz.c
@@ -0,0 +1,467 @@
+/*
+ * General Virtual-Device Fuzzing Target
+ *
+ * Copyright Red Hat Inc., 2020
+ *
+ * Authors:
+ *  Alexander Bulekov   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include 
+
+#include "cpu.h"
+#include "tests/qtest/libqtest.h"
+#include "fuzz.h"
+#include "fork_fuzz.h"
+#include "exec/address-spaces.h"
+#include "string.h"
+#include "exec/memory.h"
+#include "exec/ramblock.h"
+#include "exec/address-spaces.h"
+#include "hw/qdev-core.h"
+
+/*
+ * CMD_SEP is a random 32-bit value used to separate "commands" in the fuzz
+ * input
+ */
+#define CMD_SEP "\x84\x05\x5C\x5E"
+#define DEFAULT_TIMEOUT_US 10
+
+typedef struct {
+size_t addr;
+size_t len; /* The number of bytes until the end of the I/O region */
+} address_range;
+
+static useconds_t timeout = 10;
+/*
+ * List of memory regions that are children of QOM objects specified by the
+ * user for fuzzing.
+ */
+static GPtrArray *fuzzable_memoryregions;
+/*
+ * Here we want to convert a fuzzer-provided [io-region-index, offset] to
+ * a physical address. To do this, we iterate over all of the matched
+ * MemoryRegions. Check whether each region exists within the particular io
+ * space. Return the absolute address of the offset within the index'th region
+ * that is a subregion of the io_space and the distance until the end of the
+ * memory region.
+ */
+static bool get_io_address(address_range *result,
+MemoryRegion *io_space,
+uint8_t index,
+uint32_t offset) {
+MemoryRegion *mr, *root;
+index = index % fuzzable_memoryregions->len;
+int candidate_regions = 0;
+int i = 0;
+int ind = index;
+size_t abs_addr;
+
+while (ind >= 0 && fuzzable_memoryregions->len) {
+*result = (address_range){0, 0};
+mr = g_ptr_array_index(fuzzable_memoryregions, i);
+if (mr->enabled) {
+abs_addr = mr->addr;
+for (root = mr; root->container; ) {
+root = root->container;
+abs_addr += root->addr;
+}
+/*
+ * Only consider the region if it is rooted at the io_space we want
+ */
+if (root == io_space) {
+ind--;
+candidate_regions++;
+result->addr = abs_addr + (offset % mr->size);
+result->len = mr->size - (offset % mr->size);
+}
+}
+++i;
+/* Loop around */
+if (i == fuzzable_memoryregions->len) {
+/* No enabled regions in our io_space? */
+if (candidate_regions == 0) {
+break;
+}
+i = 0;
+}
+}
+return candidate_regions != 0;
+}
+static bool get_pio_address(address_range *result,
+ uint8_t index, uint16_t offset)
+{
+/*
+ * PIO BARs can be set past the maximum port address (0x). Thus, result
+ * can contain an addr that extends past the PIO space. When we pass this
+ * address to qtest_in/qtest_out, it is cast to a uint16_t, so we might end
+ * up fuzzing a completely different MemoryRegion/Device. Therefore, check
+ * that the address here is within the PIO space limits.
+ */
+
+bool success = get_io_address(result, get_system_io(), index,

[PATCH 05/12] fuzz: Declare DMA Read callback function

2020-07-22 Thread Alexander Bulekov

This patch declares the fuzz_dma_read_cb function and uses the
preprocessor and linker(weak symbols) to handle these cases:

When we build softmmu/all with --enable-fuzzing, there should be no
strong symbol defined for fuzz_dma_read_cb, and we link against a weak
stub function.

When we build softmmu/fuzz with --enable-fuzzing, we link agains the
strong symbol in general_fuzz.c

When we build softmmu/all without --enable-fuzzing, fuzz_dma_read_cb is
an empty, inlined function. As long as we don't call any other functions
when building the arguments, there should be no overhead.

Signed-off-by: Alexander Bulekov 
---
 include/exec/memory.h | 15 +++
 softmmu/memory.c  | 13 +
 2 files changed, 28 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 307e527835..2ec3b597f1 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -47,6 +47,21 @@
 OBJECT_GET_CLASS(IOMMUMemoryRegionClass, (obj), \
  TYPE_IOMMU_MEMORY_REGION)
 
+#ifdef CONFIG_FUZZ
+void fuzz_dma_read_cb(size_t addr,
+  size_t len,
+  MemoryRegion *mr,
+  bool is_write);
+#else
+static inline void fuzz_dma_read_cb(size_t addr,
+size_t len,
+MemoryRegion *mr,
+bool is_write)
+{
+/* Do Nothing */
+}
+#endif
+
 extern bool global_dirty_log;
 
 typedef struct MemoryRegionOps MemoryRegionOps;
diff --git a/softmmu/memory.c b/softmmu/memory.c
index af25987518..b0c2cf2535 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -3223,6 +3223,19 @@ void memory_region_init_rom_device(MemoryRegion *mr,
 vmstate_register_ram(mr, owner_dev);
 }
 
+/*
+ * Support softmmu builds with CONFIG_FUZZ using a weak symbol and a stub for
+ * the fuzz_dma_read_cb callback
+ */
+#ifdef CONFIG_FUZZ
+void __attribute__((weak)) fuzz_dma_read_cb(size_t addr,
+  size_t len,
+  MemoryRegion *mr,
+  bool is_write)
+{
+}
+#endif
+
 static const TypeInfo memory_region_info = {
 .parent = TYPE_OBJECT,
 .name   = TYPE_MEMORY_REGION,
-- 
2.27.0

[PATCH 01/12] fuzz: Change the way we write qtest log to stderr

2020-07-22 Thread Alexander Bulekov

Telling QTest to log to /dev/fd/2, essentially results in dup(2). This
is fine, if other code isn't logging to stderr. Otherwise, the order of
the logs is mixed due to buffering issues, since two file-descriptors
are used to write to the same file. We can avoid this, since just
specifying "-qtest" sets the log fd to stderr. If we want to disable
qtest logs, we can just add -qtest-log none.

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/fuzz.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/fuzz/fuzz.c b/tests/qtest/fuzz/fuzz.c
index 031594a686..8234b68754 100644
--- a/tests/qtest/fuzz/fuzz.c
+++ b/tests/qtest/fuzz/fuzz.c
@@ -202,9 +202,8 @@ int LLVMFuzzerInitialize(int *argc, char ***argv, char 
***envp)
 
 /* Run QEMU's softmmu main with the fuzz-target dependent arguments */
 GString *cmd_line = fuzz_target->get_init_cmdline(fuzz_target);
-g_string_append_printf(cmd_line,
-   " -qtest /dev/null -qtest-log %s",
-   getenv("QTEST_LOG") ? "/dev/fd/2" : "/dev/null");
+g_string_append_printf(cmd_line, " %s -qtest /dev/null ",
+   getenv("QTEST_LOG") ? "" : "-qtest-log none");
 
 /* Split the runcmd into an argv and argc */
 wordexp_t result;
-- 
2.27.0

Re: [PATCH v3 1/8] target/arm: Move start-powered-off property to generic CPUState

2020-07-22 Thread Thiago Jung Bauermann



David Gibson  writes:

> On Wed, Jul 22, 2020 at 11:56:50PM -0300, Thiago Jung Bauermann wrote:
>> There are other platforms which also have CPUs that start powered off, so
>> generalize the start-powered-off property so that it can be used by them.
>> 
>> Note that ARMv7MState also has a property of the same name but this patch
>> doesn't change it because that class isn't a subclass of CPUState so it
>> wouldn't be a trivial change.
>> 
>> This change should not cause any change in behavior.
>> 
>> Suggested-by: Eduardo Habkost 
>> Reviewed-by: Philippe Mathieu-Daudé 
>> Signed-off-by: Thiago Jung Bauermann 
>
> Reviewed-by: David Gibson 

Thanks! Sory about the extra work.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

Re: [PATCH v3 3/3] target/riscv: Fix the translation of physical address

2020-07-22 Thread Zong Li

On Wed, Jul 22, 2020 at 5:08 PM Alexander Richardson
 wrote:
>
> On Tue, 21 Jul 2020 at 13:43, Zong Li  wrote:
> >
> > The real physical address should add the 12 bits page offset. It also
> > causes the PMP wrong checking due to the minimum granularity of PMP is
> > 4 byte, but we always get the physical address which is 4KB alignment,
> > that means, we always use the start address of the page to check PMP for
> > all addresses which in the same page.
> >
> > Signed-off-by: Zong Li 
> > ---
> >  target/riscv/cpu_helper.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> > index 75d2ae3434..08b069f0c9 100644
> > --- a/target/riscv/cpu_helper.c
> > +++ b/target/riscv/cpu_helper.c
> > @@ -543,7 +543,8 @@ restart:
> >  /* for superpage mappings, make a fake leaf PTE for the TLB's
> > benefit. */
> >  target_ulong vpn = addr >> PGSHIFT;
> > -*physical = (ppn | (vpn & ((1L << ptshift) - 1))) << PGSHIFT;
> > +*physical = ((ppn | (vpn & ((1L << ptshift) - 1))) << PGSHIFT) 
> > |
> > +(addr & ~TARGET_PAGE_MASK);
> >
> >  /* set permissions on the TLB entry */
> >  if ((pte & PTE_R) || ((pte & PTE_X) && mxr)) {
> > --
> > 2.27.0
>
> I made the same change for our CHERI fork a few months ago but forgot
> to send the patch upstream (despite marking the commit as a candidate
> for upstreaming). Sorry about the duplicated debugging work!
> (https://github.com/CTSRD-CHERI/qemu/commit/61c8e3f2c0fd4965ec3f316146d1751fae673c12)

No, problem.

Re: [PATCH v3 2/3] target/riscv/pmp.c: Fix the index offset on RV64

2020-07-22 Thread Zong Li

On Wed, Jul 22, 2020 at 12:58 PM Bin Meng  wrote:
>
> Hi Zong,
>
> On Tue, Jul 21, 2020 at 8:41 PM Zong Li  wrote:
> >
> > On RV64, the reg_index is 2 (pmpcfg2 CSR) after the seventh pmp
> > entry, it is not 1 (pmpcfg1 CSR) like RV32. In the original
> > implementation, the second parameter of pmp_write_cfg is
> > "reg_index * sizeof(target_ulong)", and we get the the result
> > which is started from 16 if reg_index is 2, but we expect that
> > it should be started from 8. Separate the implementation for
> > RV32 and RV64 respectively.
> >
> > Signed-off-by: Zong Li 
> >
> > Changed in v3:
> >  - Refine the implementation. Suggested by Bin Meng.
> >
> > Changed in v2:
> >  - Move out the shifting operation from loop. Suggested by Bin Meng.
>
> As I mentioned previously, these changelog should go after --- below.
> It should not appear in the commit message.
>

OK, remove it in the next version.

> > ---
> >  target/riscv/pmp.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
> > index 2a2b9f5363..f2d50bace5 100644
> > --- a/target/riscv/pmp.c
> > +++ b/target/riscv/pmp.c
> > @@ -318,6 +318,10 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t 
> > reg_index,
> >  return;
> >  }
> >
> > +#if defined(TARGET_RISCV64)
> > +reg_index >>= 1;
> > +#endif
> > +
> >  for (i = 0; i < sizeof(target_ulong); i++) {
> >  cfg_val = (val >> 8 * i)  & 0xff;
> >  pmp_write_cfg(env, (reg_index * sizeof(target_ulong)) + i,
> > @@ -335,6 +339,10 @@ target_ulong pmpcfg_csr_read(CPURISCVState *env, 
> > uint32_t reg_index)
> >  target_ulong cfg_val = 0;
> >  target_ulong val = 0;
> >
> > +#if defined(TARGET_RISCV64)
> > +reg_index >>= 1;
> > +#endif
>
> We should also move the following:
>
> trace_pmpcfg_csr_read(env->mhartid, reg_index, cfg_val);
>
> before shifting reg_index. Otherwise it traces the wrong pmpcfg CSR read.

Yes, thanks for the reminding, Fix it in the next version.

>
> > +
> >  for (i = 0; i < sizeof(target_ulong); i++) {
> >  val = pmp_read_cfg(env, (reg_index * sizeof(target_ulong)) + i);
> >  cfg_val |= (val << (i * 8));
> > --
>
> Regards,
> Bin

Re: [PATCH v2 1/9] target/arm: Move start-powered-off property to generic CPUState

2020-07-22 Thread Thiago Jung Bauermann



Hello David,

David Gibson  writes:

> On Wed, Jul 22, 2020 at 12:50:08AM -0300, Thiago Jung Bauermann wrote:
>> There are other platforms which also have CPUs that start powered off, so
>> generalize the start-powered-off property so that it can be used by them.
>> 
>> Note that ARMv7MState also has a property of the same name but this patch
>> doesn't change it because that class isn't a subclass of CPUState so it
>> wouldn't be a trivial change.
>> 
>> This change should not cause any change in behavior.
>> 
>> Suggested-by: Eduardo Habkost 
>> Signed-off-by: Thiago Jung Bauermann 
>
> Reviewed-by: David Gibson 

Thank you very much for your review!

Unfortunately I apparently had a minor email mishap and only got your
reviews after I sent the v3 patches, so I wasn't able to put your
Reviewed-by's in them. Sorry about that.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

Re: [PATCH v3 1/8] target/arm: Move start-powered-off property to generic CPUState

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 11:56:50PM -0300, Thiago Jung Bauermann wrote:
> There are other platforms which also have CPUs that start powered off, so
> generalize the start-powered-off property so that it can be used by them.
> 
> Note that ARMv7MState also has a property of the same name but this patch
> doesn't change it because that class isn't a subclass of CPUState so it
> wouldn't be a trivial change.
> 
> This change should not cause any change in behavior.
> 
> Suggested-by: Eduardo Habkost 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: David Gibson 

> ---
>  exec.c| 1 +
>  include/hw/core/cpu.h | 4 
>  target/arm/cpu.c  | 5 ++---
>  target/arm/cpu.h  | 3 ---
>  target/arm/kvm32.c| 2 +-
>  target/arm/kvm64.c| 2 +-
>  6 files changed, 9 insertions(+), 8 deletions(-)
> 
> NB: I was only able to test that this patch builds. I wasn't able to
> run it.
> 
> diff --git a/exec.c b/exec.c
> index 6f381f98e2..82e82fab09 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -899,6 +899,7 @@ Property cpu_common_props[] = {
>  DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
>   MemoryRegion *),
>  #endif
> +DEFINE_PROP_BOOL("start-powered-off", CPUState, start_powered_off, 
> false),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 8f145733ce..9fc2696db5 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -374,6 +374,10 @@ struct CPUState {
>  bool created;
>  bool stop;
>  bool stopped;
> +
> +/* Should CPU start in powered-off state? */
> +bool start_powered_off;
> +
>  bool unplug;
>  bool crash_occurred;
>  bool exit_request;
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 111579554f..ec65c7653f 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -174,8 +174,8 @@ static void arm_cpu_reset(DeviceState *dev)
>  env->vfp.xregs[ARM_VFP_MVFR1] = cpu->isar.mvfr1;
>  env->vfp.xregs[ARM_VFP_MVFR2] = cpu->isar.mvfr2;
>  
> -cpu->power_state = cpu->start_powered_off ? PSCI_OFF : PSCI_ON;
> -s->halted = cpu->start_powered_off;
> +cpu->power_state = s->start_powered_off ? PSCI_OFF : PSCI_ON;
> +s->halted = s->start_powered_off;
>  
>  if (arm_feature(env, ARM_FEATURE_IWMMXT)) {
>  env->iwmmxt.cregs[ARM_IWMMXT_wCID] = 0x69051000 | 'Q';
> @@ -2182,7 +2182,6 @@ static const ARMCPUInfo arm_cpus[] = {
>  };
>  
>  static Property arm_cpu_properties[] = {
> -DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
>  DEFINE_PROP_UINT32("psci-conduit", ARMCPU, psci_conduit, 0),
>  DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
>  DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 9e8ed423ea..a925d26996 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -810,9 +810,6 @@ struct ARMCPU {
>   */
>  uint32_t psci_version;
>  
> -/* Should CPU start in PSCI powered-off state? */
> -bool start_powered_off;
> -
>  /* Current power state, access guarded by BQL */
>  ARMPSCIState power_state;
>  
> diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
> index 0af46b41c8..1f2b8f8b7a 100644
> --- a/target/arm/kvm32.c
> +++ b/target/arm/kvm32.c
> @@ -218,7 +218,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  
>  /* Determine init features for this CPU */
>  memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
> -if (cpu->start_powered_off) {
> +if (cs->start_powered_off) {
>  cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_POWER_OFF;
>  }
>  if (kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PSCI_0_2)) {
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 1169237905..f8a6d905fb 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -775,7 +775,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  
>  /* Determine init features for this CPU */
>  memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
> -if (cpu->start_powered_off) {
> +if (cs->start_powered_off) {
>  cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_POWER_OFF;
>  }
>  if (kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PSCI_0_2)) {
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v3 5/8] mips/cps: Use start-powered-off CPUState property

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 11:56:54PM -0300, Thiago Jung Bauermann wrote:
> Instead of setting CPUState::halted to 1 in main_cpu_reset(), use the
> start-powered-off property which makes cpu_common_reset() initialize it
> to 1 in common code.
> 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: David Gibson 

> ---
>  hw/mips/cps.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> NB: I was only able to test that this patch builds. I wasn't able to
> run it.
> 
> diff --git a/hw/mips/cps.c b/hw/mips/cps.c
> index 615e1a1ad2..d5b6c78019 100644
> --- a/hw/mips/cps.c
> +++ b/hw/mips/cps.c
> @@ -52,9 +52,6 @@ static void main_cpu_reset(void *opaque)
>  CPUState *cs = CPU(cpu);
>  
>  cpu_reset(cs);
> -
> -/* All VPs are halted on reset. Leave powering up to CPC. */
> -cs->halted = 1;
>  }
>  
>  static bool cpu_mips_itu_supported(CPUMIPSState *env)
> @@ -89,6 +86,9 @@ static void mips_cps_realize(DeviceState *dev, Error **errp)
>  env->itc_tag = mips_itu_get_tag_region(>itu);
>  env->itu = >itu;
>  }
> +/* All VPs are halted on reset. Leave powering up to CPC. */
> +object_property_set_bool(OBJECT(cpu), "start-powered-off", true,
> + _abort);
>  qemu_register_reset(main_cpu_reset, cpu);
>  }
>  
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v3 7/8] sparc/sun4m: Use start-powered-off CPUState property

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 11:56:56PM -0300, Thiago Jung Bauermann wrote:
> Instead of setting CPUState::halted to 1 in secondary_cpu_reset(), use the
> start-powered-off property which makes cpu_common_reset() initialize it
> to 1 in common code.
> 
> This makes secondary_cpu_reset() unnecessary, so remove it.
> 
> Also remove setting of cs->halted from cpu_devinit(), which seems out of
> place when compared to similar code in other architectures (e.g.,
> ppce500_init() in hw/ppc/e500.c).
> 
> Suggested-by: Philippe Mathieu-Daudé 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: David Gibson 

> ---
>  hw/sparc/sun4m.c | 17 ++---
>  1 file changed, 2 insertions(+), 15 deletions(-)
> 
> NB: I was only able to test that this patch builds. I wasn't able to
> run it.
> 
> diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
> index f1d92df781..fd74e516bb 100644
> --- a/hw/sparc/sun4m.c
> +++ b/hw/sparc/sun4m.c
> @@ -218,15 +218,6 @@ static void dummy_cpu_set_irq(void *opaque, int irq, int 
> level)
>  {
>  }
>  
> -static void secondary_cpu_reset(void *opaque)
> -{
> -SPARCCPU *cpu = opaque;
> -CPUState *cs = CPU(cpu);
> -
> -cpu_reset(cs);
> -cs->halted = 1;
> -}
> -
>  static void cpu_halt_signal(void *opaque, int irq, int level)
>  {
>  if (level && current_cpu) {
> @@ -810,7 +801,6 @@ static const TypeInfo ram_info = {
>  static void cpu_devinit(const char *cpu_type, unsigned int id,
>  uint64_t prom_addr, qemu_irq **cpu_irqs)
>  {
> -CPUState *cs;
>  SPARCCPU *cpu;
>  CPUSPARCState *env;
>  
> @@ -818,11 +808,8 @@ static void cpu_devinit(const char *cpu_type, unsigned 
> int id,
>  env = >env;
>  
>  cpu_sparc_set_id(env, id);
> -if (id != 0) {
> -qemu_register_reset(secondary_cpu_reset, cpu);
> -cs = CPU(cpu);
> -cs->halted = 1;
> -}
> +object_property_set_bool(OBJECT(cpu), "start-powered-off", id != 0,
> + _abort);
>  *cpu_irqs = qemu_allocate_irqs(cpu_set_irq, cpu, MAX_PILS);
>  env->prom_addr = prom_addr;
>  }
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v3 6/8] sparc/sun4m: Remove main_cpu_reset()

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 11:56:55PM -0300, Thiago Jung Bauermann wrote:
> We rely on cpu_common_reset() to set cs->halted to 0, so main_cpu_reset()
> is pointless.
> 
> Suggested-by: Philippe Mathieu-Daudé 
> Signed-off-by: Thiago Jung Bauermann 

Revieed-by: David Gibson 

> ---
>  hw/sparc/sun4m.c | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 
> NB: I was only able to test that this patch builds. I wasn't able to
> run it.
> 
> diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
> index 9be930415f..f1d92df781 100644
> --- a/hw/sparc/sun4m.c
> +++ b/hw/sparc/sun4m.c
> @@ -218,15 +218,6 @@ static void dummy_cpu_set_irq(void *opaque, int irq, int 
> level)
>  {
>  }
>  
> -static void main_cpu_reset(void *opaque)
> -{
> -SPARCCPU *cpu = opaque;
> -CPUState *cs = CPU(cpu);
> -
> -cpu_reset(cs);
> -cs->halted = 0;
> -}
> -
>  static void secondary_cpu_reset(void *opaque)
>  {
>  SPARCCPU *cpu = opaque;
> @@ -827,9 +818,7 @@ static void cpu_devinit(const char *cpu_type, unsigned 
> int id,
>  env = >env;
>  
>  cpu_sparc_set_id(env, id);
> -if (id == 0) {
> -qemu_register_reset(main_cpu_reset, cpu);
> -} else {
> +if (id != 0) {
>  qemu_register_reset(secondary_cpu_reset, cpu);
>  cs = CPU(cpu);
>  cs->halted = 1;
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v3 3/8] ppc/spapr: Use start-powered-off CPUState property

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 11:56:52PM -0300, Thiago Jung Bauermann wrote:
65;6003;1c> PowerPC sPAPR CPUs start in the halted state, and spapr_reset_vcpu()
> attempts to implement this by setting CPUState::halted to 1. But that's too
> late for the case of hotplugged CPUs in a machine configure with 2 or more
> threads per core.
> 
> By then, other parts of QEMU have already caused the vCPU to run in an
> unitialized state a couple of times. For example, ppc_cpu_reset() calls
> ppc_tlb_invalidate_all(), which ends up calling async_run_on_cpu(). This
> kicks the new vCPU while it has CPUState::halted = 0, causing QEMU to issue
> a KVM_RUN ioctl on the new vCPU before the guest is able to make the
> start-cpu RTAS call to initialize its register state.
> 
> This problem doesn't seem to cause visible issues for regular guests, but
> on a secure guest running under the Ultravisor it does. The Ultravisor
> relies on being able to snoop on the start-cpu RTAS call to map vCPUs to
> guests, and this issue causes it to see a stray vCPU that doesn't belong to
> any guest.
> 
> Fix by setting the start-powered-off CPUState property in
> spapr_create_vcpu(), which makes cpu_common_reset() initialize
> CPUState::halted to 1 at an earlier moment.
> 
> Suggested-by: Eduardo Habkost 
> Signed-off-by: Thiago Jung Bauermann 

Acked-by: David Gibson 

> ---
>  hw/ppc/spapr_cpu_core.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> NB: Tested on ppc64le pseries KVM guest with two threads per core. 
> Hot-plugging additional cores doesn't cause the bug described above
> anymore.
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index c4f47dcc04..2125fdac34 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -36,11 +36,6 @@ static void spapr_reset_vcpu(PowerPCCPU *cpu)
>  
>  cpu_reset(cs);
>  
> -/* All CPUs start halted.  CPU0 is unhalted from the machine level
> - * reset code and the rest are explicitly started up by the guest
> - * using an RTAS call */
> -cs->halted = 1;
> -
>  env->spr[SPR_HIOR] = 0;
>  
>  lpcr = env->spr[SPR_LPCR];
> @@ -274,6 +269,11 @@ static PowerPCCPU *spapr_create_vcpu(SpaprCpuCore *sc, 
> int i, Error **errp)
>  
>  cs = CPU(obj);
>  cpu = POWERPC_CPU(obj);
> +/*
> + * All CPUs start halted. CPU0 is unhalted from the machine level reset 
> code
> + * and the rest are explicitly started up by the guest using an RTAS 
> call.
> + */
> +cs->start_powered_off = true;
>  cs->cpu_index = cc->core_id + i;
>  spapr_set_vcpu_id(cpu, cs->cpu_index, _err);
>  if (local_err) {
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v3 4/8] ppc/e500: Use start-powered-off CPUState property

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 11:56:53PM -0300, Thiago Jung Bauermann wrote:
> Instead of setting CPUState::halted to 1 in ppce500_cpu_reset_sec(), use
> the start-powered-off property which makes cpu_common_reset() initialize it
> to 1 in common code.
> 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Thiago Jung Bauermann 

Acked-by: David Gibson 

> ---
>  hw/ppc/e500.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> NB: I was only able to test that this patch builds. I wasn't able to
> run it.
> 
> diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
> index ab9884e315..dda71bc05d 100644
> --- a/hw/ppc/e500.c
> +++ b/hw/ppc/e500.c
> @@ -704,9 +704,6 @@ static void ppce500_cpu_reset_sec(void *opaque)
>  
>  cpu_reset(cs);
>  
> -/* Secondary CPU starts in halted state for now. Needs to change when
> -   implementing non-kernel boot. */
> -cs->halted = 1;
>  cs->exception_index = EXCP_HLT;
>  }
>  
> @@ -897,6 +894,13 @@ void ppce500_init(MachineState *machine)
>  } else {
>  /* Secondary CPUs */
>  qemu_register_reset(ppce500_cpu_reset_sec, cpu);
> +
> +/*
> + * Secondary CPU starts in halted state for now. Needs to change
> + * when implementing non-kernel boot.
> + */
> +object_property_set_bool(OBJECT(cs), "start-powered-off", true,
> + _abort);
>  }
>  }
>  
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v3 2/8] target/arm: Move setting of CPU halted state to generic code

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 11:56:51PM -0300, Thiago Jung Bauermann wrote:
> This change is in a separate patch because it's not so obvious that it
> won't cause a regression.
> 
> Suggested-by: Eduardo Habkost 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: David Gibson 

> ---
>  hw/core/cpu.c| 2 +-
>  target/arm/cpu.c | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> NB: I wasn't able to run this patch on an ARM machine. I did run it on
> a ppc64le pseries KVM guest.
> 
> diff --git a/hw/core/cpu.c b/hw/core/cpu.c
> index 594441a150..71bb7859f1 100644
> --- a/hw/core/cpu.c
> +++ b/hw/core/cpu.c
> @@ -258,7 +258,7 @@ static void cpu_common_reset(DeviceState *dev)
>  }
>  
>  cpu->interrupt_request = 0;
> -cpu->halted = 0;
> +cpu->halted = cpu->start_powered_off;
>  cpu->mem_io_pc = 0;
>  cpu->icount_extra = 0;
>  atomic_set(>icount_decr_ptr->u32, 0);
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index ec65c7653f..b6c65e4df6 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -175,7 +175,6 @@ static void arm_cpu_reset(DeviceState *dev)
>  env->vfp.xregs[ARM_VFP_MVFR2] = cpu->isar.mvfr2;
>  
>  cpu->power_state = s->start_powered_off ? PSCI_OFF : PSCI_ON;
> -s->halted = s->start_powered_off;
>  
>  if (arm_feature(env, ARM_FEATURE_IWMMXT)) {
>  env->iwmmxt.cregs[ARM_IWMMXT_wCID] = 0x69051000 | 'Q';
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

[PATCH v3 7/8] sparc/sun4m: Use start-powered-off CPUState property

2020-07-22 Thread Thiago Jung Bauermann

Instead of setting CPUState::halted to 1 in secondary_cpu_reset(), use the
start-powered-off property which makes cpu_common_reset() initialize it
to 1 in common code.

This makes secondary_cpu_reset() unnecessary, so remove it.

Also remove setting of cs->halted from cpu_devinit(), which seems out of
place when compared to similar code in other architectures (e.g.,
ppce500_init() in hw/ppc/e500.c).

Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Thiago Jung Bauermann 
---
 hw/sparc/sun4m.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

NB: I was only able to test that this patch builds. I wasn't able to
run it.

diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index f1d92df781..fd74e516bb 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -218,15 +218,6 @@ static void dummy_cpu_set_irq(void *opaque, int irq, int 
level)
 {
 }
 
-static void secondary_cpu_reset(void *opaque)
-{
-SPARCCPU *cpu = opaque;
-CPUState *cs = CPU(cpu);
-
-cpu_reset(cs);
-cs->halted = 1;
-}
-
 static void cpu_halt_signal(void *opaque, int irq, int level)
 {
 if (level && current_cpu) {
@@ -810,7 +801,6 @@ static const TypeInfo ram_info = {
 static void cpu_devinit(const char *cpu_type, unsigned int id,
 uint64_t prom_addr, qemu_irq **cpu_irqs)
 {
-CPUState *cs;
 SPARCCPU *cpu;
 CPUSPARCState *env;
 
@@ -818,11 +808,8 @@ static void cpu_devinit(const char *cpu_type, unsigned int 
id,
 env = >env;
 
 cpu_sparc_set_id(env, id);
-if (id != 0) {
-qemu_register_reset(secondary_cpu_reset, cpu);
-cs = CPU(cpu);
-cs->halted = 1;
-}
+object_property_set_bool(OBJECT(cpu), "start-powered-off", id != 0,
+ _abort);
 *cpu_irqs = qemu_allocate_irqs(cpu_set_irq, cpu, MAX_PILS);
 env->prom_addr = prom_addr;
 }

[PATCH v3 6/8] sparc/sun4m: Remove main_cpu_reset()

2020-07-22 Thread Thiago Jung Bauermann

We rely on cpu_common_reset() to set cs->halted to 0, so main_cpu_reset()
is pointless.

Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Thiago Jung Bauermann 
---
 hw/sparc/sun4m.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

NB: I was only able to test that this patch builds. I wasn't able to
run it.

diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 9be930415f..f1d92df781 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -218,15 +218,6 @@ static void dummy_cpu_set_irq(void *opaque, int irq, int 
level)
 {
 }
 
-static void main_cpu_reset(void *opaque)
-{
-SPARCCPU *cpu = opaque;
-CPUState *cs = CPU(cpu);
-
-cpu_reset(cs);
-cs->halted = 0;
-}
-
 static void secondary_cpu_reset(void *opaque)
 {
 SPARCCPU *cpu = opaque;
@@ -827,9 +818,7 @@ static void cpu_devinit(const char *cpu_type, unsigned int 
id,
 env = >env;
 
 cpu_sparc_set_id(env, id);
-if (id == 0) {
-qemu_register_reset(main_cpu_reset, cpu);
-} else {
+if (id != 0) {
 qemu_register_reset(secondary_cpu_reset, cpu);
 cs = CPU(cpu);
 cs->halted = 1;

[RFC PATCH v3 8/8] target/s390x: Use start-powered-off CPUState property

2020-07-22 Thread Thiago Jung Bauermann

Instead of setting CPUState::halted to 1 in s390_cpu_initfn(), use the
start-powered-off property which makes cpu_common_reset() initialize it
to 1 in common code.

Note that this changes behavior by setting cs->halted to 1 on reset, which
didn't happen before.

Signed-off-by: Thiago Jung Bauermann 
---
 target/s390x/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

NB: I was only able to test that this patch builds. I wasn't able to
run it.

diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
index 08eb674d22..73d7d6007e 100644
--- a/target/s390x/cpu.c
+++ b/target/s390x/cpu.c
@@ -291,7 +291,7 @@ static void s390_cpu_initfn(Object *obj)
 S390CPU *cpu = S390_CPU(obj);
 
 cpu_set_cpustate_pointers(cpu);
-cs->halted = 1;
+cs->start_powered_off = true;
 cs->exception_index = EXCP_HLT;
 #if !defined(CONFIG_USER_ONLY)
 object_property_add(obj, "crash-information", "GuestPanicInformation",

[PATCH v3 3/8] ppc/spapr: Use start-powered-off CPUState property

2020-07-22 Thread Thiago Jung Bauermann

PowerPC sPAPR CPUs start in the halted state, and spapr_reset_vcpu()
attempts to implement this by setting CPUState::halted to 1. But that's too
late for the case of hotplugged CPUs in a machine configure with 2 or more
threads per core.

By then, other parts of QEMU have already caused the vCPU to run in an
unitialized state a couple of times. For example, ppc_cpu_reset() calls
ppc_tlb_invalidate_all(), which ends up calling async_run_on_cpu(). This
kicks the new vCPU while it has CPUState::halted = 0, causing QEMU to issue
a KVM_RUN ioctl on the new vCPU before the guest is able to make the
start-cpu RTAS call to initialize its register state.

This problem doesn't seem to cause visible issues for regular guests, but
on a secure guest running under the Ultravisor it does. The Ultravisor
relies on being able to snoop on the start-cpu RTAS call to map vCPUs to
guests, and this issue causes it to see a stray vCPU that doesn't belong to
any guest.

Fix by setting the start-powered-off CPUState property in
spapr_create_vcpu(), which makes cpu_common_reset() initialize
CPUState::halted to 1 at an earlier moment.

Suggested-by: Eduardo Habkost 
Signed-off-by: Thiago Jung Bauermann 
---
 hw/ppc/spapr_cpu_core.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

NB: Tested on ppc64le pseries KVM guest with two threads per core. 
Hot-plugging additional cores doesn't cause the bug described above
anymore.

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index c4f47dcc04..2125fdac34 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -36,11 +36,6 @@ static void spapr_reset_vcpu(PowerPCCPU *cpu)
 
 cpu_reset(cs);
 
-/* All CPUs start halted.  CPU0 is unhalted from the machine level
- * reset code and the rest are explicitly started up by the guest
- * using an RTAS call */
-cs->halted = 1;
-
 env->spr[SPR_HIOR] = 0;
 
 lpcr = env->spr[SPR_LPCR];
@@ -274,6 +269,11 @@ static PowerPCCPU *spapr_create_vcpu(SpaprCpuCore *sc, int 
i, Error **errp)
 
 cs = CPU(obj);
 cpu = POWERPC_CPU(obj);
+/*
+ * All CPUs start halted. CPU0 is unhalted from the machine level reset 
code
+ * and the rest are explicitly started up by the guest using an RTAS call.
+ */
+cs->start_powered_off = true;
 cs->cpu_index = cc->core_id + i;
 spapr_set_vcpu_id(cpu, cs->cpu_index, _err);
 if (local_err) {

[PATCH v3 4/8] ppc/e500: Use start-powered-off CPUState property

2020-07-22 Thread Thiago Jung Bauermann

Instead of setting CPUState::halted to 1 in ppce500_cpu_reset_sec(), use
the start-powered-off property which makes cpu_common_reset() initialize it
to 1 in common code.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Thiago Jung Bauermann 
---
 hw/ppc/e500.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

NB: I was only able to test that this patch builds. I wasn't able to
run it.

diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index ab9884e315..dda71bc05d 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -704,9 +704,6 @@ static void ppce500_cpu_reset_sec(void *opaque)
 
 cpu_reset(cs);
 
-/* Secondary CPU starts in halted state for now. Needs to change when
-   implementing non-kernel boot. */
-cs->halted = 1;
 cs->exception_index = EXCP_HLT;
 }
 
@@ -897,6 +894,13 @@ void ppce500_init(MachineState *machine)
 } else {
 /* Secondary CPUs */
 qemu_register_reset(ppce500_cpu_reset_sec, cpu);
+
+/*
+ * Secondary CPU starts in halted state for now. Needs to change
+ * when implementing non-kernel boot.
+ */
+object_property_set_bool(OBJECT(cs), "start-powered-off", true,
+ _abort);
 }
 }

[PATCH v3 0/8] Generalize start-powered-off property from ARM

2020-07-22 Thread Thiago Jung Bauermann

The ARM code has a start-powered-off property in ARMCPU, which is a
subclass of CPUState. This property causes arm_cpu_reset() to set
CPUState::halted to 1, signalling that the CPU should start in a halted
state. Other architectures also have code which aim to achieve the same
effect, but without using a property.

The ppc/spapr version has a bug where QEMU does a KVM_RUN on the vcpu
before cs->halted is set to 1, causing the vcpu to run while it's still in
an unitialized state (more details in patch 3).

Peter Maydell mentioned the ARM start-powered-off property and
Eduardo Habkost suggested making it generic, so this patch series does
that, for all cases which I was able to find via grep in the code.

The only problem is that I was only able to test these changes on a ppc64le
pseries KVM guest, so except for patches 2 and 3, all others are only
build-tested. Also, my grasp of QOM lifecycle is basically non-existant so
please be aware of that when reviewing this series.

The last patch may be wrong, as pointed out by Eduardo, so I marked it as
RFC. It may make sense to drop it.

Applies cleanly on yesterday's master.

Changes since v2:

General:
- Added Philippe's Reviewed-by to some of the patches.

Patch "ppc/spapr: Use start-powered-off CPUState property"
- Set the CPUState::start_powered_off variable directly rather than using
  object_property_set_bool(). Suggested by Philippe.

Patch "sparc/sun4m: Remove main_cpu_reset()"
- New patch. Suggested by Philippe.

Patch "sparc/sun4m: Use start-powered-off CPUState property"
- Remove secondary_cpu_reset(). Suggested by Philippe.
- Remove setting of `cs->halted = 1` from cpu_devinit(). Suggested by Philippe.

Patch "Don't set CPUState::halted in cpu_devinit()"
- Squashed into previous patch. Suggested by Philippe.

Patch "sparc/sun4m: Use one cpu_reset() function for main and secondary CPUs"
- Dropped.

Patch "target/s390x: Use start-powered-off CPUState property"
- Set the CPUState::start_powered_off variable directly rather than using
  object_property_set_bool(). Suggested by Philippe.
- Mention in the commit message Eduardo's observation that before this
  patch, the code didn't set cs->halted on reset.

Thiago Jung Bauermann (8):
  target/arm: Move start-powered-off property to generic CPUState
  target/arm: Move setting of CPU halted state to generic code
  ppc/spapr: Use start-powered-off CPUState property
  ppc/e500: Use start-powered-off CPUState property
  mips/cps: Use start-powered-off CPUState property
  sparc/sun4m: Remove main_cpu_reset()
  sparc/sun4m: Use start-powered-off CPUState property
  target/s390x: Use start-powered-off CPUState property

 exec.c  |  1 +
 hw/core/cpu.c   |  2 +-
 hw/mips/cps.c   |  6 +++---
 hw/ppc/e500.c   | 10 +++---
 hw/ppc/spapr_cpu_core.c | 10 +-
 hw/sparc/sun4m.c| 28 ++--
 include/hw/core/cpu.h   |  4 
 target/arm/cpu.c|  4 +---
 target/arm/cpu.h|  3 ---
 target/arm/kvm32.c  |  2 +-
 target/arm/kvm64.c  |  2 +-
 target/s390x/cpu.c  |  2 +-
 12 files changed, 27 insertions(+), 47 deletions(-)

[PATCH v3 5/8] mips/cps: Use start-powered-off CPUState property

2020-07-22 Thread Thiago Jung Bauermann

Instead of setting CPUState::halted to 1 in main_cpu_reset(), use the
start-powered-off property which makes cpu_common_reset() initialize it
to 1 in common code.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Thiago Jung Bauermann 
---
 hw/mips/cps.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

NB: I was only able to test that this patch builds. I wasn't able to
run it.

diff --git a/hw/mips/cps.c b/hw/mips/cps.c
index 615e1a1ad2..d5b6c78019 100644
--- a/hw/mips/cps.c
+++ b/hw/mips/cps.c
@@ -52,9 +52,6 @@ static void main_cpu_reset(void *opaque)
 CPUState *cs = CPU(cpu);
 
 cpu_reset(cs);
-
-/* All VPs are halted on reset. Leave powering up to CPC. */
-cs->halted = 1;
 }
 
 static bool cpu_mips_itu_supported(CPUMIPSState *env)
@@ -89,6 +86,9 @@ static void mips_cps_realize(DeviceState *dev, Error **errp)
 env->itc_tag = mips_itu_get_tag_region(>itu);
 env->itu = >itu;
 }
+/* All VPs are halted on reset. Leave powering up to CPC. */
+object_property_set_bool(OBJECT(cpu), "start-powered-off", true,
+ _abort);
 qemu_register_reset(main_cpu_reset, cpu);
 }

[PATCH v3 1/8] target/arm: Move start-powered-off property to generic CPUState

2020-07-22 Thread Thiago Jung Bauermann

There are other platforms which also have CPUs that start powered off, so
generalize the start-powered-off property so that it can be used by them.

Note that ARMv7MState also has a property of the same name but this patch
doesn't change it because that class isn't a subclass of CPUState so it
wouldn't be a trivial change.

This change should not cause any change in behavior.

Suggested-by: Eduardo Habkost 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Thiago Jung Bauermann 
---
 exec.c| 1 +
 include/hw/core/cpu.h | 4 
 target/arm/cpu.c  | 5 ++---
 target/arm/cpu.h  | 3 ---
 target/arm/kvm32.c| 2 +-
 target/arm/kvm64.c| 2 +-
 6 files changed, 9 insertions(+), 8 deletions(-)

NB: I was only able to test that this patch builds. I wasn't able to
run it.

diff --git a/exec.c b/exec.c
index 6f381f98e2..82e82fab09 100644
--- a/exec.c
+++ b/exec.c
@@ -899,6 +899,7 @@ Property cpu_common_props[] = {
 DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
  MemoryRegion *),
 #endif
+DEFINE_PROP_BOOL("start-powered-off", CPUState, start_powered_off, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 8f145733ce..9fc2696db5 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -374,6 +374,10 @@ struct CPUState {
 bool created;
 bool stop;
 bool stopped;
+
+/* Should CPU start in powered-off state? */
+bool start_powered_off;
+
 bool unplug;
 bool crash_occurred;
 bool exit_request;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 111579554f..ec65c7653f 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -174,8 +174,8 @@ static void arm_cpu_reset(DeviceState *dev)
 env->vfp.xregs[ARM_VFP_MVFR1] = cpu->isar.mvfr1;
 env->vfp.xregs[ARM_VFP_MVFR2] = cpu->isar.mvfr2;
 
-cpu->power_state = cpu->start_powered_off ? PSCI_OFF : PSCI_ON;
-s->halted = cpu->start_powered_off;
+cpu->power_state = s->start_powered_off ? PSCI_OFF : PSCI_ON;
+s->halted = s->start_powered_off;
 
 if (arm_feature(env, ARM_FEATURE_IWMMXT)) {
 env->iwmmxt.cregs[ARM_IWMMXT_wCID] = 0x69051000 | 'Q';
@@ -2182,7 +2182,6 @@ static const ARMCPUInfo arm_cpus[] = {
 };
 
 static Property arm_cpu_properties[] = {
-DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
 DEFINE_PROP_UINT32("psci-conduit", ARMCPU, psci_conduit, 0),
 DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
 DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9e8ed423ea..a925d26996 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -810,9 +810,6 @@ struct ARMCPU {
  */
 uint32_t psci_version;
 
-/* Should CPU start in PSCI powered-off state? */
-bool start_powered_off;
-
 /* Current power state, access guarded by BQL */
 ARMPSCIState power_state;
 
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 0af46b41c8..1f2b8f8b7a 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -218,7 +218,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 /* Determine init features for this CPU */
 memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
-if (cpu->start_powered_off) {
+if (cs->start_powered_off) {
 cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_POWER_OFF;
 }
 if (kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PSCI_0_2)) {
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 1169237905..f8a6d905fb 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -775,7 +775,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 /* Determine init features for this CPU */
 memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
-if (cpu->start_powered_off) {
+if (cs->start_powered_off) {
 cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_POWER_OFF;
 }
 if (kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PSCI_0_2)) {

[PATCH v3 2/8] target/arm: Move setting of CPU halted state to generic code

2020-07-22 Thread Thiago Jung Bauermann

This change is in a separate patch because it's not so obvious that it
won't cause a regression.

Suggested-by: Eduardo Habkost 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Thiago Jung Bauermann 
---
 hw/core/cpu.c| 2 +-
 target/arm/cpu.c | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

NB: I wasn't able to run this patch on an ARM machine. I did run it on
a ppc64le pseries KVM guest.

diff --git a/hw/core/cpu.c b/hw/core/cpu.c
index 594441a150..71bb7859f1 100644
--- a/hw/core/cpu.c
+++ b/hw/core/cpu.c
@@ -258,7 +258,7 @@ static void cpu_common_reset(DeviceState *dev)
 }
 
 cpu->interrupt_request = 0;
-cpu->halted = 0;
+cpu->halted = cpu->start_powered_off;
 cpu->mem_io_pc = 0;
 cpu->icount_extra = 0;
 atomic_set(>icount_decr_ptr->u32, 0);
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index ec65c7653f..b6c65e4df6 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -175,7 +175,6 @@ static void arm_cpu_reset(DeviceState *dev)
 env->vfp.xregs[ARM_VFP_MVFR2] = cpu->isar.mvfr2;
 
 cpu->power_state = s->start_powered_off ? PSCI_OFF : PSCI_ON;
-s->halted = s->start_powered_off;
 
 if (arm_feature(env, ARM_FEATURE_IWMMXT)) {
 env->iwmmxt.cregs[ARM_IWMMXT_wCID] = 0x69051000 | 'Q';

Re: [PATCH 2/2] e1000e: make TX reentrant

2020-07-22 Thread Jason Wang




On 2020/7/22 下午8:53, Michael Tokarev wrote:

FWIW, this is not "making TX reentrant", it is about forbidding
reentrancy instead :)

/mjt



Indeed, I will rename the title.

Thanks

Re: [PATCH 2/2] e1000e: make TX reentrant

2020-07-22 Thread Jason Wang




On 2020/7/22 下午7:24, Li Qiang wrote:

Jason Wang  于2020年7月22日周三 下午4:58写道：

In loopback mode, e1000e RX can DMA into TX doorbell which requires
TX to be reentrant. This patch make e1000e's TX routine reentrant by
introducing a per device boolean for recording whether or not a TX
rountine is being called and return early.


Could we introduce a per-queue 'sending' variable just like the RX.
So we can do this in net core layer.



It's kind of not easy since TX routine is called before the packet can 
reach network queue.


Thanks




Thanks,
Li Qiang


Signed-off-by: Jason Wang 
---
  hw/net/e1000e_core.c | 8 
  hw/net/e1000e_core.h | 1 +
  2 files changed, 9 insertions(+)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index bcd186cac5..8126a644a5 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -923,6 +923,12 @@ e1000e_start_xmit(E1000ECore *core, const E1000E_TxRing 
*txr)
  return;
  }

+if (core->sending) {
+return;
+}
+
+core->sending = true;
+
  while (!e1000e_ring_empty(core, txi)) {
  base = e1000e_ring_head_descr(core, txi);

@@ -940,6 +946,8 @@ e1000e_start_xmit(E1000ECore *core, const E1000E_TxRing 
*txr)
  if (!ide || !e1000e_intrmgr_delay_tx_causes(core, )) {
  e1000e_set_interrupt_cause(core, cause);
  }
+
+core->sending = false;
  }

  static bool
diff --git a/hw/net/e1000e_core.h b/hw/net/e1000e_core.h
index aee32f7e48..4679c1761f 100644
--- a/hw/net/e1000e_core.h
+++ b/hw/net/e1000e_core.h
@@ -114,6 +114,7 @@ struct E1000Core {
  void (*owner_start_recv)(PCIDevice *d);

  uint32_t msi_causes_pending;
+bool sending;
  };

  void
--
2.20.1

Re: [RFC v2 16/76] target/riscv: rvv-0.9: add VMA and VTA

2020-07-22 Thread Frank Chang

On Thu, Jul 23, 2020 at 2:00 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 7/22/20 2:15 AM, frank.ch...@sifive.com wrote:
> > -static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
> > +static void vext_clear(void *tail, uint32_t vta, uint32_t cnt, uint32_t
> tot)
> >  {
> > +if (vta == 0) {
> > +/* tail element undisturbed */
> > +return;
> > +}
> > +
> >  /*
> > + * Tail element agnostic.
> >   * Split the remaining range to two parts.
> >   * The first part is in the last uint64_t unit.
> >   * The second part start from the next uint64_t unit.
> > @@ -152,41 +168,50 @@ static void vext_clear(void *tail, uint32_t cnt,
> uint32_t tot)
> >  if (cnt % 8) {
> >  part1 = 8 - (cnt % 8);
> >  part2 = tot - cnt - part1;
> > -memset((void *)((uintptr_t)tail & ~(7ULL)), 0, part1);
> > -memset((void *)(((uintptr_t)tail + 8) & ~(7ULL)), 0, part2);
> > +memset((void *)((uintptr_t)tail & ~(7ULL)), 1, part1);
> > +memset((void *)(((uintptr_t)tail + 8) & ~(7ULL)), 1, part2);
> >  } else {
> > -memset(tail, 0, part2);
> > +memset(tail, 1, part2);
> >  }
> >  }
>
> "1s" surely means all bits set to 1, not each byte to 1.
>

You're correct, I can't just simply replace the value from 0 to 1.


>
> Is there any reason to do anything with VTA/VMA at all?  One alternative
> for
> "agnostic" is to leave the values undisturbed.  So the quickest thing for
> qemu
> to do is remove all of this code.  Then we don't have to pass the values in
> translate either.
>
> Which is exactly what is recommended in the 4th paragraph of the notes
> following the VTA/VMA description.
>
>
I was trying to keep these codes as an option for the user to specify the
behaviors of VTA.
But as long as it's easier for QEMU to just treat VTA/VMA as agnostic(no
changes)/undisturbed.
I will remove all the clean functions in my next patchset.


>
> r~
>

Frank Chang

Re: [RFC v2 15/76] target/riscv: rvv-0.9: add fractional LMUL

2020-07-22 Thread Frank Chang

On Thu, Jul 23, 2020 at 1:30 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 7/22/20 2:15 AM, frank.ch...@sifive.com wrote:
> >  FIELD(VTYPE, VLMUL, 0, 2)
> >  FIELD(VTYPE, VSEW, 2, 3)
> > -FIELD(VTYPE, VEDIV, 5, 2)
> > -FIELD(VTYPE, RESERVED, 7, sizeof(target_ulong) * 8 - 9)
> > +FIELD(VTYPE, VFLMUL, 5, 1)
> > +FIELD(VTYPE, VEDIV, 8, 9)
> > +FIELD(VTYPE, RESERVED, 10, sizeof(target_ulong) * 8 - 11)
> >  FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 1, 1)
>
> The ediv definition is wrong -- should be 8, 2.
>

OK, I will correct it.

>
>
> > @@ -37,4 +38,10 @@ target_ulong fclass_d(uint64_t frs1);
> >  #define SEW32 2
> >  #define SEW64 3
> >
> > +/* table to convert fractional LMUL value */
> > +static const float flmul_table[8] = {
> > +1, 2, 4, 8,  /* LMUL */
> > +-1,  /* reserved */
> > +0.125, 0.25, 0.5 /* fractional LMUL */
> > +};
> >  #endif
>
> Don't define data in a header file; only declare it.
>

Fractional LMUL are used in cpu.h, translate.c and vector_helper.c.
I was trying to declare something which can be shared among these files
to calculate the fractional LMUL value.
Perhaps it's better to declare it as the inline function which
calculates fractional LMUL value in internals.h?
Or I can do the calculation explicitly at every place which requires the
fractional LMUL value?
(only 4 places require this value by far.)

> > @@ -60,6 +60,9 @@ typedef struct DisasContext {
> >  /* vector extension */
> >  bool vill;
> >  uint8_t lmul;
> > +float flmul;
> > +uint8_t eew;
> > +float emul;
>
> Why are you adding floating-point values to DisasContext?
>

flmul, eew and emul are required during rvv-0.9 vector load/store
instructions.
Should I move these declarations to the vector load/store instructions
patch to make it clearer?

> > +static inline float vext_vflmul(uint32_t desc)
> > +{
> > +uint32_t lmul = FIELD_EX32(simd_data(desc), VDATA, LMUL);
> > +return flmul_table[lmul];
> >  }
>
> And in the helpers?  Are you planning on some sort of path through int ->
> float
> -> int for computation?  That seems questionable.
>

desc only saves the raw LMUL bits.
(total 3 bits, I've packed the fractional LMUL bit together with two other
LMUL bits in cpu_get_tb_cpu_state())
The helper here is to convert the 3-bits LMUL into the actual fractional
number it represents.

> r~
>

Frank Chang

Re: [PATCH] hw/misc/edu: support pci device state migration

2020-07-22 Thread Zeng Guang


On 7/22/2020 4:37 PM, Peter Maydell wrote:

On Wed, 22 Jul 2020 at 09:31, Zeng Guang  wrote:

Currently edu device doesn't support live migration. Part of PCI
configuration information would be lost after migration.

PCI device state in source VM:
  Bus  0, device   3, function 0:
  Class 0255: PCI device 1234:11e8
  PCI subsystem 1af4:1100
  IRQ 11, pin A
  BAR0: 32 bit memory at 0xfea0 [0xfeaf].
  id ""

PCI device state in destination VM:
  Bus  0, device   3, function 0:
  Class 0255: PCI device 1234:11e8
  PCI subsystem 1af4:1100
  IRQ 0, pin A
  BAR0: 32 bit memory at 0x [0x000e].
  id ""

Add VMState for edu device to support migration.

Signed-off-by: Gao Chao 
Signed-off-by: Zeng Guang 
Reviewed-by: Wei Wang 

Hi; thanks for adding migration support for this device.



+static const VMStateDescription vmstate_edu = {
+.name = "edu",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_PCI_DEVICE(pdev, EduState),

This isn't the only state that the device has. You
also need to migrate:
stopping, addr4, fact, status, irq_status, the struct dma_state members,
the dma_timer, dma_buf and dma_mask.

Right . I will add those params in VMstate and update patch. Thanks.


thanks
-- PMM

Re: [PATCH v2 4/9] ppc/e500: Use start-powered-off CPUState property

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 12:50:11AM -0300, Thiago Jung Bauermann wrote:
> Instead of setting CPUState::halted to 1 in ppce500_cpu_reset_sec(), use
> the start-powered-off property which makes cpu_common_reset() initialize it
> to 1 in common code.
> 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: David Gibson 

> ---
>  hw/ppc/e500.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> NB: I was only able to test that this patch builds. I wasn't able to
> run it.
> 
> diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
> index ab9884e315..dda71bc05d 100644
> --- a/hw/ppc/e500.c
> +++ b/hw/ppc/e500.c
> @@ -704,9 +704,6 @@ static void ppce500_cpu_reset_sec(void *opaque)
>  
>  cpu_reset(cs);
>  
> -/* Secondary CPU starts in halted state for now. Needs to change when
> -   implementing non-kernel boot. */
> -cs->halted = 1;
>  cs->exception_index = EXCP_HLT;
>  }
>  
> @@ -897,6 +894,13 @@ void ppce500_init(MachineState *machine)
>  } else {
>  /* Secondary CPUs */
>  qemu_register_reset(ppce500_cpu_reset_sec, cpu);
> +
> +/*
> + * Secondary CPU starts in halted state for now. Needs to change
> + * when implementing non-kernel boot.
> + */
> +object_property_set_bool(OBJECT(cs), "start-powered-off", true,
> + _abort);
>  }
>  }
>  
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v2 2/9] target/arm: Move setting of CPU halted state to generic code

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 12:50:09AM -0300, Thiago Jung Bauermann wrote:
> This change is in a separate patch because it's not so obvious that it
> won't cause a regression.
> 
> Suggested-by: Eduardo Habkost 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: David Gibson 

> ---
>  hw/core/cpu.c| 2 +-
>  target/arm/cpu.c | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> NB: I was only able to test that this patch builds. I wasn't able to
> run it on an ARM machine. I did on a ppc64le pseries KVM guest.
> 
> diff --git a/hw/core/cpu.c b/hw/core/cpu.c
> index 594441a150..71bb7859f1 100644
> --- a/hw/core/cpu.c
> +++ b/hw/core/cpu.c
> @@ -258,7 +258,7 @@ static void cpu_common_reset(DeviceState *dev)
>  }
>  
>  cpu->interrupt_request = 0;
> -cpu->halted = 0;
> +cpu->halted = cpu->start_powered_off;
>  cpu->mem_io_pc = 0;
>  cpu->icount_extra = 0;
>  atomic_set(>icount_decr_ptr->u32, 0);
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index ec65c7653f..b6c65e4df6 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -175,7 +175,6 @@ static void arm_cpu_reset(DeviceState *dev)
>  env->vfp.xregs[ARM_VFP_MVFR2] = cpu->isar.mvfr2;
>  
>  cpu->power_state = s->start_powered_off ? PSCI_OFF : PSCI_ON;
> -s->halted = s->start_powered_off;
>  
>  if (arm_feature(env, ARM_FEATURE_IWMMXT)) {
>  env->iwmmxt.cregs[ARM_IWMMXT_wCID] = 0x69051000 | 'Q';
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v2 5/9] mips/cps: Use start-powered-off CPUState property

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 12:50:12AM -0300, Thiago Jung Bauermann wrote:
> Instead of setting CPUState::halted to 1 in main_cpu_reset(), use the
> start-powered-off property which makes cpu_common_reset() initialize it
> to 1 in common code.
> 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: David Gibson 

> ---
>  hw/mips/cps.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> NB: I was only able to test that this patch builds. I wasn't able to
> run it.
> 
> diff --git a/hw/mips/cps.c b/hw/mips/cps.c
> index 615e1a1ad2..d5b6c78019 100644
> --- a/hw/mips/cps.c
> +++ b/hw/mips/cps.c
> @@ -52,9 +52,6 @@ static void main_cpu_reset(void *opaque)
>  CPUState *cs = CPU(cpu);
>  
>  cpu_reset(cs);
> -
> -/* All VPs are halted on reset. Leave powering up to CPC. */
> -cs->halted = 1;
>  }
>  
>  static bool cpu_mips_itu_supported(CPUMIPSState *env)
> @@ -89,6 +86,9 @@ static void mips_cps_realize(DeviceState *dev, Error **errp)
>  env->itc_tag = mips_itu_get_tag_region(>itu);
>  env->itu = >itu;
>  }
> +/* All VPs are halted on reset. Leave powering up to CPC. */
> +object_property_set_bool(OBJECT(cpu), "start-powered-off", true,
> + _abort);
>  qemu_register_reset(main_cpu_reset, cpu);
>  }
>  
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v2 1/9] target/arm: Move start-powered-off property to generic CPUState

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 12:50:08AM -0300, Thiago Jung Bauermann wrote:
> There are other platforms which also have CPUs that start powered off, so
> generalize the start-powered-off property so that it can be used by them.
> 
> Note that ARMv7MState also has a property of the same name but this patch
> doesn't change it because that class isn't a subclass of CPUState so it
> wouldn't be a trivial change.
> 
> This change should not cause any change in behavior.
> 
> Suggested-by: Eduardo Habkost 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: David Gibson 

> ---
>  exec.c| 1 +
>  include/hw/core/cpu.h | 4 
>  target/arm/cpu.c  | 5 ++---
>  target/arm/cpu.h  | 3 ---
>  target/arm/kvm32.c| 2 +-
>  target/arm/kvm64.c| 2 +-
>  6 files changed, 9 insertions(+), 8 deletions(-)
> 
> NB: I was only able to test that this patch builds. I wasn't able to
> run it.
> 
> diff --git a/exec.c b/exec.c
> index 6f381f98e2..82e82fab09 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -899,6 +899,7 @@ Property cpu_common_props[] = {
>  DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
>   MemoryRegion *),
>  #endif
> +DEFINE_PROP_BOOL("start-powered-off", CPUState, start_powered_off, 
> false),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 8f145733ce..9fc2696db5 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -374,6 +374,10 @@ struct CPUState {
>  bool created;
>  bool stop;
>  bool stopped;
> +
> +/* Should CPU start in powered-off state? */
> +bool start_powered_off;
> +
>  bool unplug;
>  bool crash_occurred;
>  bool exit_request;
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 111579554f..ec65c7653f 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -174,8 +174,8 @@ static void arm_cpu_reset(DeviceState *dev)
>  env->vfp.xregs[ARM_VFP_MVFR1] = cpu->isar.mvfr1;
>  env->vfp.xregs[ARM_VFP_MVFR2] = cpu->isar.mvfr2;
>  
> -cpu->power_state = cpu->start_powered_off ? PSCI_OFF : PSCI_ON;
> -s->halted = cpu->start_powered_off;
> +cpu->power_state = s->start_powered_off ? PSCI_OFF : PSCI_ON;
> +s->halted = s->start_powered_off;
>  
>  if (arm_feature(env, ARM_FEATURE_IWMMXT)) {
>  env->iwmmxt.cregs[ARM_IWMMXT_wCID] = 0x69051000 | 'Q';
> @@ -2182,7 +2182,6 @@ static const ARMCPUInfo arm_cpus[] = {
>  };
>  
>  static Property arm_cpu_properties[] = {
> -DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
>  DEFINE_PROP_UINT32("psci-conduit", ARMCPU, psci_conduit, 0),
>  DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
>  DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 9e8ed423ea..a925d26996 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -810,9 +810,6 @@ struct ARMCPU {
>   */
>  uint32_t psci_version;
>  
> -/* Should CPU start in PSCI powered-off state? */
> -bool start_powered_off;
> -
>  /* Current power state, access guarded by BQL */
>  ARMPSCIState power_state;
>  
> diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
> index 0af46b41c8..1f2b8f8b7a 100644
> --- a/target/arm/kvm32.c
> +++ b/target/arm/kvm32.c
> @@ -218,7 +218,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  
>  /* Determine init features for this CPU */
>  memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
> -if (cpu->start_powered_off) {
> +if (cs->start_powered_off) {
>  cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_POWER_OFF;
>  }
>  if (kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PSCI_0_2)) {
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 1169237905..f8a6d905fb 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -775,7 +775,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  
>  /* Determine init features for this CPU */
>  memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
> -if (cpu->start_powered_off) {
> +if (cs->start_powered_off) {
>  cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_POWER_OFF;
>  }
>  if (kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PSCI_0_2)) {
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v2 6/9] sparc/sun4m: Use start-powered-off CPUState property

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 12:50:13AM -0300, Thiago Jung Bauermann wrote:
> Instead of setting CPUState::halted to 1 in secondary_cpu_reset(), use the
> start-powered-off property which makes cpu_common_reset() initialize it
> to 1 in common code.
> 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: David Gibson 

> ---
>  hw/sparc/sun4m.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> NB: I was only able to test that this patch builds. I wasn't able to
> run it.
> 
> diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
> index 9be930415f..766e79bb5e 100644
> --- a/hw/sparc/sun4m.c
> +++ b/hw/sparc/sun4m.c
> @@ -233,7 +233,6 @@ static void secondary_cpu_reset(void *opaque)
>  CPUState *cs = CPU(cpu);
>  
>  cpu_reset(cs);
> -cs->halted = 1;
>  }
>  
>  static void cpu_halt_signal(void *opaque, int irq, int level)
> @@ -833,6 +832,8 @@ static void cpu_devinit(const char *cpu_type, unsigned 
> int id,
>  qemu_register_reset(secondary_cpu_reset, cpu);
>  cs = CPU(cpu);
>  cs->halted = 1;
> +object_property_set_bool(OBJECT(cs), "start-powered-off", true,
> + _abort);
>  }
>  *cpu_irqs = qemu_allocate_irqs(cpu_set_irq, cpu, MAX_PILS);
>  env->prom_addr = prom_addr;
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH] ppc/xive: Fix some typos in comments

2020-07-22 Thread David Gibson

On Wed, Jul 22, 2020 at 07:43:54PM -0400, Gustavo Romero wrote:
> Fix some typos in comments about code modeling coalescing points in the
> XIVE routing engine (IVRE).
> 
> Signed-off-by: Gustavo Romero 

Applied to ppc-for-5.2.

> ---
>  hw/intc/xive.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 9a16243..9b55e03 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -1502,7 +1502,7 @@ static bool xive_presenter_notify(XiveFabric *xfb, 
> uint8_t format,
>  
>  /*
>   * Notification using the END ESe/ESn bit (Event State Buffer for
> - * escalation and notification). Profide futher coalescing in the
> + * escalation and notification). Provide further coalescing in the
>   * Router.
>   */
>  static bool xive_router_end_es_notify(XiveRouter *xrtr, uint8_t end_blk,
> @@ -1581,7 +1581,7 @@ static void xive_router_end_notify(XiveRouter *xrtr, 
> uint8_t end_blk,
>  
>  /*
>   * Check the END ESn (Event State Buffer for notification) for
> - * even futher coalescing in the Router
> + * even further coalescing in the Router
>   */
>  if (!xive_end_is_notify()) {
>  /* ESn[Q]=1 : end of notification */
> @@ -1660,7 +1660,7 @@ do_escalation:
>  
>  /*
>   * Check the END ESe (Event State Buffer for escalation) for even
> - * futher coalescing in the Router
> + * further coalescing in the Router
>   */
>  if (!xive_end_is_uncond_escalation()) {
>  /* ESe[Q]=1 : end of notification */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [RFC PATCH v2 9/9] target/s390x: Use start-powered-off CPUState property

2020-07-22 Thread Thiago Jung Bauermann



Eduardo Habkost  writes:

> On Wed, Jul 22, 2020 at 12:50:16AM -0300, Thiago Jung Bauermann wrote:
>> Instead of setting CPUState::halted to 1 in s390_cpu_initfn(), use the
>> start-powered-off property which makes cpu_common_reset() initialize it to
>> 1 in common code.
>>
>> Signed-off-by: Thiago Jung Bauermann 
>> ---
>>  target/s390x/cpu.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> NB: I was only able to test that this patch builds. I wasn't able to
>> run it.
>>
>> diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
>> index 08eb674d22..d3a14af1d9 100644
>> --- a/target/s390x/cpu.c
>> +++ b/target/s390x/cpu.c
>> @@ -291,7 +291,8 @@ static void s390_cpu_initfn(Object *obj)
>>  S390CPU *cpu = S390_CPU(obj);
>>
>>  cpu_set_cpustate_pointers(cpu);
>> -cs->halted = 1;
>> +object_property_set_bool(OBJECT(cs), "start-powered-off", true,
>> + _abort);
>
> Is this really OK?  s390 CPUs don't seem to set halted=1 on reset
> today.

Hm, good point. That is indeed a behavior change that this patch
introduces. I'll point it out in the description for v3, and if it's
wrong then this patch can simply be dropped.

--
Thiago Jung Bauermann
IBM Linux Technology Center

Re: [RFC PATCH v2 9/9] target/s390x: Use start-powered-off CPUState property

2020-07-22 Thread Thiago Jung Bauermann



Philippe Mathieu-Daudé  writes:

> On 7/22/20 5:50 AM, Thiago Jung Bauermann wrote:
>> Instead of setting CPUState::halted to 1 in s390_cpu_initfn(), use the
>> start-powered-off property which makes cpu_common_reset() initialize it to
>> 1 in common code.
>> 
>> Signed-off-by: Thiago Jung Bauermann 
>> ---
>>  target/s390x/cpu.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> NB: I was only able to test that this patch builds. I wasn't able to
>> run it.
>> 
>> diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
>> index 08eb674d22..d3a14af1d9 100644
>> --- a/target/s390x/cpu.c
>> +++ b/target/s390x/cpu.c
>> @@ -291,7 +291,8 @@ static void s390_cpu_initfn(Object *obj)
>>  S390CPU *cpu = S390_CPU(obj);
>>  
>>  cpu_set_cpustate_pointers(cpu);
>> -cs->halted = 1;
>> +object_property_set_bool(OBJECT(cs), "start-powered-off", true,
>> + _abort);
>
> Here this seems overkill since this is the same object, so you can
> directly do:
>
>   +cs->start_powered_off = true;

I adopted your suggestion.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

Re: [RFC PATCH v2 8/9] sparc/sun4m: Use one cpu_reset() function for main and secondary CPUs

2020-07-22 Thread Thiago Jung Bauermann



Philippe Mathieu-Daudé  writes:

> On 7/22/20 5:50 AM, Thiago Jung Bauermann wrote:
>> If we rely on cpu_common_reset() setting CPUState::halted according to the
>> start-powered-off property, both reset functions become equivalent and we
>> can use only one.
>>
>> Signed-off-by: Thiago Jung Bauermann 
>> ---
>>  hw/sparc/sun4m.c | 21 -
>>  1 file changed, 4 insertions(+), 17 deletions(-)
>>
>> NB: I was only able to test that this patch builds. I wasn't able to
>> run it.
>>
>> diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
>> index 7b3042a801..deb5e9f027 100644
>> --- a/hw/sparc/sun4m.c
>> +++ b/hw/sparc/sun4m.c
>> @@ -218,16 +218,7 @@ static void dummy_cpu_set_irq(void *opaque, int irq, 
>> int level)
>>  {
>>  }
>>
>> -static void main_cpu_reset(void *opaque)
>> -{
>> -SPARCCPU *cpu = opaque;
>> -CPUState *cs = CPU(cpu);
>> -
>> -cpu_reset(cs);
>> -cs->halted = 0;
>> -}
>> -
>> -static void secondary_cpu_reset(void *opaque)
>> +static void sun4m_cpu_reset(void *opaque)
>>  {
>>  SPARCCPU *cpu = opaque;
>>  CPUState *cs = CPU(cpu);
>> @@ -818,7 +809,6 @@ static const TypeInfo ram_info = {
>>  static void cpu_devinit(const char *cpu_type, unsigned int id,
>>  uint64_t prom_addr, qemu_irq **cpu_irqs)
>>  {
>> -CPUState *cs;
>>  SPARCCPU *cpu;
>>  CPUSPARCState *env;
>>
>> @@ -826,12 +816,9 @@ static void cpu_devinit(const char *cpu_type, unsigned 
>> int id,
>>  env = >env;
>>
>>  cpu_sparc_set_id(env, id);
>> -if (id == 0) {
>> -qemu_register_reset(main_cpu_reset, cpu);
>
> IMO it is easier to review this patch in 2, first drop main_cpu_reset
> as it is pointless (we rely on cpu_common_reset), then set the
> "start-powered-off" property and drop secondary_cpu_reset().

That's a good idea. I made those patches for v3.

>> -} else {
>> -qemu_register_reset(secondary_cpu_reset, cpu);
>> -cs = CPU(cpu);
>> -object_property_set_bool(OBJECT(cs), "start-powered-off", true,
>> +qemu_register_reset(sun4m_cpu_reset, cpu);
>
> Why do you still keep it?

I didn't know that not registering a reset function would cause
cpu_reset() to be caused.

>> +if (id != 0) {
>> +object_property_set_bool(OBJECT(cpu), "start-powered-off", true,
>>   _abort);
>
> At this point the CPU is realized, so this is correct.

Great. Thanks for confirming!

> I'd use directly:
>
>object_property_set_bool(OBJECT(cpu), "start-powered-off", !!id,
> _abort);

I used a slight variation of your suggestion, with `id != 0` instead of
`!!id` because I think it makes the code easier to read.

--
Thiago Jung Bauermann
IBM Linux Technology Center

[PATCH] ppc/xive: Fix some typos in comments

2020-07-22 Thread Gustavo Romero

Fix some typos in comments about code modeling coalescing points in the
XIVE routing engine (IVRE).

Signed-off-by: Gustavo Romero 
---
 hw/intc/xive.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 9a16243..9b55e03 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -1502,7 +1502,7 @@ static bool xive_presenter_notify(XiveFabric *xfb, 
uint8_t format,
 
 /*
  * Notification using the END ESe/ESn bit (Event State Buffer for
- * escalation and notification). Profide futher coalescing in the
+ * escalation and notification). Provide further coalescing in the
  * Router.
  */
 static bool xive_router_end_es_notify(XiveRouter *xrtr, uint8_t end_blk,
@@ -1581,7 +1581,7 @@ static void xive_router_end_notify(XiveRouter *xrtr, 
uint8_t end_blk,
 
 /*
  * Check the END ESn (Event State Buffer for notification) for
- * even futher coalescing in the Router
+ * even further coalescing in the Router
  */
 if (!xive_end_is_notify()) {
 /* ESn[Q]=1 : end of notification */
@@ -1660,7 +1660,7 @@ do_escalation:
 
 /*
  * Check the END ESe (Event State Buffer for escalation) for even
- * futher coalescing in the Router
+ * further coalescing in the Router
  */
 if (!xive_end_is_uncond_escalation()) {
 /* ESe[Q]=1 : end of notification */
-- 
2.7.4

Re: [RFC PATCH v2 7/9] sparc/sun4m: Don't set CPUState::halted in cpu_devinit()

2020-07-22 Thread Thiago Jung Bauermann



Philippe Mathieu-Daudé  writes:

> On 7/22/20 5:50 AM, Thiago Jung Bauermann wrote:
>> Remove setting of cs->halted from cpu_devinit(), which seems out of place
>> when compared to similar code in other architectures (e.g., ppce500_init()
>> in hw/ppc/e500.c).
>>
>> Signed-off-by: Thiago Jung Bauermann 
>> ---
>>  hw/sparc/sun4m.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> NB: I was only able to test that this patch builds. I wasn't able to
>> run it.
>>
>> diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
>> index 766e79bb5e..7b3042a801 100644
>> --- a/hw/sparc/sun4m.c
>> +++ b/hw/sparc/sun4m.c
>> @@ -831,7 +831,6 @@ static void cpu_devinit(const char *cpu_type, unsigned 
>> int id,
>>  } else {
>>  qemu_register_reset(secondary_cpu_reset, cpu);
>>  cs = CPU(cpu);
>> -cs->halted = 1;
>>  object_property_set_bool(OBJECT(cs), "start-powered-off", true,
>>   _abort);
>>  }
>>
>
> Why not squash with previous patch?

I wasn't sure about this change, and it's also not strictly necessary
for this patch set so I wanted to make it easy for maintainers to not
apply it.

I squashed it for v3.

--
Thiago Jung Bauermann
IBM Linux Technology Center

Re: [PATCH v2 5/9] mips/cps: Use start-powered-off CPUState property

2020-07-22 Thread Thiago Jung Bauermann



Philippe Mathieu-Daudé  writes:

> On 7/22/20 5:50 AM, Thiago Jung Bauermann wrote:
>> Instead of setting CPUState::halted to 1 in main_cpu_reset(), use the
>> start-powered-off property which makes cpu_common_reset() initialize it
>> to 1 in common code.
>> 
>> Signed-off-by: Thiago Jung Bauermann 
>> ---
>>  hw/mips/cps.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> NB: I was only able to test that this patch builds. I wasn't able to
>> run it.
>> 
>> diff --git a/hw/mips/cps.c b/hw/mips/cps.c
>> index 615e1a1ad2..d5b6c78019 100644
>> --- a/hw/mips/cps.c
>> +++ b/hw/mips/cps.c
>> @@ -52,9 +52,6 @@ static void main_cpu_reset(void *opaque)
>>  CPUState *cs = CPU(cpu);
>>  
>>  cpu_reset(cs);
>> -
>> -/* All VPs are halted on reset. Leave powering up to CPC. */
>> -cs->halted = 1;
>>  }
>>  
>>  static bool cpu_mips_itu_supported(CPUMIPSState *env)
>> @@ -89,6 +86,9 @@ static void mips_cps_realize(DeviceState *dev, Error 
>> **errp)
>>  env->itc_tag = mips_itu_get_tag_region(>itu);
>>  env->itu = >itu;
>>  }
>> +/* All VPs are halted on reset. Leave powering up to CPC. */
>> +object_property_set_bool(OBJECT(cpu), "start-powered-off", true,
>> + _abort);
>
> This is indeed better as now the property is set once, *after* realize
> but *before* reset.
>
> Reviewed-by: Philippe Mathieu-Daudé 

Thanks for confirming!

>>  qemu_register_reset(main_cpu_reset, cpu);
>>  }
>>  
>> 


-- 
Thiago Jung Bauermann
IBM Linux Technology Center

Re: [PATCH v2 3/9] ppc/spapr: Use start-powered-off CPUState property

2020-07-22 Thread Thiago Jung Bauermann



Philippe Mathieu-Daudé  writes:

> On 7/22/20 5:50 AM, Thiago Jung Bauermann wrote:
>> PowerPC sPAPR CPUs start in the halted state, and spapr_reset_vcpu()
>> attempts to implement this by setting CPUState::halted to 1. But that's too
>> late for the case of hotplugged CPUs in a machine configure with 2 or more
>> threads per core.
>> 
>> By then, other parts of QEMU have already caused the vCPU to run in an
>> unitialized state a couple of times. For example, ppc_cpu_reset() calls
>> ppc_tlb_invalidate_all(), which ends up calling async_run_on_cpu(). This
>> kicks the new vCPU while it has CPUState::halted = 0, causing QEMU to issue
>> a KVM_RUN ioctl on the new vCPU before the guest is able to make the
>> start-cpu RTAS call to initialize its register state.
>> 
>> This problem doesn't seem to cause visible issues for regular guests, but
>> on a secure guest running under the Ultravisor it does. The Ultravisor
>> relies on being able to snoop on the start-cpu RTAS call to map vCPUs to
>> guests, and this issue causes it to see a stray vCPU that doesn't belong to
>> any guest.
>> 
>> Fix by setting the start-powered-off CPUState property in
>> spapr_create_vcpu(), which makes cpu_common_reset() initialize
>> CPUState::halted to 1 at an earlier moment.
>> 
>> Suggested-by: Eduardo Habkost 
>> Signed-off-by: Thiago Jung Bauermann 
>> ---
>>  hw/ppc/spapr_cpu_core.c | 12 +++-
>>  1 file changed, 7 insertions(+), 5 deletions(-)
>> 
>> NB: Tested on ppc64le pseries KVM guest with two threads per core. 
>> Hot-plugging additional cores doesn't cause the bug described above
>> anymore.
>> 
>> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
>> index c4f47dcc04..09feeb5f8f 100644
>> --- a/hw/ppc/spapr_cpu_core.c
>> +++ b/hw/ppc/spapr_cpu_core.c
>> @@ -36,11 +36,6 @@ static void spapr_reset_vcpu(PowerPCCPU *cpu)
>>  
>>  cpu_reset(cs);
>>  
>> -/* All CPUs start halted.  CPU0 is unhalted from the machine level
>> - * reset code and the rest are explicitly started up by the guest
>> - * using an RTAS call */
>> -cs->halted = 1;
>> -
>>  env->spr[SPR_HIOR] = 0;
>>  
>>  lpcr = env->spr[SPR_LPCR];
>> @@ -288,6 +283,13 @@ static PowerPCCPU *spapr_create_vcpu(SpaprCpuCore *sc, 
>> int i, Error **errp)
>>  
>>  cpu->machine_data = g_new0(SpaprCpuState, 1);
>>  
>> +/*
>> + * All CPUs start halted. CPU0 is unhalted from the machine level reset 
>> code
>> + * and the rest are explicitly started up by the guest using an RTAS 
>> call.
>> + */
>> +object_property_set_bool(OBJECT(cs), "start-powered-off", true,
>> + _abort);
>
> Since here object_new() is used, it is simpler to set the field before
> the object is realized, similarly to cs->cpu_index:
>
> -- >8 --
> @@ -275,6 +275,11 @@ static PowerPCCPU *spapr_create_vcpu(SpaprCpuCore
> *sc, int i, Error **errp)
>  cs = CPU(obj);
>  cpu = POWERPC_CPU(obj);
>  cs->cpu_index = cc->core_id + i;
> +/*
> + * All CPUs start halted. CPU0 is unhalted from the machine level
> reset code
> + * and the rest are explicitly started up by the guest using an
> RTAS call.
> + */
> +cs->start_powered_off = true;
>  spapr_set_vcpu_id(cpu, cs->cpu_index, _err);
>  if (local_err) {
>  goto err;
> ---

Good point. I adopted your suggestion.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

Re: [PATCH v2 1/9] target/arm: Move start-powered-off property to generic CPUState

2020-07-22 Thread Thiago Jung Bauermann



Philippe Mathieu-Daudé  writes:

> Hi Thiago,
>
> On 7/22/20 5:50 AM, Thiago Jung Bauermann wrote:
>> There are other platforms which also have CPUs that start powered off, so
>> generalize the start-powered-off property so that it can be used by them.
>>
>> Note that ARMv7MState also has a property of the same name but this patch
>> doesn't change it because that class isn't a subclass of CPUState so it
>> wouldn't be a trivial change.
>>
>> This change should not cause any change in behavior.
>>
>> Suggested-by: Eduardo Habkost 
>> Signed-off-by: Thiago Jung Bauermann 
>
> As I participated in reviewing your v1, I'd have appreciated
> being Cc'ed for v2.

I'm sorry about this. I fixed the Cc list for the next version.

> Reviewed-by: Philippe Mathieu-Daudé 

Thank you very much for your prompt review, and suggestions.

I will post a new version addressing your comments shortly.

>> ---
>>  exec.c| 1 +
>>  include/hw/core/cpu.h | 4 
>>  target/arm/cpu.c  | 5 ++---
>>  target/arm/cpu.h  | 3 ---
>>  target/arm/kvm32.c| 2 +-
>>  target/arm/kvm64.c| 2 +-
>>  6 files changed, 9 insertions(+), 8 deletions(-)


--
Thiago Jung Bauermann
IBM Linux Technology Center

[PATCH 1/2] configure: avx2 and avx512f detection for clang

2020-07-22 Thread Shu-Chun Weng

Since clang does not support "#pragma GCC", the instruction sets are
always disabled. In this change, we

 1. wrap "#pragma GCC" inside "#ifndef __clang__",
 2. only retain them around "#include <{e,i,s}mmintrin.h>" to work
around gcc bug,
 3. and annotate each function with `__attribute__((target(*)))` which
is recognized by both gcc and clang.

Signed-off-by: Shu-Chun Weng 
---
 configure   | 16 ++--
 util/bufferiszero.c | 33 +++--
 2 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/configure b/configure
index 4bd80ed507..d9ce3aa5db 100755
--- a/configure
+++ b/configure
@@ -5808,10 +5808,16 @@ fi
 
 if test "$cpuid_h" = "yes" && test "$avx2_opt" != "no"; then
   cat > $TMPC << EOF
+#include 
+#ifndef __clang__
 #pragma GCC push_options
 #pragma GCC target("avx2")
-#include 
+#endif
 #include 
+#ifndef __clang__
+#pragma GCC pop_options
+#endif
+__attribute__((target("avx2")))
 static int bar(void *a) {
 __m256i x = *(__m256i *)a;
 return _mm256_testz_si256(x, x);
@@ -5835,10 +5841,16 @@ fi
 
 if test "$cpuid_h" = "yes" && test "$avx512f_opt" = "yes"; then
   cat > $TMPC << EOF
+#include 
+#ifndef __clang__
 #pragma GCC push_options
 #pragma GCC target("avx512f")
-#include 
+#endif
 #include 
+#ifndef __clang__
+#pragma GCC pop_options
+#endif
+__attribute__((target("avx512f")))
 static int bar(void *a) {
 __m512i x = *(__m512i *)a;
 return _mm512_test_epi64_mask(x, x);
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 695bb4ce28..ca836b6e8c 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -64,17 +64,18 @@ buffer_zero_int(const void *buf, size_t len)
 }
 
 #if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || 
defined(__SSE2__)
-/* Do not use push_options pragmas unnecessarily, because clang
- * does not support them.
- */
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
+#ifndef __clang__
 #pragma GCC push_options
 #pragma GCC target("sse2")
 #endif
 #include 
+#ifndef __clang__
+#pragma GCC pop_options
+#endif
 
 /* Note that each of these vectorized functions require len >= 64.  */
 
+__attribute__((target("sse2")))
 static bool
 buffer_zero_sse2(const void *buf, size_t len)
 {
@@ -104,19 +105,22 @@ buffer_zero_sse2(const void *buf, size_t len)
 
 return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0x;
 }
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
-#pragma GCC pop_options
-#endif
 
 #ifdef CONFIG_AVX2_OPT
 /* Note that due to restrictions/bugs wrt __builtin functions in gcc <= 4.8,
  * the includes have to be within the corresponding push_options region, and
  * therefore the regions themselves have to be ordered with increasing ISA.
  */
+#ifndef __clang__
 #pragma GCC push_options
 #pragma GCC target("sse4")
+#endif
 #include 
+#ifndef __clang__
+#pragma GCC pop_options
+#endif
 
+__attribute__((target("sse4")))
 static bool
 buffer_zero_sse4(const void *buf, size_t len)
 {
@@ -145,11 +149,16 @@ buffer_zero_sse4(const void *buf, size_t len)
 return _mm_testz_si128(t, t);
 }
 
-#pragma GCC pop_options
+#ifndef __clang__
 #pragma GCC push_options
 #pragma GCC target("avx2")
+#endif
 #include 
+#ifndef __clang__
+#pragma GCC pop_options
+#endif
 
+__attribute__((target("avx2")))
 static bool
 buffer_zero_avx2(const void *buf, size_t len)
 {
@@ -176,14 +185,19 @@ buffer_zero_avx2(const void *buf, size_t len)
 
 return _mm256_testz_si256(t, t);
 }
-#pragma GCC pop_options
 #endif /* CONFIG_AVX2_OPT */
 
 #ifdef CONFIG_AVX512F_OPT
+#ifndef __clang__
 #pragma GCC push_options
 #pragma GCC target("avx512f")
+#endif
 #include 
+#ifndef __clang__
+#pragma GCC pop_options
+#endif
 
+__attribute__((target("avx512f")))
 static bool
 buffer_zero_avx512(const void *buf, size_t len)
 {
@@ -210,7 +224,6 @@ buffer_zero_avx512(const void *buf, size_t len)
 return !_mm512_test_epi64_mask(t, t);
 
 }
-#pragma GCC pop_options
 #endif
 
 
-- 
2.28.0.rc0.105.gf9edc3c819-goog

[PATCH 0/2] Instruction set detection for clang.

2020-07-22 Thread Shu-Chun Weng

Currently when configuring QEMU with clang, AVX2, AVX512F, ATOMIC64, and
ATOMIC128 are all disabled because the detection code is GCC-only. With these
two patches, I am able to configure, build, and run tests with clang with all of
the above enabled.

Shu-Chun Weng (2):
  configure: avx2 and avx512f detection for clang
  configure: atomic64/128 detection for clang

 configure   | 34 +++---
 util/bufferiszero.c | 33 +++--
 2 files changed, 46 insertions(+), 21 deletions(-)

-- 
2.28.0.rc0.105.gf9edc3c819-goog

[PATCH 2/2] configure: atomic64/128 detection for clang

2020-07-22 Thread Shu-Chun Weng

The public interface for __atomic_* and __sync_* do not contain the
explicit *_{number} versions:
  https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
  https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html

They appear to be GCC's internal symbols which happen to work. However,
clang does not recognize them. Replace the existing usages with the `_n`
versions (or no suffix) which are the documented API.

Signed-off-by: Shu-Chun Weng 
---
 configure | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index d9ce3aa5db..0613a049e9 100755
--- a/configure
+++ b/configure
@@ -5894,9 +5894,9 @@ if test "$int128" = "yes"; then
 int main(void)
 {
   unsigned __int128 x = 0, y = 0;
-  y = __atomic_load_16(, 0);
-  __atomic_store_16(, y, 0);
-  __atomic_compare_exchange_16(, , x, 0, 0, 0);
+  y = __atomic_load_n(, 0);
+  __atomic_store_n(, y, 0);
+  __atomic_compare_exchange_n(, , x, 0, 0, 0);
   return 0;
 }
 EOF
@@ -5911,7 +5911,7 @@ if test "$int128" = yes && test "$atomic128" = no; then
 int main(void)
 {
   unsigned __int128 x = 0, y = 0;
-  __sync_val_compare_and_swap_16(, y, x);
+  __sync_val_compare_and_swap(, y, x);
   return 0;
 }
 EOF
@@ -5931,11 +5931,11 @@ int main(void)
 {
   uint64_t x = 0, y = 0;
 #ifdef __ATOMIC_RELAXED
-  y = __atomic_load_8(, 0);
-  __atomic_store_8(, y, 0);
-  __atomic_compare_exchange_8(, , x, 0, 0, 0);
-  __atomic_exchange_8(, y, 0);
-  __atomic_fetch_add_8(, y, 0);
+  y = __atomic_load_n(, 0);
+  __atomic_store_n(, y, 0);
+  __atomic_compare_exchange_n(, , x, 0, 0, 0);
+  __atomic_exchange_n(, y, 0);
+  __atomic_fetch_add(, y, 0);
 #else
   typedef char is_host64[sizeof(void *) >= sizeof(uint64_t) ? 1 : -1];
   __sync_lock_test_and_set(, y);
-- 
2.28.0.rc0.105.gf9edc3c819-goog

[PATCH 3/6] linux-user: Update SO_TIMESTAMP to SO_TIMESTAMP_OLD/NEW

2020-07-22 Thread Shu-Chun Weng

Both guest options map to host SO_TIMESTAMP while keeping a bit in
fd_trans to remember if the guest expects the old or the new format.

Added a multiarch test to verify.

Signed-off-by: Shu-Chun Weng 
---
 linux-user/alpha/sockbits.h|   8 +-
 linux-user/fd-trans.h  |  41 +++-
 linux-user/generic/sockbits.h  |   9 +-
 linux-user/hppa/sockbits.h |   8 +-
 linux-user/mips/sockbits.h |   8 +-
 linux-user/sparc/sockbits.h|   8 +-
 linux-user/strace.c|   7 +-
 linux-user/syscall.c   |  69 --
 tests/tcg/multiarch/socket_timestamp.c | 292 +
 9 files changed, 419 insertions(+), 31 deletions(-)
 create mode 100644 tests/tcg/multiarch/socket_timestamp.c

diff --git a/linux-user/alpha/sockbits.h b/linux-user/alpha/sockbits.h
index d54dc98c09..40f0644df0 100644
--- a/linux-user/alpha/sockbits.h
+++ b/linux-user/alpha/sockbits.h
@@ -48,8 +48,6 @@
 #define TARGET_SO_DETACH_FILTER27
 
 #define TARGET_SO_PEERNAME  28
-#define TARGET_SO_TIMESTAMP 29
-#define TARGET_SCM_TIMESTAMPTARGET_SO_TIMESTAMP
 
 #define TARGET_SO_PEERSEC   30
 #define TARGET_SO_PASSSEC   34
@@ -75,6 +73,12 @@
 /* Instruct lower device to use last 4-bytes of skb data as FCS */
 #define TARGET_SO_NOFCS 43
 
+#define TARGET_SO_TIMESTAMP_OLD29
+#define TARGET_SCM_TIMESTAMP_OLD   TARGET_SO_TIMESTAMP_OLD
+
+#define TARGET_SO_TIMESTAMP_NEW63
+#define TARGET_SCM_TIMESTAMP_NEW   TARGET_SO_TIMESTAMP_NEW
+
 /* TARGET_O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
  */
diff --git a/linux-user/fd-trans.h b/linux-user/fd-trans.h
index a3fcdaabc7..8ab650dfd2 100644
--- a/linux-user/fd-trans.h
+++ b/linux-user/fd-trans.h
@@ -22,6 +22,16 @@ typedef struct TargetFdTrans {
 TargetFdDataFunc host_to_target_data;
 TargetFdDataFunc target_to_host_data;
 TargetFdAddrFunc target_to_host_addr;
+
+/* If `true`, this struct is dynamically allocated and should be
+ * `g_free()`ed when unregistering.
+ */
+bool free_when_unregister;
+
+/* The socket's timestamp option (`SO_TIMESTAMP`, `SO_TIMESTAMPNS`, and
+ * `SO_TIMESTAMPING`) is using the `_NEW` version.
+ */
+bool socket_timestamp_new;
 } TargetFdTrans;
 
 extern TargetFdTrans **target_fd_trans;
@@ -52,6 +62,14 @@ static inline TargetFdAddrFunc 
fd_trans_target_to_host_addr(int fd)
 return NULL;
 }
 
+static inline bool fd_trans_socket_timestamp_new(int fd)
+{
+if (fd >= 0 && fd < target_fd_max && target_fd_trans[fd]) {
+return target_fd_trans[fd]->socket_timestamp_new;
+}
+return false;
+}
+
 static inline void fd_trans_register(int fd, TargetFdTrans *trans)
 {
 unsigned int oldmax;
@@ -70,6 +88,9 @@ static inline void fd_trans_register(int fd, TargetFdTrans 
*trans)
 static inline void fd_trans_unregister(int fd)
 {
 if (fd >= 0 && fd < target_fd_max) {
+if (target_fd_trans[fd] && target_fd_trans[fd]->free_when_unregister) {
+g_free(target_fd_trans[fd]);
+}
 target_fd_trans[fd] = NULL;
 }
 }
@@ -78,8 +99,26 @@ static inline void fd_trans_dup(int oldfd, int newfd)
 {
 fd_trans_unregister(newfd);
 if (oldfd < target_fd_max && target_fd_trans[oldfd]) {
-fd_trans_register(newfd, target_fd_trans[oldfd]);
+TargetFdTrans *trans = target_fd_trans[oldfd];
+if (trans->free_when_unregister) {
+trans = g_new(TargetFdTrans, 1);
+*trans = *target_fd_trans[oldfd];
+}
+fd_trans_register(newfd, trans);
+}
+}
+
+static inline void fd_trans_mark_socket_timestamp_new(int fd, bool value)
+{
+if (fd < 0) return;
+if (fd >= target_fd_max || target_fd_trans[fd] == NULL) {
+if (!value) return; /* default is false */
+
+TargetFdTrans* trans = g_new0(TargetFdTrans, 1);
+trans->free_when_unregister = true;
+fd_trans_register(fd, trans);
 }
+target_fd_trans[fd]->socket_timestamp_new = value;
 }
 
 extern TargetFdTrans target_packet_trans;
diff --git a/linux-user/generic/sockbits.h b/linux-user/generic/sockbits.h
index e44733c601..532cf2d3dc 100644
--- a/linux-user/generic/sockbits.h
+++ b/linux-user/generic/sockbits.h
@@ -49,10 +49,15 @@
 #define TARGET_SO_DETACH_FILTER27
 
 #define TARGET_SO_PEERNAME 28
-#define TARGET_SO_TIMESTAMP29
-#define TARGET_SCM_TIMESTAMP   TARGET_SO_TIMESTAMP
 
 #define TARGET_SO_ACCEPTCONN   30
 
 #define TARGET_SO_PEERSEC  31
+
+#define TARGET_SO_TIMESTAMP_OLD29
+#define TARGET_SCM_TIMESTAMP_OLD   TARGET_SO_TIMESTAMP_OLD
+
+#define TARGET_SO_TIMESTAMP_NEW63
+#define TARGET_SCM_TIMESTAMP_NEW   TARGET_SO_TIMESTAMP_NEW
+
 #endif
diff --git a/linux-user/hppa/sockbits.h b/linux-user/hppa/sockbits.h
index 23f69a3293..284a47e74e 100644

[PATCH 1/6] linux-user: Support F_ADD_SEALS and F_GET_SEALS fcntls

2020-07-22 Thread Shu-Chun Weng

Signed-off-by: Shu-Chun Weng 
---
 linux-user/syscall.c  | 10 ++
 linux-user/syscall_defs.h | 14 --
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 1211e759c2..f97337b0b4 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6312,6 +6312,14 @@ static int target_to_host_fcntl_cmd(int cmd)
 case TARGET_F_GETPIPE_SZ:
 ret = F_GETPIPE_SZ;
 break;
+#endif
+#ifdef F_ADD_SEALS
+case TARGET_F_ADD_SEALS:
+ret = F_ADD_SEALS;
+break;
+case TARGET_F_GET_SEALS:
+ret = F_GET_SEALS;
+break;
 #endif
 default:
 ret = -TARGET_EINVAL;
@@ -6598,6 +6606,8 @@ static abi_long do_fcntl(int fd, int cmd, abi_ulong arg)
 case TARGET_F_GETLEASE:
 case TARGET_F_SETPIPE_SZ:
 case TARGET_F_GETPIPE_SZ:
+case TARGET_F_ADD_SEALS:
+case TARGET_F_GET_SEALS:
 ret = get_errno(safe_fcntl(fd, host_cmd, arg));
 break;
 
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 3c261cff0e..70df1a94fb 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -2292,12 +2292,14 @@ struct target_statfs64 {
 #endif
 
 #define TARGET_F_LINUX_SPECIFIC_BASE 1024
-#define TARGET_F_SETLEASE (TARGET_F_LINUX_SPECIFIC_BASE + 0)
-#define TARGET_F_GETLEASE (TARGET_F_LINUX_SPECIFIC_BASE + 1)
-#define TARGET_F_DUPFD_CLOEXEC (TARGET_F_LINUX_SPECIFIC_BASE + 6)
-#define TARGET_F_SETPIPE_SZ (TARGET_F_LINUX_SPECIFIC_BASE + 7)
-#define TARGET_F_GETPIPE_SZ (TARGET_F_LINUX_SPECIFIC_BASE + 8)
-#define TARGET_F_NOTIFY  (TARGET_F_LINUX_SPECIFIC_BASE+2)
+#define TARGET_F_SETLEASE(TARGET_F_LINUX_SPECIFIC_BASE + 0)
+#define TARGET_F_GETLEASE(TARGET_F_LINUX_SPECIFIC_BASE + 1)
+#define TARGET_F_DUPFD_CLOEXEC   (TARGET_F_LINUX_SPECIFIC_BASE + 6)
+#define TARGET_F_NOTIFY  (TARGET_F_LINUX_SPECIFIC_BASE + 2)
+#define TARGET_F_SETPIPE_SZ  (TARGET_F_LINUX_SPECIFIC_BASE + 7)
+#define TARGET_F_GETPIPE_SZ  (TARGET_F_LINUX_SPECIFIC_BASE + 8)
+#define TARGET_F_ADD_SEALS   (TARGET_F_LINUX_SPECIFIC_BASE + 9)
+#define TARGET_F_GET_SEALS   (TARGET_F_LINUX_SPECIFIC_BASE + 10)
 
 #include "target_fcntl.h"
 
-- 
2.28.0.rc0.105.gf9edc3c819-goog

[PATCH 4/6] linux-user: setsockopt() SO_TIMESTAMPNS and SO_TIMESTAMPING

2020-07-22 Thread Shu-Chun Weng

This change supports SO_TIMESTAMPNS_OLD/NEW and SO_TIMESTAMPING_OLD/NEW
for setsocketopt() with SOL_SOCKET. Based on the SO_TIMESTAMP_OLD/NEW
framework. The three pairs share the same flag `SOCK_TSTAMP_NEW` in
linux kernel for deciding if the old or the new format is used.

Signed-off-by: Shu-Chun Weng 
---
 linux-user/alpha/sockbits.h|  13 +-
 linux-user/generic/sockbits.h  |   8 +
 linux-user/hppa/sockbits.h |  12 +-
 linux-user/mips/sockbits.h |   8 +
 linux-user/sparc/sockbits.h|  13 +-
 linux-user/strace.c|  12 +
 linux-user/syscall.c   | 119 ++-
 tests/tcg/multiarch/socket_timestamp.c | 458 +++--
 8 files changed, 521 insertions(+), 122 deletions(-)

diff --git a/linux-user/alpha/sockbits.h b/linux-user/alpha/sockbits.h
index 40f0644df0..c2c88f432b 100644
--- a/linux-user/alpha/sockbits.h
+++ b/linux-user/alpha/sockbits.h
@@ -51,8 +51,6 @@
 
 #define TARGET_SO_PEERSEC   30
 #define TARGET_SO_PASSSEC   34
-#define TARGET_SO_TIMESTAMPNS   35
-#define TARGET_SCM_TIMESTAMPNS  TARGET_SO_TIMESTAMPNS
 
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define TARGET_SO_SECURITY_AUTHENTICATION   19
@@ -61,9 +59,6 @@
 
 #define TARGET_SO_MARK  36
 
-#define TARGET_SO_TIMESTAMPING  37
-#define TARGET_SCM_TIMESTAMPING TARGET_SO_TIMESTAMPING
-
 #define TARGET_SO_RXQ_OVFL 40
 
 #define TARGET_SO_WIFI_STATUS   41
@@ -75,9 +70,17 @@
 
 #define TARGET_SO_TIMESTAMP_OLD29
 #define TARGET_SCM_TIMESTAMP_OLD   TARGET_SO_TIMESTAMP_OLD
+#define TARGET_SO_TIMESTAMPNS_OLD  35
+#define TARGET_SCM_TIMESTAMPNS_OLD TARGET_SO_TIMESTAMPNS_OLD
+#define TARGET_SO_TIMESTAMPING_OLD 37
+#define TARGET_SCM_TIMESTAMPING_OLDTARGET_SO_TIMESTAMPING_OLD
 
 #define TARGET_SO_TIMESTAMP_NEW63
 #define TARGET_SCM_TIMESTAMP_NEW   TARGET_SO_TIMESTAMP_NEW
+#define TARGET_SO_TIMESTAMPNS_NEW  64
+#define TARGET_SCM_TIMESTAMPNS_NEW TARGET_SO_TIMESTAMPNS_NEW
+#define TARGET_SO_TIMESTAMPING_NEW 65
+#define TARGET_SCM_TIMESTAMPING_NEWTARGET_SO_TIMESTAMPING_NEW
 
 /* TARGET_O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
diff --git a/linux-user/generic/sockbits.h b/linux-user/generic/sockbits.h
index 532cf2d3dc..a0496d8751 100644
--- a/linux-user/generic/sockbits.h
+++ b/linux-user/generic/sockbits.h
@@ -56,8 +56,16 @@
 
 #define TARGET_SO_TIMESTAMP_OLD29
 #define TARGET_SCM_TIMESTAMP_OLD   TARGET_SO_TIMESTAMP_OLD
+#define TARGET_SO_TIMESTAMPNS_OLD  35
+#define TARGET_SCM_TIMESTAMPNS_OLD TARGET_SO_TIMESTAMPNS_OLD
+#define TARGET_SO_TIMESTAMPING_OLD 37
+#define TARGET_SCM_TIMESTAMPING_OLDTARGET_SO_TIMESTAMPING_OLD
 
 #define TARGET_SO_TIMESTAMP_NEW63
 #define TARGET_SCM_TIMESTAMP_NEW   TARGET_SO_TIMESTAMP_NEW
+#define TARGET_SO_TIMESTAMPNS_NEW  64
+#define TARGET_SCM_TIMESTAMPNS_NEW TARGET_SO_TIMESTAMPNS_NEW
+#define TARGET_SO_TIMESTAMPING_NEW 65
+#define TARGET_SCM_TIMESTAMPING_NEWTARGET_SO_TIMESTAMPING_NEW
 
 #endif
diff --git a/linux-user/hppa/sockbits.h b/linux-user/hppa/sockbits.h
index 284a47e74e..d7e9aa340d 100644
--- a/linux-user/hppa/sockbits.h
+++ b/linux-user/hppa/sockbits.h
@@ -29,8 +29,6 @@
 #define TARGET_SO_BSDCOMPAT0x400e
 #define TARGET_SO_PASSCRED 0x4010
 #define TARGET_SO_PEERCRED 0x4011
-#define TARGET_SO_TIMESTAMPNS  0x4013
-#define TARGET_SCM_TIMESTAMPNS TARGET_SO_TIMESTAMPNS
 
 #define TARGET_SO_SECURITY_AUTHENTICATION  0x4016
 #define TARGET_SO_SECURITY_ENCRYPTION_TRANSPORT0x4017
@@ -44,8 +42,6 @@
 #define TARGET_SO_PEERSEC  0x401d
 #define TARGET_SO_PASSSEC  0x401e
 #define TARGET_SO_MARK 0x401f
-#define TARGET_SO_TIMESTAMPING 0x4020
-#define TARGET_SCM_TIMESTAMPINGTARGET_SO_TIMESTAMPING
 #define TARGET_SO_RXQ_OVFL 0x4021
 #define TARGET_SO_WIFI_STATUS  0x4022
 #define TARGET_SCM_WIFI_STATUS TARGET_SO_WIFI_STATUS
@@ -67,9 +63,17 @@
 
 #define TARGET_SO_TIMESTAMP_OLD0x4012
 #define TARGET_SCM_TIMESTAMP_OLD   TARGET_SO_TIMESTAMP_OLD
+#define TARGET_SO_TIMESTAMPNS_OLD  0x4013
+#define TARGET_SCM_TIMESTAMPNS_OLD TARGET_SO_TIMESTAMPNS_OLD
+#define TARGET_SO_TIMESTAMPING_OLD 0x4020
+#define TARGET_SCM_TIMESTAMPING_OLDTARGET_SO_TIMESTAMPING_OLD
 
 #define TARGET_SO_TIMESTAMP_NEW0x4038
 #define TARGET_SCM_TIMESTAMP_NEW   TARGET_SO_TIMESTAMP_NEW
+#define TARGET_SO_TIMESTAMPNS_NEW  0x4039
+#define TARGET_SCM_TIMESTAMPNS_NEW TARGET_SO_TIMESTAMPNS_NEW
+#define TARGET_SO_TIMESTAMPING_NEW 0x403A
+#define TARGET_SCM_TIMESTAMPING_NEWTARGET_SO_TIMESTAMPING_NEW
 
 /* TARGET_O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
diff

[PATCH 2/6] linux-user: add missing UDP and IPv6 get/setsockopt options

2020-07-22 Thread Shu-Chun Weng

UDP: SOL_UDP manipulate options at UDP level. All six options currently
defined in linux source include/uapi/linux/udp.h take integer values.

IPv6: IPV6_ADDR_PREFERENCES (RFC5014: Source address selection) was not
supported.

Signed-off-by: Shu-Chun Weng 
---
 linux-user/syscall.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index f97337b0b4..a53db446d4 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -51,8 +51,10 @@
 #include 
 #include 
 //#include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1945,7 +1947,8 @@ static abi_long do_setsockopt(int sockfd, int level, int 
optname,
 
 switch(level) {
 case SOL_TCP:
-/* TCP options all take an 'int' value.  */
+case SOL_UDP:
+/* TCP and UDP options all take an 'int' value.  */
 if (optlen < sizeof(uint32_t))
 return -TARGET_EINVAL;
 
@@ -2031,6 +2034,7 @@ static abi_long do_setsockopt(int sockfd, int level, int 
optname,
 case IPV6_RECVDSTOPTS:
 case IPV6_2292DSTOPTS:
 case IPV6_TCLASS:
+case IPV6_ADDR_PREFERENCES:
 #ifdef IPV6_RECVPATHMTU
 case IPV6_RECVPATHMTU:
 #endif
@@ -2593,7 +2597,8 @@ get_timeout:
 }
 break;
 case SOL_TCP:
-/* TCP options all take an 'int' value.  */
+case SOL_UDP:
+/* TCP and UDP options all take an 'int' value.  */
 int_case:
 if (get_user_u32(len, optlen))
 return -TARGET_EFAULT;
@@ -2684,6 +2689,7 @@ get_timeout:
 case IPV6_RECVDSTOPTS:
 case IPV6_2292DSTOPTS:
 case IPV6_TCLASS:
+case IPV6_ADDR_PREFERENCES:
 #ifdef IPV6_RECVPATHMTU
 case IPV6_RECVPATHMTU:
 #endif
-- 
2.28.0.rc0.105.gf9edc3c819-goog

[PATCH 6/6] linux-user: Add support for SIOCETHTOOL ioctl

2020-07-22 Thread Shu-Chun Weng

The ioctl numeric values are platform-independent and determined by
the file include/uapi/linux/sockios.h in Linux kernel source code:

  #define SIOCETHTOOL   0x8946

These ioctls get (or set) various structures pointed by the field
ifr_data in the structure ifreq depending on the first 4 bytes of the
memory region.

This change clones the ioctl framework into ethtool-specific dispatch
logic in its own file. A number of definitions previously only visible
in syscall.c are thus exported to syscall_defs.h to be used in the new
files.

Signed-off-by: Shu-Chun Weng 
---
 linux-user/Makefile.objs  |   3 +-
 linux-user/ethtool.c  | 819 ++
 linux-user/ethtool.h  |  19 +
 linux-user/ethtool_entries.h  | 107 +
 linux-user/ioctls.h   |   2 +
 linux-user/qemu.h |   1 +
 linux-user/syscall.c  |  35 +-
 linux-user/syscall_defs.h |  12 +
 linux-user/syscall_types.h| 277 
 tests/tcg/multiarch/ethtool.c | 417 +
 10 files changed, 1680 insertions(+), 12 deletions(-)
 create mode 100644 linux-user/ethtool.c
 create mode 100644 linux-user/ethtool.h
 create mode 100644 linux-user/ethtool_entries.h
 create mode 100644 tests/tcg/multiarch/ethtool.c

diff --git a/linux-user/Makefile.objs b/linux-user/Makefile.objs
index 1940910a73..971d43173a 100644
--- a/linux-user/Makefile.objs
+++ b/linux-user/Makefile.objs
@@ -1,7 +1,8 @@
 obj-y = main.o syscall.o strace.o mmap.o signal.o \
elfload.o linuxload.o uaccess.o uname.o \
safe-syscall.o $(TARGET_ABI_DIR)/signal.o \
-$(TARGET_ABI_DIR)/cpu_loop.o exit.o fd-trans.o
+   $(TARGET_ABI_DIR)/cpu_loop.o exit.o fd-trans.o \
+   ethtool.o
 
 obj-$(TARGET_HAS_BFLT) += flatload.o
 obj-$(TARGET_I386) += vm86.o
diff --git a/linux-user/ethtool.c b/linux-user/ethtool.c
new file mode 100644
index 00..cb134e7c9b
--- /dev/null
+++ b/linux-user/ethtool.c
@@ -0,0 +1,819 @@
+/*
+ *  Linux ioctl system call SIOCETHTOOL requests
+ *
+ *  Copyright (c) 2020 Shu-Chun Weng
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+#include "qemu/osdep.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "ethtool.h"
+#include "qemu.h"
+#include "syscall_defs.h"
+
+/* Non-standard ethtool structure definitions. */
+/* struct ethtool_rxnfc {
+ * __u32 cmd;
+ * __u32 flow_type;
+ * __u64 data;
+ * struct ethtool_rx_flow_spec fs;
+ * union {
+ * __u32 rule_cnt;
+ * __u32 rss_context;
+ * };
+ * __u32 rule_locs[0];
+ * };
+ *
+ * Originally defined for ETHTOOL_{G,S}RXFH with only the cmd, flow_type and
+ * data members. For other commands, dedicated standard structure definitions
+ * are listed in syscall_types.h.
+ */
+static void host_to_target_ethtool_rxnfc_get_set_rxfh(void *dst,
+  const void *src)
+{
+static const argtype ethtool_rx_flow_spec_argtype[] = {
+MK_STRUCT(STRUCT_ethtool_rx_flow_spec), TYPE_NULL };
+struct ethtool_rxnfc *target = dst;
+const struct ethtool_rxnfc *host = src;
+
+target->cmd = tswap32(host->cmd);
+target->flow_type = tswap32(host->flow_type);
+target->data = tswap64(host->data);
+
+if (host->cmd == ETHTOOL_SRXFH) {
+/* struct ethtool_rxnfc was originally defined for ETHTOOL_{G,S}RXFH
+ * with only the cmd, flow_type and data members. Guest program might
+ * still be using that definition.
+ */
+return;
+}
+if (host->cmd != ETHTOOL_GRXFH) {
+fprintf(stderr, "host_to_target_ethtool_rxnfc_get_set_rxfh called with 
"
+"command 0x%x which is not ETHTOOL_SRXFH or ETHTOOL_GRXFH\n",
+host->cmd);
+}
+if ((host->flow_type & FLOW_RSS) == 0) {
+return;
+}
+/* If `FLOW_RSS` was requested then guest program must be using the new
+ * definition.
+ */
+thunk_convert(>fs, >fs, ethtool_rx_flow_spec_argtype,
+  THUNK_TARGET);
+target->rule_cnt = tswap32(host->rule_cnt);
+}
+
+static void target_to_host_ethtool_rxnfc_get_set_rxfh(void *dst,
+  const void *src)
+{
+static const argtype ethtool_rx_flow_spec_argtype[] = {
+MK_STRUCT(STRUCT_ethtool_rx_flow_spec), TYPE_NULL };
+struct

[PATCH 0/6] fcntl, sockopt, and ioctl options

2020-07-22 Thread Shu-Chun Weng

Hi Laurent,

This is a series of 6 patches in 4 groups, putting into a single thread for
easier tracking.

[PATCH 1/6] linux-user: Support F_ADD_SEALS and F_GET_SEALS fcntls
  An incidental follow up on
  https://lists.nongnu.org/archive/html/qemu-devel/2019-09/msg01925.html

[PATCH 2/6] linux-user: add missing UDP and IPv6 get/setsockopt
  Updated https://lists.nongnu.org/archive/html/qemu-devel/2019-09/msg01317.html
  to consistently add them in get/setsockopt

[PATCH 3/6] linux-user: Update SO_TIMESTAMP to SO_TIMESTAMP_OLD/NEW
[PATCH 4/6] linux-user: setsockopt() SO_TIMESTAMPNS and SO_TIMESTAMPING
  Updated https://lists.nongnu.org/archive/html/qemu-devel/2019-09/msg01319.html
  to only use TARGET_SO_*_OLD/NEW

[PATCH 5/6] thunk: supports flexible arrays
[PATCH 6/6] linux-user: Add support for SIOCETHTOOL ioctl
  Updated https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg05090.html

Shu-Chun Weng (6):
  linux-user: Support F_ADD_SEALS and F_GET_SEALS fcntls
  linux-user: add missing UDP and IPv6 get/setsockopt options
  linux-user: Update SO_TIMESTAMP to SO_TIMESTAMP_OLD/NEW
  linux-user: setsockopt() SO_TIMESTAMPNS and SO_TIMESTAMPING
  thunk: supports flexible arrays
  linux-user: Add support for SIOCETHTOOL ioctl

 include/exec/user/thunk.h  |  20 +
 linux-user/Makefile.objs   |   3 +-
 linux-user/alpha/sockbits.h|  21 +-
 linux-user/ethtool.c   | 819 +
 linux-user/ethtool.h   |  19 +
 linux-user/ethtool_entries.h   | 107 
 linux-user/fd-trans.h  |  41 +-
 linux-user/generic/sockbits.h  |  17 +-
 linux-user/hppa/sockbits.h |  20 +-
 linux-user/ioctls.h|   2 +
 linux-user/mips/sockbits.h |  16 +-
 linux-user/qemu.h  |   1 +
 linux-user/sparc/sockbits.h|  21 +-
 linux-user/strace.c|  19 +-
 linux-user/syscall.c   | 233 ++-
 linux-user/syscall_defs.h  |  26 +-
 linux-user/syscall_types.h | 277 +
 tests/tcg/multiarch/ethtool.c  | 417 +
 tests/tcg/multiarch/socket_timestamp.c | 542 
 thunk.c| 151 -
 20 files changed, 2706 insertions(+), 66 deletions(-)
 create mode 100644 linux-user/ethtool.c
 create mode 100644 linux-user/ethtool.h
 create mode 100644 linux-user/ethtool_entries.h
 create mode 100644 tests/tcg/multiarch/ethtool.c
 create mode 100644 tests/tcg/multiarch/socket_timestamp.c

-- 
2.28.0.rc0.105.gf9edc3c819-goog

[PATCH 5/6] thunk: supports flexible arrays

2020-07-22 Thread Shu-Chun Weng

Flexible arrays may appear in the last field of a struct and are heavily
used in the ioctl(SIOCETHTOOL) system call on Linux. E.g.

  struct ethtool_regs {
  __u32   cmd;
  __u32   version; /* driver-specific, indicates different chips/revs */
  __u32   len; /* bytes */
  __u8data[0];
  };

where number of elements in `data` is specified in `len`. It is translated
into:

  STRUCT(ethtool_regs,
 TYPE_INT, /* cmd */
 TYPE_INT, /* version */
 TYPE_INT, /* len */
 MK_FLEXIBLE_ARRAY(TYPE_CHAR, 2)) /* data[0]: len */

where the "2" passed to `MK_FLEXIBLE_ARRAY` means the number of element
is specified by field number 2 (0-index).

Signed-off-by: Shu-Chun Weng 
---
 include/exec/user/thunk.h |  20 +
 thunk.c   | 151 +-
 2 files changed, 169 insertions(+), 2 deletions(-)

diff --git a/include/exec/user/thunk.h b/include/exec/user/thunk.h
index 7992475c9f..080d84e806 100644
--- a/include/exec/user/thunk.h
+++ b/include/exec/user/thunk.h
@@ -39,12 +39,19 @@ typedef enum argtype {
 TYPE_ARRAY,
 TYPE_STRUCT,
 TYPE_OLDDEVT,
+TYPE_FLEXIBLE_ARRAY,
 } argtype;
 
 #define MK_PTR(type) TYPE_PTR, type
 #define MK_ARRAY(type, size) TYPE_ARRAY, size, type
 #define MK_STRUCT(id) TYPE_STRUCT, id
 
+/* Should only appear as the last element of a TYPE_STRUCT. `len_field_idx` is
+ * the index into the fields in the enclosing struct that specify the length of
+ * the flexibly array. The length field MUST be a TYPE_INT field. */
+#define MK_FLEXIBLE_ARRAY(type, len_field_idx) \
+TYPE_FLEXIBLE_ARRAY, len_field_idx, type
+
 #define THUNK_TARGET 0
 #define THUNK_HOST   1
 
@@ -55,6 +62,8 @@ typedef struct {
 int *field_offsets[2];
 /* special handling */
 void (*convert[2])(void *dst, const void *src);
+int (*thunk_size[2])(const void *src);
+
 int size[2];
 int align[2];
 const char *name;
@@ -75,6 +84,11 @@ const argtype *thunk_convert(void *dst, const void *src,
  const argtype *type_ptr, int to_host);
 const argtype *thunk_print(void *arg, const argtype *type_ptr);
 
+bool thunk_type_has_flexible_array(const argtype *type_ptr);
+/* thunk_type_size but can handle TYPE_FLEXIBLE_ARRAY */
+int thunk_type_size_with_src(const void *src, const argtype *type_ptr,
+ int is_host);
+
 extern StructEntry *struct_entries;
 
 int thunk_type_size_array(const argtype *type_ptr, int is_host);
@@ -137,6 +151,10 @@ static inline int thunk_type_size(const argtype *type_ptr, 
int is_host)
 case TYPE_STRUCT:
 se = struct_entries + type_ptr[1];
 return se->size[is_host];
+case TYPE_FLEXIBLE_ARRAY:
+/* Flexible arrays do not count toward sizeof(). Users of structures
+ * containing them need to calculate it themselves. */
+return 0;
 default:
 g_assert_not_reached();
 }
@@ -187,6 +205,8 @@ static inline int thunk_type_align(const argtype *type_ptr, 
int is_host)
 case TYPE_STRUCT:
 se = struct_entries + type_ptr[1];
 return se->align[is_host];
+case TYPE_FLEXIBLE_ARRAY:
+return thunk_type_align_array(type_ptr + 2, is_host);
 default:
 g_assert_not_reached();
 }
diff --git a/thunk.c b/thunk.c
index c5d9719747..7b89332712 100644
--- a/thunk.c
+++ b/thunk.c
@@ -50,6 +50,8 @@ static inline const argtype *thunk_type_next(const argtype 
*type_ptr)
 return thunk_type_next_ptr(type_ptr + 1);
 case TYPE_STRUCT:
 return type_ptr + 1;
+case TYPE_FLEXIBLE_ARRAY:
+return thunk_type_next_ptr(type_ptr + 1);
 default:
 return NULL;
 }
@@ -122,6 +124,34 @@ void thunk_register_struct_direct(int id, const char *name,
 se->name = name;
 }
 
+static const argtype *
+thunk_convert_flexible_array(void *dst, const void *src,
+ const uint8_t *dst_struct,
+ const uint8_t *src_struct, const argtype 
*type_ptr,
+ const StructEntry *se, int to_host) {
+int len_field_idx, dst_size, src_size, i;
+uint32_t array_length;
+uint8_t *d;
+const uint8_t *s;
+
+assert(*type_ptr == TYPE_FLEXIBLE_ARRAY);
+type_ptr++;
+len_field_idx = *type_ptr++;
+array_length =
+*(const uint32_t *)(to_host ?
+dst_struct + se->field_offsets[1][len_field_idx] :
+src_struct + se->field_offsets[0][len_field_idx]);
+dst_size = thunk_type_size(type_ptr, to_host);
+src_size = thunk_type_size(type_ptr, to_host);
+d = dst;
+s = src;
+for (i = 0; i < array_length; i++) {
+thunk_convert(d, s, type_ptr, to_host);
+d += dst_size;
+s += src_size;
+}
+return thunk_type_next(type_ptr);
+}
 
 /* now we can define the main conversion functions */
 const argtype *thunk_convert(void *dst, const void *src,
@@ -246,7 +276,7 @@ const

Re: What is TYPE_TPM_TIS_ISA? (Not an ISA Device)

2020-07-22 Thread Stefan Berger


On 7/22/20 1:55 AM, Markus Armbruster wrote:

pm socket --tpmstate dir=tpm --ctrl type=unixio,path=tpm/swtpm-soc
running in another terminal.


3/ no machine plug it using isa_register_ioport()
(it is not registered to the ISA memory space)

There's no requirement for an ISA device to have IO ports...

thanks
-- PMM

Thread hijack!  Since I didn't have swtpm installed, I tried to take a
shortcut:

 $ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -chardev 
null,id=tpm0 -tpmdev emulator,id=tpm0,chardev=chrtpm -device tpm-tis,tpmdev=tpm0
 qemu-system-x86_64: -tpmdev emulator,id=tpm0,chardev=chrtpm: tpm-emulator: 
tpm chardev 'chrtpm' not found.
 qemu-system-x86_64: -tpmdev emulator,id=tpm0,chardev=chrtpm: tpm-emulator: 
Could not cleanly shutdown the TPM: No such file or directory
 QEMU 5.0.90 monitor - type 'help' for more information
 (qemu) qemu-system-x86_64: -device tpm-tis,tpmdev=tpm0: Property 
'tpm-tis.tpmdev' can't find value 'tpm0'
 $ echo $?
 1

That a null chardev doesn't work is fine.  But the error handling looks
broken: QEMU diagnoses and reports the problem, then continues.  The
final error message indicates that it continued without creating the
backend "tpm0".  That's wrong.



This issue can be solve via the following change that then displays this 
error:


$ x86_64-softmmu/qemu-system-x86_64 -nodefaults -S -display none 
-monitor stdio -chardev null,id=tpm0 -tpmdev 
emulator,id=tpm0,chardev=chrtpm -device tpm-tis,tpmdev=tpm0
qemu-system-x86_64: -tpmdev emulator,id=tpm0,chardev=chrtpm: 
tpm-emulator: tpm chardev 'chrtpm' not found.
qemu-system-x86_64: -tpmdev emulator,id=tpm0,chardev=chrtpm: 
tpm-emulator: Could not cleanly shutdown the TPM: No such file or directory



diff --git a/tpm.c b/tpm.c
index 358566cb10..857a861e69 100644
--- a/tpm.c
+++ b/tpm.c
@@ -170,8 +170,10 @@ void tpm_cleanup(void)
  */
 void tpm_init(void)
 {
-    qemu_opts_foreach(qemu_find_opts("tpmdev"),
-  tpm_init_tpmdev, NULL, _fatal);
+    if (qemu_opts_foreach(qemu_find_opts("tpmdev"),
+  tpm_init_tpmdev, NULL, _fatal)) {
+    exit(1);
+    }
 }

 /*

We had something like this before this patch here was applied: 
https://github.com/qemu/qemu/commit/d10e05f15d5c3dd5e5cc59c5dfff460d89d48580#diff-0ec5df49c6751cb2dc9fa18ed5cf9f0e



Do we now want to partially revert this patch or call the exit(1) as 
shown here?



   Stefan

Re: [PATCH for-5.1] libvhost-user: Add missing GCC_FMT_ATTR and fix format errors

2020-07-22 Thread Marc-André Lureau

On Thu, Jul 23, 2020 at 12:54 AM Stefan Weil  wrote:

> Signed-off-by: Stefan Weil 
>

Reviewed-by: Marc-André Lureau 

---
>  contrib/libvhost-user/libvhost-user.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/contrib/libvhost-user/libvhost-user.c
> b/contrib/libvhost-user/libvhost-user.c
> index d315db1396..6e659aff37 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -151,7 +151,7 @@ vu_request_to_string(unsigned int req)
>  }
>  }
>
> -static void
> +static void GCC_FMT_ATTR(2, 3)
>  vu_panic(VuDev *dev, const char *msg, ...)
>  {
>  char *buf = NULL;
> @@ -2074,7 +2074,7 @@ virtqueue_get_head(VuDev *dev, VuVirtq *vq,
>
>  /* If their number is silly, that's a fatal mistake. */
>  if (*head >= vq->vring.num) {
> -vu_panic(dev, "Guest says index %u is available", head);
> +vu_panic(dev, "Guest says index %u is available", *head);
>  return false;
>  }
>
> @@ -2133,7 +2133,7 @@ virtqueue_read_next_desc(VuDev *dev, struct
> vring_desc *desc,
>  smp_wmb();
>
>  if (*next >= max) {
> -vu_panic(dev, "Desc next is %u", next);
> +vu_panic(dev, "Desc next is %u", *next);
>  return VIRTQUEUE_READ_DESC_ERROR;
>  }
>
> --
> 2.27.0
>
>

-- 
Marc-André Lureau

Re: [RFC PATCH-for-5.1 2/2] tpm: List the available TPM backends

2020-07-22 Thread Stefan Berger


On 7/22/20 7:23 AM, Philippe Mathieu-Daudé wrote:

When an incorrect backend is selected, tpm_display_backend_drivers()
is supposed to list the available backends. However the error is
directly propagated, and we never display the list. The user only
gets "Parameter 'type' expects a TPM backend type" error.

Convert the fprintf(stderr,) calls to error hints propagated with
the error.

Signed-off-by: Philippe Mathieu-Daudé 
---
RFC because this is now odd in tpm_config_parse():


because it's not using the fprintf anymore ?

Re: [Bug 1884831] Re: qemu-nbd fails to discard bigger chunks

2020-07-22 Thread Eric Blake


On 6/23/20 4:35 PM, Eric Blake wrote:

Let's get nbd.ko out of the picture.  The problem can be reproduced in
user space (here, where I built qemu-nbd to log trace messages to
stderr):

$ truncate --size=3G file
$ qemu-nbd -f raw file --trace=nbd_\*
$ nbdsh -u nbd://localhost:10810 -c 'h.trim(3*1024*1024*1024,0)'



nbd.Error: nbd_trim: trim: command failed: Input/output error (EIO)





so this is definitely a case of qemu as NBD server NOT honoring requests
between 2G and 4G.  I'll have a patch posted soon.


https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg06592.html

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[PATCH for-5.1] nbd: Fix large trim/zero requests

2020-07-22 Thread Eric Blake

Although qemu as NBD client limits requests to <2G, the NBD protocol
allows clients to send requests almost all the way up to 4G.  But
because our block layer is not yet 64-bit clean, we accidentally wrap
such requests into a negative size, and fail with EIO instead of
performing the intended operation.

The bug is visible in modern systems with something as simple as:

$ qemu-img create -f qcow2 /tmp/image.img 5G
$ sudo qemu-nbd --connect=/dev/nbd0 /tmp/image.img
$ sudo blkdiscard /dev/nbd0

or with user-space only:

$ truncate --size=3G file
$ qemu-nbd -f raw file
$ nbdsh -u nbd://localhost:10809 -c 'h.trim(3*1024*1024*1024,0)'

Alas, our iotests do not currently make it easy to add external
dependencies on blkdiscard or nbdsh, so we have to rely on manual
testing for now.

This patch can be reverted when we later improve the overall block
layer to be 64-bit clean, but for now, a minimal fix was deemed less
risky prior to release.

CC: qemu-sta...@nongnu.org
Fixes: 1f4d6d18ed
Fixes: 1c6c4bb7f0
Fixes: https://github.com/systemd/systemd/issues/16242
Signed-off-by: Eric Blake 
---
 nbd/server.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 4752a6c8bc07..029618017c90 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2378,8 +2378,17 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
 if (request->flags & NBD_CMD_FLAG_FAST_ZERO) {
 flags |= BDRV_REQ_NO_FALLBACK;
 }
-ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset,
-request->len, flags);
+ret = 0;
+/* FIXME simplify this when blk_pwrite_zeroes switches to 64-bit */
+while (ret == 0 && request->len) {
+int align = client->check_align ?: 1;
+int len = MIN(request->len, QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
+align));
+ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset,
+len, flags);
+request->len -= len;
+request->from += len;
+}
 return nbd_send_generic_reply(client, request->handle, ret,
   "writing to file failed", errp);

@@ -2393,8 +2402,17 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
   "flush failed", errp);

 case NBD_CMD_TRIM:
-ret = blk_co_pdiscard(exp->blk, request->from + exp->dev_offset,
-  request->len);
+ret = 0;
+/* FIXME simplify this when blk_co_pdiscard switches to 64-bit */
+while (ret == 0 && request->len) {
+int align = client->check_align ?: 1;
+int len = MIN(request->len, QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
+align));
+ret = blk_co_pdiscard(exp->blk, request->from + exp->dev_offset,
+  len);
+request->len -= len;
+request->from += len;
+}
 if (ret == 0 && request->flags & NBD_CMD_FLAG_FUA) {
 ret = blk_co_flush(exp->blk);
 }
-- 
2.27.0

Re: 5.1.0-rc1 regression: reset fails with kvm and -cpu host

2020-07-22 Thread Eduardo Habkost

On Wed, Jul 22, 2020 at 04:47:32PM -0400, Eduardo Habkost wrote:
> On Wed, Jul 22, 2020 at 08:05:01PM +0200, Jan Kiszka wrote:
> > On 22.07.20 19:35, Eduardo Habkost wrote:
> > > Hi Jan,
> > > 
> > > What was the last version where it worked for you?  Does using
> > > "-cpu host,-vmx" help?
> > 
> > Yeah, -vmx does indeed help.
> > 
> > I didn't have the time to bisect yet. Just check my reflog, picked
> > eb6490f544, and that works.
> 
> Thanks!
> 
> I could reproduce it locally[1], I will bisect it.
> 
> The good news is that "-cpu host,+vmx" still works, on commit
> eb6490f544.
> 
> [1] Linux 5.6.19-300.fc32.x86_64, Intel Core i7-8665U CPU.

Bisected to:

commit b16c0e20c74218f2d69710cedad11da7dd4d2190
Author: Paolo Bonzini 
Date:   Wed May 20 10:49:22 2020 -0400

KVM: add support for AMD nested live migration

Support for nested guest live migration is part of Linux 5.8, add the
corresponding code to QEMU.  The migration format consists of a few
flags, is an opaque 4k blob.

The blob is in VMCB format (the control area represents the L1 VMCB
control fields, the save area represents the pre-vmentry state; KVM does
not use the host save area since the AMD manual allows that) but QEMU
does not really care about that.  However, the flags need to be
copied to hflags/hflags2 and back.

In addition, support for retrieving and setting the AMD nested 
virtualization
states allows the L1 guest to be reset while running a nested guest, but
a small bug in CPU reset needs to be fixed for that to work.

Signed-off-by: Paolo Bonzini 

-- 
Eduardo

[PATCH for-5.1] Fix grammar in documentation

2020-07-22 Thread Stefan Weil

Signed-off-by: Stefan Weil 
---
 docs/system/build-platforms.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/system/build-platforms.rst b/docs/system/build-platforms.rst
index c2b92a9698..9734eba2f1 100644
--- a/docs/system/build-platforms.rst
+++ b/docs/system/build-platforms.rst
@@ -57,12 +57,12 @@ macOS
 -
 
 The project supports building with the two most recent versions of
-macOS, with the current homebrew package set available.
+macOS, with the current Homebrew package set available.
 
 FreeBSD
 ---
 
-The project aims to support the all the versions which are not end of
+The project aims to support all versions which are not end of
 life.
 
 NetBSD
@@ -75,5 +75,5 @@ new major version is released.
 OpenBSD
 ---
 
-The project aims to support the all the versions which are not end of
+The project aims to support all versions which are not end of
 life.
-- 
2.27.0

[PATCH for-5.1] libvhost-user: Add missing GCC_FMT_ATTR and fix format errors

2020-07-22 Thread Stefan Weil

Signed-off-by: Stefan Weil 
---
 contrib/libvhost-user/libvhost-user.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c 
b/contrib/libvhost-user/libvhost-user.c
index d315db1396..6e659aff37 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -151,7 +151,7 @@ vu_request_to_string(unsigned int req)
 }
 }
 
-static void
+static void GCC_FMT_ATTR(2, 3)
 vu_panic(VuDev *dev, const char *msg, ...)
 {
 char *buf = NULL;
@@ -2074,7 +2074,7 @@ virtqueue_get_head(VuDev *dev, VuVirtq *vq,
 
 /* If their number is silly, that's a fatal mistake. */
 if (*head >= vq->vring.num) {
-vu_panic(dev, "Guest says index %u is available", head);
+vu_panic(dev, "Guest says index %u is available", *head);
 return false;
 }
 
@@ -2133,7 +2133,7 @@ virtqueue_read_next_desc(VuDev *dev, struct vring_desc 
*desc,
 smp_wmb();
 
 if (*next >= max) {
-vu_panic(dev, "Desc next is %u", next);
+vu_panic(dev, "Desc next is %u", *next);
 return VIRTQUEUE_READ_DESC_ERROR;
 }
 
-- 
2.27.0

Re: 5.1.0-rc1 regression: reset fails with kvm and -cpu host

2020-07-22 Thread Eduardo Habkost

On Wed, Jul 22, 2020 at 08:05:01PM +0200, Jan Kiszka wrote:
> On 22.07.20 19:35, Eduardo Habkost wrote:
> > Hi Jan,
> > 
> > What was the last version where it worked for you?  Does using
> > "-cpu host,-vmx" help?
> 
> Yeah, -vmx does indeed help.
> 
> I didn't have the time to bisect yet. Just check my reflog, picked
> eb6490f544, and that works.

Thanks!

I could reproduce it locally[1], I will bisect it.

The good news is that "-cpu host,+vmx" still works, on commit
eb6490f544.

[1] Linux 5.6.19-300.fc32.x86_64, Intel Core i7-8665U CPU.

-- 
Eduardo

[PATCH for-5.1] sd/milkymist-memcard: Fix format string

2020-07-22 Thread Stefan Weil

Signed-off-by: Stefan Weil 
---
 hw/sd/milkymist-memcard.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/sd/milkymist-memcard.c b/hw/sd/milkymist-memcard.c
index afdb8aa0c0..11f61294fc 100644
--- a/hw/sd/milkymist-memcard.c
+++ b/hw/sd/milkymist-memcard.c
@@ -281,7 +281,7 @@ static void milkymist_memcard_realize(DeviceState *dev, 
Error **errp)
 carddev = qdev_new(TYPE_SD_CARD);
 qdev_prop_set_drive(carddev, "drive", blk);
 if (!qdev_realize_and_unref(carddev, BUS(>sdbus), )) {
-error_propagate_prepend(errp, err, "failed to init SD card: %s");
+error_propagate_prepend(errp, err, "failed to init SD card");
 return;
 }
 s->enabled = blk && blk_is_inserted(blk);
-- 
2.27.0

[PATCH v3 1/5] linux-user: Make cpu_env accessible in strace.c

2020-07-22 Thread Filip Bozuta

Variable "cpu_env" is used in file "syscall.c" to store
the information about the cpu environment. This variable
is used because values of some syscalls can vary between
cpu architectures. This patch makes the "cpu_env" accessible
in "strace.c" so it can enable aproppriate "-strace" argument
printing for these syscalls. This will be a useful addition
for future "-strace" implementation in QEMU.

Implementation notes:

Functions "print_syscall()" and "print_syscall_ret()" which
are stated and defined in "qemu.h" and "strace.c" respectively
are used to print syscall arguments before and after syscall
execution. These functions were changed with addition of a
new argument "void *cpu_env". Strucute "struct syscallname"
in "strace.c" is used to store the information about syscalls.
Fields "call" and "result" represent pointers to functions which
are used to print syscall arguments before and after execution.
These fields were also changed with addition of a new "void *"
argumetn.
Also, all defined "print_*" and "print_syscall_ret*" functions
in "strace.c" were changed to have the new "void *cpu_env".
This was done to not cause build errors (even though none of
these functions use this argument).

Signed-off-by: Filip Bozuta 
Reviewed-by: Laurent Vivier 
---
 linux-user/qemu.h|   4 +-
 linux-user/strace.c  | 479 ++-
 linux-user/syscall.c |   5 +-
 3 files changed, 247 insertions(+), 241 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 5c964389c1..63ddfe86fd 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -400,10 +400,10 @@ extern long safe_syscall_base(int *pending, long number, 
...);
 int host_to_target_waitstatus(int status);
 
 /* strace.c */
-void print_syscall(int num,
+void print_syscall(void *cpu_env, int num,
abi_long arg1, abi_long arg2, abi_long arg3,
abi_long arg4, abi_long arg5, abi_long arg6);
-void print_syscall_ret(int num, abi_long ret,
+void print_syscall_ret(void *cpu_env, int num, abi_long ret,
abi_long arg1, abi_long arg2, abi_long arg3,
abi_long arg4, abi_long arg5, abi_long arg6);
 /**
diff --git a/linux-user/strace.c b/linux-user/strace.c
index 13981341b3..f0624b6206 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -16,10 +16,10 @@ struct syscallname {
 int nr;
 const char *name;
 const char *format;
-void (*call)(const struct syscallname *,
+void (*call)(void *, const struct syscallname *,
  abi_long, abi_long, abi_long,
  abi_long, abi_long, abi_long);
-void (*result)(const struct syscallname *, abi_long,
+void (*result)(void *, const struct syscallname *, abi_long,
abi_long, abi_long, abi_long,
abi_long, abi_long, abi_long);
 };
@@ -634,7 +634,7 @@ print_clockid(int clockid, int last)
 /* select */
 #ifdef TARGET_NR__newselect
 static void
-print_newselect(const struct syscallname *name,
+print_newselect(void *cpu_env, const struct syscallname *name,
 abi_long arg1, abi_long arg2, abi_long arg3,
 abi_long arg4, abi_long arg5, abi_long arg6)
 {
@@ -652,7 +652,7 @@ print_newselect(const struct syscallname *name,
 
 #ifdef TARGET_NR_semctl
 static void
-print_semctl(const struct syscallname *name,
+print_semctl(void *cpu_env, const struct syscallname *name,
  abi_long arg1, abi_long arg2, abi_long arg3,
  abi_long arg4, abi_long arg5, abi_long arg6)
 {
@@ -664,7 +664,7 @@ print_semctl(const struct syscallname *name,
 #endif
 
 static void
-print_execve(const struct syscallname *name,
+print_execve(void *cpu_env, const struct syscallname *name,
  abi_long arg1, abi_long arg2, abi_long arg3,
  abi_long arg4, abi_long arg5, abi_long arg6)
 {
@@ -697,7 +697,7 @@ print_execve(const struct syscallname *name,
 
 #ifdef TARGET_NR_ipc
 static void
-print_ipc(const struct syscallname *name,
+print_ipc(void *cpu_env, const struct syscallname *name,
   abi_long arg1, abi_long arg2, abi_long arg3,
   abi_long arg4, abi_long arg5, abi_long arg6)
 {
@@ -741,9 +741,10 @@ print_syscall_err(abi_long ret)
 }
 
 static void
-print_syscall_ret_addr(const struct syscallname *name, abi_long ret,
-   abi_long arg0, abi_long arg1, abi_long arg2,
-   abi_long arg3, abi_long arg4, abi_long arg5)
+print_syscall_ret_addr(void *cpu_env, const struct syscallname *name,
+   abi_long ret, abi_long arg0, abi_long arg1,
+   abi_long arg2, abi_long arg3, abi_long arg4,
+   abi_long arg5)
 {
 if (!print_syscall_err(ret)) {
 qemu_log("0x" TARGET_ABI_FMT_lx, ret);
@@ -761,9 +762,10 @@ print_syscall_ret_raw(struct syscallname *name, abi_long 
ret)
 
 #ifdef TARGET_NR__newselect
 static void

[PATCH v3 4/5] linux-user: Add an api to print enumareted argument values with strace

2020-07-22 Thread Filip Bozuta

This patch introduces a type 'struct enums' and function 'print_enums()'
that can be used to print enumerated argument values of some syscalls
in strace. This can be used in future strace implementations.

Also, macros 'ENUM_GENERIC()', 'ENUM_TARGET()' and 'ENUM_END', are
introduced to enable automatic generation of aproppriate enumarated
values and their repsective string representations (these macros are
exactly the same as 'FLAG_GENERIC()', 'FLAG_TARGET()' and 'FLAG_END').

Future patches are planned to modify all existing print functions in
'strace.c' that print arguments of syscalls with enumerated values to
use this new api.

Signed-off-by: Filip Bozuta 
---
 linux-user/strace.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 40f863c6e2..def92c4d73 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -52,9 +52,23 @@ struct flags {
 /* end of flags array */
 #define FLAG_END   { 0, NULL }
 
+/* Structure used to translate enumerated values into strings */
+struct enums {
+abi_longe_value;   /* enum value */
+const char  *e_string; /* stringified enum */
+};
+
+/* common enums for all architectures */
+#define ENUM_GENERIC(name) { name, #name }
+/* target specific enums */
+#define ENUM_TARGET(name)  { TARGET_ ## name, #name }
+/* end of enums array */
+#define ENUM_END   { 0, NULL }
+
 UNUSED static const char *get_comma(int);
 UNUSED static void print_pointer(abi_long, int);
 UNUSED static void print_flags(const struct flags *, abi_long, int);
+UNUSED static void print_enums(const struct enums *, abi_long, int);
 UNUSED static void print_at_dirfd(abi_long, int);
 UNUSED static void print_file_mode(abi_long, int);
 UNUSED static void print_open_flags(abi_long, int);
@@ -1248,6 +1262,23 @@ print_flags(const struct flags *f, abi_long flags, int 
last)
 }
 }
 
+static void
+print_enums(const struct enums *e, abi_long enum_arg, int last)
+{
+for (; e->e_string != NULL; e++) {
+if (e->e_value == enum_arg) {
+qemu_log("%s", e->e_string);
+break;
+}
+}
+
+if (e->e_string == NULL) {
+qemu_log("%#x", (unsigned int)enum_arg);
+}
+
+qemu_log("%s", get_comma(last));
+}
+
 static void
 print_at_dirfd(abi_long dirfd, int last)
 {
-- 
2.25.1

[PATCH v3 5/5] linux-user: Add strace support for printing arguments of some clock and time functions

2020-07-22 Thread Filip Bozuta

This patch implements strace argument printing functionality for following 
syscalls:

* clock_getres, clock_gettime, clock_settime - clock and time functions

int clock_getres(clockid_t clockid, struct timespec *res)
int clock_gettime(clockid_t clockid, struct timespec *tp)
int clock_settime(clockid_t clockid, const struct timespec *tp)
man page: https://man7.org/linux/man-pages/man2/clock_getres.2.html

* gettimeofday - get time

int gettimeofday(struct timeval *tv, struct timezone *tz)
man page: https://man7.org/linux/man-pages/man2/gettimeofday.2.html

* getitimer, setitimer - get or set value of an interval timer

int getitimer(int which, struct itimerval *curr_value)
int setitimer(int which, const struct itimerval *new_value,
  struct itimerval *old_value)
man page: https://man7.org/linux/man-pages/man2/getitimer.2.html

Implementation notes:

All of the syscalls have some structue types as argument types and thus
a separate printing function was stated in file "strace.list" for each
of them. All of these functions use existing functions for their
appropriate structure types ("print_timeval()" and "print_timezone()").

Functions "print_timespec()" and "print_itimerval()" were added in this
patch so that they can be used to print types "struct timespec" and
"struct itimerval" used by some of the syscalls. Function 
"print_itimerval()"
uses the existing function "print_timeval()" to print fields of the
structure "struct itimerval" that are of type "struct timeval".

Function "print_enums()", which was introduced in the previous patch, is 
used
to print the interval timer type which is the first argument of 
"getitimer()"
and "setitimer()". Also, this function is used to print the clock id which
is the first argument of "clock_getres()" and "clock_gettime()". For that
reason, the existing function "print_clockid()" was removed in this patch.
Existing function "print_clock_adjtime()" was also changed for this reason
to use "print_enums()".

The existing function "print_timeval()" was changed a little so that it
prints the field names beside the values.

Syscalls "clock_getres()" and "clock_gettime()" have the same number
and types of arguments and thus their print functions "print_clock_getres"
and "print_clock_gettime" share a common definition in file "strace.c".

Signed-off-by: Filip Bozuta 
---
 linux-user/strace.c| 285 +++--
 linux-user/strace.list |  17 ++-
 2 files changed, 230 insertions(+), 72 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index def92c4d73..aa5539f468 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -78,7 +78,9 @@ UNUSED static void print_string(abi_long, int);
 UNUSED static void print_buf(abi_long addr, abi_long len, int last);
 UNUSED static void print_raw_param(const char *, abi_long, int);
 UNUSED static void print_timeval(abi_ulong, int);
+UNUSED static void print_timespec(abi_ulong, int);
 UNUSED static void print_timezone(abi_ulong, int);
+UNUSED static void print_itimerval(abi_ulong, int);
 UNUSED static void print_number(abi_long, int);
 UNUSED static void print_signal(abi_ulong, int);
 UNUSED static void print_sockaddr(abi_ulong, abi_long, int);
@@ -578,69 +580,6 @@ print_fdset(int n, abi_ulong target_fds_addr)
 }
 #endif
 
-#ifdef TARGET_NR_clock_adjtime
-/* IDs of the various system clocks */
-#define TARGET_CLOCK_REALTIME  0
-#define TARGET_CLOCK_MONOTONIC 1
-#define TARGET_CLOCK_PROCESS_CPUTIME_ID2
-#define TARGET_CLOCK_THREAD_CPUTIME_ID 3
-#define TARGET_CLOCK_MONOTONIC_RAW 4
-#define TARGET_CLOCK_REALTIME_COARSE   5
-#define TARGET_CLOCK_MONOTONIC_COARSE  6
-#define TARGET_CLOCK_BOOTTIME  7
-#define TARGET_CLOCK_REALTIME_ALARM8
-#define TARGET_CLOCK_BOOTTIME_ALARM9
-#define TARGET_CLOCK_SGI_CYCLE 10
-#define TARGET_CLOCK_TAI   11
-
-static void
-print_clockid(int clockid, int last)
-{
-switch (clockid) {
-case TARGET_CLOCK_REALTIME:
-qemu_log("CLOCK_REALTIME");
-break;
-case TARGET_CLOCK_MONOTONIC:
-qemu_log("CLOCK_MONOTONIC");
-break;
-case TARGET_CLOCK_PROCESS_CPUTIME_ID:
-qemu_log("CLOCK_PROCESS_CPUTIME_ID");
-break;
-case TARGET_CLOCK_THREAD_CPUTIME_ID:
-qemu_log("CLOCK_THREAD_CPUTIME_ID");
-break;
-case TARGET_CLOCK_MONOTONIC_RAW:
-qemu_log("CLOCK_MONOTONIC_RAW");
-break;
-case TARGET_CLOCK_REALTIME_COARSE:
-qemu_log("CLOCK_REALTIME_COARSE");
-break;
-case TARGET_CLOCK_MONOTONIC_COARSE:
-qemu_log("CLOCK_MONOTONIC_COARSE");
-break;
-case TARGET_CLOCK_BOOTTIME:
-qemu_log("CLOCK_BOOTTIME");
-break;
-case TARGET_CLOCK_REALTIME_ALARM:
-

[PATCH v3 3/5] linux-user: Add strace support for printing arguments of syscalls used to lock and unlock memory

2020-07-22 Thread Filip Bozuta

This patch implements strace argument printing functionality for following 
syscalls:

* mlock, munlock, mlockall, munlockall - lock and unlock memory

   int mlock(const void *addr, size_t len)
   int munlock(const void *addr, size_t len)
   int mlockall(int flags)
   int munlockall(void)
   man page: https://man7.org/linux/man-pages/man2/mlock.2.html

Implementation notes:

Syscall mlockall() takes an argument that is composed of predefined values
which represent flags that determine the type of locking operation that is
to be performed. For that reason, a printing function "print_mlockall" was
stated in file "strace.list". This printing function uses an already 
existing
function "print_flags()" to print the "flags" argument.  These flags are 
stated
inside an array "mlockall_flags" that contains values of type "struct 
flags".
These values are instantiated using an existing macro "FLAG_TARGET()" that
crates aproppriate target flag values based on those defined in files
'/target_syscall.h'. These target flag values were changed from
"TARGET_MLOCKALL_MCL*" to "TARGET_MCL_*" so that they can be aproppriately 
set
and recognised in "strace.c" with "FLAG_TARGET()". Value for "MCL_ONFAULT"
was added in this patch. This value was also added in "syscall.c" in 
function
"target_to_host_mlockall_arg()". Because this flag value was added in kernel
version 4.4, it is enwrapped in an #ifdef directive (both in "syscall.c" and
in "strace.c") as to support older kernel versions.
The other syscalls have only primitive argument types, so the
rest of the implementation was handled by stating an appropriate
printing format in file "strace.list". Syscall mlock2() is not implemented 
in
"syscall.c" and thus it's argument printing is not implemented in this 
patch.

Signed-off-by: Filip Bozuta 
Reviewed-by: Laurent Vivier 
---
 linux-user/aarch64/target_syscall.h|  5 +++--
 linux-user/alpha/target_syscall.h  |  5 +++--
 linux-user/arm/target_syscall.h|  6 --
 linux-user/cris/target_syscall.h   |  5 +++--
 linux-user/hppa/target_syscall.h   |  5 +++--
 linux-user/i386/target_syscall.h   |  5 +++--
 linux-user/m68k/target_syscall.h   |  6 +++---
 linux-user/microblaze/target_syscall.h |  5 +++--
 linux-user/mips/target_syscall.h   |  5 +++--
 linux-user/mips64/target_syscall.h |  5 +++--
 linux-user/nios2/target_syscall.h  |  5 +++--
 linux-user/openrisc/target_syscall.h   |  5 +++--
 linux-user/ppc/target_syscall.h|  5 +++--
 linux-user/riscv/target_syscall.h  |  5 +++--
 linux-user/s390x/target_syscall.h  |  5 +++--
 linux-user/sh4/target_syscall.h|  5 +++--
 linux-user/sparc/target_syscall.h  |  5 +++--
 linux-user/sparc64/target_syscall.h|  5 +++--
 linux-user/strace.c| 21 +
 linux-user/strace.list |  8 
 linux-user/syscall.c   | 10 --
 linux-user/tilegx/target_syscall.h |  5 +++--
 linux-user/x86_64/target_syscall.h |  5 +++--
 linux-user/xtensa/target_syscall.h |  5 +++--
 24 files changed, 97 insertions(+), 49 deletions(-)

diff --git a/linux-user/aarch64/target_syscall.h 
b/linux-user/aarch64/target_syscall.h
index 995e475c73..3194e6b009 100644
--- a/linux-user/aarch64/target_syscall.h
+++ b/linux-user/aarch64/target_syscall.h
@@ -16,8 +16,9 @@ struct target_pt_regs {
 #define UNAME_MINIMUM_RELEASE "3.8.0"
 #define TARGET_CLONE_BACKWARDS
 #define TARGET_MINSIGSTKSZ   2048
-#define TARGET_MLOCKALL_MCL_CURRENT 1
-#define TARGET_MLOCKALL_MCL_FUTURE  2
+#define TARGET_MCL_CURRENT 1
+#define TARGET_MCL_FUTURE  2
+#define TARGET_MCL_ONFAULT 4
 
 #define TARGET_PR_SVE_SET_VL  50
 #define TARGET_PR_SVE_GET_VL  51
diff --git a/linux-user/alpha/target_syscall.h 
b/linux-user/alpha/target_syscall.h
index 3426cc5b4e..fd389422e3 100644
--- a/linux-user/alpha/target_syscall.h
+++ b/linux-user/alpha/target_syscall.h
@@ -258,7 +258,8 @@ struct target_pt_regs {
 #define TARGET_UAC_NOFIX   2
 #define TARGET_UAC_SIGBUS  4
 #define TARGET_MINSIGSTKSZ  4096
-#define TARGET_MLOCKALL_MCL_CURRENT 0x2000
-#define TARGET_MLOCKALL_MCL_FUTURE  0x4000
+#define TARGET_MCL_CURRENT 0x2000
+#define TARGET_MCL_FUTURE  0x4000
+#define TARGET_MCL_ONFAULT 0x8000
 
 #endif /* ALPHA_TARGET_SYSCALL_H */
diff --git a/linux-user/arm/target_syscall.h b/linux-user/arm/target_syscall.h
index f85cbdaf56..e870ed7a54 100644
--- a/linux-user/arm/target_syscall.h
+++ b/linux-user/arm/target_syscall.h
@@ -28,8 +28,10 @@ struct target_pt_regs {
 #define TARGET_CLONE_BACKWARDS
 
 #define TARGET_MINSIGSTKSZ 2048
-#define TARGET_MLOCKALL_MCL_CURRENT 1
-#define TARGET_MLOCKALL_MCL_FUTURE  2
+#define TARGET_MCL_CURRENT 1
+#define TARGET_MCL_FUTURE  2
+#define TARGET_MCL_ONFAULT 4
+
 #define TARGET_WANT_OLD_SYS_SELECT
 
 #define

[PATCH v3 0/5] Add strace support for printing arguments for a group of selected syscalls

2020-07-22 Thread Filip Bozuta

This series covers strace support for following syscalls:

   *truncate() *munlock()  *clock_gettimeofday()
   *ftruncate()*munlockall()   *clock_getitimer()
   *getsid()   *clock_getres() *clock_setitimer()
   *mlock()*clock_gettime()
   *mlockall() *clock_settime()

Testing method:

Mini test programs were written that run these syscalls for different 
arguments.
Those programs were compiled (sometimes using cross-compilers) for the 
following
architectures:

* Intel 64-bit (little endian) (gcc)
* Power pc 32-bit (big endian) (powerpc-linux-gnu-gcc)
* Power pc 64-bit (big endian) (powerpc64-linux-gnu-gcc)

The corresponding native programs were executed with strace, without using
QEMU, on intel (x86_64) host.

All applicable compiled programs were in turn executed with "-strace"
through QEMU and the strace printing results obtained were the same
ones gotten for native execution.

v2:
* added patch that enables 'cpu_env' to be accessible from "strace.c"
* cut and pasted "regpairs_aligned" from 'syscall.c' to 'qemu.h' so
  that it can be used for "print_truncate64" and "print_ftruncate64"
* changed flag names from 'TARGET_MLOCKALL_MCL_*' to 'TARGET_MCL_*'
* added target flag value 'TARGET_MCL_ONFAULT' for 'MCL_ONFAULT'
* added 'print_syscall_ret_setitimer' for old value of the interval
  timer
* added a function 'print_itimer_type' that prints the interval timer
  type

v3:

* added patch that introduces an api that prints enumarted values
  with strace
* used this new introduced api to print certain arguments of syscalls
  in patch 4
* rebased the series to use the new 'print_syscall_err()'

Filip Bozuta (5):
  linux-user: Make cpu_env accessible in strace.c
  linux-user: Add strace support for printing arguments of
truncate()/ftruncate() and getsid()
  linux-user: Add strace support for printing arguments of syscalls used
to lock and unlock memory
  linux-user: Add an api to print enumareted argument values with strace
  linux-user: Add strace support for printing arguments of some clock
and time functions

 linux-user/aarch64/target_syscall.h|   5 +-
 linux-user/alpha/target_syscall.h  |   5 +-
 linux-user/arm/target_syscall.h|   6 +-
 linux-user/cris/target_syscall.h   |   5 +-
 linux-user/hppa/target_syscall.h   |   5 +-
 linux-user/i386/target_syscall.h   |   5 +-
 linux-user/m68k/target_syscall.h   |   6 +-
 linux-user/microblaze/target_syscall.h |   5 +-
 linux-user/mips/target_syscall.h   |   5 +-
 linux-user/mips64/target_syscall.h |   5 +-
 linux-user/nios2/target_syscall.h  |   5 +-
 linux-user/openrisc/target_syscall.h   |   5 +-
 linux-user/ppc/target_syscall.h|   5 +-
 linux-user/qemu.h  |  39 +-
 linux-user/riscv/target_syscall.h  |   5 +-
 linux-user/s390x/target_syscall.h  |   5 +-
 linux-user/sh4/target_syscall.h|   5 +-
 linux-user/sparc/target_syscall.h  |   5 +-
 linux-user/sparc64/target_syscall.h|   5 +-
 linux-user/strace.c| 863 -
 linux-user/strace.list |  35 +-
 linux-user/syscall.c   |  47 +-
 linux-user/tilegx/target_syscall.h |   5 +-
 linux-user/x86_64/target_syscall.h |   5 +-
 linux-user/xtensa/target_syscall.h |   5 +-
 25 files changed, 692 insertions(+), 399 deletions(-)

-- 
2.25.1

[PATCH v3 2/5] linux-user: Add strace support for printing arguments of truncate()/ftruncate() and getsid()

2020-07-22 Thread Filip Bozuta

This patch implements strace argument printing functionality for following 
syscalls:

* truncate, ftruncate - truncate a file to a specified length

int truncate/truncate64(const char *path, off_t length)
int ftruncate/ftruncate64(int fd, off_t length)
man page: https://man7.org/linux/man-pages/man2/truncate.2.html

* getsid - get session ID

pid_t getsid(pid_t pid)
man page: https://man7.org/linux/man-pages/man2/getsid.2.html

Implementation notes:

Syscalls truncate/truncate64 take string argument types and thus a
separate print function "print_truncate/print_truncate64" is stated in
file "strace.list". This function is defined and implemented in "strace.c"
by using an existing function used to print string arguments: 
"print_string()".
For syscall ftruncate64, a separate printing function was also stated in
"strace.c" as it requires a special kind of handling.
The other syscalls have only primitive argument types, so the rest of the
implementation was handled by stating an appropriate printing format in file
"strace.list".
Function "regpairs_aligned()" was cut & pasted from "syscall.c" to "qemu.h"
as it is used by functions "print_truncate64()" and "print_ftruncate64()"
to print the offset arguments of "truncate64()" and "ftruncate64()".

Signed-off-by: Filip Bozuta 
Reviewed-by: Laurent Vivier 
---
 linux-user/qemu.h  | 35 +++
 linux-user/strace.c| 47 ++
 linux-user/strace.list | 10 -
 linux-user/syscall.c   | 32 
 4 files changed, 87 insertions(+), 37 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 63ddfe86fd..f431805e57 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -706,6 +706,41 @@ static inline uint64_t target_offset64(uint64_t word0, 
uint64_t word1)
 }
 #endif /* TARGET_ABI_BITS != 32 */
 
+
+/* ARM EABI and MIPS expect 64bit types aligned even on pairs or registers */
+#ifdef TARGET_ARM
+static inline int regpairs_aligned(void *cpu_env, int num)
+{
+return CPUARMState *)cpu_env)->eabi) == 1) ;
+}
+#elif defined(TARGET_MIPS) && (TARGET_ABI_BITS == 32)
+static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
+#elif defined(TARGET_PPC) && !defined(TARGET_PPC64)
+/*
+ * SysV AVI for PPC32 expects 64bit parameters to be passed on odd/even pairs
+ * of registers which translates to the same as ARM/MIPS, because we start with
+ * r3 as arg1
+ */
+static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
+#elif defined(TARGET_SH4)
+/* SH4 doesn't align register pairs, except for p{read,write}64 */
+static inline int regpairs_aligned(void *cpu_env, int num)
+{
+switch (num) {
+case TARGET_NR_pread64:
+case TARGET_NR_pwrite64:
+return 1;
+
+default:
+return 0;
+}
+}
+#elif defined(TARGET_XTENSA)
+static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
+#else
+static inline int regpairs_aligned(void *cpu_env, int num) { return 0; }
+#endif
+
 /**
  * preexit_cleanup: housekeeping before the guest exits
  *
diff --git a/linux-user/strace.c b/linux-user/strace.c
index f0624b6206..7dc239b9f1 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -1958,6 +1958,53 @@ print_lseek(void *cpu_env, const struct syscallname 
*name,
 }
 #endif
 
+#ifdef TARGET_NR_truncate
+static void
+print_truncate(void *cpu_env, const struct syscallname *name,
+   abi_long arg0, abi_long arg1, abi_long arg2,
+   abi_long arg3, abi_long arg4, abi_long arg5)
+{
+print_syscall_prologue(name);
+print_string(arg0, 0);
+print_raw_param(TARGET_ABI_FMT_ld, arg1, 1);
+print_syscall_epilogue(name);
+}
+#endif
+
+#ifdef TARGET_NR_truncate64
+static void
+print_truncate64(void *cpu_env, const struct syscallname *name,
+ abi_long arg0, abi_long arg1, abi_long arg2,
+ abi_long arg3, abi_long arg4, abi_long arg5)
+{
+print_syscall_prologue(name);
+print_string(arg0, 0);
+if (regpairs_aligned(cpu_env, TARGET_NR_truncate64)) {
+arg1 = arg2;
+arg2 = arg3;
+}
+print_raw_param("%" PRIu64, target_offset64(arg1, arg2), 1);
+print_syscall_epilogue(name);
+}
+#endif
+
+#ifdef TARGET_NR_ftruncate64
+static void
+print_ftruncate64(void *cpu_env, const struct syscallname *name,
+  abi_long arg0, abi_long arg1, abi_long arg2,
+  abi_long arg3, abi_long arg4, abi_long arg5)
+{
+print_syscall_prologue(name);
+print_raw_param("%d", arg0, 0);
+if (regpairs_aligned(cpu_env, TARGET_NR_ftruncate64)) {
+arg1 = arg2;
+arg2 = arg3;
+}
+print_raw_param("%" PRIu64, target_offset64(arg1, arg2), 1);
+print_syscall_epilogue(name);
+}
+#endif
+
 #if defined(TARGET_NR_socket)
 static void
 print_socket(void *cpu_env, const struct syscallname *name,
diff

Re: [RFC v2 21/76] target/riscv: rvv-0.9: configure instructions

2020-07-22 Thread Richard Henderson

On 7/22/20 2:15 AM, frank.ch...@sifive.com wrote:
> +float vflmul = flmul_table[lmul];
> +
> +if ((sew > cpu->cfg.elen)
> +|| vill
> +|| vflmul < ((float)sew / cpu->cfg.elen)

Hmm.  I suppose this is fairly compact.

Expanding this to integer code would take something like

if (vflmul & 4) {
/* Fractional LMUL. */
if (vflmul == 4 ||
cpu->cfg.elen >> (8 - vflmul) < sew) {
vill = 1;
}
}


r~

Re: [RFC v2 19/76] target/riscv: rvv-0.9: add narrower_nanbox_fpr helper

2020-07-22 Thread Richard Henderson

On 7/22/20 2:15 AM, frank.ch...@sifive.com wrote:
> From: Frank Chang 
> 
> For floating-point operations, the scalar can be taken from a scalar
> f register. If FLEN > SEW, the value in the f registers is checked for
> a valid NaN-boxed value, in which case the least-significant SEW bits
> of the f register are used, else the canonical NaN value is used.
> 
> Add helper to generate the correspond NaN-boxed value or the SEW-bit
> canonical NaN for floating-point operations.
> 
> Signed-off-by: Frank Chang 
> ---
>  target/riscv/helper.h|  2 ++
>  target/riscv/vector_helper.c | 32 
>  2 files changed, 34 insertions(+)

The helper can be done inline in two tcg ops.

Though, really, we need to coordinate with Liu Zhiwei's other patch set that
also deals with nan-boxing.


r~

Re: [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option

2020-07-22 Thread Dr. David Alan Gilbert

* Stefan Hajnoczi (stefa...@redhat.com) wrote:
> virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> is required to create namespaces.
> 
> Introduce a weaker sandbox that is sufficient in container environments
> because the container runtime already sets up namespaces. Use chroot to
> restrict path traversal to the shared directory.
> 
> virtiofsd loses the following:
> 
> 1. Mount namespace. The process chroots to the shared directory but
>leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
>syscalls.
> 
> 2. Pid namespace. This should be fine because virtiofsd is the only
>process running in the container.
> 
> 3. Network namespace. This should be fine because seccomp already
>rejects the connect(2) syscall, but an additional layer of security
>is lost. Container runtime-specific network security policies can be
>used drop network traffic (except for the vhost-user UNIX domain
>socket).
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  tools/virtiofsd/helper.c |  3 +++
>  tools/virtiofsd/passthrough_ll.c | 44 ++--
>  2 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 3105b6c23a..7421c9ca1a 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -151,6 +151,9 @@ void fuse_cmdline_help(void)
> "-o cache=cache mode. could be one of 
> \"auto, "
> "always, none\"\n"
> "   default: auto\n"
> +   "-o chroot|no_chrootuse container-friendly chroot 
> instead\n"
> +   "   of stronger mount namespace 
> sandbox\n"
> +   "   default: false\n"
> "-o flock|no_flock  enable/disable flock\n"
> "   default: no_flock\n"
> "-o log_level=   log level, default to \"info\"\n"
> diff --git a/tools/virtiofsd/passthrough_ll.c 
> b/tools/virtiofsd/passthrough_ll.c
> index 50a164a599..990c0a8a70 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -139,6 +139,7 @@ enum {
>  
>  struct lo_data {
>  pthread_mutex_t mutex;
> +int chroot; /* 1 - use chroot, 0 - use mount namespace */
>  int debug;
>  int writeback;
>  int flock;
> @@ -162,6 +163,8 @@ struct lo_data {
>  };
>  
>  static const struct fuse_opt lo_opts[] = {
> +{ "chroot", offsetof(struct lo_data, chroot), 1 },
> +{ "no_chroot", offsetof(struct lo_data, chroot), 0 },
>  { "writeback", offsetof(struct lo_data, writeback), 1 },
>  { "no_writeback", offsetof(struct lo_data, writeback), 0 },
>  { "source=%s", offsetof(struct lo_data, source), 0 },
> @@ -2665,6 +2668,37 @@ static void setup_capabilities(char *modcaps_in)
>  pthread_mutex_unlock();
>  }
>  
> +/*
> + * Use chroot as a weaker sandbox for environment where the process is 
> launched
> + * without CAP_SYS_ADMIN.
> + */
> +static void setup_chroot(struct lo_data *lo)
> +{
> +lo->proc_self_fd = open("/proc/self/fd", O_PATH);
> +if (lo->proc_self_fd == -1) {
> +fuse_log(FUSE_LOG_ERR, "open(\"/proc/self/fd\", O_PATH): %m\n");
> +exit(1);
> +}
> +
> +/*
> + * Make the shared directory the file system root so that FUSE_OPEN
> + * (lo_open()) cannot escape the shared directory by opening a symlink.
> + *
> + * It's still possible to escape the chroot via lo->proc_self_fd but that
> + * requires gaining control of the process first.
> + */
> +if (chroot(lo->source) != 0) {
> +fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> +exit(1);
> +}

I'm seeing suggestions that you should drop CAP_SYS_CHROOT after
chroot'ing to stop an old escape (where you create another jail inside
the jail and the kernel then lets you walk outside of the old one).

Dave

> +/* Move into the chroot */
> +if (chdir("/") != 0) {
> +fuse_log(FUSE_LOG_ERR, "chdir(\"/\"): %m\n");
> +exit(1);
> +}
> +}
> +
>  /*
>   * Lock down this process to prevent access to other processes or files 
> outside
>   * source directory.  This reduces the impact of arbitrary code execution 
> bugs.
> @@ -2672,8 +2706,13 @@ static void setup_capabilities(char *modcaps_in)
>  static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
>bool enable_syslog)
>  {
> -setup_namespaces(lo, se);
> -setup_mounts(lo->source);
> +if (lo->chroot) {
> +setup_chroot(lo);
> +} else {
> +setup_namespaces(lo, se);
> +setup_mounts(lo->source);
> +}
> +
>  setup_seccomp(enable_syslog);
>  setup_capabilities(g_strdup(lo->modcaps));
>  }
> @@ -2820,6 +2859,7 @@ int main(int argc, char *argv[])
>  struct fuse_session *se;
>  struct fuse_cmdline_opts

Re: [PATCH v2 06/12] accel/tcg: better handle memory constrained systems

2020-07-22 Thread Richard Henderson

On 7/22/20 9:44 AM, Daniel P. Berrangé wrote:
> OpenStack uses TCG in alot of their CI infrastructure for example
> and runs multiple VMs. If there's 4 VMs, that's another 4 GB of
> RAM usage just silently added on top of the explicit -m value.
> 
> I wouldn't be surprised if this pushes CI into OOM, even without
> containers or cgroups being involved, as they have plenty of other
> services consuming RAM in the CI VMs.

I would hope that CI would also supply a -tb_size to go along with that -m
value.  Because we really can't guess on their behalf.

> The commit 600e17b261555c56a048781b8dd5ba3985650013 talks about this
> minimizing codegen cache flushes, but doesn't mention the real world
> performance impact of eliminating those flushes ?

Somewhere on the mailing list was this info.  It was so dreadfully slow it was
*really* noticable.  Timeouts everywhere.

> 
> Presumably this makes the guest OS boot faster, but what's the before
> and after time ?  And what's the time like for values in between the
> original 32mb and the new 1 GB ?

But it wasn't "the original 32MB".
It was the original "ram_size / 4", until that broke due to argument parsing
ordering.

I don't know what CI usually uses, but I usually use at least -m 4G, sometimes
more.  What's the libvirt default?

r~

Re: https booting

2020-07-22 Thread Laszlo Ersek

On 07/22/20 16:13, Daniel P. Berrangé wrote:
> On Wed, Jul 22, 2020 at 03:55:38PM +0200, Gerd Hoffmann wrote:
 How does edk2 handle the root ca problem?
>>>
>>> There are two fw_cfg paths
>>>
>>>   - etc/edk2/https/ciphers
>>>   - etc/edk2/https/cacerts
>>>
>>> The first sets the cipher algorithms that are permitted and their
>>> priority, the second sets the CA certificate bundle.
>>
>> Ok, ipxe should be able to fetch them.  Would be roughly the same as
>> compiling in the certificates, except that they don't take up space in
>> the rom and are much easier to update.
> 
> 
> 
>>
>> What is in cacerts?
>> Basically /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem of the host
>> machine?
> 
> Not that file exactly. Instead
> 
>/etc/pki/ca-trust/extracted/edk2/cacerts.bin
> 
> which is the same certs, but in a different format:
> 
> [quote man update-ca-trust]
>The directory /etc/pki/ca-trust/extracted/edk2/ contains a
>CA certificate bundle ("cacerts.bin") in the "sequence of
>EFI_SIGNATURE_LISTs" format, defined in the UEFI-2.7
>specification, sections "31.4.1 Signature Database" and
>"EFI_CERT_X509_GUID". Distrust information cannot be
>represented in this file format, and distrusted certificates
>are missing from these files. File "cacerts.bin" contains CA
>certificates trusted for TLS server authentication.
> [/quote]
> 
> On Fedora/RHEL  the "update-ca-trust" tool creates the file in this
> format automatically now.
> 
> I don't know if that's a useful format for iPXE or not.
> 
> We could easily define etc/ipxe/https/{ciphers,cacerts} paths in a
> different format if better suited for iPXE.

I agree.

The p11-kit extractor for edk2 was implemented in p11-kit commit range 
ba6ebb05fc0c..de963b96929b:

  https://github.com/p11-glue/p11-kit/commit/59054e4f9fe3
  https://github.com/p11-glue/p11-kit/commit/ee27f9153a14
  https://github.com/p11-glue/p11-kit/commit/de963b96929b

  https://github.com/p11-glue/p11-kit/pull/137
  https://github.com/p11-glue/p11-kit/pull/139

The dependent "update-ca-trust" changes are here:

  
https://src.fedoraproject.org/rpms/ca-certificates/c/6220683f7640?branch=master
  
https://src.fedoraproject.org/rpms/ca-certificates/c/34c0da9058d6?branch=master

I think these commits could be used as model for an "iPXE extractor" if 
necessary.

Thanks,
Laszlo

> Libvirt can set the right
> path depending on whether its booting a VM with EDK2 vs legacy BIOS
> 
> Regards,
> Daniel
>

Re: https booting

2020-07-22 Thread Laszlo Ersek

On 07/22/20 14:08, Gerd Hoffmann wrote:

> How does edk2 handle the root ca problem?

It has no builtin CA certificate. HTTPS boot will not work until at
least one trusted CA cert is imported.

The setup TUI offers an option to import CA cert(s) from local files
(which must be on such filesystems that edk2 can read).

The platform may set up CA certs without (guest-)user interaction, too.
That's what OVMF and ArmVirtQemu do. On the host side, the command

  p11-kit extract --format=edk2-cacerts --filter=ca-anchors \
--overwrite --purpose=server-auth 

translates the host-side trusted CA cert list into a format that edk2
can consume.

This p11-kit command is usually invoked as part of the higher-level command

  update-ca-trust extract

When "p11-kit extract" is invoked like that, then the  pathname
is (for example)

  /etc/pki/ca-trust/extracted/edk2/cacerts.bin

Then QEMU is launched with the following option:

  -fw_cfg name=etc/edk2/https/cacerts,file=

OVMF and ArmVirtQemu then fetch the CA cert list from fw_cfg, and make
the generic TLS code use it:

- 9c7d0d499296 ("OvmfPkg/TlsAuthConfigLib: configure trusted CA certs
for HTTPS boot", 2018-03-30)

- ffe048a0807b ("ArmVirtPkg: handle NETWORK_TLS_ENABLE in ArmVirtQemu*",
2019-06-28)

Thanks
Laszlo

Re: [PATCH v7 33/47] mirror: Deal with filters

2020-07-22 Thread Andrey Shinkevich


On 25.06.2020 18:22, Max Reitz wrote:

This includes some permission limiting (for example, we only need to
take the RESIZE permission for active commits where the base is smaller
than the top).

Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
"target_backing_bs", because that is what it really refers to.

Signed-off-by: Max Reitz 
---
  qapi/block-core.json |   6 ++-
  block/mirror.c   | 118 +--
  blockdev.c   |  36 +
  3 files changed, 121 insertions(+), 39 deletions(-)


...

diff --git a/block/mirror.c b/block/mirror.c
index 469acf4600..770de3b34e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -42,6 +42,7 @@ typedef struct MirrorBlockJob {
  BlockBackend *target;
  BlockDriverState *mirror_top_bs;
  BlockDriverState *base;
+BlockDriverState *base_overlay;
  
  /* The name of the graph node to replace */

  char *replaces;
@@ -677,8 +678,10 @@ static int mirror_exit_common(Job *job)
   _abort);
  if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
  BlockDriverState *backing = s->is_none_mode ? src : s->base;
-if (backing_bs(target_bs) != backing) {
-bdrv_set_backing_hd(target_bs, backing, _err);
+BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
+
+if (bdrv_cow_bs(unfiltered_target) != backing) {



I just worry about a filter node of the concurrent job right below the 
unfiltered_target. The filter has unfiltered_target in its parent list. 
Will that filter node be replaced correctly then?



Andrey

...


+/*
+ * The topmost node with
+ * bdrv_skip_filters(filtered_target) == bdrv_skip_filters(target)
+ */
+filtered_target = bdrv_cow_bs(bdrv_find_overlay(bs, target));
+
+assert(bdrv_skip_filters(filtered_target) ==
+   bdrv_skip_filters(target));
+
+/*
+ * XXX BLK_PERM_WRITE needs to be allowed so we don't block
+ * ourselves at s->base (if writes are blocked for a node, they are
+ * also blocked for its backing file). The other options would be a
+ * second filter driver above s->base (== target).
+ */
+iter_shared_perms = BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE;
+
+for (iter = bdrv_filter_or_cow_bs(bs); iter != target;
+ iter = bdrv_filter_or_cow_bs(iter))
+{
+if (iter == filtered_target) {



For one filter node only?



+/*
+ * From here on, all nodes are filters on the base.
+ * This allows us to share BLK_PERM_CONSISTENT_READ.
+ */
+iter_shared_perms |= BLK_PERM_CONSISTENT_READ;
+}
+
  ret = block_job_add_bdrv(>common, "intermediate node", iter, 0,
- BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
- errp);
+ iter_shared_perms, errp);
  if (ret < 0) {
  goto fail;
  }

...

@@ -3042,6 +3053,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
   " named node of the graph");
  goto out;
  }
+replaces_node_name = arg->replaces;



What is the idea behind the variables substitution?

Probably, the patch might be split out.

Andrey

Re: [Virtio-fs] [PATCH for-5.1 0/3] virtiofsd: allow virtiofsd to run in a container

2020-07-22 Thread Vivek Goyal

On Wed, Jul 22, 2020 at 02:02:03PM +0100, Stefan Hajnoczi wrote:
> Container runtimes handle namespace setup and remove privileges needed by
> virtiofsd to perform sandboxing. Luckily the container environment already
> provides most of the sandbox that virtiofsd needs for security.
> 
> Introduce a new "virtiofsd -o chroot" option that uses chroot(2) instead of
> namespaces. This option allows virtiofsd to work inside a container.
> 
> Please see the individual patches for details on the changes and security
> implications.
> 
> Given that people are starting to attempt running virtiofsd in containers I
> think this should go into QEMU 5.1.

Hi Stefan,

I have written a document to help with testing virtiofs with any changes.

https://github.com/rhvgoyal/misc/blob/master/virtiofs-tests/virtio-fs-testing-requirement.txt

Will be good to run some of these tests to make sure there are no
regressions due to these changes.

Thanks
Vivek

> 
> Stefan Hajnoczi (3):
>   virtiofsd: drop CAP_DAC_READ_SEARCH
>   virtiofsd: add container-friendly -o chroot sandboxing option
>   virtiofsd: probe unshare(CLONE_FS) and print an error
> 
>  tools/virtiofsd/fuse_virtio.c| 13 +
>  tools/virtiofsd/helper.c |  3 +++
>  tools/virtiofsd/passthrough_ll.c | 45 +---
>  3 files changed, 58 insertions(+), 3 deletions(-)
> 
> -- 
> 2.26.2
> 
> 
> ___
> Virtio-fs mailing list
> virtio...@redhat.com
> https://www.redhat.com/mailman/listinfo/virtio-fs

Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option

2020-07-22 Thread Vivek Goyal

On Wed, Jul 22, 2020 at 02:02:05PM +0100, Stefan Hajnoczi wrote:
> virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> is required to create namespaces.
> 
> Introduce a weaker sandbox that is sufficient in container environments
> because the container runtime already sets up namespaces. Use chroot to
> restrict path traversal to the shared directory.
> 
> virtiofsd loses the following:
> 
> 1. Mount namespace. The process chroots to the shared directory but
>leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
>syscalls.
> 
> 2. Pid namespace. This should be fine because virtiofsd is the only
>process running in the container.
> 
> 3. Network namespace. This should be fine because seccomp already
>rejects the connect(2) syscall, but an additional layer of security
>is lost. Container runtime-specific network security policies can be
>used drop network traffic (except for the vhost-user UNIX domain
>socket).
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  tools/virtiofsd/helper.c |  3 +++
>  tools/virtiofsd/passthrough_ll.c | 44 ++--
>  2 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 3105b6c23a..7421c9ca1a 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -151,6 +151,9 @@ void fuse_cmdline_help(void)
> "-o cache=cache mode. could be one of 
> \"auto, "
> "always, none\"\n"
> "   default: auto\n"
> +   "-o chroot|no_chrootuse container-friendly chroot 
> instead\n"
> +   "   of stronger mount namespace 
> sandbox\n"
> +   "   default: false\n"

This option name disabling namespace setup is little confusing to me.

Will it make sense to provide another option to disable/enable
namespaces. "-o no-namespaces" and that disables setting up
namespaces.

Thanks
Vivek

> "-o flock|no_flock  enable/disable flock\n"
> "   default: no_flock\n"
> "-o log_level=   log level, default to \"info\"\n"
> diff --git a/tools/virtiofsd/passthrough_ll.c 
> b/tools/virtiofsd/passthrough_ll.c
> index 50a164a599..990c0a8a70 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -139,6 +139,7 @@ enum {
>  
>  struct lo_data {
>  pthread_mutex_t mutex;
> +int chroot; /* 1 - use chroot, 0 - use mount namespace */
>  int debug;
>  int writeback;
>  int flock;
> @@ -162,6 +163,8 @@ struct lo_data {
>  };
>  
>  static const struct fuse_opt lo_opts[] = {
> +{ "chroot", offsetof(struct lo_data, chroot), 1 },
> +{ "no_chroot", offsetof(struct lo_data, chroot), 0 },
>  { "writeback", offsetof(struct lo_data, writeback), 1 },
>  { "no_writeback", offsetof(struct lo_data, writeback), 0 },
>  { "source=%s", offsetof(struct lo_data, source), 0 },
> @@ -2665,6 +2668,37 @@ static void setup_capabilities(char *modcaps_in)
>  pthread_mutex_unlock();
>  }
>  
> +/*
> + * Use chroot as a weaker sandbox for environment where the process is 
> launched
> + * without CAP_SYS_ADMIN.
> + */
> +static void setup_chroot(struct lo_data *lo)
> +{
> +lo->proc_self_fd = open("/proc/self/fd", O_PATH);
> +if (lo->proc_self_fd == -1) {
> +fuse_log(FUSE_LOG_ERR, "open(\"/proc/self/fd\", O_PATH): %m\n");
> +exit(1);
> +}
> +
> +/*
> + * Make the shared directory the file system root so that FUSE_OPEN
> + * (lo_open()) cannot escape the shared directory by opening a symlink.
> + *
> + * It's still possible to escape the chroot via lo->proc_self_fd but that
> + * requires gaining control of the process first.
> + */
> +if (chroot(lo->source) != 0) {
> +fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> +exit(1);
> +}
> +
> +/* Move into the chroot */
> +if (chdir("/") != 0) {
> +fuse_log(FUSE_LOG_ERR, "chdir(\"/\"): %m\n");
> +exit(1);
> +}
> +}
> +
>  /*
>   * Lock down this process to prevent access to other processes or files 
> outside
>   * source directory.  This reduces the impact of arbitrary code execution 
> bugs.
> @@ -2672,8 +2706,13 @@ static void setup_capabilities(char *modcaps_in)
>  static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
>bool enable_syslog)
>  {
> -setup_namespaces(lo, se);
> -setup_mounts(lo->source);
> +if (lo->chroot) {
> +setup_chroot(lo);
> +} else {
> +setup_namespaces(lo, se);
> +setup_mounts(lo->source);
> +}
> +
>  setup_seccomp(enable_syslog);
>  setup_capabilities(g_strdup(lo->modcaps));
>  }
> @@ -2820,6 +2859,7 @@ int main(int argc, char *argv[])
>  struct fuse_session *se;

Re: [PATCH v4 2/2] nvme: allow cmb and pmr to be enabled on same device

2020-07-22 Thread Andrzej Jakowski

On 7/22/20 10:21 AM, Klaus Jensen wrote:
> On Jul 22 10:00, Andrzej Jakowski wrote:
>> On 7/22/20 12:43 AM, Klaus Jensen wrote:
>>> @keith, please see below - can you comment on the Linux kernel 2 MB
>>> boundary requirement for the CMB? Or should we hail Stephen (or Logan
>>> maybe) since this seems to be related to p2pdma?
>>>
>>> On Jul 21 14:54, Andrzej Jakowski wrote:
 On 7/15/20 1:06 AM, Klaus Jensen wrote:
> Hi Andrzej,
>
> I've not been ignoring this, but sorry for not following up earlier.
>
> I'm hesitent to merge anything that very obviously breaks an OS that we
> know is used a lot to this using this device. Also because the issue has
> not been analyzed well enough to actually know if this is a QEMU or
> kernel issue.

 Hi Klaus,

 Thx for your response! I understand your hesitance on merging stuff that
 obviously breaks guest OS. 

>
> Now, as far as I can test, having the MSI-X vector table and PBA in BAR
> 0, PMR in BAR 2 and CMB in BAR 4 seems to make everyone happy
> (irregardless of IOMMU on/off).
>
> Later, when the issue is better understood, we can add options to set
> offsets, BIRs etc.
>
> The patch below replaces your "[PATCH v4 2/2] nvme: allow cmb and pmr to
> be enabled" (but still requires "[PATCH v4 1/2] ...") and applies to
> git://git.infradead.org/qemu-nvme.git nvme-next branch.
>
> Can you reproduce the issues with that patch? I can't on a stock Arch
> Linux 5.7.5-arch1-1 kernel.

 While I'm happy that approach with MSIX and PBA in BAR0 works fine, I
 feel that investigation part why it works while mine doesn't is
 missing. It looks to me that both patches are basically following same 
 approach: create memory subregion and overlay on top of other memory
 region. Why one works and the other doesn't then?

 Having in mind that, I have recently focused on understanding problem.
 I observed that when guest assigns address to BAR4, addr field in
 nvme-bar4 memory region gets populated, but it doesn't get automatically
 populated in ctrl_mem (cmb) memory subregion, so later when 
 nvme_addr_is_cmb() 
 is called address check works incorrectly and as a consequence vmm does 
 dma 
 read instead of memcpy.
 I created a patch that sets correct address on ctrl_mem subregion and 
 guest 
 OS boots up correctly.

 When I looked into pci and memory region code I noticed that indeed address
 is only assigned to top level memory region but not to contained 
 subregions.
 I think that because in your approach cmb grabs whole bar exclusively it 
 works
 fine.

 Here is my question (perhaps pci folks can help answer :)): if we consider 
 memory region overlapping for pci devices as valid use case should pci 
 code on configuration cycles walk through all contained subregion and
 update addr field accordingly?

 Thx!

>>>
>>> Hi Andrzej,
>>>
>>> Thanks for looking into this. I think your analysis helped me nail this.
>>> The problem is that we added the use of a subregion and have some
>>> assumptions that no longer hold.
>>>
>>> nvme_addr_is_cmb() assumes that n->ctrl_mem.addr is an absolute address.
>>> But when the memory region is a subregion, addr holds an offset into the
>>> parent container instead. Thus, changing all occurances of
>>> n->ctrl_mem.addr to (n->bar0.addr + n->ctrl_mem.addr) fixes the issue
>>> (this is required in nvme_addr_is_cmb and nvme_map_prp). I patched that
>>> in your original patch[1]. The reason my version worked is because there
>>> was no subregion involved for the CMB, so the existing address
>>> validation calculations were still correct.
>>
>> I'm a little bit concerned with this approach:
>> (n->bar0.addr + n->ctrl_mem.addr) and hoping to have some debate. Let me 
>> describe my understanding of the problem.
> 
> Oh. In the context of your patch I meant bar4 of course, but anyway.
> 
>> It looks to me that addr field sometimes contains *absolute* address (when 
>> no 
>> hierarchy is used) and other times it contains *relative* address (when
>> hierarchy is created). From my perspective use of this field is inconsistent
>> and thus error-prone.  
>> Because of that I think that doing n->bar0.addr + n->ctrl_mem.addr doesn't
>> solve root problem and is still prone to the same problem if in the future
>> we potentially build even more complicated hierarchy.
>> I think that we could solve it by introducing helper function like
>>
>> hwaddr memory_region_get_abs_addr(MemoryRegion *mr) 
>>
>> to retrieve absolute address and in the documentation indicate that addr 
>> field
>> can be relative or absolute and it is recommended to use above function to 
>> retrieve absolute address.
>> What do you think?
>>
> 
> I'm all for a helper - I was not gonna cheer for the quick'n'dirty fix I
> did just to convince myself

Re: [RFC v2 18/76] target/riscv: introduce more imm value modes in translator functions

2020-07-22 Thread Richard Henderson

On 7/22/20 2:15 AM, frank.ch...@sifive.com wrote:
> +#define IMM_ZX  0   /* Zero-extended */
> +#define IMM_SX  1   /* Sign-extended */
> +#define IMM_TRUNC_SEW   2   /* Truncate to log(SEW) bits */
> +#define IMM_TRUNC_2SEW  3   /* Truncate to log(2*SEW) bits */

Please use an enum.

> +case IMM_TRUNC_SEW:
> +src1 = tcg_const_tl(
> +extract64(imm, 0, 5) & ((1 << (s->sew + 3)) - 1) & 0x1f);
> +break;
> +case IMM_TRUNC_2SEW:
> +src1 = tcg_const_tl(
> +extract64(imm, 0, 5) & ((2 << (s->sew + 3)) - 1) & 0x1f);
> +break;

Either the extract or the "& 0x1f" is redundant.  Remove one.

It would be worth splitting the integer arithmetic out to a helper function so
that you don't have to replicate it again in do_opivi_gvec.


r~

Re: 5.1.0-rc1 regression: reset fails with kvm and -cpu host

2020-07-22 Thread Jan Kiszka


On 22.07.20 19:35, Eduardo Habkost wrote:

Hi Jan,

What was the last version where it worked for you?  Does using
"-cpu host,-vmx" help?


Yeah, -vmx does indeed help.

I didn't have the time to bisect yet. Just check my reflog, picked 
eb6490f544, and that works.


HTH,
Jan




On Wed, Jul 22, 2020 at 11:15:43AM +0200, Jan Kiszka wrote:

Hi all,

this locks up the guest:

- qemu-system-x86_64 -enable-kvm -cpu host
- trigger hard reset

Host kernel: 5.7.7.
Host CPU: i7-8850H

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux





--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

Re: [RFC v2 16/76] target/riscv: rvv-0.9: add VMA and VTA

2020-07-22 Thread Richard Henderson

On 7/22/20 2:15 AM, frank.ch...@sifive.com wrote:
> -static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
> +static void vext_clear(void *tail, uint32_t vta, uint32_t cnt, uint32_t tot)
>  {
> +if (vta == 0) {
> +/* tail element undisturbed */
> +return;
> +}
> +
>  /*
> + * Tail element agnostic.
>   * Split the remaining range to two parts.
>   * The first part is in the last uint64_t unit.
>   * The second part start from the next uint64_t unit.
> @@ -152,41 +168,50 @@ static void vext_clear(void *tail, uint32_t cnt, 
> uint32_t tot)
>  if (cnt % 8) {
>  part1 = 8 - (cnt % 8);
>  part2 = tot - cnt - part1;
> -memset((void *)((uintptr_t)tail & ~(7ULL)), 0, part1);
> -memset((void *)(((uintptr_t)tail + 8) & ~(7ULL)), 0, part2);
> +memset((void *)((uintptr_t)tail & ~(7ULL)), 1, part1);
> +memset((void *)(((uintptr_t)tail + 8) & ~(7ULL)), 1, part2);
>  } else {
> -memset(tail, 0, part2);
> +memset(tail, 1, part2);
>  }
>  }

"1s" surely means all bits set to 1, not each byte to 1.

Is there any reason to do anything with VTA/VMA at all?  One alternative for
"agnostic" is to leave the values undisturbed.  So the quickest thing for qemu
to do is remove all of this code.  Then we don't have to pass the values in
translate either.

Which is exactly what is recommended in the 4th paragraph of the notes
following the VTA/VMA description.


r~

1 2 3 4 5 >

1 - 100 of 416 matches

Mail list logo