date:20191217

Re: [PATCH 2/6] hw/display/tcx: Add missing fall through comments

2019-12-17 Thread Aleksandar Markovic

On Tuesday, December 17, 2019, Philippe Mathieu-Daudé wrote:

> GCC9 is confused by this comment when building with
> CFLAG -Wimplicit-fallthrough=2:
>
>   hw/display/tcx.c: In function ‘tcx_dac_writel’:
>   hw/display/tcx.c:453:26: error: this statement may fall through
> [-Werror=implicit-fallthrough=]
> 453 | s->dac_index = (s->dac_index + 1) & 0xff; /* Index
> autoincrement */
> | ~^~~
>   hw/display/tcx.c:454:9: note: here
> 454 | default:
> | ^~~
>   hw/display/tcx.c: In function ‘tcx_dac_readl’:
>   hw/display/tcx.c:412:22: error: this statement may fall through
> [-Werror=implicit-fallthrough=]
> 412 | s->dac_index = (s->dac_index + 1) & 0xff; /* Index
> autoincrement */
> | ~^~~
>   hw/display/tcx.c:413:5: note: here
> 413 | default:
> | ^~~
>   cc1: all warnings being treated as errors
>
> Add the missing fall through comments.
>
> Fixes: 55d7bfe22
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Olivier Danet 
> Cc: Mark Cave-Ayland 
> ---
>  hw/display/tcx.c | 2 ++
>  1 file changed, 2 insertions(+)
>
>
The content of the patch is fine, but the commit message is, I think,
inaccurate: gcc is not confused at all; it does what it was told to do.

The title is fine.


> diff --git a/hw/display/tcx.c b/hw/display/tcx.c
> index 14e829d3fa..abbeb30284 100644
> --- a/hw/display/tcx.c
> +++ b/hw/display/tcx.c
> @@ -410,6 +410,7 @@ static uint64_t tcx_dac_readl(void *opaque, hwaddr
> addr,
>  case 2:
>  val = s->b[s->dac_index] << 24;
>  s->dac_index = (s->dac_index + 1) & 0xff; /* Index autoincrement
> */
> +/* fall through */
>  default:
>  s->dac_state = 0;
>  break;
> @@ -451,6 +452,7 @@ static void tcx_dac_writel(void *opaque, hwaddr addr,
> uint64_t val,
>  s->b[index] = val >> 24;
>  update_palette_entries(s, index, index + 1);
>  s->dac_index = (s->dac_index + 1) & 0xff; /* Index
> autoincrement */
> +/* fall through */
>  default:
>  s->dac_state = 0;
>  break;
> --
> 2.21.0
>
>
>

Re: [PULL 00/34] Error reporting patches for 2019-12-16

2019-12-17 Thread Markus Armbruster

Peter Maydell  writes:

> On Tue, 17 Dec 2019 at 06:33, Markus Armbruster  wrote:
>>
>> The following changes since commit cb88904a54903ef6ba21a68a61d9cd51e2166304:
>>
>>   Merge remote-tracking branch 
>> 'remotes/amarkovic/tags/mips-queue-dec-16-2019' into staging (2019-12-16 
>> 14:07:56 +)
>>
>> are available in the Git repository at:
>>
>>   git://repo.or.cz/qemu/armbru.git tags/pull-error-2019-12-16
>>
>> for you to fetch changes up to 0e7f83bab6559775cd71e418b12a49145e59faa7:
>>
>>   nbd: assert that Error** is not NULL in nbd_iter_channel_error (2019-12-16 
>> 20:50:16 +0100)
>>
>> 
>> Error reporting patches for 2019-12-16
>>
>> 
>
> This gets conflicts:
> diff --cc target/ppc/kvm.c
> index 7406d18945,27ea3ce535..00
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@@ -2076,7 -2076,7 +2076,11 @@@ int kvmppc_set_smt_threads(int smt
>   return ret;
>   }
>
> ++<<< HEAD
>  +void kvmppc_error_append_smt_possible_hint(Error **errp_in)
> ++===
> + void kvmppc_error_append_smt_possible_hint(Error *const *errp)
> ++>>> remotes/armbru/tags/pull-error-2019-12-16
>   {
>   int i;
>   GString *g;
> diff --cc target/ppc/kvm_ppc.h
> index 47b08a4030,f22daabf51..00
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@@ -28,7 -28,7 +28,11 @@@ void kvmppc_set_papr(PowerPCCPU *cpu)
>   int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr);
>   void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);
>   int kvmppc_smt_threads(void);
> ++<<< HEAD
>  +void kvmppc_error_append_smt_possible_hint(Error **errp_in);
> ++===
> + void kvmppc_error_append_smt_possible_hint(Error *const *errp);
> ++>>> remotes/armbru/tags/pull-error-2019-12-16
>   int kvmppc_set_smt_threads(int smt);
>   int kvmppc_clear_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits);
>   int kvmppc_or_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits);
> @@@ -164,7 -164,7 +168,11 @@@ static inline int kvmppc_smt_threads(vo
>   return 1;
>   }
>
> ++<<< HEAD
>  +static inline void kvmppc_error_append_smt_possible_hint(Error **errp_in)
> ++===
> + static inline void kvmppc_error_append_smt_possible_hint(Error *const *errp)
> ++>>> remotes/armbru/tags/pull-error-2019-12-16
>   {
>   return;
>   }
>
> Furthermore, it turns out that the conflicts are due to
> different patches from the same author to the same function
> ("ppc: well form kvmppc_hint_smt_possible error hint helper"
> and "ppc: make Error **errp const where it is appropriate")
> which both seem to be addressing broadly the same thing
> but conflict with each other and arrived via different
> pull requests.
>
> So I'm just bouncing this one back for you to fix and
> figure out which version you want...

You got an outdated version of the patch via David's pull request.
Happens.  I'll fix things up and repost.
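
As a rough sketch of what the "Error *const *errp" signature in the
conflicting hunk expresses (the Error type and helper below are hypothetical
stand-ins, not QEMU's): the helper may modify the error object it is given,
but it cannot replace or clear the caller's pointer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct Error {
    char msg[128];
} Error;

/*
 * The pointer *errp is const, the Error it points to is not: the helper
 * may extend the message but cannot reassign the caller's error pointer.
 */
static void append_hint(Error *const *errp, const char *hint)
{
    assert(errp && *errp);
    strncat((*errp)->msg, hint,
            sizeof((*errp)->msg) - strlen((*errp)->msg) - 1);
    /* *errp = NULL;  would be rejected by the compiler */
}
```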

Re: [PULL 34/34] nbd: assert that Error** is not NULL in nbd_iter_channel_error

2019-12-17 Thread Markus Armbruster

Eric Blake  writes:

> On 12/17/19 12:26 AM, Markus Armbruster wrote:
>> From: Vladimir Sementsov-Ogievskiy 
>>
>> All callers of nbd_iter_channel_error() pass the address of a
>> local_err variable, and only call this function if an error has
>> already occurred, using this function to propagate that error.
>> This is already implied by its name (local_err instead of the classic
>> errp), but it is worth additionally stressing this by adding an
>> assertion to make it part of the function contract.
>>
>> The local_err parameter is not here to return information about
>> nbd_iter_channel_error failure. Instead it's assumed to be filled when
>> passed to the function. This is already stressed by its name
>> (local_err, instead of classic errp). Stress it additionally by
>> assertion.
>
> Redundant paragraph, but probably too late to worry about it now that
> we have a pull request.

I'll have to respin anyway.  I'll drop the second paragraph.
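
A minimal sketch of the contract described in the commit message, with
hypothetical types standing in for the NBD ones: the function is only called
once an error has occurred, so it can assert that local_err is both non-NULL
and set before taking ownership of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real QEMU or NBD types. */
typedef struct Error { const char *msg; } Error;
typedef struct Iter { int ret; Error *err; } Iter;

/* Move an already-set error into the iterator; the first error wins. */
static void iter_channel_error(Iter *iter, int ret, Error **local_err)
{
    assert(ret < 0);
    assert(local_err && *local_err);   /* part of the function contract */

    if (!iter->ret) {
        iter->ret = ret;
        iter->err = *local_err;
    }
    /* Real code would free a later error here; this sketch only drops it. */
    *local_err = NULL;
}
```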

Re: [PATCH 0/7] configure: Improve PIE and other linkage

2019-12-17 Thread Fangrui Song


On 2019-12-17, Richard Henderson wrote:

This begins by dropping the -Ttext-segment stuff, which Fangrui Song
correctly points out does not work with lld.  But it's also obsolete,
so instead of adding support for lld's --image-base, remove it all.

Then, remove some other legacy random addresses that were supposed
to apply to softmmu, but didn't really make any sense, and aren't
used anyway when PIE is used, which is the default with a modern
linux distribution.

Then, clean up some of the configure logic surrounding PIE, and its
current non-application to non-x86.

Finally, add support for static-pie linking.


r~


Richard Henderson (7):
 configure: Drop adjustment of textseg
 tcg: Remove softmmu code_gen_buffer fixed address
 configure: Do not force pie=no for non-x86
 configure: Always detect -no-pie toolchain support
 configure: Unnest detection of -z,relro and -z,now
 configure: Override the os default with --disable-pie
 configure: Support -static-pie if requested

accel/tcg/translate-all.c |  37 ++--
configure | 120 --
2 files changed, 41 insertions(+), 116 deletions(-)

--
2.20.1


Thank you for the patch set. I hope this will make that lld qemu user
happy.

How will this patch set affect statically linked user mode binaries?
(qemu-user-static packages on Debian, CentOS, ...)

Re: [PATCH 3/7] configure: Do not force pie=no for non-x86

2019-12-17 Thread Thomas Huth

On 18/12/2019 04.19, Richard Henderson wrote:
> PIE is supported on many other hosts besides x86.
> 
> The default for non-x86 is now the same as x86: pie is used
> if supported, and may be forced via --enable/--disable-pie.

The original commit that introduced this code (40d6444e91c) said:

 "Non-x86 are not changed, as they require TCG changes"

... are these "TCG changes" in place nowadays? Did you check on non-x86
systems? If so, please mention this in the commit message.

 Thomas

Re: [PATCH 2/7] tcg: Remove softmmu code_gen_buffer fixed address

2019-12-17 Thread Thomas Huth

On 18/12/2019 04.19, Richard Henderson wrote:
> The commentary talks about "in concert with the addresses
> assigned in the relevant linker script", except there is no
> linker script for softmmu, nor has there been for some time.
> 
> (Do not confuse the user-only linker script editing that was
> removed in the previous patch, because user-only does not
> use this code_gen_buffer allocation method.)
> 
> Signed-off-by: Richard Henderson 
> ---
>  accel/tcg/translate-all.c | 37 +
>  1 file changed, 5 insertions(+), 32 deletions(-)
> 
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 9f48da9472..88468a1c08 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1032,47 +1032,20 @@ static inline void *alloc_code_gen_buffer(void)
>  {
>  int prot = PROT_WRITE | PROT_READ | PROT_EXEC;
>  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
> -uintptr_t start = 0;
>  size_t size = tcg_ctx->code_gen_buffer_size;
>  void *buf;
>  
> -/* Constrain the position of the buffer based on the host cpu.
> -   Note that these addresses are chosen in concert with the
> -   addresses assigned in the relevant linker script file.  */
> -# if defined(__PIE__) || defined(__PIC__)
> -/* Don't bother setting a preferred location if we're building
> -   a position-independent executable.  We're more likely to get
> -   an address near the main executable if we let the kernel
> -   choose the address.  */
> -# elif defined(__x86_64__) && defined(MAP_32BIT)
> -/* Force the memory down into low memory with the executable.
> -   Leave the choice of exact location with the kernel.  */
> -flags |= MAP_32BIT;
> -/* Cannot expect to map more than 800MB in low memory.  */
> -if (size > 800u * 1024 * 1024) {
> -tcg_ctx->code_gen_buffer_size = size = 800u * 1024 * 1024;
> -}
> -# elif defined(__sparc__)
> -start = 0x4000ul;
> -# elif defined(__s390x__)
> -start = 0x9000ul;
> -# elif defined(__mips__)
> -#  if _MIPS_SIM == _ABI64
> -start = 0x12800ul;
> -#  else
> -start = 0x0800ul;
> -#  endif
> -# endif
> -
> -buf = mmap((void *)start, size, prot, flags, -1, 0);
> +buf = mmap(NULL, size, prot, flags, -1, 0);
>  if (buf == MAP_FAILED) {
>  return NULL;
>  }
>  
>  #ifdef __mips__
>  if (cross_256mb(buf, size)) {
> -/* Try again, with the original still mapped, to avoid re-acquiring
> -   that 256mb crossing.  This time don't specify an address.  */
> +/*
> + * Try again, with the original still mapped, to avoid re-acquiring
> + * the same 256mb crossing.
> + */
>  size_t size2;
>  void *buf2 = mmap(NULL, size, prot, flags, -1, 0);
>  switch ((int)(buf2 != MAP_FAILED)) {
> 

Reviewed-by: Thomas Huth
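
For reference, the simplified allocation boils down to an anonymous mmap()
with no address hint. The sketch below uses a hypothetical helper name and
leaves out PROT_EXEC so that it also runs on systems that forbid
writable-and-executable mappings; the real buffer needs it.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS with strict -std= settings */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map size bytes of anonymous memory, letting the kernel pick the address. */
static void *alloc_buffer(size_t size)
{
    int prot = PROT_READ | PROT_WRITE;   /* the real code also sets PROT_EXEC */
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *buf;

    assert(size > 0);
    buf = mmap(NULL, size, prot, flags, -1, 0);   /* NULL: no preferred address */
    return buf == MAP_FAILED ? NULL : buf;
}
```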

Re: [PATCH 1/7] configure: Drop adjustment of textseg

2019-12-17 Thread Thomas Huth

On 18/12/2019 04.19, Richard Henderson wrote:
> This adjustment was random and unnecessary.  The user mode
> startup code in probe_guest_base() will choose a value for
> guest_base that allows the host qemu binary to not conflict
> with the guest binary.
> 
> With modern distributions, this isn't even used, as the default
> is PIE, which does the same job in a more portable way.
> 
> Signed-off-by: Richard Henderson 
> ---
>  configure | 47 ---
>  1 file changed, 47 deletions(-)
> 
> diff --git a/configure b/configure
> index 84b413dbfc..255ac432af 100755
> --- a/configure
> +++ b/configure
> @@ -6292,49 +6292,6 @@ if test "$cpu" = "s390x" ; then
>fi
>  fi
>  
> -# Probe for the need for relocating the user-only binary.
> -if ( [ "$linux_user" = yes ] || [ "$bsd_user" = yes ] ) && [ "$pie" = no ]; 
> then
> -  textseg_addr=
> -  case "$cpu" in
> -arm | i386 | ppc* | s390* | sparc* | x86_64 | x32)
> -  # ??? Rationale for choosing this address
> -  textseg_addr=0x6000
> -  ;;
> -mips)
> -  # A 256M aligned address, high in the address space, with enough
> -  # room for the code_gen_buffer above it before the stack.
> -  textseg_addr=0x6000
> -  ;;
> -  esac
> -  if [ -n "$textseg_addr" ]; then
> -cat > $TMPC <<EOF
> -int main(void) { return 0; }
> -EOF
> -textseg_ldflags="-Wl,-Ttext-segment=$textseg_addr"
> -if ! compile_prog "" "$textseg_ldflags"; then
> -  # In case ld does not support -Ttext-segment, edit the default linker
> -  # script via sed to set the .text start addr.  This is needed on 
> FreeBSD
> -  # at least.
> -  if ! $ld --verbose >/dev/null 2>&1; then
> -error_exit \
> -"We need to link the QEMU user mode binaries at a" \
> -"specific text address. Unfortunately your linker" \
> -"doesn't support either the -Ttext-segment option or" \
> -"printing the default linker script with --verbose." \
> -"If you don't want the user mode binaries, pass the" \
> -"--disable-user option to configure."
> -  fi
> -
> -  $ld --verbose | sed \
> --e '1,/==/d' \
> --e '/==/,$d' \
> --e "s/[.] = [0-9a-fx]* [+] SIZEOF_HEADERS/. = $textseg_addr + 
> SIZEOF_HEADERS/" \
> --e "s/__executable_start = [0-9a-fx]*/__executable_start = 
> $textseg_addr/" > config-host.ld
> -  textseg_ldflags="-Wl,-T../config-host.ld"

config-host.ld is mentioned one more time in the main "Makefile" ... I
think you could remove it from there now, too.

With such a hunk added:

Reviewed-by: Thomas Huth

Re: [PATCH] docker: gtester is no longer used

2019-12-17 Thread Thomas Huth

On 18/12/2019 02.30, Paolo Bonzini wrote:
> We are using tap-driver.pl, so gtester no longer needs to be installed
> to run the testsuite in docker-based tests.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  tests/docker/common.rc | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/tests/docker/common.rc b/tests/docker/common.rc
> index 512202b..02cd67a 100755
> --- a/tests/docker/common.rc
> +++ b/tests/docker/common.rc
> @@ -53,12 +53,7 @@ check_qemu()
>  INVOCATION="$@"
>  fi
>  
> -if command -v gtester > /dev/null 2>&1 && \
> -   gtester --version > /dev/null 2>&1; then
> -make $MAKEFLAGS $INVOCATION
> -else
> -echo "No working gtester, skipping make $INVOCATION"
> -fi
> +make $MAKEFLAGS $INVOCATION
>  }
>  
>  test_fail()
> 

Reviewed-by: Thomas Huth

Re: [PATCH 0/6] Fix more GCC9 -O3 warnings

2019-12-17 Thread Markus Armbruster

"Chubb, Peter (Data61, Kensington NSW)" 
writes:

>> "Philippe" == Philippe Mathieu-Daudé  writes:
>
> Philippe> Fix some trivial warnings when building with -O3.
>
> For compatibility with lint and other older checkers, it'd be good to keep
> this as /* FALLTHROUGH */ (which gcc should accept according to its
> manual).

We have hundreds of /* fall through */ comments already.

> Fixing the comments' placement is a different matter, and should be
> done.  Seems to me that until gcc started warning for this, no one had
> actually run a checker, and the comments were just for human info.
>
> Peter C

Re: virtio capabilities

2019-12-17 Thread Michael S. Tsirkin

On Wed, Dec 18, 2019 at 04:19:57PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 13/12/2019 19:36, Michael S. Tsirkin wrote:
> > On Fri, Dec 13, 2019 at 07:29:40PM +1100, Alexey Kardashevskiy wrote:
> >>
> >>
> >> On 13/12/2019 18:24, Michael S. Tsirkin wrote:
> >>> On Fri, Dec 13, 2019 at 05:05:05PM +1100, Alexey Kardashevskiy wrote:
>  Hi!
> 
>  I am having an issue with capabilities (hopefully the chunk formatting
>  won't break).
> 
>  The problem is that when virtio_pci_find_capability() reads
>  pci_find_capability(dev, PCI_CAP_ID_VNDR), 0 is returned; if repeated,
>  it returns a valid number (0x84). Timing seems to matter. pci_cfg_read
>  trace shows that that first time read does not reach QEMU but others do
>  reach QEMU and return what is expected.
> 
>  How to debug this, any quick ideas?
>  The config space is not a MMIO BAR
>  or KVM memory slot or anything like this, right? :) Thanks,
> >>>
> >>> Depends on the platform.
> >>>
> >>> E.g. on x86, when using cf8/cfc pair, if guest doesn't
> >>
> >>
> >> Is there an easy way to tell if it is this "cf8/cfc" case?
> >>
> >> I have these bars, is any of them related to cf8/cfc? Thanks,
> >>
> >> root@le-dbg:~# (qemu) info mtree -f
> >> FlatView #0
> >>  AS "memory", root: system
> >>  AS "cpu-memory-0", root: system
> >>  Root memory region: system
> >>   - (prio 0, ram): ppc_spapr.ram kvm
> >>   20008000-2000802f (prio 0, i/o): msix-table
> >>   20008800-20008807 (prio 0, i/o): msix-pba
> >>   2100-21000fff (prio 0, i/o): virtio-pci-common
> >>   21001000-21001fff (prio 0, i/o): virtio-pci-isr
> >>   21002000-21002fff (prio 0, i/o): virtio-pci-device
> >>   21003000-21003fff (prio 0, i/o): virtio-pci-notify
> >>
> > 
> > 
> > No, you want stuff in hw/ppc/spapr_pci.c
> 
> 
> The problem was with our firmware, fixing that now.
> 
> Out of curiosity. I do not see cf8/cfc on x86 either, or I just do not
> recognize those, what is this cf8/cfc?

E.g. i440fx:

static void i440fx_pcihost_realize(DeviceState *dev, Error **errp)
{
    PCIHostState *s = PCI_HOST_BRIDGE(dev);
    SysBusDevice *sbd = SYS_BUS_DEVICE(dev);

    sysbus_add_io(sbd, 0xcf8, &s->conf_mem);
    sysbus_init_ioports(sbd, 0xcf8, 4);

    sysbus_add_io(sbd, 0xcfc, &s->data_mem);
    sysbus_init_ioports(sbd, 0xcfc, 4);

    /* register i440fx 0xcf8 port as coalesced pio */
    memory_region_set_flush_coalesced(&s->data_mem);
    memory_region_add_coalescing(&s->conf_mem, 0, 4);
}
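
As background for the two ports registered above: with the classic x86
configuration mechanism, the guest writes an address word to port 0xcf8 and
then reads or writes the selected register through port 0xcfc. A sketch of
how that address word is laid out, using a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Build the 32-bit value a guest writes to port 0xcf8. */
static uint32_t pci_conf1_address(uint8_t bus, uint8_t dev, uint8_t fn,
                                  uint8_t reg)
{
    assert(dev < 32 && fn < 8);
    return 0x80000000u              /* enable bit */
         | ((uint32_t)bus << 16)
         | ((uint32_t)dev << 11)
         | ((uint32_t)fn << 8)
         | (reg & 0xfcu);           /* dword-aligned register offset */
}
```

Because these are I/O ports and not memory, they do not show up in the
"memory" address space listing quoted below.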



> Thanks,
> 
> FlatView #2
> 
>  AS "memory", root: system
> 
>  AS "cpu-memory-0", root: system
> 
>  AS "piix3-ide", root: bus master container
> 
>  AS "virtio-net-pci", root: bus master container
> 
>  Root memory region: system
> 
>   -000b (prio 0, ram): pc.ram kvm
> 
>   000c-000c0fff (prio 0, rom): pc.ram
> @000c kvm
>   000c1000-000c3fff (prio 0, ram): pc.ram
> @000c1000 kvm
>   000c4000-000e7fff (prio 0, rom): pc.ram
> @000c4000 kvm
>   000e8000-000e (prio 0, ram): pc.ram
> @000e8000 kvm
>   000f-000f (prio 0, rom): pc.ram
> @000f kvm
>   0010-7fff (prio 0, ram): pc.ram
> @0010 kvm
>   febc-febc002f (prio 0, i/o): msix-table
> 
>   febc0800-febc0807 (prio 0, i/o): msix-pba
> 
>   febfc000-febfcfff (prio 0, i/o): virtio-pci-common
> 
>   febfd000-febfdfff (prio 0, i/o): virtio-pci-isr
> 
>   febfe000-febfefff (prio 0, i/o): virtio-pci-device
> 
>   febff000-febf (prio 0, i/o): virtio-pci-notify
> 
>   fec0-fec00fff (prio 0, i/o): kvm-ioapic
> 
>   fed0-fed003ff (prio 0, i/o): hpet
> 
>   fee0-feef (prio 4096, i/o): kvm-apic-msi
> 
>   fffc- (prio 0, rom): pc.bios kvm
> 
> 
> 
> -- 
> Alexey

Re: [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits

2019-12-17 Thread Tao Xu


On 12/18/2019 9:33 AM, Tao Xu wrote:

On 12/17/2019 6:25 PM, Markus Armbruster wrote:

Tao Xu  writes:


On 12/5/19 11:29 PM, Markus Armbruster wrote:

Tao Xu  writes:


Parse input string both as a double and as a uint64_t, then use the
method which consumes more characters. Update the related test cases.

Signed-off-by: Tao Xu 
---

[...]

diff --git a/util/cutils.c b/util/cutils.c
index 77acadc70a..b08058c57c 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const 
char **end,

 const char default_suffix, int64_t unit,
 uint64_t *result)
   {
-    int retval;
-    const char *endptr;
+    int retval, retd, retu;
+    const char *suffix, *suffixd, *suffixu;
   unsigned char c;
   int mul_required = 0;
-    double val, mul, integral, fraction;
+    bool use_strtod;
+    uint64_t valu;
+    double vald, mul, integral, fraction;


Note for later: @mul is double.


+
+    retd = qemu_strtod_finite(nptr, &suffixd, &vald);
+    retu = qemu_strtou64(nptr, &suffixu, 0, &valu);


Note for later: passing 0 to base accepts octal and hexadecimal
integers.


+    use_strtod = strlen(suffixd) < strlen(suffixu);
+
+    /*
+ * Parse @nptr both as a double and as a uint64_t, then use 
the method

+ * which consumes more characters.
+ */


The comment is in a funny place.  I'd put it right before the
qemu_strtod_finite() line.


+    if (use_strtod) {
+    suffix = suffixd;
+    retval = retd;
+    } else {
+    suffix = suffixu;
+    retval = retu;
+    }
  -    retval = qemu_strtod_finite(nptr, &endptr, &val);
   if (retval) {
   goto out;
   }


This is even more subtle than it looks.

A close reading of the function contracts leads to three cases for each
conversion:

* parse error (including infinity and NaN)

    @retu / @retd is -EINVAL
    @valu / @vald is uninitialized
    @suffixu / @suffixd is @nptr

* range error

    @retu / @retd is -ERANGE
    @valu / @vald is our best approximation of the conversion result
    @suffixu / @suffixd points to the first character not consumed 
by the

    conversion.

    Sub-cases:

    - uint64_t overflow

  We know the conversion result exceeds UINT64_MAX.

    - double overflow

  we know the conversion result's magnitude exceeds the largest
  representable finite double DBL_MAX.

    - double underflow

  we know the conversion result is close to zero (closer than 
DBL_MIN,

  the smallest normalized positive double).

* success

    @retu / @retd is 0
    @valu / @vald is the conversion result
    @suffixu / @suffixd points to the first character not consumed 
by the

    conversion.

This leads to a matrix (parse error, uint64_t overflow, success) x
(parse error, double overflow, double underflow, success).  We need to
check the code does what we want for each element of this matrix, and
document any behavior that's not perfectly obvious.

(success, success): we pick uint64_t if qemu_strtou64() consumed more
characters than qemu_strtod_finite(), else double.  "More" is important
here; when they consume the same characters, we *need* to use the
uint64_t result.  Example: for "18446744073709551615", we need to use
uint64_t 18446744073709551615, not double 18446744073709551616.0.  But
for "18446744073709551616.", we need to use the double.  Good.


Also fun: for "0123", we use uint64_t 83, not double 123.0.  But for
"0123.", we use 123.0, not 83.

Do we really want to accept octal and hexadecimal integers?
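
The tie-breaking rule discussed above can be sketched with the standard C
converters. This is an illustration only: it uses strtod() and strtoull()
in place of the QEMU wrappers, the Parsed type and parse_number() name are
hypothetical, and all error handling (parse errors, overflow, underflow) is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    bool is_double;       /* which conversion was chosen */
    uint64_t u;
    double d;
    const char *suffix;   /* first character not consumed */
} Parsed;

/* Parse with both converters; keep the integer unless strtod consumed more. */
static Parsed parse_number(const char *s)
{
    char *endd, *endu;
    Parsed p = {0};

    assert(s != NULL);
    p.d = strtod(s, &endd);
    p.u = strtoull(s, &endu, 0);      /* base 0: octal and hex are accepted */
    p.is_double = endd > endu;        /* on a tie the integer wins */
    p.suffix = p.is_double ? endd : endu;
    return p;
}
```

With base 0, "0123" is a tie and yields the octal integer 83, while "0123."
lets strtod() consume one more character and yields 123.0, which is the
surprising pair pointed out above.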



Thank you for reminding me. Octal and hexadecimal may bring more
confusion. I will use qemu_strtou64(nptr, &suffixu, 10, &valu) and add a
test for input like "0123".




Hi Markus,

After I switched to qemu_strtou64(nptr, &suffixu, 10, &valu), another
question came up. Because qemu_strtod_finite() supports hexadecimal input,
such input will then be parsed as a double, and large hexadecimal integers
will be rounded. So there may be two solutions:


1: use qemu_strtou64(nptr, &suffixu, 0, &valu) and parse octal as
decimal. This keeps hexadecimal valid, as it is now.


"0123" --> 123; "0x123" --> 291

2: use qemu_strtou64(nptr, &suffixu, 10, &valu) and reject octal and
hexadecimal.


"0123" --> Error; "0x123" --> Error

Re: virtio capabilities

2019-12-17 Thread Alexey Kardashevskiy




On 13/12/2019 19:36, Michael S. Tsirkin wrote:
> On Fri, Dec 13, 2019 at 07:29:40PM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 13/12/2019 18:24, Michael S. Tsirkin wrote:
>>> On Fri, Dec 13, 2019 at 05:05:05PM +1100, Alexey Kardashevskiy wrote:
 Hi!

 I am having an issue with capabilities (hopefully the chunk formatting
 won't break).

 The problem is that when virtio_pci_find_capability() reads
 pci_find_capability(dev, PCI_CAP_ID_VNDR), 0 is returned; if repeated,
 it returns a valid number (0x84). Timing seems to matter. pci_cfg_read
 trace shows that that first time read does not reach QEMU but others do
 reach QEMU and return what is expected.

 How to debug this, any quick ideas?
 The config space is not a MMIO BAR
 or KVM memory slot or anything like this, right? :) Thanks,
>>>
>>> Depends on the platform.
>>>
>>> E.g. on x86, when using cf8/cfc pair, if guest doesn't
>>
>>
>> Is there an easy way to tell if it is this "cf8/cfc" case?
>>
>> I have these bars, is any of them related to cf8/cfc? Thanks,
>>
>> root@le-dbg:~# (qemu) info mtree -f
>> FlatView #0
>>  AS "memory", root: system
>>  AS "cpu-memory-0", root: system
>>  Root memory region: system
>>   - (prio 0, ram): ppc_spapr.ram kvm
>>   20008000-2000802f (prio 0, i/o): msix-table
>>   20008800-20008807 (prio 0, i/o): msix-pba
>>   2100-21000fff (prio 0, i/o): virtio-pci-common
>>   21001000-21001fff (prio 0, i/o): virtio-pci-isr
>>   21002000-21002fff (prio 0, i/o): virtio-pci-device
>>   21003000-21003fff (prio 0, i/o): virtio-pci-notify
>>
> 
> 
> No, you want stuff in hw/ppc/spapr_pci.c


The problem was with our firmware, fixing that now.

Out of curiosity. I do not see cf8/cfc on x86 either, or I just do not
recognize those, what is this cf8/cfc? Thanks,

FlatView #2

 AS "memory", root: system

 AS "cpu-memory-0", root: system

 AS "piix3-ide", root: bus master container

 AS "virtio-net-pci", root: bus master container

 Root memory region: system

  -000b (prio 0, ram): pc.ram kvm

  000c-000c0fff (prio 0, rom): pc.ram
@000c kvm
  000c1000-000c3fff (prio 0, ram): pc.ram
@000c1000 kvm
  000c4000-000e7fff (prio 0, rom): pc.ram
@000c4000 kvm
  000e8000-000e (prio 0, ram): pc.ram
@000e8000 kvm
  000f-000f (prio 0, rom): pc.ram
@000f kvm
  0010-7fff (prio 0, ram): pc.ram
@0010 kvm
  febc-febc002f (prio 0, i/o): msix-table

  febc0800-febc0807 (prio 0, i/o): msix-pba

  febfc000-febfcfff (prio 0, i/o): virtio-pci-common

  febfd000-febfdfff (prio 0, i/o): virtio-pci-isr

  febfe000-febfefff (prio 0, i/o): virtio-pci-device

  febff000-febf (prio 0, i/o): virtio-pci-notify

  fec0-fec00fff (prio 0, i/o): kvm-ioapic

  fed0-fed003ff (prio 0, i/o): hpet

  fee0-feef (prio 4096, i/o): kvm-apic-msi

  fffc- (prio 0, rom): pc.bios kvm



-- 
Alexey

[PATCH 1/4] qemu-file: Don't do IO after shutdown

2019-12-17 Thread Juan Quintela

Make sure that we do neither reads nor writes after shutdown of the
QEMUFile.

Signed-off-by: Juan Quintela 
---
 migration/qemu-file.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 26fb25ddc1..1e5543a279 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -53,6 +53,8 @@ struct QEMUFile {
 
 int last_error;
 Error *last_error_obj;
+/* has the file been shut down */
+bool shutdown;
 };
 
 /*
@@ -61,6 +63,7 @@ struct QEMUFile {
  */
 int qemu_file_shutdown(QEMUFile *f)
 {
+f->shutdown = true;
 if (!f->ops->shut_down) {
 return -ENOSYS;
 }
@@ -214,6 +217,9 @@ void qemu_fflush(QEMUFile *f)
 return;
 }
 
+if (f->shutdown) {
+return;
+}
 if (f->iovcnt > 0) {
 expect = iov_size(f->iov, f->iovcnt);
 ret = f->ops->writev_buffer(f->opaque, f->iov, f->iovcnt, f->pos,
@@ -328,6 +334,10 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
 f->buf_index = 0;
 f->buf_size = pending;
 
+if (f->shutdown) {
+return 0;
+}
+
 len = f->ops->get_buffer(f->opaque, f->buf + pending, f->pos,
                          IO_BUF_SIZE - pending, &local_error);
 if (len > 0) {
@@ -642,6 +652,9 @@ int64_t qemu_ftell(QEMUFile *f)
 
 int qemu_file_rate_limit(QEMUFile *f)
 {
+if (f->shutdown) {
+return 1;
+}
 if (qemu_file_get_error(f)) {
 return 1;
 }
-- 
2.23.0
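
The idea of the patch can be sketched in isolation: once a file is marked as
shut down, every later IO entry point returns early. The File type and
function names below are hypothetical and much simpler than QEMUFile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a buffered output file. */
typedef struct File {
    char out[64];      /* stands in for the underlying channel */
    size_t out_len;
    char buf[16];      /* pending, not yet written bytes */
    size_t buf_len;
    bool shutdown;
} File;

static void file_shutdown(File *f)
{
    f->shutdown = true;
}

/* Push buffered bytes to the channel unless the file was shut down. */
static void file_flush(File *f)
{
    assert(f != NULL);
    if (f->shutdown) {
        return;                       /* no IO after shutdown */
    }
    assert(f->out_len + f->buf_len <= sizeof(f->out));
    memcpy(f->out + f->out_len, f->buf, f->buf_len);
    f->out_len += f->buf_len;
    f->buf_len = 0;
}
```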

[PATCH 3/4] migration-test: Make sure that multifd and cancel works

2019-12-17 Thread Juan Quintela

Test that this sequence works:

- launch source
- launch target
- start migration
- cancel migration
- relaunch target
- do migration again

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 108 -
 1 file changed, 107 insertions(+), 1 deletion(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 7588f50b9b..1c93b3e5bc 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -527,6 +527,14 @@ static void migrate_recover(QTestState *who, const char 
*uri)
 qobject_unref(rsp);
 }
 
+static void migrate_cancel(QTestState *who)
+{
+QDict *rsp;
+
+rsp = wait_command(who, "{ 'execute': 'migrate_cancel' }");
+qobject_unref(rsp);
+}
+
 static void migrate_set_capability(QTestState *who, const char *capability,
bool value)
 {
@@ -583,6 +591,8 @@ static void migrate_postcopy_start(QTestState *from, 
QTestState *to)
 typedef struct {
 bool hide_stderr;
 bool use_shmem;
+/* only launch the target process */
+bool only_target;
 char *opts_source;
 char *opts_target;
 } MigrateStart;
@@ -704,7 +714,9 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  arch_source, shmem_opts, args->opts_source,
  ignore_stderr);
 g_free(arch_source);
-*from = qtest_init(cmd_source);
+if (!args->only_target) {
+*from = qtest_init(cmd_source);
+}
 g_free(cmd_source);
 
 cmd_target = g_strdup_printf("-machine %saccel=kvm:tcg%s "
@@ -1470,6 +1482,99 @@ static void test_multifd_tcp_zstd(void)
 test_multifd_tcp("zstd");
 }
 
+/*
+ * This test does:
+ *  source   target
+ *   migrate_incoming
+ * migrate
+ * migrate_cancel
+ *   launch another target
+ * migrate
+ *
+ *  And see that it works
+ */
+
+static void test_multifd_tcp_cancel(void)
+{
+MigrateStart *args = migrate_start_new();
+QTestState *from, *to;
+QDict *rsp;
+char *uri;
+
+if (test_migrate_start(&from, &to, "defer", args)) {
+return;
+}
+
+/*
+ * We want to pick a speed slow enough that the test completes
+ * quickly, but that it doesn't complete precopy even on a slow
+ * machine, so also set the downtime.
+ */
+/* 1 ms should make it not converge*/
+migrate_set_parameter_int(from, "downtime-limit", 1);
+/* 1GB/s */
+migrate_set_parameter_int(from, "max-bandwidth", 10);
+
+migrate_set_parameter_int(from, "multifd-channels", 16);
+migrate_set_parameter_int(to, "multifd-channels", 16);
+
+migrate_set_capability(from, "multifd", "true");
+migrate_set_capability(to, "multifd", "true");
+
+/* Start incoming migration from the 1st socket */
+rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
+   "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+qobject_unref(rsp);
+
+/* Wait for the first serial output from the source */
+wait_for_serial("src_serial");
+
+uri = migrate_get_socket_address(to, "socket-address");
+
+migrate(from, uri, "{}");
+
+wait_for_migration_pass(from);
+
+printf("before cancel\n");
+migrate_cancel(from);
+printf("after cancel\n");
+
+args = migrate_start_new();
+args->only_target = true;
+
+if (test_migrate_start(&from, &to, "defer", args)) {
+return;
+}
+
+migrate_set_parameter_int(to, "multifd-channels", 16);
+
+migrate_set_capability(to, "multifd", "true");
+
+/* Start incoming migration from the 1st socket */
+rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
+   "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+qobject_unref(rsp);
+
+/* 300ms it should converge */
+migrate_set_parameter_int(from, "downtime-limit", 600);
+
+uri = migrate_get_socket_address(to, "socket-address");
+
+migrate(from, uri, "{}");
+
+wait_for_migration_pass(from);
+
+if (!got_stop) {
+qtest_qmp_eventwait(from, "STOP");
+}
+qtest_qmp_eventwait(to, "RESUME");
+
+wait_for_serial("dest_serial");
+wait_for_migration_complete(from);
+test_migrate_end(from, to, true);
+free(uri);
+}
+
 int main(int argc, char **argv)
 {
 char template[] = "/tmp/migration-test-XX";
@@ -1537,6 +1642,7 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/multifd/tcp/none", test_multifd_tcp_none);
 qtest_add_func("/migration/multifd/tcp/zlib", test_multifd_tcp_zlib);
 qtest_add_func("/migration/multifd/tcp/zstd", test_multifd_tcp_zstd);
+qtest_add_func("/migration/multifd/tcp/cancel", test_multifd_tcp_cancel);
 
 ret = g_test_run();
 
-- 
2.23.0

[PATCH 0/4] Fix multifd + cancel + multifd

2019-12-17 Thread Juan Quintela

Hi

This series:
- create a test that does:
  launch multifd on target
  migrate to target
  cancel on source
  create another target
  migrate again

- And fixes the cases that made it fail:
* Make sure that we never attempt IO after shutdown/error

Please, review.

Juan Quintela (4):
  qemu-file: Don't do IO after shutdown
  multifd: Make sure that we don't do any IO after an error
  migration-test: Make sure that multifd and cancel works
  migration: Make sure that we don't call write() in case of error

 migration/qemu-file.c  |  13 +
 migration/ram.c|  41 
 tests/migration-test.c | 108 -
 3 files changed, 152 insertions(+), 10 deletions(-)

-- 
2.23.0

[PATCH 2/4] multifd: Make sure that we don't do any IO after an error

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/ram.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index db90237977..4b44578e57 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4132,7 +4132,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 {
 RAMState **temp = opaque;
 RAMState *rs = *temp;
-int ret;
+int ret = 0;
 int i;
 int64_t t0;
 int done = 0;
@@ -4203,12 +4203,14 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 ram_control_after_iterate(f, RAM_CONTROL_ROUND);
 
 out:
-multifd_send_sync_main(rs);
-qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-qemu_fflush(f);
-ram_counters.transferred += 8;
+if (ret >= 0) {
+multifd_send_sync_main(rs);
+qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+qemu_fflush(f);
+ram_counters.transferred += 8;
 
-ret = qemu_file_get_error(f);
+ret = qemu_file_get_error(f);
+}
 if (ret < 0) {
 return ret;
 }
@@ -4260,9 +4262,11 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 ram_control_after_iterate(f, RAM_CONTROL_FINISH);
 }
 
-multifd_send_sync_main(rs);
-qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-qemu_fflush(f);
+if (ret >= 0) {
+multifd_send_sync_main(rs);
+qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+qemu_fflush(f);
+}
 
 return ret;
 }
-- 
2.23.0

[PATCH 4/4] migration: Make sure that we don't call write() in case of error

2019-12-17 Thread Juan Quintela

If we are exiting due to an error or because we have finished, don't
touch the channel with even a single IO operation.
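
A minimal sketch of the "exiting" guard, assuming C11 <stdatomic.h> in
place of QEMU's own atomic helpers. The names sketch_terminate_threads()
and sketch_send_pages() are hypothetical and only mirror the shape of the
functions touched below.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int exiting;   /* only 0 and 1 are valid values */
static int terminate_runs;   /* counts how often teardown really ran */

/* Run the teardown once, even if an error races with a normal exit. */
static void sketch_terminate_threads(void)
{
    if (atomic_exchange(&exiting, 1)) {
        return;              /* teardown was already started elsewhere */
    }
    terminate_runs++;
}

/* Producers check the flag first and never touch the channel afterwards. */
static int sketch_send_pages(void)
{
    if (atomic_load(&exiting)) {
        return -1;
    }
    return 0;                /* the real code would queue pages here */
}
```

The exchange returns the previous value, so only the first caller sees 0
and performs the teardown.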

Signed-off-by: Juan Quintela 
---
 migration/ram.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 4b44578e57..909ef6d237 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1601,6 +1601,12 @@ struct {
 QemuSemaphore channels_ready;
 /* multifd ops */
 MultiFDMethods *ops;
+/*
+ * Have we already run terminate threads.  There is a race when it
+ * happens that we got one error while we are exiting.
+ * We will use atomic operations.  Only valid values are 0 and 1.
+ */
+int exiting;
 } *multifd_send_state;
 
 /*
@@ -1629,6 +1635,10 @@ static int multifd_send_pages(RAMState *rs)
 MultiFDPages_t *pages = multifd_send_state->pages;
 uint64_t transferred;
 
+if (atomic_read(&multifd_send_state->exiting)) {
+return -1;
+}
+
 qemu_sem_wait(&multifd_send_state->channels_ready);
 for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
 p = &multifd_send_state->params[i];
@@ -1710,6 +1720,10 @@ static void multifd_send_terminate_threads(Error *err)
 }
 }
 
+if (atomic_xchg(&multifd_send_state->exiting, 1)) {
+return;
+}
+
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -1824,6 +1838,10 @@ static void *multifd_send_thread(void *opaque)
 
 while (true) {
 qemu_sem_wait(&p->sem);
+
+if (atomic_read(&multifd_send_state->exiting)) {
+break;
+}
 qemu_mutex_lock(&p->mutex);
 
 if (p->pending_job) {
@@ -1938,6 +1956,7 @@ int multifd_save_setup(Error **errp)
 multifd_send_state->pages = multifd_pages_init(page_count);
 qemu_sem_init(&multifd_send_state->channels_ready, 0);
 multifd_send_state->ops = multifd_ops[migrate_multifd_method()];
+atomic_set(&multifd_send_state->exiting, 0);
 
 for (i = 0; i < thread_count; i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
-- 
2.23.0

[PATCH v3 ppc-for-5.0 1/2] linux-headers: Update

2019-12-17 Thread Bharata B Rao

Update to mainline commit: d1eef1c61974 ("Linux 5.5-rc2")

Signed-off-by: Bharata B Rao 
---
 include/standard-headers/asm-x86/bootparam.h  |  7 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h | 15 +++-
 include/standard-headers/drm/drm_fourcc.h | 28 ++-
 .../linux/input-event-codes.h | 77 +++
 include/standard-headers/linux/pci_regs.h |  3 +
 .../standard-headers/rdma/vmw_pvrdma-abi.h|  5 ++
 linux-headers/linux/kvm.h |  1 +
 7 files changed, 132 insertions(+), 4 deletions(-)

diff --git a/include/standard-headers/asm-x86/bootparam.h 
b/include/standard-headers/asm-x86/bootparam.h
index a6f7cf535e..072e2ed546 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_X86_BOOTPARAM_H
 #define _ASM_X86_BOOTPARAM_H
 
-/* setup_data types */
+/* setup_data/setup_indirect types */
 #define SETUP_NONE 0
 #define SETUP_E820_EXT 1
 #define SETUP_DTB  2
@@ -11,6 +11,11 @@
 #define SETUP_APPLE_PROPERTIES 5
 #define SETUP_JAILHOUSE 6
 
+#define SETUP_INDIRECT (1<<31)
+
+/* SETUP_INDIRECT | max(SETUP_*) */
+#define SETUP_TYPE_MAX (SETUP_INDIRECT | SETUP_JAILHOUSE)
+
 /* ram_size flags */
 #define RAMDISK_IMAGE_START_MASK   0x07FF
 #define RAMDISK_PROMPT_FLAG 0x8000
diff --git 
a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h 
b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h
index d019872608..a5a1c8234e 100644
--- a/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h
+++ b/include/standard-headers/drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h
@@ -58,7 +58,8 @@
 #define PVRDMA_ROCEV1_VERSION  17
 #define PVRDMA_ROCEV2_VERSION  18
 #define PVRDMA_PPN64_VERSION   19
-#define PVRDMA_VERSION PVRDMA_PPN64_VERSION
+#define PVRDMA_QPHANDLE_VERSION 20
+#define PVRDMA_VERSION PVRDMA_QPHANDLE_VERSION
 
 #define PVRDMA_BOARD_ID 1
 #define PVRDMA_REV_ID  1
@@ -581,6 +582,17 @@ struct pvrdma_cmd_create_qp_resp {
uint32_t max_inline_data;
 };
 
+struct pvrdma_cmd_create_qp_resp_v2 {
+   struct pvrdma_cmd_resp_hdr hdr;
+   uint32_t qpn;
+   uint32_t qp_handle;
+   uint32_t max_send_wr;
+   uint32_t max_recv_wr;
+   uint32_t max_send_sge;
+   uint32_t max_recv_sge;
+   uint32_t max_inline_data;
+};
+
 struct pvrdma_cmd_modify_qp {
struct pvrdma_cmd_hdr hdr;
uint32_t qp_handle;
@@ -663,6 +675,7 @@ union pvrdma_cmd_resp {
struct pvrdma_cmd_create_cq_resp create_cq_resp;
struct pvrdma_cmd_resize_cq_resp resize_cq_resp;
struct pvrdma_cmd_create_qp_resp create_qp_resp;
+   struct pvrdma_cmd_create_qp_resp_v2 create_qp_resp_v2;
struct pvrdma_cmd_query_qp_resp query_qp_resp;
struct pvrdma_cmd_destroy_qp_resp destroy_qp_resp;
struct pvrdma_cmd_create_srq_resp create_srq_resp;
diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index a308c91b4f..46d279f515 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -68,7 +68,7 @@ extern "C" {
 #define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
 
-#define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of 
little endian */
+#define DRM_FORMAT_BIG_ENDIAN (1U<<31) /* format is big endian instead of 
little endian */
 
 /* Reserve 0 for the invalid format specifier */
 #define DRM_FORMAT_INVALID 0
@@ -647,7 +647,21 @@ extern "C" {
  * Further information on the use of AFBC modifiers can be found in
  * Documentation/gpu/afbc.rst
  */
-#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode)   fourcc_mod_code(ARM, 
__afbc_mode)
+
+/*
+ * The top 4 bits (out of the 56 bits alloted for specifying vendor specific
+ * modifiers) denote the category for modifiers. Currently we have only two
+ * categories of modifiers ie AFBC and MISC. We can have a maximum of sixteen
+ * different categories.
+ */
+#define DRM_FORMAT_MOD_ARM_CODE(__type, __val) \
+   fourcc_mod_code(ARM, ((uint64_t)(__type) << 52) | ((__val) & 
0x000fULL))
+
+#define DRM_FORMAT_MOD_ARM_TYPE_AFBC 0x00
+#define DRM_FORMAT_MOD_ARM_TYPE_MISC 0x01
+
+#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode) \
+   DRM_FORMAT_MOD_ARM_CODE(DRM_FORMAT_MOD_ARM_TYPE_AFBC, __afbc_mode)
 
 /*
  * AFBC superblock size
@@ -741,6 +755,16 @@ extern "C" {
  */
 #define AFBC_FORMAT_MOD_BCH (1ULL << 11)
 
+/*
+ * Arm 16x16 Block U-Interleaved modifier
+ *
+ * This is used by Arm Mali Utgard and Midgard GPUs. It divides the image
+ * into 16x16 pixel blocks. Blocks are

[PATCH v3 ppc-for-5.0 2/2] ppc/spapr: Support reboot of secure pseries guest

2019-12-17 Thread Bharata B Rao

A pseries guest can be run as a secure guest on Ultravisor-enabled
POWER platforms. When such a secure guest is reset, we need to
release/reset a few resources both on ultravisor and hypervisor side.
This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the
machine reset path.

As part of this ioctl, the secure guest is essentially transitioned
back to normal mode so that it can reboot like a regular guest and
become secure again.

This ioctl has no effect when invoked for a normal guest. If this ioctl
fails for a secure guest, the guest is terminated.
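
The error policy described above (tolerate kernels without the ioctl,
treat any other failure as fatal) can be sketched like this.
sketch_svm_off_result() is a hypothetical helper that only classifies a
return code; it does not issue any ioctl.

```c
#include <assert.h>
#include <errno.h>

/*
 * Classify the return code of the (assumed) KVM_PPC_SVM_OFF call.
 * Returns 0 if the reset may proceed, -1 if the failure must be fatal.
 */
static int sketch_svm_off_result(int rc)
{
    /* -ENOTTY means the kernel does not know the ioctl: not an error. */
    if (rc && rc != -ENOTTY) {
        return -1;
    }
    return 0;
}
```

A normal guest and an old kernel both end up on the "proceed" path; only
a real failure for a secure guest is reported.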

Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c   |  1 +
 target/ppc/kvm.c | 15 +++
 target/ppc/kvm_ppc.h |  6 ++
 3 files changed, 22 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f11422fc41..e62c89b3dd 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1597,6 +1597,7 @@ static void spapr_machine_reset(MachineState *machine)
 void *fdt;
 int rc;
 
+kvmppc_svm_off(&error_fatal);
 spapr_caps_apply(spapr);
 
 first_ppc_cpu = POWERPC_CPU(first_cpu);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 7406d18945..ae920ec310 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2900,3 +2900,18 @@ void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t 
tb_offset)
 kvm_set_one_reg(cs, KVM_REG_PPC_TB_OFFSET, &tb_offset);
 }
 }
+
+/*
+ * Don't set error if KVM_PPC_SVM_OFF ioctl is invoked on kernels
+ * that don't support this ioctl.
+ */
+void kvmppc_svm_off(Error **errp)
+{
+int rc;
+KVMState *s = KVM_STATE(current_machine->accelerator);
+
+rc = kvm_vm_ioctl(s, KVM_PPC_SVM_OFF);
+if (rc && rc != -ENOTTY) {
+error_setg(errp, "KVM_PPC_SVM_OFF ioctl failed");
+}
+}
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 47b08a4030..9a9bca1b72 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -37,6 +37,7 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu);
 target_ulong kvmppc_configure_v3_mmu(PowerPCCPU *cpu,
  bool radix, bool gtse,
  uint64_t proc_tbl);
+void kvmppc_svm_off(Error **errp);
 #ifndef CONFIG_USER_ONLY
 bool kvmppc_spapr_use_multitce(void);
 int kvmppc_spapr_enable_inkernel_multitce(void);
@@ -201,6 +202,11 @@ static inline target_ulong 
kvmppc_configure_v3_mmu(PowerPCCPU *cpu,
 return 0;
 }
 
+static inline void kvmppc_svm_off(Error **errp)
+{
+return;
+}
+
 static inline void kvmppc_set_reg_ppc_online(PowerPCCPU *cpu,
  unsigned int online)
 {
-- 
2.21.0

[PATCH v3 ppc-for-5.0 0/2] ppc/spapr: Support reboot of secure pseries guest

2019-12-17 Thread Bharata B Rao

This patchset adds KVM_PPC_SVM_OFF ioctl which is required to support
reset of secure guest. This includes linux-headers update so that we get
the newly introduced ioctl.

v2: https://lists.gnu.org/archive/html/qemu-ppc/2019-12/msg00162.html

Changes in v3:
-
- Use of error_fatal as David Gibson suggested.
- Updated linux-headers to 5.5.0-rc2

Bharata B Rao (2):
  linux-headers: Update
  ppc/spapr: Support reboot of secure pseries guest

 hw/ppc/spapr.c|  1 +
 include/standard-headers/asm-x86/bootparam.h  |  7 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h | 15 +++-
 include/standard-headers/drm/drm_fourcc.h | 28 ++-
 .../linux/input-event-codes.h | 77 +++
 include/standard-headers/linux/pci_regs.h |  3 +
 .../standard-headers/rdma/vmw_pvrdma-abi.h|  5 ++
 linux-headers/linux/kvm.h |  1 +
 target/ppc/kvm.c  | 15 
 target/ppc/kvm_ppc.h  |  6 ++
 10 files changed, 154 insertions(+), 4 deletions(-)

-- 
2.21.0

Re: [PATCH 6/6] qemu-io-cmds: Silent GCC9 format-overflow warning

2019-12-17 Thread Richard Henderson

On 12/17/19 7:34 AM, Philippe Mathieu-Daudé wrote:
> GCC9 is confused when building with CFLAG -O3:
> 
>   In function ‘help_oneline’,
>   inlined from ‘help_all’ at qemu-io-cmds.c:2414:9,
>   inlined from ‘help_f’ at qemu-io-cmds.c:2424:9:
>   qemu-io-cmds.c:2389:9: error: ‘%s’ directive argument is null 
> [-Werror=format-overflow=]
>2389 | printf("%s ", ct->name);
> | ^~~
> 
> Audit shows this can't happen. Give a hint to GCC adding an
> assert() call.

This deserves more investigation.  At a glance it appears you are right --
and, moreover, that it is impossible for gcc to have come to this conclusion.
Which raises the question of how it did.

Did you file a gcc bug report?
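
For reference, the shape of the workaround under discussion, as a
standalone sketch. SketchCmd and sketch_help_all() are hypothetical
stand-ins for the qemu-io command table and help_all(); whether the
assert really silences the warning depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical command table entry, loosely modelled on cmdinfo_t. */
typedef struct {
    const char *name;
    const char *oneline;
} SketchCmd;

/* Print one help line per command; returns the number of lines printed. */
static int sketch_help_all(const SketchCmd *tab, size_t n)
{
    int printed = 0;

    for (const SketchCmd *ct = tab; ct < &tab[n]; ct++) {
        /* States the invariant the audit found: names are never NULL. */
        assert(ct->name);
        printf("%s -- %s\n", ct->name, ct->oneline);
        printed++;
    }
    return printed;
}
```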


r~

> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: qemu-bl...@nongnu.org
> ---
>  qemu-io-cmds.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 1b7e700020..9e956a5dd4 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -2411,6 +2411,7 @@ static void help_all(void)
>  const cmdinfo_t *ct;
>  
>  for (ct = cmdtab; ct < &cmdtab[ncmds]; ct++) {
> +assert(ct->name);
>  help_oneline(ct->name, ct);
>  }
>  printf("\nUse 'help commandname' for extended help.\n");
>

Re: [PATCH v6 4/4] hw/arm: Add the Netduino Plus 2

2019-12-17 Thread Alistair Francis

On Tue, Dec 17, 2019 at 8:03 AM Peter Maydell  wrote:
>
> On Sat, 14 Dec 2019 at 02:44, Alistair Francis  wrote:
> >
> > Signed-off-by: Alistair Francis 
> > ---
> >  MAINTAINERS|  6 +
> >  hw/arm/Kconfig |  3 +++
> >  hw/arm/Makefile.objs   |  1 +
> >  hw/arm/netduinoplus2.c | 52 ++
> >  4 files changed, 62 insertions(+)
> >  create mode 100644 hw/arm/netduinoplus2.c
> >
> > diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> > index 7bfdc3a7ac..881e7f56e7 100644
> > --- a/hw/arm/Kconfig
> > +++ b/hw/arm/Kconfig
> > @@ -105,6 +105,9 @@ config NETDUINOPLUS2
> >  bool
> >  select STM32F405_SOC
> >
> > +config NETDUINOPLUS2
> > +bool
> > +
> >  config NSERIES
> >  bool
> >  select OMAP
>
> Something odd has happened here -- your patch 1/4 already
> had a stanza:
>
> +config NETDUINOPLUS2
> +bool
> +select STM32F405_SOC
>
> so either that should be in this patch or this fragment here
> should just be deleted.

Good catch.

It kind of makes sense to have that fragment in this patch, but then I
don't see a nice way to build the files as they are added, so I
removed the fragment from this patch.

>
> Assuming you sort that out,
> Reviewed-by: Peter Maydell 

Thanks Peter

Alistair

>
> thanks
> -- PMM

Re: [PATCH v6 2/4] hw/misc: Add the STM32F4xx EXTI device

2019-12-17 Thread Alistair Francis

On Sat, Dec 14, 2019 at 5:49 AM Philippe Mathieu-Daudé
 wrote:
>
> Hi Alistair,
>
> On 12/14/19 3:44 AM, Alistair Francis wrote:
> > Signed-off-by: Alistair Francis 
> > Reviewed-by: Peter Maydell 
> > ---
> >   hw/arm/Kconfig   |   1 +
> >   hw/misc/Kconfig  |   3 +
> >   hw/misc/Makefile.objs|   1 +
> >   hw/misc/stm32f4xx_exti.c | 189 +++
> >   hw/misc/trace-events |   5 +
> >   include/hw/misc/stm32f4xx_exti.h |  60 ++
> >   6 files changed, 259 insertions(+)
> >   create mode 100644 hw/misc/stm32f4xx_exti.c
> >   create mode 100644 include/hw/misc/stm32f4xx_exti.h
> >
> > diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> > index 4660d14715..3d86691ae0 100644
> > --- a/hw/arm/Kconfig
> > +++ b/hw/arm/Kconfig
> > @@ -315,6 +315,7 @@ config STM32F405_SOC
> >   bool
> >   select ARM_V7M
> >   select STM32F4XX_SYSCFG
> > +select STM32F4XX_EXTI
> >
> >   config XLNX_ZYNQMP_ARM
> >   bool
> > diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
> > index 72609650b7..bdd77d8020 100644
> > --- a/hw/misc/Kconfig
> > +++ b/hw/misc/Kconfig
> > @@ -85,6 +85,9 @@ config STM32F2XX_SYSCFG
> >   config STM32F4XX_SYSCFG
> >   bool
> >
> > +config STM32F4XX_EXTI
> > +bool
> > +
> >   config MIPS_ITU
> >   bool
> >
> > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > index ea8025e0bb..c6ecbdd7b0 100644
> > --- a/hw/misc/Makefile.objs
> > +++ b/hw/misc/Makefile.objs
> > @@ -59,6 +59,7 @@ common-obj-$(CONFIG_ZYNQ) += zynq_slcr.o
> >   common-obj-$(CONFIG_ZYNQ) += zynq-xadc.o
> >   common-obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> >   common-obj-$(CONFIG_STM32F4XX_SYSCFG) += stm32f4xx_syscfg.o
> > +common-obj-$(CONFIG_STM32F4XX_EXTI) += stm32f4xx_exti.o
> >   obj-$(CONFIG_MIPS_CPS) += mips_cmgcr.o
> >   obj-$(CONFIG_MIPS_CPS) += mips_cpc.o
> >   obj-$(CONFIG_MIPS_ITU) += mips_itu.o
> > diff --git a/hw/misc/stm32f4xx_exti.c b/hw/misc/stm32f4xx_exti.c
> > new file mode 100644
> > index 00..7f87a885aa
> > --- /dev/null
> > +++ b/hw/misc/stm32f4xx_exti.c
> > @@ -0,0 +1,189 @@
> > +/*
> > + * STM32F4XX EXTI
> > + *
> > + * Copyright (c) 2014 Alistair Francis 
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a 
> > copy
> > + * of this software and associated documentation files (the "Software"), 
> > to deal
> > + * in the Software without restriction, including without limitation the 
> > rights
> > + * to use, copy, modify, merge, publish, distribute, sublicense, and/or 
> > sell
> > + * copies of the Software, and to permit persons to whom the Software is
> > + * furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included 
> > in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
> > OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
> > OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> > FROM,
> > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS 
> > IN
> > + * THE SOFTWARE.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/log.h"
> > +#include "trace.h"
> > +#include "hw/irq.h"
> > +#include "migration/vmstate.h"
> > +#include "hw/misc/stm32f4xx_exti.h"
> > +
> > +static void stm32f4xx_exti_reset(DeviceState *dev)
> > +{
> > +STM32F4xxExtiState *s = STM32F4XX_EXTI(dev);
> > +
> > +s->exti_imr = 0x;
> > +s->exti_emr = 0x;
> > +s->exti_rtsr = 0x;
> > +s->exti_ftsr = 0x;
> > +s->exti_swier = 0x;
> > +s->exti_pr = 0x;
> > +}
> > +
> > +static void stm32f4xx_exti_set_irq(void *opaque, int irq, int level)
> > +{
> > +STM32F4xxExtiState *s = opaque;
> > +
> > +if (!((1 << irq) & s->exti_imr)) {
> > +/* Interrupt is masked */
> > +return;
>
> I'm not sure this is correct: don't you need to set the bit in the
> exti_pr register regardless of whether it is masked? Otherwise, in masked
> polling mode the guest will never see the IRQ delivered.
>
> So I'd drop this if statement, ...
>
> > +}
> > +
> > +trace_stm32f4xx_exti_set_irq(irq, level);
> > +
> > +if (((1 << irq) & s->exti_rtsr) && level) {
> > +/* Rising Edge */
> > +qemu_irq_pulse(s->irq[irq]);
>
> ... do not pulse here, ...
>
> > +s->exti_pr |= 1 << irq;
> > +}
> > +
> > +if (((1 << irq) & s->exti_ftsr) && !level) {
> > +/* Falling Edge */
> > +qemu_irq_pulse(s->irq[irq]);
>
> ... do not pulse here, ...
>
> > +s->exti_pr |= 1 << irq;
> > +}
>
> ... and here pulse if not masked:
>
> if

Re: [PATCH 5/6] hw/scsi/megasas: Silent GCC9 duplicated-cond warning

2019-12-17 Thread Richard Henderson

On 12/17/19 7:34 AM, Philippe Mathieu-Daudé wrote:
> GCC9 is confused when building with CFLAG -O3:
> 
>   hw/scsi/megasas.c: In function ‘megasas_scsi_realize’:
>   hw/scsi/megasas.c:2387:26: error: duplicated ‘if’ condition 
> [-Werror=duplicated-cond]
>2387 | } else if (s->fw_sge >= 128 - MFI_PASS_FRAME_SIZE) {
>   hw/scsi/megasas.c:2385:19: note: previously used here
>2385 | if (s->fw_sge >= MEGASAS_MAX_SGE - MFI_PASS_FRAME_SIZE) {
>   cc1: all warnings being treated as errors
> 
> When this device was introduced in commit e8f943c3bcc, the author
> cared about modularity, using a definition for the firmware limit.
> If we modify the limit, the code is valid. Add a check if the
> definition got modified to a bigger limit.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Hannes Reinecke 
> Cc: Paolo Bonzini 
> Cc: Fam Zheng 
> Cc: qemu-bl...@nongnu.org
> ---
>  hw/scsi/megasas.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
> index de9bd20887..ece1601b66 100644
> --- a/hw/scsi/megasas.c
> +++ b/hw/scsi/megasas.c
> @@ -2382,7 +2382,8 @@ static void megasas_scsi_realize(PCIDevice *dev, Error 
> **errp)
>  if (!s->hba_serial) {
>  s->hba_serial = g_strdup(MEGASAS_HBA_SERIAL);
>  }
> -if (s->fw_sge >= MEGASAS_MAX_SGE - MFI_PASS_FRAME_SIZE) {
> +if (MEGASAS_MAX_SGE > 128
> +&& s->fw_sge >= MEGASAS_MAX_SGE - MFI_PASS_FRAME_SIZE) {
>  s->fw_sge = MEGASAS_MAX_SGE - MFI_PASS_FRAME_SIZE;
>  } else if (s->fw_sge >= 128 - MFI_PASS_FRAME_SIZE) {
>  s->fw_sge = 128 - MFI_PASS_FRAME_SIZE;
> 

I'm not keen on this.  It looks to me like the raw 128 case should be removed
-- surely that's the point of the symbolic constant.  But I'll defer if a
maintainer disagrees.
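
A sketch of what that would look like, covering only the upper clamp.
SKETCH_MAX_SGE and SKETCH_PASS_FRAME_SIZE are hypothetical stand-ins with
assumed values (128 and 48); whatever follows the second branch in the
real function is left out here.

```c
#include <assert.h>

#define SKETCH_MAX_SGE          128   /* assumed value of the firmware limit */
#define SKETCH_PASS_FRAME_SIZE  48    /* assumed value */

/* Clamp against the symbolic limit only; no separate raw-128 branch. */
static int sketch_clamp_fw_sge(int fw_sge)
{
    if (fw_sge >= SKETCH_MAX_SGE - SKETCH_PASS_FRAME_SIZE) {
        return SKETCH_MAX_SGE - SKETCH_PASS_FRAME_SIZE;
    }
    return fw_sge;
}
```

With a single comparison there is no duplicated condition left for the
compiler to warn about, whatever value the limit is given.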


r~

Re: [PATCH 2/6] hw/display/tcx: Add missing fall through comments

2019-12-17 Thread Richard Henderson

On 12/17/19 7:34 AM, Philippe Mathieu-Daudé wrote:
> GCC9 is confused by this comment when building with
> CFLAG -Wimplicit-fallthrough=2:
> 
>   hw/display/tcx.c: In function ‘tcx_dac_writel’:
>   hw/display/tcx.c:453:26: error: this statement may fall through 
> [-Werror=implicit-fallthrough=]
> 453 | s->dac_index = (s->dac_index + 1) & 0xff; /* Index 
> autoincrement */
> | ~^~~
>   hw/display/tcx.c:454:9: note: here
> 454 | default:
> | ^~~
>   hw/display/tcx.c: In function ‘tcx_dac_readl’:
>   hw/display/tcx.c:412:22: error: this statement may fall through 
> [-Werror=implicit-fallthrough=]
> 412 | s->dac_index = (s->dac_index + 1) & 0xff; /* Index 
> autoincrement */
> | ~^~~
>   hw/display/tcx.c:413:5: note: here
> 413 | default:
> | ^~~
>   cc1: all warnings being treated as errors
> 
> Add the missing fall through comments.
> 
> Fixes: 55d7bfe22
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Olivier Danet 
> Cc: Mark Cave-Ayland 
> ---
>  hw/display/tcx.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Richard Henderson 

r~

Re: [kvm-unit-tests PATCH 05/16] arm/arm64: ITS: Introspection tests

2019-12-17 Thread Zenghui Yu


Hi Eric,

I have to admit that this is the first time I've looked into
the kvm-unit-tests code, so only some minor comments inline :)

On 2019/12/16 22:02, Eric Auger wrote:

Detect the presence of an ITS as part of the GICv3 init
routine, initialize its base address and read a few registers
(the IIDR and the TYPER) to store its dimensioning parameters.

This is our first ITS test, belonging to a new "its" group.

Signed-off-by: Eric Auger 


[...]


diff --git a/lib/arm/asm/gic-v3-its.h b/lib/arm/asm/gic-v3-its.h
new file mode 100644
index 000..2ce483e
--- /dev/null
+++ b/lib/arm/asm/gic-v3-its.h
@@ -0,0 +1,116 @@
+/*
+ * All ITS* defines are lifted from include/linux/irqchip/arm-gic-v3.h
+ *
+ * Copyright (C) 2016, Red Hat Inc, Andrew Jones 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#ifndef _ASMARM_GIC_V3_ITS_H_
+#define _ASMARM_GIC_V3_ITS_H_
+
+#ifndef __ASSEMBLY__
+
+#define GITS_CTLR  0x
+#define GITS_IIDR  0x0004
+#define GITS_TYPER 0x0008
+#define GITS_CBASER 0x0080
+#define GITS_CWRITER   0x0088
+#define GITS_CREADR 0x0090
+#define GITS_BASER 0x0100
+
+#define GITS_TYPER_PLPIS (1UL << 0)
+#define GITS_TYPER_IDBITS_SHIFT 8
+#define GITS_TYPER_DEVBITS_SHIFT 13
+#define GITS_TYPER_DEVBITS(r)   ((((r) >> GITS_TYPER_DEVBITS_SHIFT) & 0x1f) + 1)
+#define GITS_TYPER_PTA  (1UL << 19)
+#define GITS_TYPER_HWCOLLCNT_SHIFT  24
+
+#define GITS_CTLR_ENABLE (1U << 0)
+
+#define GITS_CBASER_VALID   (1UL << 63)
+#define GITS_CBASER_SHAREABILITY_SHIFT  (10)
+#define GITS_CBASER_INNER_CACHEABILITY_SHIFT(59)
+#define GITS_CBASER_OUTER_CACHEABILITY_SHIFT(53)
+#define GITS_CBASER_SHAREABILITY_MASK   \
+   GIC_BASER_SHAREABILITY(GITS_CBASER, SHAREABILITY_MASK)
+#define GITS_CBASER_INNER_CACHEABILITY_MASK \
+   GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, MASK)
+#define GITS_CBASER_OUTER_CACHEABILITY_MASK \
+   GIC_BASER_CACHEABILITY(GITS_CBASER, OUTER, MASK)
+#define GITS_CBASER_CACHEABILITY_MASK GITS_CBASER_INNER_CACHEABILITY_MASK
+
+#define GITS_CBASER_InnerShareable  \
+   GIC_BASER_SHAREABILITY(GITS_CBASER, InnerShareable)
+
+#define GITS_CBASER_nCnB GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, nCnB)
+#define GITS_CBASER_nC  GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, nC)
+#define GITS_CBASER_RaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWt)
+#define GITS_CBASER_RaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWt)


s/RaWt/RaWb/


+#define GITS_CBASER_WaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, WaWt)
+#define GITS_CBASER_WaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, WaWb)
+#define GITS_CBASER_RaWaWt  GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWt)
+#define GITS_CBASER_RaWaWb  GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWb)
+
+#define GITS_BASER_NR_REGS  8
+
+#define GITS_BASER_VALID (1UL << 63)
+#define GITS_BASER_INDIRECT (1ULL << 62)
+
+#define GITS_BASER_INNER_CACHEABILITY_SHIFT (59)
+#define GITS_BASER_OUTER_CACHEABILITY_SHIFT (53)
+#define GITS_BASER_CACHEABILITY_MASK   0x7
+
+#define GITS_BASER_nCnB GIC_BASER_CACHEABILITY(GITS_BASER, INNER, nCnB)
+
+#define GITS_BASER_TYPE_SHIFT   (56)
+#define GITS_BASER_TYPE(r)  (((r) >> GITS_BASER_TYPE_SHIFT) & 7)
+#define GITS_BASER_ENTRY_SIZE_SHIFT (48)
+#define GITS_BASER_ENTRY_SIZE(r)    ((((r) >> GITS_BASER_ENTRY_SIZE_SHIFT) & 0x1f) + 1)
+#define GITS_BASER_SHAREABILITY_SHIFT   (10)
+#define GITS_BASER_InnerShareable   \
+   GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)
+#define GITS_BASER_PAGE_SIZE_SHIFT  (8)
+#define GITS_BASER_PAGE_SIZE_4K (0UL << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGE_SIZE_16K (1UL << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGE_SIZE_64K (2UL << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGE_SIZE_MASK   (3UL << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGES_MAX 256
+#define GITS_BASER_PAGES_SHIFT  (0)
+#define GITS_BASER_NR_PAGES(r)  (((r) & 0xff) + 1)
+#define GITS_BASER_PHYS_ADDR_MASK  0xF000
+
+#define GITS_BASER_TYPE_NONE 0
+#define GITS_BASER_TYPE_DEVICE  1
+#define GITS_BASER_TYPE_VCPU 2
+#define GITS_BASER_TYPE_CPU 3


'3' is one of the reserved values of the GITS_BASER.Type field, and
what do we expect with a "GITS_BASER_TYPE_CPU" table type? ;-)

I think we can copy (and might update in the future) all these
macros against the latest Linux kernel.
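
To illustrate how the field accessors quoted above are meant to be used,
here is a small standalone sketch. The SK_* macros are local copies with
the shifts and masks from the header under review; the register values
decoded in the usage note are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the accessors under review (SK_ prefix is hypothetical). */
#define SK_TYPER_DEVBITS_SHIFT     13
#define SK_TYPER_DEVBITS(r)        ((((r) >> SK_TYPER_DEVBITS_SHIFT) & 0x1f) + 1)
#define SK_BASER_TYPE_SHIFT        56
#define SK_BASER_TYPE(r)           (((r) >> SK_BASER_TYPE_SHIFT) & 7)
#define SK_BASER_ENTRY_SIZE_SHIFT  48
#define SK_BASER_ENTRY_SIZE(r)     ((((r) >> SK_BASER_ENTRY_SIZE_SHIFT) & 0x1f) + 1)

/* Number of DeviceID bits the ITS advertises in GITS_TYPER. */
static unsigned sketch_devbits(uint64_t typer)
{
    return (unsigned)SK_TYPER_DEVBITS(typer);
}

/* Table type encoded in a GITS_BASER<n> value. */
static unsigned sketch_baser_type(uint64_t baser)
{
    return (unsigned)SK_BASER_TYPE(baser);
}

/* Entry size in bytes encoded in a GITS_BASER<n> value. */
static unsigned sketch_baser_entry_size(uint64_t baser)
{
    return (unsigned)SK_BASER_ENTRY_SIZE(baser);
}
```

Both "+ 1" accessors decode fields that store the value minus one, so a
raw DEVBITS field of 15 means 16 DeviceID bits.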

Re: [PATCH 3/6] hw/net/imx_fec: Rewrite fall through comments

2019-12-17 Thread Richard Henderson

On 12/17/19 7:34 AM, Philippe Mathieu-Daudé wrote:
> GCC9 is confused by this comment when building with CFLAG
> -Wimplicit-fallthrough=2:
> 
>   hw/net/imx_fec.c: In function ‘imx_eth_write’:
>   hw/net/imx_fec.c:906:12: error: this statement may fall through 
> [-Werror=implicit-fallthrough=]
> 906 | if (unlikely(single_tx_ring)) {
> |^
>   hw/net/imx_fec.c:912:5: note: here
> 912 | case ENET_TDAR: /* FALLTHROUGH */
> | ^~~~
>   cc1: all warnings being treated as errors
> 
> Rewrite the comments in the correct place,  using 'fall through'
> which is recognized by GCC and static analyzers.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Peter Chubb 
> Cc: Peter Maydell 
> Cc: Jason Wang 
> Cc: qemu-...@nongnu.org
> ---
>  hw/net/imx_fec.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson 

r~

Re: [PATCH 1/6] audio/audio: Add missing fall through comment

2019-12-17 Thread Richard Henderson

On 12/17/19 7:34 AM, Philippe Mathieu-Daudé wrote:
> GCC9 is confused by this comment when building with
> CFLAG -Wimplicit-fallthrough=2:
> 
>   audio/audio.c: In function ‘audio_pcm_init_info’:
>   audio/audio.c:306:14: error: this statement may fall through 
> [-Werror=implicit-fallthrough=]
> 306 | sign = 1;
> | ~^~~
>   audio/audio.c:307:5: note: here
> 307 | case AUDIO_FORMAT_U8:
> | ^~~~
>   cc1: all warnings being treated as errors
> 
> Add the missing fall through comment, similarly to e46349414.
...
> diff --git a/audio/audio.c b/audio/audio.c
> index 56fae55047..57daf3f620 100644
> --- a/audio/audio.c
> +++ b/audio/audio.c
> @@ -304,6 +304,7 @@ void audio_pcm_init_info (struct audio_pcm_info *info, 
> struct audsettings *as)
>  switch (as->fmt) {
>  case AUDIO_FORMAT_S8:
>  sign = 1;
> +/* fall through */
>  case AUDIO_FORMAT_U8:
>  mul = 1;
>  break;

Reviewed-by: Richard Henderson 

r~
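
For readers unfamiliar with the warning, a standalone sketch of the
pattern the patch annotates. The enum and sketch_init_info() are
hypothetical and only mirror the S8/U8 cases quoted above; the 16-bit
cases are added for illustration.

```c
#include <assert.h>

enum sketch_fmt { SKETCH_S8, SKETCH_U8, SKETCH_S16, SKETCH_U16 };

/* Derive signedness and sample width (in bytes) from the format. */
static void sketch_init_info(enum sketch_fmt fmt, int *sign, int *mul)
{
    *sign = 0;
    *mul = 0;

    switch (fmt) {
    case SKETCH_S8:
        *sign = 1;
        /* fall through */
    case SKETCH_U8:
        *mul = 1;
        break;
    case SKETCH_S16:
        *sign = 1;
        /* fall through */
    case SKETCH_U16:
        *mul = 2;
        break;
    }
}
```

The comment sits directly before the next case label, which is where
-Wimplicit-fallthrough looks for it.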

Re: [PATCH] target/ppc: Remove unused PPC_INPUT_INT defines

2019-12-17 Thread David Gibson

On Tue, Dec 17, 2019 at 10:46:16PM -0300, Fabiano Rosas wrote:
> They were added in "16415335be Use correct input constant" with a
> single use in kvm_arch_pre_run but that function's implementation was
> removed by "1e8f51e856 ppc: remove idle_timer logic".
> 
> Signed-off-by: Fabiano Rosas 

Applied to ppc-for-5.0, thanks.

> ---
>  target/ppc/kvm.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 7406d18945..b19555e97e 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -1325,12 +1325,6 @@ int kvmppc_set_interrupt(PowerPCCPU *cpu, int irq, int 
> level)
>  return 0;
>  }
>  
> -#if defined(TARGET_PPC64)
> -#define PPC_INPUT_INT PPC970_INPUT_INT
> -#else
> -#define PPC_INPUT_INT PPC6xx_INPUT_INT
> -#endif
> -
>  void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
>  {
>  return;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH] target/ppc: Handle AIL=0 in ppc_excp_vector_offset

2019-12-17 Thread David Gibson

On Tue, Dec 17, 2019 at 11:25:12AM -0300, Fabiano Rosas wrote:
> The exception vector offset calculation was moved into a function but
> the case when AIL=0 was not checked.
> 
> The reason we got away with this is that the sole caller of
> ppc_excp_vector_offset checks the AIL before calling the function:
> 
> /* Handle AIL */
> if (ail) {
> ...
> vector |= ppc_excp_vector_offset(cs, ail);
> }
> 
> Fixes: 2586a4d7a0 ("target/ppc: Move exception vector offset computation into 
> a function")
> Signed-off-by: Fabiano Rosas 

Applied to ppc-for-5.0, thanks.
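
A standalone sketch of the switch with the AIL=0 case handled. The SK_AIL_*
enum values and the second offset are assumptions made for the example;
only the 0x18000 offset comes from the hunk quoted below, and the real
function reports an invalid AIL differently from the -1 used here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding of the AIL field; the numeric values are assumed. */
enum {
    SK_AIL_NONE = 0,
    SK_AIL_0001_8000 = 2,
    SK_AIL_C000_0000_0000_4000 = 3,
};

/* Returns 0 and fills *offset, or -1 for an AIL value we do not know. */
static int sketch_vector_offset(int ail, uint64_t *offset)
{
    *offset = 0;

    switch (ail) {
    case SK_AIL_NONE:
        break;                      /* no relocation: offset stays 0 */
    case SK_AIL_0001_8000:
        *offset = 0x18000;
        break;
    case SK_AIL_C000_0000_0000_4000:
        *offset = 0xc000000000004000ull;
        break;
    default:
        return -1;
    }
    return 0;
}
```

Without the SK_AIL_NONE case, a caller that did not pre-check AIL would
land in the default branch for a perfectly valid value.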

> ---
>  target/ppc/excp_helper.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 50b004d00d..5752ed4a4d 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -112,6 +112,8 @@ static uint64_t ppc_excp_vector_offset(CPUState *cs, int 
> ail)
>  uint64_t offset = 0;
>  
>  switch (ail) {
> +case AIL_NONE:
> +break;
>  case AIL_0001_8000:
>  offset = 0x18000;
>  break;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PULL 51/88] ppc: well form kvmppc_hint_smt_possible error hint helper

2019-12-17 Thread David Gibson

On Tue, Dec 17, 2019 at 07:32:15AM +0100, Markus Armbruster wrote:
> David Gibson  writes:
> 
> > From: Vladimir Sementsov-Ogievskiy 
> >
> > Make kvmppc_hint_smt_possible hint append helper well formed:
> > rename errp to errp_in, as it is IN-parameter here (which is unusual
> > for errp), rename function to be kvmppc_error_append_*_hint.
> >
> > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > Reviewed-by: Marc-André Lureau 
> > Message-Id: <20191127191434.20945-1-vsement...@virtuozzo.com>
> > Reviewed-by: Greg Kurz 
> > Signed-off-by: David Gibson 
> 
> Review led to the commit message being replaced for this and related
> patches.  It's in my "[PULL 00/34] Error reporting patches for
> 2019-12-16".  No big deal, but if you respin, either steal that message
> or drop this patch.

Uh, sorry.  I realized moments before sending this that this patch had
been updated.  I didn't want to re-do all my pre-pull testing, though
and it's not actually harmful, so I left it.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



[PATCH 2/7] tcg: Remove softmmu code_gen_buffer fixed address

2019-12-17 Thread Richard Henderson

The commentary talks about "in concert with the addresses
assigned in the relevant linker script", except there is no
linker script for softmmu, nor has there been for some time.

(Do not confuse this with the user-only linker script editing that was
removed in the previous patch; user-only does not use this
code_gen_buffer allocation method.)

Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.c | 37 +
 1 file changed, 5 insertions(+), 32 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 9f48da9472..88468a1c08 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1032,47 +1032,20 @@ static inline void *alloc_code_gen_buffer(void)
 {
 int prot = PROT_WRITE | PROT_READ | PROT_EXEC;
 int flags = MAP_PRIVATE | MAP_ANONYMOUS;
-uintptr_t start = 0;
 size_t size = tcg_ctx->code_gen_buffer_size;
 void *buf;
 
-/* Constrain the position of the buffer based on the host cpu.
-   Note that these addresses are chosen in concert with the
-   addresses assigned in the relevant linker script file.  */
-# if defined(__PIE__) || defined(__PIC__)
-/* Don't bother setting a preferred location if we're building
-   a position-independent executable.  We're more likely to get
-   an address near the main executable if we let the kernel
-   choose the address.  */
-# elif defined(__x86_64__) && defined(MAP_32BIT)
-/* Force the memory down into low memory with the executable.
-   Leave the choice of exact location with the kernel.  */
-flags |= MAP_32BIT;
-/* Cannot expect to map more than 800MB in low memory.  */
-if (size > 800u * 1024 * 1024) {
-tcg_ctx->code_gen_buffer_size = size = 800u * 1024 * 1024;
-}
-# elif defined(__sparc__)
-start = 0x4000ul;
-# elif defined(__s390x__)
-start = 0x9000ul;
-# elif defined(__mips__)
-#  if _MIPS_SIM == _ABI64
-start = 0x12800ul;
-#  else
-start = 0x0800ul;
-#  endif
-# endif
-
-buf = mmap((void *)start, size, prot, flags, -1, 0);
+buf = mmap(NULL, size, prot, flags, -1, 0);
 if (buf == MAP_FAILED) {
 return NULL;
 }
 
 #ifdef __mips__
 if (cross_256mb(buf, size)) {
-/* Try again, with the original still mapped, to avoid re-acquiring
-   that 256mb crossing.  This time don't specify an address.  */
+/*
+ * Try again, with the original still mapped, to avoid re-acquiring
+ * the same 256mb crossing.
+ */
 size_t size2;
 void *buf2 = mmap(NULL, size, prot, flags, -1, 0);
 switch ((int)(buf2 != MAP_FAILED)) {
-- 
2.20.1
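
For illustration only (not part of the patch): a minimal sketch of the
allocation strategy the patch ends up with, where mmap() gets no address
hint and the kernel picks the location.  The names alloc_buffer_anywhere()
and crosses_256mb() are made up for this example, and PROT_EXEC is left
out so the sketch also runs under restrictive W^X policies; the real
alloc_code_gen_buffer() shown in the diff requests PROT_EXEC as well.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* expose MAP_ANONYMOUS on glibc in strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: map an anonymous buffer, letting the kernel choose
 * the address (first argument NULL, no preferred location). */
static void *alloc_buffer_anywhere(size_t size)
{
    int prot = PROT_READ | PROT_WRITE;      /* assumption: no PROT_EXEC here */
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *buf;

    assert(size > 0);
    buf = mmap(NULL, size, prot, flags, -1, 0);
    return buf == MAP_FAILED ? NULL : buf;
}

/* Hypothetical check: nonzero if [addr, addr + size) spans a 256 MiB
 * boundary, i.e. the first and last byte differ above bit 27. */
static int crosses_256mb(uintptr_t addr, size_t size)
{
    return ((addr ^ (addr + size - 1)) >> 28) != 0;
}
```

On a mips host a caller would test the returned address with such a
check and retry the mapping if it straddles a boundary, which is what
the retained #ifdef __mips__ branch does.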

[PATCH 6/7] configure: Override the os default with --disable-pie

2019-12-17 Thread Richard Henderson

Some distributions, e.g. Ubuntu 19.10, enable PIE by default.
If for some reason one wishes to build a non-pie binary, we
must provide additional options to override.

At the same time, reorg the code to an elif chain.

Signed-off-by: Richard Henderson 
---
 configure | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/configure b/configure
index f8981eec15..1645a58b3a 100755
--- a/configure
+++ b/configure
@@ -2029,19 +2029,18 @@ if compile_prog "-Werror -fno-pie" "-no-pie"; then
   LDFLAGS_NOPIE="-no-pie"
 fi
 
-if test "$pie" != "no" ; then
-  if compile_prog "-fPIE -DPIE" "-pie"; then
-QEMU_CFLAGS="-fPIE -DPIE $QEMU_CFLAGS"
-LDFLAGS="-pie $LDFLAGS"
-pie="yes"
-  else
-if test "$pie" = "yes"; then
-  error_exit "PIE not available due to missing toolchain support"
-else
-  echo "Disabling PIE due to missing toolchain support"
-  pie="no"
-fi
-  fi
+if test "$pie" = "no"; then
+  QEMU_CFLAGS="$CFLAGS_NOPIE $QEMU_CFLAGS"
+  LDFLAGS="$LDFLAGS_NOPIE $LDFLAGS"
+elif compile_prog "-fPIE -DPIE" "-pie"; then
+  QEMU_CFLAGS="-fPIE -DPIE $QEMU_CFLAGS"
+  LDFLAGS="-pie $LDFLAGS"
+  pie="yes"
+elif test "$pie" = "yes"; then
+  error_exit "PIE not available due to missing toolchain support"
+else
+  echo "Disabling PIE due to missing toolchain support"
+  pie="no"
 fi
 
 # Detect support for DT_BIND_NOW.
-- 
2.20.1
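
For illustration only (not part of the patch): whether a translation
unit was compiled as position-independent can be observed from the
predefined __PIE__ macro, which GCC and Clang set to 1 for -fpie and 2
for -fPIE.  The function name compiled_pie_level() is made up for this
sketch.  It reports the compile-time flag only; whether the final link
used -pie or -no-pie is a separate matter.

```c
/* Hypothetical helper: report the PIE level this file was compiled with.
 * Returns 2 for -fPIE, 1 for -fpie, 0 if the compiler did not define
 * __PIE__ (for example when built with -fno-pie). */
static int compiled_pie_level(void)
{
#if defined(__PIE__)
    return __PIE__;
#else
    return 0;
#endif
}
```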

[PATCH 5/7] configure: Unnest detection of -z,relro and -z,now

2019-12-17 Thread Richard Henderson

There is nothing about these options that is related to PIE.
Nor is there anything that specifically ties them to each other.
Use them unconditionally.

Signed-off-by: Richard Henderson 
---
 configure | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 972ce7396f..f8981eec15 100755
--- a/configure
+++ b/configure
@@ -2034,9 +2034,6 @@ if test "$pie" != "no" ; then
 QEMU_CFLAGS="-fPIE -DPIE $QEMU_CFLAGS"
 LDFLAGS="-pie $LDFLAGS"
 pie="yes"
-if compile_prog "" "-Wl,-z,relro -Wl,-z,now" ; then
-  LDFLAGS="-Wl,-z,relro -Wl,-z,now $LDFLAGS"
-fi
   else
 if test "$pie" = "yes"; then
   error_exit "PIE not available due to missing toolchain support"
@@ -2047,6 +2044,16 @@ if test "$pie" != "no" ; then
   fi
 fi
 
+# Detect support for DT_BIND_NOW.
+if compile_prog "" "-Wl,-z,now" ; then
+  LDFLAGS="-Wl,-z,now $LDFLAGS"
+fi
+
+# Detect support for PT_GNU_RELRO.
+if compile_prog "" "-Wl,-z,relro" ; then
+  LDFLAGS="-Wl,-z,relro $LDFLAGS"
+fi
+
 ##
 # __sync_fetch_and_and requires at least -march=i486. Many toolchains
 # use i686 as default anyway, but for those that don't, an explicit
-- 
2.20.1

[PATCH 1/7] configure: Drop adjustment of textseg

2019-12-17 Thread Richard Henderson

This adjustment was random and unnecessary.  The user mode
startup code in probe_guest_base() will choose a value for
guest_base that allows the host qemu binary to not conflict
with the guest binary.

With modern distributions, this isn't even used, as the default
is PIE, which does the same job in a more portable way.

Signed-off-by: Richard Henderson 
---
 configure | 47 ---
 1 file changed, 47 deletions(-)

diff --git a/configure b/configure
index 84b413dbfc..255ac432af 100755
--- a/configure
+++ b/configure
@@ -6292,49 +6292,6 @@ if test "$cpu" = "s390x" ; then
   fi
 fi
 
-# Probe for the need for relocating the user-only binary.
-if ( [ "$linux_user" = yes ] || [ "$bsd_user" = yes ] ) && [ "$pie" = no ]; then
-  textseg_addr=
-  case "$cpu" in
-arm | i386 | ppc* | s390* | sparc* | x86_64 | x32)
-  # ??? Rationale for choosing this address
-  textseg_addr=0x6000
-  ;;
-mips)
-  # A 256M aligned address, high in the address space, with enough
-  # room for the code_gen_buffer above it before the stack.
-  textseg_addr=0x6000
-  ;;
-  esac
-  if [ -n "$textseg_addr" ]; then
-cat > $TMPC &1; then
-error_exit \
-"We need to link the QEMU user mode binaries at a" \
-"specific text address. Unfortunately your linker" \
-"doesn't support either the -Ttext-segment option or" \
-"printing the default linker script with --verbose." \
-"If you don't want the user mode binaries, pass the" \
-"--disable-user option to configure."
-  fi
-
-  $ld --verbose | sed \
--e '1,/==/d' \
--e '/==/,$d' \
--e "s/[.] = [0-9a-fx]* [+] SIZEOF_HEADERS/. = $textseg_addr + SIZEOF_HEADERS/" \
--e "s/__executable_start = [0-9a-fx]*/__executable_start = $textseg_addr/" > config-host.ld
-  textseg_ldflags="-Wl,-T../config-host.ld"
-fi
-  fi
-fi
-
 # Check that the C++ compiler exists and works with the C compiler.
 # All the QEMU_CXXFLAGS are based on QEMU_CFLAGS. Keep this at the end to don't miss any other that could be added.
 if has $cxx; then
@@ -7897,10 +7854,6 @@ if test "$gprof" = "yes" ; then
   fi
 fi
 
-if test "$target_linux_user" = "yes" || test "$target_bsd_user" = "yes" ; then
-  ldflags="$ldflags $textseg_ldflags"
-fi
-
 # Newer kernels on s390 check for an S390_PGSTE program header and
 # enable the pgste page table extensions in that case. This makes
 # the vm.allocate_pgste sysctl unnecessary. We enable this program
-- 
2.20.1

[PATCH 0/7] configure: Improve PIE and other linkage

2019-12-17 Thread Richard Henderson

This begins by dropping the -Ttext-segment stuff, which Fangrui Song
correctly points out does not work with lld.  But it's also obsolete,
so instead of adding support for lld's --image-base, remove it all.

Then, remove some other legacy random addresses that were supposed
to apply to softmmu, but didn't really make any sense, and aren't
used anyway when PIE is used, which is the default with a modern
linux distribution.

Then, clean up some of the configure logic surrounding PIE, and its
current non-application to non-x86.

Finally, add support for static-pie linking.


r~


Richard Henderson (7):
  configure: Drop adjustment of textseg
  tcg: Remove softmmu code_gen_buffer fixed address
  configure: Do not force pie=no for non-x86
  configure: Always detect -no-pie toolchain support
  configure: Unnest detection of -z,relro and -z,now
  configure: Override the os default with --disable-pie
  configure: Support -static-pie if requested

 accel/tcg/translate-all.c |  37 ++--
 configure | 120 --
 2 files changed, 41 insertions(+), 116 deletions(-)

-- 
2.20.1

[PATCH 7/7] configure: Support -static-pie if requested

2019-12-17 Thread Richard Henderson

Recent toolchains support static and pie at the same time.

As with normal dynamic builds, allow --static to default to PIE
if supported by the toolchain.  Allow --enable/--disable-pie to
override the default.

Signed-off-by: Richard Henderson 
---
 configure | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/configure b/configure
index 1645a58b3a..c03491018a 100755
--- a/configure
+++ b/configure
@@ -1023,7 +1023,6 @@ for opt do
   ;;
   --static)
 static="yes"
-LDFLAGS="-static $LDFLAGS"
 QEMU_PKG_CONFIG_FLAGS="--static $QEMU_PKG_CONFIG_FLAGS"
   ;;
   --mandir=*) mandir="$optarg"
@@ -1994,11 +1993,6 @@ if test "$static" = "yes" ; then
   if test "$modules" = "yes" ; then
 error_exit "static and modules are mutually incompatible"
   fi
-  if test "$pie" = "yes" ; then
-error_exit "static and pie are mutually incompatible"
-  else
-pie="no"
-  fi
 fi
 
 # Unconditional check for compiler __thread support
@@ -2032,6 +2026,17 @@ fi
 if test "$pie" = "no"; then
   QEMU_CFLAGS="$CFLAGS_NOPIE $QEMU_CFLAGS"
   LDFLAGS="$LDFLAGS_NOPIE $LDFLAGS"
+elif test "$static" = "yes"; then
+  if compile_prog "-fPIE -DPIE" "-static-pie"; then
+QEMU_CFLAGS="-fPIE -DPIE $QEMU_CFLAGS"
+LDFLAGS="-static-pie $LDFLAGS"
+pie="yes"
+  elif test "$pie" = "yes"; then
+error_exit "-static-pie not available due to missing toolchain support"
+  else
+LDFLAGS="-static $LDFLAGS"
+pie="no"
+  fi
 elif compile_prog "-fPIE -DPIE" "-pie"; then
   QEMU_CFLAGS="-fPIE -DPIE $QEMU_CFLAGS"
   LDFLAGS="-pie $LDFLAGS"
-- 
2.20.1

[PATCH 3/7] configure: Do not force pie=no for non-x86

2019-12-17 Thread Richard Henderson

PIE is supported on many other hosts besides x86.

The default for non-x86 is now the same as x86: pie is used
if supported, and may be forced via --enable/--disable-pie.

Signed-off-by: Richard Henderson 
---
 configure | 10 --
 1 file changed, 10 deletions(-)

diff --git a/configure b/configure
index 255ac432af..2fb4457d7c 100755
--- a/configure
+++ b/configure
@@ -2012,16 +2012,6 @@ if ! compile_prog "-Werror" "" ; then
"Thread-Local Storage (TLS). Please upgrade to a version that does."
 fi
 
-if test "$pie" = ""; then
-  case "$cpu-$targetos" in
-i386-Linux|x86_64-Linux|x32-Linux|i386-OpenBSD|x86_64-OpenBSD)
-  ;;
-*)
-  pie="no"
-  ;;
-  esac
-fi
-
 if test "$pie" != "no" ; then
   cat > $TMPC << EOF
 
-- 
2.20.1

[PATCH 4/7] configure: Always detect -no-pie toolchain support

2019-12-17 Thread Richard Henderson

The CFLAGS_NOPIE and LDFLAGS_NOPIE variables are used
in pc-bios/optionrom/Makefile, which has nothing to do
with the PIE setting of the main qemu executables.

This overrides any operating system default to build
all executables as PIE, which is important for ROMs.

Signed-off-by: Richard Henderson 
---
 configure | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/configure b/configure
index 2fb4457d7c..972ce7396f 100755
--- a/configure
+++ b/configure
@@ -2012,26 +2012,24 @@ if ! compile_prog "-Werror" "" ; then
"Thread-Local Storage (TLS). Please upgrade to a version that does."
 fi
 
-if test "$pie" != "no" ; then
-  cat > $TMPC << EOF
+cat > $TMPC << EOF
 
 #ifdef __linux__
 #  define THREAD __thread
 #else
 #  define THREAD
 #endif
-
 static THREAD int tls_var;
-
 int main(void) { return tls_var; }
-
 EOF
-  # check we support --no-pie first...
-  if compile_prog "-Werror -fno-pie" "-no-pie"; then
-CFLAGS_NOPIE="-fno-pie"
-LDFLAGS_NOPIE="-nopie"
-  fi
 
+# Check we support --no-pie first; we will need this for building ROMs.
+if compile_prog "-Werror -fno-pie" "-no-pie"; then
+  CFLAGS_NOPIE="-fno-pie"
+  LDFLAGS_NOPIE="-no-pie"
+fi
+
+if test "$pie" != "no" ; then
   if compile_prog "-fPIE -DPIE" "-pie"; then
 QEMU_CFLAGS="-fPIE -DPIE $QEMU_CFLAGS"
 LDFLAGS="-pie $LDFLAGS"
-- 
2.20.1

[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2019-12-17 Thread Rafael David Tinoco

Hello Fred,

Based on Dann's feedback on testing, I'm failing to see where your patch
fixes the "root" cause (despite being able to mitigate the issue by
changing the aio notification mechanism).

I think the root cause is best described in these two emails from the
thread:

https://lore.kernel.org/qemu-devel/20191009080220.GA2905@hc/

and

https://lore.kernel.org/qemu-devel/966c119d-aa76-2149-108f-
867aebd77...@redhat.com/

So, by adding ctx->notify_for_convert, it is very likely you worked
around the issue by doing what Jan already described: moving the two
variables (ctx->list_lock and, in the old case, ctx->notify_me; in your
case, ctx->notify_for_convert) out of the same cacheline, which makes the
issue "disappear" (as we would eventually do in a workaround patch).

What about the aarch64 issue when both ctx->list_lock and
ctx->notify_for_convert are synchronized by the primitives QEMU uses and
sit in the same cache line?

Any "workaround" here would try to dodge the same-cacheline situation,
but, for upstream, I suppose Paolo wants to have something else
regarding aarch64 ATOMIC_SEQ_CST,

as described in this part of the discussion:

https://lore.kernel.org/qemu-devel/96c26e21-5996-0c63-ce8b-
99a1b5473...@redhat.com/

Unless I'm missing something, am I ?

Thank you!
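
For illustration only: a sketch of what "dodging the same cacheline"
means at the data-layout level.  The struct and field types below are
simplified stand-ins (the real AioContext fields are not plain ints),
and the 64-byte line size is an assumption; the actual size is
CPU-specific.  This shows only the workaround discussed above, not a fix
for the underlying ordering problem.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define ASSUMED_CACHE_LINE 64   /* assumption; check the target CPU */

/* Both fields are adjacent and will normally share one cache line. */
struct ctx_packed {
    int list_lock;
    int notify_me;
};

/* Each field is forced onto its own (assumed) cache line. */
struct ctx_padded {
    alignas(ASSUMED_CACHE_LINE) int list_lock;
    alignas(ASSUMED_CACHE_LINE) int notify_me;
};

/* Compile-time check that the padded layout really separates the fields. */
static_assert(offsetof(struct ctx_padded, notify_me) -
              offsetof(struct ctx_padded, list_lock) >= ASSUMED_CACHE_LINE,
              "fields must not share a cache line");
```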

-- 
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Confirmed
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Bionic:
  Confirmed
Status in qemu source package in Disco:
  Confirmed
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in Focal:
  Confirmed

Bug description:
  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  [ Original Description ]

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=,
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3

Re: [RFC PATCH 0/9] Introduce mediate ops in vfio-pci

2019-12-17 Thread Jason Wang




On 2019/12/12 下午1:47, Yan Zhao wrote:

On Thu, Dec 12, 2019 at 11:48:25AM +0800, Jason Wang wrote:

On 2019/12/6 下午8:49, Yan Zhao wrote:

On Fri, Dec 06, 2019 at 05:40:02PM +0800, Jason Wang wrote:

On 2019/12/6 下午4:22, Yan Zhao wrote:

On Thu, Dec 05, 2019 at 09:05:54PM +0800, Jason Wang wrote:

On 2019/12/5 下午4:51, Yan Zhao wrote:

On Thu, Dec 05, 2019 at 02:33:19PM +0800, Jason Wang wrote:

Hi:

On 2019/12/5 上午11:24, Yan Zhao wrote:

For SRIOV devices, VFs are passed through to the guest directly without host
driver mediation. However, when migrating VMs with passed-through VFs,
dynamic host mediation is required to (1) get device states, (2) get
dirty pages. Since device states as well as other critical information
required for dirty page tracking for VFs are usually retrieved from PFs,
it is handy to provide an extension in the PF driver to centrally control
VFs' migration.

Therefore, in order to realize (1) passthrough VFs at normal time, (2)
dynamically trap VFs' bars for dirty page tracking and

A silly question, what's the reason for doing this, is this a must for dirty
page tracking?


For performance reasons. VFs' BARs should be passed through in normal
operation and only enter the trap state when needed.

Right, but how does this matter for the case of dirty page tracking?


Take a NIC as an example: to trap its VF dirty pages in software, every
write to the ring tail, which resides in BAR0, has to be trapped.

Interesting, but it looks like we need:
- decode the instruction
- mediate all access to BAR0
All of which seems a great burden for the VF driver. I wonder whether or
not doing interrupt relay and tracking head is better in this case.


Hi Jason,

I'm not familiar with the approach you mentioned. Could you elaborate?


It looks to me that you want to intercept the bar that contains the
head. Then you can figure out the buffers submitted from driver and you
still need to decide a proper time to mark them as dirty.


It doesn't need to be accurate, right? Just a superset of the real dirty
bitmap is enough.



If the superset is too large compared with the dirty pages, it will lead
to a lot of side effects.






What I meant is, intercept the interrupt, then you can still figure out
the buffers which have been modified by the device and mark them as
dirty.

Then there's no need to trap BAR and do decoding/emulation etc.

But it will still be tricky to be correct...


Intercepting the interrupt is a little hard if posted interrupts are enabled.



We don't care about the performance too much in this case. Can we simply 
disable it?




I think what you are worried about here is the timing of marking pages
dirty, right? Upon receiving the interrupt, you regard the DMAs as
finished and it is safe to mark the pages dirty.
But with the BAR trap approach, we can at least keep those dirtied pages
marked dirty until the device stops. Of course we have other methods to
optimize it.



I'm not sure this will not confuse the migration convergence time estimation.





There's
still no IOMMU Dirty bit available.

  (3) centralizing
VF critical states retrieving and VF controls into one driver, we propose
to introduce mediate ops on top of current vfio-pci device driver.


_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  __   register mediate ops|  ___ ___|
|  |<---| VF|   |   |
| vfio-pci |  | |  mediate  |   | PF driver |   |
|__|--->|   driver  |   |___|
  |open(pdev)  |  ---  | |
  ||
  ||_ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _ _|
 \|/  \|/
--- 
|VF   | |PF|
--- 


VF mediate driver could be a standalone driver that does not bind to
any devices (as in demo code in patches 5-6) or it could be a built-in
extension of PF driver (as in patches 7-9) .

Rather than binding directly to a VF, the VF mediate driver registers its
mediate ops with vfio-pci at driver init. vfio-pci maintains a list of
such mediate ops.
(Note that: VF mediate driver can register mediate ops into vfio-pci
before vfio-pci binding to any devices. And VF mediate driver can
support mediating multiple devices.)

When opening a device (e.g. a VF), vfio-pci goes through the mediate ops
list and calls each vfio_pci_mediate_ops->open() with pdev of the opening
device as a parameter.
The VF mediate driver should return success or failure depending on
whether it supports the pdev.
E.g. VF mediate driver would compare its supported VF devfn with the
devfn of the passed-in pdev.
Once vfio-pci finds a successful vfio_pci_mediate_ops->open(), it will
stop querying other mediate ops and bind the opening
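
For illustration only (not code from the series): the registration and
open-time lookup described in this message could be sketched roughly as
follows.  All names (demo_pdev, demo_mediate_ops, demo_register_mediate_ops,
demo_find_mediate_ops) are hypothetical stand-ins, the list is a fixed
array rather than a kernel list, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only devfn is modelled. */
struct demo_pdev {
    unsigned int devfn;
};

/* Hypothetical stand-in for vfio_pci_mediate_ops. */
struct demo_mediate_ops {
    const char *name;
    int (*open)(const struct demo_pdev *pdev);   /* 0 means "I support it" */
};

#define DEMO_MAX_OPS 8
static const struct demo_mediate_ops *demo_ops_list[DEMO_MAX_OPS];
static size_t demo_ops_count;

/* Called by a mediate driver at init time; no device needs to be bound yet. */
static int demo_register_mediate_ops(const struct demo_mediate_ops *ops)
{
    assert(ops != NULL && ops->open != NULL);
    if (demo_ops_count == DEMO_MAX_OPS) {
        return -1;
    }
    demo_ops_list[demo_ops_count++] = ops;
    return 0;
}

/* Called when a device is opened: walk the list and stop at the first
 * ops whose open() accepts this pdev. */
static const struct demo_mediate_ops *
demo_find_mediate_ops(const struct demo_pdev *pdev)
{
    for (size_t i = 0; i < demo_ops_count; i++) {
        if (demo_ops_list[i]->open(pdev) == 0) {
            return demo_ops_list[i];
        }
    }
    return NULL;
}

/* Example open() callbacks that compare devfn, as the text suggests. */
static int demo_open_devfn8(const struct demo_pdev *pdev)
{
    return pdev->devfn == 8 ? 0 : -1;
}

static int demo_open_devfn9(const struct demo_pdev *pdev)
{
    return pdev->devfn == 9 ? 0 : -1;
}
```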

Re: [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits

2019-12-17 Thread Tao Xu


On 12/17/2019 11:01 PM, Markus Armbruster wrote:

Christophe de Dinechin  writes:


On 17 Dec 2019, at 15:08, Markus Armbruster  wrote:

Christophe de Dinechin  writes:


On 5 Dec 2019, at 16:29, Markus Armbruster  wrote:

Tao Xu  writes:


Parse input string both as a double and as a uint64_t, then use the
method which consumes more characters. Update the related test cases.

Signed-off-by: Tao Xu 
---

[...]

diff --git a/util/cutils.c b/util/cutils.c
index 77acadc70a..b08058c57c 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end,
  const char default_suffix, int64_t unit,
  uint64_t *result)
{
-int retval;
-const char *endptr;
+int retval, retd, retu;
+const char *suffix, *suffixd, *suffixu;
unsigned char c;
int mul_required = 0;
-double val, mul, integral, fraction;
+bool use_strtod;
+uint64_t valu;
+double vald, mul, integral, fraction;


Note for later: @mul is double.


+
+retd = qemu_strtod_finite(nptr, &suffixd, &vald);
+retu = qemu_strtou64(nptr, &suffixu, 0, &valu);
+use_strtod = strlen(suffixd) < strlen(suffixu);
+
+/*
+ * Parse @nptr both as a double and as a uint64_t, then use the method
+ * which consumes more characters.
+ */


The comment is in a funny place.  I'd put it right before the
qemu_strtod_finite() line.


+if (use_strtod) {
+suffix = suffixd;
+retval = retd;
+} else {
+suffix = suffixu;
+retval = retu;
+}

-retval = qemu_strtod_finite(nptr, &endptr, &val);
if (retval) {
goto out;
}


This is even more subtle than it looks.


But why is it even necessary?

The “contract” for the function used to be that it returned rounded values
beyond 2^53, which in itself is curious.

But now it’s a 6-dimensional matrix of hell with NaNs and barfnots, when the
name implies it’s simply doing a text to u64 conversion…

There is certainly a reason, but I’m really curious what it is :-)


It all goes back to commit 9f9b17a4f0 "Introduce strtosz() library
function to convert a string to a byte count.".  To support "convenient"
usage like "1.5G", it parses the number part with strtod().  This limits
us to 53 bits of precision.  Larger sizes get rounded.

I guess the excuse for this was that when you're dealing with sizes that
large (petabytes!), your least significant bits are zero anyway.

Regardless, the interface is *awful*.  We should've forced the author to
spell it out in all its glory in a proper function contract.  That tends
to cool the enthusiasm for "convenient" syntax amazingly fast.

The awful interface has been confusing people for close to a decade now.

What to do?


I see. Thanks for the rationale. I knew it had to make sense :-)


For a value of "sense"...


I’d probably avoid strtod even with the convenient syntax above.
Do you want 1.33e-6M to be allowed? Do we want to ever
accept or generate NaN or Inf values?


NaN or Inf definitely not.  That's why we use qemu_strtod_finite()
before and after the patch.

No sane person should ever use 1.33e-6M.  Or even 1.1k (which yields
1126, rounded silently from machine number 1126.40001, which
approximates the true value 1126.4).

Certain fractions are actually sane.  1.5k denotes a perfectly fine
integer, which the code manages not to screw up.  I'd recommend against
using fractions regardless.
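
For illustration only: the two effects mentioned above can be
reproduced with plain C arithmetic, assuming IEEE 754 doubles.  The
helper names through_double() and scale_size() are made up for this
sketch and are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trip a 64-bit value through double, as a strtod()-based parser
 * effectively does.  Only meaningful for values well below 2^64. */
static uint64_t through_double(uint64_t v)
{
    return (uint64_t)(double)v;
}

/* Scale a fractional count by a unit and truncate to an integer, the way
 * "1.1k" becomes a byte count. */
static uint64_t scale_size(double val, uint64_t unit)
{
    assert(val >= 0);   /* converting a negative double would be undefined */
    return (uint64_t)(val * (double)unit);
}
```

With these, through_double((1ULL << 53) + 1) comes back as 1ULL << 53
(the low bit is lost), scale_size(1.1, 1024) gives 1126, and
scale_size(1.5, 1024) gives exactly 1536.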

What usage are we prepared to break?  What kind of confusion are we
willing to bear?  Those are the questions.


Tao Xu's patch tries to make the function do what its users expect,
namely parse a bleepin' 64 bit integer, without breaking any of the
"convenience" syntax.  Turns out that's amazingly subtle.  Are we making
things less confusing or more?


Thanks for your explanation. I think another reason is that the built-in
'size' type is really commonly used. Maybe someone uses '-m 1.5G' to boot
QEMU or writes it to a config file.
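
For illustration only: a simplified sketch of the "parse both ways, keep
the parse that consumes more characters" idea from the patch, written
against plain strtod()/strtoull() rather than QEMU's qemu_strtod_finite()
and qemu_strtou64().  The name demo_parse_size() and its unit parameter
are hypothetical; suffix handling is left to the caller, and on a tie the
integer parse wins so that plain integers keep full 64-bit precision.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse the numeric prefix of @s, multiply by @unit,
 * store the result in @out and the first unparsed character in @end.
 * Returns 0 on success, -1 on error. */
static int demo_parse_size(const char *s, uint64_t unit,
                           uint64_t *out, const char **end)
{
    char *endd, *endu;
    double vald, scaled;
    unsigned long long valu;
    int errd, erru;

    assert(s && out && end && unit > 0);

    errno = 0;
    vald = strtod(s, &endd);
    errd = errno;
    errno = 0;
    valu = strtoull(s, &endu, 0);
    erru = errno;

    if (endd == s && endu == s) {
        return -1;                      /* nothing numeric at all */
    }
    if (vald < 0) {
        return -1;                      /* negative sizes are rejected */
    }

    if (endd > endu) {
        /* Fraction or exponent present: only the double parse saw it. */
        scaled = vald * (double)unit;
        /* The range test also rejects NaN and infinity. */
        if (errd || !(scaled >= 0 && scaled < 18446744073709551616.0)) {
            return -1;
        }
        *out = (uint64_t)scaled;
        *end = endd;
    } else {
        /* Plain integer: keep all 64 bits. */
        if (erru || valu > UINT64_MAX / unit) {
            return -1;
        }
        *out = (uint64_t)valu * unit;
        *end = endu;
    }
    return 0;
}
```

In this sketch "9007199254740993" (2^53 + 1) is returned exactly, while
"1.5G" with a unit of 1024 yields 1536 and leaves *end pointing at 'G'.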

[PATCH] qcow2: Move error check of local_err near its assignment

2019-12-17 Thread Tuguoyi


Signed-off-by: Guoyi Tu 
---
 block/qcow2.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 7c18721..ce3db29 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1705,14 +1705,14 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 if (!(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) {
 /* It's case 1, 2 or 3.2. Or 3.1 which is BUG in management layer. */
 bool header_updated = qcow2_load_dirty_bitmaps(bs, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto fail;
+}
 
 update_header = update_header && !header_updated;
 }
-if (local_err != NULL) {
-error_propagate(errp, local_err);
-ret = -EINVAL;
-goto fail;
-}
 
 if (update_header) {
 ret = qcow2_update_header(bs);
-- 
2.7.4

[PATCH v2 10/10] migration: Add zstd compression multifd support

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 hw/core/qdev-properties.c |   2 +-
 migration/ram.c   | 288 ++
 qapi/migration.json   |   2 +-
 tests/migration-test.c|   6 +
 4 files changed, 296 insertions(+), 2 deletions(-)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index e8ff317a60..b75467704f 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -654,7 +654,7 @@ const PropertyInfo qdev_prop_fdc_drive_type = {
 const PropertyInfo qdev_prop_multifd_compress = {
 .name = "MultifdCompress",
 .description = "multifd_compress values, "
-   "none/zlib",
+   "none/zlib/zstd",
 .enum_table = _lookup,
 .get = get_enum,
 .set = set_enum,
diff --git a/migration/ram.c b/migration/ram.c
index 5006d719b4..db90237977 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -29,6 +29,9 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include <zlib.h>
+#ifdef CONFIG_ZSTD
+#include <zstd.h>
+#endif
 #include "qemu/cutils.h"
 #include "qemu/bitops.h"
 #include "qemu/bitmap.h"
@@ -584,6 +587,7 @@ exit:
 #define MULTIFD_FLAG_SYNC (1 << 0)
 #define MULTIFD_FLAG_NOCOMP (1 << 1)
 #define MULTIFD_FLAG_ZLIB (1 << 2)
+#define MULTIFD_FLAG_ZSTD (1 << 3)
 
 /* This value needs to be a multiple of qemu_target_page_size() */
 #define MULTIFD_PACKET_SIZE (512 * 1024)
@@ -635,6 +639,22 @@ struct zlib_data {
 uint32_t zbuff_len;
 };
 
+#ifdef CONFIG_ZSTD
+struct zstd_data {
+/* stream for compression */
+ZSTD_CStream *zcs;
+/* stream for decompression */
+ZSTD_DStream *zds;
+/* buffers */
+ZSTD_inBuffer in;
+ZSTD_outBuffer out;
+/* compressed buffer */
+uint8_t *zbuff;
+/* size of compressed buffer */
+uint32_t zbuff_len;
+};
+#endif
+
 typedef struct {
 /* this fields are not changed once the thread is created */
 /* channel number */
@@ -1109,9 +1129,277 @@ static MultiFDMethods multifd_zlib_ops = {
 .recv_pages = zlib_recv_pages
 };
 
+#ifdef CONFIG_ZSTD
+/* Multifd zstd compression */
+
+/**
+ * zstd_send_setup: setup send side
+ *
+ * Setup each channel with zstd compression.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @errp: pointer to an error
+ */
+static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
+{
+uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
+struct zstd_data *z = g_new0(struct zstd_data, 1);
+int res;
+
+p->data = z;
+z->zcs = ZSTD_createCStream();
+if (!z->zcs) {
+g_free(z);
+error_setg(errp, "multifd %d: zstd createCStream failed", p->id);
+return -1;
+}
+
+res = ZSTD_initCStream(z->zcs, migrate_compress_level());
+if (ZSTD_isError(res)) {
+ZSTD_freeCStream(z->zcs);
+g_free(z);
+error_setg(errp, "multifd %d: initCStream failed", p->id);
+return -1;
+}
+/* We will never have more than page_count pages */
+z->zbuff_len = page_count * qemu_target_page_size();
+z->zbuff_len *= 2;
+z->zbuff = g_try_malloc(z->zbuff_len);
+if (!z->zbuff) {
+ZSTD_freeCStream(z->zcs);
+g_free(z);
+error_setg(errp, "multifd %d: out of memory for zbuff", p->id);
+return -1;
+}
+return 0;
+}
+
+/**
+ * zstd_send_cleanup: cleanup send side
+ *
+ * Close the channel and return memory.
+ *
+ * @p: Params for the channel that we are using
+ */
+static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
+{
+struct zstd_data *z = p->data;
+
+ZSTD_freeCStream(z->zcs);
+z->zcs = NULL;
+g_free(z->zbuff);
+z->zbuff = NULL;
+g_free(p->data);
+p->data = NULL;
+}
+
+/**
+ * zstd_send_prepare: prepare data to be able to send
+ *
+ * Create a compressed buffer with all the pages that we are going to
+ * send.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @used: number of pages used
+ */
+static int zstd_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
+{
+struct iovec *iov = p->pages->iov;
+struct zstd_data *z = p->data;
+int ret;
+uint32_t i;
+
+z->out.dst = z->zbuff;
+z->out.size = z->zbuff_len;
+z->out.pos = 0;
+
+for (i = 0; i < used; i++) {
+ZSTD_EndDirective flush = ZSTD_e_continue;
+
+if (i == used - 1) {
+flush = ZSTD_e_flush;
+}
+z->in.src = iov[i].iov_base;
+z->in.size = iov[i].iov_len;
+z->in.pos = 0;
+
+ret = ZSTD_compressStream2(z->zcs, &z->out, &z->in, flush);
+if (ZSTD_isError(ret)) {
+error_setg(errp, "multifd %d: compressStream error %s",
+   p->id, ZSTD_getErrorName(ret));
+return -1;
+}
+}
+p->next_packet_size = z->out.pos;
+p->flags |= MULTIFD_FLAG_ZSTD;
+
+return 0;
+}
+
+/**
+ * zstd_send_write: do the actual write of the data
+ *
+ * Do the actual

[PATCH v2 03/10] migration-test: introduce functions to handle string parameters

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 1c9f2c4e6a..fc221f172a 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -460,6 +460,43 @@ static void migrate_set_parameter_int(QTestState *who, const char *parameter,
 migrate_check_parameter_int(who, parameter, value);
 }
 
+static char *migrate_get_parameter_str(QTestState *who,
+   const char *parameter)
+{
+QDict *rsp;
+char *result;
+
+rsp = wait_command(who, "{ 'execute': 'query-migrate-parameters' }");
+result = g_strdup(qdict_get_str(rsp, parameter));
+qobject_unref(rsp);
+return result;
+}
+
+static void migrate_check_parameter_str(QTestState *who, const char *parameter,
+const char *value)
+{
+char *result;
+
+result = migrate_get_parameter_str(who, parameter);
+g_assert_cmpstr(result, ==, value);
+g_free(result);
+}
+
+__attribute__((unused))
+static void migrate_set_parameter_str(QTestState *who, const char *parameter,
+  const char *value)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who,
+"{ 'execute': 'migrate-set-parameters',"
+"'arguments': { %s: %s } }",
+parameter, value);
+g_assert(qdict_haskey(rsp, "return"));
+qobject_unref(rsp);
+migrate_check_parameter_str(who, parameter, value);
+}
+
 static void migrate_pause(QTestState *who)
 {
 QDict *rsp;
-- 
2.23.0

[PATCH v2 09/10] configure: Enable test and libs for zstd

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 configure | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/configure b/configure
index 84b413dbfc..a8f3027c67 100755
--- a/configure
+++ b/configure
@@ -447,6 +447,7 @@ lzo=""
 snappy=""
 bzip2=""
 lzfse=""
+zstd=""
 guest_agent=""
 guest_agent_with_vss="no"
 guest_agent_ntddscsi="no"
@@ -1341,6 +1342,10 @@ for opt do
   ;;
   --disable-lzfse) lzfse="no"
   ;;
+  --disable-zstd) zstd="no"
+  ;;
+  --enable-zstd) zstd="yes"
+  ;;
   --enable-guest-agent) guest_agent="yes"
   ;;
   --disable-guest-agent) guest_agent="no"
@@ -1788,6 +1793,8 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   (for reading bzip2-compressed dmg images)
   lzfse   support of lzfse compression library
   (for reading lzfse-compressed dmg images)
+  zstd    support for zstd compression library
+  (for migration compression)
   seccomp seccomp support
   coroutine-pool  coroutine freelist (better performance)
   glusterfs   GlusterFS backend
@@ -2401,6 +2408,24 @@ EOF
 fi
 fi
 
+##
+# zstd check
+
+if test "$zstd" != "no" ; then
+if $pkg_config --exists libzstd ; then
+zstd_cflags="$($pkg_config --cflags libzstd)"
+zstd_libs="$($pkg_config --libs libzstd)"
+LIBS="$zstd_libs $LIBS"
+QEMU_CFLAGS="$QEMU_CFLAGS $zstd_cflags"
+zstd="yes"
+else
+if test "$zstd" = "yes" ; then
+feature_not_found "libzstd" "Install libzstd devel"
+fi
+zstd="no"
+fi
+fi
+
 ##
 # libseccomp check
 
@@ -6535,6 +6560,7 @@ echo "lzo support   $lzo"
 echo "snappy support$snappy"
 echo "bzip2 support $bzip2"
 echo "lzfse support $lzfse"
+echo "zstd support  $zstd"
 echo "NUMA host support $numa"
 echo "libxml2   $libxml2"
 echo "tcmalloc support  $tcmalloc"
@@ -7102,6 +7128,10 @@ if test "$lzfse" = "yes" ; then
   echo "LZFSE_LIBS=-llzfse" >> $config_host_mak
 fi
 
+if test "$zstd" = "yes" ; then
+  echo "CONFIG_ZSTD=y" >> $config_host_mak
+fi
+
 if test "$libiscsi" = "yes" ; then
   echo "CONFIG_LIBISCSI=m" >> $config_host_mak
   echo "LIBISCSI_CFLAGS=$libiscsi_cflags" >> $config_host_mak
-- 
2.23.0

[PATCH v2 06/10] migration: Add multifd-compress parameter

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 

---
Rename it to NONE
Fix typos (dave)
We don't need to check values returned by visit_type_MultifdCompress (markus)
Fix yet more typos (wei)
---
 hw/core/qdev-properties.c| 13 +
 include/hw/qdev-properties.h |  3 +++
 migration/migration.c| 13 +
 monitor/hmp-cmds.c   | 13 +
 qapi/migration.json  | 30 +++---
 tests/migration-test.c   | 13 ++---
 6 files changed, 79 insertions(+), 6 deletions(-)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index ac28890e5a..644705235e 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -8,6 +8,7 @@
 #include "qapi/qmp/qerror.h"
 #include "qemu/ctype.h"
 #include "qemu/error-report.h"
+#include "qapi/qapi-types-migration.h"
 #include "hw/block/block.h"
 #include "net/hub.h"
 #include "qapi/visitor.h"
@@ -648,6 +649,18 @@ const PropertyInfo qdev_prop_fdc_drive_type = {
 .set_default_value = set_default_value_enum,
 };
 
+/* --- MultifdCompress --- */
+
+const PropertyInfo qdev_prop_multifd_compress = {
+.name = "MultifdCompress",
+.description = "multifd_compress values, "
+   "none",
+.enum_table = &MultifdCompress_lookup,
+.get = get_enum,
+.set = set_enum,
+.set_default_value = set_default_value_enum,
+};
+
 /* --- pci address --- */
 
 /*
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index c6a8cb5516..07d7bba682 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -21,6 +21,7 @@ extern const PropertyInfo qdev_prop_tpm;
 extern const PropertyInfo qdev_prop_ptr;
 extern const PropertyInfo qdev_prop_macaddr;
 extern const PropertyInfo qdev_prop_on_off_auto;
+extern const PropertyInfo qdev_prop_multifd_compress;
 extern const PropertyInfo qdev_prop_losttickpolicy;
 extern const PropertyInfo qdev_prop_blockdev_on_error;
 extern const PropertyInfo qdev_prop_bios_chs_trans;
@@ -204,6 +205,8 @@ extern const PropertyInfo qdev_prop_pcie_link_width;
 DEFINE_PROP(_n, _s, _f, qdev_prop_macaddr, MACAddr)
 #define DEFINE_PROP_ON_OFF_AUTO(_n, _s, _f, _d) \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_on_off_auto, OnOffAuto)
+#define DEFINE_PROP_MULTIFD_COMPRESS(_n, _s, _f, _d) \
+DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_multifd_compress, 
MultifdCompress)
 #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_losttickpolicy, \
 LostTickPolicy)
diff --git a/migration/migration.c b/migration/migration.c
index cf6cec5fb6..93c6ed10a6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -87,6 +87,7 @@
 /* The delay time (in ms) between two COLO checkpoints */
 #define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100)
 #define DEFAULT_MIGRATE_MULTIFD_CHANNELS 16
+#define DEFAULT_MIGRATE_MULTIFD_COMPRESS MULTIFD_COMPRESS_NONE
 
 /* Background transfer rate for postcopy, 0 means unlimited, note
  * that page requests can still exceed this limit.
@@ -774,6 +775,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->block_incremental = s->parameters.block_incremental;
 params->has_multifd_channels = true;
 params->multifd_channels = s->parameters.multifd_channels;
+params->has_multifd_compress = true;
+params->multifd_compress = s->parameters.multifd_compress;
 params->has_xbzrle_cache_size = true;
 params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
 params->has_max_postcopy_bandwidth = true;
@@ -1281,6 +1284,9 @@ static void 
migrate_params_test_apply(MigrateSetParameters *params,
 if (params->has_multifd_channels) {
 dest->multifd_channels = params->multifd_channels;
 }
+if (params->has_multifd_compress) {
+dest->multifd_compress = params->multifd_compress;
+}
 if (params->has_xbzrle_cache_size) {
 dest->xbzrle_cache_size = params->xbzrle_cache_size;
 }
@@ -1377,6 +1383,9 @@ static void migrate_params_apply(MigrateSetParameters 
*params, Error **errp)
 if (params->has_multifd_channels) {
 s->parameters.multifd_channels = params->multifd_channels;
 }
+if (params->has_multifd_compress) {
+s->parameters.multifd_compress = params->multifd_compress;
+}
 if (params->has_xbzrle_cache_size) {
 s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
 xbzrle_cache_resize(params->xbzrle_cache_size, errp);
@@ -3474,6 +3483,9 @@ static Property migration_properties[] = {
 DEFINE_PROP_UINT8("multifd-channels", MigrationState,
   parameters.multifd_channels,
   DEFAULT_MIGRATE_MULTIFD_CHANNELS),
+DEFINE_PROP_MULTIFD_COMPRESS("multifd-compress", MigrationState,
+  parameters.multifd_compress,
+  DEFAULT_MIGRATE_MULTIFD_COMPRESS),
 DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState,

[PATCH v2 01/10] migration: Increase default number of multifd channels to 16

2019-12-17 Thread Juan Quintela

Multifd scales much better with 16 channels than with the previous
default of 2, so raise the default.

Signed-off-by: Juan Quintela 
---
 migration/migration.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 354ad072fa..e7f707e033 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -86,7 +86,7 @@
 
 /* The delay time (in ms) between two COLO checkpoints */
 #define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100)
-#define DEFAULT_MIGRATE_MULTIFD_CHANNELS 2
+#define DEFAULT_MIGRATE_MULTIFD_CHANNELS 16
 
 /* Background transfer rate for postcopy, 0 means unlimited, note
  * that page requests can still exceed this limit.
-- 
2.23.0

[PATCH v2 07/10] migration: Make no compression operations into its own structure

2019-12-17 Thread Juan Quintela

It will be used later.
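
As a rough illustration of why the no-compression path is moved behind a
table of function pointers, the sketch below shows how a caller could pick
a method by index and call through the table. Every Sketch*/sketch_* name,
the fixed 4096-byte page size and the bounds check are assumptions made for
this example; they are not the types or helpers in migration/ram.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for MultiFDSendParams. */
typedef struct {
    uint32_t next_packet_size;
    uint32_t flags;
} SketchSendParams;

/* Hypothetical ops table, modelled on the send side of MultiFDMethods. */
typedef struct {
    int (*send_setup)(SketchSendParams *p);
    int (*send_prepare)(SketchSendParams *p, uint32_t used);
} SketchMethods;

/* Assumed values, chosen only for the sketch. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_FLAG_NOCOMP (1u << 1)

/* Without compression there is nothing to set up. */
static int sketch_nocomp_send_setup(SketchSendParams *p)
{
    (void)p;
    return 0;
}

/* Without compression the packet size is just pages * page size. */
static int sketch_nocomp_send_prepare(SketchSendParams *p, uint32_t used)
{
    p->next_packet_size = used * SKETCH_PAGE_SIZE;
    p->flags |= SKETCH_FLAG_NOCOMP;
    return 0;
}

static const SketchMethods sketch_nocomp_ops = {
    .send_setup = sketch_nocomp_send_setup,
    .send_prepare = sketch_nocomp_send_prepare,
};

/* Method index -> ops table; further methods would be appended here. */
static const SketchMethods *const sketch_ops[] = {
    &sketch_nocomp_ops,
};

/* Returns the ops for a method, or NULL if the index is unknown. */
static const SketchMethods *sketch_get_ops(unsigned int method)
{
    if (method >= sizeof(sketch_ops) / sizeof(sketch_ops[0])) {
        return NULL;
    }
    return sketch_ops[method];
}

/* Per-channel setup, done once before any packet is prepared. */
int sketch_channel_setup(unsigned int method, SketchSendParams *p)
{
    const SketchMethods *ops = sketch_get_ops(method);

    return ops ? ops->send_setup(p) : -1;
}

/* Per-packet preparation; the caller never names the method's functions. */
int sketch_prepare_packet(unsigned int method, SketchSendParams *p,
                          uint32_t used)
{
    const SketchMethods *ops = sketch_get_ops(method);

    return ops ? ops->send_prepare(p, used) : -1;
}
```

With this shape, adding another method only means adding one more table
entry; the code that sends packets stays unchanged.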

Signed-off-by: Juan Quintela 

---
Move setup of ->ops helper to proper place (wei)
Rename s/none/nocomp/ (dave)
Introduce MULTIFD_FLAG_NOCOMP
---
 migration/migration.c |   9 ++
 migration/migration.h |   1 +
 migration/ram.c   | 194 --
 3 files changed, 196 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 93c6ed10a6..56203eb536 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2213,6 +2213,15 @@ int migrate_multifd_channels(void)
 return s->parameters.multifd_channels;
 }
 
+int migrate_multifd_method(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->parameters.multifd_compress;
+}
+
 int migrate_use_xbzrle(void)
 {
 MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index 545f283ae7..d3ea45e25a 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -299,6 +299,7 @@ bool migrate_auto_converge(void);
 bool migrate_use_multifd(void);
 bool migrate_pause_before_switchover(void);
 int migrate_multifd_channels(void);
+int migrate_multifd_method(void);
 
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
diff --git a/migration/ram.c b/migration/ram.c
index fcf50e648a..10661e03ae 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -44,6 +44,7 @@
 #include "page_cache.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "qapi/qapi-types-migration.h"
 #include "qapi/qapi-events-migration.h"
 #include "qapi/qmp/qerror.h"
 #include "trace.h"
@@ -581,6 +582,7 @@ exit:
 #define MULTIFD_VERSION 1
 
 #define MULTIFD_FLAG_SYNC (1 << 0)
+#define MULTIFD_FLAG_NOCOMP (1 << 1)
 
 /* This value needs to be a multiple of qemu_target_page_size() */
 #define MULTIFD_PACKET_SIZE (512 * 1024)
@@ -662,6 +664,8 @@ typedef struct {
 uint64_t num_pages;
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
+/* used for compression methods */
+void *data;
 }  MultiFDSendParams;
 
 typedef struct {
@@ -699,8 +703,153 @@ typedef struct {
 uint64_t num_pages;
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
+/* used for de-compression methods */
+void *data;
 } MultiFDRecvParams;
 
+typedef struct {
+/* Setup for sending side */
+int (*send_setup)(MultiFDSendParams *p, Error **errp);
+/* Cleanup for sending side */
+void (*send_cleanup)(MultiFDSendParams *p, Error **errp);
+/* Prepare the send packet */
+int (*send_prepare)(MultiFDSendParams *p, uint32_t used, Error **errp);
+/* Write the send packet */
+int (*send_write)(MultiFDSendParams *p, uint32_t used, Error **errp);
+/* Setup for receiving side */
+int (*recv_setup)(MultiFDRecvParams *p, Error **errp);
+/* Cleanup for receiving side */
+void (*recv_cleanup)(MultiFDRecvParams *p);
+/* Read all pages */
+int (*recv_pages)(MultiFDRecvParams *p, uint32_t used, Error **errp);
+} MultiFDMethods;
+
+/* Multifd without compression */
+
+/**
+ * nocomp_send_setup: setup send side
+ *
+ * For no compression this function does nothing.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @errp: pointer to an error
+ */
+static int nocomp_send_setup(MultiFDSendParams *p, Error **errp)
+{
+return 0;
+}
+
+/**
+ * nocomp_send_cleanup: cleanup send side
+ *
+ * For no compression this function does nothing.
+ *
+ * @p: Params for the channel that we are using
+ */
+static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
+{
+return;
+}
+
+/**
+ * nocomp_send_prepare: prepare data to be able to send
+ *
+ * For no compression we just have to calculate the size of the
+ * packet.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @used: number of pages used
+ * @errp: pointer to an error
+ */
+static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
+   Error **errp)
+{
+p->next_packet_size = used * qemu_target_page_size();
+p->flags |= MULTIFD_FLAG_NOCOMP;
+return 0;
+}
+
+/**
+ * nocomp_send_write: do the actual write of the data
+ *
+ * For no compression we just have to write the data.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @used: number of pages used
+ * @errp: pointer to an error
+ */
+static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
+{
+return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
+}
+
+/**
+ * nocomp_recv_setup: setup receive side
+ *
+ * For no compression this function does nothing.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @errp: pointer to an error
+ */
+static int nocomp_recv_setup(MultiFDRecvParams *p, Error **errp)
+{
+return 0;
+}
+
+/**
+ * nocomp_recv_cleanup: setup

[PATCH v2 05/10] migration: Make multifd_load_setup() get an Error parameter

2019-12-17 Thread Juan Quintela

We need to change the full chain to pass the Error parameter.
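
To make "the full chain" concrete, here is a minimal sketch of threading an
error out-parameter through several call levels. SketchError and all
sketch_* functions are hypothetical stand-ins, not QEMU's Error API, and
unlike the real migration_incoming_setup() the intermediate functions here
return a status instead of exiting.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified error object (not QEMU's Error). */
typedef struct {
    char msg[128];
} SketchError;

/*
 * Store an error in *errp. A NULL errp means the caller does not want
 * details; an already-set *errp is left alone so the first error wins.
 */
static void sketch_error_set(SketchError **errp, const char *msg)
{
    SketchError *err;

    if (errp == NULL || *errp != NULL) {
        return;
    }
    err = calloc(1, sizeof(*err));
    if (err == NULL) {
        return;
    }
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    *errp = err;
}

/* Innermost function: the only place that actually creates the error. */
int sketch_load_setup(int thread_count, SketchError **errp)
{
    if (thread_count <= 0) {
        sketch_error_set(errp, "no multifd channels configured");
        return -1;
    }
    return 0;
}

/* Middle of the chain: only forwards errp. */
int sketch_incoming_setup(int thread_count, SketchError **errp)
{
    return sketch_load_setup(thread_count, errp);
}

/* Outermost entry point: also only forwards errp. */
int sketch_process_incoming(int thread_count, SketchError **errp)
{
    return sketch_incoming_setup(thread_count, errp);
}
```

Every function between the entry point and the place where the failure is
detected has to grow the extra parameter, which is why a change like this
touches so many signatures at once.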

Signed-off-by: Juan Quintela 
---
 migration/migration.c | 10 +-
 migration/migration.h |  2 +-
 migration/ram.c   |  2 +-
 migration/ram.h   |  2 +-
 migration/rdma.c  |  2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5a56bd0c91..cf6cec5fb6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -518,11 +518,11 @@ fail:
 exit(EXIT_FAILURE);
 }
 
-static void migration_incoming_setup(QEMUFile *f)
+static void migration_incoming_setup(QEMUFile *f, Error **errp)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
 
-if (multifd_load_setup() != 0) {
+if (multifd_load_setup(errp) != 0) {
 /* We haven't been able to create multifd threads
nothing better to do */
 exit(EXIT_FAILURE);
@@ -572,13 +572,13 @@ static bool postcopy_try_recover(QEMUFile *f)
 return false;
 }
 
-void migration_fd_process_incoming(QEMUFile *f)
+void migration_fd_process_incoming(QEMUFile *f, Error **errp)
 {
 if (postcopy_try_recover(f)) {
 return;
 }
 
-migration_incoming_setup(f);
+migration_incoming_setup(f, errp);
 migration_incoming_process();
 }
 
@@ -596,7 +596,7 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error 
**errp)
 return;
 }
 
-migration_incoming_setup(f);
+migration_incoming_setup(f, errp);
 
 /*
  * Common migration only needs one channel, so we can start
diff --git a/migration/migration.h b/migration/migration.h
index 79b3dda146..545f283ae7 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -265,7 +265,7 @@ struct MigrationState
 
 void migrate_set_state(int *state, int old_state, int new_state);
 
-void migration_fd_process_incoming(QEMUFile *f);
+void migration_fd_process_incoming(QEMUFile *f, Error **errp);
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
 void migration_incoming_process(void);
 
diff --git a/migration/ram.c b/migration/ram.c
index 1f364cc23d..fcf50e648a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1421,7 +1421,7 @@ static void *multifd_recv_thread(void *opaque)
 return NULL;
 }
 
-int multifd_load_setup(void)
+int multifd_load_setup(Error **errp)
 {
 int thread_count;
 uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
diff --git a/migration/ram.h b/migration/ram.h
index da22a417ea..42be471d52 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -43,7 +43,7 @@ uint64_t ram_bytes_total(void);
 
 int multifd_save_setup(Error **errp);
 void multifd_save_cleanup(void);
-int multifd_load_setup(void);
+int multifd_load_setup(Error **errp);
 int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
 bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
diff --git a/migration/rdma.c b/migration/rdma.c
index e241dcb992..2379b8345b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -4004,7 +4004,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 }
 
 rdma->migration_started_on_destination = 1;
-migration_fd_process_incoming(f);
+migration_fd_process_incoming(f, errp);
 }
 
 void rdma_start_incoming_migration(const char *host_port, Error **errp)
-- 
2.23.0

[PATCH v2 02/10] migration-test: Add migration multifd test

2019-12-17 Thread Juan Quintela

We set multifd-channels.

Signed-off-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Thomas Huth 
Tested-by: Wei Yang 
---
 tests/migration-test.c | 56 ++
 1 file changed, 56 insertions(+)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index f58430c1cb..1c9f2c4e6a 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -1361,6 +1361,61 @@ static void test_migrate_auto_converge(void)
 test_migrate_end(from, to, true);
 }
 
+static void test_multifd_tcp(void)
+{
+MigrateStart *args = migrate_start_new();
+QTestState *from, *to;
+QDict *rsp;
+char *uri;
+
+if (test_migrate_start(&from, &to, "defer", args)) {
+return;
+}
+
+/*
+ * We want to pick a speed slow enough that the test completes
+ * quickly, but that it doesn't complete precopy even on a slow
+ * machine, so also set the downtime.
+ */
+/* 1 ms should make it not converge*/
+migrate_set_parameter_int(from, "downtime-limit", 1);
+/* 1GB/s */
+migrate_set_parameter_int(from, "max-bandwidth", 1000000000);
+
+migrate_set_parameter_int(from, "multifd-channels", 16);
+migrate_set_parameter_int(to, "multifd-channels", 16);
+
+migrate_set_capability(from, "multifd", "true");
+migrate_set_capability(to, "multifd", "true");
+
+/* Start incoming migration from the 1st socket */
+rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
+   "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+qobject_unref(rsp);
+
+/* Wait for the first serial output from the source */
+wait_for_serial("src_serial");
+
+uri = migrate_get_socket_address(to, "socket-address");
+
+migrate(from, uri, "{}");
+
+wait_for_migration_pass(from);
+
+/* 300ms it should converge */
+migrate_set_parameter_int(from, "downtime-limit", 600);
+
+if (!got_stop) {
+qtest_qmp_eventwait(from, "STOP");
+}
+qtest_qmp_eventwait(to, "RESUME");
+
+wait_for_serial("dest_serial");
+wait_for_migration_complete(from);
+test_migrate_end(from, to, true);
+free(uri);
+}
+
 int main(int argc, char **argv)
 {
 char template[] = "/tmp/migration-test-XXXXXX";
@@ -1425,6 +1480,7 @@ int main(int argc, char **argv)
test_validate_uuid_dst_not_set);
 
 qtest_add_func("/migration/auto_converge", test_migrate_auto_converge);
+qtest_add_func("/migration/multifd/tcp", test_multifd_tcp);
 
 ret = g_test_run();
 
-- 
2.23.0

[PATCH v2 10/10] migration-test: Use a struct for test_migrate_start parameters

2019-12-17 Thread Juan Quintela

test_migrate_start() takes two bools and two strings, so it is very
difficult to remember which does what.  It also makes it very difficult
to add new parameters, as we need to modify all the callers.
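
A minimal sketch of the argument-struct pattern described above, using only
the C standard library. SketchStart and the sketch_* helpers are
hypothetical names for this example; the patch itself uses GLib allocation
helpers.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical argument bundle; each field is named at the call site. */
typedef struct {
    bool hide_stderr;
    bool use_shmem;
    char *opts_source;
    char *opts_target;
} SketchStart;

/* Portable string duplication (strdup() is not part of ISO C99). */
static char *sketch_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Allocate the arguments with defaults: flags off, option strings empty. */
SketchStart *sketch_start_new(void)
{
    SketchStart *args = calloc(1, sizeof(*args));

    if (args == NULL) {
        return NULL;
    }
    args->opts_source = sketch_strdup("");
    args->opts_target = sketch_strdup("");
    if (args->opts_source == NULL || args->opts_target == NULL) {
        free(args->opts_source);
        free(args->opts_target);
        free(args);
        return NULL;
    }
    return args;
}

/* Release the struct and the strings it owns. */
void sketch_start_destroy(SketchStart *args)
{
    if (args == NULL) {
        return;
    }
    free(args->opts_source);
    free(args->opts_target);
    free(args);
}
```

A caller then overrides only the fields it cares about, for example
`args->use_shmem = true;`, and a new field with a sensible default can be
added later without touching any existing caller.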

Signed-off-by: Juan Quintela 

---

Move the free after last use.
---
 tests/migration-test.c | 118 +++--
 1 file changed, 78 insertions(+), 40 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 37e9663ab4..f58430c1cb 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -544,10 +544,31 @@ static void migrate_postcopy_start(QTestState *from, 
QTestState *to)
 qtest_qmp_eventwait(to, "RESUME");
 }
 
+typedef struct {
+bool hide_stderr;
+bool use_shmem;
+char *opts_source;
+char *opts_target;
+} MigrateStart;
+
+static MigrateStart *migrate_start_new(void)
+{
+MigrateStart *args = g_new0(MigrateStart, 1);
+
+args->opts_source = g_strdup("");
+args->opts_target = g_strdup("");
+return args;
+}
+
+static void migrate_start_destroy(MigrateStart *args)
+{
+g_free(args->opts_source);
+g_free(args->opts_target);
+g_free(args);
+}
+
 static int test_migrate_start(QTestState **from, QTestState **to,
-   const char *uri, bool hide_stderr,
-   bool use_shmem, const char *opts_src,
-   const char *opts_dst)
+  const char *uri, MigrateStart *args)
 {
 gchar *arch_source, *arch_target;
 gchar *cmd_source, *cmd_target;
@@ -560,10 +581,7 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 const char *machine_args;
 const char *memory_size;
 
-opts_src = opts_src ? opts_src : "";
-opts_dst = opts_dst ? opts_dst : "";
-
-if (use_shmem) {
+if (args->use_shmem) {
 if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
 g_test_skip("/dev/shm is not supported");
 return -1;
@@ -623,13 +641,13 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 
 g_free(bootpath);
 
-if (hide_stderr) {
+if (args->hide_stderr) {
 ignore_stderr = "2>/dev/null";
 } else {
 ignore_stderr = "";
 }
 
-if (use_shmem) {
+if (args->use_shmem) {
 shmem_path = g_strdup_printf("/dev/shm/qemu-%d", getpid());
 shmem_opts = g_strdup_printf(
 "-object memory-backend-file,id=mem0,size=%s"
@@ -647,7 +665,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  "%s %s %s %s",
  machine_type, machine_args,
  memory_size, tmpfs,
- arch_source, shmem_opts, opts_src,
+ arch_source, shmem_opts, args->opts_source,
  ignore_stderr);
 g_free(arch_source);
 *from = qtest_init(cmd_source);
@@ -661,8 +679,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  "%s %s %s %s",
  machine_type, machine_args,
  memory_size, tmpfs, uri,
- arch_target, shmem_opts, opts_dst,
- ignore_stderr);
+ arch_target, shmem_opts,
+ args->opts_target, ignore_stderr);
 g_free(arch_target);
 *to = qtest_init(cmd_target);
 g_free(cmd_target);
@@ -672,10 +690,11 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
  * Remove shmem file immediately to avoid memory leak in test failed case.
  * It's valid because QEMU has already opened this file
  */
-if (use_shmem) {
+if (args->use_shmem) {
 unlink(shmem_path);
 g_free(shmem_path);
 }
+migrate_start_destroy(args);
 
 return 0;
 }
@@ -762,13 +781,13 @@ static void test_deprecated(void)
 }
 
 static int migrate_postcopy_prepare(QTestState **from_ptr,
- QTestState **to_ptr,
- bool hide_error)
+QTestState **to_ptr,
+MigrateStart *args)
 {
 char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
 QTestState *from, *to;
 
-if (test_migrate_start(&from, &to, uri, hide_error, false, NULL, NULL)) {
+if (test_migrate_start(&from, &to, uri, args)) {
 return -1;
 }
 
@@ -813,9 +832,10 @@ static void migrate_postcopy_complete(QTestState *from, 
QTestState *to)
 
 static void test_postcopy(void)
 {
+MigrateStart *args = migrate_start_new();
 QTestState *from, *to;
 
-if (migrate_postcopy_prepare(&from, &to, false)) {
+if (migrate_postcopy_prepare(&from, &to, args)) {
 return;
 }
 migrate_postcopy_start(from, to);
@@ -824,10 +844,13 @@ static void test_postcopy(void)
 
 static void

[PATCH v2 04/10] migration: Make multifd_save_setup() get an Error parameter

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/migration.c | 2 +-
 migration/ram.c   | 2 +-
 migration/ram.h   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index e7f707e033..5a56bd0c91 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3400,7 +3400,7 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
 return;
 }
 
-if (multifd_save_setup() != 0) {
+if (multifd_save_setup(&error_in) != 0) {
 migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
   MIGRATION_STATUS_FAILED);
 migrate_fd_cleanup(s);
diff --git a/migration/ram.c b/migration/ram.c
index 38070f1bb2..1f364cc23d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1208,7 +1208,7 @@ static void multifd_new_send_channel_async(QIOTask *task, 
gpointer opaque)
 }
 }
 
-int multifd_save_setup(void)
+int multifd_save_setup(Error **errp)
 {
 int thread_count;
 uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
diff --git a/migration/ram.h b/migration/ram.h
index bd0eee79b6..da22a417ea 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -41,7 +41,7 @@ int xbzrle_cache_resize(int64_t new_size, Error **errp);
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_total(void);
 
-int multifd_save_setup(void);
+int multifd_save_setup(Error **errp);
 void multifd_save_cleanup(void);
 int multifd_load_setup(void);
 int multifd_load_cleanup(Error **errp);
-- 
2.23.0

[PATCH v2 00/10] Multifd Migration Compression

2019-12-17 Thread Juan Quintela

[v2]
- rebase on top of previous arguments posted to the list
- introduces zlib compression
- introduces zstd compression

Please help if you know anything about zstd/zlib compression.

This puts compression on top of multifd. Advantages over the current
compression:

- We copy all pages in a single packet and then compress the whole
  thing.

- We reuse the compression stream for all the packets sent through the
  same channel.

- We can select nocomp/zlib/zstd levels of compression.

Please, review.

Juan Quintela (10):
  migration: Increase default number of multifd channels to 16
  migration-test: Add migration multifd test
  migration-test: introduce functions to handle string parameters
  migration: Make multifd_save_setup() get an Error parameter
  migration: Make multifd_load_setup() get an Error parameter
  migration: Add multifd-compress parameter
  migration: Make no compression operations into its own structure
  migration: Add zlib compression multifd support
  configure: Enable test and libs for zstd
  migration: Add zstd compression multifd support

 configure|  30 ++
 hw/core/qdev-properties.c|  13 +
 include/hw/qdev-properties.h |   3 +
 migration/migration.c|  36 +-
 migration/migration.h|   3 +-
 migration/ram.c  | 750 ++-
 migration/ram.h  |   4 +-
 migration/rdma.c |   2 +-
 monitor/hmp-cmds.c   |  13 +
 qapi/migration.json  |  30 +-
 tests/migration-test.c   | 112 ++
 11 files changed, 972 insertions(+), 24 deletions(-)

-- 
2.23.0

[PATCH v2 08/10] migration-test: Move -incoming handling to common commandline

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index e1304d70fc..14f2ce30fb 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -580,9 +580,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 machine_args = "";
 memory_size = "150M";
 cmd_src = g_strdup_printf("-drive file=%s,format=raw", bootpath);
-cmd_dst = g_strdup_printf("-drive file=%s,format=raw"
-  " -incoming %s",
-  bootpath, uri);
+cmd_dst = g_strdup(cmd_src);
 start_address = X86_TEST_MEM_START;
 end_address = X86_TEST_MEM_END;
 } else if (g_str_equal(arch, "s390x")) {
@@ -591,9 +589,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 machine_args = "";
 memory_size = "128M";
 cmd_src = g_strdup_printf("-bios %s", bootpath);
-cmd_dst = g_strdup_printf("-bios %s"
-  " -incoming %s",
-  bootpath, uri);
+cmd_dst = g_strdup(cmd_src);
 start_address = S390_TEST_MEM_START;
 end_address = S390_TEST_MEM_END;
 } else if (strcmp(arch, "ppc64") == 0) {
@@ -605,7 +601,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
   "'nvramrc=hex .\" _\" begin %x %x "
   "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
   "until'", end_address, start_address);
-cmd_dst = g_strdup_printf(" -incoming %s", uri);
+cmd_dst = g_strdup("");
 start_address = PPC_TEST_MEM_START;
 end_address = PPC_TEST_MEM_END;
 } else if (strcmp(arch, "aarch64") == 0) {
@@ -616,11 +612,7 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 cmd_src = g_strdup_printf("-cpu max "
   "-kernel %s",
   bootpath);
-cmd_dst = g_strdup_printf("-cpu max "
-  "-kernel %s "
-  "-incoming %s",
-  bootpath, uri);
-
+cmd_dst = g_strdup(cmd_src);
 start_address = ARM_TEST_MEM_START;
 end_address = ARM_TEST_MEM_END;
 
@@ -650,11 +642,11 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 
 cmd_source = g_strdup_printf("-machine %saccel=kvm:tcg%s "
  "-name source,debug-threads=on "
- "-serial file:%s/src_serial "
  "-m %s "
+ "-serial file:%s/src_serial "
  "%s %s %s %s",
  machine_type, machine_args,
- tmpfs, memory_size,
+ memory_size, tmpfs,
  cmd_src, shmem_opts, opts_src, ignore_stderr);
 g_free(cmd_src);
 *from = qtest_init(cmd_source);
@@ -664,9 +656,10 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
  "-name target,debug-threads=on "
  "-m %s "
  "-serial file:%s/dest_serial "
+ "-incoming %s "
  "%s %s %s %s",
  machine_type, machine_args,
- tmpfs, memory_size,
+ memory_size, tmpfs, uri,
  cmd_dst, shmem_opts, opts_dst, ignore_stderr);
 g_free(cmd_dst);
 *to = qtest_init(cmd_target);
-- 
2.23.0

[PATCH v2 07/10] migration-test: Move -serial handling to common commandline

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 41 -
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 6e828fbc6c..e1304d70fc 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -579,13 +579,10 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 machine_type = "";
 machine_args = "";
 memory_size = "150M";
-cmd_src = g_strdup_printf(" -serial file:%s/src_serial"
-  " -drive file=%s,format=raw",
-  tmpfs, bootpath);
-cmd_dst = g_strdup_printf(" -serial file:%s/dest_serial"
-  " -drive file=%s,format=raw"
+cmd_src = g_strdup_printf("-drive file=%s,format=raw", bootpath);
+cmd_dst = g_strdup_printf("-drive file=%s,format=raw"
   " -incoming %s",
-  tmpfs, bootpath, uri);
+  bootpath, uri);
 start_address = X86_TEST_MEM_START;
 end_address = X86_TEST_MEM_END;
 } else if (g_str_equal(arch, "s390x")) {
@@ -593,28 +590,22 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 machine_type = "";
 machine_args = "";
 memory_size = "128M";
-cmd_src = g_strdup_printf(" -serial file:%s/src_serial -bios %s",
-  tmpfs, bootpath);
-cmd_dst = g_strdup_printf(" -serial file:%s/dest_serial -bios %s"
+cmd_src = g_strdup_printf("-bios %s", bootpath);
+cmd_dst = g_strdup_printf("-bios %s"
   " -incoming %s",
-  tmpfs, bootpath, uri);
+  bootpath, uri);
 start_address = S390_TEST_MEM_START;
 end_address = S390_TEST_MEM_END;
 } else if (strcmp(arch, "ppc64") == 0) {
 machine_type = "";
 machine_args = ",vsmt=8";
 memory_size = "256M";
-cmd_src = g_strdup_printf("-nodefaults"
-  " -serial file:%s/src_serial"
-  " -prom-env 'use-nvramrc?=true' -prom-env "
+cmd_src = g_strdup_printf("-nodefaults "
+  "-prom-env 'use-nvramrc?=true' -prom-env "
   "'nvramrc=hex .\" _\" begin %x %x "
   "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
-  "until'", tmpfs, end_address,
-  start_address);
-cmd_dst = g_strdup_printf(" -serial file:%s/dest_serial"
-  " -incoming %s",
-  tmpfs, uri);
-
+  "until'", end_address, start_address);
+cmd_dst = g_strdup_printf(" -incoming %s", uri);
 start_address = PPC_TEST_MEM_START;
 end_address = PPC_TEST_MEM_END;
 } else if (strcmp(arch, "aarch64") == 0) {
@@ -623,14 +614,12 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 machine_args = "gic-version=max";
 memory_size = "150M";
 cmd_src = g_strdup_printf("-cpu max "
-  "-serial file:%s/src_serial "
   "-kernel %s",
-  tmpfs, bootpath);
+  bootpath);
 cmd_dst = g_strdup_printf("-cpu max "
-  "-serial file:%s/dest_serial "
   "-kernel %s "
   "-incoming %s",
-  tmpfs, bootpath, uri);
+  bootpath, uri);
 
 start_address = ARM_TEST_MEM_START;
 end_address = ARM_TEST_MEM_END;
@@ -661,10 +650,11 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 
 cmd_source = g_strdup_printf("-machine %saccel=kvm:tcg%s "
  "-name source,debug-threads=on "
+ "-serial file:%s/src_serial "
  "-m %s "
  "%s %s %s %s",
  machine_type, machine_args,
- memory_size,
+ tmpfs, memory_size,
  cmd_src, shmem_opts, opts_src, ignore_stderr);
 g_free(cmd_src);
 *from = qtest_init(cmd_source);
@@ -673,9 +663,10 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 cmd_target = g_strdup_printf("-machine %saccel=kvm:tcg%s "
  "-name target,debug-threads=on "
  "-m %s "
+ "-serial file:%s/dest_serial "

[PATCH v2 08/10] migration: Add zlib compression multifd support

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 hw/core/qdev-properties.c |   2 +-
 migration/ram.c   | 264 ++
 qapi/migration.json   |   2 +-
 tests/migration-test.c|   6 +
 4 files changed, 272 insertions(+), 2 deletions(-)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 644705235e..e8ff317a60 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -654,7 +654,7 @@ const PropertyInfo qdev_prop_fdc_drive_type = {
 const PropertyInfo qdev_prop_multifd_compress = {
 .name = "MultifdCompress",
 .description = "multifd_compress values, "
-   "none",
+   "none/zlib",
 .enum_table = _lookup,
 .get = get_enum,
 .set = set_enum,
diff --git a/migration/ram.c b/migration/ram.c
index 10661e03ae..5006d719b4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -583,6 +583,7 @@ exit:
 
 #define MULTIFD_FLAG_SYNC (1 << 0)
 #define MULTIFD_FLAG_NOCOMP (1 << 1)
+#define MULTIFD_FLAG_ZLIB (1 << 2)
 
 /* This value needs to be a multiple of qemu_target_page_size() */
 #define MULTIFD_PACKET_SIZE (512 * 1024)
@@ -625,6 +626,15 @@ typedef struct {
 RAMBlock *block;
 } MultiFDPages_t;
 
+struct zlib_data {
+/* stream for compression */
+z_stream zs;
+/* compressed buffer */
+uint8_t *zbuff;
+/* size of compressed buffer */
+uint32_t zbuff_len;
+};
+
 typedef struct {
 /* this fields are not changed once the thread is created */
 /* channel number */
@@ -846,8 +856,262 @@ static MultiFDMethods multifd_nocomp_ops = {
 .recv_pages = nocomp_recv_pages
 };
 
+/* Multifd zlib compression */
+
+/**
+ * zlib_send_setup: setup send side
+ *
+ * Setup each channel with zlib compression.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @errp: pointer to an error
+ */
+static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
+{
+uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
+struct zlib_data *z = g_malloc0(sizeof(struct zlib_data));
+z_stream *zs = &z->zs;
+
+p->data = z;
+zs->zalloc = Z_NULL;
+zs->zfree = Z_NULL;
+zs->opaque = Z_NULL;
+if (deflateInit(zs, migrate_compress_level()) != Z_OK) {
+g_free(z);
+error_setg(errp, "multifd %d: deflate init failed", p->id);
+return -1;
+}
+/* We will never have more than page_count pages */
+z->zbuff_len = page_count * qemu_target_page_size();
+z->zbuff_len *= 2;
+z->zbuff = g_try_malloc(z->zbuff_len);
+if (!z->zbuff) {
+g_free(z);
+error_setg(errp, "multifd %d: out of memory for zbuff", p->id);
+return -1;
+}
+return 0;
+}
+
+/**
+ * zlib_send_cleanup: cleanup send side
+ *
+ * Close the channel and return memory.
+ *
+ * @p: Params for the channel that we are using
+ */
+static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
+{
+struct zlib_data *z = p->data;
+
+deflateEnd(&z->zs);
+g_free(z->zbuff);
+z->zbuff = NULL;
+g_free(p->data);
+p->data = NULL;
+}
+
+/**
+ * zlib_send_prepare: prepare data to be able to send
+ *
+ * Create a compressed buffer with all the pages that we are going to
+ * send.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @used: number of pages used
+ */
+static int zlib_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
+{
+struct iovec *iov = p->pages->iov;
+struct zlib_data *z = p->data;
+z_stream *zs = &z->zs;
+uint32_t out_size = 0;
+int ret;
+uint32_t i;
+
+for (i = 0; i < used; i++) {
+uint32_t available = z->zbuff_len - out_size;
+int flush = Z_NO_FLUSH;
+
+if (i == used  - 1) {
+flush = Z_SYNC_FLUSH;
+}
+
+zs->avail_in = iov[i].iov_len;
+zs->next_in = iov[i].iov_base;
+
+zs->avail_out = available;
+zs->next_out = z->zbuff + out_size;
+
+ret = deflate(zs, flush);
+if (ret != Z_OK) {
+error_setg(errp, "multifd %d: deflate returned %d instead of Z_OK",
+   p->id, ret);
+return -1;
+}
+out_size += available - zs->avail_out;
+}
+p->next_packet_size = out_size;
+p->flags |= MULTIFD_FLAG_ZLIB;
+
+return 0;
+}
+
+/**
+ * zlib_send_write: do the actual write of the data
+ *
+ * Do the actual write of the compressed buffer.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @used: number of pages used
+ * @errp: pointer to an error
+ */
+static int zlib_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
+{
+struct zlib_data *z = p->data;
+
+return qio_channel_write_all(p->c, (void *)z->zbuff, p->next_packet_size,
+ errp);
+}
+
+/**
+ * zlib_recv_setup: setup receive side
+ *
+ * Create the compressed channel and buffer.
+

[PATCH v2 04/10] migration-test: Move memory size to common commandline

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 44 --
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 5a63158872..9d40f2d30c 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -565,6 +565,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 const char *arch = qtest_get_arch();
 const char *machine_type;
 const char *machine_args;
+const char *memory_size;
 
 opts_src = opts_src ? opts_src : "";
 opts_dst = opts_dst ? opts_dst : "";
@@ -585,15 +586,14 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 init_bootfile(bootpath, x86_bootsect, sizeof(x86_bootsect));
 machine_type = "";
 machine_args = "";
-extra_opts = use_shmem ? get_shmem_opts("150M", shmem_path) : NULL;
-cmd_src = g_strdup_printf("-m 150M"
-  " -name source,debug-threads=on"
+memory_size = "150M";
+extra_opts = use_shmem ? get_shmem_opts(memory_size, shmem_path) : 
NULL;
+cmd_src = g_strdup_printf(" -name source,debug-threads=on"
   " -serial file:%s/src_serial"
   " -drive file=%s,format=raw %s",
   tmpfs, bootpath,
   extra_opts ? extra_opts : "");
-cmd_dst = g_strdup_printf("-m 150M"
-  " -name target,debug-threads=on"
+cmd_dst = g_strdup_printf(" -name target,debug-threads=on"
   " -serial file:%s/dest_serial"
   " -drive file=%s,format=raw"
   " -incoming %s %s",
@@ -605,14 +605,13 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 init_bootfile(bootpath, s390x_elf, sizeof(s390x_elf));
 machine_type = "";
 machine_args = "";
-extra_opts = use_shmem ? get_shmem_opts("128M", shmem_path) : NULL;
-cmd_src = g_strdup_printf("-m 128M"
-  " -name source,debug-threads=on"
+memory_size = "128M";
+extra_opts = use_shmem ? get_shmem_opts(memory_size, shmem_path) : 
NULL;
+cmd_src = g_strdup_printf(" -name source,debug-threads=on"
   " -serial file:%s/src_serial -bios %s %s",
   tmpfs, bootpath,
   extra_opts ? extra_opts : "");
-cmd_dst = g_strdup_printf("-m 128M"
-  " -name target,debug-threads=on"
+cmd_dst = g_strdup_printf(" -name target,debug-threads=on"
   " -serial file:%s/dest_serial -bios %s"
   " -incoming %s %s",
   tmpfs, bootpath, uri,
@@ -622,8 +621,9 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 } else if (strcmp(arch, "ppc64") == 0) {
 machine_type = "";
 machine_args = ",vsmt=8";
-extra_opts = use_shmem ? get_shmem_opts("256M", shmem_path) : NULL;
-cmd_src = g_strdup_printf("-m 256M -nodefaults"
+memory_size = "256M";
+extra_opts = use_shmem ? get_shmem_opts(memory_size, shmem_path) : 
NULL;
+cmd_src = g_strdup_printf("-nodefaults"
   " -name source,debug-threads=on"
   " -serial file:%s/src_serial"
   " -prom-env 'use-nvramrc?=true' -prom-env "
@@ -631,8 +631,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
   "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
   "until' %s", tmpfs, end_address,
   start_address, extra_opts ? extra_opts : "");
-cmd_dst = g_strdup_printf("-m 256M"
-  " -name target,debug-threads=on"
+cmd_dst = g_strdup_printf(" -name target,debug-threads=on"
   " -serial file:%s/dest_serial"
   " -incoming %s %s",
   tmpfs, uri,
@@ -644,14 +643,15 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 init_bootfile(bootpath, aarch64_kernel, sizeof(aarch64_kernel));
 machine_type = "virt,";
 machine_args = "gic-version=max";
-extra_opts = use_shmem ? get_shmem_opts("150M", shmem_path) : NULL;
+memory_size = "150M";
+extra_opts = use_shmem ? get_shmem_opts(memory_size, shmem_path) : 
NULL;
 cmd_src = g_strdup_printf("-name vmsource,debug-threads=on -cpu max "
-  "-m 150M -serial file:%s/src_serial "
+  "-serial file:%s/src_serial "

[PATCH v2 06/10] migration-test: Move -name handling to common commandline

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index e17d432043..6e828fbc6c 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -579,12 +579,10 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 machine_type = "";
 machine_args = "";
 memory_size = "150M";
-cmd_src = g_strdup_printf(" -name source,debug-threads=on"
-  " -serial file:%s/src_serial"
+cmd_src = g_strdup_printf(" -serial file:%s/src_serial"
   " -drive file=%s,format=raw",
   tmpfs, bootpath);
-cmd_dst = g_strdup_printf(" -name target,debug-threads=on"
-  " -serial file:%s/dest_serial"
+cmd_dst = g_strdup_printf(" -serial file:%s/dest_serial"
   " -drive file=%s,format=raw"
   " -incoming %s",
   tmpfs, bootpath, uri);
@@ -595,11 +593,9 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 machine_type = "";
 machine_args = "";
 memory_size = "128M";
-cmd_src = g_strdup_printf(" -name source,debug-threads=on"
-  " -serial file:%s/src_serial -bios %s",
+cmd_src = g_strdup_printf(" -serial file:%s/src_serial -bios %s",
   tmpfs, bootpath);
-cmd_dst = g_strdup_printf(" -name target,debug-threads=on"
-  " -serial file:%s/dest_serial -bios %s"
+cmd_dst = g_strdup_printf(" -serial file:%s/dest_serial -bios %s"
   " -incoming %s",
   tmpfs, bootpath, uri);
 start_address = S390_TEST_MEM_START;
@@ -609,15 +605,13 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 machine_args = ",vsmt=8";
 memory_size = "256M";
 cmd_src = g_strdup_printf("-nodefaults"
-  " -name source,debug-threads=on"
   " -serial file:%s/src_serial"
   " -prom-env 'use-nvramrc?=true' -prom-env "
   "'nvramrc=hex .\" _\" begin %x %x "
   "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
   "until'", tmpfs, end_address,
   start_address);
-cmd_dst = g_strdup_printf(" -name target,debug-threads=on"
-  " -serial file:%s/dest_serial"
+cmd_dst = g_strdup_printf(" -serial file:%s/dest_serial"
   " -incoming %s",
   tmpfs, uri);
 
@@ -628,11 +622,11 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 machine_type = "virt,";
 machine_args = "gic-version=max";
 memory_size = "150M";
-cmd_src = g_strdup_printf("-name vmsource,debug-threads=on -cpu max "
+cmd_src = g_strdup_printf("-cpu max "
   "-serial file:%s/src_serial "
   "-kernel %s",
   tmpfs, bootpath);
-cmd_dst = g_strdup_printf("-name vmdest,debug-threads=on -cpu max "
+cmd_dst = g_strdup_printf("-cpu max "
   "-serial file:%s/dest_serial "
   "-kernel %s "
   "-incoming %s",
@@ -666,6 +660,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 }
 
 cmd_source = g_strdup_printf("-machine %saccel=kvm:tcg%s "
+ "-name source,debug-threads=on "
  "-m %s "
  "%s %s %s %s",
  machine_type, machine_args,
@@ -676,6 +671,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 g_free(cmd_source);
 
 cmd_target = g_strdup_printf("-machine %saccel=kvm:tcg%s "
+ "-name target,debug-threads=on "
  "-m %s "
  "%s %s %s %s",
  machine_type, machine_args,
-- 
2.23.0

[PATCH v2 09/10] migration-test: Rename cmd_src/dst to arch_source/arch_target

2019-12-17 Thread Juan Quintela

This explains better what they do and avoids confusion with
cmd_source/cmd_target.

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 14f2ce30fb..37e9663ab4 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -549,7 +549,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
bool use_shmem, const char *opts_src,
const char *opts_dst)
 {
-gchar *cmd_src, *cmd_dst;
+gchar *arch_source, *arch_target;
 gchar *cmd_source, *cmd_target;
 const gchar *ignore_stderr;
 char *bootpath = NULL;
@@ -579,8 +579,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 machine_type = "";
 machine_args = "";
 memory_size = "150M";
-cmd_src = g_strdup_printf("-drive file=%s,format=raw", bootpath);
-cmd_dst = g_strdup(cmd_src);
+arch_source = g_strdup_printf("-drive file=%s,format=raw", bootpath);
+arch_target = g_strdup(arch_source);
 start_address = X86_TEST_MEM_START;
 end_address = X86_TEST_MEM_END;
 } else if (g_str_equal(arch, "s390x")) {
@@ -588,20 +588,20 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 machine_type = "";
 machine_args = "";
 memory_size = "128M";
-cmd_src = g_strdup_printf("-bios %s", bootpath);
-cmd_dst = g_strdup(cmd_src);
+arch_source = g_strdup_printf("-bios %s", bootpath);
+arch_target = g_strdup(arch_source);
 start_address = S390_TEST_MEM_START;
 end_address = S390_TEST_MEM_END;
 } else if (strcmp(arch, "ppc64") == 0) {
 machine_type = "";
 machine_args = ",vsmt=8";
 memory_size = "256M";
-cmd_src = g_strdup_printf("-nodefaults "
-  "-prom-env 'use-nvramrc?=true' -prom-env "
-  "'nvramrc=hex .\" _\" begin %x %x "
-  "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
-  "until'", end_address, start_address);
-cmd_dst = g_strdup("");
+arch_source = g_strdup_printf("-nodefaults "
+  "-prom-env 'use-nvramrc?=true' -prom-env 
"
+  "'nvramrc=hex .\" _\" begin %x %x "
+  "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
+  "until'", end_address, start_address);
+arch_target = g_strdup("");
 start_address = PPC_TEST_MEM_START;
 end_address = PPC_TEST_MEM_END;
 } else if (strcmp(arch, "aarch64") == 0) {
@@ -609,10 +609,10 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 machine_type = "virt,";
 machine_args = "gic-version=max";
 memory_size = "150M";
-cmd_src = g_strdup_printf("-cpu max "
-  "-kernel %s",
-  bootpath);
-cmd_dst = g_strdup(cmd_src);
+arch_source = g_strdup_printf("-cpu max "
+  "-kernel %s",
+  bootpath);
+arch_target = g_strdup(arch_source);
 start_address = ARM_TEST_MEM_START;
 end_address = ARM_TEST_MEM_END;
 
@@ -647,8 +647,9 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  "%s %s %s %s",
  machine_type, machine_args,
  memory_size, tmpfs,
- cmd_src, shmem_opts, opts_src, ignore_stderr);
-g_free(cmd_src);
+ arch_source, shmem_opts, opts_src,
+ ignore_stderr);
+g_free(arch_source);
 *from = qtest_init(cmd_source);
 g_free(cmd_source);
 
@@ -660,8 +661,9 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
  "%s %s %s %s",
  machine_type, machine_args,
  memory_size, tmpfs, uri,
- cmd_dst, shmem_opts, opts_dst, ignore_stderr);
-g_free(cmd_dst);
+ arch_target, shmem_opts, opts_dst,
+ ignore_stderr);
+g_free(arch_target);
 *to = qtest_init(cmd_target);
 g_free(cmd_target);
 
-- 
2.23.0

[PATCH v2 01/10] migration-test: Create cmd_source and cmd_target

2019-12-17 Thread Juan Quintela

We repeat almost everything for each machine when creating the
migration command line, once for the source and once for the
destination.  Start by moving opts_src and opts_dst into the common
command lines, cmd_source and cmd_target.
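
A minimal sketch of the idea, assuming a fixed-size buffer and plain
snprintf() instead of the g_strdup_printf() the test uses.  The helper
name build_cmdline() is hypothetical and is not part of
tests/migration-test.c:

```c
#include <stdio.h>

/*
 * Hypothetical helper: append the options shared by every architecture
 * (for example opts_src or opts_dst) to the per-architecture command
 * line.  Returns what snprintf() returns, i.e. the length the complete
 * string needs, so the caller can detect truncation.
 */
static int build_cmdline(char *buf, size_t len, const char *arch_opts,
                         const char *common_opts)
{
    return snprintf(buf, len, "%s %s", arch_opts, common_opts);
}
```

Each architecture branch then only has to provide its own arch_opts,
and anything shared is written once where the final string is built.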

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 44 --
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index a5343fdc66..fbddcf2317 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -557,6 +557,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
const char *opts_dst)
 {
 gchar *cmd_src, *cmd_dst;
+gchar *cmd_source, *cmd_target;
 char *bootpath = NULL;
 char *extra_opts = NULL;
 char *shmem_path = NULL;
@@ -584,16 +585,16 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 cmd_src = g_strdup_printf("-machine accel=%s -m 150M"
   " -name source,debug-threads=on"
   " -serial file:%s/src_serial"
-  " -drive file=%s,format=raw %s %s",
+  " -drive file=%s,format=raw %s",
   accel, tmpfs, bootpath,
-  extra_opts ? extra_opts : "", opts_src);
+  extra_opts ? extra_opts : "");
 cmd_dst = g_strdup_printf("-machine accel=%s -m 150M"
   " -name target,debug-threads=on"
   " -serial file:%s/dest_serial"
   " -drive file=%s,format=raw"
-  " -incoming %s %s %s",
+  " -incoming %s %s",
   accel, tmpfs, bootpath, uri,
-  extra_opts ? extra_opts : "", opts_dst);
+  extra_opts ? extra_opts : "");
 start_address = X86_TEST_MEM_START;
 end_address = X86_TEST_MEM_END;
 } else if (g_str_equal(arch, "s390x")) {
@@ -601,15 +602,15 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 extra_opts = use_shmem ? get_shmem_opts("128M", shmem_path) : NULL;
 cmd_src = g_strdup_printf("-machine accel=%s -m 128M"
   " -name source,debug-threads=on"
-  " -serial file:%s/src_serial -bios %s %s %s",
+  " -serial file:%s/src_serial -bios %s %s",
   accel, tmpfs, bootpath,
-  extra_opts ? extra_opts : "", opts_src);
+  extra_opts ? extra_opts : "");
 cmd_dst = g_strdup_printf("-machine accel=%s -m 128M"
   " -name target,debug-threads=on"
   " -serial file:%s/dest_serial -bios %s"
-  " -incoming %s %s %s",
+  " -incoming %s %s",
   accel, tmpfs, bootpath, uri,
-  extra_opts ? extra_opts : "", opts_dst);
+  extra_opts ? extra_opts : "");
 start_address = S390_TEST_MEM_START;
 end_address = S390_TEST_MEM_END;
 } else if (strcmp(arch, "ppc64") == 0) {
@@ -620,15 +621,14 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
   " -prom-env 'use-nvramrc?=true' -prom-env "
   "'nvramrc=hex .\" _\" begin %x %x "
   "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
-  "until' %s %s",  accel, tmpfs, end_address,
-  start_address, extra_opts ? extra_opts : "",
-  opts_src);
+  "until' %s",  accel, tmpfs, end_address,
+  start_address, extra_opts ? extra_opts : "");
 cmd_dst = g_strdup_printf("-machine accel=%s,vsmt=8 -m 256M"
   " -name target,debug-threads=on"
   " -serial file:%s/dest_serial"
-  " -incoming %s %s %s",
+  " -incoming %s %s",
   accel, tmpfs, uri,
-  extra_opts ? extra_opts : "", opts_dst);
+  extra_opts ? extra_opts : "");
 
 start_address = PPC_TEST_MEM_START;
 end_address = PPC_TEST_MEM_END;
@@ -638,16 +638,16 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 cmd_src = g_strdup_printf("-machine virt,accel=%s,gic-version=max "
   "-name vmsource,debug-threads=on -cpu max "

[PATCH v2 03/10] migration-test: Move -machine to common commandline

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 51 +-
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 0c01ed3543..5a63158872 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -563,7 +563,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 char *extra_opts = NULL;
 char *shmem_path = NULL;
 const char *arch = qtest_get_arch();
-const char *accel = "kvm:tcg";
+const char *machine_type;
+const char *machine_args;
 
 opts_src = opts_src ? opts_src : "";
 opts_dst = opts_dst ? opts_dst : "";
@@ -582,72 +583,78 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 /* the assembled x86 boot sector should be exactly one sector large */
 assert(sizeof(x86_bootsect) == 512);
 init_bootfile(bootpath, x86_bootsect, sizeof(x86_bootsect));
+machine_type = "";
+machine_args = "";
 extra_opts = use_shmem ? get_shmem_opts("150M", shmem_path) : NULL;
-cmd_src = g_strdup_printf("-machine accel=%s -m 150M"
+cmd_src = g_strdup_printf("-m 150M"
   " -name source,debug-threads=on"
   " -serial file:%s/src_serial"
   " -drive file=%s,format=raw %s",
-  accel, tmpfs, bootpath,
+  tmpfs, bootpath,
   extra_opts ? extra_opts : "");
-cmd_dst = g_strdup_printf("-machine accel=%s -m 150M"
+cmd_dst = g_strdup_printf("-m 150M"
   " -name target,debug-threads=on"
   " -serial file:%s/dest_serial"
   " -drive file=%s,format=raw"
   " -incoming %s %s",
-  accel, tmpfs, bootpath, uri,
+  tmpfs, bootpath, uri,
   extra_opts ? extra_opts : "");
 start_address = X86_TEST_MEM_START;
 end_address = X86_TEST_MEM_END;
 } else if (g_str_equal(arch, "s390x")) {
 init_bootfile(bootpath, s390x_elf, sizeof(s390x_elf));
+machine_type = "";
+machine_args = "";
 extra_opts = use_shmem ? get_shmem_opts("128M", shmem_path) : NULL;
-cmd_src = g_strdup_printf("-machine accel=%s -m 128M"
+cmd_src = g_strdup_printf("-m 128M"
   " -name source,debug-threads=on"
   " -serial file:%s/src_serial -bios %s %s",
-  accel, tmpfs, bootpath,
+  tmpfs, bootpath,
   extra_opts ? extra_opts : "");
-cmd_dst = g_strdup_printf("-machine accel=%s -m 128M"
+cmd_dst = g_strdup_printf("-m 128M"
   " -name target,debug-threads=on"
   " -serial file:%s/dest_serial -bios %s"
   " -incoming %s %s",
-  accel, tmpfs, bootpath, uri,
+  tmpfs, bootpath, uri,
   extra_opts ? extra_opts : "");
 start_address = S390_TEST_MEM_START;
 end_address = S390_TEST_MEM_END;
 } else if (strcmp(arch, "ppc64") == 0) {
+machine_type = "";
+machine_args = ",vsmt=8";
 extra_opts = use_shmem ? get_shmem_opts("256M", shmem_path) : NULL;
-cmd_src = g_strdup_printf("-machine accel=%s,vsmt=8 -m 256M 
-nodefaults"
+cmd_src = g_strdup_printf("-m 256M -nodefaults"
   " -name source,debug-threads=on"
   " -serial file:%s/src_serial"
   " -prom-env 'use-nvramrc?=true' -prom-env "
   "'nvramrc=hex .\" _\" begin %x %x "
   "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
-  "until' %s",  accel, tmpfs, end_address,
+  "until' %s", tmpfs, end_address,
   start_address, extra_opts ? extra_opts : "");
-cmd_dst = g_strdup_printf("-machine accel=%s,vsmt=8 -m 256M"
+cmd_dst = g_strdup_printf("-m 256M"
   " -name target,debug-threads=on"
   " -serial file:%s/dest_serial"
   " -incoming %s %s",
-  accel, tmpfs, uri,
+  tmpfs, uri,
   extra_opts ? extra_opts : "");
 
 start_address = PPC_TEST_MEM_START;
 end_address = PPC_TEST_MEM_END;
 } else if (strcmp(arch, "aarch64") == 0) {

[PATCH v2 02/10] migration-test: Move hide_stderr to common commandline

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index fbddcf2317..0c01ed3543 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -558,6 +558,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 {
 gchar *cmd_src, *cmd_dst;
 gchar *cmd_source, *cmd_target;
+const gchar *ignore_stderr;
 char *bootpath = NULL;
 char *extra_opts = NULL;
 char *shmem_path = NULL;
@@ -661,24 +662,19 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 g_free(extra_opts);
 
 if (hide_stderr) {
-gchar *tmp;
-tmp = g_strdup_printf("%s 2>/dev/null", cmd_src);
-g_free(cmd_src);
-cmd_src = tmp;
-
-tmp = g_strdup_printf("%s 2>/dev/null", cmd_dst);
-g_free(cmd_dst);
-cmd_dst = tmp;
+ignore_stderr = "2>/dev/null";
+} else {
+ignore_stderr = "";
 }
 
-cmd_source = g_strdup_printf("%s %s",
- cmd_src, opts_src);
+cmd_source = g_strdup_printf("%s %s %s",
+ cmd_src, opts_src, ignore_stderr);
 g_free(cmd_src);
 *from = qtest_init(cmd_source);
 g_free(cmd_source);
 
-cmd_target = g_strdup_printf("%s %s",
- cmd_dst, opts_dst);
+cmd_target = g_strdup_printf("%s %s %s",
+ cmd_dst, opts_dst, ignore_stderr);
 g_free(cmd_dst);
 *to = qtest_init(cmd_target);
 g_free(cmd_target);
-- 
2.23.0

[PATCH v2 05/10] migration-test: Move shmem handling to common commandline

2019-12-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 76 +++---
 1 file changed, 34 insertions(+), 42 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 9d40f2d30c..e17d432043 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -380,13 +380,6 @@ static void cleanup(const char *filename)
 g_free(path);
 }
 
-static char *get_shmem_opts(const char *mem_size, const char *shmem_path)
-{
-return g_strdup_printf("-object memory-backend-file,id=mem0,size=%s"
-   ",mem-path=%s,share=on -numa node,memdev=mem0",
-   mem_size, shmem_path);
-}
-
 static char *SocketAddress_to_str(SocketAddress *addr)
 {
 switch (addr->type) {
@@ -560,8 +553,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 gchar *cmd_source, *cmd_target;
 const gchar *ignore_stderr;
 char *bootpath = NULL;
-char *extra_opts = NULL;
-char *shmem_path = NULL;
+char *shmem_opts;
+char *shmem_path;
 const char *arch = qtest_get_arch();
 const char *machine_type;
 const char *machine_args;
@@ -575,7 +568,6 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 g_test_skip("/dev/shm is not supported");
 return -1;
 }
-shmem_path = g_strdup_printf("/dev/shm/qemu-%d", getpid());
 }
 
 got_stop = false;
@@ -587,18 +579,15 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 machine_type = "";
 machine_args = "";
 memory_size = "150M";
-extra_opts = use_shmem ? get_shmem_opts(memory_size, shmem_path) : 
NULL;
 cmd_src = g_strdup_printf(" -name source,debug-threads=on"
   " -serial file:%s/src_serial"
-  " -drive file=%s,format=raw %s",
-  tmpfs, bootpath,
-  extra_opts ? extra_opts : "");
+  " -drive file=%s,format=raw",
+  tmpfs, bootpath);
 cmd_dst = g_strdup_printf(" -name target,debug-threads=on"
   " -serial file:%s/dest_serial"
   " -drive file=%s,format=raw"
-  " -incoming %s %s",
-  tmpfs, bootpath, uri,
-  extra_opts ? extra_opts : "");
+  " -incoming %s",
+  tmpfs, bootpath, uri);
 start_address = X86_TEST_MEM_START;
 end_address = X86_TEST_MEM_END;
 } else if (g_str_equal(arch, "s390x")) {
@@ -606,36 +595,31 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 machine_type = "";
 machine_args = "";
 memory_size = "128M";
-extra_opts = use_shmem ? get_shmem_opts(memory_size, shmem_path) : 
NULL;
 cmd_src = g_strdup_printf(" -name source,debug-threads=on"
-  " -serial file:%s/src_serial -bios %s %s",
-  tmpfs, bootpath,
-  extra_opts ? extra_opts : "");
+  " -serial file:%s/src_serial -bios %s",
+  tmpfs, bootpath);
 cmd_dst = g_strdup_printf(" -name target,debug-threads=on"
   " -serial file:%s/dest_serial -bios %s"
-  " -incoming %s %s",
-  tmpfs, bootpath, uri,
-  extra_opts ? extra_opts : "");
+  " -incoming %s",
+  tmpfs, bootpath, uri);
 start_address = S390_TEST_MEM_START;
 end_address = S390_TEST_MEM_END;
 } else if (strcmp(arch, "ppc64") == 0) {
 machine_type = "";
 machine_args = ",vsmt=8";
 memory_size = "256M";
-extra_opts = use_shmem ? get_shmem_opts(memory_size, shmem_path) : 
NULL;
 cmd_src = g_strdup_printf("-nodefaults"
   " -name source,debug-threads=on"
   " -serial file:%s/src_serial"
   " -prom-env 'use-nvramrc?=true' -prom-env "
   "'nvramrc=hex .\" _\" begin %x %x "
   "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
-  "until' %s", tmpfs, end_address,
-  start_address, extra_opts ? extra_opts : "");
+  "until'", tmpfs, end_address,
+  start_address);
 cmd_dst = g_strdup_printf(" -name target,debug-threads=on"
   " -serial file:%s/dest_serial"
-  "

[PATCH v2 00/10] Migration Arguments cleanup

2019-12-17 Thread Juan Quintela

[v2]
- fix use-after-free (thanks peter)

[v1]
This series simplifies test_migrate_start() in two ways:
- simplify the command line creation, so that everything common between
  architectures doesn't have to be repeated (DRY).
  Note that this part removes lines of code.
- test_migrate_start() takes two bools and two strings as arguments; it is
  very difficult to remember which is which and what each one means, and it
  is even harder to add new parameters.  Just pass them through one struct.
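
A minimal sketch of the second point, assuming a struct layout that is
not taken from the series: the type name MigrateStartArgs, its field
names and the helper source_opts() are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical argument bundle; the names are assumptions. */
typedef struct {
    bool hide_stderr;         /* redirect stderr of both guests */
    bool use_shmem;           /* back guest RAM with shared memory */
    const char *opts_source;  /* extra options for the source, or NULL */
    const char *opts_target;  /* extra options for the target, or NULL */
} MigrateStartArgs;

/* Treat a missing option string as empty, as the test does for opts_src. */
static const char *source_opts(const MigrateStartArgs *args)
{
    return args->opts_source ? args->opts_source : "";
}
```

With designated initializers a caller names only the fields it cares
about, for example `MigrateStartArgs args = { .use_shmem = true };`,
and every other field defaults to false or NULL.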

Please, review.

Juan Quintela (10):
  migration-test: Create cmd_source and cmd_target
  migration-test: Move hide_stderr to common commandline
  migration-test: Move -machine to common commandline
  migration-test: Move memory size to common commandline
  migration-test: Move shmem handling to common commandline
  migration-test: Move -name handling to common commandline
  migration-test: Move -serial handling to common commandline
  migration-test: Move -incoming handling to common commandline
  migration-test: Rename cmd_src/dst to arch_source/arch_target
  migration-test: Use a struct for test_migrate_start parameters

 tests/migration-test.c | 269 +++--
 1 file changed, 149 insertions(+), 120 deletions(-)

-- 
2.23.0

[PATCH] target/ppc: Remove unused PPC_INPUT_INT defines

2019-12-17 Thread Fabiano Rosas

They were added in "16415335be Use correct input constant" with a
single use in kvm_arch_pre_run but that function's implementation was
removed by "1e8f51e856 ppc: remove idle_timer logic".

Signed-off-by: Fabiano Rosas 
---
 target/ppc/kvm.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 7406d18945..b19555e97e 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -1325,12 +1325,6 @@ int kvmppc_set_interrupt(PowerPCCPU *cpu, int irq, int 
level)
 return 0;
 }
 
-#if defined(TARGET_PPC64)
-#define PPC_INPUT_INT PPC970_INPUT_INT
-#else
-#define PPC_INPUT_INT PPC6xx_INPUT_INT
-#endif
-
 void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
 return;
-- 
2.23.0

Re: [PATCH RESEND v2] util/cutils: Expand do_strtosz parsing precision to 64 bits

2019-12-17 Thread Tao Xu


On 12/17/2019 7:44 PM, Christophe de Dinechin wrote:




On 9 Dec 2019, at 09:30, Tao Xu  wrote:

Parse input string both as a double and as a uint64_t, then use the
method which consumes more characters. Update the related test cases.

Signed-off-by: Tao Xu 
---

Changes in v2:
- Resend to use a double smaller than DBL_MIN
- Add more test cases for double overflow and underflow.
- Set mul as int64_t (Markus)
- Restore endptr (Markus)
---
tests/test-cutils.c| 37 +++
tests/test-keyval.c| 47 +
tests/test-qemu-opts.c | 39 +---
util/cutils.c  | 67 +++---
4 files changed, 75 insertions(+), 115 deletions(-)


[...]

+/*
+ * Parse @nptr both as a double and as a uint64_t, then use the method
+ * which consumes more characters.
+ */


Why do you ever need to parse as double if you have uint64?



Because we want to keep do_strtosz compatible with double input (such as
1.5k).

+retd = qemu_strtod_finite(nptr, &suffixd, &vald);
+retu = qemu_strtou64(nptr, &suffixu, 0, &valu);
+use_strtod = strlen(suffixd) < strlen(suffixu);


You could simply compare suffixd and suffixu:

use_strtod = suffixd > suffixu;



Thank you for your suggestion.
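
For illustration, a minimal sketch of the "parse both ways, keep the
longer match" idea with the pointer comparison suggested above.  It
assumes the standard strtod() and strtoull() rather than QEMU's
qemu_strtod_finite() and qemu_strtou64(), and the function name
prefer_double() is hypothetical:

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse @s both as a double and as an unsigned integer and report
 * which parser consumed more characters.  Both end pointers point
 * into @s, so the one further along marks the longer match.
 * On return *end points at the first character neither value used.
 */
static bool prefer_double(const char *s, const char **end)
{
    char *endd, *endu;
    bool use_strtod;

    (void)strtod(s, &endd);
    (void)strtoull(s, &endu, 0);

    use_strtod = endd > endu;
    *end = use_strtod ? endd : endu;
    return use_strtod;
}
```

For "1.5k" the double parser wins and the suffix is 'k'; for "123k"
both consume the same three characters, so the integer result is kept.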

+
+if (use_strtod) {
+endptr = suffixd;
+retval = retd;
+} else {
+endptr = suffixu;
+retval = retu;
+}

-retval = qemu_strtod_finite(nptr, &endptr, &val);
 if (retval) {
 goto out;
 }
-fraction = modf(val, &integral);
-if (fraction != 0) {
-mul_required = 1;
+if (use_strtod) {
+fraction = modf(vald, &integral);
+if (fraction != 0) {
+mul_required = 1;
+}
 }
 c = *endptr;
 mul = suffix_mul(c, unit);
@@ -238,17 +258,30 @@ static int do_strtosz(const char *nptr, const char **end,
 retval = -EINVAL;
 goto out;
 }
-/*
- * Values near UINT64_MAX overflow to 2**64 when converting to double
- * precision.  Compare against the maximum representable double precision
- * value below 2**64, computed as "the next value after 2**64 (0x1p64) in
- * the direction of 0".
- */
-if ((val * mul > nextafter(0x1p64, 0)) || val < 0) {
-retval = -ERANGE;
-goto out;
+
+if (use_strtod) {
+/*
+ * Values near UINT64_MAX overflow to 2**64 when converting to double
+ * precision. Compare against the maximum representable double precision
+ * value below 2**64, computed as "the next value after 2**64 (0x1p64)
+ * in the direction of 0".
+ */
+if ((vald * mul > nextafter(0x1p64, 0)) || vald < 0) {
+retval = -ERANGE;
+goto out;
+}
+*result = vald * mul;
+} else {
+/* Reject negative input and overflow output */
+while (qemu_isspace(*nptr)) {
+nptr++;
+}
+if (*nptr == '-' || UINT64_MAX / mul < valu) {
+retval = -ERANGE;
+goto out;
+}
+*result = valu * mul;
 }
-*result = val * mul;
 retval = 0;

out:
--
2.20.1
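
The two range checks in the last hunk can be tried in isolation. The
following is a minimal sketch, not the code from the patch: the helper
names double_scaled_fits() and u64_scaled_fits() are made up for
illustration, and the double check is written as "< 0x1p64", which should be
equivalent to rejecting "> nextafter(0x1p64, 0)" because no double lies
strictly between those two values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if val * mul can be converted to uint64_t.
 * 0x1p64 is 2**64; every double below it fits into uint64_t.
 */
static bool double_scaled_fits(double val, double mul)
{
    assert(mul > 0);
    return val >= 0 && val * mul < 0x1p64;
}

/*
 * Hypothetical helper: true if val * mul does not wrap around.
 * Dividing first avoids performing the overflowing multiplication.
 */
static bool u64_scaled_fits(uint64_t val, uint64_t mul)
{
    assert(mul > 0);
    return val <= UINT64_MAX / mul;
}
```

The integer variant never rounds, which is the reason for preferring it
whenever the input has no fraction or exponent.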

Re: [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits

2019-12-17 Thread Tao Xu


On 12/17/2019 6:25 PM, Markus Armbruster wrote:

Tao Xu  writes:


On 12/5/19 11:29 PM, Markus Armbruster wrote:

Tao Xu  writes:


Parse input string both as a double and as a uint64_t, then use the
method which consumes more characters. Update the related test cases.

Signed-off-by: Tao Xu 
---

[...]

diff --git a/util/cutils.c b/util/cutils.c
index 77acadc70a..b08058c57c 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end,
 const char default_suffix, int64_t unit,
 uint64_t *result)
   {
-int retval;
-const char *endptr;
+int retval, retd, retu;
+const char *suffix, *suffixd, *suffixu;
   unsigned char c;
   int mul_required = 0;
-double val, mul, integral, fraction;
+bool use_strtod;
+uint64_t valu;
+double vald, mul, integral, fraction;


Note for later: @mul is double.


+
+retd = qemu_strtod_finite(nptr, &suffixd, &vald);
+retu = qemu_strtou64(nptr, &suffixu, 0, &valu);


Note for later: passing 0 to base accepts octal and hexadecimal
integers.


+use_strtod = strlen(suffixd) < strlen(suffixu);
+
+/*
+ * Parse @nptr both as a double and as a uint64_t, then use the method
+ * which consumes more characters.
+ */


The comment is in a funny place.  I'd put it right before the
qemu_strtod_finite() line.


+if (use_strtod) {
+suffix = suffixd;
+retval = retd;
+} else {
+suffix = suffixu;
+retval = retu;
+}
   -retval = qemu_strtod_finite(nptr, &endptr, &val);
   if (retval) {
   goto out;
   }


This is even more subtle than it looks.

A close reading of the function contracts leads to three cases for each
conversion:

* parse error (including infinity and NaN)

@retu / @retd is -EINVAL
@valu / @vald is uninitialized
@suffixu / @suffixd is @nptr

* range error

@retu / @retd is -ERANGE
@valu / @vald is our best approximation of the conversion result
@suffixu / @suffixd points to the first character not consumed by the
conversion.

Sub-cases:

- uint64_t overflow

  We know the conversion result exceeds UINT64_MAX.

- double overflow

  We know the conversion result's magnitude exceeds the largest
  representable finite double DBL_MAX.

- double underflow

  We know the conversion result is close to zero (closer than DBL_MIN,
  the smallest normalized positive double).

* success

@retu / @retd is 0
@valu / @vald is the conversion result
@suffixu / @suffixd points to the first character not consumed by the
conversion.

This leads to a matrix (parse error, uint64_t overflow, success) x
(parse error, double overflow, double underflow, success).  We need to
check the code does what we want for each element of this matrix, and
document any behavior that's not perfectly obvious.

(success, success): we pick uint64_t if qemu_strtou64() consumed more
characters than qemu_strtod_finite(), else double.  "More" is important
here; when they consume the same characters, we *need* to use the
uint64_t result.  Example: for "18446744073709551615", we need to use
uint64_t 18446744073709551615, not double 18446744073709551616.0.  But
for "18446744073709551616.", we need to use the double.  Good.


Also fun: for "0123", we use uint64_t 83, not double 123.0.  But for
"0123.", we use 123.0, not 83.

Do we really want to accept octal and hexadecimal integers?



Thank you for reminding me. Octal and hexadecimal may bring more
confusion. I will use qemu_strtou64(nptr, &suffixu, 10, &valu) and add a
test for input like "0123".
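
The "use whichever conversion consumes more characters" rule, and the
effect of the base argument on "0123", can be reproduced with the plain C
library. This is only a sketch under assumptions: it uses strtod() and
strtoull() instead of the qemu_strto*() wrappers, it ignores errno and all
range errors, and the names LongestParse and parse_longest() are invented
for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    bool used_double;   /* true if the strtod() result was chosen */
    double d;           /* result of the double conversion */
    uint64_t u;         /* result of the integer conversion */
    size_t consumed;    /* characters consumed by the chosen conversion */
} LongestParse;

/* Parse s both ways and report which conversion consumed more characters. */
static LongestParse parse_longest(const char *s, int base)
{
    char *endd, *endu;
    LongestParse r;

    assert(s != NULL);
    r.d = strtod(s, &endd);
    r.u = strtoull(s, &endu, base);
    /* On a tie the integer result wins, so no precision is lost. */
    r.used_double = endd > endu;
    r.consumed = (size_t)((r.used_double ? endd : endu) - s);
    return r;
}
```

With base 0, "0123" is read as octal 83 while "0123." is read as the
double 123.0; with base 10 both spellings agree on 123.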



(success, parse error) and (parse error, success): we pick the one that
succeeds, because success consumes characters, and failure to parse does
not.  Good.

(parse error, parse error): neither consumes characters, so we pick
uint64_t.  Good.

(parse error, double overflow), (parse error, double underflow) and
(uint64_t overflow, parse error): we pick the range error, because it
consumes characters.  Good.

These are the simple combinations.  The remainder are hairier: (success,
double overflow), (success, double underflow), (uint64_t overflow,
success).  I lack the time to analyze them today.  Must be done before
we take this patch.  Any takers?


(success, double overflow), (success, double underflow): pick the double
range error and return -ERANGE, because it consumes more
characters. Example: for "1.79769e+309", qemu_strtou64 consumes "1"
and parses it as uint64_t, but qemu_strtod_finite returns -ERANGE and
consumes all characters. It is OK.


The only way to have double overflow when uint64_t succeeds is an
exponent.  Double consumes the characters making up the exponent,
uint64_t does not.  We use double.

The only way to have double underflow is with an exponent or a decimal
point.  Double consumes their characters, uint64_t does not.  We use
double.

Okay.


(uint64_t overflow, success), consume the same characters, use the

[PATCH] docker: gtester is no longer used

2019-12-17 Thread Paolo Bonzini

We are using tap-driver.pl, so gtester no longer needs to be installed
to run the testsuite in docker-based tests.

Signed-off-by: Paolo Bonzini 
---
 tests/docker/common.rc | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/tests/docker/common.rc b/tests/docker/common.rc
index 512202b..02cd67a 100755
--- a/tests/docker/common.rc
+++ b/tests/docker/common.rc
@@ -53,12 +53,7 @@ check_qemu()
 INVOCATION="$@"
 fi
 
-if command -v gtester > /dev/null 2>&1 && \
-   gtester --version > /dev/null 2>&1; then
-make $MAKEFLAGS $INVOCATION
-else
-echo "No working gtester, skipping make $INVOCATION"
-fi
+make $MAKEFLAGS $INVOCATION
 }
 
 test_fail()
-- 
1.8.3.1

Re: [PATCH v10 Kernel 4/5] vfio iommu: Implementation of ioctl to for dirty pages tracking.

2019-12-17 Thread Yan Zhao

On Tue, Dec 17, 2019 at 07:47:05PM +0800, Kirti Wankhede wrote:
> 
> 
> On 12/17/2019 3:21 PM, Yan Zhao wrote:
> > On Tue, Dec 17, 2019 at 05:24:14PM +0800, Kirti Wankhede wrote:
> >>
> >>
> >> On 12/17/2019 10:45 AM, Yan Zhao wrote:
> >>> On Tue, Dec 17, 2019 at 04:21:39AM +0800, Kirti Wankhede wrote:
>  VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
>  - Start unpinned pages dirty pages tracking while migration is active and
>  device is running, i.e. during pre-copy phase.
>  - Stop unpinned pages dirty pages tracking. This is required to stop
>  unpinned dirty pages tracking if migration fails or is cancelled during
>  pre-copy phase. Unpinned pages tracking is cleared.
>  - Get dirty pages bitmap. Stop unpinned dirty pages tracking and clear
>  unpinned pages information on bitmap read. This ioctl returns a bitmap of
>  dirty pages; it is the user space application's responsibility to copy the
>  content of dirty pages from source to destination during migration.
> 
>  Signed-off-by: Kirti Wankhede 
>  Reviewed-by: Neo Jia 
>  ---
> drivers/vfio/vfio_iommu_type1.c | 210 
>  ++--
> 1 file changed, 203 insertions(+), 7 deletions(-)
> 
>  diff --git a/drivers/vfio/vfio_iommu_type1.c 
>  b/drivers/vfio/vfio_iommu_type1.c
>  index 3f6b04f2334f..264449654d3f 100644
>  --- a/drivers/vfio/vfio_iommu_type1.c
>  +++ b/drivers/vfio/vfio_iommu_type1.c
>  @@ -70,6 +70,7 @@ struct vfio_iommu {
>   unsigned intdma_avail;
>   boolv2;
>   boolnesting;
>  +booldirty_page_tracking;
> };
> 
> struct vfio_domain {
>  @@ -112,6 +113,7 @@ struct vfio_pfn {
>   dma_addr_t  iova;   /* Device address */
>   unsigned long   pfn;/* Host pfn */
>   atomic_tref_count;
>  +boolunpinned;
> };
> 
> struct vfio_regions {
>  @@ -244,6 +246,32 @@ static void vfio_remove_from_pfn_list(struct 
>  vfio_dma *dma,
>   kfree(vpfn);
> }
> 
>  +static void vfio_remove_unpinned_from_pfn_list(struct vfio_dma *dma, bool warn)
>  +{
>  +struct rb_node *n = rb_first(&dma->pfn_list);
>  +
>  +for (; n; n = rb_next(n)) {
>  +struct vfio_pfn *vpfn = rb_entry(n, struct vfio_pfn, node);
>  +
>  +if (warn)
>  +WARN_ON_ONCE(vpfn->unpinned);
>  +
>  +if (vpfn->unpinned)
>  +vfio_remove_from_pfn_list(dma, vpfn);
>  +}
>  +}
>  +
>  +static void vfio_remove_unpinned_from_dma_list(struct vfio_iommu *iommu)
>  +{
>  +struct rb_node *n = rb_first(&iommu->dma_list);
>  +
>  +for (; n; n = rb_next(n)) {
>  +struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
>  +
>  +vfio_remove_unpinned_from_pfn_list(dma, false);
>  +}
>  +}
>  +
> static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct vfio_dma *dma,
>  unsigned long iova)
> {
>  @@ -254,13 +282,17 @@ static struct vfio_pfn 
>  *vfio_iova_get_vfio_pfn(struct vfio_dma *dma,
>   return vpfn;
> }
> 
>  -static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
>  +static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn,
>  +  bool dirty_tracking)
> {
>   int ret = 0;
> 
>   if (atomic_dec_and_test(&vpfn->ref_count)) {
>   ret = put_pfn(vpfn->pfn, dma->prot);
> >>> If the physical page is put here, it may cause a problem when this iova
> >>> is pinned next time:
> >>> vfio_iommu_type1_pin_pages {
> >>>   ...
> >>>   vpfn = vfio_iova_get_vfio_pfn(dma, iova);
> >>>   if (vpfn) {
> >>>   phys_pfn[i] = vpfn->pfn;
> >>>   continue;
> >>>   }
> >>>   ...
> >>> }
> >>>
> >>
> >> Good point. Fixing it as:
> >>
> >>   vpfn = vfio_iova_get_vfio_pfn(dma, iova);
> >>   if (vpfn) {
> >> -   phys_pfn[i] = vpfn->pfn;
> >> -   continue;
> >> +   if (vpfn->unpinned)
> >> +   vfio_remove_from_pfn_list(dma, vpfn);
> > what about updating vpfn instead?
> > 
> 
> vfio_pin_page_external() takes care of verification checks and mem lock
> accounting. I prefer to free the existing node and add a new one using
> existing functions.
> 
> >> +   else

Re: [PATCH] target/arm: fix IL bit for data abort exceptions

2019-12-17 Thread Richard Henderson

On 12/17/19 11:02 AM, Jeff Kubascik wrote:
> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> index 5feb312941..e63f8bda29 100644
> --- a/target/arm/tlb_helper.c
> +++ b/target/arm/tlb_helper.c
> @@ -44,7 +44,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t 
> template_syn,
>  syn = syn_data_abort_with_iss(same_el,
>0, 0, 0, 0, 0,
>ea, 0, s1ptw, is_write, fsc,
> -  false);
> +  true);
>  /* Merge the runtime syndrome with the template syndrome.  */
>  syn |= template_syn;

This doesn't look correct.  Surely the IL bit should come from template_syn?

> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index d4bebbe629..a3c618fdd9 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -14045,6 +14045,7 @@ static void disas_a64_insn(CPUARMState *env, 
> DisasContext *s)
>  s->pc_curr = s->base.pc_next;
>  insn = arm_ldl_code(env, s->base.pc_next, s->sctlr_b);
>  s->insn = insn;
> +s->is_16bit = false;
>  s->base.pc_next += 4;

Should not be necessary, as the field is not read along any a64 path.  (Also,
while it's not yet in master, there's a patch on list that zero initializes the
entire structure.)

> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 2b6c1f91bf..300480f1b7 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -8555,7 +8555,7 @@ static ISSInfo make_issinfo(DisasContext *s, int rd, 
> bool p, bool w)
>  
>  /* ISS not valid if writeback */
>  if (p && !w) {
> -ret = rd;
> +ret = rd | (s->is_16bit ? ISSIs16Bit : 0);
>  } else {
>  ret = ISSInvalid;
>  }
> @@ -11057,6 +11057,7 @@ static void arm_tr_translate_insn(DisasContextBase 
> *dcbase, CPUState *cpu)
>  dc->pc_curr = dc->base.pc_next;
>  insn = arm_ldl_code(env, dc->base.pc_next, dc->sctlr_b);
>  dc->insn = insn;
> +dc->is_16bit = false;
>  dc->base.pc_next += 4;
>  disas_arm_insn(dc, insn);
>  
> @@ -11126,6 +11127,7 @@ static void thumb_tr_translate_insn(DisasContextBase 
> *dcbase, CPUState *cpu)
>  dc->pc_curr = dc->base.pc_next;
>  insn = arm_lduw_code(env, dc->base.pc_next, dc->sctlr_b);
>  is_16bit = thumb_insn_is_16bit(dc, dc->base.pc_next, insn);
> +dc->is_16bit = is_16bit;
>  dc->base.pc_next += 2;
>  if (!is_16bit) {
>  uint32_t insn2 = arm_lduw_code(env, dc->base.pc_next, dc->sctlr_b);
> diff --git a/target/arm/translate.h b/target/arm/translate.h
> index b837b7fcbf..c16f434477 100644
> --- a/target/arm/translate.h
> +++ b/target/arm/translate.h
> @@ -14,6 +14,8 @@ typedef struct DisasContext {
>  target_ulong pc_curr;
>  target_ulong page_start;
>  uint32_t insn;
> +/* 16-bit instruction flag */
> +bool is_16bit;
>  /* Nonzero if this instruction has been conditionally skipped.  */
>  int condjmp;
>  /* The label that will be jumped to when the instruction is skipped.  */

The rest of this looks both correct and necessary.


r~

[PATCH 4/9] tests/virtio-9p: added READDIR test

2019-12-17 Thread Christian Schoenebeck

This first READDIR test simply checks that the number of directory
entries returned matches the number created on the 9p synth
driver side.

Signed-off-by: Christian Schoenebeck 
---
 tests/virtio-9p-test.c | 125 +
 1 file changed, 125 insertions(+)

diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
index 880b4ff567..ab5926527a 100644
--- a/tests/virtio-9p-test.c
+++ b/tests/virtio-9p-test.c
@@ -68,6 +68,11 @@ static void v9fs_memread(P9Req *req, void *addr, size_t len)
 req->r_off += len;
 }
 
+static void v9fs_uint8_read(P9Req *req, uint8_t *val)
+{
+v9fs_memread(req, val, 1);
+}
+
 static void v9fs_uint16_write(P9Req *req, uint16_t val)
 {
 uint16_t le_val = cpu_to_le16(val);
@@ -101,6 +106,12 @@ static void v9fs_uint32_read(P9Req *req, uint32_t *val)
 le32_to_cpus(val);
 }
 
+static void v9fs_uint64_read(P9Req *req, uint64_t *val)
+{
+v9fs_memread(req, val, 8);
+le64_to_cpus(val);
+}
+
 /* len[2] string[len] */
 static uint16_t v9fs_string_size(const char *string)
 {
@@ -191,6 +202,7 @@ static const char *rmessage_name(uint8_t id)
 id == P9_RLOPEN ? "RLOPEN" :
 id == P9_RWRITE ? "RWRITE" :
 id == P9_RFLUSH ? "RFLUSH" :
+id == P9_RREADDIR ? "RREADDIR" :
 "";
 }
 
@@ -348,6 +360,77 @@ static void v9fs_rwalk(P9Req *req, uint16_t *nwqid, 
v9fs_qid **wqid)
 v9fs_req_free(req);
 }
 
+/* size[4] Treaddir tag[2] fid[4] offset[8] count[4] */
+static P9Req *v9fs_treaddir(QVirtio9P *v9p, uint32_t fid, uint64_t offset,
+uint32_t count, uint16_t tag)
+{
+P9Req *req;
+
+req = v9fs_req_init(v9p, 4 + 8 + 4, P9_TREADDIR, tag);
+v9fs_uint32_write(req, fid);
+v9fs_uint64_write(req, offset);
+v9fs_uint32_write(req, count);
+v9fs_req_send(req);
+return req;
+}
+
+struct v9fs_dirent {
+v9fs_qid qid;
+uint64_t offset;
+uint8_t type;
+char* name;
+struct v9fs_dirent* next;
+};
+
+/* size[4] Rreaddir tag[2] count[4] data[count] */
+static void v9fs_rreaddir(P9Req *req, uint32_t *count, uint32_t *nentries,
+  struct v9fs_dirent **entries)
+{
+uint32_t sz;
+struct v9fs_dirent *e = NULL;
+uint16_t slen;
+uint32_t n = 0;
+
+v9fs_req_recv(req, P9_RREADDIR);
+v9fs_uint32_read(req, &sz);
+
+if (count)
+*count = sz;
+
+for (int32_t togo = (int32_t)sz;
+ togo >= 13 + 8 + 1 + 2;
+ togo -= 13 + 8 + 1 + 2 + slen, ++n)
+{
+if (!e) {
+e = g_malloc(sizeof(struct v9fs_dirent));
+if (entries)
+*entries = e;
+} else {
+e = e->next = g_malloc(sizeof(struct v9fs_dirent));
+}
+e->next = NULL;
+/* qid[13] offset[8] type[1] name[s] */
+v9fs_memread(req, &e->qid, 13);
+v9fs_uint64_read(req, &e->offset);
+v9fs_uint8_read(req, &e->type);
+v9fs_string_read(req, &slen, &e->name);
+}
+
+if (nentries)
+*nentries = n;
+}
+
+static void v9fs_free_dirents(struct v9fs_dirent *e)
+{
+struct v9fs_dirent *next = NULL;
+
+for (; e; e = next) {
+next = e->next;
+g_free(e->name);
+g_free(e);
+}
+}
+
 /* size[4] Tlopen tag[2] fid[4] flags[4] */
 static P9Req *v9fs_tlopen(QVirtio9P *v9p, uint32_t fid, uint32_t flags,
   uint16_t tag)
@@ -480,6 +563,47 @@ static void fs_walk(void *obj, void *data, QGuestAllocator 
*t_alloc)
 g_free(wqid);
 }
 
+static void fs_readdir(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+QVirtio9P *v9p = obj;
+alloc = t_alloc;
+char *const wnames[] = { g_strdup(QTEST_V9FS_SYNTH_READDIR_DIR) };
+uint16_t nqid;
+v9fs_qid qid;
+uint32_t count, nentries;
+struct v9fs_dirent *entries = NULL;
+P9Req *req;
+
+fs_attach(v9p, NULL, t_alloc);
+req = v9fs_twalk(v9p, 0, 1, 1, wnames, 0);
+v9fs_req_wait_for_reply(req, NULL);
+v9fs_rwalk(req, &nqid, NULL);
+g_assert_cmpint(nqid, ==, 1);
+
+req = v9fs_tlopen(v9p, 1, O_DIRECTORY, 0);
+v9fs_req_wait_for_reply(req, NULL);
+v9fs_rlopen(req, &qid, NULL);
+
+req = v9fs_treaddir(
+v9p, 1/*fid*/, 0/*offset*/, P9_MAX_SIZE - P9_IOHDRSZ/*count*/,
+0/*tag*/
+);
+v9fs_req_wait_for_reply(req, NULL);
+v9fs_rreaddir(req, &count, &nentries, &entries);
+
+/*
+ * Assuming msize (P9_MAX_SIZE) is large enough so we can retrieve all
+ * dir entries with only one readdir request.
+ */
+g_assert_cmpint(
+nentries, ==,
+QTEST_V9FS_SYNTH_READDIR_NFILES + 2 /* "." and ".." */
+);
+
+v9fs_free_dirents(entries);
+g_free(wnames[0]);
+}
+
 static void fs_walk_no_slash(void *obj, void *data, QGuestAllocator *t_alloc)
 {
 QVirtio9P *v9p = obj;
@@ -658,6 +782,7 @@ static void register_virtio_9p_test(void)
  NULL);
 qos_add_test("fs/flush/ignored", "virtio-9p", fs_flush_ignored,
  NULL);
+qos_add_test("fs/readdir/basic",
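
For readers unfamiliar with the wire format, the per-entry layout
"qid[13] offset[8] type[1] name[s]" that v9fs_rreaddir() walks can be
decoded from a plain byte buffer as sketched below. This is an
illustration only and does not use the qtest helpers from the patch: the
names DirEntry, read_le() and parse_dirent() and the 255 byte name limit
are assumptions made for the example. All integers are little-endian, and
name[s] is a 2 byte length followed by that many bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { QID_SIZE = 13, DIRENT_NAME_MAX = 255 };

typedef struct {
    uint8_t qid[QID_SIZE];          /* raw qid: type[1] version[4] path[8] */
    uint64_t offset;
    uint8_t type;
    char name[DIRENT_NAME_MAX + 1]; /* NUL-terminated copy of name[s] */
} DirEntry;

/* Read an n byte little-endian unsigned integer (n <= 8). */
static uint64_t read_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;

    for (size_t i = 0; i < n; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/*
 * Decode one entry from buf. Returns the number of bytes consumed, or 0
 * if the buffer is truncated or the name exceeds DIRENT_NAME_MAX.
 */
static size_t parse_dirent(const uint8_t *buf, size_t len, DirEntry *out)
{
    const size_t fixed = QID_SIZE + 8 + 1 + 2;
    size_t slen;

    assert(buf != NULL && out != NULL);
    if (len < fixed) {
        return 0;
    }
    slen = (size_t)read_le(buf + QID_SIZE + 8 + 1, 2);
    if (slen > DIRENT_NAME_MAX || len - fixed < slen) {
        return 0;
    }
    memcpy(out->qid, buf, QID_SIZE);
    out->offset = read_le(buf + QID_SIZE, 8);
    out->type = buf[QID_SIZE + 8];
    memcpy(out->name, buf + fixed, slen);
    out->name[slen] = '\0';
    return fixed + slen;
}
```

Checking the remaining length before each read is what keeps a short or
malformed reply from being read past its end.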

[PATCH 0/9] 9pfs: readdir optimization

2019-12-17 Thread Christian Schoenebeck

As previously mentioned, I was investigating performance issues with 9pfs.
Raw file read/write of 9pfs is actually quite good, provided that the client
picked a reasonably high msize (maximum message size). I would recommend
logging a warning on the 9p server side if a client attaches with an msize
small enough to cause performance issues.

However, there are other aspects where 9pfs currently performs suboptimally.
In particular, readdir handling of 9pfs is extremely slow: a simple readdir
request of a guest typically blocks for several hundred milliseconds or
even several seconds, no matter how powerful the underlying hardware is.
The reason for this performance issue is latency.
Currently 9pfs dispatches a T_readdir request back and forth numerous times
between the main I/O thread and a background I/O thread; in fact it hops
between threads multiple times for every single directory entry during
T_readdir request handling, which adds up to huge latencies for a single
T_readdir request.

This patch series aims to address this severe performance issue of 9pfs
T_readdir request handling. The actual performance fix is patch 8. I also
provided a convenient benchmark for comparing the performance improvements
by using the 9pfs "synth" driver (see patch 6 for instructions on how to run
the benchmark), so no guest OS installation is required to perform this
benchmark A/B comparison. With patch 8 I achieved a performance improvement
of factor 40 on my test machine.

** NOTE: ** These patches are not heavily tested yet, nor thoroughly
reviewed for potential security issues. I decided to post them already,
though, because I won't have the time in the next few weeks for polishing
them. The benchmark results should demonstrate, though, that it is worth the
hassle. So any testing/reviews/fixes are appreciated!

Christian Schoenebeck (9):
  tests/virtio-9p: v9fs_string_read() didn't terminate string
  9pfs: validate count sent by client with T_readdir
  hw/9pfs/9p-synth: added directory for readdir test
  tests/virtio-9p: added READDIR test
  tests/virtio-9p: check file names of READDIR response
  9pfs: READDIR benchmark
  hw/9pfs/9p-synth: avoid n-square issue in synth_readdir()
  9pfs: T_readdir latency optimization
  hw/9pfs/9p.c: benchmark time on T_readdir request

 hw/9pfs/9p-synth.c |  46 ++-
 hw/9pfs/9p-synth.h |   5 ++
 hw/9pfs/9p.c   | 150 ++---
 hw/9pfs/9p.h   |  23 ++
 hw/9pfs/codir.c| 183 ++---
 hw/9pfs/coth.h |   3 +
 tests/virtio-9p-test.c | 182 +++-
 7 files changed, 509 insertions(+), 83 deletions(-)

-- 
2.20.1

[PATCH 6/9] 9pfs: READDIR benchmark

2019-12-17 Thread Christian Schoenebeck

This patch is not intended to be merged. It just provides a
temporary benchmark foundation for convenient A/B comparison
of the subsequent 9p READDIR optimization patches:

* hw/9pfs/9p-synth: increase amount of simulated files for
  READDIR test to 2000 files.

* tests/virtio-9p: measure wall time that elapsed between
  sending T_readdir request and arrival of R_readdir response
  and print out that measured duration, as well as amount of
  directory entries received, and the amount of bytes of the
  response message.

* tests/virtio-9p: increased msize to 256kiB to allow
  retrieving all 2000 files (simulated by synth backend) with
  only one READDIR request.

Running this benchmark is fairly quick & simple and does not
require any guest OS installation or other prerequisites:

cd build
make && make tests/qos-test
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
tests/qos-test -p $(tests/qos-test -l | grep readdir/basic)

Since this benchmark uses the 9p synth backend, the host machine's
IO hardware (SSDs/HDDs) is not relevant for the benchmark result,
because the synth backend's readdir implementation returns
immediately (without any blocking IO that would happen with a
real-life backend) and just returns already prepared, simulated
directory entries directly from RAM. So this benchmark focuses on
the efficiency of the 9p controller code (or top half) for READDIR
request handling.

Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p-synth.h |  2 +-
 tests/virtio-9p-test.c | 33 -
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/hw/9pfs/9p-synth.h b/hw/9pfs/9p-synth.h
index 036d7e4a5b..7d6cedcdac 100644
--- a/hw/9pfs/9p-synth.h
+++ b/hw/9pfs/9p-synth.h
@@ -58,7 +58,7 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int mode,
 /* for READDIR test */
 #define QTEST_V9FS_SYNTH_READDIR_DIR "ReadDirDir"
 #define QTEST_V9FS_SYNTH_READDIR_FILE "ReadDirFile%d"
-#define QTEST_V9FS_SYNTH_READDIR_NFILES 100
+#define QTEST_V9FS_SYNTH_READDIR_NFILES 2000
 
 /* Any write to the "FLUSH" file is handled one byte at a time by the
  * backend. If the byte is zero, the backend returns success (ie, 1),
diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
index dafea1ae61..9a8b2046ae 100644
--- a/tests/virtio-9p-test.c
+++ b/tests/virtio-9p-test.c
@@ -15,6 +15,17 @@
 #include "libqos/virtio-9p.h"
 #include "libqos/qgraph.h"
 
+/*
+ * to benchmark the real time (not CPU time) that elapsed between start of
+ * a request and arrival of its response
+ */
+static double wall_time(void) {
+struct timeval t;
+struct timezone tz;
+gettimeofday(&t, &tz);
+return t.tv_sec + t.tv_usec * 0.000001;
+}
+
 #define QVIRTIO_9P_TIMEOUT_US (10 * 1000 * 1000)
 static QGuestAllocator *alloc;
 
@@ -36,7 +47,7 @@ static void pci_config(void *obj, void *data, QGuestAllocator 
*t_alloc)
 g_free(tag);
 }
 
-#define P9_MAX_SIZE 4096 /* Max size of a T-message or R-message */
+#define P9_MAX_SIZE (256*1024) /* Max size of a T-message or R-message */
 
 typedef struct {
 QTestState *qts;
@@ -593,12 +604,32 @@ static void fs_readdir(void *obj, void *data, 
QGuestAllocator *t_alloc)
 v9fs_req_wait_for_reply(req, NULL);
 v9fs_rlopen(req, &qid, NULL);
 
+const double start = wall_time();
+
 req = v9fs_treaddir(
 v9p, 1/*fid*/, 0/*offset*/, P9_MAX_SIZE - P9_IOHDRSZ/*count*/,
 0/*tag*/
 );
+const double treaddir = wall_time();
 v9fs_req_wait_for_reply(req, NULL);
+const double waitforreply = wall_time();
 v9fs_rreaddir(req, &count, &nentries, &entries);
+const double end = wall_time();
+
+printf("\nTime client spent on sending T_readdir: %fs\n\n", treaddir - start);
+printf("Time client spent for waiting for reply from server: %fs [MOST IMPORTANT]\n", waitforreply - start);
+printf("(This is the most important value, because it reflects the time\n"
+   "the 9p server required to process and return the result of the\n"
+   "T_readdir request.)\n\n");
+
+printf("Total client time: %fs\n", end - start);
+printf("(NOTE: this time is not relevant; this huge time comes from\n"
+   "inefficient qtest_memread() calls. So you can discard this\n"
+   "value as a problem of this test client implementation while\n"
+   "processing the received server T_readdir reply.)\n\n");
+
+printf("Details of response message data: R_readdir nentries=%d rbytes=%d\n",
+   nentries, count);
 
 /*
  * Assuming msize (P9_MAX_SIZE) is large enough so we can retrieve all
-- 
2.20.1
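
As a side note on the wall_time() helper: gettimeofday() reports the
sub-second part in microseconds, so the conversion to seconds uses a factor
of 1e-6. The sketch below separates the conversion from the system call so
it can be checked on its own. It is an illustration, not the code from the
patch; timeval_to_seconds() is a name invented here, and the timezone
argument is passed as NULL because its result is not used.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Convert a struct timeval (seconds + microseconds) to seconds. */
static double timeval_to_seconds(const struct timeval *t)
{
    assert(t != NULL);
    return (double)t->tv_sec + (double)t->tv_usec * 1e-6;
}

/* Wall-clock time in seconds since the Epoch, with microsecond resolution. */
static double wall_time(void)
{
    struct timeval t;

    gettimeofday(&t, NULL);
    return timeval_to_seconds(&t);
}
```

The difference of two wall_time() readings then gives elapsed real time in
seconds, which is what the printf() calls in the benchmark expect.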

[PATCH 3/9] hw/9pfs/9p-synth: added directory for readdir test

2019-12-17 Thread Christian Schoenebeck

This makes the 9p synth backend provide the following virtual
files:

  - /ReadDirDir/ReadDirFile99
  - /ReadDirDir/ReadDirFile98
  ...
  - /ReadDirDir/ReadDirFile1
  - /ReadDirDir/ReadDirFile0

These virtual files will be used by the upcoming
9pfs READDIR tests.

Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p-synth.c | 19 +++
 hw/9pfs/9p-synth.h |  5 +
 2 files changed, 24 insertions(+)

diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c
index 54239c9bbf..7eb210ffa8 100644
--- a/hw/9pfs/9p-synth.c
+++ b/hw/9pfs/9p-synth.c
@@ -578,6 +578,25 @@ static int synth_init(FsContext *ctx, Error **errp)
NULL, v9fs_synth_qtest_flush_write,
ctx);
 assert(!ret);
+
+/* Directory for READDIR test */
+{
+V9fsSynthNode *dir = NULL;
+ret = qemu_v9fs_synth_mkdir(
+NULL, 0700, QTEST_V9FS_SYNTH_READDIR_DIR, &dir
+);
+assert(!ret);
+for (i = 0; i < QTEST_V9FS_SYNTH_READDIR_NFILES; ++i) {
+char *name = g_strdup_printf(
+QTEST_V9FS_SYNTH_READDIR_FILE, i
+);
+ret = qemu_v9fs_synth_add_file(
+dir, 0, name, NULL, NULL, ctx
+);
+assert(!ret);
+g_free(name);
+}
+}
 }
 
 return 0;
diff --git a/hw/9pfs/9p-synth.h b/hw/9pfs/9p-synth.h
index af7a993a1e..036d7e4a5b 100644
--- a/hw/9pfs/9p-synth.h
+++ b/hw/9pfs/9p-synth.h
@@ -55,6 +55,11 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int mode,
 #define QTEST_V9FS_SYNTH_LOPEN_FILE "LOPEN"
 #define QTEST_V9FS_SYNTH_WRITE_FILE "WRITE"
 
+/* for READDIR test */
+#define QTEST_V9FS_SYNTH_READDIR_DIR "ReadDirDir"
+#define QTEST_V9FS_SYNTH_READDIR_FILE "ReadDirFile%d"
+#define QTEST_V9FS_SYNTH_READDIR_NFILES 100
+
 /* Any write to the "FLUSH" file is handled one byte at a time by the
  * backend. If the byte is zero, the backend returns success (ie, 1),
  * otherwise it forces the server to try again forever. Thus allowing
-- 
2.20.1

[PATCH 7/9] hw/9pfs/9p-synth: avoid n-square issue in synth_readdir()

2019-12-17 Thread Christian Schoenebeck

This patch is just a temporary benchmark hack, not intended
to be merged!

synth driver's readdir() implementation has a severe n-square
performance problem. This patch is a quick and dirty hack to
prevent that performance problem from tainting the readdir()
benchmark results.

Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p-synth.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c
index 7eb210ffa8..1743f5409f 100644
--- a/hw/9pfs/9p-synth.c
+++ b/hw/9pfs/9p-synth.c
@@ -225,7 +225,7 @@ static void synth_direntry(V9fsSynthNode *node,
 }
 
 static struct dirent *synth_get_dentry(V9fsSynthNode *dir,
-struct dirent *entry, off_t off)
+struct dirent *entry, off_t off, 
V9fsSynthNode **hack)
 {
 int i = 0;
 V9fsSynthNode *node;
@@ -243,16 +243,37 @@ static struct dirent *synth_get_dentry(V9fsSynthNode *dir,
 /* end of directory */
 return NULL;
 }
+*hack = node;
 synth_direntry(node, entry, off);
 return entry;
 }
 
 static struct dirent *synth_readdir(FsContext *ctx, V9fsFidOpenState *fs)
 {
-struct dirent *entry;
+struct dirent *entry = NULL;
 V9fsSynthOpenState *synth_open = fs->private;
 V9fsSynthNode *node = synth_open->node;
-entry = synth_get_dentry(node, &synth_open->dent, synth_open->offset);
+
+/*
+ * HACK: This is just intended for benchmark, to avoid severe n-square
+ * performance problem of synth driver's readdir implementation here which
+ * would otherwise unnecessarily taint the benchmark results. By simply
+ * caching (globally) the previous node (of the previous synth_readdir()
+ * call) we can simply proceed to next node in chained list efficiently.
+ *
+ * not a good idea for any production code ;-)
+ */
+static struct V9fsSynthNode *cachedNode = NULL;
+
+if (!cachedNode) {
+entry = synth_get_dentry(node, &synth_open->dent, synth_open->offset,
+&cachedNode);
+} else {
+cachedNode = cachedNode->sibling.le_next;
+if (cachedNode) {
+entry = &synth_open->dent;
+synth_direntry(cachedNode, entry, synth_open->offset+1);
+}
+}
 if (entry) {
 synth_open->offset++;
 }
-- 
2.20.1

[PATCH 5/9] tests/virtio-9p: check file names of READDIR response

2019-12-17 Thread Christian Schoenebeck

In addition to the existing check for the expected number
of directory entries returned by the R_readdir response, also check
that all entries have the expected file names, ignoring
their order in the result list.

Signed-off-by: Christian Schoenebeck 
---
 tests/virtio-9p-test.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
index ab5926527a..dafea1ae61 100644
--- a/tests/virtio-9p-test.c
+++ b/tests/virtio-9p-test.c
@@ -563,6 +563,15 @@ static void fs_walk(void *obj, void *data, QGuestAllocator 
*t_alloc)
 g_free(wqid);
 }
 
+static bool fs_dirents_contain_name(struct v9fs_dirent *e, const char* name) {
+for (; e; e = e->next) {
+if (!strcmp(e->name, name)) {
+return true;
+}
+}
+return false;
+}
+
 static void fs_readdir(void *obj, void *data, QGuestAllocator *t_alloc)
 {
 QVirtio9P *v9p = obj;
@@ -600,6 +609,18 @@ static void fs_readdir(void *obj, void *data, 
QGuestAllocator *t_alloc)
 QTEST_V9FS_SYNTH_READDIR_NFILES + 2 /* "." and ".." */
 );
 
+/*
+ * Check all file names exist in returned entries, ignore their order
+ * though.
+ */
+g_assert_cmpint(fs_dirents_contain_name(entries, "."), ==, true);
+g_assert_cmpint(fs_dirents_contain_name(entries, ".."), ==, true);
+for (int i = 0; i < QTEST_V9FS_SYNTH_READDIR_NFILES; ++i) {
+char *name = g_strdup_printf(QTEST_V9FS_SYNTH_READDIR_FILE, i);
+g_assert_cmpint(fs_dirents_contain_name(entries, name), ==, true);
+g_free(name);
+}
+
 v9fs_free_dirents(entries);
 g_free(wnames[0]);
 }
-- 
2.20.1

[PATCH 9/9] hw/9pfs/9p.c: benchmark time on T_readdir request

2019-12-17 Thread Christian Schoenebeck

This patch is not intended to be merged. It measures
and prints the time the 9p server spends on handling
a T_readdir request. It prints the total time spent
on handling the request, and also the time spent
on IO (driver) only.

Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index b37f979150..68ce104d7e 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -2301,6 +2301,14 @@ static void v9fs_free_dirents(struct V9fsDirEnt *e)
 }
 }
 
+static double wall_time(void) {
+struct timeval t;
+struct timezone tz;
+gettimeofday(&t, &tz);
+return t.tv_sec + t.tv_usec * 0.000001;
+}
+
+
 static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
 int32_t max_count)
 {
@@ -2320,6 +2328,8 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, 
V9fsFidState *fidp,
  */
 const bool dostat = pdu->s->ctx.export_flags & V9FS_REMAP_INODES;
 
+const double start = wall_time();
+
 /*
  * Fetch all required directory entries altogether on a background IO
  * thread from fs driver. We don't want to do that for each entry
@@ -2334,6 +2344,9 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, 
V9fsFidState *fidp,
 }
 count = 0;
 
+const double end = wall_time();
+printf("\n\nTime 9p server spent on synth_readdir() I/O only (synth driver): %fs\n", end - start);
+
 for (struct V9fsDirEnt* e = entries; e; e = e->next) {
 dent = e->dent;
 
@@ -2406,6 +2419,8 @@ static void coroutine_fn v9fs_readdir(void *opaque)
 V9fsPDU *pdu = opaque;
 V9fsState *s = pdu->s;
 
+const double start = wall_time();
+
 retval = pdu_unmarshal(pdu, offset, "dqd", &fid,
&initial_offset, &max_count);
 if (retval < 0) {
@@ -2449,6 +2464,9 @@ out:
 put_fid(pdu, fidp);
 out_nofid:
 pdu_complete(pdu, retval);
+
+const double end = wall_time();
+printf("Time 9p server spent on entire T_readdir request: %fs 
[IMPORTANT]\n", end - start);
 }
 
 static int v9fs_xattr_write(V9fsState *s, V9fsPDU *pdu, V9fsFidState *fidp,
-- 
2.20.1

[PATCH 8/9] 9pfs: T_readdir latency optimization

2019-12-17 Thread Christian Schoenebeck

Make top half really top half and bottom half really bottom half:

Each T_readdir request currently hops between threads (the main IO
thread and the background IO driver threads) several times for every
individual directory entry, which adds up to huge latencies for just a
single T_readdir request.

Instead, collect all required directory entries (including any stat
buffers potentially required for each entry) in one go on a background
IO thread from the fs driver, then assemble the entire resulting
network message for the readdir request on the main IO thread. The
driver still aborts the directory entry retrieval loop (on the
background IO thread) as soon as it would exceed the client's maximum
requested response size, so this should not incur any performance
penalty.
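
The "collect everything in one pass, stop at the size budget" idea can be
sketched in plain C as below. This is a simplified, standalone
illustration and not the QEMU implementation: struct entry,
entry_wire_size(), collect_entries() and free_entries() are hypothetical
names, and the 24-byte fixed overhead per entry is an assumption taken
from the constant visible in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one collected directory entry. */
struct entry {
    char *name;
    struct entry *next;
};

/* Assumed wire size of one entry: fixed overhead plus the name bytes. */
static size_t entry_wire_size(const char *name)
{
    return 24 + strlen(name);
}

/*
 * Collect names[0..n) into a singly linked list in a single pass,
 * stopping before the first entry that would push the running total
 * past max_count. *used receives the number of bytes accounted for.
 */
static struct entry *collect_entries(const char *const *names, size_t n,
                                     size_t max_count, size_t *used)
{
    struct entry *head = NULL;
    struct entry **tail = &head;
    size_t total = 0;

    assert(used != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t sz = entry_wire_size(names[i]);
        struct entry *e;

        if (total + sz > max_count) {
            break;              /* budget exhausted: abort the loop early */
        }
        e = malloc(sizeof(*e));
        if (!e) {
            break;
        }
        e->name = malloc(strlen(names[i]) + 1);
        if (!e->name) {
            free(e);
            break;
        }
        strcpy(e->name, names[i]);
        e->next = NULL;
        *tail = e;              /* append, preserving directory order */
        tail = &e->next;
        total += sz;
    }
    *used = total;
    return head;
}

/* Release a list returned by collect_entries(). */
static void free_entries(struct entry *e)
{
    while (e) {
        struct entry *next = e->next;

        free(e->name);
        free(e);
        e = next;
    }
}
```

The caller would then walk the returned list once to build the reply, so
the per-entry work no longer involves switching threads.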

Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c| 126 +++--
 hw/9pfs/9p.h|  23 ++
 hw/9pfs/codir.c | 183 +---
 hw/9pfs/coth.h  |   3 +
 4 files changed, 256 insertions(+), 79 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 30e33b6573..b37f979150 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -971,30 +971,6 @@ static int coroutine_fn fid_to_qid(V9fsPDU *pdu, 
V9fsFidState *fidp,
 return 0;
 }
 
-static int coroutine_fn dirent_to_qid(V9fsPDU *pdu, V9fsFidState *fidp,
-  struct dirent *dent, V9fsQID *qidp)
-{
-struct stat stbuf;
-V9fsPath path;
-int err;
-
-v9fs_path_init(&path);
-
-err = v9fs_co_name_to_path(pdu, &fidp->path, dent->d_name, &path);
-if (err < 0) {
-goto out;
-}
-err = v9fs_co_lstat(pdu, &path, &stbuf);
-if (err < 0) {
-goto out;
-}
-err = stat_to_qid(pdu, &stbuf, qidp);
-
-out:
-v9fs_path_free(&path);
-return err;
-}
-
 V9fsPDU *pdu_alloc(V9fsState *s)
 {
 V9fsPDU *pdu = NULL;
@@ -2302,7 +2278,7 @@ out_nofid:
 pdu_complete(pdu, err);
 }
 
-static size_t v9fs_readdir_data_size(V9fsString *name)
+size_t v9fs_readdir_response_size(V9fsString *name)
 {
 /*
  * Size of each dirent on the wire: size of qid (13) + size of offset (8)
@@ -2311,6 +2287,20 @@ static size_t v9fs_readdir_data_size(V9fsString *name)
 return 24 + v9fs_string_size(name);
 }
 
+static void v9fs_free_dirents(struct V9fsDirEnt *e)
+{
+struct V9fsDirEnt *next = NULL;
+
+for (; e; e = next) {
+next = e->next;
+if (e->dent)
+g_free(e->dent);
+if (e->st)
+g_free(e->st);
+g_free(e);
+}
+}
+
 static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
 int32_t max_count)
 {
@@ -2319,54 +2309,53 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, 
V9fsFidState *fidp,
 V9fsString name;
 int len, err = 0;
 int32_t count = 0;
-off_t saved_dir_pos;
 struct dirent *dent;
+struct stat *st;
+struct V9fsDirEnt *entries = NULL;
 
-/* save the directory position */
-saved_dir_pos = v9fs_co_telldir(pdu, fidp);
-if (saved_dir_pos < 0) {
-return saved_dir_pos;
-}
-
-while (1) {
-v9fs_readdir_lock(&fidp->fs.dir);
+/*
+ * inode remapping requires the device id, which in turn might be
+ * different for different directory entries, so if inode remapping is
+ * enabled we have to make a full stat for each directory entry
+ */
+const bool dostat = pdu->s->ctx.export_flags & V9FS_REMAP_INODES;
 
-err = v9fs_co_readdir(pdu, fidp, &dent);
-if (err || !dent) {
-break;
-}
-v9fs_string_init(&name);
-v9fs_string_sprintf(&name, "%s", dent->d_name);
-if ((count + v9fs_readdir_data_size(&name)) > max_count) {
-v9fs_readdir_unlock(&fidp->fs.dir);
+/*
+ * Fetch all required directory entries altogether on a background IO
+ * thread from fs driver. We don't want to do that for each entry
+ * individually, because hopping between threads (this main IO thread
+ * and background IO driver thread) would sum up to huge latencies.
+ */
+count = v9fs_co_readdir_lowlat(pdu, fidp, &entries, max_count, dostat);
+if (count < 0) {
+err = count;
+count = 0;
+goto out;
+}
+count = 0;
 
-/* Ran out of buffer. Set dir back to old position and return */
-v9fs_co_seekdir(pdu, fidp, saved_dir_pos);
-v9fs_string_free(&name);
-return count;
-}
+for (struct V9fsDirEnt* e = entries; e; e = e->next) {
+dent = e->dent;
 
 if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) {
-/*
- * dirent_to_qid() implies expensive stat call for each entry,
- * we must do that here though since inode remapping requires
- * the device id, which in turn might be different for
- * different entries; we cannot make any assumption to avoid
- * that here.
- */
-err =

[PATCH 2/9] 9pfs: validate count sent by client with T_readdir

2019-12-17 Thread Christian Schoenebeck

A good 9p client sends T_readdir with a "count" parameter that is
sufficiently smaller than the client's initially negotiated msize
(maximum message size). Check for that anyway, so that bad clients
cannot interrupt the server with a "Failed to encode VirtFS reply
type 41" error message.
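
The clamping idea can be sketched in isolation as follows. This is a
standalone illustration, not the QEMU code: clamp_readdir_count() is a
hypothetical helper, and SKETCH_IOHDRSZ is an assumed stand-in for
P9_IOHDRSZ, whose actual value is not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed header reserve; stands in for P9_IOHDRSZ. */
#define SKETCH_IOHDRSZ 24u

/*
 * Limit the client-supplied count so that the reply payload plus the
 * reserved header can never exceed the negotiated message size.
 */
static uint32_t clamp_readdir_count(uint32_t max_count, uint32_t msize)
{
    uint32_t limit;

    /* Guard against unsigned underflow in the subtraction below. */
    assert(msize >= SKETCH_IOHDRSZ);
    limit = msize - SKETCH_IOHDRSZ;
    return max_count > limit ? limit : max_count;
}
```

A well-behaved client is unaffected, since its count is already below the
limit; only an oversized count is reduced.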

Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 520177f40c..30e33b6573 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -2414,6 +2414,7 @@ static void coroutine_fn v9fs_readdir(void *opaque)
 int32_t count;
 uint32_t max_count;
 V9fsPDU *pdu = opaque;
+V9fsState *s = pdu->s;
 
 retval = pdu_unmarshal(pdu, offset, "dqd", &fid,
&initial_offset, &max_count);
@@ -2422,6 +2423,13 @@ static void coroutine_fn v9fs_readdir(void *opaque)
 }
 trace_v9fs_readdir(pdu->tag, pdu->id, fid, initial_offset, max_count);
 
+if (max_count > s->msize - P9_IOHDRSZ) {
+max_count = s->msize - P9_IOHDRSZ;
+warn_report_once(
+"9p: bad client: T_readdir with count > msize - P9_IOHDRSZ"
+);
+}
+
 fidp = get_fid(pdu, fid);
 if (fidp == NULL) {
 retval = -EINVAL;
-- 
2.20.1

[PATCH 1/9] tests/virtio-9p: v9fs_string_read() didn't terminate string

2019-12-17 Thread Christian Schoenebeck

Signed-off-by: Christian Schoenebeck 
---
 tests/virtio-9p-test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/virtio-9p-test.c b/tests/virtio-9p-test.c
index e7b58e3a0c..880b4ff567 100644
--- a/tests/virtio-9p-test.c
+++ b/tests/virtio-9p-test.c
@@ -130,8 +130,9 @@ static void v9fs_string_read(P9Req *req, uint16_t *len, 
char **string)
 *len = local_len;
 }
 if (string) {
-*string = g_malloc(local_len);
+*string = g_malloc(local_len+1);
 v9fs_memread(req, *string, local_len);
+(*string)[local_len] = 0;
 } else {
 v9fs_memskip(req, local_len);
 }
-- 
2.20.1

Re: [PATCH v3 4/4] qom/object: Use common get/set uint helpers

2019-12-17 Thread Alexey Kardashevskiy




On 18/12/2019 09:19, Felipe Franciosi wrote:
> Hi Alexey,
> 
> I don't know how, but somehow I didn't receive your reply:
> https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg02127.html
> 
> (I was about to follow up, then I decided to look at the archives to
> make sure your response didn't get lost in my client somehow...)
> 
> Still not sure of what happened, lol, let's move on. :)

dunno, I spent time configuring "spf" on ozlabs.ru and gmail to make my
mails go through all possible checks :)


> 
> I'm top-posting as I couldn't pull your response in for a proper reply.
> 
> You said:
>> The franciozzy/autosetters branch with this on top -
>> https://github.com/aik/qemu/commit/94c33bb7debf
>> - works fine. Thanks,
> 
> Your patch basically reverts a part of my commit and then makes the
> change Marc-Andre recommended (by dropping the (void *) cast).
> 
> Is it ok for me to just drop that part of my patch and send the v4?

Yes. Thanks,


> You can follow-up on the cast change afterwards.
>
> Thanks,
> F.
> 
> 
>> On Dec 10, 2019, at 1:04 PM, Felipe Franciosi  wrote:
>>
>> Hi
>>
>>> On Dec 2, 2019, at 6:31 AM, Alexey Kardashevskiy  wrote:
>>>
>>>
>>>
>>> On 30/11/2019 04:46, Felipe Franciosi wrote:
>>>> Several objects implemented their own uint property getters and setters,
>>>> despite them being straightforward (without any checks/validations on
>>>> the values themselves) and identical across objects. This makes use of
>>>> an enhanced API for object_property_add_uintXX_ptr() which offers
>>>> default setters.
>>>>
>>>> Some of these setters used to update the value even if the type visit
>>>> failed (eg. because the value being set overflowed over the given type).
>>>> The new setter introduces a check for these errors, not updating the
>>>> value if an error occurred. The error is propagated.
>>>>
>>>> Signed-off-by: Felipe Franciosi 
>>>> ---
>>>> hw/acpi/ich9.c   |  95 --
>>>> hw/isa/lpc_ich9.c|  12 +
>>>> hw/misc/edu.c|  13 ++
>>>> hw/pci-host/q35.c|  14 ++
>>>> hw/ppc/spapr.c   |  18 ++--
>>>> hw/vfio/pci-quirks.c |  20 +++-
>>>> memory.c |  15 +-
>>>> target/arm/cpu.c |  22 ++---
>>>> target/i386/sev.c| 106 ---
>>>> 9 files changed, 40 insertions(+), 275 deletions(-)

>>>> diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
>>>> index 742fb78226..d9305be891 100644
>>>> --- a/hw/acpi/ich9.c
>>>> +++ b/hw/acpi/ich9.c
>>>> @@ -357,81 +357,6 @@ static void ich9_pm_set_cpu_hotplug_legacy(Object 
>>>> *obj, bool value,
>>>> s->pm.cpu_hotplug_legacy = value;
>>>> }
>>>>
>>>> -static void ich9_pm_get_disable_s3(Object *obj, Visitor *v, const char 
>>>> *name,
>>>> -   void *opaque, Error **errp)
>>>> -{
>>>> -ICH9LPCPMRegs *pm = opaque;
>>>> -uint8_t value = pm->disable_s3;
>>>> -
>>>> -visit_type_uint8(v, name, &value, errp);
>>>> -}
>>>> -
>>>> -static void ich9_pm_set_disable_s3(Object *obj, Visitor *v, const char 
>>>> *name,
>>>> -   void *opaque, Error **errp)
>>>> -{
>>>> -ICH9LPCPMRegs *pm = opaque;
>>>> -Error *local_err = NULL;
>>>> -uint8_t value;
>>>> -
>>>> -visit_type_uint8(v, name, &value, &local_err);
>>>> -if (local_err) {
>>>> -goto out;
>>>> -}
>>>> -pm->disable_s3 = value;
>>>> -out:
>>>> -error_propagate(errp, local_err);
>>>> -}
>>>> -
>>>> -static void ich9_pm_get_disable_s4(Object *obj, Visitor *v, const char 
>>>> *name,
>>>> -   void *opaque, Error **errp)
>>>> -{
>>>> -ICH9LPCPMRegs *pm = opaque;
>>>> -uint8_t value = pm->disable_s4;
>>>> -
>>>> -visit_type_uint8(v, name, &value, errp);
>>>> -}
>>>> -
>>>> -static void ich9_pm_set_disable_s4(Object *obj, Visitor *v, const char 
>>>> *name,
>>>> -   void *opaque, Error **errp)
>>>> -{
>>>> -ICH9LPCPMRegs *pm = opaque;
>>>> -Error *local_err = NULL;
>>>> -uint8_t value;
>>>> -
>>>> -visit_type_uint8(v, name, &value, &local_err);
>>>> -if (local_err) {
>>>> -goto out;
>>>> -}
>>>> -pm->disable_s4 = value;
>>>> -out:
>>>> -error_propagate(errp, local_err);
>>>> -}
>>>> -
>>>> -static void ich9_pm_get_s4_val(Object *obj, Visitor *v, const char *name,
>>>> -   void *opaque, Error **errp)
>>>> -{
>>>> -ICH9LPCPMRegs *pm = opaque;
>>>> -uint8_t value = pm->s4_val;
>>>> -
>>>> -visit_type_uint8(v, name, &value, errp);
>>>> -}
>>>> -
>>>> -static void ich9_pm_set_s4_val(Object *obj, Visitor *v, const char *name,
>>>> -   void *opaque, Error **errp)
>>>> -{
>>>> -ICH9LPCPMRegs *pm = opaque;
>>>> -Error *local_err = NULL;
>>>> -uint8_t value;
>>>> -
>>>> -visit_type_uint8(v, name, &value, &local_err);
>>>> -if (local_err) {
>>>> -goto out;
>>>> -}
>>>> -pm->s4_val = value;
>>>> -out:
>>>> -

Re: [PATCH v11 Kernel 6/6] vfio: Selective dirty page tracking if IOMMU backed device pins pages

2019-12-17 Thread Alex Williamson

On Tue, 17 Dec 2019 22:40:51 +0530
Kirti Wankhede  wrote:

> Track dirty pages reporting capability for each vfio_device by setting the
> capability flag on calling vfio_pin_pages() for that device.
> 
> In vfio_iommu_type1 module, while creating dirty pages bitmap, check if
> IOMMU backed device is present in the container. If IOMMU backed device is
> present in container then check dirty pages reporting capability for each
> vfio device in the container. If all vfio devices are capable of reporting
> dirty pages tracking by pinning pages through external API, then
> create bitmap of pinned pages only. If IOMMU backed device is present in
> the container and any one device is not able to report dirty pages, then
> mark all pages as dirty.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio.c | 33 +++
>  drivers/vfio/vfio_iommu_type1.c | 44 
> +++--
>  include/linux/vfio.h|  3 ++-
>  3 files changed, 77 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index c8482624ca34..9d2fbe09768a 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -96,6 +96,8 @@ struct vfio_device {
>   struct vfio_group   *group;
>   struct list_headgroup_next;
>   void*device_data;
> + /* dirty pages reporting capable */
> + booldirty_pages_cap;
>  };
>  
>  #ifdef CONFIG_VFIO_NOIOMMU
> @@ -1866,6 +1868,29 @@ int vfio_set_irqs_validate_and_prepare(struct 
> vfio_irq_set *hdr, int num_irqs,
>  }
>  EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
>  
> +int vfio_device_is_dirty_reporting_capable(struct device *dev, bool *cap)
> +{
> + struct vfio_device *device;
> + struct vfio_group *group;
> +
> + if (!dev || !cap)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (!group)
> + return -ENODEV;
> +
> + device = vfio_group_get_device(group, dev);
> + if (!device)
> + return -ENODEV;
> +
> + *cap = device->dirty_pages_cap;
> + vfio_device_put(device);
> + vfio_group_put(group);
> + return 0;
> +}
> +EXPORT_SYMBOL(vfio_device_is_dirty_reporting_capable);

I'd suggest this just return true/false and any error condition simply
be part of the false case.

> +
>  /*
>   * Pin a set of guest PFNs and return their associated host PFNs for local
>   * domain only.
> @@ -1907,6 +1932,14 @@ int vfio_pin_pages(struct device *dev, unsigned long 
> *user_pfn, int npage,
>   else
>   ret = -ENOTTY;
>  
> + if (ret > 0) {
> + struct vfio_device *device = vfio_group_get_device(group, dev);
> +
> + if (device) {
> + device->dirty_pages_cap = true;
> + vfio_device_put(device);
> + }
> + }

I think we'd want to trivially rework vfio_pin_pages() to use
vfio_device_get_from_dev() instead of vfio_group_get_from_dev(), then
we have access to the group via device->group.  Then vfio_device_put()
would be common in the return path.

>   vfio_group_try_dissolve_container(group);
>  
>  err_pin_pages:
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 68d8ed3b2665..ef56f31f4e73 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -891,6 +891,39 @@ static unsigned long vfio_pgsize_bitmap(struct 
> vfio_iommu *iommu)
>   return bitmap;
>  }
>  
> +static int vfio_is_dirty_pages_reporting_capable(struct device *dev, void 
> *data)
> +{
> + bool new;
> + int ret;
> +
> + ret = vfio_device_is_dirty_reporting_capable(dev, &new);
> + if (ret)
> + return ret;
> +
> + *(bool *)data = *(bool *)data && new;
> +
> + return 0;
> +}
> +
> +static bool vfio_dirty_pages_reporting_capable(struct vfio_iommu *iommu)
> +{
> + struct vfio_domain *d;
> + struct vfio_group *g;
> + bool capable = true;
> + int ret;
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + list_for_each_entry(g, &d->group_list, next) {
> + ret = iommu_group_for_each_dev(g->iommu_group, &capable,
> + vfio_is_dirty_pages_reporting_capable);
> + if (ret)
> + return false;

This will fail when there are devices within the IOMMU group that are
not represented as vfio_devices.  My original suggestion was:

On Thu, 14 Nov 2019 14:06:25 -0700
Alex Williamson  wrote:
> I think it does so by pinning pages.  Is it acceptable that if the
> vendor driver pins any pages, then from that point forward we consider
> the IOMMU group dirty page scope to be limited to pinned pages?  There
> are complications around non-singleton IOMMU groups, but I think we're
> already leaning towards that being a

Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update

2019-12-17 Thread Paolo Bonzini

On 17/12/19 23:57, Felipe Franciosi wrote:
> Doing it in userspace was the flow we proposed back in last year's KVM
> Forum (Edinburgh), but it got turned down.

I think the time since then has shown that essentially the cat is out of
the bag.  I didn't really like the idea of devices outside QEMU---and I
still don't---but if something like "VFIO over AF_UNIX" turns out to be
the cleanest way to implement multi-process QEMU device models, I am not
going to pull an RMS and block that from happening.  Assuming I could
even do so!

Paolo

Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update

2019-12-17 Thread Felipe Franciosi



> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi  wrote:
> 
> On Mon, Dec 16, 2019 at 07:57:32PM +, Felipe Franciosi wrote:
>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva  
>>> wrote:
>>> On Fri, Dec 13, 2019 at 10:41:16AM +, Stefan Hajnoczi wrote:
>>>> Is there a work-in-progress muser patch series you can post to start the
>>>> discussion early?  That way we can avoid reviewers like myself asking
>>>> you to make changes after you have invested a lot of time.
>>>>
>>> 
>>> Absolutely, that is our plan. At the moment we do not have the patches
>>> ready for the review. We have setup internally a milestone and will be
>>> sending that early version as a tarball after we have it completed.
>>> Would be also a meeting something that could help us to stay on the same
>>> page?
>> 
>> Please loop us in if you so set up a meeting.
> 
> There is a bi-weekly KVM Community Call that we can use for phone
> discussions:
> 
>  
> https://calendar.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
> 
> Or we can schedule a one-off call at any time :).

Sounds good either way, whenever it's needed.

> 
> Questions I've seen when discussing muser with people have been:
> 
> 1. Can unprivileged containers create muser devices?  If not, this is a
>   blocker for use cases that want to avoid root privileges entirely.

Yes you can. Muser device creation follows the same process as general
mdev device creation (ie. you write to a sysfs path). That creates an
entry in /dev/vfio and the control plane can further drop privileges
there (set selinux contexts, )

> 
> 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
>   security reasons)?  A similar library could be implemented in
>   userspace along the lines of the vhost-user protocol.  Although VMMs
>   would then need to use a new libmuser-client library instead of
>   reusing their VFIO code to access the device.

Doing it in userspace was the flow we proposed back in last year's KVM
Forum (Edinburgh), but it got turned down. That's why we pursued the
kernel approach, which turned out to have some advantages:
- No changes needed to Qemu
- No Qemu needed at all for userspace drivers
- Device emulation process restart is trivial
  (it therefore makes device code upgrades much easier)

Having said that, nothing stops us from enhancing libmuser to talk
directly to Qemu (for the Qemu case). I envision at least two ways of
doing that:
- Hooking up libmuser with Qemu directly (eg. over a unix socket)
- Hooking Qemu with CUSE and implementing the muser.ko interface

For the latter, libmuser would talk to a character device just like it
talks to the vfio character device. We "just" need to implement that
backend in Qemu. :)

> 
> 3. Should this feature be Linux-only?  vhost-user can be implemented on
>   non-Linux OSes...

The userspace approach discussed above certainly can be more portable.
Currently, muser depends on MDEV+VFIO and that's where the restriction
comes from.

F.

> 
> Stefan

Re: [PATCH v11 Kernel 4/6] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap

2019-12-17 Thread Alex Williamson

On Tue, 17 Dec 2019 22:40:49 +0530
Kirti Wankhede  wrote:

> Pages, pinned by external interface for requested IO virtual address
> range,  might get unpinned  and unmapped while migration is active and
> device is still running, that is, in pre-copy phase while guest driver
> still could access those pages. Host device can write to these pages while
> those were mapped. Such pages should be marked dirty so that after
> migration guest driver should still be able to complete the operation.
> 
> To get bitmap during unmap, user should set flag
> VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP, bitmap memory should be allocated and
> zeroed by user space application. Bitmap size and page size should be set
> by user application.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 63 
> -
>  include/uapi/linux/vfio.h   | 12 
>  2 files changed, 68 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 215aecb25453..101c2b1e72b4 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -974,7 +974,8 @@ static long verify_bitmap_size(unsigned long npages, 
> unsigned long bitmap_size)
>  }
>  
>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> -  struct vfio_iommu_type1_dma_unmap *unmap)
> +  struct vfio_iommu_type1_dma_unmap *unmap,
> +  unsigned long *bitmap)
>  {
>   uint64_t mask;
>   struct vfio_dma *dma, *dma_last = NULL;
> @@ -1049,6 +1050,15 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>   if (dma->task->mm != current->mm)
>   break;
>  
> + if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> + (dma_last != dma))
> + vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
> +  unmap->bitmap_pgsize, unmap->iova,
> +  bitmap);
> + else
> + vfio_remove_unpinned_from_pfn_list(dma, true);
> +
> +
>   if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
>   struct vfio_iommu_type1_dma_unmap nb_unmap;
>  
> @@ -1074,6 +1084,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>   &nb_unmap);
>   goto again;
>   }
> +
>   unmapped += dma->size;
>   vfio_remove_dma(iommu, dma);
>   }
> @@ -2404,22 +2415,60 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>   } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>   struct vfio_iommu_type1_dma_unmap unmap;
> - long ret;
> + unsigned long *bitmap = NULL;
> + long ret, bsize;
>  
>   minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
>  
> - if (copy_from_user(&unmap, (void __user *)arg, minsz))
> + if (copy_from_user(&unmap, (void __user *)arg, sizeof(unmap)))

If we only require minsz, how are we going to copy sizeof(unmap)?  This
breaks existing userspace.  You'll need to copy the remainder of the
user data after validating that they've provided it.
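
The two-step copy described above can be sketched in userspace C like
this. It is a simplified model, not the VFIO code: struct unmap_req, its
field layout, REQ_FLAG_GET_DIRTY_BITMAP, REQ_MINSZ and read_unmap_req()
are all hypothetical, and memcpy() stands in for copy_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout: original fields first, extension last. */
struct unmap_req {
    uint32_t argsz;          /* size of the buffer the caller provided */
    uint32_t flags;
    uint64_t iova;
    uint64_t size;
    /* Fields added later; legacy callers do not provide them. */
    uint64_t bitmap_pgsize;
    uint64_t bitmap_size;
};

#define REQ_FLAG_GET_DIRTY_BITMAP (1u << 0)
/* Size of the part every caller, old or new, must provide. */
#define REQ_MINSZ offsetof(struct unmap_req, bitmap_pgsize)

static int read_unmap_req(const void *user_buf, struct unmap_req *req)
{
    memset(req, 0, sizeof(*req));

    /* Step 1: copy only the part that is guaranteed to exist. */
    memcpy(req, user_buf, REQ_MINSZ);
    if (req->argsz < REQ_MINSZ ||
        (req->flags & ~REQ_FLAG_GET_DIRTY_BITMAP)) {
        return -EINVAL;
    }

    /* Step 2: copy the extension only if it was requested and provided. */
    if (req->flags & REQ_FLAG_GET_DIRTY_BITMAP) {
        if (req->argsz < sizeof(*req)) {
            return -EINVAL;
        }
        memcpy((char *)req + REQ_MINSZ,
               (const char *)user_buf + REQ_MINSZ,
               sizeof(*req) - REQ_MINSZ);
    }
    return 0;
}
```

With this ordering a legacy caller that passes a buffer of exactly
REQ_MINSZ bytes is never read beyond what it allocated.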

>   return -EFAULT;
>  
> - if (unmap.argsz < minsz || unmap.flags)
> + if (unmap.argsz < minsz ||
> + unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
>   return -EINVAL;
>  
> - ret = vfio_dma_do_unmap(iommu, &unmap);
> + if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
> + unsigned long pgshift = __ffs(unmap.bitmap_pgsize);
> + uint64_t iommu_pgmask =
> +  ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> +
> + if (((unmap.bitmap_pgsize - 1) & iommu_pgmask) !=
> +  (unmap.bitmap_pgsize - 1))
> + return -EINVAL;
> +
> + bsize = verify_bitmap_size(unmap.size >> pgshift,
> +unmap.bitmap_size);
> + if (bsize < 0)
> + return bsize;
> +
> + bitmap = kmalloc(bsize, GFP_KERNEL);

Same allocation that we cannot do.  Thanks,

Alex

> + if (!bitmap)
> + return -ENOMEM;
> +
> + if (copy_from_user(bitmap, (void __user *)unmap.bitmap,
> +bsize)) {
> + ret = -EFAULT;
> + goto unmap_exit;
> + }
> + }
> +
> + ret = vfio_dma_do_unmap(iommu, &unmap, bitmap);
>   if (ret)
> - return ret;
> + goto unmap_exit;
>  
>

Re: [PATCH v3 0/4] Expose GT CNTFRQ as a CPU property to support AST2600

2019-12-17 Thread Andrew Jeffery




On Wed, 18 Dec 2019, at 01:55, Peter Maydell wrote:
> On Fri, 13 Dec 2019 at 05:48, Andrew Jeffery  wrote:
> >
> > Hello,
> >
> > This is a v3 of the belated follow-up from a few of my earlier attempts to 
> > fix
> > up the ARM generic timer for correct behaviour on the ASPEED AST2600 SoC. 
> > The
> > AST2600 clocks the generic timer at the rate of HPLL, which is configured to
> > 1125MHz.  This is significantly quicker than the currently hard-coded 
> > generic
> > timer rate of 62.5MHz and so we see "sticky" behaviour in the guest.
> >
> > v2 can be found here:
> >
> > https://patchwork.ozlabs.org/cover/1203474/
> >
> > Changes since v2:
> >
> > * Address some minor review comments from Philippe and add tags
> >
> > Changes since v1:
> >
> > * Fix a user mode build failure from partial renaming of 
> > gt_cntfrq_period_ns()
> > * Add tags from Cedric and Richard
> >
> > Please review.
> >
> > Andrew
> >
> > Andrew Jeffery (4):
> >   target/arm: Remove redundant scaling of nexttick
> >   target/arm: Abstract the generic timer frequency
> >   target/arm: Prepare generic timer for per-platform CNTFRQ
> >   ast2600: Configure CNTFRQ at 1125MHz
> 
> 
> 
> 
> Applied to target-arm.next, thanks.

Thanks for your feedback throughout.

Andrew

Re: [PATCH v38 11/22] target/avr: Add instruction disassembly function

2019-12-17 Thread Aleksandar Markovic

On Tuesday, December 17, 2019, Michael Rolnik  wrote:

> Aleksandar.
>
> 1. inst.decode file
> 2. avr features are not accessible from avr_print_insn as it does not
> receive a pointer to CPU context. So, there is not way to inform the user
> that some instructions are not supported unless I define several
> different avr_print_insn functions.
>
>
OK, this is not a crucial feature. If I were you, I would leave it for the
future, as one of the "nice to have" things. It is possible to implement it, of
course, with some additions to the decoder, but my advice is not to spend your
energy on that now.

But patch 1 restructuring is a must. You have to form several logical units
out of it.

inst.decode is written to be convenient to the author (you), but it should
be convenient to the reader, please rearrange items to be as in the ISA
document (even though we both know it is not convenient to you).

The review takes forever, but you are up to one of the most serious tasks
in QEMU, so it is expected, no reason to worry.

Best regards,

Aleksandar



> Regards,
> Michael Rolnik
>
>
>
> On Thu, Dec 12, 2019 at 11:12 AM Aleksandar Markovic <
> aleksandar.m.m...@gmail.com> wrote:
>
>> On Tue, Dec 10, 2019 at 8:18 AM Michael Rolnik  wrote:
>> >
>> > You are right. See at the bottom of the file. There is a comment about
>> it
>> >
>>
>> Sorry, what file?
>>
>> I also see that you disassemble instructions regardless of what AVR
>> CPU the current executable is built for, don't you? OK, not a very big
>> deal, but can be confusing for end user if disassembly text of an
>> instruction that is not supported by a particular CPU is displayed as
>> if it is supported.
>>
>> > Sent from my cell phone, please ignore typos
>> >
>> > On Tue, Dec 10, 2019, 6:21 AM Aleksandar Markovic <
>> aleksandar.m.m...@gmail.com> wrote:
>> >>
>> >>
>> >>
>> >> On Monday, December 9, 2019, Michael Rolnik  wrote:
>> >>>
>> >>> Hi Aleksandar.
>> >>>
>> >>> 1. all instructions are 16 bit long except CALL & JMP they are 32 bit
>> long
>> >>
>> >>
>> >> According to the doc, LDS and STS also have 32-bit coding.
>> >>
>> >>
>> >>>
>> >>> 2. next_word_used is set to true by next_word when called by
>> append_16 when CALL & JMP are parsed
>> >>>
>> >>> Regards,
>> >>> Michael Rolnik
>> >>>
>> >>> On Mon, Dec 9, 2019 at 8:10 PM Aleksandar Markovic <
>> aleksandar.m.m...@gmail.com> wrote:
>> 
>> 
>> 
>>  On Sunday, December 8, 2019, Michael Rolnik 
>> wrote:
>> >
>> > Provide function disassembles executed instruction when `-d in_asm`
>> is
>> > provided
>> >
>> > Example:
>> > `./avr-softmmu/qemu-system-avr -bios 
>> > free-rtos/Demo/AVR_ATMega2560_GCC/demo.elf
>> -d in_asm` will produce something like the following
>> >
>> > ```
>> > ...
>> > IN:
>> > 0x014a:  CALL  0x3808
>> >
>> > IN: main
>> > 0x3808:  CALL  0x4b4
>> >
>> > IN: vParTestInitialise
>> > 0x04b4:  LDI   r24, 255
>> > 0x04b6:  STS   r24, 0
>> > 0x04b8:  MULS  r16, r20
>> > 0x04ba:  OUT   $1, r24
>> > 0x04bc:  LDS   r24, 0
>> > 0x04be:  MULS  r16, r20
>> > 0x04c0:  OUT   $2, r24
>> > 0x04c2:  RET
>> > ...
>> > ```
>> >
>> > Signed-off-by: Michael Rolnik 
>> > Suggested-by: Richard Henderson 
>> > Suggested-by: Philippe Mathieu-Daudé 
>> > Suggested-by: Aleksandar Markovic 
>> > Reviewed-by: Philippe Mathieu-Daudé 
>> > Tested-by: Philippe Mathieu-Daudé 
>> > ---
>> >  target/avr/cpu.h   |   1 +
>> >  target/avr/cpu.c   |   2 +-
>> >  target/avr/disas.c | 226 ++
>> +++
>> >  target/avr/translate.c |  11 ++
>> >  4 files changed, 239 insertions(+), 1 deletion(-)
>> >  create mode 100644 target/avr/disas.c
>> >
>> > diff --git a/target/avr/cpu.h b/target/avr/cpu.h
>> > index c217eefeb4..a8a3e7ade6 100644
>> > --- a/target/avr/cpu.h
>> > +++ b/target/avr/cpu.h
>> > @@ -178,6 +178,7 @@ bool avr_cpu_exec_interrupt(CPUState *cpu, int
>> int_req);
>> >  hwaddr avr_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
>> >  int avr_cpu_gdb_read_register(CPUState *cpu, uint8_t *buf, int
>> reg);
>> >  int avr_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int
>> reg);
>> > +int avr_print_insn(bfd_vma addr, disassemble_info *info);
>> >
>> >  static inline int avr_feature(CPUAVRState *env, int feature)
>> >  {
>> > diff --git a/target/avr/cpu.c b/target/avr/cpu.c
>> > index c5cafcae3c..be4b921e4d 100644
>> > --- a/target/avr/cpu.c
>> > +++ b/target/avr/cpu.c
>> > @@ -83,7 +83,7 @@ static void avr_cpu_reset(CPUState *cs)
>> >  static void avr_cpu_disas_set_info(CPUState *cpu,
>> disassemble_info *info)
>> >  {
>> >  info->mach = bfd_arch_avr;
>> > -

[PATCH] target/arm: fix IL bit for data abort exceptions

2019-12-17 Thread Jeff Kubascik

The Instruction Length bit of the Exception Syndrome Register was fixed to 1
for data aborts. This bit is used by the Xen hypervisor to determine how to
increment the program counter after an MMIO handler succeeds and returns
control to the guest virtual machine. With this value fixed to 1, the
hypervisor would always increment the program counter by 0x4. This is a
problem when the guest virtual machine is using Thumb instructions, as the
instruction that caused the exception may be 16 bits long.

This adds an is_16bit flag to the disassembler context to keep track of the
current instruction length. For load/store instructions, the instruction
length bit is stored with the instruction syndrome data, to be used later if
a data abort occurs.
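
To illustrate why the bit matters, here is a minimal sketch of how a
hypervisor-side handler might advance the guest PC after emulating the
access. It is not Xen or QEMU code; skip_trapped_insn() is a hypothetical
helper, and only the position of IL (bit 25 of the syndrome register) is
taken from the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* IL: 1 means the trapped instruction was 32 bits, 0 means 16 bits. */
#define ESR_IL_BIT (UINT32_C(1) << 25)

/* Return the PC of the instruction following the one that trapped. */
static uint32_t skip_trapped_insn(uint32_t pc, uint32_t esr)
{
    return pc + ((esr & ESR_IL_BIT) ? 4u : 2u);
}
```

If IL is always reported as 1, a 16-bit Thumb store at pc is followed by
a resume at pc + 4, which skips the next 16-bit instruction.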

Signed-off-by: Jeff Kubascik 
---
Hello,

I am using the ARMv8 version of QEMU to run the Xen hypervisor with a guest
virtual machine compiled for AArch32/Thumb code. I have noticed that when
the guest VM tries to write to an emulated PL011 register, the mmio handler
always increments the program counter by 4, even if the store instruction
that caused the exception was a 16-bit Thumb instruction.

I have traced this back to the IL bit in the ESR_EL2 register. Xen uses the
IL bit to determine how to increment the program counter. However, QEMU does
not correctly emulate this bit, always setting it to 1 (32-bit instruction).

The above patch works for my setup. However, I am not very familiar with the
QEMU code base, so it may not be the best way to do it, or even be correct.
Any feedback would be greatly appreciated.

Sincerely,
Jeff Kubascik
---
 target/arm/tlb_helper.c| 2 +-
 target/arm/translate-a64.c | 1 +
 target/arm/translate.c | 4 +++-
 target/arm/translate.h | 2 ++
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
index 5feb312941..e63f8bda29 100644
--- a/target/arm/tlb_helper.c
+++ b/target/arm/tlb_helper.c
@@ -44,7 +44,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t 
template_syn,
 syn = syn_data_abort_with_iss(same_el,
   0, 0, 0, 0, 0,
   ea, 0, s1ptw, is_write, fsc,
-  false);
+  true);
 /* Merge the runtime syndrome with the template syndrome.  */
 syn |= template_syn;
 }
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index d4bebbe629..a3c618fdd9 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14045,6 +14045,7 @@ static void disas_a64_insn(CPUARMState *env, 
DisasContext *s)
 s->pc_curr = s->base.pc_next;
 insn = arm_ldl_code(env, s->base.pc_next, s->sctlr_b);
 s->insn = insn;
+s->is_16bit = false;
 s->base.pc_next += 4;
 
 s->fp_access_checked = false;
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 2b6c1f91bf..300480f1b7 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8555,7 +8555,7 @@ static ISSInfo make_issinfo(DisasContext *s, int rd, bool 
p, bool w)
 
 /* ISS not valid if writeback */
 if (p && !w) {
-ret = rd;
+ret = rd | (s->is_16bit ? ISSIs16Bit : 0);
 } else {
 ret = ISSInvalid;
 }
@@ -11057,6 +11057,7 @@ static void arm_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 dc->pc_curr = dc->base.pc_next;
 insn = arm_ldl_code(env, dc->base.pc_next, dc->sctlr_b);
 dc->insn = insn;
+dc->is_16bit = false;
 dc->base.pc_next += 4;
 disas_arm_insn(dc, insn);
 
@@ -11126,6 +11127,7 @@ static void thumb_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 dc->pc_curr = dc->base.pc_next;
 insn = arm_lduw_code(env, dc->base.pc_next, dc->sctlr_b);
 is_16bit = thumb_insn_is_16bit(dc, dc->base.pc_next, insn);
+dc->is_16bit = is_16bit;
 dc->base.pc_next += 2;
 if (!is_16bit) {
 uint32_t insn2 = arm_lduw_code(env, dc->base.pc_next, dc->sctlr_b);
diff --git a/target/arm/translate.h b/target/arm/translate.h
index b837b7fcbf..c16f434477 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -14,6 +14,8 @@ typedef struct DisasContext {
 target_ulong pc_curr;
 target_ulong page_start;
 uint32_t insn;
+/* 16-bit instruction flag */
+bool is_16bit;
 /* Nonzero if this instruction has been conditionally skipped.  */
 int condjmp;
 /* The label that will be jumped to when the instruction is skipped.  */
-- 
2.17.1

Re: [PATCH v3 4/4] qom/object: Use common get/set uint helpers

2019-12-17 Thread Felipe Franciosi

Hi Alexey,

I don't know how, but somehow I didn't receive your reply:
https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg02127.html

(I was about to follow up, then I decided to look at the archives to
make sure your response didn't get lost in my client somehow...)

Still not sure of what happened, lol, let's move on. :)

I'm top-posting as I couldn't pull your response in for a proper reply.

You said:
> The franciozzy/autosetters branch with this on top -
> https://github.com/aik/qemu/commit/94c33bb7debf
> - works fine. Thanks,

Your patch basically reverts a part of my commit and then makes the
change Marc-Andre recommended (by dropping the (void *) cast).

Is it ok for me to just drop that part of my patch and send the v4?

You can follow-up on the cast change afterwards.

Thanks,
F.


> On Dec 10, 2019, at 1:04 PM, Felipe Franciosi  wrote:
> 
> Hi
> 
>> On Dec 2, 2019, at 6:31 AM, Alexey Kardashevskiy  wrote:
>> 
>> 
>> 
>> On 30/11/2019 04:46, Felipe Franciosi wrote:
>>> Several objects implemented their own uint property getters and setters,
>>> despite them being straightforward (without any checks/validations on
>>> the values themselves) and identical across objects. This makes use of
>>> an enhanced API for object_property_add_uintXX_ptr() which offers
>>> default setters.
>>> 
>>> Some of these setters used to update the value even if the type visit
>>> failed (eg. because the value being set overflowed over the given type).
>>> The new setter introduces a check for these errors, not updating the
>>> value if an error occurred. The error is propagated.
>>> 
>>> Signed-off-by: Felipe Franciosi 
>>> ---
>>> hw/acpi/ich9.c   |  95 --
>>> hw/isa/lpc_ich9.c|  12 +
>>> hw/misc/edu.c|  13 ++
>>> hw/pci-host/q35.c|  14 ++
>>> hw/ppc/spapr.c   |  18 ++--
>>> hw/vfio/pci-quirks.c |  20 +++-
>>> memory.c |  15 +-
>>> target/arm/cpu.c |  22 ++---
>>> target/i386/sev.c| 106 ---
>>> 9 files changed, 40 insertions(+), 275 deletions(-)
>>> 
>>> diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
>>> index 742fb78226..d9305be891 100644
>>> --- a/hw/acpi/ich9.c
>>> +++ b/hw/acpi/ich9.c
>>> @@ -357,81 +357,6 @@ static void ich9_pm_set_cpu_hotplug_legacy(Object 
>>> *obj, bool value,
>>>s->pm.cpu_hotplug_legacy = value;
>>> }
>>> 
>>> -static void ich9_pm_get_disable_s3(Object *obj, Visitor *v, const char 
>>> *name,
>>> -   void *opaque, Error **errp)
>>> -{
>>> -ICH9LPCPMRegs *pm = opaque;
>>> -uint8_t value = pm->disable_s3;
>>> -
>>> -visit_type_uint8(v, name, &value, errp);
>>> -}
>>> -
>>> -static void ich9_pm_set_disable_s3(Object *obj, Visitor *v, const char 
>>> *name,
>>> -   void *opaque, Error **errp)
>>> -{
>>> -ICH9LPCPMRegs *pm = opaque;
>>> -Error *local_err = NULL;
>>> -uint8_t value;
>>> -
>>> -visit_type_uint8(v, name, &value, &local_err);
>>> -if (local_err) {
>>> -goto out;
>>> -}
>>> -pm->disable_s3 = value;
>>> -out:
>>> -error_propagate(errp, local_err);
>>> -}
>>> -
>>> -static void ich9_pm_get_disable_s4(Object *obj, Visitor *v, const char 
>>> *name,
>>> -   void *opaque, Error **errp)
>>> -{
>>> -ICH9LPCPMRegs *pm = opaque;
>>> -uint8_t value = pm->disable_s4;
>>> -
>>> -visit_type_uint8(v, name, &value, errp);
>>> -}
>>> -
>>> -static void ich9_pm_set_disable_s4(Object *obj, Visitor *v, const char 
>>> *name,
>>> -   void *opaque, Error **errp)
>>> -{
>>> -ICH9LPCPMRegs *pm = opaque;
>>> -Error *local_err = NULL;
>>> -uint8_t value;
>>> -
>>> -visit_type_uint8(v, name, &value, &local_err);
>>> -if (local_err) {
>>> -goto out;
>>> -}
>>> -pm->disable_s4 = value;
>>> -out:
>>> -error_propagate(errp, local_err);
>>> -}
>>> -
>>> -static void ich9_pm_get_s4_val(Object *obj, Visitor *v, const char *name,
>>> -   void *opaque, Error **errp)
>>> -{
>>> -ICH9LPCPMRegs *pm = opaque;
>>> -uint8_t value = pm->s4_val;
>>> -
>>> -visit_type_uint8(v, name, &value, errp);
>>> -}
>>> -
>>> -static void ich9_pm_set_s4_val(Object *obj, Visitor *v, const char *name,
>>> -   void *opaque, Error **errp)
>>> -{
>>> -ICH9LPCPMRegs *pm = opaque;
>>> -Error *local_err = NULL;
>>> -uint8_t value;
>>> -
>>> -visit_type_uint8(v, name, &value, &local_err);
>>> -if (local_err) {
>>> -goto out;
>>> -}
>>> -pm->s4_val = value;
>>> -out:
>>> -error_propagate(errp, local_err);
>>> -}
>>> -
>>> static bool ich9_pm_get_enable_tco(Object *obj, Error **errp)
>>> {
>>>ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
>>> @@ -468,18 +393,14 @@ void ich9_pm_add_properties(Object *obj, 
>>> ICH9LPCPMRegs *pm, Error **errp)
>>> ich9_pm_get_cpu_hotplug_legacy,
>>>

Re: [PATCH v11 Kernel 3/6] vfio iommu: Implementation of ioctl to for dirty pages tracking.

2019-12-17 Thread Alex Williamson

On Tue, 17 Dec 2019 22:40:48 +0530
Kirti Wankhede  wrote:

> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> - Start unpinned pages dirty pages tracking while migration is active and
>   device is running, i.e. during pre-copy phase.
> - Stop unpinned pages dirty pages tracking. This is required to stop
>   unpinned dirty pages tracking if migration failed or cancelled during
>   pre-copy phase. Unpinned pages tracking is clear.
> - Get dirty pages bitmap. Stop unpinned dirty pages tracking and clear
>   unpinned pages information on bitmap read. This ioctl returns bitmap of
>   dirty pages, its user space application responsibility to copy content
>   of dirty pages from source to destination during migration.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 218 
> ++--
>  1 file changed, 209 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2ada8e6cdb88..215aecb25453 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -70,6 +70,7 @@ struct vfio_iommu {
>   unsigned intdma_avail;
>   boolv2;
>   boolnesting;
> + booldirty_page_tracking;
>  };
>  
>  struct vfio_domain {
> @@ -112,6 +113,7 @@ struct vfio_pfn {
>   dma_addr_t  iova;   /* Device address */
>   unsigned long   pfn;/* Host pfn */
>   atomic_tref_count;
> + boolunpinned;

Doesn't this duplicate ref_count == 0?

>  };
>  
>  struct vfio_regions {
> @@ -244,6 +246,32 @@ static void vfio_remove_from_pfn_list(struct vfio_dma 
> *dma,
>   kfree(vpfn);
>  }
>  
> +static void vfio_remove_unpinned_from_pfn_list(struct vfio_dma *dma, bool 
> warn)
> +{
> + struct rb_node *n = rb_first(&dma->pfn_list);
> +
> + for (; n; n = rb_next(n)) {
> + struct vfio_pfn *vpfn = rb_entry(n, struct vfio_pfn, node);
> +
> + if (warn)
> + WARN_ON_ONCE(vpfn->unpinned);

This option isn't used within this patch, perhaps better to add with
its use case, but it seems this presents both a denial of service via
kernel tainting and an undocumented feature/bug.  As I interpret its
use within the next patch, this generates a warning if the user
unmapped the IOVA with dirty pages present, without using the dirty
bitmap extension of the unmap call.  Our job is not to babysit the
user, if they don't care to look at the dirty bitmap, that's their
prerogative.  Drop this warning and the function arg.

> +
> + if (vpfn->unpinned)

if (!atomic_read(&vpfn->ref_count))

> + vfio_remove_from_pfn_list(dma, vpfn);
> + }
> +}
> +
> +static void vfio_remove_unpinned_from_dma_list(struct vfio_iommu *iommu)
> +{
> + struct rb_node *n = rb_first(&iommu->dma_list);
> +
> + for (; n; n = rb_next(n)) {
> + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> +
> + vfio_remove_unpinned_from_pfn_list(dma, false);
> + }
> +}
> +
>  static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct vfio_dma *dma,
>  unsigned long iova)
>  {
> @@ -254,13 +282,17 @@ static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct 
> vfio_dma *dma,
>   return vpfn;
>  }
>  
> -static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn 
> *vpfn)
> +static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn 
> *vpfn,
> +   bool dirty_tracking)
>  {
>   int ret = 0;
>  
>   if (atomic_dec_and_test(&vpfn->ref_count)) {
>   ret = put_pfn(vpfn->pfn, dma->prot);
> - vfio_remove_from_pfn_list(dma, vpfn);
> + if (dirty_tracking)
> + vpfn->unpinned = true;
> + else
> + vfio_remove_from_pfn_list(dma, vpfn);

This can also simply use ref_count.  BTW, checking the locking, I think
->ref_count is only manipulated under iommu->lock, therefore the atomic
ops are probably overkill.
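
A minimal userspace sketch of that idea, assuming every access happens under a single lock so a plain integer count is enough. The struct and function names are hypothetical stand-ins, not the kernel's struct vfio_pfn or its helpers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct vfio_pfn, without a separate flag. */
struct pfn_entry {
    unsigned long iova;
    int ref_count;   /* plain int: assumes the caller holds the lock */
};

/* "Unpinned" is derived from the count instead of being stored twice. */
static inline bool pfn_is_unpinned(const struct pfn_entry *e)
{
    return e->ref_count == 0;
}

/*
 * Drop one reference. Returns true when the caller should remove the
 * entry from its list now. With dirty tracking enabled, an entry whose
 * count reaches zero stays on the list so it can be reported later.
 */
static bool pfn_put(struct pfn_entry *e, bool dirty_tracking)
{
    if (e->ref_count == 0) {
        return false;            /* already unpinned, nothing to drop */
    }
    if (--e->ref_count > 0) {
        return false;            /* still pinned by someone else */
    }
    return !dirty_tracking;
}
```

The early return on a zero count plays the role of the "if (vpfn->unpinned) return 0;" check in the patch, without a second field to keep in sync.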

>   }
>   return ret;
>  }
> @@ -504,7 +536,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, 
> unsigned long vaddr,
>  }
>  
>  static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,
> - bool do_accounting)
> + bool do_accounting, bool dirty_tracking)
>  {
>   int unlocked;
>   struct vfio_pfn *vpfn = vfio_find_vpfn(dma, iova);
> @@ -512,7 +544,10 @@ static int vfio_unpin_page_external(struct vfio_dma 
> *dma, dma_addr_t iova,
>   if (!vpfn)
>   return 0;
>  
> - unlocked = vfio_iova_put_vfio_pfn(dma, vpfn);
> + if (vpfn->unpinned)
> + return 0;

Combine with above, if (!vpfn || !vpfn->ref_count)

> +
> + unlocked =

Re: [PATCH 0/6] Fix more GCC9 -O3 warnings

2019-12-17 Thread Chubb, Peter (Data61, Kensington NSW)

> "Philippe" == Philippe Mathieu-Daudé  writes:

Philippe> Fix some trivial warnings when building with -O3.

For compatibility with lint and other older checkers, it'd be good to keep
this as /* FALLTHROUGH */ (which gcc should accept according to its
manual).

Fixing the comments' placement is a different matter, and should be
done.  Seems to me that until gcc started warning for this, no one had
actually run a checker, and the comments were just for human info.
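
A minimal sketch of the placement in question, loosely modelled on the tcx DAC switch from the patch. The struct and function names are made up for illustration. The marker comment sits on its own line directly before the next label, which is where GCC's -Wimplicit-fallthrough documentation says it looks for one, rather than trailing the preceding statement.

```c
#include <stdint.h>

/* Hypothetical reduced DAC state, for illustration only. */
struct dac {
    uint8_t index;
    int state;
};

static void dac_advance(struct dac *s)
{
    switch (s->state) {
    case 0:
        s->state++;
        break;
    case 1:
        s->state++;
        break;
    case 2:
        s->index = (s->index + 1) & 0xff; /* Index autoincrement */
        /* FALLTHROUGH */
    default:
        s->state = 0;
        break;
    }
}
```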

Peter C
-- 
Dr Peter Chubb Tel: +61 2 9490 5852  http://ts.data61.csiro.au/
Trustworthy Systems Group   Data61 (formerly NICTA)

Re: [PATCH v38 11/22] target/avr: Add instruction disassembly function

2019-12-17 Thread Michael Rolnik

Aleksandar.

1. inst.decode file
2. avr features are not accessible from avr_print_insn as it does not
receive a pointer to CPU context. So, there is no way to inform the user
that some instructions are not supported unless I define several
different avr_print_insn functions.

Regards,
Michael Rolnik



On Thu, Dec 12, 2019 at 11:12 AM Aleksandar Markovic <
aleksandar.m.m...@gmail.com> wrote:

> On Tue, Dec 10, 2019 at 8:18 AM Michael Rolnik  wrote:
> >
> > You are right. See at the bottom of the file. There is a comment about it
> >
>
> Sorry, what file?
>
> I also see that you disassemble instructions regardless of what AVR
> CPU the current executable is built for, don't you? OK, not a very big
> deal, but can be confusing for end user if disassembly text of an
> instruction that is not supported by a particular CPU is displayed as
> if it is supported.
>
> > Sent from my cell phone, please ignore typos
> >
> > On Tue, Dec 10, 2019, 6:21 AM Aleksandar Markovic <
> aleksandar.m.m...@gmail.com> wrote:
> >>
> >>
> >>
> >> On Monday, December 9, 2019, Michael Rolnik  wrote:
> >>>
> >>> Hi Aleksandar.
> >>>
> >>> 1. all instructions are 16 bit long except CALL & JMP they are 32 bit
> long
> >>
> >>
> >> According to the doc, LDS and STS also have 32-bit coding.
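
As a rough sketch of the point under discussion (not the code from target/avr/disas.c), the instruction length can be derived from the first 16-bit word alone. The masks below follow the opcode layouts in the AVR instruction set manual as I read them (JMP/CALL: 1001 010k kkkk 11xk, LDS/STS: 1001 00xd dddd 0000); the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: length in bytes of the AVR instruction whose
 * first word is 'op'. Only JMP, CALL, LDS and STS (the 32-bit forms)
 * carry a second word; everything else is a single 16-bit word.
 */
static inline unsigned avr_insn_len_bytes(uint16_t op)
{
    if ((op & 0xFE0C) == 0x940C) {   /* JMP k / CALL k */
        return 4;
    }
    if ((op & 0xFC0F) == 0x9000) {   /* LDS Rd, k / STS k, Rr */
        return 4;
    }
    return 2;
}
```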
> >>
> >>
> >>>
> >>> 2. next_word_used is set to true by next_word when called by append_16
> when CALL & JMP are parsed
> >>>
> >>> Regards,
> >>> Michael Rolnik
> >>>
> >>> On Mon, Dec 9, 2019 at 8:10 PM Aleksandar Markovic <
> aleksandar.m.m...@gmail.com> wrote:
> 
> 
> 
>  On Sunday, December 8, 2019, Michael Rolnik 
> wrote:
> >
> > Provide function disassembles executed instruction when `-d in_asm`
> is
> > provided
> >
> > Example:
> > `./avr-softmmu/qemu-system-avr -bios
> free-rtos/Demo/AVR_ATMega2560_GCC/demo.elf -d in_asm` will produce
> something like the following
> >
> > ```
> > ...
> > IN:
> > 0x014a:  CALL  0x3808
> >
> > IN: main
> > 0x3808:  CALL  0x4b4
> >
> > IN: vParTestInitialise
> > 0x04b4:  LDI   r24, 255
> > 0x04b6:  STS   r24, 0
> > 0x04b8:  MULS  r16, r20
> > 0x04ba:  OUT   $1, r24
> > 0x04bc:  LDS   r24, 0
> > 0x04be:  MULS  r16, r20
> > 0x04c0:  OUT   $2, r24
> > 0x04c2:  RET
> > ...
> > ```
> >
> > Signed-off-by: Michael Rolnik 
> > Suggested-by: Richard Henderson 
> > Suggested-by: Philippe Mathieu-Daudé 
> > Suggested-by: Aleksandar Markovic 
> > Reviewed-by: Philippe Mathieu-Daudé 
> > Tested-by: Philippe Mathieu-Daudé 
> > ---
> >  target/avr/cpu.h   |   1 +
> >  target/avr/cpu.c   |   2 +-
> >  target/avr/disas.c | 226
> +
> >  target/avr/translate.c |  11 ++
> >  4 files changed, 239 insertions(+), 1 deletion(-)
> >  create mode 100644 target/avr/disas.c
> >
> > diff --git a/target/avr/cpu.h b/target/avr/cpu.h
> > index c217eefeb4..a8a3e7ade6 100644
> > --- a/target/avr/cpu.h
> > +++ b/target/avr/cpu.h
> > @@ -178,6 +178,7 @@ bool avr_cpu_exec_interrupt(CPUState *cpu, int
> int_req);
> >  hwaddr avr_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
> >  int avr_cpu_gdb_read_register(CPUState *cpu, uint8_t *buf, int reg);
> >  int avr_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int
> reg);
> > +int avr_print_insn(bfd_vma addr, disassemble_info *info);
> >
> >  static inline int avr_feature(CPUAVRState *env, int feature)
> >  {
> > diff --git a/target/avr/cpu.c b/target/avr/cpu.c
> > index c5cafcae3c..be4b921e4d 100644
> > --- a/target/avr/cpu.c
> > +++ b/target/avr/cpu.c
> > @@ -83,7 +83,7 @@ static void avr_cpu_reset(CPUState *cs)
> >  static void avr_cpu_disas_set_info(CPUState *cpu, disassemble_info
> *info)
> >  {
> >  info->mach = bfd_arch_avr;
> > -info->print_insn = NULL;
> > +info->print_insn = avr_print_insn;
> >  }
> >
> >  static void avr_cpu_realizefn(DeviceState *dev, Error **errp)
> > diff --git a/target/avr/disas.c b/target/avr/disas.c
> > new file mode 100644
> > index 00..22863d2eb1
> > --- /dev/null
> > +++ b/target/avr/disas.c
> > @@ -0,0 +1,226 @@
> > +/*
> > + * AVR disassembler
> > + *
> > + * Copyright (c) 2019 Richard Henderson 
> > + * Copyright (c) 2019 Michael Rolnik 
> > + *
> > + * This program is free software: you can redistribute it and/or
> modify
> > + * it under the terms of the GNU General Public License as
> published by
> > + * the Free Software Foundation, either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
>

Re: [PATCH v5 1/5] tpm_spapr: Support TPM for ppc64 using CRQ based interface

2019-12-17 Thread Stefan Berger


On 12/16/19 7:29 PM, David Gibson wrote:

On Fri, Dec 13, 2019 at 08:03:36AM -0500, Stefan Berger wrote:

On 12/13/19 12:34 AM, David Gibson wrote:

The existing one looks like this:

typedef struct SpaprVioCrq {
     uint64_t qladdr;
     uint32_t qsize;
     uint32_t qnext;
     int(*SendFunc)(struct SpaprVioDevice *vdev, uint8_t *crq);
} SpaprVioCrq;

I don't seem to find the fields there that we need for vTPM support.

Yeah, I can see the difference in the structures.  What I'm after is
what is the difference in purpose which means they have different
content.

Having read through the whole series now, I *think* the answer is that
the tpm specific structure is one entry in the request queue for the
vtpm, whereas the VioCrq structure is a handle on an entire queue.

I think the tpm one needs a rename to reflect that a) it's vtpm
specific and b) it's not actually a queue, just part of it.



v6 has it as TpmCrq. It's local to the file, so from that perspective 
it's specific to (v)TPM.




This is a 1:1 copy from the existing TIS driver.

Hm, right.  Probably not a bad idea to move that out as a helper
function then.



In V7 then.



+static void tpm_spapr_update_deviceclass(SpaprVioDevice *dev)
+{
+SPAPRvTPMState *s = VIO_SPAPR_VTPM(dev);
+SpaprVioDeviceClass *k = VIO_SPAPR_DEVICE_GET_CLASS(dev);
+
+switch (s->be_tpm_version) {
+case TPM_VERSION_UNSPEC:
+assert(false);
+break;
+case TPM_VERSION_1_2:
+k->dt_name = "vtpm";
+k->dt_type = "IBM,vtpm";
+k->dt_compatible = "IBM,vtpm";
+break;
+case TPM_VERSION_2_0:
+k->dt_name = "vtpm";
+k->dt_type = "IBM,vtpm";
+k->dt_compatible = "IBM,vtpm20";
+break;

Erk.  Updating DeviceClass structures on the fly is hideously ugly.
We might need to take a different approach for this.

Make a suggestion... Obviously, we can hard-initialize dt_name and dt_type
but dt_compatible can only be set after we have determined the version of
TPM.

As you say name and type can just be put into the class statically.



I did this in v6.



Since you need to change compatible based on an internal variable,
we'd need to replace the static dt_compatible in the class with a
callback.
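
One way such a callback could look, as a self-contained sketch only. The struct layout and the names get_dt_compatible, vtpm_get_dt_compatible and dev_dt_compatible are assumptions made for illustration, not the existing SpaprVioDeviceClass API; the "IBM,vtpm" and "IBM,vtpm20" strings are the ones from the patch above.

```c
#include <stddef.h>

enum tpm_version { TPM_VERSION_UNSPEC, TPM_VERSION_1_2, TPM_VERSION_2_0 };

/* Hypothetical per-instance state. */
struct vio_dev {
    enum tpm_version be_tpm_version;
};

/* Hypothetical class: static strings plus an optional per-device hook. */
struct vio_dev_class {
    const char *dt_name;
    const char *dt_type;
    const char *dt_compatible;   /* static default, may be NULL */
    const char *(*get_dt_compatible)(const struct vio_dev *dev);
};

static const char *vtpm_get_dt_compatible(const struct vio_dev *dev)
{
    switch (dev->be_tpm_version) {
    case TPM_VERSION_1_2:
        return "IBM,vtpm";
    case TPM_VERSION_2_0:
        return "IBM,vtpm20";
    default:
        return NULL;             /* version not determined yet */
    }
}

/* Device-tree code would ask through this helper, never the raw field. */
static const char *dev_dt_compatible(const struct vio_dev_class *k,
                                     const struct vio_dev *dev)
{
    return k->get_dt_compatible ? k->get_dt_compatible(dev)
                                : k->dt_compatible;
}

static const struct vio_dev_class vtpm_class = {
    .dt_name = "vtpm",
    .dt_type = "IBM,vtpm",
    .dt_compatible = NULL,
    .get_dt_compatible = vtpm_get_dt_compatible,
};
```

With this shape the class stays constant after initialization, and the version-dependent string is computed per device when the tree is built.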



Why can we not initialize it once we know the version of TPM? From the 
perspective of SLOF at least this seems to be building the device tree 
fine since it sees the proper value...



   Stefan

[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2019-12-17 Thread dann frazier

I tested the patch in Comment #34, and it was able to pass 500
iterations.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Confirmed
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Bionic:
  Confirmed
Status in qemu source package in Disco:
  Confirmed
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in Focal:
  Confirmed

Bug description:
  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  [ Original Description ]

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
  __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=,
  timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
  at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at 
qemu-img.c:1980
  #6  img_convert (argc=, argv=) at 
qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at 
qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions

Re: [PATCH] memory: Do not allow subregion out of the parent region range

2019-12-17 Thread Philippe Mathieu-Daudé


On 12/17/19 7:52 PM, Alex Williamson wrote:

On Tue, 17 Dec 2019 19:31:41 +0100
Paolo Bonzini  wrote:


On 17/12/19 19:17, Peter Maydell wrote:

On Tue, 17 Dec 2019 at 16:57, Richard Henderson
 wrote:


On 12/17/19 1:58 AM, Christophe de Dinechin wrote:


  

On 17 Dec 2019, at 11:51, Paolo Bonzini  wrote:
Yes, the idea is that you could have for one version of the device

   parent 0x000-0x7ff
 stuff 0x000-0x3ff
 morestuff 0x400-0x7ff

and for another

   parent 0x000-0x3ff
 stuff 0x000-0x3ff
 morestuff 0x400-0x7ff

where parent is the BAR, and you can share the code to generate the tree
underneath parent.
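
The effect of the two layouts can be sketched with a small clipping helper. This is an illustration of the idea only, under the assumption that a subregion is visible just where it overlaps its parent; it is not QEMU's memory API, and visible_size() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes of a subregion placed at
 * 'child_offset' inside a parent of 'parent_size' bytes are visible.
 * Anything beyond the end of the parent is clipped away.
 */
static uint64_t visible_size(uint64_t parent_size,
                             uint64_t child_offset,
                             uint64_t child_size)
{
    if (child_offset >= parent_size) {
        return 0;                       /* entirely outside the parent */
    }
    uint64_t room = parent_size - child_offset;
    return child_size < room ? child_size : room;
}
```

With a 0x800-byte parent both "stuff" and "morestuff" are fully visible; with a 0x400-byte parent "morestuff" at offset 0x400 has a visible size of 0, so the same tree-building code serves both device versions.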


I can see why you would have code reuse reasons to do that,
but frankly it looks buggy and confusing. In the rare cases
where this is intended, maybe add a flag making it explicit?


The guest OS is programming the BAR, producing a configuration that, while it
doesn't make sense, is also legal per PCI.  QEMU cannot abort for this
configuration.


Does guest programming of the PCI BAR size actually change the size
of the 'parent' region, or does it just result in the creation
of an appropriately sized alias into 'parent' ?


Resizable BARs are not handled by the PCI host bridge but rather from
the device itself, so the device is free to handle them either way.


More specifically, it's the responsibility of drivers within the guest
to resize the parent bridge aperture to make the extent of the BAR
accessible.  This does seem like an interesting way to implement a
resizable BAR in QEMU though.  Thanks,


I have been thinking about this for some time, as I observed this
behavior on 3 different MIPS boards with different northbridge chipsets
(Malta with the GT64120, Fuloong2E with the Bonito).
The firmware sets one layout, then Linux (or another OS) reinitializes and
reorders the whole memory layout. I guess Mark hit the same issue with his
sparc64-based boards.

Re: [PATCH] memory: Do not allow subregion out of the parent region range

2019-12-17 Thread Alex Williamson

On Tue, 17 Dec 2019 19:31:41 +0100
Paolo Bonzini  wrote:

> On 17/12/19 19:17, Peter Maydell wrote:
> > On Tue, 17 Dec 2019 at 16:57, Richard Henderson
> >  wrote:  
> >>
> >> On 12/17/19 1:58 AM, Christophe de Dinechin wrote:  
> >>>
> >>>  
>  On 17 Dec 2019, at 11:51, Paolo Bonzini  wrote:
>  Yes, the idea is that you could have for one version of the device
> 
>    parent 0x000-0x7ff
>  stuff 0x000-0x3ff
>  morestuff 0x400-0x7ff
> 
>  and for another
> 
>    parent 0x000-0x3ff
>  stuff 0x000-0x3ff
>  morestuff 0x400-0x7ff
> 
>  where parent is the BAR, and you can share the code to generate the tree
>  underneath parent.  
> >>>
> >>> I can see why you would have code reuse reasons to do that,
> >>> but frankly it looks buggy and confusing. In the rare cases
> >>> where this is intended, maybe add a flag making it explicit?
> >>
> >> The guest OS is programming the BAR, producing a configuration that, while 
> >> it
> >> doesn't make sense, is also legal per PCI.  QEMU cannot abort for this
> >> configuration.  
> > 
> > Does guest programming of the PCI BAR size actually change the size
> > of the 'parent' region, or does it just result in the creation
> > of an appropriately sized alias into 'parent' ?  
> 
> Resizable BARs are not handled by the PCI host bridge but rather from
> the device itself, so the device is free to handle them either way.

More specifically, it's the responsibility of drivers within the guest
to resize the parent bridge aperture to make the extent of the BAR
accessible.  This does seem like an interesting way to implement a
resizable BAR in QEMU though.  Thanks,

Alex

Re: QEMU for Qualcomm Hexagon - KVM Forum talk and code available

2019-12-17 Thread Philippe Mathieu-Daudé

On Tue, Dec 17, 2019 at 7:41 PM Philippe Mathieu-Daudé
 wrote:
>
> On 12/17/19 7:21 PM, Peter Maydell wrote:
> > On Tue, 17 Dec 2019 at 18:16, Taylor Simpson  wrote:
> >> Question 2:
> >> What is the best source of guidance on breaking down support for a new 
> >> target into a patch series?
> >
> > Look at how previous ports did it.
>
> Recent ports were system (softmmu), this is a linux-user port. The last
> architecture merged is RISCV, they did that with commit, so I'm not sure
> this is our best example on breaking down:
>
> $ git show --stat ea10325917c8
> commit ea10325917c8a8f92611025c85950c00f826cb73
> Author: Michael Clark 
> Date:   Sat Mar 3 01:31:10 2018 +1300
>
>  RISC-V Disassembler
>
>  The RISC-V disassembler has no dependencies outside of the 'disas'
>  directory so it can be applied independently. The majority of the
>  disassembler is machine-generated from instruction set metadata:
>
>  - https://github.com/michaeljclark/riscv-meta
>
>  Expected checkpatch errors for consistency and brevity reasons:
>
>  ERROR: line over 90 characters
>  ERROR: trailing statements should be on next line
>  ERROR: space prohibited between function name and open parenthesis '('
>
>  Reviewed-by: Richard Henderson 
>  Signed-off-by: Michael Clark 
>
>   include/disas/bfd.h |2 +
>   disas.c |2 +
>   disas/riscv.c   | 3048
> 
>   disas/Makefile.objs |1 +
>   4 files changed, 3053 insertions(+)
>
> $ git show --stat 55c2a12cbcd3d
> commit 55c2a12cbcd3d417de39ee82dfe1d26b22a07116
> Author: Michael Clark 
> Date:   Sat Mar 3 01:31:11 2018 +1300
>
>  RISC-V TCG Code Generation
>
>  TCG code generation for the RV32IMAFDC and RV64IMAFDC. The QEMU
>  RISC-V code generator has complete coverage for the Base ISA v2.2,
>  Privileged ISA v1.9.1 and Privileged ISA v1.10:
>
>  - RISC-V Instruction Set Manual Volume I: User-Level ISA Version 2.2
>  - RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.9.1
>  - RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.10
>
>  Reviewed-by: Richard Henderson 
>  Signed-off-by: Bastian Koppelmann 
>  Signed-off-by: Sagar Karandikar 
>  Signed-off-by: Michael Clark 
>
>   target/riscv/instmap.h   |  364 ++
>   target/riscv/translate.c | 1978
> +++
>   2 files changed, 2342 insertions(+)
>
> $ git show --stat 47ae93cdfed
> commit 47ae93cdfedc683c56e19113d516d7ce4971c8e6
> Author: Michael Clark 
> Date:   Sat Mar 3 01:31:11 2018 +1300
>
>  RISC-V Linux User Emulation
>
>  Implementation of linux user emulation for RISC-V.
>
>  Reviewed-by: Richard Henderson 
>  Signed-off-by: Sagar Karandikar 
>  Signed-off-by: Michael Clark 
>
>   linux-user/riscv/syscall_nr.h | 287
> +++
>   linux-user/riscv/target_cpu.h |  18 +
>   linux-user/riscv/target_elf.h |  14 ++
>   linux-user/riscv/target_signal.h  |  23 
>   linux-user/riscv/target_structs.h |  46 
>   linux-user/riscv/target_syscall.h |  56
> ++
>   linux-user/riscv/termbits.h   | 222
> +++
>   linux-user/syscall_defs.h |  13 +
>   target/riscv/cpu_user.h   |  13 +
>   linux-user/elfload.c  |  22 +++
>   linux-user/main.c |  99
> +++
>   linux-user/signal.c   | 203
> +-
>   linux-user/syscall.c  |   2 ++
>   13 files changed, 1012 insertions(+), 6 deletions(-)

I sent that too quickly. You can read a summary of the different review
comments before the final merge in tag 'riscv-qemu-upstream-v8.2'.

Re: [EXTERNAL]Re: [PATCH-for-4.2] hw/mips: Deprecate the r4k machine

2019-12-17 Thread Aleksandar Markovic




From: Thomas Huth 
Sent: Tuesday, December 17, 2019 7:10 PM
To: Philippe Mathieu-Daudé; qemu-devel@nongnu.org
Cc: libvir-l...@redhat.com; Hervé Poussineau; Aleksandar Markovic; Aleksandar 
Rikalo; Aurelien Jarno
Subject: [EXTERNAL]Re: [PATCH-for-4.2] hw/mips: Deprecate the r4k machine

 Hi,

On 25/11/2019 11.41, Philippe Mathieu-Daudé wrote:
> > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
> > index 4b4b7425ac..05265b43c8 100644
> > --- a/qemu-deprecated.texi
> > +++ b/qemu-deprecated.texi
> > @@ -266,6 +266,11 @@ The 'scsi-disk' device is deprecated. Users should use 
> > 'scsi-hd' or
> >
> >  @section System emulator machines
> >
> > +@subsection mips r4k platform (since 4.2)
> 
> Since the patch has now been merged after the release of 4.2, the mips
> r4k platform will be deprecated in 5.0 instead. Could you send a patch to
> fix it up?

OK, I'll send a patch that'll certainly be applied to the next MIPS queue.

Thanks for spotting this, Thomas.

Aleksandar

>  Thanks,
>   Thomas

Re: [PATCH v10 Kernel 1/5] vfio: KABI for migration interface for device state

2019-12-17 Thread Alex Williamson

On Tue, 17 Dec 2019 11:58:44 +0530
Kirti Wankhede  wrote:

> On 12/17/2019 4:14 AM, Alex Williamson wrote:
> > On Tue, 17 Dec 2019 01:51:36 +0530
> > Kirti Wankhede  wrote:
> >   
> >> - Defined MIGRATION region type and sub-type.
> >>
> >> - Defined vfio_device_migration_info structure which will be placed at 0th
> >>offset of migration region to get/set VFIO device related information.
> >>Defined members of structure and usage on read/write access.
> >>
> >> - Defined device states and added state transition details in the comment.
> >>
> >> - Added sequence to be followed while saving and resuming VFIO device state
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>   include/uapi/linux/vfio.h | 180 
> >> ++
> >>   1 file changed, 180 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >> index 9e843a147ead..a0817ba267c1 100644
> >> --- a/include/uapi/linux/vfio.h
> >> +++ b/include/uapi/linux/vfio.h
> >> @@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
> >>   #define VFIO_REGION_TYPE_PCI_VENDOR_MASK (0x)
> >>   #define VFIO_REGION_TYPE_GFX(1)
> >>   #define VFIO_REGION_TYPE_CCW (2)
> >> +#define VFIO_REGION_TYPE_MIGRATION  (3)
> >>   
> >>   /* sub-types for VFIO_REGION_TYPE_PCI_* */
> >>   
> >> @@ -379,6 +380,185 @@ struct vfio_region_gfx_edid {
> >>   /* sub-types for VFIO_REGION_TYPE_CCW */
> >>   #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD(1)
> >>   
> >> +/* sub-types for VFIO_REGION_TYPE_MIGRATION */
> >> +#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
> >> +
> >> +/*
> >> + * Structure vfio_device_migration_info is placed at 0th offset of
> >> + * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related migration
> >> + * information. Field accesses from this structure are only supported at their
> >> + * native width and alignment, otherwise the result is undefined and vendor
> >> + * drivers should return an error.
> >> + *
> >> + * device_state: (read/write)
> >> + *  To indicate vendor driver the state VFIO device should be transitioned
> >> + *  to. If device state transition fails, write on this field return error.
> >> + *  It consists of 3 bits:
> >> + *  - If bit 0 set, indicates _RUNNING state. When its clear, that indicates
> > 
> > s/its/it's/
> >   
> >> + *    _STOP state. When device is changed to _STOP, driver should stop
> >> + *    device before write() returns.
> >> + *  - If bit 1 set, indicates _SAVING state. When set, that indicates driver
> >> + *    should start gathering device state information which will be provided
> >> + *    to VFIO user space application to save device's state.
> >> + *  - If bit 2 set, indicates _RESUMING state. When set, that indicates
> >> + *    prepare to resume device, data provided through migration region
> >> + *    should be used to resume device.
> >> + *  Bits 3 - 31 are reserved for future use. User should perform
> >> + *  read-modify-write operation on this field.
> >> + *
> >> + *  +--- _RESUMING
> >> + *  |+-- _SAVING
> >> + *  ||+- _RUNNING
> >> + *  |||
> >> + *  000b => Device Stopped, not saving or resuming
> >> + *  001b => Device running state, default state
> >> + *  010b => Stop Device & save device state, stop-and-copy state
> >> + *  011b => Device running and save device state, pre-copy state
> >> + *  100b => Device stopped and device state is resuming
> >> + *  101b => Invalid state  
> > 
> > Eventually this would be intended for post-copy, if supported by the
> > device, right?
> >   
> 
> No. As Yan mentioned on an earlier version, _RESUMING + _RUNNING can't
> be used for post-copy. A new flag will be required for post-copy.
> 
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg658768.html
> 
> >> + *  110b => Invalid state
> >> + *  111b => Invalid state
> >> + *
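
For readers following the thread, the state table quoted above can be sketched in C as below. This is only an illustration of the encoding as described in the comment: the macro and function names (DEV_STATE_*, dev_state_is_valid, dev_state_name) are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the quoted comment. */
#define DEV_STATE_RUNNING  (UINT32_C(1) << 0)
#define DEV_STATE_SAVING   (UINT32_C(1) << 1)
#define DEV_STATE_RESUMING (UINT32_C(1) << 2)
#define DEV_STATE_MASK \
    (DEV_STATE_RUNNING | DEV_STATE_SAVING | DEV_STATE_RESUMING)

/*
 * True for the five combinations the table lists as valid (000b..100b).
 * Bits 3-31 are reserved and ignored here.
 */
static bool dev_state_is_valid(uint32_t device_state)
{
    uint32_t s = device_state & DEV_STATE_MASK;

    /* _RESUMING may not be combined with _RUNNING or _SAVING. */
    return !(s & DEV_STATE_RESUMING) || s == DEV_STATE_RESUMING;
}

/* Map the low three bits to the description used in the table. */
static const char *dev_state_name(uint32_t device_state)
{
    switch (device_state & DEV_STATE_MASK) {
    case 0:
        return "stopped";
    case DEV_STATE_RUNNING:
        return "running";
    case DEV_STATE_SAVING:
        return "stop-and-copy";
    case DEV_STATE_RUNNING | DEV_STATE_SAVING:
        return "pre-copy";
    case DEV_STATE_RESUMING:
        return "resuming";
    default:
        return "invalid";
    }
}
```

With these helpers, 011b maps to "pre-copy" and 101b, 110b and 111b are all reported as invalid, matching the table.
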
> >> + * State transitions:
> >> + *
> >> + *              _RESUMING  _RUNNING    Pre-copy    Stop-and-copy   _STOP
> >> + *                (100b)    (001b)      (011b)        (010b)       (000b)
> >> + * 0. Running or Default state
> >> + *                            |
> >> + *
> >> + * 1. Normal Shutdown  
> > 
> > Optional, userspace is under no obligation.
> >   
> >> + *                            |------------------------------------->|
> >> + *
> >> + * 2. Save state or Suspend
> >> + *                            |------------------------->|---------->|
> >> + *
> >> + * 3. Save state during live migration
> >> + *                            |---------->|------------->|---------->|
> >> + *
> >> + * 4. Resuming
> >> + *                  |<--------|
> >> + *
> >> + * 5. Resumed
> >> + *                  |-------->|
> >> + *
> >> + * 0. Default state of VFIO device is _RUNNING when VFIO application starts.
>
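
The quoted comment asks users to perform a read-modify-write on device_state so that reserved bits 3 - 31 are left alone. A minimal sketch of the "modify" step, assuming a hypothetical helper name and mask (neither is from the patch), could look like this. How the value is read from and written back to the migration region is outside this sketch.

```c
#include <stdint.h>

/* Hypothetical mask covering _RUNNING, _SAVING and _RESUMING (bits 0-2). */
#define DEV_STATE_MASK UINT32_C(0x7)

/*
 * Compute the value to write back: replace only the three state bits of
 * the value that was just read and keep the reserved bits unchanged.
 */
static uint32_t dev_state_update(uint32_t current, uint32_t new_state)
{
    return (current & ~DEV_STATE_MASK) | (new_state & DEV_STATE_MASK);
}
```

For example, going from running (001b) to stop-and-copy (010b) changes only the low three bits of whatever was read.
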
