date:20190711

[Qemu-devel] [PATCH for-4.1] Makefile: Fix "make install" when "make all" needs work

2019-07-11 Thread Markus Armbruster

Until recently, target install used to recurse into target directories
in its recipe: it ran make install in a for-loop.  Since target
install depends on target all, this trivially ensured we run the
sub-make install only after completing target all.

Commit 1338a4b "Makefile: Reuse all's recursion machinery for clean
and install" moved the target recursion to dependencies.  That's good
(the commit message explains why), but I forgot to add dependencies to
ensure make runs the sub-make install only after completing target
all.  Do that now.

Fixes: 1338a4b72659ce08eacb9de0205fe16202a22d9c
Reported-by: Mark Cave-Ayland 
Reported-by: Guenter Roeck 
Tested-by: Guenter Roeck 
Signed-off-by: Markus Armbruster 
---
 Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Makefile b/Makefile
index 1fcbaed62c..09b77e8a7b 100644
--- a/Makefile
+++ b/Makefile
@@ -522,6 +522,7 @@ $(ROM_DIRS_RULES):
 recurse-all: $(addsuffix /all, $(TARGET_DIRS) $(ROM_DIRS))
 recurse-clean: $(addsuffix /clean, $(TARGET_DIRS) $(ROM_DIRS))
 recurse-install: $(addsuffix /install, $(TARGET_DIRS))
+$(addsuffix /install, $(TARGET_DIRS)): all
 
 $(BUILD_DIR)/version.o: $(SRC_PATH)/version.rc config-host.h
 	$(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ $<,"RC","version.o")
-- 
2.21.0

Re: [Qemu-devel] [PATCH v11 for-4.0 00/11] qemu_thread_create: propagate the error to callers to handle

2019-07-11 Thread Markus Armbruster

Did this get stuck?

Fei Li  writes:

> Hi,
>
> This idea comes from BiteSizedTasks, and this patch series implements
> the error checking of qemu_thread_create: make qemu_thread_create
> return a flag to indicate if it succeeded rather than failing with
> an error; make all callers check it.
>
> The first patch modifies qemu_thread_create() by passing
> &error_abort and returning a value to indicate if it succeeds. The next
> 10 patches will improve on &error_abort for callers who could handle
> it more properly.
>
> Please help to review, thanks a lot! 
[...]

Re: [Qemu-devel] [RISU PATCH v3 01/18] risugen_common: add helper functions insnv, randint

2019-07-11 Thread Richard Henderson

On 7/12/19 12:32 AM, Jan Bobek wrote:
> insnv allows emitting variable-length instructions in little-endian or
> big-endian byte order; it subsumes functionality of former insn16()
> and insn32() functions.
> 
> randint can reliably generate signed or unsigned integers of arbitrary
> width.
> 
> Signed-off-by: Jan Bobek 
> ---
>  risugen_common.pm | 55 +--
>  1 file changed, 48 insertions(+), 7 deletions(-)
> 
> diff --git a/risugen_common.pm b/risugen_common.pm
> index 71ee996..d63250a 100644
> --- a/risugen_common.pm
> +++ b/risugen_common.pm
> @@ -23,8 +23,9 @@ BEGIN {
>  require Exporter;
>  
>  our @ISA = qw(Exporter);
> -our @EXPORT = qw(open_bin close_bin set_endian insn32 insn16 $bytecount
> -   progress_start progress_update progress_end
> +our @EXPORT = qw(open_bin close_bin set_endian insn32 insn16
> +   $bytecount insnv randint progress_start
> +   progress_update progress_end
> eval_with_fields is_pow_of_2 sextract ctz
> dump_insn_details);
>  }
> @@ -37,7 +38,7 @@ my $bigendian = 0;
>  # (default is little endian, 0).
>  sub set_endian
>  {
> -$bigendian = @_;
> +($bigendian) = @_;
>  }
>  
>  sub open_bin
> @@ -52,18 +53,58 @@ sub close_bin
>  close(BIN) or die "can't close output file: $!";
>  }
>  
> +sub insnv(%)
> +{
> +my (%args) = @_;
> +
> +# Default to big-endian order, so that the instruction bytes are
> +# emitted in the same order as they are written in the
> +# configuration file.
> +$args{bigendian} = 1 unless defined $args{bigendian};
> +
> +for (my $bitcur = 0; $bitcur < $args{width}; $bitcur += 8) {
> +my $value = $args{value} >> ($args{bigendian}
> + ? $args{width} - $bitcur - 8
> + : $bitcur);
> +
> +print BIN pack("C", $value & 0xff);
> +$bytecount += 1;
> +}

Looks like bytecount is no longer used?

Otherwise,
Reviewed-by: Richard Henderson 


r~
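
For readers more at home in C than in Perl, the byte loop of the quoted
insnv() can be sketched as below. This is an illustration only and not part
of the patch: emit_insn_bytes and its out buffer are hypothetical names, and
width is assumed to be a multiple of 8 no larger than 64.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical C rendering of the loop in the quoted insnv():
 * split the low `width` bits of `value` into bytes and store them in
 * `out`, most significant byte first when `bigendian` is true and
 * least significant byte first otherwise.
 * Assumes `width` is a multiple of 8 and at most 64.
 * Returns the number of bytes written.
 */
static size_t emit_insn_bytes(uint64_t value, unsigned width, bool bigendian,
                              uint8_t *out)
{
    size_t n = 0;

    for (unsigned bitcur = 0; bitcur < width; bitcur += 8) {
        /* Same shift selection as the Perl ternary in the patch. */
        unsigned shift = bigendian ? width - bitcur - 8 : bitcur;

        out[n++] = (uint8_t)((value >> shift) & 0xff);
    }
    return n;
}
```

With value 0x1234 and width 16, the big-endian call stores 0x12 then 0x34,
and the little-endian call stores 0x34 then 0x12.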

[Qemu-devel] [PATCH v26 5/7] target/avr: Add limited support for USART and 16 bit timer peripherals

2019-07-11 Thread Michael Rolnik

From: Sarah Harris 

These were designed to facilitate testing but should provide enough function to 
be useful in other contexts.
Only a subset of the functions of each peripheral is implemented, mainly due to 
the lack of a standard way to handle electrical connections (like GPIO pins).

Signed-off-by: Michael Rolnik 
---
 hw/char/Kconfig|   3 +
 hw/char/Makefile.objs  |   1 +
 hw/char/avr_usart.c| 322 ++
 hw/misc/Kconfig|   3 +
 hw/misc/Makefile.objs  |   2 +
 hw/misc/avr_mask.c | 110 ++
 hw/timer/Kconfig   |   3 +
 hw/timer/Makefile.objs |   1 +
 hw/timer/avr_timer16.c | 603 +
 include/hw/char/avr_usart.h|  97 ++
 include/hw/misc/avr_mask.h |  47 +++
 include/hw/timer/avr_timer16.h |  97 ++
 12 files changed, 1289 insertions(+)
 create mode 100644 hw/char/avr_usart.c
 create mode 100644 hw/misc/avr_mask.c
 create mode 100644 hw/timer/avr_timer16.c
 create mode 100644 include/hw/char/avr_usart.h
 create mode 100644 include/hw/misc/avr_mask.h
 create mode 100644 include/hw/timer/avr_timer16.h

diff --git a/hw/char/Kconfig b/hw/char/Kconfig
index 40e7a8b8bb..331b20983f 100644
--- a/hw/char/Kconfig
+++ b/hw/char/Kconfig
@@ -46,3 +46,6 @@ config SCLPCONSOLE
 
 config TERMINAL3270
 bool
+
+config AVR_USART
+bool
diff --git a/hw/char/Makefile.objs b/hw/char/Makefile.objs
index 02d8a66925..09ed50f1d0 100644
--- a/hw/char/Makefile.objs
+++ b/hw/char/Makefile.objs
@@ -21,6 +21,7 @@ obj-$(CONFIG_PSERIES) += spapr_vty.o
 obj-$(CONFIG_DIGIC) += digic-uart.o
 obj-$(CONFIG_STM32F2XX_USART) += stm32f2xx_usart.o
 obj-$(CONFIG_RASPI) += bcm2835_aux.o
+obj-$(CONFIG_AVR_USART) += avr_usart.o
 
 common-obj-$(CONFIG_CMSDK_APB_UART) += cmsdk-apb-uart.o
 common-obj-$(CONFIG_ETRAXFS) += etraxfs_ser.o
diff --git a/hw/char/avr_usart.c b/hw/char/avr_usart.c
new file mode 100644
index 0000000000..21f533f6e6
--- /dev/null
+++ b/hw/char/avr_usart.c
@@ -0,0 +1,322 @@
+/*
+ * AVR USART
+ *
+ * Copyright (c) 2018 University of Kent
+ * Author: Sarah Harris
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/char/avr_usart.h"
+#include "qemu/log.h"
+
+static int avr_usart_can_receive(void *opaque)
+{
+AVRUsartState *usart = opaque;
+
+if (usart->data_valid || !(usart->csrb & USART_CSRB_RXEN)) {
+return 0;
+}
+return 1;
+}
+
+static void avr_usart_receive(void *opaque, const uint8_t *buffer, int size)
+{
+AVRUsartState *usart = opaque;
+assert(size == 1);
+assert(!usart->data_valid);
+usart->data = buffer[0];
+usart->data_valid = true;
+usart->csra |= USART_CSRA_RXC;
+if (usart->csrb & USART_CSRB_RXCIE) {
+qemu_set_irq(usart->rxc_irq, 1);
+}
+}
+
+static void update_char_mask(AVRUsartState *usart)
+{
+uint8_t mode = ((usart->csrc & USART_CSRC_CSZ0) ? 1 : 0) |
+((usart->csrc & USART_CSRC_CSZ1) ? 2 : 0) |
+((usart->csrb & USART_CSRB_CSZ2) ? 4 : 0);
+switch (mode) {
+case 0:
+usart->char_mask = 0b11111;
+break;
+case 1:
+usart->char_mask = 0b111111;
+break;
+case 2:
+usart->char_mask = 0b1111111;
+break;
+case 3:
+usart->char_mask = 0b11111111;
+break;
+case 4:
+/* Fallthrough. */
+case 5:
+/* Fallthrough. */
+case 6:
+qemu_log_mask(
+LOG_GUEST_ERROR,
+"%s: Reserved character size 0x%x\n",
+__func__,
+mode);
+break;
+case 7:
+qemu_log_mask(
+LOG_GUEST_ERROR,
+"%s: Nine bit character size not supported (forcing eight)\n",
+__func__);
+usart->char_mask = 0b11111111;
+break;
+default:
+assert(0);
+}
+}
+
+static void avr_usart_reset(DeviceState *dev)
+{
+
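
The mapping that update_char_mask() performs can be summarised in a small
stand-alone sketch. It assumes the usual AVR USART encoding, in which
character-size field values 0 to 3 select 5 to 8 data bits and 7 selects
9 bits. usart_char_mask is a hypothetical helper, not a function from the
patch, and unlike the patch (which leaves the mask untouched) it returns 0
for the reserved encodings.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map the 3-bit character-size field to a mask
 * covering the data bits of one received or transmitted character.
 * Reserved encodings (4..6) yield 0 here; nine-bit frames (7) are
 * folded to eight bits, mirroring the "forcing eight" log message.
 */
static uint8_t usart_char_mask(unsigned mode)
{
    switch (mode & 7) {
    case 0:
        return 0x1f; /* 5 data bits */
    case 1:
        return 0x3f; /* 6 data bits */
    case 2:
        return 0x7f; /* 7 data bits */
    case 3:
    case 7:
        return 0xff; /* 8 data bits (7 would be 9, unsupported) */
    default:
        return 0x00; /* reserved encodings 4, 5 and 6 */
    }
}
```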

[Qemu-devel] [PATCH v26 7/7] target/avr: Register AVR support with the rest of QEMU, the build system, and the MAINTAINERS file

2019-07-11 Thread Michael Rolnik

Signed-off-by: Michael Rolnik 
---
 MAINTAINERS |  6 ++
 arch_init.c |  2 ++
 configure   |  7 +++
 default-configs/avr-softmmu.mak |  5 +
 include/disas/dis-asm.h |  6 ++
 include/sysemu/arch_init.h  |  1 +
 qapi/common.json|  3 ++-
 target/avr/Makefile.objs| 33 +
 tests/machine-none-test.c   |  1 +
 9 files changed, 63 insertions(+), 1 deletion(-)
 create mode 100644 default-configs/avr-softmmu.mak
 create mode 100644 target/avr/Makefile.objs

diff --git a/MAINTAINERS b/MAINTAINERS
index cc9636b43a..934ad5739b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -163,6 +163,12 @@ S: Maintained
 F: hw/arm/smmu*
 F: include/hw/arm/smmu*
 
+AVR TCG CPUs
+M: Michael Rolnik 
+S: Maintained
+F: target/avr/
+F: hw/avr/
+
 CRIS TCG CPUs
 M: Edgar E. Iglesias 
 S: Maintained
diff --git a/arch_init.c b/arch_init.c
index 74b0708634..413ad7acfd 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -85,6 +85,8 @@ int graphic_depth = 32;
 #define QEMU_ARCH QEMU_ARCH_UNICORE32
 #elif defined(TARGET_XTENSA)
 #define QEMU_ARCH QEMU_ARCH_XTENSA
+#elif defined(TARGET_AVR)
+#define QEMU_ARCH QEMU_ARCH_AVR
 #endif
 
 const uint32_t arch_type = QEMU_ARCH;
diff --git a/configure b/configure
index 4983c8b533..ab8ebba100 100755
--- a/configure
+++ b/configure
@@ -7503,6 +7503,10 @@ case "$target_name" in
 target_compiler=$cross_cc_aarch64
 eval "target_compiler_cflags=\$cross_cc_cflags_${target_name}"
   ;;
+  avr)
+   gdb_xml_files="avr-cpu.xml"
+target_compiler=$cross_cc_avr
+  ;;
   cris)
 target_compiler=$cross_cc_cris
   ;;
@@ -7780,6 +7784,9 @@ for i in $ARCH $TARGET_BASE_ARCH ; do
   disas_config "ARM_A64"
 fi
   ;;
+  avr)
+disas_config "AVR"
+  ;;
   cris)
 disas_config "CRIS"
   ;;
diff --git a/default-configs/avr-softmmu.mak b/default-configs/avr-softmmu.mak
new file mode 100644
index 0000000000..d1e1c28118
--- /dev/null
+++ b/default-configs/avr-softmmu.mak
@@ -0,0 +1,5 @@
+# Default configuration for avr-softmmu
+
+# Boards:
+#
+CONFIG_AVR_SAMPLE=y
diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
index e9c7dd8eb4..8bedce17ac 100644
--- a/include/disas/dis-asm.h
+++ b/include/disas/dis-asm.h
@@ -211,6 +211,12 @@ enum bfd_architecture
 #define bfd_mach_m32r  0  /* backwards compatibility */
   bfd_arch_mn10200,/* Matsushita MN10200 */
   bfd_arch_mn10300,/* Matsushita MN10300 */
+  bfd_arch_avr,   /* Atmel AVR microcontrollers.  */
+#define bfd_mach_avr1  1
+#define bfd_mach_avr2  2
+#define bfd_mach_avr3  3
+#define bfd_mach_avr4  4
+#define bfd_mach_avr5  5
   bfd_arch_cris,   /* Axis CRIS */
 #define bfd_mach_cris_v0_v10   255
 #define bfd_mach_cris_v32  32
diff --git a/include/sysemu/arch_init.h b/include/sysemu/arch_init.h
index 10cbafe970..aff57bfe61 100644
--- a/include/sysemu/arch_init.h
+++ b/include/sysemu/arch_init.h
@@ -25,6 +25,7 @@ enum {
 QEMU_ARCH_NIOS2 = (1 << 17),
 QEMU_ARCH_HPPA = (1 << 18),
 QEMU_ARCH_RISCV = (1 << 19),
+QEMU_ARCH_AVR = (1 << 20),
 };
 
 extern const uint32_t arch_type;
diff --git a/qapi/common.json b/qapi/common.json
index 99d313ef3b..6866c3e81d 100644
--- a/qapi/common.json
+++ b/qapi/common.json
@@ -183,11 +183,12 @@
 #is true even for "qemu-system-x86_64".
 #
 # ppcemb: dropped in 3.1
+# avr: added in 4.1
 #
 # Since: 3.0
 ##
 { 'enum' : 'SysEmuTarget',
-  'data' : [ 'aarch64', 'alpha', 'arm', 'cris', 'hppa', 'i386', 'lm32',
+  'data' : [ 'aarch64', 'alpha', 'arm', 'avr', 'cris', 'hppa', 'i386', 'lm32',
  'm68k', 'microblaze', 'microblazeel', 'mips', 'mips64',
  'mips64el', 'mipsel', 'moxie', 'nios2', 'or1k', 'ppc',
  'ppc64', 'riscv32', 'riscv64', 's390x', 'sh4',
diff --git a/target/avr/Makefile.objs b/target/avr/Makefile.objs
new file mode 100644
index 0000000000..2976affd95
--- /dev/null
+++ b/target/avr/Makefile.objs
@@ -0,0 +1,33 @@
+#
+#  QEMU AVR CPU
+#
+#  Copyright (c) 2019 Michael Rolnik
+#
+#  This library is free software; you can redistribute it and/or
+#  modify it under the terms of the GNU Lesser General Public
+#  License as published by the Free Software Foundation; either
+#  version 2.1 of the License, or (at your option) any later version.
+#
+#  This library is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#  Lesser General Public License for more details.
+#
+#  You should have received a copy of the GNU Lesser General Public
+#  License along with this library; if not, see
+#  
+#
+
+DECODETREE = $(SRC_PATH)/scripts/decodetree.py
+decode-y = $(SRC_PATH)/target/avr/insn.decode
+
+target/avr/decode_insn.inc.c: $(decode-y) $(DECODETREE)
+   $(call quiet-command, \
+

[Qemu-devel] [PATCH v26 4/7] target/avr: Add instruction translation

2019-07-11 Thread Michael Rolnik

This includes:
- TCG translations for each instruction

Signed-off-by: Michael Rolnik 
---
 target/avr/translate.c | 2888 
 1 file changed, 2888 insertions(+)
 create mode 100644 target/avr/translate.c

diff --git a/target/avr/translate.c b/target/avr/translate.c
new file mode 100644
index 0000000000..42cb4a690c
--- /dev/null
+++ b/target/avr/translate.c
@@ -0,0 +1,2888 @@
+/*
+ * QEMU AVR CPU
+ *
+ * Copyright (c) 2019 Michael Rolnik
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/qemu-print.h"
+#include "tcg/tcg.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "disas/disas.h"
+#include "tcg-op.h"
+#include "exec/cpu_ldst.h"
+#include "exec/helper-proto.h"
+#include "exec/helper-gen.h"
+#include "exec/log.h"
+#include "exec/gdbstub.h"
+#include "exec/translator.h"
+
+/*
+ *  Define if you want a BREAK instruction translated to a breakpoint
+ *  Active debugging connection is assumed
+ *  This is for
+ *  https://github.com/seharris/qemu-avr-tests/tree/master/instruction-tests
+ *  tests
+ */
+#undef BREAKPOINT_ON_BREAK
+
+static TCGv cpu_pc;
+
+static TCGv cpu_Cf;
+static TCGv cpu_Zf;
+static TCGv cpu_Nf;
+static TCGv cpu_Vf;
+static TCGv cpu_Sf;
+static TCGv cpu_Hf;
+static TCGv cpu_Tf;
+static TCGv cpu_If;
+
+static TCGv cpu_rampD;
+static TCGv cpu_rampX;
+static TCGv cpu_rampY;
+static TCGv cpu_rampZ;
+
+static TCGv cpu_r[NO_CPU_REGISTERS];
+static TCGv cpu_eind;
+static TCGv cpu_sp;
+
+static TCGv cpu_skip;
+
+static const char reg_names[NO_CPU_REGISTERS][8] = {
+"r0",  "r1",  "r2",  "r3",  "r4",  "r5",  "r6",  "r7",
+"r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15",
+"r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23",
+"r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31",
+};
+#define REG(x) (cpu_r[x])
+
+enum {
+DISAS_EXIT   = DISAS_TARGET_0,  /* We want to return to the cpu main loop.  */
+DISAS_LOOKUP = DISAS_TARGET_1,  /* We have a variable condition exit.  */
+DISAS_CHAIN  = DISAS_TARGET_2,  /* We have a single condition exit.  */
+};
+
+typedef struct DisasContext DisasContext;
+
+/* This is the state at translation time. */
+struct DisasContext {
+TranslationBlock *tb;
+
+CPUAVRState *env;
+CPUState *cs;
+
+target_long npc;
+uint32_t opcode;
+
+/* Routine used to access memory */
+int memidx;
+int bstate;
+int singlestep;
+
+TCGv skip_var0;
+TCGv skip_var1;
+TCGCond skip_cond;
+bool free_skip_var0;
+};
+
+static int to_A(DisasContext *ctx, int indx) { return 16 + (indx % 16); }
+static int to_B(DisasContext *ctx, int indx) { return 16 + (indx % 8); }
+static int to_C(DisasContext *ctx, int indx) { return 24 + (indx % 4) * 2; }
+static int to_D(DisasContext *ctx, int indx) { return (indx % 16) * 2; }
+
+static uint16_t next_word(DisasContext *ctx)
+{
+return cpu_lduw_code(ctx->env, ctx->npc++ * 2);
+}
+
+static int append_16(DisasContext *ctx, int x)
+{
+return x << 16 | next_word(ctx);
+}
+
+static bool decode_insn(DisasContext *ctx, uint16_t insn);
+#include "decode_insn.inc.c"
+
+static bool avr_have_feature(DisasContext *ctx, int feature)
+{
+if (!avr_feature(ctx->env, feature)) {
+gen_helper_unsupported(cpu_env);
+ctx->bstate = DISAS_NORETURN;
+return false;
+}
+return true;
+}
+
+static void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
+{
+TranslationBlock *tb = ctx->tb;
+
+if (ctx->singlestep == 0) {
+tcg_gen_goto_tb(n);
+tcg_gen_movi_i32(cpu_pc, dest);
+tcg_gen_exit_tb(tb, n);
+} else {
+tcg_gen_movi_i32(cpu_pc, dest);
+gen_helper_debug(cpu_env);
+tcg_gen_exit_tb(NULL, 0);
+}
+ctx->bstate = DISAS_NORETURN;
+}
+
+#include "exec/gen-icount.h"
+
+static void gen_add_CHf(TCGv R, TCGv Rd, TCGv Rr)
+{
+TCGv t1 = tcg_temp_new_i32();
+TCGv t2 = tcg_temp_new_i32();
+TCGv t3 = tcg_temp_new_i32();
+
+tcg_gen_and_tl(t1, Rd, Rr); /* t1 = Rd & Rr */
+tcg_gen_andc_tl(t2, Rd, R); /* t2 = Rd & ~R */
+tcg_gen_andc_tl(t3, Rr, R); /* t3 = Rr & ~R */
+tcg_gen_or_tl(t1, t1, t2); /* t1 = t1 | t2 | t3 */
+tcg_gen_or_tl(t1, t1, t3);
+
+tcg_gen_shri_tl(cpu_Cf, t1, 7); /* Cf = t1(7) */
+tcg_gen_shri_tl(cpu_Hf, t1, 3); /* Hf =
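
The flag logic that gen_add_CHf() emits as TCG ops can be checked on plain
integers. The sketch below is an assumption-laden illustration, not code
from the patch: struct add_flags and avr_add_carry_flags are hypothetical
names, and the half-carry is taken from bit 3 as in the AVR instruction set
description of ADD.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two flags computed below. */
struct add_flags {
    bool c; /* carry out of bit 7 */
    bool h; /* half carry out of bit 3 */
};

/*
 * Given the 8-bit result r of an addition and its operands rd and rr,
 * compute t = (rd & rr) | (rd & ~r) | (rr & ~r), then read the carry
 * from bit 7 and the half carry from bit 3 of t.
 */
static struct add_flags avr_add_carry_flags(uint8_t r, uint8_t rd, uint8_t rr)
{
    uint8_t t = (uint8_t)((rd & rr) | (rd & ~r) | (rr & ~r));
    struct add_flags f = {
        .c = (t >> 7) & 1,
        .h = (t >> 3) & 1,
    };

    return f;
}
```

Passing the result in, rather than recomputing rd + rr, keeps the helper
usable for an add with carry-in as well, which is presumably why the TCG
version takes R as a parameter too.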

[Qemu-devel] [PATCH v26 6/7] target/avr: Add example board configuration

2019-07-11 Thread Michael Rolnik

From: Sarah Harris 

A simple board setup that configures an AVR CPU to run a given firmware image.
This is all that's useful to implement without peripheral emulation as AVR CPUs 
include a lot of on-board peripherals.

Signed-off-by: Michael Rolnik 
---
 hw/Kconfig   |   1 +
 hw/avr/Kconfig   |   5 +
 hw/avr/Makefile.objs |   1 +
 hw/avr/sample.c  | 237 +++
 4 files changed, 244 insertions(+)
 create mode 100644 hw/avr/Kconfig
 create mode 100644 hw/avr/Makefile.objs
 create mode 100644 hw/avr/sample.c

diff --git a/hw/Kconfig b/hw/Kconfig
index 195f541e50..1f25636855 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -42,6 +42,7 @@ source watchdog/Kconfig
 # arch Kconfig
 source arm/Kconfig
 source alpha/Kconfig
+source avr/Kconfig
 source cris/Kconfig
 source hppa/Kconfig
 source i386/Kconfig
diff --git a/hw/avr/Kconfig b/hw/avr/Kconfig
new file mode 100644
index 0000000000..dd02a4c37a
--- /dev/null
+++ b/hw/avr/Kconfig
@@ -0,0 +1,5 @@
+config AVR_SAMPLE
+bool
+select AVR_TIMER16
+select AVR_USART
+select AVR_MASK
diff --git a/hw/avr/Makefile.objs b/hw/avr/Makefile.objs
new file mode 100644
index 0000000000..626b7064b3
--- /dev/null
+++ b/hw/avr/Makefile.objs
@@ -0,0 +1 @@
+obj-y += sample.o
diff --git a/hw/avr/sample.c b/hw/avr/sample.c
new file mode 100644
index 0000000000..563edbd417
--- /dev/null
+++ b/hw/avr/sample.c
@@ -0,0 +1,237 @@
+/*
+ * QEMU AVR CPU
+ *
+ * Copyright (c) 2019 Michael Rolnik
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * 
+ */
+
+/*
+ *  NOTE:
+ *  This is not a real AVR board, this is an example!
+ *  The CPU is an approximation of an ATmega2560, but is missing various
+ *  built-in peripherals.
+ *
+ *  This example board loads provided binary file into flash memory and
+ *  executes it from 0x00000000 address in the code memory space.
+ *
+ *  Currently used for AVR CPU validation
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "hw/hw.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/qtest.h"
+#include "ui/console.h"
+#include "hw/boards.h"
+#include "hw/loader.h"
+#include "qemu/error-report.h"
+#include "exec/address-spaces.h"
+#include "include/hw/sysbus.h"
+#include "include/hw/char/avr_usart.h"
+#include "include/hw/timer/avr_timer16.h"
+#include "include/hw/misc/avr_mask.h"
+#include "elf.h"
+
+#define SIZE_FLASH 0x00040000
+#define SIZE_SRAM 0x2200
+/*
+ * Size of additional "external" memory, as if the AVR were configured to use
+ * an external RAM chip.
+ * Note that the configuration registers that normally enable this feature are
+ * unimplemented.
+ */
+#define SIZE_EXMEM 0x00000000
+
+/* Offsets of peripherals in emulated memory space (i.e. not host addresses)  */
+#define PRR0_BASE 0x64
+#define PRR1_BASE 0x65
+#define USART_BASE 0xc0
+#define TIMER1_BASE 0x80
+#define TIMER1_IMSK_BASE 0x6f
+#define TIMER1_IFR_BASE 0x36
+
+/* Interrupt numbers used by peripherals */
+#define USART_RXC_IRQ 24
+#define USART_DRE_IRQ 25
+#define USART_TXC_IRQ 26
+
+#define TIMER1_CAPT_IRQ 15
+#define TIMER1_COMPA_IRQ 16
+#define TIMER1_COMPB_IRQ 17
+#define TIMER1_COMPC_IRQ 18
+#define TIMER1_OVF_IRQ 19
+
+/*  Power reduction */
+#define PRR1_BIT_PRTIM5 0x05/*  Timer/Counter5  */
+#define PRR1_BIT_PRTIM4 0x04/*  Timer/Counter4  */
+#define PRR1_BIT_PRTIM3 0x03/*  Timer/Counter3  */
+#define PRR1_BIT_PRUSART3   0x02/*  USART3  */
+#define PRR1_BIT_PRUSART2   0x01/*  USART2  */
+#define PRR1_BIT_PRUSART1   0x00/*  USART1  */
+
+#define PRR0_BIT_PRTWI  0x06/*  TWI */
+#define PRR0_BIT_PRTIM2 0x05/*  Timer/Counter2  */
+#define PRR0_BIT_PRTIM0 0x04/*  Timer/Counter0  */
+#define PRR0_BIT_PRTIM1 0x03/*  Timer/Counter1  */
+#define PRR0_BIT_PRSPI  0x02/*  Serial Peripheral Interface */
+#define PRR0_BIT_PRUSART0   0x01/*  USART0  */
+#define PRR0_BIT_PRADC  0x00/*  ADC */
+
+typedef struct {
+MachineClass parent;
+} SampleMachineClass;
+
+typedef struct {
+MachineState parent;
+MemoryRegion *ram;
+MemoryRegion *flash;
+AVRUsartState *usart0;
+AVRTimer16State *timer1;
+AVRMaskState *prr[2];
+} SampleMachineState;
+
+#define TYPE_SAMPLE_MACHINE MACHINE_TYPE_NAME("sample")
+

[Qemu-devel] [PATCH v26 1/7] target/avr: Add outward facing interfaces and core CPU logic

2019-07-11 Thread Michael Rolnik

From: Sarah Harris 

This includes:
- CPU data structures
- object model classes and functions
- migration functions
- GDB hooks

Signed-off-by: Michael Rolnik 
---
 gdb-xml/avr-cpu.xml|  49 
 target/avr/cpu-param.h |  37 +++
 target/avr/cpu.c   | 579 +
 target/avr/cpu.h   | 280 
 target/avr/gdbstub.c   |  85 ++
 target/avr/machine.c   | 123 +
 6 files changed, 1153 insertions(+)
 create mode 100644 gdb-xml/avr-cpu.xml
 create mode 100644 target/avr/cpu-param.h
 create mode 100644 target/avr/cpu.c
 create mode 100644 target/avr/cpu.h
 create mode 100644 target/avr/gdbstub.c
 create mode 100644 target/avr/machine.c

diff --git a/gdb-xml/avr-cpu.xml b/gdb-xml/avr-cpu.xml
new file mode 100644
index 0000000000..c4747f5b40
--- /dev/null
+++ b/gdb-xml/avr-cpu.xml
@@ -0,0 +1,49 @@
+
+
+
+
+
+
+
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+
diff --git a/target/avr/cpu-param.h b/target/avr/cpu-param.h
new file mode 100644
index 0000000000..ccd1ea3429
--- /dev/null
+++ b/target/avr/cpu-param.h
@@ -0,0 +1,37 @@
+/*
+ * QEMU AVR CPU
+ *
+ * Copyright (c) 2019 Michael Rolnik
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * 
+ */
+
+#ifndef AVR_CPU_PARAM_H
+#define AVR_CPU_PARAM_H 1
+
+#define TARGET_LONG_BITS 32
+/*
+ * TARGET_PAGE_BITS cannot be more than 8 bits because
+ * 1.  all IO registers occupy [0x0000 .. 0x00ff] address range, and they
+ * should be implemented as a device and not memory
+ * 2.  SRAM starts at the address 0x0100
+ */
+#define TARGET_PAGE_BITS 8
+#define TARGET_PHYS_ADDR_SPACE_BITS 24
+#define TARGET_VIRT_ADDR_SPACE_BITS 24
+#define NB_MMU_MODES 2
+
+
+#endif
diff --git a/target/avr/cpu.c b/target/avr/cpu.c
new file mode 100644
index 0000000000..c474526925
--- /dev/null
+++ b/target/avr/cpu.c
@@ -0,0 +1,579 @@
+/*
+ * QEMU AVR CPU
+ *
+ * Copyright (c) 2019 Michael Rolnik
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/qemu-print.h"
+#include "qemu/log.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "migration/vmstate.h"
+
+static void avr_cpu_set_pc(CPUState *cs, vaddr value)
+{
+AVRCPU *cpu = AVR_CPU(cs);
+
+cpu->env.pc_w = value / 2; /* internally PC points to words */
+}
+
+static bool avr_cpu_has_work(CPUState *cs)
+{
+AVRCPU *cpu = AVR_CPU(cs);
+CPUAVRState *env = &cpu->env;
+
+return (cs->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_RESET))
+&& cpu_interrupts_enabled(env);
+}
+
+static void avr_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+{
+AVRCPU *cpu = AVR_CPU(cs);
+CPUAVRState *env = &cpu->env;
+
+env->pc_w = tb->pc / 2; /* internally PC points to words */
+}
+
+static void avr_cpu_reset(CPUState *cs)
+{
+AVRCPU *cpu = AVR_CPU(cs);
+AVRCPUClass *mcc = AVR_CPU_GET_CLASS(cpu);
+CPUAVRState *env = &cpu->env;
+
+mcc->parent_reset(cs);
+
+env->pc_w = 0;
+env->sregI = 1;
+env->sregC = 0;
+env->sregZ = 0;
+env->sregN = 0;
+env->sregV = 0;
+env->sregS = 0;
+env->sregH = 0;
+env->sregT = 0;
+
+env->rampD = 0;
+env->rampX = 0;
+env->rampY = 0;
+env->rampZ = 0;
+env->eind = 0;
+env->sp = 0;
+
+env->skip = 0;
+
+memset(env->r, 0, sizeof(env->r));
+
+tlb_flush(cs);
+}
+
+static void avr_cpu_disas_set_info(CPUState *cpu, disassemble_info *info)
+{
+info->mach = bfd_arch_avr;
+info->print_insn = NULL;
+}
+
+static void avr_cpu_realizefn(DeviceState *dev, Error **errp)
+{
+

[Qemu-devel] [PATCH v26 3/7] target/avr: Add instruction decoding

2019-07-11 Thread Michael Rolnik

This includes:
- encoding of all 16 bit instructions
- encoding of all 32 bit instructions

Signed-off-by: Michael Rolnik 
---
 target/avr/insn.decode | 175 +
 1 file changed, 175 insertions(+)
 create mode 100644 target/avr/insn.decode

diff --git a/target/avr/insn.decode b/target/avr/insn.decode
new file mode 100644
index 0000000000..6b387762c6
--- /dev/null
+++ b/target/avr/insn.decode
@@ -0,0 +1,175 @@
+#
+#   A = [16 .. 31]
+#   B = [16 .. 23]
+#   C = [24, 26, 28, 30]
+#   D = [0, 2, 4, 6, 8, .. 30]
+
+%rd             4:5
+%rr             9:1 0:4
+
+&rd_rr          rd rr
+&rd_imm         rd imm
+
+@op_rd_rr       .... .. . ..... ....        &rd_rr      rd=%rd rr=%rr
+ADD             0000 11 . ..... ....        @op_rd_rr
+ADC             0001 11 . ..... ....        @op_rd_rr
+AND             0010 00 . ..... ....        @op_rd_rr
+CP              0001 01 . ..... ....        @op_rd_rr
+CPC             0000 01 . ..... ....        @op_rd_rr
+CPSE            0001 00 . ..... ....        @op_rd_rr
+EOR             0010 01 . ..... ....        @op_rd_rr
+MOV             0010 11 . ..... ....        @op_rd_rr
+MUL             1001 11 . ..... ....        @op_rd_rr
+OR              0010 10 . ..... ....        @op_rd_rr
+SBC             0000 10 . ..... ....        @op_rd_rr
+SUB             0001 10 . ..... ....        @op_rd_rr
+
+
+%rd_c           4:2                         !function=to_C
+%imm6           6:2 0:4
+
+@op_rd_imm6     .... .... .. .. ....        &rd_imm     rd=%rd_c imm=%imm6
+ADIW            1001 0110 .. .. ....        @op_rd_imm6
+SBIW            1001 0111 .. .. ....        @op_rd_imm6
+
+
+%rd_a           4:4                         !function=to_A
+%rr_a           0:4                         !function=to_A
+%rd_d           4:4                         !function=to_D
+%rr_d           0:4                         !function=to_D
+%imm8           8:4 0:4
+
+@op_rd_imm8     .... .... .... ....         &rd_imm     rd=%rd_a imm=%imm8
+ANDI            0111 .... .... ....         @op_rd_imm8
+CPI             0011 .... .... ....         @op_rd_imm8
+LDI             1110 .... .... ....         @op_rd_imm8
+ORI             0110 .... .... ....         @op_rd_imm8
+SBCI            0100 .... .... ....         @op_rd_imm8
+SUBI            0101 .... .... ....         @op_rd_imm8
+
+
+@op_rd          .... ... rd:5 ....
+ASR             1001 010 ..... 0101         @op_rd
+COM             1001 010 ..... 0000         @op_rd
+DEC             1001 010 ..... 1010         @op_rd
+ELPM2           1001 000 ..... 0110         @op_rd
+ELPMX           1001 000 ..... 0111         @op_rd
+INC             1001 010 ..... 0011         @op_rd
+LDX1            1001 000 ..... 1100         @op_rd
+LDX2            1001 000 ..... 1101         @op_rd
+LDX3            1001 000 ..... 1110         @op_rd
+LDY2            1001 000 ..... 1001         @op_rd
+LDY3            1001 000 ..... 1010         @op_rd
+LDZ2            1001 000 ..... 0001         @op_rd
+LDZ3            1001 000 ..... 0010         @op_rd
+LPM2            1001 000 ..... 0100         @op_rd
+LPMX            1001 000 ..... 0101         @op_rd
+LSR             1001 010 ..... 0110         @op_rd
+NEG             1001 010 ..... 0001         @op_rd
+POP             1001 000 ..... 1111         @op_rd
+PUSH            1001 001 ..... 1111         @op_rd
+ROR             1001 010 ..... 0111         @op_rd
+STY2            1001 001 ..... 1001         @op_rd
+STY3            1001 001 ..... 1010         @op_rd
+STZ2            1001 001 ..... 0001         @op_rd
+STZ3            1001 001 ..... 0010         @op_rd
+SWAP            1001 010 ..... 0010         @op_rd
+
+
+@op_bit         .... .... . bit:3 ....
+BCLR            1001 0100 1 ... 1000        @op_bit
+BSET            1001 0100 0 ... 1000        @op_bit
+
+
+@op_rd_bit      .... ... rd:5 . bit:3
+BLD             1111 100 ..... 0 ...        @op_rd_bit
+BST             1111 101 ..... 0 ...        @op_rd_bit
+
+
+@op_bit_imm     .... .. imm:s7 bit:3
+BRBC            1111 01 ....... ...         @op_bit_imm
+BRBS            1111 00 ....... ...         @op_bit_imm
+
+
+BREAK           1001 0101 1001 1000
+EICALL          1001 0101 0001 1001
+EIJMP           1001 0100 0001 1001
+ELPM1           1001 0101 1101 1000
+ICALL           1001 0101 0000 1001
+IJMP            1001 0100 0000 1001
+LPM1            1001 0101 1100 1000
+NOP             0000 0000 0000 0000
+RET             1001 0101 0000 1000
+RETI            1001 0101 0001 1000
+SLEEP           1001 0101 1000 1000
+SPM             1001 0101 1110 1000
+SPMX            1001 0101 1111 1000
+WDR             1001 0101 1010 1000
+
+
+@op_reg_bit     .... .... reg:5 bit:3
+CBI             1001 1000 ..... ...         @op_reg_bit
+SBI             1001 1010 ..... ...         @op_reg_bit
+SBIC            1001 1001 ..... ...         @op_reg_bit
+SBIS            1001 1011 ..... ...         @op_reg_bit
+
+
+DES             1001 0100 imm:4 1011
+
+
+%rd_b   4:3
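
To make the field definitions concrete, here is a hand-written sketch of
what the %rd, %rr, %rd_c and %imm6 extractions amount to for a 16-bit
opcode. It is not the output of decodetree.py; the field_* helpers are
hypothetical names, to_C is a simplified copy of the helper in translate.c
(without the DisasContext argument), and the sketch assumes that a field
written as several pos:len pieces is concatenated with the first piece most
significant.

```c
#include <stdint.h>

/* %rd 4:5 -> five bits starting at bit 4. */
static unsigned field_rd(uint16_t insn)
{
    return (insn >> 4) & 0x1f;
}

/* %rr 9:1 0:4 -> bit 9 becomes the top bit above bits 3..0. */
static unsigned field_rr(uint16_t insn)
{
    return (((insn >> 9) & 0x1) << 4) | (insn & 0xf);
}

/* Same mapping as to_C() in translate.c: index 0..3 -> r24, r26, r28, r30. */
static unsigned to_C(unsigned indx)
{
    return 24 + (indx % 4) * 2;
}

/* %rd_c 4:2 !function=to_C -> two bits at bit 4, passed through to_C(). */
static unsigned field_rd_c(uint16_t insn)
{
    return to_C((insn >> 4) & 0x3);
}

/* %imm6 6:2 0:4 -> bits 7..6 form the high part above bits 3..0. */
static unsigned field_imm6(uint16_t insn)
{
    return (((insn >> 6) & 0x3) << 4) | (insn & 0xf);
}
```

For example, the opcode 0x0f01 (add r16, r17) yields rd = 16 and rr = 17,
and 0x96ff (adiw r30, 63) yields rd = 30 and imm = 63.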

[Qemu-devel] [PATCH v26 2/7] target/avr: Add instruction helpers

2019-07-11 Thread Michael Rolnik

From: Sarah Harris 

Stubs for unimplemented instructions and helpers for instructions that need to 
interact with QEMU.
SPM and WDR are unimplemented because they require emulation of complex 
peripherals.
The implementation of SLEEP is very limited due to the lack of peripherals to 
generate wake interrupts.
Memory access instructions are implemented here because some address ranges 
actually refer to CPU registers.

Signed-off-by: Michael Rolnik 
---
 target/avr/helper.c | 354 
 target/avr/helper.h |  29 
 2 files changed, 383 insertions(+)
 create mode 100644 target/avr/helper.c
 create mode 100644 target/avr/helper.h

diff --git a/target/avr/helper.c b/target/avr/helper.c
new file mode 100644
index 0000000000..f0f0d4f15a
--- /dev/null
+++ b/target/avr/helper.c
@@ -0,0 +1,354 @@
+/*
+ * QEMU AVR CPU
+ *
+ * Copyright (c) 2019 Michael Rolnik
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * 
+ */
+
+#include "qemu/osdep.h"
+
+#include "cpu.h"
+#include "hw/irq.h"
+#include "hw/sysbus.h"
+#include "sysemu/sysemu.h"
+#include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
+#include "exec/helper-proto.h"
+#include "exec/ioport.h"
+#include "qemu/host-utils.h"
+#include "qemu/error-report.h"
+
+bool avr_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
+{
+bool ret = false;
+CPUClass *cc = CPU_GET_CLASS(cs);
+AVRCPU *cpu = AVR_CPU(cs);
+CPUAVRState *env = &cpu->env;
+
+if (interrupt_request & CPU_INTERRUPT_RESET) {
+if (cpu_interrupts_enabled(env)) {
+cs->exception_index = EXCP_RESET;
+cc->do_interrupt(cs);
+
+cs->interrupt_request &= ~CPU_INTERRUPT_RESET;
+
+ret = true;
+}
+}
+if (interrupt_request & CPU_INTERRUPT_HARD) {
+if (cpu_interrupts_enabled(env) && env->intsrc != 0) {
+int index = ctz32(env->intsrc);
+cs->exception_index = EXCP_INT(index);
+cc->do_interrupt(cs);
+
+env->intsrc &= env->intsrc - 1; /* clear the interrupt */
+cs->interrupt_request &= ~CPU_INTERRUPT_HARD;
+
+ret = true;
+}
+}
+return ret;
+}
+
+void avr_cpu_do_interrupt(CPUState *cs)
+{
+AVRCPU *cpu = AVR_CPU(cs);
+CPUAVRState *env = &cpu->env;
+
+uint32_t ret = env->pc_w;
+int vector = 0;
+int size = avr_feature(env, AVR_FEATURE_JMP_CALL) ? 2 : 1;
+int base = 0;
+
+if (cs->exception_index == EXCP_RESET) {
+vector = 0;
+} else if (env->intsrc != 0) {
+vector = ctz32(env->intsrc) + 1;
+}
+
+if (avr_feature(env, AVR_FEATURE_3_BYTE_PC)) {
+cpu_stb_data(env, env->sp--, (ret & 0xff));
+cpu_stb_data(env, env->sp--, (ret & 0x00ff00) >> 8);
+cpu_stb_data(env, env->sp--, (ret & 0xff0000) >> 16);
+} else if (avr_feature(env, AVR_FEATURE_2_BYTE_PC)) {
+cpu_stb_data(env, env->sp--, (ret & 0xff));
+cpu_stb_data(env, env->sp--, (ret & 0x00ff00) >> 8);
+} else {
+cpu_stb_data(env, env->sp--, (ret & 0xff));
+}
+
+env->pc_w = base + vector * size;
+env->sregI = 0; /* clear Global Interrupt Flag */
+
+cs->exception_index = -1;
+}
+
+int avr_cpu_memory_rw_debug(CPUState *cs, vaddr addr, uint8_t *buf,
+int len, bool is_write)
+{
+return cpu_memory_rw_debug(cs, addr, buf, len, is_write);
+}
+
+hwaddr avr_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
+{
+return addr; /* I assume 1:1 address correspondence */
+}
+
+int avr_cpu_handle_mmu_fault(
+CPUState *cs, vaddr address, int size, int rw, int mmu_idx)
+{
+/* currently it's assumed that this will never happen */
+cs->exception_index = EXCP_DEBUG;
+cpu_dump_state(cs, stderr, 0);
+return 1;
+}
+
+bool avr_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
+MMUAccessType access_type, int mmu_idx,
+bool probe, uintptr_t retaddr)
+{
+int prot = 0;
+MemTxAttrs attrs = {};
+uint32_t paddr;
+
+address &= TARGET_PAGE_MASK;
+
+if (mmu_idx == MMU_CODE_IDX) {
+/* access to code in flash */
+paddr = OFFSET_CODE + address;
+prot = PAGE_READ | PAGE_EXEC;
+if (paddr + TARGET_PAGE_SIZE > OFFSET_DATA) {
+error_report("execution left flash memory");

[Qemu-devel] [PATCH v26 0/7] QEMU AVR 8 bit cores

2019-07-11 Thread Michael Rolnik

This series of patches adds 8-bit AVR cores to QEMU.
All instructions except BREAK/DES/SPM/SPMX are implemented, though not yet fully
tested.
However, I was able to execute simple code with functions, e.g. a Fibonacci
calculation.
The series includes a sample board that does not model real hardware.
There is no fuse support yet. PC is set to 0 at reset.

The patches include the following:
1. just a basic 8-bit AVR CPU, without instruction decoding or translation
2. CPU features which allow defining the following 8-bit AVR cores
 avr1
 avr2 avr25
 avr3 avr31 avr35
 avr4
 avr5 avr51
 avr6
 xmega2 xmega4 xmega5 xmega6 xmega7
3. a definition of a sample machine with SRAM, FLASH and CPU which allows
executing simple code
4. encoding for all AVR instructions
5. interrupt handling
6. helpers for IN, OUT, SLEEP, WDR & unsupported instructions
7. a decoder which, given an opcode, decides what instruction it is
8. translation of AVR instruction into TCG
9. all features together

changes since v3
1. rampD/X/Y/Z registers are encoded as 0x00ff0000 (instead of 0x000000ff) for
faster address manipulation
2. ffs changed to ctz32
3. duplicate code removed at avr_cpu_do_interrupt
4. using andc instead of not + and
5. fixing V flag calculation in various instructions
6. freeing local variables in PUSH
7. tcg_const_local_i32 -> tcg_const_i32
8. using sextract32 instead of my implementation
9. fixing BLD instruction
10.xor(r) instead of 0xff - r at COM
11.fixing MULS/MULSU not to modify inputs' content
12.using SUB for NEG
13.fixing tcg_gen_qemu_ld/st call in XCH

changes since v4
1. target is now defined as big endian in order to optimize push_ret/pop_ret
2. all style warnings are fixed
3. adding cpu_set/get_sreg functions
4. simplifying gen_goto_tb as there is no real paging
5. env->pc -> env->pc_w
6. making flag dump more compact
7. more spacing
8. renaming CODE/DATA_INDEX -> MMU_CODE/DATA_IDX
9. removing avr_set_feature
10. SPL/SPH set bug fix
11. switching stb_phys to cpu_stb_data
12. cleaning up avr_decode
13. saving sreg, rampD/X/Y/Z, eind in HW format (savevm)
14. saving CPU features (savevm)

changes since v5
1. BLD bug fix
2. decoder generator is added

changes since v6
1. using cpu_get_sreg/cpu_set_sreg in 
avr_cpu_gdb_read_register/avr_cpu_gdb_write_register
2. configure the target as little endian because otherwise GDB does not work
3. fixing and testing gen_push_ret/gen_pop_ret

changes since v7
1. folding back v6
2. logging at helper_outb and helper_inb is done only for registers that are
not yet supported
3. MAINTAINERS updated

changes since v8
1. removing hw/avr from hw/Makefile.obj as it should not be built for all
2. making linux compilable
3. testing on
a. Mac, Apple LLVM version 7.0.0
b. Ubuntu 12.04, gcc 4.9.2
c. Fedora 23, gcc 5.3.1
4. folding back some patches
5. translation bug fixes for ORI, CPI, XOR instructions
6. proper handling of cpu register writes through memory

changes since v9
1. removing forward declarations of static functions
2. disabling debug prints
3. switching to case range instead of if else if ...
4. LD/ST IN/OUT accesses to CPU-maintained registers are not routed to any device
5. comments about sample board and sample IO device added
6. sample board description is more descriptive now
7. memory_region_allocate_system_memory is used to create RAM
8. now there are helper_fullrd & helper_fullwr when LD/ST try to access 
registers

changes since v10
1. moving back fullwr & fullrd into the commit where outb and inb were introduced
2. changing tlb_fill function signature
3. adding empty line between functions
4. adding newline on the last line of the file
5. using tb->flags to generate full access ST/LD instructions
6. fixing SBRC bug
7. folding back 10th commit
8. whenever a new file is introduced it's added to Makefile.objs

changes since v11
1. updating to v2.7.0-rc
2. removing assignment to env->fullacc from gen_intermediate_code

changes since v12
1. fixing spacing
2. fixing get/put_segment functions
3. removing target-avr/machine.h file
4. VMSTATE_SINGLE_TEST -> VMSTATE_SINGLE
5. comment spelling
6. removing hw/avr/sample_io.c
7. char const* -> const char*
8. proper ram allocation
9. fixing breakpoint functionality.
10.env1 -> env
11.fixing avr_cpu_gdb_write_register & avr_cpu_gdb_read_register functions
12.any cpu is removed
13.feature bits are not saved into vm state

changes since v13
1. rebasing to v2.7.0-rc1

changes since v14
1. I made self review with git gui tool. (I did not know such a thing exists)
2. removing all double/triple spaces
3. removing comment reference to SampleIO
4. folding back some changes, so there are no deleted lines in my code
5. moving avr configuration, within configure file, before cris

changes since v15
1. removing IO registers cache from CPU
2. implementing CBI/SBI as read(helper_inb), modify, write(helper_outb)
3. implementing SBIC/SBIS as read(helper_inb), check, branch
4. adding missing tcg_temp_free_i32 for tcg_const_i32

changes since v16
1.

[Qemu-devel] [PATCH] migration: check length directly to make sure the range is aligned

2019-07-11 Thread Wei Yang

Since the start addr is already checked, to make sure the range is
aligned, checking the length is enough.

Signed-off-by: Wei Yang 
---
 exec.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index 50ea9c5aaa..8fa980baae 100644
--- a/exec.c
+++ b/exec.c
@@ -4067,10 +4067,9 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t 
start, size_t length)
 
 if ((start + length) <= rb->used_length) {
 bool need_madvise, need_fallocate;
-uint8_t *host_endaddr = host_startaddr + length;
-if ((uintptr_t)host_endaddr & (rb->page_size - 1)) {
-error_report("ram_block_discard_range: Unaligned end address: %p",
- host_endaddr);
+if (length & (rb->page_size - 1)) {
+error_report("ram_block_discard_range: Unaligned length: %zx",
+ length);
 goto err;
 }
 
-- 
2.17.1
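
The reasoning in the commit message above can be checked with a small standalone sketch. It assumes, as the surrounding code does, that the page size is a power of two; is_aligned() and end_is_aligned() are hypothetical helper names, not functions from exec.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `value` is a multiple of `page_size` (which must be a power of two). */
static bool is_aligned(uint64_t value, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (value & (page_size - 1)) == 0;
}

/*
 * The check the patch removes: test the end address directly.
 * When `start` is already aligned, (start + length) has the same low bits
 * as `length`, so this gives the same answer as is_aligned(length, ...).
 */
static bool end_is_aligned(uint64_t start, uint64_t length, uint64_t page_size)
{
    return is_aligned(start + length, page_size);
}
```

For an aligned start such as 0x2000 with 4 KiB pages, both checks accept a length of 0x1000 and both reject a length of 0x1234, which is the equivalence the patch relies on.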

[Qemu-devel] [PATCH 2/2] spapr: initial implementation for H_TPM_COMM hcall

2019-07-11 Thread Michael Roth

This implements the H_TPM_COMM hypercall, which is used by an
Ultravisor to pass TPM commands directly to the host's TPM device, or
a TPM Resource Manager associated with the device.

This also introduces a new pseries machine option which is used to
configure what TPM device to pass commands to, for example:

  -machine pseries,...,tpm-device-file=/dev/tpmrm0

By default, no tpm-device-file is defined and hcalls will return
H_RESOURCE.

The full specification for this hypercall can be found in
docs/specs/ppc-spapr-uv-hcalls.txt

Signed-off-by: Michael Roth tpm_device_file) {
+spapr_hcall_tpm_reset();
+}
+
 spapr_clear_pending_events(spapr);
 
 /*
@@ -3340,6 +3344,21 @@ static void spapr_set_host_serial(Object *obj, const 
char *value, Error **errp)
 spapr->host_serial = g_strdup(value);
 }
 
+static char *spapr_get_tpm_device_file(Object *obj, Error **errp)
+{
+SpaprMachineState *spapr = SPAPR_MACHINE(obj);
+
+return g_strdup(spapr->tpm_device_file);
+}
+
+static void spapr_set_tpm_device_file(Object *obj, const char *value, Error 
**errp)
+{
+SpaprMachineState *spapr = SPAPR_MACHINE(obj);
+
+g_free(spapr->tpm_device_file);
+spapr->tpm_device_file = g_strdup(value);
+}
+
 static void spapr_instance_init(Object *obj)
 {
 SpaprMachineState *spapr = SPAPR_MACHINE(obj);
@@ -3396,6 +3415,14 @@ static void spapr_instance_init(Object *obj)
 &error_abort);
 object_property_set_description(obj, "host-serial",
 "Host serial number to advertise in guest device tree", &error_abort);
+object_property_add_str(obj, "tpm-device-file",
+spapr_get_tpm_device_file,
+spapr_set_tpm_device_file, &error_abort);
+object_property_set_description(obj, "tpm-device-file",
+ "Specifies the path to the TPM character device file to use"
+ " for TPM communication via hypercalls (usually a TPM"
+ " resource manager)",
+ &error_abort);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
diff --git a/hw/ppc/spapr_hcall_tpm.c b/hw/ppc/spapr_hcall_tpm.c
new file mode 100644
index 0000000000..75e2b6d594
--- /dev/null
+++ b/hw/ppc/spapr_hcall_tpm.c
@@ -0,0 +1,135 @@
+/*
+ * SPAPR TPM Hypercall
+ *
+ * Copyright IBM Corp. 2019
+ *
+ * Authors:
+ *  Michael Roth  
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "cpu.h"
+#include "hw/ppc/spapr.h"
+#include "trace.h"
+
+#define TPM_SPAPR_BUFSIZE 4096
+
+enum {
+TPM_COMM_OP_EXECUTE = 1,
+TPM_COMM_OP_CLOSE_SESSION = 2,
+};
+
+static int tpm_devfd = -1;
+
+static ssize_t tpm_execute(SpaprMachineState *spapr, target_ulong *args)
+{
+uint64_t data_in = ppc64_phys_to_real(args[1]);
+target_ulong data_in_size = args[2];
+uint64_t data_out = ppc64_phys_to_real(args[3]);
+target_ulong data_out_size = args[4];
+uint8_t buf_in[TPM_SPAPR_BUFSIZE];
+uint8_t buf_out[TPM_SPAPR_BUFSIZE];
+ssize_t ret;
+
+trace_spapr_tpm_execute(data_in, data_in_size, data_out, data_out_size);
+
+if (data_in_size > TPM_SPAPR_BUFSIZE) {
+error_report("invalid TPM input buffer size: " TARGET_FMT_lu "\n",
+ data_in_size);
+return H_P3;
+}
+
+if (data_out_size < TPM_SPAPR_BUFSIZE) {
+error_report("invalid TPM output buffer size: " TARGET_FMT_lu "\n",
+ data_out_size);
+return H_P5;
+}
+
+if (tpm_devfd == -1) {
+tpm_devfd = open(spapr->tpm_device_file, O_RDWR);
+if (tpm_devfd == -1) {
+error_report("failed to open TPM device %s: %d",
+ spapr->tpm_device_file, errno);
+return H_RESOURCE;
+}
+}
+
+cpu_physical_memory_read(data_in, buf_in, data_in_size);
+
+do {
+ret = write(tpm_devfd, buf_in, data_in_size);
+if (ret > 0) {
+data_in_size -= ret;
+}
+} while ((ret >= 0 && data_in_size > 0) || (ret == -1 && errno == EINTR));
+
+if (ret == -1) {
+error_report("failed to write to TPM device %s: %d",
+ spapr->tpm_device_file, errno);
+return H_RESOURCE;
+}
+
+do {
+ret = read(tpm_devfd, buf_out, data_out_size);
+} while (ret == 0 || (ret == -1 && errno == EINTR));
+
+if (ret == -1) {
+error_report("failed to read from TPM device %s: %d",
+ spapr->tpm_device_file, errno);
+return H_RESOURCE;
+}
+
+cpu_physical_memory_write(data_out, buf_out, ret);
+args[0] = ret;
+
+return H_SUCCESS;
+}
+
+static target_ulong h_tpm_comm(PowerPCCPU *cpu,
+   SpaprMachineState *spapr,
+   target_ulong opcode,
+   target_ulong

Re: [Qemu-devel] [PATCH v7 10/13] vfio: Add load state functions to SaveVMHandlers

2019-07-11 Thread Yan Zhao

On Tue, Jul 09, 2019 at 05:49:17PM +0800, Kirti Wankhede wrote:
> Flow during _RESUMING device state:
> - If Vendor driver defines mappable region, mmap migration region.
> - Load config state.
> - For data packet, till VFIO_MIG_FLAG_END_OF_STATE is not reached
> - read data_size from packet, read buffer of data_size
> - read data_offset from where QEMU should write data.
> if region is mmaped, write data of data_size to mmaped region.
> - write data_size.
> In case of mmapped region, write to data_size indicates kernel
> driver that data is written in staging buffer.
> - if region is trapped, pwrite() data of data_size from data_offset.
> - Repeat above until VFIO_MIG_FLAG_END_OF_STATE.
> - Unmap migration region.
> 
> For user, data is opaque. User should write data in the same order as
> received.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/migration.c  | 162 
> +++
>  hw/vfio/trace-events |   3 +
>  2 files changed, 165 insertions(+)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 4e9b4cce230b..5fb4c5329ede 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -249,6 +249,26 @@ static int vfio_save_device_config_state(QEMUFile *f, 
> void *opaque)
>  return qemu_file_get_error(f);
>  }
>  
> +static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
> +{
> +VFIODevice *vbasedev = opaque;
> +uint64_t data;
> +
> +if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
> +vbasedev->ops->vfio_load_config(vbasedev, f);
> +}
> +
> +data = qemu_get_be64(f);
> +if (data != VFIO_MIG_FLAG_END_OF_STATE) {
> +error_report("%s: Failed loading device config space, "
> + "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
> +return -EINVAL;
> +}
> +
> +trace_vfio_load_device_config_state(vbasedev->name);
> +return qemu_file_get_error(f);
> +}
> +
>  /* -- */
>  
>  static int vfio_save_setup(QEMUFile *f, void *opaque)
> @@ -421,12 +441,154 @@ static int vfio_save_complete_precopy(QEMUFile *f, 
> void *opaque)
>  return ret;
>  }
>  
> +static int vfio_load_setup(QEMUFile *f, void *opaque)
> +{
> +VFIODevice *vbasedev = opaque;
> +VFIOMigration *migration = vbasedev->migration;
> +int ret = 0;
> +
> +if (migration->region.buffer.mmaps) {
> +ret = vfio_region_mmap(&migration->region.buffer);
> +if (ret) {
> +error_report("%s: Failed to mmap VFIO migration region %d: %s",
> + vbasedev->name, migration->region.index,
> + strerror(-ret));
> +return ret;
> +}
> +}
> +
> +ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING);
> +if (ret) {
> +error_report("%s: Failed to set state RESUMING", vbasedev->name);
> +}
> +return ret;
> +}
> +
> +static int vfio_load_cleanup(void *opaque)
> +{
> +vfio_save_cleanup(opaque);
> +return 0;
> +}
> +
> +static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
> +{
> +VFIODevice *vbasedev = opaque;
> +VFIOMigration *migration = vbasedev->migration;
> +int ret = 0;
> +uint64_t data, data_size;
> +
I think checking of version_id is still needed.

Thanks
Yan

> +data = qemu_get_be64(f);
> +while (data != VFIO_MIG_FLAG_END_OF_STATE) {
> +
> +trace_vfio_load_state(vbasedev->name, data);
> +
> +switch (data) {
> +case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
> +{
> +ret = vfio_load_device_config_state(f, opaque);
> +if (ret) {
> +return ret;
> +}
> +break;
> +}
> +case VFIO_MIG_FLAG_DEV_SETUP_STATE:
> +{
> +data = qemu_get_be64(f);
> +if (data == VFIO_MIG_FLAG_END_OF_STATE) {
> +return ret;
> +} else {
> +error_report("%s: SETUP STATE: EOS not found 0x%"PRIx64,
> + vbasedev->name, data);
> +return -EINVAL;
> +}
> +break;
> +}
> +case VFIO_MIG_FLAG_DEV_DATA_STATE:
> +{
> +VFIORegion *region = &migration->region.buffer;
> +void *buf = NULL;
> +bool buffer_mmaped = false;
> +uint64_t data_offset = 0;
> +
> +data_size = qemu_get_be64(f);
> +if (data_size == 0) {
> +break;
> +}
> +
> +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
> +region->fd_offset +
> +offsetof(struct vfio_device_migration_info,
> +data_offset));
> +if (ret != sizeof(data_offset)) {
> +error_report("%s:Failed to get migration buffer data offset 
> %d",

Re: [Qemu-devel] [PATCH v7 09/13] vfio: Add save state functions to SaveVMHandlers

2019-07-11 Thread Yan Zhao

On Tue, Jul 09, 2019 at 05:49:16PM +0800, Kirti Wankhede wrote:
> Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy
> functions. These functions handle the pre-copy and stop-and-copy phases.
> 
> In _SAVING|_RUNNING device state or pre-copy phase:
> - read pending_bytes
> - read data_offset - indicates kernel driver to write data to staging
>   buffer which is mmapped.
> - read data_size - amount of data in bytes written by vendor driver in 
> migration
>   region.
> - if data section is trapped, pread() from data_offset of data_size.
> - if data section is mmaped, read mmaped buffer of data_size.
> - Write data packet to file stream as below:
> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
> VFIO_MIG_FLAG_END_OF_STATE }
> 
> In _SAVING device state or stop-and-copy phase
> a. read config space of device and save to migration file stream. This
>doesn't need to be from vendor driver. Any other special config state
>from driver can be saved as data in following iteration.
> b. read pending_bytes
> c. read data_offset - indicates kernel driver to write data to staging
>buffer which is mmapped.
> d. read data_size - amount of data in bytes written by vendor driver in
>migration region.
> e. if data section is trapped, pread() from data_offset of data_size.
> f. if data section is mmaped, read mmaped buffer of data_size.
> g. Write data packet as below:
>{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
> h. iterate through steps b to g while (pending_bytes > 0)
> i. Write {VFIO_MIG_FLAG_END_OF_STATE}
> 
> When the data region is mapped, it's the user's responsibility to read data from
> data_offset of data_size before moving to the next steps.
> 
> .save_live_iterate runs outside the iothread lock in the migration case, which
> could race with an asynchronous call to get the dirty page list, causing data
> corruption in the mapped migration region. A mutex is added here to serialize
> migration buffer read operations.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/migration.c  | 246 
> +++
>  hw/vfio/trace-events |   6 ++
>  2 files changed, 252 insertions(+)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 0597a45fda2d..4e9b4cce230b 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -117,6 +117,138 @@ static int vfio_migration_set_state(VFIODevice 
> *vbasedev, uint32_t state)
>  return 0;
>  }
>  
> +static void *find_data_region(VFIORegion *region,
> +  uint64_t data_offset,
> +  uint64_t data_size)
> +{
> +void *ptr = NULL;
> +int i;
> +
> +for (i = 0; i < region->nr_mmaps; i++) {
> +if ((data_offset >= region->mmaps[i].offset) &&
> +(data_offset < region->mmaps[i].offset + region->mmaps[i].size) 
> &&
> +(data_size <= region->mmaps[i].size)) {
> +ptr = region->mmaps[i].mmap + (data_offset -
> +   region->mmaps[i].offset);
> +break;
> +}
> +}
> +return ptr;
> +}
> +
> +static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
> +{
> +VFIOMigration *migration = vbasedev->migration;
> +VFIORegion *region = &migration->region.buffer;
> +uint64_t data_offset = 0, data_size = 0;
> +int ret;
> +
> +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
> +region->fd_offset + offsetof(struct 
> vfio_device_migration_info,
> + data_offset));
> +if (ret != sizeof(data_offset)) {
> +error_report("%s: Failed to get migration buffer data offset %d",
> + vbasedev->name, ret);
> +return -EINVAL;
> +}
> +
> +ret = pread(vbasedev->fd, &data_size, sizeof(data_size),
> +region->fd_offset + offsetof(struct 
> vfio_device_migration_info,
> + data_size));
> +if (ret != sizeof(data_size)) {
> +error_report("%s: Failed to get migration buffer data size %d",
> + vbasedev->name, ret);
> +return -EINVAL;
> +}
> +
> +if (data_size > 0) {
> +void *buf = NULL;
> +bool buffer_mmaped;
> +
> +if (region->mmaps) {
> +buf = find_data_region(region, data_offset, data_size);
> +}
> +
> +buffer_mmaped = (buf != NULL) ? true : false;
> +
> +if (!buffer_mmaped) {
> +buf = g_try_malloc0(data_size);
> +if (!buf) {
> +error_report("%s: Error allocating buffer ", __func__);
> +return -ENOMEM;
> +}
> +
> +ret = pread(vbasedev->fd, buf, data_size,
> +region->fd_offset + data_offset);
> +if (ret != data_size) {
> +error_report("%s: Failed to get migration data %d",
> + vbasedev->name, ret);
> +

Re: [Qemu-devel] [PATCH] migration/postcopy: fix document of postcopy_send_discard_bm_ram()

2019-07-11 Thread Wei Yang

On Thu, Jul 11, 2019 at 10:34:27AM +0100, Dr. David Alan Gilbert wrote:
>* Wei Yang (richardw.y...@linux.intel.com) wrote:
>> Commit 6b6712efccd3 ('ram: Split dirty bitmap by RAMBlock') changed the
>> parameters of postcopy_send_discard_bm_ram(), but left the documentation
>> untouched.
>> 
>> This patch corrects the documentation and fixes one typo by hand.
>> 
>> Signed-off-by: Wei Yang 
>> ---
>>  migration/ram.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> 
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 246efe6939..410e0f89fe 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2763,8 +2763,7 @@ void 
>> ram_postcopy_migrated_memory_release(MigrationState *ms)
>>   *
>>   * @ms: current migration state
>>   * @pds: state for postcopy
>> - * @start: RAMBlock starting page
>> - * @length: RAMBlock size
>> + * @block: RAMBlock to discard
>>   */
>>  static int postcopy_send_discard_bm_ram(MigrationState *ms,
>>  PostcopyDiscardState *pds,
>> @@ -2961,7 +2960,7 @@ static void 
>> postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
>>  }
>>  
>>  /**
>> - * postcopy_chuck_hostpages: discrad any partially sent host page
>> + * postcopy_chuck_hostpages: discard any partially sent host page
>
>While we're here we should probably fix the name of the function as
>well!   s/chuck/chunk/
>

Ah, didn't notice this :)

Would you like me to send a v2 to fix this?

>Dave
>
>>   *
>>   * Utility for the outgoing postcopy code.
>>   *
>> -- 
>> 2.19.1
>> 
>--
>Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

-- 
Wei Yang
Help you, Help me

[Qemu-devel] [RFC PATCH 0/2] spapr: Implement H_TPM_COMM for accessing host TPM device

2019-07-11 Thread Michael Roth

These patches are also available at:

  https://github.com/mdroth/qemu/commits/spapr-tpm-hcall-v0

This patchset implements the H_TPM_COMM hypercall, which provides a way
for an Ultravisor to pass raw TPM commands on to a host's TPM device,
either directly or through a TPM Resource Manager (needed to support
multiple guests).

Secure Guests running on an Ultravisor have a symmetric key that is
encrypted using a public key that is bound to a trusted host's TPM
hardware. This hypercall provides a means to decrypt the symmetric
key on behalf of a Secure Guest using the host's TPM hardware.

More details are provided in the spec summary introduced in patch 1.

 docs/specs/ppc-spapr-uv-hcalls.txt |  74 
++
 hw/ppc/Makefile.objs   |   1 +
 hw/ppc/spapr.c |  27 +++
 hw/ppc/spapr_hcall_tpm.c   | 135 
+++
 hw/ppc/trace-events|   4 
 include/hw/ppc/spapr.h |   7 ++-
 6 files changed, 247 insertions(+), 1 deletion(-)

[Qemu-devel] [PATCH 1/2] docs/specs: initial spec summary for Ultravisor-related hcalls

2019-07-11 Thread Michael Roth

For now this only covers hcalls relating to TPM communication since
it's the only one particularly important from a QEMU perspective atm,
but others can be added here where it makes sense.

The full specification for all hcalls/ucalls will eventually be made
available in the public/OpenPower version of the PAPR specification.

Signed-off-by: Michael Roth 
---
 docs/specs/ppc-spapr-uv-hcalls.txt | 74 ++
 1 file changed, 74 insertions(+)
 create mode 100644 docs/specs/ppc-spapr-uv-hcalls.txt

diff --git a/docs/specs/ppc-spapr-uv-hcalls.txt 
b/docs/specs/ppc-spapr-uv-hcalls.txt
new file mode 100644
index 0000000000..0278f89190
--- /dev/null
+++ b/docs/specs/ppc-spapr-uv-hcalls.txt
@@ -0,0 +1,74 @@
+On PPC64 systems supporting Protected Execution Facility (PEF), system
+memory can be placed in a secured region where only an "ultravisor"
+running in firmware can provide access to it. pseries guests on such
+systems can communicate with the ultravisor (via ultracalls) to switch to a
+secure VM mode (SVM) where the guest's memory is relocated to this secured
+region, making its memory inaccessible to normal processes/guests running on
+the host.
+
+The various ultracalls/hypercalls relating to SVM mode are currently
+only documented internally, but are planned for direct inclusion into the
+public OpenPOWER version of the PAPR specification (LoPAPR/LoPAR). An internal
+ACR has been filed to reserve a hypercall number range specific to this
+use-case to avoid any future conflicts with the internally-maintained PAPR
+specification. This document summarizes some of these details as they relate
+to QEMU.
+
+== hypercalls needed by the ultravisor ==
+
+Switching to SVM mode involves a number of hcalls issued by the ultravisor
+to the hypervisor to orchestrate the movement of guest memory to secure
+memory and various other aspects of SVM mode. The below documents the hcalls
+relevant to QEMU.
+
+- H_TPM_COMM (0xef10)
+
+  For TPM_COMM_OP_EXECUTE operation:
+Send a request to a TPM and receive a response, opening a new TPM session
+if one has not already been opened.
+
+  For TPM_COMM_OP_CLOSE_SESSION operation:
+Close the existing TPM session, if any.
+
+  Arguments:
+
+r3 : H_TPM_COMM (0xef10)
+r4 : TPM operation, one of:
+ TPM_COMM_OP_EXECUTE (0x1)
+ TPM_COMM_OP_CLOSE_SESSION (0x2)
+r5 : in_buffer, guest physical address of buffer containing the request
+ - Caller may use the same address for both request and response
+r6 : in_size, size of the in buffer
+ - Must be less than or equal to 4KB
+r7 : out_buffer, guest physical address of buffer to store the response
+ - Caller may use the same address for both request and response
+r8 : out_size, size of the out buffer
+ - Must be at least 4KB, as this is the maximum request/response size
+   supported by most TPM implementations, including the TPM Resource
+   Manager in the Linux kernel.
+
+  Return values:
+
+r3 : H_Success    request processed successfully
+ H_PARAMETER  invalid TPM operation
+ H_P2 in_buffer is invalid
+ H_P3 in_size is invalid
+ H_P4 out_buffer is invalid
+ H_P5 out_size is invalid
+ H_RESOURCE   TPM is unavailable
+r4 : For TPM_COMM_OP_EXECUTE, the size of the response will be stored here
+ upon success.
+
+  Use-case/notes:
+
+SVM filesystems are encrypted using a symmetric key. This key is then
+wrapped/encrypted using the public key of a trusted system which has the
+private key stored in the system's TPM. An Ultravisor will use this
+hcall to unwrap/unseal the symmetric key using the system's TPM device
+or a TPM Resource Manager associated with the device.
+
+The Ultravisor sets up a separate session key with the TPM in advance
+during host system boot. All sensitive in and out values will be
+encrypted using the session key. Though the hypervisor will see the 'in'
+and 'out' buffers in raw form, any sensitive contents will generally be
+encrypted using this session key.
-- 
2.17.1
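
The size constraints on r6 (in_size) and r8 (out_size) described in the spec summary above can be sketched as a standalone check. This is only an illustration under stated assumptions: the enum values are local placeholders that name the documented return codes (H_P3, H_P5) without reproducing their numeric values, and check_tpm_sizes() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TPM_BUFSIZE 4096u   /* 4KB limit taken from the spec summary */

/* Placeholder result codes; they only label which argument was rejected. */
enum tpm_size_check {
    TPM_SIZES_OK,           /* both sizes acceptable */
    TPM_SIZES_BAD_IN,       /* corresponds to H_P3: in_size is invalid */
    TPM_SIZES_BAD_OUT,      /* corresponds to H_P5: out_size is invalid */
};

/* in_size must be at most 4KB; out_size must be at least 4KB. */
static enum tpm_size_check check_tpm_sizes(uint64_t in_size, uint64_t out_size)
{
    if (in_size > TPM_BUFSIZE) {
        return TPM_SIZES_BAD_IN;
    }
    if (out_size < TPM_BUFSIZE) {
        return TPM_SIZES_BAD_OUT;
    }
    return TPM_SIZES_OK;
}

/* Usage example: a 16-byte request with a 4KB response buffer is accepted. */
static void check_tpm_sizes_example(void)
{
    assert(check_tpm_sizes(16, 4096) == TPM_SIZES_OK);
}
```

Checking in_size first mirrors the argument order of the hcall, so a call with both sizes wrong reports the input size.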

Re: [Qemu-devel] [PATCH 08/10] ppc/xive: Extend XiveTCTX with an router object pointer

2019-07-11 Thread David Gibson

On Wed, Jul 03, 2019 at 07:54:57AM +0200, Cédric Le Goater wrote:
> On 03/07/2019 04:07, David Gibson wrote:
> > On Sun, Jun 30, 2019 at 10:45:59PM +0200, Cédric Le Goater wrote:
> >> This is to perform lookups in the NVT table when a vCPU is dispatched
> >> and possibly resend interrupts.
> > 
> > I'm slightly confused by this one.  Aren't there multiple router
> > objects, each of which can deliver to any thread?  In which case what
> > router object is associated with a specific TCTX?
> 
> when a vCPU is dispatched on a HW thread, the hypervisor does a store 
> on the CAM line to store the VP id. At that time, it checks the IPB in 
> the associated NVT structure and notifies the thread if an interrupt is 
> pending. 
> 
> We need to do a NVT lookup, just like the presenter in HW, hence the 
> router pointer. You should look at the following patch which clarifies 
> the resend sequence.

Hm, ok.

> >> Future XIVE chip will use a different class for the model of the
> >> interrupt controller. So use an 'Object *' instead of a 'XiveRouter *'.
> > 
> > This seems odd to me, shouldn't it be an interface pointer or
> > something in that case?
> 
> I have duplicated most of the XIVE models for P10 because the internal 
> structures have changed. I managed to keep the XiveSource and XiveTCTX 
> but we now have a Xive10Router, this is the reason why.

Right, but XiveRouter and Xive10Router must have something in common
if they can both be used here.  Usually that's expressed as a shared
QOM interface - in which case you can use a pointer to the interface,
rather than using Object * which kind of implies *anything* can go
here.

> 
> If I was to duplicate XiveTCTX also, I will switch it back to a XiveRouter 
> pointer in the P9 version. 
> 
> C.
> 
> 
> >> Signed-off-by: Cédric Le Goater 
> > 
> >> ---
> >>  include/hw/ppc/xive.h |  4 +++-
> >>  hw/intc/xive.c| 11 ++-
> >>  hw/ppc/pnv.c  |  2 +-
> >>  hw/ppc/spapr_irq.c|  2 +-
> >>  4 files changed, 15 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> >> index d922524982d3..b764e1e4e6d4 100644
> >> --- a/include/hw/ppc/xive.h
> >> +++ b/include/hw/ppc/xive.h
> >> @@ -321,6 +321,8 @@ typedef struct XiveTCTX {
> >>  qemu_irq    os_output;
> >>  
> >>  uint8_t regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
> >> +
> >> +Object  *xrtr;
> >>  } XiveTCTX;
> >>  
> >>  /*
> >> @@ -416,7 +418,7 @@ void xive_tctx_tm_write(XiveTCTX *tctx, hwaddr offset, 
> >> uint64_t value,
> >>  uint64_t xive_tctx_tm_read(XiveTCTX *tctx, hwaddr offset, unsigned size);
> >>  
> >>  void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
> >> -Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp);
> >> +Object *xive_tctx_create(Object *cpu, Object *xrtr, Error **errp);
> >>  
> >>  static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t 
> >> nvt_idx)
> >>  {
> >> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> >> index f7ba1c3b622f..56700681884f 100644
> >> --- a/hw/intc/xive.c
> >> +++ b/hw/intc/xive.c
> >> @@ -573,6 +573,14 @@ static void xive_tctx_realize(DeviceState *dev, Error 
> >> **errp)
> >>  Object *obj;
> >>  Error *local_err = NULL;
> >>  
> >> +obj = object_property_get_link(OBJECT(dev), "xrtr", &local_err);
> >> +if (!obj) {
> >> +error_propagate(errp, local_err);
> >> +error_prepend(errp, "required link 'xrtr' not found: ");
> >> +return;
> >> +}
> >> +tctx->xrtr = obj;
> >> +
> >>  obj = object_property_get_link(OBJECT(dev), "cpu", &local_err);
> >>  if (!obj) {
> >>  error_propagate(errp, local_err);
> >> @@ -657,7 +665,7 @@ static const TypeInfo xive_tctx_info = {
> >>  .class_init= xive_tctx_class_init,
> >>  };
> >>  
> >> -Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp)
> >> +Object *xive_tctx_create(Object *cpu, Object *xrtr, Error **errp)
> >>  {
> >>  Error *local_err = NULL;
> >>  Object *obj;
> >> @@ -666,6 +674,7 @@ Object *xive_tctx_create(Object *cpu, XiveRouter 
> >> *xrtr, Error **errp)
> >>  object_property_add_child(cpu, TYPE_XIVE_TCTX, obj, &error_abort);
> >>  object_unref(obj);
> >>  object_property_add_const_link(obj, "cpu", cpu, &error_abort);
> >> +object_property_add_const_link(obj, "xrtr", xrtr, &error_abort);
> >>  object_property_set_bool(obj, true, "realized", &local_err);
> >>  if (local_err) {
> >>  goto error;
> >> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> >> index b87e01e5b925..11916dc273c2 100644
> >> --- a/hw/ppc/pnv.c
> >> +++ b/hw/ppc/pnv.c
> >> @@ -765,7 +765,7 @@ static void pnv_chip_power9_intc_create(PnvChip *chip, 
> >> PowerPCCPU *cpu,
> >>   * controller object is initialized afterwards. Hopefully, it's
> >>   * only used at runtime.
> >>   */
> >> -obj = xive_tctx_create(OBJECT(cpu), XIVE_ROUTER(>xive), 
> >> _err);
> >> +obj = xive_tctx_create(OBJECT(cpu), OBJECT(>xive), _err);
> >>

Re: [Qemu-devel] spapr_pci: Advertise BAR reallocation capability

2019-07-11 Thread David Gibson

On Thu, Jun 13, 2019 at 11:37:45AM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 12/06/2019 16:11, David Gibson wrote:
> > On Thu, Jun 06, 2019 at 02:09:19PM +1000, Alexey Kardashevskiy wrote:
> >> The pseries guests do not normally allocate PCI resources and rely on
> >> the system firmware doing so. Furthermore at least at some point in
> >> the past the pseries guests won't even be allowed to change BARs, probably
> >> it is still the case for phyp. So since the initial commit we have [1]
> >> which prevents resource reallocation.
> >>
> >> This is not a problem until we want specific BAR alignments, for example,
> >> PAGE_SIZE==64k to make sure we can still map MMIO BARs directly. For
> >> the boot time devices we handle this in SLOF [2] but since QEMU's RTAS
> >> does not allocate BARs, the guest does this instead and does not align
> >> BARs even if Linux is given pci=resource_alignment=16@pci:0:0 as
> >> PCI_PROBE_ONLY makes Linux ignore alignment requests.
> >>
> >> ARM folks added a dial to control PCI_PROBE_ONLY via the device tree [3].
> >> This makes use of the dial to advertise to the guest that we can handle
> >> BAR reassignments.
> >>
> >> We do not remove the flag from [1] as pseries guests are still supported
> >> under phyp so having that removed may cause problems.
> >>
> >> [1] 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/platforms/pseries/setup.c?h=v5.1#n773
> >> [2] 
> >> https://git.qemu.org/?p=SLOF.git;a=blob;f=board-qemu/slof/pci-phb.fs;h=06729bcf77a0d4e900c527adcd9befe2a269f65d;hb=HEAD#l338
> >> [3] 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f81c11af
> >> Signed-off-by: Alexey Kardashevskiy 
> > 
> > Changing a guest visible property, that could have a big effect on how
> > the guest behaves, without a machine version change seems... unwise.
> 
> 
> As a general rule - sure, not good. In this particular case QEMU has
> always been able to cope with BAR reallocations.

That's not really the point.  What I'm worried about is some old
kernel version running on a guest in the wild having a bug here and
the supposedly compatible qemu change breaking it.

> What could probably
> make sense is having it as a machine option (pci-probe-only=off by
> default) in case if we find some old kernel which cannot handle
> "linux,pci-probe-only" but I seriously doubt we'll find such a broken
> kernel - I do remove the probe-only switch from guest kernels on a
> regular basis last 7 or so years when debugging.

Yeah, doing it when debugging isn't really the same as exercising it
in production environments.
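
To make the versioning concern concrete, here is a rough sketch in
plain C.  It is not spapr code: the MachineCompat structure, the
function name and the choice of 4.1 as the cut-over version are all
assumptions made up for the example.  The idea is simply that only new
machine types would see the changed guest-visible property, so an
existing guest on an old machine type keeps seeing what it saw before:

```c
#include <stdbool.h>

/* Hypothetical description of a versioned machine type. */
typedef struct MachineCompat {
    int major;
    int minor;
} MachineCompat;

/*
 * Decide whether the device tree should advertise that BARs may be
 * reassigned.  Assumption for this sketch: the behaviour is switched
 * on starting with the 4.1 machine type and stays off for older ones.
 */
static bool advertise_bar_realloc(const MachineCompat *m)
{
    return m->major > 4 || (m->major == 4 && m->minor >= 1);
}
```

With something like this, a guest started on an older machine type
would not observe any change after a qemu upgrade.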

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH for 4.1?] includes: remove stale [smp|max]_cpus externs

2019-07-11 Thread Like Xu


On 2019/7/11 21:05, Alex Bennée wrote:

Commit a5e0b3311 removed these in favour of querying machine
properties. Remove the extern declarations as well.

Signed-off-by: Alex Bennée 
Cc: Like Xu 


Reviewed-by: Like Xu 


---
  include/sysemu/sysemu.h | 2 --
  1 file changed, 2 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 984c439ac9..e70edf7c1c 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -103,8 +103,6 @@ extern const char *keyboard_layout;
  extern int win2k_install_hack;
  extern int alt_grab;
  extern int ctrl_grab;
-extern int smp_cpus;
-extern unsigned int max_cpus;
  extern int cursor_hide;
  extern int graphic_rotate;
  extern int no_quit;

Re: [Qemu-devel] [PATCH v7 11/13] vfio: Add function to get dirty page list

2019-07-11 Thread Yan Zhao

On Tue, Jul 09, 2019 at 05:49:18PM +0800, Kirti Wankhede wrote:
> Dirty page tracking (.log_sync) is part of RAM copying state, where
> vendor driver provides the bitmap of pages which are dirtied by vendor
> driver through migration region and as part of RAM copy, those pages
> gets copied to file stream.
> 
> To get dirty page bitmap:
> - write start address, page_size and pfn count.
> - read count of pfns copied.
> - Vendor driver should return 0 if driver doesn't have any page to
>   report dirty in given range.
> - Vendor driver should return -1 to mark all pages dirty for given range.
> - read data_offset, where vendor driver has written bitmap.
> - read bitmap from the region or mmaped part of the region.
> - Iterate above steps till page bitmap for all requested pfns are copied.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/migration.c   | 123 
> ++
>  hw/vfio/trace-events  |   1 +
>  include/hw/vfio/vfio-common.h |   2 +
>  3 files changed, 126 insertions(+)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 5fb4c5329ede..ca1a8c0f5f1f 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -269,6 +269,129 @@ static int vfio_load_device_config_state(QEMUFile *f, 
> void *opaque)
>  return qemu_file_get_error(f);
>  }
>  
> +void vfio_get_dirty_page_list(VFIODevice *vbasedev,
> +  uint64_t start_pfn,
> +  uint64_t pfn_count,
> +  uint64_t page_size)
> +{
> +VFIOMigration *migration = vbasedev->migration;
> +VFIORegion *region = &migration->region.buffer;
> +uint64_t count = 0;
> +int64_t copied_pfns = 0;
> +int64_t total_pfns = pfn_count;
> +int ret;
> +
> +qemu_mutex_lock(&migration->lock);
> +
> +while (total_pfns > 0) {
> +uint64_t bitmap_size, data_offset = 0;
> +uint64_t start = start_pfn + count;
> +void *buf = NULL;
> +bool buffer_mmaped = false;
> +
> +ret = pwrite(vbasedev->fd, &start, sizeof(start),
> + region->fd_offset + offsetof(struct 
> vfio_device_migration_info,
> +  start_pfn));
> +if (ret < 0) {
> +error_report("%s: Failed to set dirty pages start address %d %s",
> + vbasedev->name, ret, strerror(errno));
> +goto dpl_unlock;
> +}
> +
> +ret = pwrite(vbasedev->fd, &page_size, sizeof(page_size),
> + region->fd_offset + offsetof(struct 
> vfio_device_migration_info,
> +  page_size));
> +if (ret < 0) {
> +error_report("%s: Failed to set dirty page size %d %s",
> + vbasedev->name, ret, strerror(errno));
> +goto dpl_unlock;
> +}
> +
> +ret = pwrite(vbasedev->fd, &total_pfns, sizeof(total_pfns),
> + region->fd_offset + offsetof(struct 
> vfio_device_migration_info,
> +  total_pfns));
> +if (ret < 0) {
> +error_report("%s: Failed to set dirty page total pfns %d %s",
> + vbasedev->name, ret, strerror(errno));
> +goto dpl_unlock;
> +}
> +
> +/* Read copied dirty pfns */
> +ret = pread(vbasedev->fd, &copied_pfns, sizeof(copied_pfns),
> +region->fd_offset + offsetof(struct 
> vfio_device_migration_info,
> + copied_pfns));
> +if (ret < 0) {
> +error_report("%s: Failed to get dirty pages bitmap count %d %s",
> + vbasedev->name, ret, strerror(errno));
> +goto dpl_unlock;
> +}
> +
> +if (copied_pfns == VFIO_DEVICE_DIRTY_PFNS_NONE) {
> +/*
> + * copied_pfns could be 0 if driver doesn't have any page to
> + * report dirty in given range
> + */
> +break;
> +} else if (copied_pfns == VFIO_DEVICE_DIRTY_PFNS_ALL) {
> +/* Mark all pages dirty for this range */
> +cpu_physical_memory_set_dirty_range(start_pfn * page_size,
> +pfn_count * page_size,
> +DIRTY_MEMORY_MIGRATION);
seems pfn_count here is not right
> +break;
> +}
> +
> +bitmap_size = (BITS_TO_LONGS(copied_pfns) + 1) * sizeof(unsigned 
> long);
> +
> +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
> +region->fd_offset + offsetof(struct 
> vfio_device_migration_info,
> + data_offset));
> +if (ret != sizeof(data_offset)) {
> +error_report("%s: Failed to get migration buffer data offset %d",
> + vbasedev->name, ret);
> +goto dpl_unlock;
> +
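
The loop described in the commit message can be sketched in plain C as
below.  This is not the patch's code: the vendor driver is replaced by
a callback, and query_fn, demo_driver and walk_dirty_range are names
invented for the example.  It also encodes one possible reading of the
pfn_count remark above, which is an assumption: when the driver answers
"all dirty" part-way through, only the part of the range that has not
been processed yet is marked, rather than the whole original range.

```c
#include <stdint.h>

#define DIRTY_PFNS_NONE 0     /* driver has nothing to report */
#define DIRTY_PFNS_ALL  (-1)  /* driver asks to mark the range dirty */

/* Hypothetical stand-in for the write/read exchange with the driver. */
typedef int64_t (*query_fn)(uint64_t start_pfn, uint64_t pfn_count);

/* Example driver: reports 8 pfns at a time below pfn 108, then "all". */
static int64_t demo_driver(uint64_t start_pfn, uint64_t pfn_count)
{
    (void)pfn_count;
    return start_pfn < 108 ? 8 : DIRTY_PFNS_ALL;
}

/*
 * Walk [start_pfn, start_pfn + pfn_count) and return how many pfns
 * were covered.  If the driver answers "all dirty", the remaining
 * sub-range is returned through all_start/all_count.
 */
static uint64_t walk_dirty_range(query_fn query,
                                 uint64_t start_pfn, uint64_t pfn_count,
                                 uint64_t *all_start, uint64_t *all_count)
{
    uint64_t done = 0;

    *all_start = 0;
    *all_count = 0;

    while (done < pfn_count) {
        uint64_t start = start_pfn + done;
        uint64_t remaining = pfn_count - done;
        int64_t copied = query(start, remaining);

        if (copied == DIRTY_PFNS_NONE) {
            break;                      /* nothing dirty in the rest */
        }
        if (copied == DIRTY_PFNS_ALL) {
            *all_start = start;         /* only the unprocessed part */
            *all_count = remaining;
            done = pfn_count;
            break;
        }
        if (copied < 0 || (uint64_t)copied > remaining) {
            break;                      /* malformed answer, stop */
        }
        /* A real implementation would read the bitmap here. */
        done += (uint64_t)copied;
    }
    return done;
}
```

With demo_driver and a request for 32 pfns starting at pfn 100, the
first call consumes 8 pfns and the second call marks the remaining 24
pfns starting at pfn 108.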

Re: [Qemu-devel] [PATCH v7 00/13] Add migration support for VFIO device

2019-07-11 Thread Yan Zhao

On Fri, Jul 12, 2019 at 03:08:31AM +0800, Kirti Wankhede wrote:
> 
> 
> On 7/11/2019 9:53 PM, Dr. David Alan Gilbert wrote:
> > * Yan Zhao (yan.y.z...@intel.com) wrote:
> >> On Thu, Jul 11, 2019 at 06:50:12PM +0800, Dr. David Alan Gilbert wrote:
> >>> * Yan Zhao (yan.y.z...@intel.com) wrote:
> >>>> Hi Kirti,
> >>>> There are still unaddressed comments to your patches v4.
> >>>> Would you mind addressing them?
> >>>>
> >>>> 1. should we register two migration interfaces simultaneously
> >>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04750.html)
> >>>
> >>> Please don't do this.
> >>> As far as I'm aware we currently only have one device that does that
> >>> (vmxnet3) and a patch has just been posted that fixes/removes that.
> >>>
> >>> Dave
> >>>
> >> hi Dave,
> >> Thanks for notifying this. but if we want to support postcopy in future,
> >> after device stops, what interface could we use to transfer data of
> >> device state only?
> >> for postcopy, when source device stops, we need to transfer only
> >> necessary device state to target vm before target vm starts, and we
> >> don't want to transfer device memory as we'll do that after target vm
> >> resuming.
> > 
> > Hmm ok, lets see; that's got to happen in the call to:
> > qemu_savevm_state_complete_precopy(fb, false, false);
> > that's made from postcopy_start.
> >  (the false's are iterable_only and inactivate_disks)
> > 
> > and at that time I believe the state is POSTCOPY_ACTIVE, so in_postcopy
> > is true.
> > 
> > If you're doing postcopy, then you'll probably define a has_postcopy()
> > function, so qemu_savevm_state_complete_precopy will skip the
> > save_live_complete_precopy call from its loop for at least two of the
> > reasons in its big if.
> > 
> > So you're right; you need the VMSD for this to happen in the second
> > loop in qemu_savevm_state_complete_precopy.  Hmm.
> > 
> > Now, what worries me, and I don't know the answer, is how the section
> > header for the vmstate and the section header for an iteration look
> > on the stream; how are they different?
> > 
> 
> I don't have a way to test postcopy migration; that is one of the major
> reasons I did not include postcopy support in this patchset, as clearly
> called out in the cover letter.
> This patchset is thoroughly tested for precopy migration.
> If anyone has hardware that supports faults, then I would prefer to add
> postcopy support as an incremental change later, which can be tested
> before submitting.
> 
> Just a suggestion, instead of using VMSD, is it possible to have some
> additional check to call save_live_complete_precopy from
> qemu_savevm_state_complete_precopy?
> 
> 
> >>
> >>>> 2. in each save iteration, how much data is to be saved
> >>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04683.html)
> 
> > how big is the data_size ?
> > if this size is too big, it may take too much time and block others.
> 
> I did mention this in the comment about the structure in the vfio.h
> header. data_size will be provided by the vendor driver and obviously will
> not be greater than the migration region size. The vendor driver should be
> responsible for keeping its solution optimized.
>
If data_size is no bigger than the migration region size, and each
iteration only saves data_size bytes, I'm afraid it will prolong
downtime; after all, the migration region cannot be very big.
Also, if the vendor driver alone determines how much data to save in
each iteration, with no checks in QEMU, other devices' migration time
may be squeezed.
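
As a rough illustration of the kind of check I mean, in plain C (the
function name and the per-iteration budget parameter are hypothetical,
nothing like this exists in the patchset):

```c
#include <stdint.h>

/*
 * Bound the amount of data taken from the vendor driver in one save
 * iteration.  data_size is what the driver reports, region_size is
 * the size of the migration region, and budget is a hypothetical
 * per-iteration limit chosen by QEMU.
 */
static uint64_t clamp_iteration_size(uint64_t data_size,
                                     uint64_t region_size,
                                     uint64_t budget)
{
    uint64_t n = data_size;

    if (n > region_size) {
        n = region_size;    /* never read past the region */
    }
    if (n > budget) {
        n = budget;         /* leave time for other devices */
    }
    return n;
}
```

How the budget would be chosen is a separate question; the sketch only
shows where a QEMU-side check could sit.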

> 
> >>>> 3. do we need extra interface to get data for device state only
> >>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04812.html)
> 
> I don't think so. Opaque Device data from vendor driver can include
> device state and device memory. Vendor driver who is managing his device
> can decide how to place data over the stream.
>
I know the current design treats device data as opaque. But then, to
support postcopy, we may have to add an extra device state such as
in-postcopy, and postcopy is a QEMU state rather than a device state.
To address this elegantly, could we add an extra interface besides
vfio_save_buffer() to get data for device state only?

> >>>> 4. definition of dirty page copied_pfn
> >>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg05592.html)
> 
> 
> This was inline to discussion going with Alex. I addressed the concern
> there. Please check current patchset, which addresses the concerns raised.
>
ok. I saw you also updated the flow in the part. please check my comment
in that patch for detail. but as a suggestion, I think processed_pfns is
a better name compared to copied_pfns :)

> >>>> Also, I'm glad to see that you updated code by following my comments
> >>>> below,
> >>>> but please don't forget to reply my comments next time:)
> 
> I tried to reply top of threads and addressed common concerns raised in
> that. Sorry If I missed any, I'll make sure to point you to my replies
> going

Re: [Qemu-devel] [RFC v4 00/29] vSMMUv3/pSMMUv3 2 stage VFIO integration

2019-07-11 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20190711172845.31035-1-eric.au...@redhat.com/



Hi,

This series failed build test on s390x host. Please find the details below.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e

echo
echo "=== ENV ==="
env

echo
echo "=== PACKAGES ==="
rpm -qa

echo
echo "=== UNAME ==="
uname -a

CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

  CC  aarch64-softmmu/hw/arm/msf2-soc.o
  CC  i386-softmmu/hw/virtio/virtio-pmem.o
  CC  aarch64-softmmu/hw/arm/musca.o
/var/tmp/patchew-tester-tmp-r6tvf9a3/src/hw/virtio/virtio-pmem.c:21:10: fatal 
error: standard-headers/linux/virtio_pmem.h: No such file or directory
   21 | #include "standard-headers/linux/virtio_pmem.h"
  |  ^~
compilation terminated.


The full log is available at
http://patchew.org/logs/20190711172845.31035-1-eric.au...@redhat.com/testing.s390x/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Qemu-devel] [PATCH v7 00/13] Add migration support for VFIO device

2019-07-11 Thread Yan Zhao

On Fri, Jul 12, 2019 at 12:23:15AM +0800, Dr. David Alan Gilbert wrote:
> * Yan Zhao (yan.y.z...@intel.com) wrote:
> > On Thu, Jul 11, 2019 at 06:50:12PM +0800, Dr. David Alan Gilbert wrote:
> > > * Yan Zhao (yan.y.z...@intel.com) wrote:
> > > > Hi Kirti,
> > > > There are still unaddressed comments to your patches v4.
> > > > Would you mind addressing them?
> > > > 
> > > > 1. should we register two migration interfaces simultaneously
> > > > (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04750.html)
> > > 
> > > Please don't do this.
> > > As far as I'm aware we currently only have one device that does that
> > > (vmxnet3) and a patch has just been posted that fixes/removes that.
> > > 
> > > Dave
> > >
> > hi Dave,
> > Thanks for notifying this. but if we want to support postcopy in future,
> > after device stops, what interface could we use to transfer data of
> > device state only?
> > for postcopy, when source device stops, we need to transfer only
> > necessary device state to target vm before target vm starts, and we
> > don't want to transfer device memory as we'll do that after target vm
> > resuming.
> 
> Hmm ok, lets see; that's got to happen in the call to:
> qemu_savevm_state_complete_precopy(fb, false, false);
> that's made from postcopy_start.
>  (the false's are iterable_only and inactivate_disks)
> 
> and at that time I believe the state is POSTCOPY_ACTIVE, so in_postcopy
> is true.
> 
> If you're doing postcopy, then you'll probably define a has_postcopy()
> function, so qemu_savevm_state_complete_precopy will skip the
> save_live_complete_precopy call from its loop for at least two of the
> reasons in its big if.
> 
> So you're right; you need the VMSD for this to happen in the second
> loop in qemu_savevm_state_complete_precopy.  Hmm.
> 
> Now, what worries me, and I don't know the answer, is how the section
> header for the vmstate and the section header for an iteration look
> on the stream; how are they different?
>
Could we name one "vfio" and the other "vfio-vmsd", and use the iteration
interface for device memory data and the vmstate interface for device
state data?

Thanks
Yan
> Dave
> 
> > Thanks
> > Yan
> > 
> > > > 2. in each save iteration, how much data is to be saved
> > > > (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04683.html)
> > > > 3. do we need extra interface to get data for device state only
> > > > (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04812.html)
> > > > 4. definition of dirty page copied_pfn
> > > > (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg05592.html)
> > > > 
> > > > Also, I'm glad to see that you updated code by following my comments 
> > > > below,
> > > > but please don't forget to reply my comments next time:)
> > > > https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg05357.html
> > > > https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg06454.html
> > > > 
> > > > Thanks
> > > > Yan
> > > > 
> > > > On Tue, Jul 09, 2019 at 05:49:07PM +0800, Kirti Wankhede wrote:
> > > > > Add migration support for VFIO device
> > > > > 
> > > > > This Patch set include patches as below:
> > > > > - Define KABI for VFIO device for migration support.
> > > > > - Added save and restore functions for PCI configuration space
> > > > > - Generic migration functionality for VFIO device.
> > > > >   * This patch set adds functionality only for PCI devices, but can be
> > > > > extended to other VFIO devices.
> > > > >   * Added all the basic functions required for pre-copy, 
> > > > > stop-and-copy and
> > > > > resume phases of migration.
> > > > >   * Added state change notifier and from that notifier function, VFIO
> > > > > device's state changed is conveyed to VFIO device driver.
> > > > >   * During save setup phase and resume/load setup phase, migration 
> > > > > region
> > > > > is queried and is used to read/write VFIO device data.
> > > > >   * .save_live_pending and .save_live_iterate are implemented to use 
> > > > > QEMU's
> > > > > functionality of iteration during pre-copy phase.
> > > > >   * In .save_live_complete_precopy, that is in stop-and-copy phase,
> > > > > iteration to read data from VFIO device driver is implemented 
> > > > > till pending
> > > > > bytes returned by driver are not zero.
> > > > >   * Added function to get dirty pages bitmap for the pages which are 
> > > > > used by
> > > > > driver.
> > > > > - Add vfio_listerner_log_sync to mark dirty pages.
> > > > > - Make VFIO PCI device migration capable. If migration region is not 
> > > > > provided by
> > > > >   driver, migration is blocked.
> > > > > 
> > > > > Below is the flow of state change for live migration where states in 
> > > > > brackets
> > > > > represent VM state, migration state and VFIO device state as:
> > > > > (VM state, MIGRATION_STATUS, VFIO_DEVICE_STATE)
> > > > > 
> > > > > Live migration save path:
> > > > > QEMU normal running state
> > > > >

[Qemu-devel] [RISU PATCH v3 17/18] x86.risu: add AVX instructions

2019-07-11 Thread Jan Bobek

Add AVX instructions to the x86 configuration file.
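
For readers unfamiliar with the VEX.128.66.0F.W0 notation used in the
comments of this patch, the following plain C sketch shows how such a
specification maps onto a three-byte VEX prefix according to the Intel
SDM layout.  It is not part of RISU and says nothing about how RISU's
own vex() helper is implemented.  The function name is made up, the
argument names merely mirror the vex() calls below, and two things are
assumptions of the sketch: v is taken to be a register number that is
stored inverted, and the R, X and B extension bits are left unset.

```c
#include <stdint.h>

/*
 * Encode a three-byte (0xC4) VEX prefix.
 *   m: opcode map (0x0F, 0x0F38 or 0x0F3A)
 *   l: vector length in bits (128 or 256)
 *   v: vvvv register number, stored inverted in the prefix
 *   p: implied legacy prefix (0 for none, 0x66, 0xF3 or 0xF2)
 *   w: VEX.W bit
 * Returns 0 on success, -1 on an unsupported argument.
 */
static int vex3_encode(unsigned m, unsigned l, unsigned v,
                       unsigned p, unsigned w, uint8_t out[3])
{
    unsigned mmmmm, pp;

    switch (m) {
    case 0x0F:   mmmmm = 1; break;
    case 0x0F38: mmmmm = 2; break;
    case 0x0F3A: mmmmm = 3; break;
    default:     return -1;
    }
    switch (p) {
    case 0:    pp = 0; break;
    case 0x66: pp = 1; break;
    case 0xF3: pp = 2; break;
    case 0xF2: pp = 3; break;
    default:   return -1;
    }
    if ((l != 128 && l != 256) || v > 15 || w > 1) {
        return -1;
    }

    out[0] = 0xC4;
    /* R, X and B are stored inverted; all ones means no extension. */
    out[1] = (uint8_t)(0xE0 | mmmmm);
    out[2] = (uint8_t)((w << 7) | ((~v & 0xF) << 3) |
                       ((l == 256 ? 1u : 0u) << 2) | pp);
    return 0;
}
```

For example, VEX.128.66.0F.W0 with v = 0 yields the bytes C4 E1 79,
which is then followed by the opcode byte (6E or 7E for VMOVD) and the
ModRM byte.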

Signed-off-by: Jan Bobek 
---
 x86.risu | 1362 ++
 1 file changed, 1362 insertions(+)

diff --git a/x86.risu b/x86.risu
index 177979a..03ffc89 100644
--- a/x86.risu
+++ b/x86.risu
@@ -29,6 +29,12 @@ MOVD SSE2 00001111 011 d 1110 \
   !constraints { data16($_); modrm($_); !(defined $_->{modrm}{reg2} && 
$_->{modrm}{reg2} == REG_RSP) } \
   !memory { $d ? store(size => 4) : load(size => 4); }
 
+# VEX.128.66.0F.W0 6E /r: VMOVD xmm1,r32/m32
+# VEX.128.66.0F.W0 7E /r: VMOVD r32/m32,xmm1
+VMOVD AVX 011 d 1110 \
+  !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0x66, w => 0); 
modrm($_); !(defined $_->{modrm}{reg2} && $_->{modrm}{reg2} == REG_RSP) } \
+  !memory { $d ? store(size => 4) : load(size => 4); }
+
 # NP REX.W + 0F 6E /r: MOVQ mm,r/m64
 # NP REX.W + 0F 7E /r: MOVQ r/m64,mm
 MOVQ MMX 00001111 011 d 1110 \
@@ -41,6 +47,12 @@ MOVQ SSE2 00001111 011 d 1110 \
   !constraints { data16($_); rex($_, w => 1); modrm($_); !(defined 
$_->{modrm}{reg2} && $_->{modrm}{reg2} == REG_RSP) } \
   !memory { $d ? store(size => 8) : load(size => 8); }
 
+# VEX.128.66.0F.W1 6E /r: VMOVQ xmm1,r64/m64
+# VEX.128.66.0F.W1 7E /r: VMOVQ r64/m64,xmm1
+VMOVQ AVX 011 d 1110 \
+  !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0x66, w => 1); 
modrm($_); !(defined $_->{modrm}{reg2} && $_->{modrm}{reg2} == REG_RSP) } \
+  !memory { $d ? store(size => 8) : load(size => 8); }
+
 # NP 0F 6F /r: MOVQ mm, mm/m64
 # NP 0F 7F /r: MOVQ mm/m64, mm
 MOVQ_mm MMX 00001111 011 d 1111 \
@@ -52,59 +64,121 @@ MOVQ_xmm1 SSE2 00001111 01111110 \
   !constraints { rep($_); modrm($_); 1 } \
   !memory { load(size => 8); }
 
+# VEX.128.F3.0F.WIG 7E /r: VMOVQ xmm1, xmm2/m64
+VMOVQ_xmm1 AVX 01111110 \
+  !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0xF3); modrm($_); 1 
} \
+  !memory { load(size => 8); }
+
 # 66 0F D6 /r: MOVQ xmm2/m64, xmm1
 MOVQ_xmm2 SSE2 00001111 11010110 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { store(size => 8); }
 
+# VEX.128.66.0F.WIG D6 /r: VMOVQ xmm1/m64, xmm2
+VMOVQ_xmm2 AVX 11010110 \
+  !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0x66); modrm($_); 1 
} \
+  !memory { store(size => 8); }
+
 # NP 0F 28 /r: MOVAPS xmm1, xmm2/m128
 # NP 0F 29 /r: MOVAPS xmm2/m128, xmm1
 MOVAPS SSE 00001111 0010100 d \
   !constraints { modrm($_); 1 } \
   !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 
16); }
 
+# VEX.128.0F.WIG 28 /r: VMOVAPS xmm1, xmm2/m128
+# VEX.128.0F.WIG 29 /r: VMOVAPS xmm2/m128, xmm1
+VMOVAPS AVX 0010100 d \
+  !constraints { vex($_, m => 0x0F, l => 128, v => 0); modrm($_); 1 } \
+  !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 
16); }
+
 # 66 0F 28 /r: MOVAPD xmm1, xmm2/m128
 # 66 0F 29 /r: MOVAPD xmm2/m128, xmm1
 MOVAPD SSE2 00001111 0010100 d \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 
16); }
 
+# VEX.128.66.0F.WIG 28 /r: VMOVAPD xmm1, xmm2/m128
+# VEX.128.66.0F.WIG 29 /r: VMOVAPD xmm2/m128, xmm1
+VMOVAPD AVX 0010100 d \
+  !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0x66); modrm($_); 1 
} \
+  !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 
16); }
+
 # 66 0F 6F /r: MOVDQA xmm1, xmm2/m128
 # 66 0F 7F /r: MOVDQA xmm2/m128, xmm1
 MOVDQA SSE2 00001111 011 d 1111 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 
16); }
 
+# VEX.128.66.0F.WIG 6F /r: VMOVDQA xmm1, xmm2/m128
+# VEX.128.66.0F.WIG 7F /r: VMOVDQA xmm2/m128, xmm1
+VMOVDQA AVX 011 d 1111 \
+  !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0x66); modrm($_); 1 
} \
+  !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 
16); }
+
 # NP 0F 10 /r: MOVUPS xmm1, xmm2/m128
 # NP 0F 11 /r: MOVUPS xmm2/m128, xmm1
 MOVUPS SSE 00001111 0001000 d \
   !constraints { modrm($_); 1 } \
   !memory { $d ? store(size => 16) : load(size => 16); }
 
+# VEX.128.0F.WIG 10 /r: VMOVUPS xmm1, xmm2/m128
+# VEX.128.0F.WIG 11 /r: VMOVUPS xmm2/m128, xmm1
+VMOVUPS AVX 0001000 d \
+  !constraints { vex($_, m => 0x0F, l => 128, v => 0); modrm($_); 1 } \
+  !memory { $d ? store(size => 16) : load(size => 16); }
+
 # 66 0F 10 /r: MOVUPD xmm1, xmm2/m128
 # 66 0F 11 /r: MOVUPD xmm2/m128, xmm1
 MOVUPD SSE2 00001111 0001000 d \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { $d ? store(size => 16) : load(size => 16); }
 
+# VEX.128.66.0F.WIG 10 /r: VMOVUPD xmm1, xmm2/m128
+# VEX.128.66.0F.WIG 11 /r: VMOVUPD xmm2/m128, xmm1
+VMOVUPD AVX 0001000 d \
+  !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0x66); modrm($_); 1 
} \
+  !memory { $d ? store(size => 16) : load(size => 16); }
+
 # F3 0F 6F /r: MOVDQU xmm1,xmm2/m128
 # F3 0F 7F /r: MOVDQU xmm2/m128,xmm1
 MOVDQU SSE2 00001111 011 d 1111 \
   !constraints { rep($_); modrm($_); 1 } \

[Qemu-devel] [RISU PATCH v3 18/18] x86.risu: add AVX2 instructions

2019-07-11 Thread Jan Bobek

Add AVX2 instructions to the configuration file.

Signed-off-by: Jan Bobek 
---
 x86.risu | 1239 ++
 1 file changed, 1239 insertions(+)

diff --git a/x86.risu b/x86.risu
index 03ffc89..1705a8e 100644
--- a/x86.risu
+++ b/x86.risu
@@ -91,6 +91,12 @@ VMOVAPS AVX 0010100 d \
   !constraints { vex($_, m => 0x0F, l => 128, v => 0); modrm($_); 1 } \
   !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 
16); }
 
+# VEX.256.0F.WIG 28 /r: VMOVAPS ymm1, ymm2/m256
+# VEX.256.0F.WIG 29 /r: VMOVAPS ymm2/m256, ymm1
+VMOVAPS AVX2 0010100 d \
+  !constraints { vex($_, m => 0x0F, l => 256, v => 0); modrm($_); 1 } \
+  !memory { $d ? store(size => 32, align => 32) : load(size => 32, align => 
32); }
+
 # 66 0F 28 /r: MOVAPD xmm1, xmm2/m128
 # 66 0F 29 /r: MOVAPD xmm2/m128, xmm1
 MOVAPD SSE2 00001111 0010100 d \
@@ -103,6 +109,12 @@ VMOVAPD AVX 0010100 d \
   !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0x66); modrm($_); 1 
} \
   !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 
16); }
 
+# VEX.256.66.0F.WIG 28 /r: VMOVAPD ymm1, ymm2/m256
+# VEX.256.66.0F.WIG 29 /r: VMOVAPD ymm2/m256, ymm1
+VMOVAPD AVX2 0010100 d \
+  !constraints { vex($_, m => 0x0F, l => 256, v => 0, p => 0x66); modrm($_); 1 
} \
+  !memory { $d ? store(size => 32, align => 32) : load(size => 32, align => 
32); }
+
 # 66 0F 6F /r: MOVDQA xmm1, xmm2/m128
 # 66 0F 7F /r: MOVDQA xmm2/m128, xmm1
 MOVDQA SSE2 00001111 011 d 1111 \
@@ -115,6 +127,12 @@ VMOVDQA AVX 011 d 1111 \
   !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0x66); modrm($_); 1 
} \
   !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 
16); }
 
+# VEX.256.66.0F.WIG 6F /r: VMOVDQA ymm1, ymm2/m256
+# VEX.256.66.0F.WIG 7F /r: VMOVDQA ymm2/m256, ymm1
+VMOVDQA AVX2 011 d 1111 \
+  !constraints { vex($_, m => 0x0F, l => 256, v => 0, p => 0x66); modrm($_); 1 
} \
+  !memory { $d ? store(size => 32, align => 32) : load(size => 32, align => 
32); }
+
 # NP 0F 10 /r: MOVUPS xmm1, xmm2/m128
 # NP 0F 11 /r: MOVUPS xmm2/m128, xmm1
 MOVUPS SSE 00001111 0001000 d \
@@ -127,6 +145,12 @@ VMOVUPS AVX 0001000 d \
   !constraints { vex($_, m => 0x0F, l => 128, v => 0); modrm($_); 1 } \
   !memory { $d ? store(size => 16) : load(size => 16); }
 
+# VEX.256.0F.WIG 10 /r: VMOVUPS ymm1, ymm2/m256
+# VEX.256.0F.WIG 11 /r: VMOVUPS ymm2/m256, ymm1
+VMOVUPS AVX2 0001000 d \
+  !constraints { vex($_, m => 0x0F, l => 256, v => 0); modrm($_); 1 } \
+  !memory { $d ? store(size => 32) : load(size => 32); }
+
 # 66 0F 10 /r: MOVUPD xmm1, xmm2/m128
 # 66 0F 11 /r: MOVUPD xmm2/m128, xmm1
 MOVUPD SSE2 00001111 0001000 d \
@@ -139,6 +163,12 @@ VMOVUPD AVX 0001000 d \
   !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0x66); modrm($_); 1 
} \
   !memory { $d ? store(size => 16) : load(size => 16); }
 
+# VEX.256.66.0F.WIG 10 /r: VMOVUPD ymm1, ymm2/m256
+# VEX.256.66.0F.WIG 11 /r: VMOVUPD ymm2/m256, ymm1
+VMOVUPD AVX2 0001000 d \
+  !constraints { vex($_, m => 0x0F, l => 256, v => 0, p => 0x66); modrm($_); 1 
} \
+  !memory { $d ? store(size => 32) : load(size => 32); }
+
 # F3 0F 6F /r: MOVDQU xmm1,xmm2/m128
 # F3 0F 7F /r: MOVDQU xmm2/m128,xmm1
 MOVDQU SSE2 00001111 011 d 1111 \
@@ -151,6 +181,12 @@ VMOVDQU AVX 011 d 1111 \
   !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0xF3); modrm($_); 1 
} \
   !memory { $d ? store(size => 16) : load(size => 16); }
 
+# VEX.256.F3.0F.WIG 6F /r: VMOVDQU ymm1,ymm2/m256
+# VEX.256.F3.0F.WIG 7F /r: VMOVDQU ymm2/m256,ymm1
+VMOVDQU AVX2 011 d 1111 \
+  !constraints { vex($_, m => 0x0F, l => 256, v => 0, p => 0xF3); modrm($_); 1 
} \
+  !memory { $d ? store(size => 32) : load(size => 32); }
+
 # F3 0F 10 /r: MOVSS xmm1, xmm2/m32
 # F3 0F 11 /r: MOVSS xmm2/m32, xmm1
 MOVSS SSE 00001111 0001000 d \
@@ -263,6 +299,10 @@ PMOVMSKB SSE2 00001111 11010111 \
 VPMOVMSKB AVX 11010111 \
   !constraints { vex($_, m => 0x0F, l => 128, v => 0, p => 0x66); modrm($_); 
$_->{modrm}{reg} != REG_RSP && defined $_->{modrm}{reg2} }
 
+# VEX.256.66.0F.WIG D7 /r: VPMOVMSKB reg, ymm1
+VPMOVMSKB AVX2 11010111 \
+  !constraints { vex($_, m => 0x0F, l => 256, v => 0, p => 0x66); modrm($_); 
$_->{modrm}{reg} != REG_RSP && defined $_->{modrm}{reg2} }
+
 # NP 0F 50 /r: MOVMSKPS reg, xmm
 MOVMSKPS SSE 00001111 01010000 \
   !constraints { modrm($_); $_->{modrm}{reg} != REG_RSP && defined 
$_->{modrm}{reg2} }
@@ -271,6 +311,10 @@ MOVMSKPS SSE 00001111 01010000 \
 VMOVMSKPS AVX 01010000 \
   !constraints { vex($_, m => 0x0F, l => 128, v => 0); modrm($_); 
$_->{modrm}{reg} != REG_RSP && defined $_->{modrm}{reg2} }
 
+# VEX.256.0F.WIG 50 /r: VMOVMSKPS reg, ymm2
+VMOVMSKPS AVX2 01010000 \
+  !constraints { vex($_, m => 0x0F, l => 256, v => 0); modrm($_); 
$_->{modrm}{reg} != REG_RSP && defined $_->{modrm}{reg2} }
+
 # 66 0F 50 /r: MOVMSKPD reg, xmm
 MOVMSKPD SSE2 00001111 01010000 \
   !constraints { data16($_); modrm($_); $_->{modrm}{reg} != REG_RSP && defined

[Qemu-devel] [RISU PATCH v3 12/18] x86.risu: add SSE2 instructions

2019-07-11 Thread Jan Bobek

Add SSE2 instructions to the x86 configuration file.
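
As a reading aid for the bit patterns in the entries below: a pattern
such as "011 d 1110" describes one opcode byte, with the field d
selecting between the two forms listed in the comment above the entry
(6E and 7E in the MOVD case).  The plain C sketch below expands such a
pattern for a given d.  It is not RISU's parser; the function name is
invented, and only the field letter d is handled.

```c
#include <stdint.h>

/*
 * Expand an opcode bit pattern such as "011 d 1110" into a byte.
 * '0' and '1' are literal bits, 'd' takes the value of the direction
 * bit, and spaces are ignored.  Returns -1 if the pattern does not
 * describe exactly 8 bits or contains an unknown character.
 */
static int opcode_from_pattern(const char *pat, unsigned d)
{
    unsigned value = 0;
    int bits = 0;

    for (; *pat; pat++) {
        unsigned bit;

        if (*pat == ' ') {
            continue;
        } else if (*pat == '0') {
            bit = 0;
        } else if (*pat == '1') {
            bit = 1;
        } else if (*pat == 'd') {
            bit = d & 1;
        } else {
            return -1;
        }
        value = (value << 1) | bit;
        bits++;
    }
    return bits == 8 ? (int)value : -1;
}
```

With d = 0 the pattern "011 d 1110" gives 0x6E (the load form) and with
d = 1 it gives 0x7E (the store form), matching the $d ? store : load
expression in the !memory blocks.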

Signed-off-by: Jan Bobek 
---
 x86.risu | 734 +++
 1 file changed, 734 insertions(+)

diff --git a/x86.risu b/x86.risu
index 2d963fc..b9d424e 100644
--- a/x86.risu
+++ b/x86.risu
@@ -23,48 +23,120 @@ MOVD MMX 00001111 011 d 1110 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; !(defined 
$_->{modrm}{reg2} && $_->{modrm}{reg2} == REG_RSP) } \
   !memory { $d ? store(size => 4) : load(size => 4); }
 
+# 66 0F 6E /r: MOVD xmm,r/m32
+# 66 0F 7E /r: MOVD r/m32,xmm
+MOVD SSE2 00001111 011 d 1110 \
+  !constraints { data16($_); modrm($_); !(defined $_->{modrm}{reg2} && 
$_->{modrm}{reg2} == REG_RSP) } \
+  !memory { $d ? store(size => 4) : load(size => 4); }
+
 # NP REX.W + 0F 6E /r: MOVQ mm,r/m64
 # NP REX.W + 0F 7E /r: MOVQ r/m64,mm
 MOVQ MMX  011 d 1110 \
   !constraints { rex($_, w => 1); modrm($_); $_->{modrm}{reg} &= 0b111; 
!(defined $_->{modrm}{reg2} && $_->{modrm}{reg2} == REG_RSP) } \
   !memory { $d ? store(size => 8) : load(size => 8); }
 
+# 66 REX.W 0F 6E /r: MOVQ xmm,r/m64
+# 66 REX.W 0F 7E /r: MOVQ r/m64,xmm
+MOVQ SSE2  011 d 1110 \
+  !constraints { data16($_); rex($_, w => 1); modrm($_); !(defined 
$_->{modrm}{reg2} && $_->{modrm}{reg2} == REG_RSP) } \
+  !memory { $d ? store(size => 8) : load(size => 8); }
+
 # NP 0F 6F /r: MOVQ mm, mm/m64
 # NP 0F 7F /r: MOVQ mm/m64, mm
 MOVQ_mm MMX  011 d  \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 
0b111 if defined $_->{modrm}{reg2}; 1 } \
   !memory { $d ? store(size => 8) : load(size => 8); }
 
+# F3 0F 7E /r: MOVQ xmm1, xmm2/m64
+MOVQ_xmm1 SSE2  0110 \
+  !constraints { rep($_); modrm($_); 1 } \
+  !memory { load(size => 8); }
+
+# 66 0F D6 /r: MOVQ xmm2/m64, xmm1
+MOVQ_xmm2 SSE2  11010110 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { store(size => 8); }
+
 # NP 0F 28 /r: MOVAPS xmm1, xmm2/m128
 # NP 0F 29 /r: MOVAPS xmm2/m128, xmm1
 MOVAPS SSE 00001111 0010100 d \
   !constraints { modrm($_); 1 } \
   !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 16); }
 
+# 66 0F 28 /r: MOVAPD xmm1, xmm2/m128
+# 66 0F 29 /r: MOVAPD xmm2/m128, xmm1
+MOVAPD SSE2 00001111 0010100 d \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 16); }
+
+# 66 0F 6F /r: MOVDQA xmm1, xmm2/m128
+# 66 0F 7F /r: MOVDQA xmm2/m128, xmm1
+MOVDQA SSE2 00001111 011 d 1111 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 16); }
+
 # NP 0F 10 /r: MOVUPS xmm1, xmm2/m128
 # NP 0F 11 /r: MOVUPS xmm2/m128, xmm1
 MOVUPS SSE 00001111 0001000 d \
   !constraints { modrm($_); 1 } \
   !memory { $d ? store(size => 16) : load(size => 16); }
 
+# 66 0F 10 /r: MOVUPD xmm1, xmm2/m128
+# 66 0F 11 /r: MOVUPD xmm2/m128, xmm1
+MOVUPD SSE2 00001111 0001000 d \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { $d ? store(size => 16) : load(size => 16); }
+
+# F3 0F 6F /r: MOVDQU xmm1,xmm2/m128
+# F3 0F 7F /r: MOVDQU xmm2/m128,xmm1
+MOVDQU SSE2 00001111 011 d 1111 \
+  !constraints { rep($_); modrm($_); 1 } \
+  !memory { $d ? store(size => 16) : load(size => 16); }
+
 # F3 0F 10 /r: MOVSS xmm1, xmm2/m32
 # F3 0F 11 /r: MOVSS xmm2/m32, xmm1
 MOVSS SSE 00001111 0001000 d \
   !constraints { rep($_); modrm($_); 1 } \
   !memory { $d ? store(size => 4) : load(size => 4); }
 
+# F2 0F 10 /r: MOVSD xmm1, xmm2/m64
+# F2 0F 11 /r: MOVSD xmm1/m64, xmm2
+MOVSD SSE2 00001111 0001000 d \
+  !constraints { repne($_); modrm($_); 1 } \
+  !memory { $d ? store(size => 8): load(size => 8); }
+
+# F3 0F D6 /r: MOVQ2DQ xmm, mm
+MOVQ2DQ SSE2 00001111 11010110 \
+  !constraints { rep($_); modrm($_); $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; defined $_->{modrm}{reg2} }
+
+# F2 0F D6 /r: MOVDQ2Q mm, xmm
+MOVDQ2Q SSE2 00001111 11010110 \
+  !constraints { repne($_); modrm($_); $_->{modrm}{reg} &= 0b111; defined $_->{modrm}{reg2} }
+
 # NP 0F 12 /r: MOVLPS xmm1, m64
 # 0F 13 /r: MOVLPS m64, xmm1
 MOVLPS SSE 00001111 0001001 d \
   !constraints { modrm($_); !defined $_->{modrm}{reg2} } \
   !memory { $d ? store(size => 8) : load(size => 8); }
 
+# 66 0F 12 /r: MOVLPD xmm1,m64
+# 66 0F 13 /r: MOVLPD m64,xmm1
+MOVLPD SSE2 00001111 0001001 d \
+  !constraints { data16($_); modrm($_); !defined $_->{modrm}{reg2} } \
+  !memory { $d ? store(size => 8) : load(size => 8); }
+
 # NP 0F 16 /r: MOVHPS xmm1, m64
 # NP 0F 17 /r: MOVHPS m64, xmm1
 MOVHPS SSE 00001111 0001011 d \
   !constraints { modrm($_); !defined $_->{modrm}{reg2} } \
   !memory { $d ? store(size => 8) : load(size => 8); }
 
+# 66 0F 16 /r: MOVHPD xmm1, m64
+# 66 0F 17 /r: MOVHPD m64, xmm1
+MOVHPD SSE2 00001111 0001011 d \
+  !constraints { data16($_); modrm($_); !defined $_->{modrm}{reg2} } \
+  !memory { $d ? store(size => 8) : load(size => 8); }
+
 # NP 0F 16 /r:

[Qemu-devel] [RISU PATCH v3 15/18] x86.risu: add SSE4.1 and SSE4.2 instructions

2019-07-11 Thread Jan Bobek

Add SSE4.1 and SSE4.2 instructions to the x86 configuration file.

Signed-off-by: Jan Bobek 
---
 x86.risu | 270 +++
 1 file changed, 270 insertions(+)

diff --git a/x86.risu b/x86.risu
index 6f89a80..bc6636e 100644
--- a/x86.risu
+++ b/x86.risu
@@ -486,6 +486,11 @@ PMULLW SSE2 00001111 11010101 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { load(size => 16, align => 16); }
 
+# 66 0F 38 40 /r: PMULLD xmm1, xmm2/m128
+PMULLD SSE4_1 00001111 00111000 01000000 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F E5 /r: PMULHW mm, mm/m64
 PMULHW MMX 00001111 11100101 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
@@ -506,6 +511,11 @@ PMULHUW SSE2 00001111 11100100 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { load(size => 16, align => 16); }
 
+# 66 0F 38 28 /r: PMULDQ xmm1, xmm2/m128
+PMULDQ SSE4_1 00001111 00111000 00101000 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F F4 /r: PMULUDQ mm1, mm2/m64
 PMULUDQ_mm SSE2 00001111 11110100 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
@@ -636,6 +646,21 @@ PMINUB SSE2 00001111 11011010 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { load(size => 16, align => 16); }
 
+# 66 0F 38 3A /r: PMINUW xmm1, xmm2/m128
+PMINUW SSE4_1 00001111 00111000 00111010 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F 38 3B /r: PMINUD xmm1, xmm2/m128
+PMINUD SSE4_1 00001111 00111000 00111011 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F 38 38 /r: PMINSB xmm1, xmm2/m128
+PMINSB SSE4_1 00001111 00111000 00111000 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F EA /r: PMINSW mm1, mm2/m64
 PMINSW SSE 00001111 11101010 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
@@ -646,6 +671,11 @@ PMINSW SSE2 00001111 11101010 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { load(size => 16, align => 16); }
 
+# 66 0F 38 39 /r: PMINSD xmm1, xmm2/m128
+PMINSD SSE4_1 00001111 00111000 00111001 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F 5D /r: MINPS xmm1, xmm2/m128
 MINPS SSE 00001111 01011101 \
   !constraints { modrm($_); 1 } \
@@ -666,6 +696,11 @@ MINSD SSE2 00001111 01011101 \
   !constraints { repne($_); modrm($_); 1 } \
   !memory { load(size => 8); }
 
+# 66 0F 38 41 /r: PHMINPOSUW xmm1, xmm2/m128
+PHMINPOSUW SSE4_1 00001111 00111000 01000001 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F DE /r: PMAXUB mm1, mm2/m64
 PMAXUB SSE 00001111 11011110 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
@@ -676,6 +711,21 @@ PMAXUB SSE2 00001111 11011110 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { load(size => 16, align => 16); }
 
+# 66 0F 38 3E /r: PMAXUW xmm1, xmm2/m128
+PMAXUW SSE4_1 00001111 00111000 00111110 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F 38 3F /r: PMAXUD xmm1, xmm2/m128
+PMAXUD SSE4_1 00001111 00111000 00111111 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F 38 3C /r: PMAXSB xmm1, xmm2/m128
+PMAXSB SSE4_1 00001111 00111000 00111100 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F EE /r: PMAXSW mm1, mm2/m64
 PMAXSW SSE 00001111 11101110 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
@@ -686,6 +736,11 @@ PMAXSW SSE2 00001111 11101110 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { load(size => 16, align => 16); }
 
+# 66 0F 38 3D /r: PMAXSD xmm1, xmm2/m128
+PMAXSD SSE4_1 00001111 00111000 00111101 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F 5F /r: MAXPS xmm1, xmm2/m128
 MAXPS SSE 00001111 01011111 \
   !constraints { modrm($_); 1 } \
@@ -736,6 +791,11 @@ PSADBW SSE2 00001111 11110110 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { load(size => 16, align => 16); }
 
+# 66 0F 3A 42 /r ib: MPSADBW xmm1, xmm2/m128, imm8
+MPSADBW SSE4_1 00001111 00111010 01000010 \
+  !constraints { data16($_); modrm($_); imm($_, width => 8); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F 38 1C /r: PABSB mm1, mm2/m64
 PABSB_mm SSSE3 00001111 00111000 00011100 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111

[Qemu-devel] [RISU PATCH v3 04/18] risugen_x86_constraints: add module

2019-07-11 Thread Jan Bobek

The module risugen_x86_constraints.pm provides the environment for
evaluating x86 "!constraints" blocks. Its single exported function,
eval_constraints_block, is the entry point.
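
As a rough illustration of what a "!constraints" helper does, here is a
minimal Python sketch modelled on the vex() helper in the diff below:
fields the config entry leaves unspecified are filled in at random. The
Python names and the use of the random module are assumptions made for
this sketch; the patch itself is Perl and uses randint from
risugen_common.

```python
import random


def vex(insn, is_x86_64=True, **fields):
    """Sketch of the vex() constraint helper: randomize unspecified fields."""
    # Register ids are 4 bits wide in 64-bit mode, 3 bits otherwise.
    regidw = 4 if is_x86_64 else 3

    # Only L, V and W are randomized; M and P act as opcode bits and
    # R/X/B are derived later when the instruction is assembled.
    if "l" not in fields:
        fields["l"] = random.choice([128, 256])
    if "v" not in fields:
        fields["v"] = random.getrandbits(regidw)
    if "w" not in fields:
        fields["w"] = random.getrandbits(1)

    insn["vex"] = fields
    return insn
```

A config entry that pins l and v, as the VPMOVMSKB entries do, would
therefore only have w chosen at random.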

Signed-off-by: Jan Bobek 
---
 risugen_x86_constraints.pm | 154 +
 1 file changed, 154 insertions(+)
 create mode 100644 risugen_x86_constraints.pm

diff --git a/risugen_x86_constraints.pm b/risugen_x86_constraints.pm
new file mode 100644
index 0000000..a4ee687
--- /dev/null
+++ b/risugen_x86_constraints.pm
@@ -0,0 +1,154 @@
+#!/usr/bin/perl -w
+###
+# Copyright (c) 2019 Jan Bobek
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+# Jan Bobek - initial implementation
+###
+
+# risugen_x86_constraints -- risugen_x86's helper module for "!constraints" blocks
+package risugen_x86_constraints;
+
+use strict;
+use warnings;
+
+use risugen_common;
+use risugen_x86_asm;
+
+our @ISA= qw(Exporter);
+our @EXPORT = qw(eval_constraints_block);
+
+my $is_x86_64;
+
+sub data16($%)
+{
+my ($insn, %data16) = @_;
+$insn->{data16} = \%data16;
+}
+
+sub rep($%)
+{
+my ($insn, %rep) = @_;
+$insn->{rep} = \%rep;
+}
+
+sub repne($%)
+{
+my ($insn, %repne) = @_;
+$insn->{repne} = \%repne;
+}
+
+sub rex($%)
+{
+my ($insn, %rex) = @_;
+# It doesn't make sense to randomize any REX fields, since REX.W
+# is opcode-like and REX.R/.X/.B are encoded automatically by
+# risugen_x86_asm.
+$insn->{rex} = \%rex;
+}
+
+sub vex($%)
+{
+my ($insn, %vex) = @_;
+my $regidw = $is_x86_64 ? 4 : 3;
+
+# There is no point in randomizing other VEX fields, since
+# VEX.R/.X/.B are encoded automatically by risugen_x86_asm, and
+# VEX.M/.P are opcodes.
+$vex{l} = randint(width => 1) ? 256 : 128 unless defined $vex{l};
+$vex{v} = randint(width => $regidw)   unless defined $vex{v};
+$vex{w} = randint(width => 1) unless defined $vex{w};
+$insn->{vex} = \%vex;
+}
+
+sub modrm_($%)
+{
+    my ($insn, %args) = @_;
+    my $regidw = $is_x86_64 ? 4 : 3;
+
+    my %modrm = ();
+    if (defined $args{reg}) {
+        # This makes the config file syntax a bit more accommodating
+        # in cases where MODRM.REG is an opcode extension field.
+        $modrm{reg} = $args{reg};
+    } else {
+        $modrm{reg} = randint(width => $regidw);
+    }
+
+    # There is also a displacement-only form, but we don't know
+    # absolute address of the memblock, so we cannot test it.
+    my $form = int(rand(4));
+    if ($form == 0) {
+        $modrm{reg2} = randint(width => $regidw);
+    } else {
+        $modrm{base} = randint(width => $regidw);
+
+        if ($form == 2) {
+            $modrm{base}        = randint(width => $regidw);
+            $modrm{disp}{value} = randint(width => 8, signed => 1);
+            $modrm{disp}{width} = 8;
+        } elsif ($form == 3) {
+            $modrm{base}        = randint(width => $regidw);
+            $modrm{disp}{value} = randint(width => 32, signed => 1);
+            $modrm{disp}{width} = 32;
+        }
+
+        my $have_index = int(rand(2));
+        if ($have_index) {
+            my $indexk      = $args{indexk};
+            $modrm{ss}      = randint(width => 2);
+            $modrm{$indexk} = randint(width => $regidw);
+        }
+    }
+
+    $insn->{modrm} = \%modrm;
+}
+
+sub modrm($%)
+{
+my ($insn, %args) = @_;
+modrm_($insn, indexk => 'index', %args);
+}
+
+sub modrm_vsib($%)
+{
+my ($insn, %args) = @_;
+modrm_($insn, indexk => 'vindex', %args);
+}
+
+sub imm($%)
+{
+my ($insn, %args) = @_;
+$insn->{imm}{value} = randint(%args);
+$insn->{imm}{width} = $args{width};
+}
+
+sub eval_constraints_block(%)
+{
+my (%args) = @_;
+my $rec = $args{rec};
+my $insn = $args{insn};
+my $insnname = $rec->{name};
+my $opcode = $insn->{opcode}{value};
+
+$is_x86_64 = $args{is_x86_64};
+
+my $constraint = $rec->{blocks}{"constraints"};
+if (defined $constraint) {
+# user-specified constraint: evaluate in an environment
+# with variables set corresponding to the variable fields.
+my %env = extract_fields($opcode, $rec);
+# set the variable $_ to the instruction in question
+$env{_} = $insn;
+
+return eval_block($insnname, "constraints", $constraint, \%env);
+} else {
+return 1;
+}
+}
+
+1;
-- 
2.20.1

[Qemu-devel] [RISU PATCH v3 02/18] risugen_common: split eval_with_fields into extract_fields and eval_block

2019-07-11 Thread Jan Bobek

extract_fields extracts named variable fields from an opcode; it
returns a hash which can then be passed as the environment parameter
to eval_block. More importantly, this allows the caller to augment
the block environment with more variables if they wish to do so.
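
The split can be pictured with a small Python sketch. It assumes the
same (name, bit position, mask) tuples that the Perl code reads from
$rec->{fields}; Python's eval on an expression stands in for Perl's
string eval, and the block text and the extra "size" variable are made
up for the example.

```python
def extract_fields(insn, fields):
    """Return {name: value} for each (name, pos, mask) field of insn."""
    return {var: (insn >> pos) & mask for var, pos, mask in fields}


def eval_block(block, env):
    """Evaluate a block (here: a Python expression) with env as its variables."""
    return eval(block, {}, dict(env))


# One variable field "d" in bit 0 of the opcode.
env = extract_fields(0b00010001, [("d", 0, 0b1)])
# The caller may add variables before evaluating the block.
env["size"] = 4
result = eval_block("('store' if d else 'load', size)", env)
```

Because the environment is an ordinary hash between the two calls, a
caller such as the x86 backend can add entries (for example the
instruction itself) without touching either function.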

Signed-off-by: Jan Bobek 
---
 risugen_arm.pm|  6 +++--
 risugen_common.pm | 64 ---
 risugen_m68k.pm   |  3 ++-
 risugen_ppc64.pm  |  6 +++--
 4 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/risugen_arm.pm b/risugen_arm.pm
index 8d423b1..23a468c 100644
--- a/risugen_arm.pm
+++ b/risugen_arm.pm
@@ -992,7 +992,8 @@ sub gen_one_insn($$)
 if (defined $constraint) {
 # user-specified constraint: evaluate in an environment
 # with variables set corresponding to the variable fields.
-my $v = eval_with_fields($insnname, $insn, $rec, "constraints", $constraint);
+my %env = extract_fields($insn, $rec);
+my $v = eval_block($insnname, "constraints", $constraint, \%env);
 if (!$v) {
 $constraintfailures++;
 if ($constraintfailures > 1) {
@@ -1020,7 +1021,8 @@ sub gen_one_insn($$)
 } else {
 align(4);
 }
-$basereg = eval_with_fields($insnname, $insn, $rec, "memory", $memblock);
+my %env = extract_fields($insn, $rec);
+$basereg = eval_block($insnname, "memory", $memblock, \%env);
 
 if ($is_aarch64) {
 data_barrier();
diff --git a/risugen_common.pm b/risugen_common.pm
index d63250a..3f927ef 100644
--- a/risugen_common.pm
+++ b/risugen_common.pm
@@ -25,8 +25,8 @@ BEGIN {
 our @ISA = qw(Exporter);
 our @EXPORT = qw(open_bin close_bin set_endian insn32 insn16
$bytecount insnv randint progress_start
-   progress_update progress_end
-   eval_with_fields is_pow_of_2 sextract ctz
+   progress_update progress_end extract_fields
+   eval_block is_pow_of_2 sextract ctz
dump_insn_details);
 }
 
@@ -138,36 +138,48 @@ sub progress_end()
 $| = 0;
 }
 
-sub eval_with_fields($$$$$) {
-# Evaluate the given block in an environment with Perl variables
-# set corresponding to the variable fields for the insn.
-# Return the result of the eval; we die with a useful error
-# message in case of syntax error.
-#
-# At the moment we just evaluate the string in the environment
-# of the calling package.
-# What we *ought* to do here is to give the config snippets
-# their own package, and explicitly import into it only the
-# functions that we want to be accessible to the config.
-# That would provide better separation and an explicitly set up
-# environment that doesn't allow config file code to accidentally
-# change state it shouldn't have access to, and avoid the need to
-# use 'caller' to get the package name of our calling function.
-my ($insnname, $insn, $rec, $blockname, $block) = @_;
+sub extract_fields($$)
+{
+my ($insn, $rec) = @_;
+
+my %fields = ();
+for my $tuple (@{ $rec->{fields} }) {
+my ($var, $pos, $mask) = @$tuple;
+$fields{$var} = ($insn >> $pos) & $mask;
+}
+return %fields;
+}
+
+# Evaluate the given block in an environment with Perl variables set
+# corresponding to env. Return the result of the eval; we die with a
+# useful error message in case of syntax error.
+#
+# At the moment we just evaluate the string in the environment of the
+# calling package. What we *ought* to do here is to give the config
+# snippets their own package, and explicitly import into it only the
+# functions that we want to be accessible to the config.  That would
+# provide better separation and an explicitly set up environment that
+# doesn't allow config file code to accidentally change state it
+# shouldn't have access to, and avoid the need to use 'caller' to get
+# the package name of our calling function.
+sub eval_block($$$$)
+{
+my ($insnname, $blockname, $block, $env) = @_;
+
 my $calling_package = caller;
 my $evalstr = "{ package $calling_package; ";
-for my $tuple (@{ $rec->{fields} }) {
-my ($var, $pos, $mask) = @$tuple;
-my $val = ($insn >> $pos) & $mask;
-$evalstr .= "my (\$$var) = $val; ";
+for (keys %{$env}) {
+$evalstr .= "my " unless $_ eq '_';
+$evalstr .= "(\$$_) = \$env->{$_}; ";
 }
 $evalstr .= $block;
 $evalstr .= "}";
+
 my $v = eval $evalstr;
-if ($@) {
-print "Syntax error detected evaluating $insnname $blockname string:\n$block\n$@";
-exit(1);
-}
+die "Syntax error detected evaluating $insnname $blockname string:\n"
+. "$block\n"
+. "$@"
+if ($@);
 return $v;
 }
 
diff --git a/risugen_m68k.pm b/risugen_m68k.pm
index 7d62b13..8c812b5

[Qemu-devel] [RISU PATCH v3 03/18] risugen_x86_asm: add module

2019-07-11 Thread Jan Bobek

The module risugen_x86_asm.pm exports named register constants and
the asm_insn_* family of functions, which greatly simplify emission
of x86 instructions.
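
For readers less familiar with the encoding, the following Python
sketch mirrors the bit packing done by write_insn_rex and
write_insn_modrm in the diff below. The function names rex_byte and
modrm_byte are invented for this sketch; only the bit layout is taken
from the patch.

```python
def rex_byte(w=0, r=0, x=0, b=0):
    """Build a REX prefix: 0100WRXB."""
    return 0x40 | (bool(w) << 3) | (bool(r) << 2) | (bool(x) << 1) | bool(b)


def modrm_byte(mod, reg, rm):
    """Build a ModRM byte: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return ((mod & 0b11) << 6) | ((reg & 0b111) << 3) | (rm & 0b111)


# REX.W set, register-direct form with reg=RCX (1) and rm=RDX (2).
prefix = rex_byte(w=1)
modrm = modrm_byte(0b11, 1, 2)
```

Only the low three bits of a register number fit into ModRM; for
R8..R15 the fourth bit has to travel in REX.R/.X/.B, which is why the
module encodes those bits automatically.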

Signed-off-by: Jan Bobek 
---
 risugen_x86_asm.pm | 918 +
 1 file changed, 918 insertions(+)
 create mode 100644 risugen_x86_asm.pm

diff --git a/risugen_x86_asm.pm b/risugen_x86_asm.pm
new file mode 100644
index 0000000..642f18b
--- /dev/null
+++ b/risugen_x86_asm.pm
@@ -0,0 +1,918 @@
+#!/usr/bin/perl -w
+###
+# Copyright (c) 2019 Jan Bobek
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+# Jan Bobek - initial implementation
+###
+
+# risugen_x86_asm -- risugen_x86's helper module for x86 assembly
+package risugen_x86_asm;
+
+use strict;
+use warnings;
+
+use risugen_common;
+
+our @ISA= qw(Exporter);
+our @EXPORT = qw(
+asm_insn asm_insn_ud1 asm_insn_xor asm_insn_sahf asm_insn_lea64
+asm_insn_call asm_insn_jmp asm_insn_pop asm_insn_mov
+asm_insn_mov64 asm_insn_movq asm_insn_add asm_insn_add64
+asm_insn_and asm_insn_and64 asm_insn_neg asm_insn_neg64
+asm_insn_xchg asm_insn_xchg64 asm_insn_movaps asm_insn_vmovaps
+asm_insn_movdqu asm_insn_vmovdqu
+REG_RAX REG_RCX REG_RDX REG_RBX REG_RSP REG_RBP REG_RSI REG_RDI
+REG_R8 REG_R9 REG_R10 REG_R11 REG_R12 REG_R13 REG_R14 REG_R15
+);
+
+use constant {
+VEX_L_128 => 0,
+VEX_L_256 => 1,
+
+VEX_P_NONE   => 0b00,
+VEX_P_DATA16 => 0b01,
+VEX_P_REP=> 0b10,
+VEX_P_REPNE  => 0b11,
+
+VEX_M_0F   => 0b00001,
+VEX_M_0F38 => 0b00010,
+VEX_M_0F3A => 0b00011,
+
+VEX_V_UNUSED => 0b1111,
+
+REG_RAX => 0,
+REG_RCX => 1,
+REG_RDX => 2,
+REG_RBX => 3,
+REG_RSP => 4,
+REG_RBP => 5,
+REG_RSI => 6,
+REG_RDI => 7,
+REG_R8  => 8,
+REG_R9  => 9,
+REG_R10 => 10,
+REG_R11 => 11,
+REG_R12 => 12,
+REG_R13 => 13,
+REG_R14 => 14,
+REG_R15 => 15,
+
+MOD_INDIRECT=> 0b00,
+MOD_INDIRECT_DISP8  => 0b01,
+MOD_INDIRECT_DISP32 => 0b10,
+MOD_DIRECT  => 0b11,
+};
+
+sub write_insn_repne(%)
+{
+insnv(value => 0xF2, width => 8);
+}
+
+sub write_insn_rep(%)
+{
+insnv(value => 0xF3, width => 8);
+}
+
+sub write_insn_data16(%)
+{
+insnv(value => 0x66, width => 8);
+}
+
+sub write_insn_rex(%)
+{
+my (%args) = @_;
+
+my $rex = 0x40;
+$rex |= (defined $args{w} && $args{w}) << 3;
+$rex |= (defined $args{r} && $args{r}) << 2;
+$rex |= (defined $args{x} && $args{x}) << 1;
+$rex |= (defined $args{b} && $args{b}) << 0;
+insnv(value => $rex, width => 8);
+}
+
+sub write_insn_vex(%)
+{
+    my (%args) = @_;
+
+    $args{r} = 1            unless defined $args{r};
+    $args{x} = 1            unless defined $args{x};
+    $args{b} = 1            unless defined $args{b};
+    $args{w} = 0            unless defined $args{w};
+    $args{m} = VEX_M_0F     unless defined $args{m};
+    $args{v} = VEX_V_UNUSED unless defined $args{v};
+    $args{p} = VEX_P_NONE   unless defined $args{p};
+
+    # The Intel manual implies that 2-byte VEX prefix is equivalent to
+    # VEX.X = 1, VEX.B = 1, VEX.W = 0 and VEX.M = VEX_M_0F.
+    if ($args{x} && $args{b} && !$args{w} && $args{m} == VEX_M_0F) {
+        # We can use the 2-byte VEX prefix
+        my $vex = 0xC5 << 8;
+        $vex |= ($args{r} & 0b1)    << 7;
+        $vex |= ($args{v} & 0b1111) << 3;
+        $vex |= ($args{l} & 0b1)    << 2;
+        $vex |= ($args{p} & 0b11)   << 0;
+        insnv(value => $vex, width => 16);
+    } else {
+        # We have to use the 3-byte VEX prefix
+        my $vex = 0xC4 << 16;
+        $vex |= ($args{r} & 0b1)     << 15;
+        $vex |= ($args{x} & 0b1)     << 14;
+        $vex |= ($args{b} & 0b1)     << 13;
+        $vex |= ($args{m} & 0b11111) << 8;
+        $vex |= ($args{w} & 0b1)     << 7;
+        $vex |= ($args{v} & 0b1111)  << 3;
+        $vex |= ($args{l} & 0b1)     << 2;
+        $vex |= ($args{p} & 0b11)    << 0;
+        insnv(value => $vex, width => 24);
+    }
+}
+
+sub write_insn_modrm(%)
+{
+my (%args) = @_;
+
+my $modrm = 0;
+$modrm |= ($args{mod} & 0b11)  << 6;
+$modrm |= ($args{reg} & 0b111) << 3;
+$modrm |= ($args{rm}  & 0b111) << 0;
+insnv(value => $modrm, width => 8);
+}
+
+sub write_insn_sib(%)
+{
+my (%args) = @_;
+
+my $sib = 0;
+$sib |= ($args{ss}& 0b11)  << 6;
+$sib |= ($args{index} & 0b111) << 3;
+$sib |= ($args{base}  & 0b111) << 0;
+insnv(value => $sib, width => 8);
+}
+
+sub write_insn(%)
+{
+my (%insn) = @_;
+
+my @tokens;
+push @tokens, "EVEX"   if defined

[Qemu-devel] [RISU PATCH v3 07/18] risugen: allow all byte-aligned instructions

2019-07-11 Thread Jan Bobek

Accept all instructions whose bit length is divisible by 8. Note that
the maximum instruction length (as specified in the config file) is 32
bits, hence this change permits instructions which are 8 bits or 24
bits long (16-bit instructions have already been considered valid).

Note that while valid x86 instructions may be up to 15 bytes long, the
length constraint described above only applies to the main opcode
field, which is usually only 1 or 2 bytes long. Therefore, the primary
purpose of this change is to allow 1-byte x86 opcodes.
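
The new rule can be sketched in Python as follows. It assumes, as the
diff below suggests, that bitpos counts the bits of the 32-bit pattern
that the config entry left unspecified; the function name opcode_width
is invented for this sketch.

```python
def opcode_width(bitpos, insnwidth=32):
    """Return the opcode width in bits, given the count of unspecified bits."""
    if bitpos % 8 != 0:
        # Mirrors the "not enough bits specified" error path.
        raise ValueError("not enough bits specified")
    return insnwidth - bitpos


# One, two, three and four opcode bytes specified in the config entry.
widths = [opcode_width(b) for b in (24, 16, 8, 0)]
```

The previous code accepted only bitpos == 0 and bitpos == 16, so the
8-bit and 24-bit cases are the ones this patch adds.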

Reviewed-by: Richard Henderson 
Signed-off-by: Jan Bobek 
---
 risugen | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/risugen b/risugen
index e690b18..0c859aa 100755
--- a/risugen
+++ b/risugen
@@ -229,12 +229,11 @@ sub parse_config_file($)
 push @fields, [ $var, $bitpos, $bitmask ];
 }
 }
-if ($bitpos == 16) {
-# assume this is a half-width thumb instruction
+if ($bitpos % 8 == 0) {
 # Note that we don't fiddle with the bitmasks or positions,
 # which means the generated insn will be in the high halfword!
-$insnwidth = 16;
-} elsif ($bitpos != 0) {
+$insnwidth -= $bitpos;
+} else {
 print STDERR "$file:$.: ($insn $enc) not enough bits specified\n";
 exit(1);
 }
-- 
2.20.1

[Qemu-devel] [RISU PATCH v3 01/18] risugen_common: add helper functions insnv, randint

2019-07-11 Thread Jan Bobek

insnv allows emitting variable-length instructions in little-endian or
big-endian byte order; it subsumes the functionality of insn16() and
insn32(), which become thin wrappers around it.

randint can reliably generate signed or unsigned integers of arbitrary
width.
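
A minimal Python sketch of both helpers, assuming the behaviour shown
in the diff below: insnv walks the value eight bits at a time, and
randint shifts an unsigned draw down by half the range when a signed
result is requested. The sketch returns bytes instead of writing to
the output file, and it omits the 32-bit chunking that the Perl
version needs because of rand(); both are simplifications.

```python
import random


def insnv(value, width, bigendian=True):
    """Return value as width//8 bytes in the requested byte order."""
    out = bytearray()
    for bitcur in range(0, width, 8):
        # Big-endian emits the most significant byte first.
        shift = width - bitcur - 8 if bigendian else bitcur
        out.append((value >> shift) & 0xFF)
    return bytes(out)


def randint(width, signed=False):
    """Return a random integer that fits in width bits."""
    if width <= 0:
        return 0
    halfrange = 1 << (width - 1)
    value = random.randrange(2 * halfrange)
    return value - halfrange if signed else value
```

Defaulting to big-endian order means that a multi-byte opcode such as
0F 38 comes out in the same order as it is written in the config
file.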

Signed-off-by: Jan Bobek 
---
 risugen_common.pm | 55 +--
 1 file changed, 48 insertions(+), 7 deletions(-)

diff --git a/risugen_common.pm b/risugen_common.pm
index 71ee996..d63250a 100644
--- a/risugen_common.pm
+++ b/risugen_common.pm
@@ -23,8 +23,9 @@ BEGIN {
 require Exporter;
 
 our @ISA = qw(Exporter);
-our @EXPORT = qw(open_bin close_bin set_endian insn32 insn16 $bytecount
-   progress_start progress_update progress_end
+our @EXPORT = qw(open_bin close_bin set_endian insn32 insn16
+   $bytecount insnv randint progress_start
+   progress_update progress_end
eval_with_fields is_pow_of_2 sextract ctz
dump_insn_details);
 }
@@ -37,7 +38,7 @@ my $bigendian = 0;
 # (default is little endian, 0).
 sub set_endian
 {
-$bigendian = @_;
+($bigendian) = @_;
 }
 
 sub open_bin
@@ -52,18 +53,58 @@ sub close_bin
 close(BIN) or die "can't close output file: $!";
 }
 
+sub insnv(%)
+{
+my (%args) = @_;
+
+# Default to big-endian order, so that the instruction bytes are
+# emitted in the same order as they are written in the
+# configuration file.
+$args{bigendian} = 1 unless defined $args{bigendian};
+
+for (my $bitcur = 0; $bitcur < $args{width}; $bitcur += 8) {
+my $value = $args{value} >> ($args{bigendian}
+ ? $args{width} - $bitcur - 8
+ : $bitcur);
+
+print BIN pack("C", $value & 0xff);
+$bytecount += 1;
+}
+}
+
 sub insn32($)
 {
 my ($insn) = @_;
-print BIN pack($bigendian ? "N" : "V", $insn);
-$bytecount += 4;
+insnv(value => $insn, width => 32, bigendian => $bigendian);
 }
 
 sub insn16($)
 {
 my ($insn) = @_;
-print BIN pack($bigendian ? "n" : "v", $insn);
-$bytecount += 2;
+insnv(value => $insn, width => 16, bigendian => $bigendian);
+}
+
+sub randint
+{
+my (%args) = @_;
+my $width = $args{width};
+
+if ($width > 32) {
+# Generate at most 32 bits at once; Perl's rand() does not
+# behave well with ranges that are too large.
+my $lower = randint(%args, width => 32);
+my $upper = randint(%args, width => $args{width} - 32);
+# Use arithmetic rather than bitwise operators, since bitwise
+# ops turn signed integers into unsigned.
+return $upper * (1 << 32) + $lower;
+} elsif ($width > 0) {
+my $halfrange = 1 << ($width - 1);
+my $value = int(rand(2 * $halfrange));
+$value -= $halfrange if defined $args{signed} && $args{signed};
+return $value;
+} else {
+return 0;
+}
 }
 
 # Progress bar implementation
-- 
2.20.1

[Qemu-devel] [RISU PATCH v3 14/18] x86.risu: add SSSE3 instructions

2019-07-11 Thread Jan Bobek

Add SSSE3 instructions to the x86 configuration file.

Signed-off-by: Jan Bobek 
---
 x86.risu | 160 +++
 1 file changed, 160 insertions(+)

diff --git a/x86.risu b/x86.risu
index d40b9df..6f89a80 100644
--- a/x86.risu
+++ b/x86.risu
@@ -286,6 +286,36 @@ ADDSD SSE2 00001111 01011000 \
   !constraints { repne($_); modrm($_); 1 } \
   !memory { load(size => 8); }
 
+# NP 0F 38 01 /r: PHADDW mm1, mm2/m64
+PHADDW_mm SSSE3 00001111 00111000 00000001 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# 66 0F 38 01 /r: PHADDW xmm1, xmm2/m128
+PHADDW SSSE3 00001111 00111000 00000001 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# NP 0F 38 02 /r: PHADDD mm1, mm2/m64
+PHADDD_mm SSSE3 00001111 00111000 00000010 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# 66 0F 38 02 /r: PHADDD xmm1, xmm2/m128
+PHADDD SSSE3 00001111 00111000 00000010 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# NP 0F 38 03 /r: PHADDSW mm1, mm2/m64
+PHADDSW_mm SSSE3 00001111 00111000 00000011 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# 66 0F 38 03 /r: PHADDSW xmm1, xmm2/m128
+PHADDSW SSSE3 00001111 00111000 00000011 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # F2 0F 7C /r: HADDPS xmm1, xmm2/m128
 HADDPS SSE3 00001111 01111100 \
   !constraints { repne($_); modrm($_); 1 } \
@@ -396,6 +426,36 @@ SUBSD SSE2 00001111 01011100 \
   !constraints { repne($_); modrm($_); 1 } \
   !memory { load(size => 8); }
 
+# NP 0F 38 05 /r: PHSUBW mm1, mm2/m64
+PHSUBW_mm SSSE3 00001111 00111000 00000101 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# 66 0F 38 05 /r: PHSUBW xmm1, xmm2/m128
+PHSUBW SSSE3 00001111 00111000 00000101 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# NP 0F 38 06 /r: PHSUBD mm1, mm2/m64
+PHSUBD_mm SSSE3 00001111 00111000 00000110 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# 66 0F 38 06 /r: PHSUBD xmm1, xmm2/m128
+PHSUBD SSSE3 00001111 00111000 00000110 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# NP 0F 38 07 /r: PHSUBSW mm1, mm2/m64
+PHSUBSW_mm SSSE3 00001111 00111000 00000111 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# 66 0F 38 07 /r: PHSUBSW xmm1, xmm2/m128
+PHSUBSW SSSE3 00001111 00111000 00000111 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # F2 0F 7D /r: HSUBPS xmm1, xmm2/m128
 HSUBPS SSE3 00001111 01111101 \
   !constraints { repne($_); modrm($_); 1 } \
@@ -456,6 +516,16 @@ PMULUDQ SSE2 00001111 11110100 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { load(size => 16, align => 16); }
 
+# NP 0F 38 0B /r: PMULHRSW mm1, mm2/m64
+PMULHRSW_mm SSSE3 00001111 00111000 00001011 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# 66 0F 38 0B /r: PMULHRSW xmm1, xmm2/m128
+PMULHRSW SSSE3 00001111 00111000 00001011 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F 59 /r: MULPS xmm1, xmm2/m128
 MULPS SSE 00001111 01011001 \
   !constraints { modrm($_); 1 } \
@@ -486,6 +556,16 @@ PMADDWD SSE2 00001111 11110101 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { load(size => 16, align => 16); }
 
+# NP 0F 38 04 /r: PMADDUBSW mm1, mm2/m64
+PMADDUBSW_mm SSSE3 00001111 00111000 00000100 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# 66 0F 38 04 /r: PMADDUBSW xmm1, xmm2/m128
+PMADDUBSW SSSE3 00001111 00111000 00000100 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F 5E /r: DIVPS xmm1, xmm2/m128
 DIVPS SSE 00001111 01011110 \
   !constraints { modrm($_); 1 } \
@@ -656,6 +736,66 @@ PSADBW SSE2 00001111 11110110 \
   !constraints { data16($_); modrm($_); 1 } \
   !memory { load(size => 16, align => 16); }
 
+# NP 0F 38 1C /r: PABSB mm1, mm2/m64
+PABSB_mm SSSE3 00001111 00111000 00011100 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined

[Qemu-devel] [RISU PATCH v3 00/18] Support for generating x86 SIMD test images

2019-07-11 Thread Jan Bobek

This is v3 of the patch series posted in [1] and [2]. Note that this
is the first fully-featured patch series implementing all desired
functionality, including (V)LDMXCSR and VSIB-based instructions like
VGATHER*.

While implementing the last bits required in order to support VGATHERx
instructions, I ran into problems which required a larger redesign;
namely, there are no more !emit blocks as their functionality is now
implemented in regular !constraints blocks. Also, memory constraints
are specified in !memory blocks, similarly to other architectures.

I tested these changes on my machine; both master and slave modes work
in both 32-bit and 64-bit modes.

Cheers,
 -Jan

Changes since v2:
  Too many to be listed individually; this patch series might be
  better reviewed on its own.

References:
  1. https://lists.nongnu.org/archive/html/qemu-devel/2019-06/msg04123.html
  2. https://lists.nongnu.org/archive/html/qemu-devel/2019-07/msg00001.html

Jan Bobek (18):
  risugen_common: add helper functions insnv, randint
  risugen_common: split eval_with_fields into extract_fields and
eval_block
  risugen_x86_asm: add module
  risugen_x86_constraints: add module
  risugen_x86_memory: add module
  risugen_x86: add module
  risugen: allow all byte-aligned instructions
  risugen: add command-line flag --x86_64
  risugen: add --xfeatures option for x86
  x86.risu: add MMX instructions
  x86.risu: add SSE instructions
  x86.risu: add SSE2 instructions
  x86.risu: add SSE3 instructions
  x86.risu: add SSSE3 instructions
  x86.risu: add SSE4.1 and SSE4.2 instructions
  x86.risu: add AES and PCLMULQDQ instructions
  x86.risu: add AVX instructions
  x86.risu: add AVX2 instructions

 risugen|   27 +-
 risugen_arm.pm |6 +-
 risugen_common.pm  |  117 +-
 risugen_m68k.pm|3 +-
 risugen_ppc64.pm   |6 +-
 risugen_x86.pm |  518 +
 risugen_x86_asm.pm |  918 
 risugen_x86_constraints.pm |  154 ++
 risugen_x86_memory.pm  |   87 +
 x86.risu   | 4499 
 10 files changed, 6293 insertions(+), 42 deletions(-)
 create mode 100644 risugen_x86.pm
 create mode 100644 risugen_x86_asm.pm
 create mode 100644 risugen_x86_constraints.pm
 create mode 100644 risugen_x86_memory.pm
 create mode 100644 x86.risu

-- 
2.20.1

[Qemu-devel] [RISU PATCH v3 05/18] risugen_x86_memory: add module

2019-07-11 Thread Jan Bobek

The module risugen_x86_memory.pm provides an environment for evaluating
x86 "!memory" blocks. This is facilitated by the single exported
function eval_memory_block.

Signed-off-by: Jan Bobek 
---
 risugen_x86_memory.pm | 87 +++
 1 file changed, 87 insertions(+)
 create mode 100644 risugen_x86_memory.pm

diff --git a/risugen_x86_memory.pm b/risugen_x86_memory.pm
new file mode 100644
index 000..6aa6877
--- /dev/null
+++ b/risugen_x86_memory.pm
@@ -0,0 +1,87 @@
+#!/usr/bin/perl -w
+###
+# Copyright (c) 2019 Jan Bobek
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+# Jan Bobek - initial implementation
+###
+
+# risugen_x86_memory -- risugen_x86's helper module for "!memory" blocks
+package risugen_x86_memory;
+
+use strict;
+use warnings;
+
+use risugen_common;
+use risugen_x86_asm;
+
+our @ISA= qw(Exporter);
+our @EXPORT = qw(eval_memory_block);
+
+my %memory_opts;
+
+sub load(%)
+{
+my (%args) = @_;
+
+@memory_opts{keys %args} = values %args;
+$memory_opts{is_write}   = 0;
+}
+
+sub store(%)
+{
+my (%args) = @_;
+
+@memory_opts{keys %args} = values %args;
+$memory_opts{is_write}   = 1;
+}
+
+sub eval_memory_block(%)
+{
+my (%args) = @_;
+my $rec = $args{rec};
+my $insn = $args{insn};
+my $insnname = $rec->{name};
+my $opcode = $insn->{opcode}{value};
+
+# Setup reasonable defaults
+%memory_opts   = ();
+$memory_opts{size} = 0;
+$memory_opts{align}= 1;
+$memory_opts{disp} = 0;
+$memory_opts{ss}   = 0;
+$memory_opts{value}= 0;
+$memory_opts{mask} = 0;
+$memory_opts{rollback} = 0;
+$memory_opts{is_write} = 0;
+
+if (defined $insn->{modrm}) {
+my $modrm = $insn->{modrm};
+
+$memory_opts{ss} = $modrm->{ss}  if defined $modrm->{ss};
+$memory_opts{index}  = $modrm->{index}   if defined $modrm->{index};
+$memory_opts{vindex} = $modrm->{vindex}  if defined $modrm->{vindex};
+$memory_opts{base}   = $modrm->{base}if defined $modrm->{base};
+$memory_opts{disp}   = $modrm->{disp}{value} if defined $modrm->{disp};
+
+$memory_opts{rollback} = defined $modrm->{base};
+}
+
+my $memory = $rec->{blocks}{"memory"};
+if (defined $memory) {
+# Evaluate in an environment with variables set corresponding
+# to the variable fields.
+my %env = extract_fields($opcode, $rec);
+# set the variable $_ to the instruction in question
+$env{_} = $insn;
+
+eval_block($insnname, "memory", $memory, \%env);
+}
+return %memory_opts;
+}
+
+1;
-- 
2.20.1
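
Editorial illustration of the risugen_x86_memory patch above: the module resets a table of defaults, copies any ModRM-derived fields over it, and then lets the "!memory" block call load() or store() to override entries and set the access direction. A minimal Python sketch of that merging order follows. It is not part of the patch; the dictionary-based modrm argument and the callback-style block are assumptions made for illustration, and the displacement is simplified to a plain integer.

```python
# Hypothetical Python rendering of the option merging performed by
# eval_memory_block(), load() and store() in risugen_x86_memory.pm.

DEFAULTS = {
    "size": 0, "align": 1, "disp": 0, "ss": 0,
    "value": 0, "mask": 0, "rollback": 0, "is_write": 0,
}


def eval_memory_block(block, modrm=None):
    """Return the memory options for one instruction.

    block -- callable taking (load, store), standing in for a "!memory" block
    modrm -- optional dict with any of: ss, index, vindex, base, disp
    """
    opts = dict(DEFAULTS)

    if modrm is not None:
        # Fields decoded from ModRM/SIB override the defaults first.
        for key in ("ss", "index", "vindex", "base", "disp"):
            if key in modrm:
                opts[key] = modrm[key]
        # Rollback is requested only when a base register is in use.
        opts["rollback"] = int("base" in modrm)

    def load(**kwargs):
        opts.update(kwargs)
        opts["is_write"] = 0

    def store(**kwargs):
        opts.update(kwargs)
        opts["is_write"] = 1

    if block is not None:
        block(load, store)
    return opts


# Counterpart of: !memory { load(size => 16, align => 16); }
aligned_load = eval_memory_block(lambda load, store: load(size=16, align=16))
```

With this ordering, anything the block passes to load() or store() wins over both the defaults and the ModRM-derived values, which appears to match the order of assignments in the Perl module.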

[Qemu-devel] [RISU PATCH v3 06/18] risugen_x86: add module

2019-07-11 Thread Jan Bobek

risugen_x86.pm is the main backend module for Intel i386 and x86_64
architectures; it orchestrates generation of the test code with
support from the rest of risugen_x86_* modules.

Signed-off-by: Jan Bobek 
---
 risugen_x86.pm | 518 +
 1 file changed, 518 insertions(+)
 create mode 100644 risugen_x86.pm

diff --git a/risugen_x86.pm b/risugen_x86.pm
new file mode 100644
index 000..ae11843
--- /dev/null
+++ b/risugen_x86.pm
@@ -0,0 +1,518 @@
+#!/usr/bin/perl -w
+###
+# Copyright (c) 2019 Jan Bobek
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+# Jan Bobek - initial implementation
+###
+
+# risugen_x86 -- risugen module for Intel i386/x86_64 architectures
+package risugen_x86;
+
+use strict;
+use warnings;
+
+use risugen_common;
+use risugen_x86_asm;
+use risugen_x86_constraints;
+use risugen_x86_memory;
+
+require Exporter;
+
+our @ISA= qw(Exporter);
+our @EXPORT = qw(write_test_code);
+
+use constant {
+RISUOP_COMPARE => 0,# compare registers
+RISUOP_TESTEND => 1,# end of test, stop
+RISUOP_SETMEMBLOCK => 2,# eax is address of memory block (8192 bytes)
+RISUOP_GETMEMBLOCK => 3,# add the address of memory block to eax
+RISUOP_COMPAREMEM  => 4,# compare memory block
+
+# Maximum alignment restriction permitted for a memory op.
+MAXALIGN => 64,
+MEMBLOCK_LEN => 8192,
+};
+
+my $periodic_reg_random = 1;
+my $is_x86_64 = 0;
+
+sub wrap_int32($)
+{
+my ($x) = @_;
+my $r = 1 << 31;
+return ($x + $r) % (2 * $r) - $r;
+}
+
+sub asm_insn_risuop($)
+{
+my ($op) = @_;
+asm_insn_ud1(reg => REG_RAX, reg2 => $op);
+}
+
+sub asm_insn_movT(%)
+{
+my (%args) = @_;
+
+if ($is_x86_64) {
+asm_insn_mov64(%args);
+} else {
+asm_insn_mov(%args);
+}
+}
+
+sub asm_insn_movT_imm(%)
+{
+my (%args) = @_;
+my $imm = $args{imm}; delete $args{imm};
+
+my $is_sint32 = (-0x8000 <= $imm && $imm <= 0x7fff);
+my $is_uint32 = (0 <= $imm && $imm <= 0x);
+
+$args{$is_sint32 || $is_uint32 ? 'imm32' : 'imm64'} = $imm;
+asm_insn_movT(%args);
+}
+
+sub asm_insn_addT(%)
+{
+my (%args) = @_;
+
+if ($is_x86_64) {
+asm_insn_add64(%args);
+} else {
+asm_insn_add(%args);
+}
+}
+
+sub asm_insn_negT(%)
+{
+my (%args) = @_;
+
+if ($is_x86_64) {
+asm_insn_neg64(%args);
+} else {
+asm_insn_neg(%args);
+}
+}
+
+sub asm_insn_xchgT(%)
+{
+my (%args) = @_;
+
+if ($is_x86_64) {
+asm_insn_xchg64(%args);
+} else {
+asm_insn_xchg(%args);
+}
+}
+
+sub write_random_regdata()
+{
+my $reg_cnt = $is_x86_64 ? 16 : 8;
+my $reg_width = $is_x86_64 ? 64 : 32;
+
+# initialize flags register
+asm_insn_xor(reg => REG_RAX, reg2 => REG_RAX);
+asm_insn_sahf();
+
+# general purpose registers
+for (my $reg = 0; $reg < $reg_cnt; $reg++) {
+if ($reg != REG_RSP) {
+my $imm = randint(width => $reg_width, signed => 1);
+asm_insn_movT_imm(reg => $reg, imm => $imm);
+}
+}
+}
+
+# At the end of this function, we can emit $datalen data-bytes which
+# will be skipped over at runtime, but whose address will be present
+# in EAX and optionally aligned.
+sub prepare_datablock(%)
+{
+my (%args) = @_;
+$args{align} = 0 unless defined $args{align} && $args{align} > 1;
+
+# First, load current EIP/RIP into EAX/RAX. Easy to do on x86_64
+# thanks to RIP-relative addressing, but on i386 we need to play
+# some well-known tricks with the CALL instruction. Then, AND the
+# EAX/RAX register with correct mask to obtain the aligned
+# address.
+my $reg = REG_RAX;
+
+if ($is_x86_64) {
+my $disp32 = 5; # 5-byte JMP
+$disp32 += 4 + ($args{align} - 1) if $args{align}; # 4-byte AND
+
+asm_insn_lea64(reg => $reg, disp32 => $disp32);
+asm_insn_and64(reg2 => $reg, imm8 => ~($args{align} - 1))
+if $args{align};
+} else {
+my $imm8 = 1 + 3 + 5;   # 1-byte POP + 3-byte ADD + 5-byte JMP
+$imm8 += 3 + ($args{align} - 1) if $args{align}; # 3-byte AND
+
+# displacement = next instruction
+asm_insn_call(imm32 => 0x);
+asm_insn_pop(reg => $reg);
+asm_insn_add(reg2 => $reg, imm8 => $imm8);
+asm_insn_and(reg2 => $reg, imm8 => ~($args{align} - 1))
+if $args{align};
+}
+
+# JMP over the data blob.
+asm_insn_jmp(imm32 => $args{datalen});
+}
+
+# Write a block of random data, $datalen bytes long,
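
Editorial illustration of two pieces of arithmetic in the (truncated) risugen_x86.pm patch above: wrap_int32 folds an integer into the signed 32-bit range, and prepare_datablock reserves align - 1 bytes of slack and then ANDs the address with ~(align - 1), which amounts to rounding the address up to the next multiple of the alignment. The Python sketch below is not part of the patch; the name align_up is hypothetical, and it assumes align is a power of two.

```python
def wrap_int32(x):
    """Fold x into [-2**31, 2**31 - 1], as wrap_int32 does in the patch."""
    r = 1 << 31
    return (x + r) % (2 * r) - r


def align_up(addr, align):
    """Round addr up to a multiple of align (align must be a power of two).

    Adding align - 1 first corresponds to the slack bytes reserved in
    front of the data block; the AND then clears the low address bits.
    """
    return (addr + align - 1) & ~(align - 1)
```

For instance, align_up(0x1001, 16) yields 0x1010, while an address that is already aligned is returned unchanged.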

[Qemu-devel] [RISU PATCH v3 11/18] x86.risu: add SSE instructions

2019-07-11 Thread Jan Bobek

Add SSE instructions to the x86 configuration file.

Signed-off-by: Jan Bobek 
---
 x86.risu | 318 +++
 1 file changed, 318 insertions(+)

diff --git a/x86.risu b/x86.risu
index 208ac16..2d963fc 100644
--- a/x86.risu
+++ b/x86.risu
@@ -35,6 +35,52 @@ MOVQ_mm MMX  011 d  \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
   !memory { $d ? store(size => 8) : load(size => 8); }
 
+# NP 0F 28 /r: MOVAPS xmm1, xmm2/m128
+# NP 0F 29 /r: MOVAPS xmm2/m128, xmm1
+MOVAPS SSE  0010100 d \
+  !constraints { modrm($_); 1 } \
+  !memory { $d ? store(size => 16, align => 16) : load(size => 16, align => 16); }
+
+# NP 0F 10 /r: MOVUPS xmm1, xmm2/m128
+# NP 0F 11 /r: MOVUPS xmm2/m128, xmm1
+MOVUPS SSE  0001000 d \
+  !constraints { modrm($_); 1 } \
+  !memory { $d ? store(size => 16) : load(size => 16); }
+
+# F3 0F 10 /r: MOVSS xmm1, xmm2/m32
+# F3 0F 11 /r: MOVSS xmm2/m32, xmm1
+MOVSS SSE  0001000 d \
+  !constraints { rep($_); modrm($_); 1 } \
+  !memory { $d ? store(size => 4) : load(size => 4); }
+
+# NP 0F 12 /r: MOVLPS xmm1, m64
+# 0F 13 /r: MOVLPS m64, xmm1
+MOVLPS SSE  0001001 d \
+  !constraints { modrm($_); !defined $_->{modrm}{reg2} } \
+  !memory { $d ? store(size => 8) : load(size => 8); }
+
+# NP 0F 16 /r: MOVHPS xmm1, m64
+# NP 0F 17 /r: MOVHPS m64, xmm1
+MOVHPS SSE  0001011 d \
+  !constraints { modrm($_); !defined $_->{modrm}{reg2} } \
+  !memory { $d ? store(size => 8) : load(size => 8); }
+
+# NP 0F 16 /r: MOVLHPS xmm1, xmm2
+MOVLHPS SSE  00010110 \
+  !constraints { modrm($_); defined $_->{modrm}{reg2} }
+
+# NP 0F 12 /r: MOVHLPS xmm1, xmm2
+MOVHLPS SSE  00010010 \
+  !constraints { modrm($_); defined $_->{modrm}{reg2} }
+
+# NP 0F D7 /r: PMOVMSKB reg, mm
+PMOVMSKB SSE  11010111 \
+  !constraints { modrm($_); $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; $_->{modrm}{reg} != REG_RSP && defined $_->{modrm}{reg2} }
+
+# NP 0F 50 /r: MOVMSKPS reg, xmm
+MOVMSKPS SSE  0101 \
+  !constraints { modrm($_); $_->{modrm}{reg} != REG_RSP && defined $_->{modrm}{reg2} }
+
 #
 # Arithmetic Instructions
 # ---
@@ -75,6 +121,16 @@ PADDUSW MMX  11011101 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
   !memory { load(size => 8); }
 
+# NP 0F 58 /r: ADDPS xmm1, xmm2/m128
+ADDPS SSE  01011000 \
+  !constraints { modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# F3 0F 58 /r: ADDSS xmm1, xmm2/m32
+ADDSS SSE  01011000 \
+  !constraints { rep($_); modrm($_); 1 } \
+  !memory { load(size => 4); }
+
 # NP 0F F8 /r: PSUBB mm, mm/m64
 PSUBB MMX  1000 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
@@ -110,6 +166,16 @@ PSUBUSW MMX  11011001 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
   !memory { load(size => 8); }
 
+# NP 0F 5C /r: SUBPS xmm1, xmm2/m128
+SUBPS SSE  01011100 \
+  !constraints { modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# F3 0F 5C /r: SUBSS xmm1, xmm2/m32
+SUBSS SSE  01011100 \
+  !constraints { rep($_); modrm($_); 1 } \
+  !memory { load(size => 4); }
+
 # NP 0F D5 /r: PMULLW mm, mm/m64
 PMULLW MMX  11010101 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
@@ -120,11 +186,121 @@ PMULHW MMX  11100101 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
   !memory { load(size => 8); }
 
+# NP 0F E4 /r: PMULHUW mm1, mm2/m64
+PMULHUW SSE  11100100 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F 59 /r: MULPS xmm1, xmm2/m128
+MULPS SSE  01011001 \
+  !constraints { modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# F3 0F 59 /r: MULSS xmm1,xmm2/m32
+MULSS SSE  01011001 \
+  !constraints { rep($_); modrm($_); 1 } \
+  !memory { load(size => 4); }
+
 # NP 0F F5 /r: PMADDWD mm, mm/m64
 PMADDWD MMX  0101 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
   !memory { load(size => 8); }
 
+# NP 0F 5E /r: DIVPS xmm1, xmm2/m128
+DIVPS SSE  0100 \
+  !constraints { modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# F3 0F 5E /r: DIVSS xmm1, xmm2/m32
+DIVSS SSE  0100 \
+  !constraints { rep($_); modrm($_); 1 } \
+  !memory { load(size => 4); }
+
+# NP 0F 53 /r: RCPPS xmm1, xmm2/m128
+RCPPS SSE  01010011 \
+  !constraints {

[Qemu-devel] [RISU PATCH v3 16/18] x86.risu: add AES and PCLMULQDQ instructions

2019-07-11 Thread Jan Bobek

Add AES-NI and PCLMULQDQ instructions to the x86 configuration file.

Signed-off-by: Jan Bobek 
---
 x86.risu | 45 +
 1 file changed, 45 insertions(+)

diff --git a/x86.risu b/x86.risu
index bc6636e..177979a 100644
--- a/x86.risu
+++ b/x86.risu
@@ -886,6 +886,51 @@ ROUNDSD SSE4_1  00111010 1011 \
   !constraints { data16($_); modrm($_); imm($_, width => 8); 1 } \
   !memory { load(size => 8); }
 
+#
+# AES Instructions
+# 
+#
+
+# 66 0F 38 DE /r: AESDEC xmm1, xmm2/m128
+AESDEC AES  00111000 1100 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F 38 DF /r: AESDECLAST xmm1, xmm2/m128
+AESDECLAST AES  00111000 1101 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F 38 DC /r: AESENC xmm1, xmm2/m128
+AESENC AES  00111000 11011100 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F 38 DD /r: AESENCLAST xmm1, xmm2/m128
+AESENCLAST AES  00111000 11011101 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F 38 DB /r: AESIMC xmm1, xmm2/m128
+AESIMC AES  00111000 11011011 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F 3A DF /r ib: AESKEYGENASSIST xmm1, xmm2/m128, imm8
+AESKEYGENASSIST AES  00111010 1101 \
+  !constraints { data16($_); modrm($_); imm($_, width => 8); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+#
+# PCLMULQDQ Instructions
+# --
+#
+
+# 66 0F 3A 44 /r ib: PCLMULQDQ xmm1, xmm2/m128, imm8
+PCLMULQDQ PCLMULQDQ  00111010 01000100 \
+  !constraints { data16($_); modrm($_); imm($_, width => 8); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 #
 # Comparison Instructions
 # ---
-- 
2.20.1

[Qemu-devel] [RISU PATCH v3 13/18] x86.risu: add SSE3 instructions

2019-07-11 Thread Jan Bobek

Add SSE3 instructions to the x86 configuration file.

Signed-off-by: Jan Bobek 
---
 x86.risu | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/x86.risu b/x86.risu
index b9d424e..d40b9df 100644
--- a/x86.risu
+++ b/x86.risu
@@ -161,6 +161,26 @@ MOVMSKPS SSE  0101 \
 MOVMSKPD SSE2  0101 \
   !constraints { data16($_); modrm($_); $_->{modrm}{reg} != REG_RSP && defined $_->{modrm}{reg2} }
 
+# F2 0F F0 /r: LDDQU xmm1, m128
+LDDQU SSE3   \
+  !constraints { repne($_); modrm($_); !defined $_->{modrm}{reg2} } \
+  !memory { load(size => 16); }
+
+# F3 0F 16 /r: MOVSHDUP xmm1, xmm2/m128
+MOVSHDUP SSE3  00010110 \
+  !constraints { rep($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# F3 0F 12 /r: MOVSLDUP xmm1, xmm2/m128
+MOVSLDUP SSE3  00010010 \
+  !constraints { rep($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# F2 0F 12 /r: MOVDDUP xmm1, xmm2/m64
+MOVDDUP SSE3  00010010 \
+  !constraints { repne($_); modrm($_); 1 } \
+  !memory { load(size => 8); }
+
 #
 # Arithmetic Instructions
 # ---
@@ -266,6 +286,16 @@ ADDSD SSE2  01011000 \
   !constraints { repne($_); modrm($_); 1 } \
   !memory { load(size => 8); }
 
+# F2 0F 7C /r: HADDPS xmm1, xmm2/m128
+HADDPS SSE3  0100 \
+  !constraints { repne($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F 7C /r: HADDPD xmm1, xmm2/m128
+HADDPD SSE3  0100 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F F8 /r: PSUBB mm, mm/m64
 PSUBB MMX  1000 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
@@ -366,6 +396,26 @@ SUBSD SSE2  01011100 \
   !constraints { repne($_); modrm($_); 1 } \
   !memory { load(size => 8); }
 
+# F2 0F 7D /r: HSUBPS xmm1, xmm2/m128
+HSUBPS SSE3  0101 \
+  !constraints { repne($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F 7D /r: HSUBPD xmm1, xmm2/m128
+HSUBPD SSE3  0101 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# F2 0F D0 /r: ADDSUBPS xmm1, xmm2/m128
+ADDSUBPS SSE3  1101 \
+  !constraints { repne($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
+# 66 0F D0 /r: ADDSUBPD xmm1, xmm2/m128
+ADDSUBPD SSE3  1101 \
+  !constraints { data16($_); modrm($_); 1 } \
+  !memory { load(size => 16, align => 16); }
+
 # NP 0F D5 /r: PMULLW mm, mm/m64
 PMULLW MMX  11010101 \
   !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
-- 
2.20.1

[Qemu-devel] [RISU PATCH v3 08/18] risugen: add command-line flag --x86_64

2019-07-11 Thread Jan Bobek

This flag instructs the x86 backend to emit 64-bit (rather than
32-bit) code.

Signed-off-by: Jan Bobek 
---
 risugen | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/risugen b/risugen
index 0c859aa..50920eb 100755
--- a/risugen
+++ b/risugen
@@ -10,6 +10,7 @@
 # Peter Maydell (Linaro) - initial implementation
 # Claudio Fontana (Linaro) - initial aarch64 support
 # Jose Ricardo Ziviani (IBM) - initial ppc64 support and arch isolation
+# Jan Bobek - initial x86 support
 ###
 
 # risugen -- generate a test binary file for use with risu
@@ -309,6 +310,7 @@ Valid options:
Useful to test before support for FP is available.
 --sve: enable sve floating point
 --be : generate instructions in Big-Endian byte order (ppc64 only).
+--x86_64 : generate 64-bit (rather than 32-bit) code. (x86 only)
 --help   : print this message
 EOT
 }
@@ -321,6 +323,7 @@ sub main()
 my $fp_enabled = 1;
 my $sve_enabled = 0;
 my $big_endian = 0;
+my $is_x86_64 = 0;
 my ($infile, $outfile);
 
 GetOptions( "help" => sub { usage(); exit(0); },
@@ -338,6 +341,7 @@ sub main()
 "be" => sub { $big_endian = 1; },
 "no-fp" => sub { $fp_enabled = 0; },
 "sve" => sub { $sve_enabled = 1; },
+"x86_64" => sub { $is_x86_64 = 1; },
 ) or return 1;
 # allow "--pattern re,re" and "--pattern re --pattern re"
 @pattern_re = split(/,/,join(',',@pattern_re));
@@ -371,7 +375,8 @@ sub main()
 'keys' => \@insn_keys,
 'arch' => $full_arch[0],
 'subarch' => $full_arch[1] || '',
-'bigendian' => $big_endian
+'bigendian' => $big_endian,
+'x86_64' => $is_x86_64,
 );
 
 write_test_code(\%params);
-- 
2.20.1

[Qemu-devel] [RISU PATCH v3 10/18] x86.risu: add MMX instructions

2019-07-11 Thread Jan Bobek

Add an x86 configuration file with all MMX instructions.

Signed-off-by: Jan Bobek 
---
 x86.risu | 321 +++
 1 file changed, 321 insertions(+)
 create mode 100644 x86.risu

diff --git a/x86.risu b/x86.risu
new file mode 100644
index 000..208ac16
--- /dev/null
+++ b/x86.risu
@@ -0,0 +1,321 @@
+###
+# Copyright (c) 2019 Jan Bobek
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+# Jan Bobek - initial implementation
+###
+
+# Input file for risugen defining x86 instructions
+.mode x86
+
+#
+# Data Transfer Instructions
+# --
+#
+
+# NP 0F 6E /r: MOVD mm,r/m32
+# NP 0F 7E /r: MOVD r/m32,mm
+MOVD MMX  011 d 1110 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; !(defined $_->{modrm}{reg2} && $_->{modrm}{reg2} == REG_RSP) } \
+  !memory { $d ? store(size => 4) : load(size => 4); }
+
+# NP REX.W + 0F 6E /r: MOVQ mm,r/m64
+# NP REX.W + 0F 7E /r: MOVQ r/m64,mm
+MOVQ MMX  011 d 1110 \
+  !constraints { rex($_, w => 1); modrm($_); $_->{modrm}{reg} &= 0b111; !(defined $_->{modrm}{reg2} && $_->{modrm}{reg2} == REG_RSP) } \
+  !memory { $d ? store(size => 8) : load(size => 8); }
+
+# NP 0F 6F /r: MOVQ mm, mm/m64
+# NP 0F 7F /r: MOVQ mm/m64, mm
+MOVQ_mm MMX  011 d  \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { $d ? store(size => 8) : load(size => 8); }
+
+#
+# Arithmetic Instructions
+# ---
+#
+
+# NP 0F FC /r: PADDB mm, mm/m64
+PADDB MMX  1100 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F FD /r: PADDW mm, mm/m64
+PADDW MMX  1101 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F FE /r: PADDD mm, mm/m64
+PADDD MMX  1110 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F EC /r: PADDSB mm, mm/m64
+PADDSB MMX  11101100 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F ED /r: PADDSW mm, mm/m64
+PADDSW MMX  11101101 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F DC /r: PADDUSB mm,mm/m64
+PADDUSB MMX  11011100 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F DD /r: PADDUSW mm,mm/m64
+PADDUSW MMX  11011101 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F F8 /r: PSUBB mm, mm/m64
+PSUBB MMX  1000 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F F9 /r: PSUBW mm, mm/m64
+PSUBW MMX  1001 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F FA /r: PSUBD mm, mm/m64
+PSUBD MMX  1010 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F E8 /r: PSUBSB mm, mm/m64
+PSUBSB MMX  11101000 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F E9 /r: PSUBSW mm, mm/m64
+PSUBSW MMX  11101001 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F D8 /r: PSUBUSB mm, mm/m64
+PSUBUSB MMX  11011000 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F D9 /r: PSUBUSW mm, mm/m64
+PSUBUSW MMX  11011001 \
+  !constraints { modrm($_); $_->{modrm}{reg} &= 0b111; $_->{modrm}{reg2} &= 0b111 if defined $_->{modrm}{reg2}; 1 } \
+  !memory { load(size => 8); }
+
+# NP 0F D5 /r: PMULLW

[Qemu-devel] [RISU PATCH v3 09/18] risugen: add --xfeatures option for x86

2019-07-11 Thread Jan Bobek

The --xfeatures option is modelled after the identically named option
of RISU itself; it allows the user to specify which vector registers
should be initialized, so that the test image doesn't try to access
registers which may not be present at runtime. Note that it is still
the user's responsibility to filter out the test instructions that use
these registers.

Signed-off-by: Jan Bobek 
---
 risugen | 13 +
 1 file changed, 13 insertions(+)

diff --git a/risugen b/risugen
index 50920eb..76424e1 100755
--- a/risugen
+++ b/risugen
@@ -311,6 +311,9 @@ Valid options:
 --sve: enable sve floating point
 --be : generate instructions in Big-Endian byte order (ppc64 only).
 --x86_64 : generate 64-bit (rather than 32-bit) code. (x86 only)
+--xfeatures {none|mmx|sse|avx} : what SIMD registers should be
+   initialized. The initialization is cumulative,
+   i.e. AVX includes both MMX and SSE. (x86 only)
 --help   : print this message
 EOT
 }
@@ -324,6 +327,7 @@ sub main()
 my $sve_enabled = 0;
 my $big_endian = 0;
 my $is_x86_64 = 0;
+my $xfeatures = 'none';
 my ($infile, $outfile);
 
 GetOptions( "help" => sub { usage(); exit(0); },
@@ -342,6 +346,14 @@ sub main()
 "no-fp" => sub { $fp_enabled = 0; },
 "sve" => sub { $sve_enabled = 1; },
 "x86_64" => sub { $is_x86_64 = 1; },
+"xfeatures=s" => sub {
+$xfeatures = $_[1];
+die "value for xfeatures must be one of 'none', 'mmx', 'sse', 'avx' (got '$xfeatures')\n"
+unless ($xfeatures eq 'none'
+|| $xfeatures eq 'mmx'
+|| $xfeatures eq 'sse'
+|| $xfeatures eq 'avx');
+},
 ) or return 1;
 # allow "--pattern re,re" and "--pattern re --pattern re"
 @pattern_re = split(/,/,join(',',@pattern_re));
@@ -377,6 +389,7 @@ sub main()
 'subarch' => $full_arch[1] || '',
 'bigendian' => $big_endian,
 'x86_64' => $is_x86_64,
+'xfeatures' => $xfeatures,
 );
 
 write_test_code(\%params);
-- 
2.20.1
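
Editorial illustration of the --xfeatures semantics described in the patch above: the help text says initialization is cumulative, so each level implies the ones before it. The Python sketch below shows one way to model that ordering. It is not part of the patch; the function name registers_to_init is hypothetical, and how risugen_x86.pm actually consumes the option is not shown in this message.

```python
# Ordered from least to most capable; each level implies the earlier ones.
XFEATURES = ("none", "mmx", "sse", "avx")


def registers_to_init(xfeatures):
    """Return the register families to initialize for an --xfeatures value."""
    if xfeatures not in XFEATURES:
        raise ValueError(
            "value for xfeatures must be one of 'none', 'mmx', 'sse', 'avx' "
            "(got '%s')" % xfeatures
        )
    level = XFEATURES.index(xfeatures)
    # Skip the 'none' entry; everything up to the chosen level is included.
    return list(XFEATURES[1:level + 1])
```

Under this model, "avx" selects MMX, SSE and AVX registers, while "none" selects nothing, which matches the default set in main().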

Re: [Qemu-devel] [RFC v5 00/29] vSMMUv3/pSMMUv3 2 stage VFIO integration

2019-07-11 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20190711173933.31203-1-eric.au...@redhat.com/



Hi,

This series failed build test on s390x host. Please find the details below.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e

echo
echo "=== ENV ==="
env

echo
echo "=== PACKAGES ==="
rpm -qa

echo
echo "=== UNAME ==="
uname -a

CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

  CC  i386-softmmu/hw/virtio/virtio-crypto.o
  CC  i386-softmmu/hw/virtio/virtio-crypto-pci.o
  CC  i386-softmmu/hw/virtio/virtio-pmem.o
/var/tmp/patchew-tester-tmp-u52ljjk0/src/hw/virtio/virtio-pmem.c:21:10: fatal error: standard-headers/linux/virtio_pmem.h: No such file or directory
   21 | #include "standard-headers/linux/virtio_pmem.h"
  |  ^~
compilation terminated.


The full log is available at
http://patchew.org/logs/20190711173933.31203-1-eric.au...@redhat.com/testing.s390x/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions

2019-07-11 Thread Jan Bobek

On 7/11/19 9:57 AM, Richard Henderson wrote:
> On 7/11/19 3:29 PM, Jan Bobek wrote:
>> However, I downloaded a fresh copy of Intel SDM off the Intel website
>> this morning (just to make sure) and in Volume 2B, Section "4.3
>> Instructions (M-U)," page 4-208 titled "PADDB/PADDW/PADDD/PADDQ—Add
>> Packed Integers," there's the NP 0F D4 /r PADDQ mm, mm/m64 instruction
>> in the 4th row, and the CPUID column says MMX. On the other hand, I
>> can't find it in the Volume 1, Section 5.4 "MMX(tm) Instructions," or
>> in Vol. 1, Chapter 9 "Programming with Intel(R) MMX(tm) Technology,"
>> so it's a bit confusing.
>>
>> If you know for a fact that it didn't come until SSE2 and the manual
>> is wrong, I will change it.
> 
> Interesting.  I see what you see in
> 
>   253665-069US January 2019
> 
> but I first looked at
> 
>   325462-058US April 2016
> 
> which definitely has this marked as SSE2.
> 
> In the 2019 version, "5.6.3 SSE2 128-Bit SIMD Integer Instructions" is the
> first mention of PADDQ.  Whereas "5.4.3 MMX Packed Arithmetic Instructions"
> mentions PADD{B,W,D} but not Q.
> 
> I tend to think that this is a bug in the current manual.
> 
> Checking in binutils I see
> 
>> paddq, 2, 0x660fd4, None, 2, CpuSSE2, 
>> Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 
>> RegXMM|Unspecified|BaseIndex, RegXMM }
>> paddq, 2, 0xfd4, None, 2, CpuSSE2, 
>> Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoAVX, { 
>> Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> 
> and both contain CpuSSE2. If you like, I could run this by one of the Intel 
> GCC
> folk to be sure.

I think this is convincing enough for me; it was a good idea to check
binutils! I find it interesting that they'd get it wrong in a more
recent version of the manual, though.

-Jan




Re: [Qemu-devel] [PATCH v4] linux-user: fix to handle variably sized SIOCGSTAMP with new kernels

2019-07-11 Thread Arnd Bergmann

On Thu, Jul 11, 2019 at 7:32 PM Laurent Vivier  wrote:

>
> Notes:
> v4: [lv] timeval64 and timespec64 are { long long , long }

>
> +STRUCT(timeval64, TYPE_LONGLONG, TYPE_LONG)
> +
> +STRUCT(timespec64, TYPE_LONGLONG, TYPE_LONG)
> +

This still doesn't look right, see my earlier comment about padding
on big-endian architectures.

Note that the in-kernel 'timespec64' is different from the uapi
'__kernel_timespec' exported by the kernel. I also still think you may
need to convert between SIOCGSTAMP_NEW and SIOCGSTAMP_OLD,
e.g. when emulating a 32-bit riscv process (which only use
SIOCGSTAMP_NEW) on a kernel that only understands
SIOCGSTAMP_OLD.

 Arnd
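
Editorial illustration of one way a padding problem of this kind can arise; the earlier comment referred to above is not part of this message, so this may not be exactly the case meant. The Python sketch assumes the kernel hands back two 64-bit fields (as in __kernel_timespec) and that the reader treats the second field as a 4-byte long, as { long long, long } would on a 32-bit target. The helper name read_as_longlong_long and the sample values are hypothetical.

```python
import struct


def read_as_longlong_long(buf, byteorder):
    """Decode buf as { long long, 4-byte long } with no padding in between.

    byteorder is '<' for little-endian or '>' for big-endian.
    """
    return struct.unpack_from(byteorder + "ql", buf, 0)


sec, nsec = 1562860800, 123456789

# Layout assumed on the kernel side: two 64-bit fields.
le = struct.pack("<qq", sec, nsec)
be = struct.pack(">qq", sec, nsec)

# Little-endian: the low half of the second field comes first, so it works.
le_result = read_as_longlong_long(le, "<")
# Big-endian: the first four bytes are the (zero) high half of the field.
be_result = read_as_longlong_long(be, ">")
```

In this model the little-endian read recovers the nanoseconds, while the big-endian read returns 0 for them, because the 4-byte read lands on the upper half of the 64-bit field.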

[Qemu-devel] [PATCH v2] pcie: consistent names for function args

2019-07-11 Thread Michael S. Tsirkin

The function declarations for pcie_cap_slot_get and
pcie_cap_slot_write_config call the argument "slot_ctl", but the function
definitions and all the call sites drop the 'o' and call it "slt_ctl".
Let's be consistent.

Reported-by: Peter Maydell 
Signed-off-by: Michael S. Tsirkin 
---

Fix pcie_cap_slot_write_config too.

 include/hw/pci/pcie.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 34f277735c..8cf3361fc4 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -107,9 +107,9 @@ void pcie_cap_lnkctl_reset(PCIDevice *dev);
 
 void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot);
 void pcie_cap_slot_reset(PCIDevice *dev);
-void pcie_cap_slot_get(PCIDevice *dev, uint16_t *slot_ctl, uint16_t *slt_sta);
+void pcie_cap_slot_get(PCIDevice *dev, uint16_t *slt_ctl, uint16_t *slt_sta);
 void pcie_cap_slot_write_config(PCIDevice *dev,
-uint16_t old_slot_ctl, uint16_t old_slt_sta,
+uint16_t old_slt_ctl, uint16_t old_slt_sta,
 uint32_t addr, uint32_t val, int len);
 int pcie_cap_slot_post_load(void *opaque, int version_id);
 void pcie_cap_slot_push_attention_button(PCIDevice *dev);
-- 
MST

[Qemu-devel] [RFC 5/5] iotests: Add test for fallback truncate/create

2019-07-11 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/259 | 71 ++
 tests/qemu-iotests/259.out | 20 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 92 insertions(+)
 create mode 100755 tests/qemu-iotests/259
 create mode 100644 tests/qemu-iotests/259.out

diff --git a/tests/qemu-iotests/259 b/tests/qemu-iotests/259
new file mode 100755
index 00..6e3941378f
--- /dev/null
+++ b/tests/qemu-iotests/259
@@ -0,0 +1,71 @@
+#!/usr/bin/env bash
+#
+# Test generic image creation and truncation fallback (by using NBD)
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq=$(basename $0)
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto nbd
+_supported_os Linux
+
+
+_make_test_img 64M
+
+echo
+echo '--- Testing no-op shrinking ---'
+
+$QEMU_IMG resize -f raw --shrink "$TEST_IMG" 32M
+
+echo
+echo '--- Testing non-working growing ---'
+
+$QEMU_IMG resize -f raw "$TEST_IMG" 128M
+
+echo
+echo '--- Testing creation ---'
+
+$QEMU_IMG create -f qcow2 "$TEST_IMG" 64M | _filter_img_create
+$QEMU_IMG info "$TEST_IMG" | _filter_img_info
+
+echo
+echo '--- Testing creation for which the node would need to grow ---'
+
+$QEMU_IMG create -f qcow2 -o preallocation=metadata "$TEST_IMG" 64M 2>&1 \
+| _filter_img_create
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/259.out b/tests/qemu-iotests/259.out
new file mode 100644
index 00..1e4b11055f
--- /dev/null
+++ b/tests/qemu-iotests/259.out
@@ -0,0 +1,20 @@
+QA output created by 259
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+
+--- Testing no-op shrinking ---
+qemu-img: Image was not resized; resizing may not be supported for this image
+
+--- Testing non-working growing ---
+qemu-img: Cannot grow this nbd node
+
+--- Testing creation ---
+Formatting 'TEST_DIR/t.IMGFMT', fmt=qcow2 size=67108864
+image: TEST_DIR/t.IMGFMT
+file format: qcow2
+virtual size: 64 MiB (67108864 bytes)
+disk size: unavailable
+
+--- Testing creation for which the node would need to grow ---
+qemu-img: TEST_DIR/t.IMGFMT: Could not resize image: Cannot grow this nbd node
+Formatting 'TEST_DIR/t.IMGFMT', fmt=qcow2 size=67108864 preallocation=metadata
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index b34c8e3c0c..80e7603174 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -269,3 +269,4 @@
 254 rw auto backing quick
 255 rw auto quick
 256 rw auto quick
+259 rw auto quick
-- 
2.21.0

[Qemu-devel] [RFC 4/5] block: Generic file creation fallback

2019-07-11 Thread Max Reitz

If a protocol driver does not support image creation, we can see whether
maybe the file exists already.  If so, just truncating it will be
sufficient.

Signed-off-by: Max Reitz 
---
 block.c | 77 -
 1 file changed, 65 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index c139540f2b..8fb8e4dfda 100644
--- a/block.c
+++ b/block.c
@@ -531,20 +531,57 @@ out:
 return ret;
 }
 
-int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
+static int bdrv_create_file_fallback(const char *filename, BlockDriver *drv,
+ QemuOpts *opts, Error **errp)
 {
-BlockDriver *drv;
+BlockBackend *blk;
+QDict *options = qdict_new();
+int64_t size = 0;
+char *buf = NULL;
+PreallocMode prealloc;
 Error *local_err = NULL;
 int ret;
 
+size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
+buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
+   PREALLOC_MODE_OFF, &local_err);
+g_free(buf);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
+
+qdict_put_str(options, "driver", drv->format_name);
+
+blk = blk_new_open(filename, NULL, options,
+   BDRV_O_RDWR | BDRV_O_RESIZE, errp);
+if (!blk) {
+error_prepend(errp, "Protocol driver '%s' does not support "
+  "image creation, and opening the image failed: ",
+  drv->format_name);
+return -EINVAL;
+}
+
+ret = blk_truncate(blk, size, prealloc, errp);
+blk_unref(blk);
+return ret;
+}
+
+int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
+{
+BlockDriver *drv;
+
 drv = bdrv_find_protocol(filename, true, errp);
 if (drv == NULL) {
 return -ENOENT;
 }
 
-ret = bdrv_create(drv, filename, opts, &local_err);
-error_propagate(errp, local_err);
-return ret;
+if (drv->bdrv_co_create_opts) {
+return bdrv_create(drv, filename, opts, errp);
+} else {
+return bdrv_create_file_fallback(filename, drv, opts, errp);
+}
 }
 
 /**
@@ -1420,6 +1457,24 @@ QemuOptsList bdrv_runtime_opts = {
 },
 };
 
+static QemuOptsList fallback_create_opts = {
+.name = "fallback-create-opts",
+.head = QTAILQ_HEAD_INITIALIZER(fallback_create_opts.head),
+.desc = {
+{
+.name = BLOCK_OPT_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = "Virtual disk size"
+},
+{
+.name = BLOCK_OPT_PREALLOC,
+.type = QEMU_OPT_STRING,
+.help = "Preallocation mode (allowed values: off)"
+},
+{ /* end of list */ }
+}
+};
+
 /*
  * Common part for opening disk images and files
  *
@@ -5681,14 +5736,12 @@ void bdrv_img_create(const char *filename, const char *fmt,
 return;
 }
 
-if (!proto_drv->create_opts) {
-error_setg(errp, "Protocol driver '%s' does not support image creation",
-   proto_drv->format_name);
-return;
-}
-
 create_opts = qemu_opts_append(create_opts, drv->create_opts);
-create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
+if (proto_drv->create_opts) {
+create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
+} else {
+create_opts = qemu_opts_append(create_opts, &fallback_create_opts);
+}
 
 /* Create parameter list with default values */
 opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
-- 
2.21.0

[Qemu-devel] [RFC 3/5] block: Fall back to fallback truncate function

2019-07-11 Thread Max Reitz

file-posix does not need to basically duplicate our fallback truncate
implementation; and sheepdog can fall back to it for "shrinking" files.

Signed-off-by: Max Reitz 
---
 block/file-posix.c | 21 +
 block/sheepdog.c   |  2 +-
 2 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index ab05b51a66..bcddfc7fbe 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2031,23 +2031,7 @@ static int coroutine_fn raw_co_truncate(BlockDriverState *bs, int64_t offset,
 return raw_regular_truncate(bs, s->fd, offset, prealloc, errp);
 }
 
-if (prealloc != PREALLOC_MODE_OFF) {
-error_setg(errp, "Preallocation mode '%s' unsupported for this "
-   "non-regular file", PreallocMode_str(prealloc));
-return -ENOTSUP;
-}
-
-if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
-if (offset > raw_getlength(bs)) {
-error_setg(errp, "Cannot grow device files");
-return -EINVAL;
-}
-} else {
-error_setg(errp, "Resizing this file is not supported");
-return -ENOTSUP;
-}
-
-return 0;
+return -ENOTSUP;
 }
 
 #ifdef __OpenBSD__
@@ -3413,7 +3397,6 @@ static BlockDriver bdrv_host_device = {
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
-.bdrv_co_truncate   = raw_co_truncate,
 .bdrv_getlength= raw_getlength,
 .bdrv_get_info = raw_get_info,
 .bdrv_get_allocated_file_size
@@ -3537,7 +3520,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
-.bdrv_co_truncate= raw_co_truncate,
 .bdrv_getlength  = raw_getlength,
 .has_variable_length = true,
 .bdrv_get_allocated_file_size
@@ -3669,7 +3651,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
-.bdrv_co_truncate= raw_co_truncate,
 .bdrv_getlength  = raw_getlength,
 .has_variable_length = true,
 .bdrv_get_allocated_file_size
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 6f402e5d4d..4af4961cb7 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2301,7 +2301,7 @@ static int coroutine_fn sd_co_truncate(BlockDriverState *bs, int64_t offset,
 max_vdi_size = (UINT64_C(1) << s->inode.block_size_shift) * MAX_DATA_OBJS;
 if (offset < old_size) {
 error_setg(errp, "shrinking is not supported");
-return -EINVAL;
+return -ENOTSUP;
 } else if (offset > max_vdi_size) {
 error_setg(errp, "too big image size");
 return -EINVAL;
-- 
2.21.0

[Qemu-devel] [RFC 1/5] block/nbd: Fix hang in .bdrv_close()

2019-07-11 Thread Max Reitz

When nbd_close() is called from a coroutine, the connection_co never
gets to run, and thus nbd_teardown_connection() hangs.

This is because aio_co_enter() only puts the connection_co into the main
coroutine's wake-up queue, so this main coroutine needs to yield and
reschedule itself to let the connection_co run.

Signed-off-by: Max Reitz 
---
 block/nbd.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index 81edabbf35..b83b6cd43e 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -135,7 +135,17 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 qio_channel_shutdown(s->ioc,
  QIO_CHANNEL_SHUTDOWN_BOTH,
  NULL);
-BDRV_POLL_WHILE(bs, s->connection_co);
+
+if (qemu_in_coroutine()) {
+/* Let our caller poll and just yield until connection_co is done */
+while (s->connection_co) {
+aio_co_schedule(qemu_get_current_aio_context(),
+qemu_coroutine_self());
+qemu_coroutine_yield();
+}
+} else {
+BDRV_POLL_WHILE(bs, s->connection_co);
+}
 
 nbd_client_detach_aio_context(bs);
 object_unref(OBJECT(s->sioc));
-- 
2.21.0

[Qemu-devel] [RFC 0/5] block: Generic file truncation/creation fallbacks

2019-07-11 Thread Max Reitz

Hi,

Some protocol drivers do not really support file truncation but still
implement .bdrv_co_truncate(): They just don’t do anything when asked to
shrink a file.  This is reflected by qemu-img, which warns if you resize
a file and it has the exact same length afterwards as it had before.

We can just do that generically.  There is no reason for some protocol
drivers to act ashamed and pretend nobody notices.  The only thing we
have to take care of is to zero everything in the first sector past the
desired EOF, so that format probing won’t go wrong.

(RFC: Is it really OK to just do this for all block drivers?)

Similarly, we can add a fallback file creation path: If a block driver
does not support creating a file but it already exists, we can just use
it.  All we have to do is truncate it to the desired size.


This is an RFC because it feels weird and I don’t want people to
associate me with weird stuff too closely.

Well, patch 1 isn’t really an RFC.  It’s just a fix.


I was inspired to this series by Maxim’s patch “block/nvme: add support
for image creation” (from his “Few fixes for userspace NVME driver”
series).


Max Reitz (5):
  block/nbd: Fix hang in .bdrv_close()
  block: Generic truncation fallback
  block: Fall back to fallback truncate function
  block: Generic file creation fallback
  iotests: Add test for fallback truncate/create

 block.c| 77 --
 block/file-posix.c | 21 +--
 block/io.c | 69 --
 block/nbd.c| 12 +-
 block/sheepdog.c   |  2 +-
 tests/qemu-iotests/259 | 71 +++
 tests/qemu-iotests/259.out | 20 ++
 tests/qemu-iotests/group   |  1 +
 8 files changed, 235 insertions(+), 38 deletions(-)
 create mode 100755 tests/qemu-iotests/259
 create mode 100644 tests/qemu-iotests/259.out

-- 
2.21.0

[Qemu-devel] [RFC 2/5] block: Generic truncation fallback

2019-07-11 Thread Max Reitz

If a protocol driver does not support truncation, we can fall back to
effectively not doing anything if the new size is less than the actual
file size.  This is what we have been doing for some host device drivers
already.

The only caveat is that we have to zero out everything in the first
sector that lies beyond the new "EOF" so we do not get any surprises
with format probing.
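
The amount of data to clear can be worked out as below.  This is a
stand-alone sketch, not part of the patch: post_eof_fill_len() is a
hypothetical helper name, and SECTOR_SIZE stands in for BDRV_SECTOR_SIZE
with an assumed value of 512.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* assumed value, stands in for BDRV_SECTOR_SIZE */

/*
 * Number of bytes to zero, starting at new_size, when a node of
 * cur_size bytes is "shrunk" to new_size without really being
 * truncated.  Returns 0 when nothing needs to be cleared.
 */
static int64_t post_eof_fill_len(int64_t new_size, int64_t cur_size)
{
    assert(new_size >= 0 && cur_size >= 0);

    /* Only the first sector matters, and only if data lies past new_size. */
    if (new_size >= SECTOR_SIZE || new_size >= cur_size) {
        return 0;
    }

    int64_t to_sector_end = SECTOR_SIZE - new_size;
    int64_t to_file_end = cur_size - new_size;

    return to_sector_end < to_file_end ? to_sector_end : to_file_end;
}
```

For example, shrinking a 1024-byte node to 100 bytes clears bytes 100 to
511, while shrinking a 300-byte node to 100 bytes clears only bytes 100
to 299.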

Signed-off-by: Max Reitz 
---
 block/io.c | 69 ++
 1 file changed, 65 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 24a18759fd..382728fa9a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3064,6 +3064,57 @@ static void bdrv_parent_cb_resize(BlockDriverState *bs)
 }
 }
 
+static int coroutine_fn bdrv_co_truncate_fallback(BdrvChild *child,
+  int64_t offset,
+  PreallocMode prealloc,
+  Error **errp)
+{
+BlockDriverState *bs = child->bs;
+int64_t cur_size = bdrv_getlength(bs);
+
+if (cur_size < 0) {
+error_setg_errno(errp, -cur_size,
+ "Failed to inquire current file size");
+return cur_size;
+}
+
+if (prealloc != PREALLOC_MODE_OFF) {
+error_setg(errp, "Unsupported preallocation mode: %s",
+   PreallocMode_str(prealloc));
+return -ENOTSUP;
+}
+
+if (offset > cur_size) {
+error_setg(errp, "Cannot grow this %s node", bs->drv->format_name);
+return -ENOTSUP;
+}
+
+/*
+ * Overwrite first "post-EOF" parts of the first sector with
+ * zeroes so raw images will not be misprobed
+ */
+if (offset < BDRV_SECTOR_SIZE && offset < cur_size) {
+int64_t fill_len = MIN(BDRV_SECTOR_SIZE - offset, cur_size - offset);
+int ret;
+
+if (!(child->perm & BLK_PERM_WRITE)) {
+error_setg(errp, "Cannot write to this node to clear the file past "
+   "the truncated EOF");
+return -EPERM;
+}
+
+ret = bdrv_co_pwrite_zeroes(child, offset, fill_len, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret,
+ "Failed to clear file past the truncated EOF");
+return ret;
+}
+}
+
+return 0;
+}
+
+
 /**
  * Truncate file to 'offset' bytes (needed only for file protocols)
  */
@@ -3074,6 +3125,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset,
 BlockDriver *drv = bs->drv;
 BdrvTrackedRequest req;
 int64_t old_size, new_bytes;
+Error *local_err = NULL;
 int ret;
 
 
@@ -3127,15 +3179,24 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset,
 ret = bdrv_co_truncate(bs->file, offset, prealloc, errp);
 goto out;
 }
-error_setg(errp, "Image format driver does not support resize");
+error_setg(&local_err, "Image format driver does not support resize");
 ret = -ENOTSUP;
-goto out;
+} else {
+ret = drv->bdrv_co_truncate(bs, offset, prealloc, &local_err);
 }
 
-ret = drv->bdrv_co_truncate(bs, offset, prealloc, errp);
-if (ret < 0) {
+if (ret == -ENOTSUP && drv->bdrv_file_open) {
+error_free(local_err);
+
+ret = bdrv_co_truncate_fallback(child, offset, prealloc, errp);
+if (ret < 0) {
+goto out;
+}
+} else if (ret < 0) {
+error_propagate(errp, local_err);
 goto out;
 }
+
 ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not refresh total sector count");
-- 
2.21.0

[Qemu-devel] [PATCH v2] migration: Do not re-read the clock on pre_save in case of paused guest

2019-07-11 Thread Maxiwell S. Garcia

Re-reading the timebase before migration was ported from the x86 commit:
   6053a86fe7bd: kvmclock: reduce kvmclock difference on migration

The clock move makes the guest aware of the time that passed between
the stop and migrate commands. This is an issue for an already-paused
VM because some side effects, like process stalls, could happen
after migration.

So, this patch checks the runstate of the guest in the pre_save handler
and does not re-read the timebase in case of paused state (cold migration).
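
The intended behaviour can be sketched with a toy model.  ToyTimebase,
toy_timebase_save() and toy_pre_save() are hypothetical names for
illustration; the "now" argument replaces the real clock read, and the
paused flag is passed in instead of being queried from the runstate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ToyTimebase {
    uint64_t guest_timebase;
    bool runstate_paused;
} ToyTimebase;

/* Record the timebase and whether the guest was paused at that point. */
static void toy_timebase_save(ToyTimebase *tb, uint64_t now, bool paused)
{
    tb->guest_timebase = now;
    tb->runstate_paused = paused;
}

/* Runs just before the migration stream is written. */
static void toy_pre_save(ToyTimebase *tb, uint64_t now)
{
    /* A guest that was already paused keeps the value saved earlier. */
    if (!tb->runstate_paused) {
        toy_timebase_save(tb, now, false);
    }
}
```

With a paused guest saved at tick 100, a pre_save at tick 500 leaves
guest_timebase at 100; with a running guest it moves to 500.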

Signed-off-by: Maxiwell S. Garcia 
---
 hw/ppc/ppc.c | 13 +
 target/ppc/cpu-qom.h |  1 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index a9e508c496..8572e45274 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -1008,6 +1008,8 @@ static void timebase_save(PPCTimebase *tb)
  * there is no need to update it from KVM here
  */
 tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
+
+tb->runstate_paused = runstate_check(RUN_STATE_PAUSED);
 }
 
 static void timebase_load(PPCTimebase *tb)
@@ -1051,9 +1053,9 @@ void cpu_ppc_clock_vm_state_change(void *opaque, int running,
 }
 
 /*
- * When migrating, read the clock just before migration,
- * so that the guest clock counts during the events
- * between:
+ * When migrating a running guest, read the clock just
+ * before migration, so that the guest clock counts
+ * during the events between:
  *
  *  * vm_stop()
  *  *
@@ -1068,7 +1070,10 @@ static int timebase_pre_save(void *opaque)
 {
 PPCTimebase *tb = opaque;
 
-timebase_save(tb);
+/* guest_timebase won't be overridden in case of paused guest */
+if (!tb->runstate_paused) {
+timebase_save(tb);
+}
 
 return 0;
 }
diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
index be9b4c30c3..5fbcdee9c9 100644
--- a/target/ppc/cpu-qom.h
+++ b/target/ppc/cpu-qom.h
@@ -201,6 +201,7 @@ typedef struct PowerPCCPUClass {
 typedef struct PPCTimebase {
 uint64_t guest_timebase;
 int64_t time_of_the_day_ns;
+bool runstate_paused;
 } PPCTimebase;
 
 extern const struct VMStateDescription vmstate_ppc_timebase;
-- 
2.20.1

Re: [Qemu-devel] [PATCH v2 02/13] kvm: introduce high-level API to support encrypted page migration

2019-07-11 Thread Singh, Brijesh



On 7/11/19 12:47 PM, Dr. David Alan Gilbert wrote:
> * Singh, Brijesh (brijesh.si...@amd.com) wrote:
>> When memory encryption is enabled in VM, the guest pages will be
>> encrypted with the guest-specific key, to protect the confidentiality
>> of data in transit. To support the live migration we need to use
>> platform specific hooks to access the guest memory.
>>
>> The kvm_memcrypt_save_outgoing_page() can be used by the sender to write
>> the encrypted pages and metadata associated with it on the socket.
>>
>> The kvm_memcrypt_load_incoming_page() can be used by receiver to read the
>> incoming encrypted pages from the socket and load into the guest memory.
>>
>> Signed-off-by: Brijesh Singh <>
>> ---
>>   accel/kvm/kvm-all.c| 27 +++
>>   accel/kvm/sev-stub.c   | 11 +++
>>   accel/stubs/kvm-stub.c | 12 
>>   include/sysemu/kvm.h   | 12 
>>   include/sysemu/sev.h   |  3 +++
>>   5 files changed, 65 insertions(+)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index 3d86ae5052..162a2d5085 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -110,6 +110,10 @@ struct KVMState
>>   /* memory encryption */
>>   void *memcrypt_handle;
>>   int (*memcrypt_encrypt_data)(void *handle, uint8_t *ptr, uint64_t len);
>> +int (*memcrypt_save_outgoing_page)(void *ehandle, QEMUFile *f,
>> +uint8_t *ptr, uint32_t sz, uint64_t *bytes_sent);
>> +int (*memcrypt_load_incoming_page)(void *ehandle, QEMUFile *f,
>> +uint8_t *ptr);
>>   };
>>   
>>   KVMState *kvm_state;
>> @@ -165,6 +169,29 @@ int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
>>   return 1;
>>   }
>>   
>> +int kvm_memcrypt_save_outgoing_page(QEMUFile *f, uint8_t *ptr,
>> +uint32_t size, uint64_t *bytes_sent)
>> +{
>> +if (kvm_state->memcrypt_handle &&
>> +kvm_state->memcrypt_save_outgoing_page) {
>> +return kvm_state->memcrypt_save_outgoing_page(kvm_state->memcrypt_handle,
>> +f, ptr, size, bytes_sent);
>> +}
>> +
>> +return 1;
> 
> This needs to be commented saying what the return values mean.
> I'm not sure what '1' means for the case when this didn't have
> encryption support.
> 

Agreed, I will add a comment in the API header about this. A return value
of zero means success and anything else is a failure.
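
A possible shape for such a comment, shown on a stand-alone toy wrapper
rather than the real KVM function.  The names toy_save_page_fn,
toy_save_outgoing_page() and toy_hook_ok() are hypothetical, and the
QEMUFile argument is left out to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*toy_save_page_fn)(void *handle, uint8_t *ptr, uint32_t size,
                                uint64_t *bytes_sent);

/**
 * toy_save_outgoing_page - hand an outgoing page to the encryption hook
 *
 * Returns 0 on success.  Any non-zero value means failure, including
 * the case where no encryption handle or hook is registered.
 */
static int toy_save_outgoing_page(void *handle, toy_save_page_fn hook,
                                  uint8_t *ptr, uint32_t size,
                                  uint64_t *bytes_sent)
{
    if (handle && hook) {
        return hook(handle, ptr, size, bytes_sent);
    }

    return 1;
}

/* Example hook that pretends the whole page was sent. */
static int toy_hook_ok(void *handle, uint8_t *ptr, uint32_t size,
                       uint64_t *bytes_sent)
{
    (void)handle;
    (void)ptr;
    *bytes_sent = size;
    return 0;
}
```

A caller then only needs to test the result against zero, as
ram_save_encrypted_page() does in patch 03.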


>> +}
>> +
>> +int kvm_memcrypt_load_incoming_page(QEMUFile *f, uint8_t *ptr)
>> +{
>> +if (kvm_state->memcrypt_handle &&
>> +kvm_state->memcrypt_load_incoming_page) {
>> +return kvm_state->memcrypt_load_incoming_page(kvm_state->memcrypt_handle,
>> +f, ptr);
>> +}
>> +
>> +return 1;
>> +}
>> +
>>   static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
>>   {
>>   KVMState *s = kvm_state;
>> diff --git a/accel/kvm/sev-stub.c b/accel/kvm/sev-stub.c
>> index 4f97452585..c12a8e005e 100644
>> --- a/accel/kvm/sev-stub.c
>> +++ b/accel/kvm/sev-stub.c
>> @@ -24,3 +24,14 @@ void *sev_guest_init(const char *id)
>>   {
>>   return NULL;
>>   }
>> +
>> +int sev_save_outgoing_page(void *handle, QEMUFile *f, uint8_t *ptr,
>> +   uint32_t size, uint64_t *bytes_sent)
>> +{
>> +return 1;
>> +}
>> +
>> +int sev_load_incoming_page(void *handle, QEMUFile *f, uint8_t *ptr)
>> +{
>> +return 1;
>> +}
>> diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
>> index 6feb66ed80..e14b879531 100644
>> --- a/accel/stubs/kvm-stub.c
>> +++ b/accel/stubs/kvm-stub.c
>> @@ -114,6 +114,18 @@ int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
>> return 1;
>>   }
>>   
>> +int kvm_memcrypt_save_outgoing_page(QEMUFile *f, uint8_t *ptr,
>> +uint32_t size, uint64_t *bytes_sent)
>> +{
>> +return 1;
>> +}
>> +
>> +int kvm_memcrypt_load_incoming_page(QEMUFile *f, uint8_t *ptr)
>> +{
>> +return 1;
>> +}
>> +
>> +
>>   #ifndef CONFIG_USER_ONLY
>>   int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
>>   {
>> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
>> index acd90aebb6..bb6bcc143c 100644
>> --- a/include/sysemu/kvm.h
>> +++ b/include/sysemu/kvm.h
>> @@ -247,6 +247,18 @@ bool kvm_memcrypt_enabled(void);
>>*/
>>   int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len);
>>   
>> +/**
>> + * kvm_memcrypt_save_outgoing_buffer - encrypt the outgoing buffer
>> + * and write to the wire.
>> + */
>> +int kvm_memcrypt_save_outgoing_page(QEMUFile *f, uint8_t *ptr, uint32_t size,
>> +uint64_t *bytes_sent);
>> +
>> +/**
>> + * kvm_memcrypt_load_incoming_buffer - read the encrypt incoming buffer and copy
>> + * the buffer into the guest memory space.
>> + */
>> +int kvm_memcrypt_load_incoming_page(QEMUFile *f, uint8_t *ptr);
>>   
>>   #ifdef NEED_CPU_H
>>   #include "cpu.h"
>> diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
>>

Re: [Qemu-devel] [PATCH v2 00/13] Add SEV guest live migration support

2019-07-11 Thread Singh, Brijesh



On 7/11/19 4:59 AM, Dr. David Alan Gilbert wrote:
> * Singh, Brijesh (brijesh.si...@amd.com) wrote:
>> AMD SEV encrypts the memory of VMs and because this encryption is done using
>> an address tweak, the hypervisor will not be able to simply copy ciphertext
>> between machines to migrate a VM. Instead the AMD SEV Key Management API
>> provides a set of functions which the hypervisor can use to package a
>> guest encrypted pages for migration, while maintaining the confidentiality
>> provided by AMD SEV.
>>
>> The patch series add the support required in Qemu to perform the SEV
>> guest live migration. Before initiating the live migration a user
>> should use newly added 'migrate-set-sev-info' command to pass the
>> target machines certificate chain. See the docs/amd-memory-encryption.txt
>> for further details.
> 
> Note the two patchew errors:
>a) Mostly formatting; 80 char lines, /* comments etc - you should
>   check your patches using scripts/checkpatch.pl  to get rid of that
>   lot.
> 
>b) There are some build errors on non-x86 softmmu builds.
> 

Dave, thanks for the reviews. I will fix these in the next version.

Re: [Qemu-devel] [PATCH v2 03/13] migration/ram: add support to send encrypted pages

2019-07-11 Thread Singh, Brijesh



On 7/11/19 12:34 PM, Dr. David Alan Gilbert wrote:
> * Singh, Brijesh (brijesh.si...@amd.com) wrote:
>> When memory encryption is enabled, the guest memory will be encrypted with
>> the guest specific key. The patch introduces RAM_SAVE_FLAG_ENCRYPTED_PAGE
>> flag to distinguish the encrypted data from plaintext. Encrypted pages
>> may need special handling. The kvm_memcrypt_save_outgoing_page() is used
>> by the sender to write the encrypted pages onto the socket, similarly the
>> kvm_memcrypt_load_incoming_page() is used by the target to read the
>> encrypted pages from the socket and load into the guest memory.
>>
>> Signed-off-by: Brijesh Singh 
>> ---
>>   migration/ram.c | 54 -
>>   1 file changed, 53 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 908517fc2b..3c8977d508 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -57,6 +57,7 @@
>>   #include "qemu/uuid.h"
>>   #include "savevm.h"
>>   #include "qemu/iov.h"
>> +#include "sysemu/kvm.h"
>>   
>>   /***/
>>   /* ram save/restore */
>> @@ -76,6 +77,7 @@
>>   #define RAM_SAVE_FLAG_XBZRLE   0x40
>>   /* 0x80 is reserved in migration.h start with 0x100 next */
>>   #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
>> +#define RAM_SAVE_FLAG_ENCRYPTED_PAGE   0x200
> 
> OK, that's our very last usable flag!  Use it wisely!
> 

Hmm, maybe I missed something then. I thought the flag field is 64-bit
and that we have more room. Did I miss something?
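
One possible reading of the remark above, sketched as a toy model: the
quoted patch ORs the flag into the page offset, so a flag can only use
bits below the page size.  The 1 KiB minimum page size (TOY_PAGE_BITS =
10) is an assumption of this sketch, and all toy_* / TOY_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BITS 10    /* assumed smallest page size: 1 KiB */
#define TOY_PAGE_SIZE (UINT64_C(1) << TOY_PAGE_BITS)
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))

#define TOY_FLAG_COMPRESS_PAGE  UINT64_C(0x100)
#define TOY_FLAG_ENCRYPTED_PAGE UINT64_C(0x200)

/* Combine a page-aligned offset and its flags into one 64-bit header. */
static uint64_t toy_pack(uint64_t offset, uint64_t flags)
{
    assert((offset & ~TOY_PAGE_MASK) == 0);     /* offset is page-aligned */
    return offset | flags;
}

/* The receiver splits the header again using the page mask. */
static uint64_t toy_flags(uint64_t header)
{
    return header & ~TOY_PAGE_MASK;
}

static uint64_t toy_offset(uint64_t header)
{
    return header & TOY_PAGE_MASK;
}
```

Under that assumption 0x200 is the highest bit still below the page
size; a flag of 0x400 would be read back as part of the offset, even
though the header itself is 64 bits wide.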


>>   static inline bool is_zero_range(uint8_t *p, uint64_t size)
>>   {
>> @@ -460,6 +462,9 @@ static QemuCond decomp_done_cond;
>>   static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>>ram_addr_t offset, uint8_t *source_buf);
>>   
>> +static int ram_save_encrypted_page(RAMState *rs, PageSearchStatus *pss,
>> +   bool last_stage);
>> +
>>   static void *do_data_compress(void *opaque)
>>   {
>>   CompressParam *param = opaque;
>> @@ -2006,6 +2011,36 @@ static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
>>   return 1;
>>   }
>>   
>> +/**
>> + * ram_save_encrypted_page - send the given encrypted page to the stream
>> + */
>> +static int ram_save_encrypted_page(RAMState *rs, PageSearchStatus *pss,
>> +   bool last_stage)
>> +{
>> +int ret;
>> +uint8_t *p;
>> +RAMBlock *block = pss->block;
>> +ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
>> +uint64_t bytes_xmit;
>> +
>> +p = block->host + offset;
>> +
>> +ram_counters.transferred +=
>> +save_page_header(rs, rs->f, block,
>> +offset | RAM_SAVE_FLAG_ENCRYPTED_PAGE);
>> +
>> +ret = kvm_memcrypt_save_outgoing_page(rs->f, p,
> 
> I think you need to somehow abstract the kvm_memcrypt stuff; nothing
> else in migration actually knows it's dealing with kvm.  So there
> should be some indirection - probably through the cpu or the machine
> type or something.
> 

Currently, there are two interfaces by which we can know if we
are dealing with an encrypted guest: kvm_memcrypt_enabled() or the
MachineState->memory_encryption pointer. I did realize that the
migration code has not dealt with kvm so far.

How about target/i386/sev.c exporting the migration functions and,
based on the state of MachineState->memory_encryption, we call the
SEV migration routines for the encrypted pages?


> Also, this isn't bisectable - you can't make this call in this patch
> because you don't define/declare this function until a later patch.
> 
> 
>> +TARGET_PAGE_SIZE, &bytes_xmit);
>> +if (ret) {
>> +return -1;
>> +}
>> +
>> +ram_counters.transferred += bytes_xmit;
>> +ram_counters.normal++;
>> +
>> +return 1;
>> +}
>> +
>>   static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>>ram_addr_t offset, uint8_t *source_buf)
>>   {
>> @@ -2450,6 +2485,16 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
>>   return res;
>>   }
>>   
>> +/*
>> + * If memory encryption is enabled then use memory encryption APIs
>> + * to write the outgoing buffer to the wire. The encryption APIs
>> + * will take care of accessing the guest memory and re-encrypt it
>> + * for the transport purposes.
>> + */
>> + if (kvm_memcrypt_enabled()) {
>> +return ram_save_encrypted_page(rs, pss, last_stage);
>> + }
>> +
>>   if (save_compress_page(rs, block, offset)) {
>>   return 1;
>>   }
>> @@ -4271,7 +4316,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>   }
>>   
>>   if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
>> - RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
>> +

[Qemu-devel] [PATCH] xio3130_downstream: typo fix

2019-07-11 Thread Michael S. Tsirkin

slt ctl/status are passed in incorrect order.
Fix this up.

Signed-off-by: Michael S. Tsirkin 
Reported-by: Peter Maydell 
---
 hw/pci-bridge/xio3130_downstream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index 899b0fd6c9..182e164f74 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -43,7 +43,7 @@ static void xio3130_downstream_write_config(PCIDevice *d, uint32_t address,
 {
 uint16_t slt_ctl, slt_sta;
 
-pcie_cap_slot_get(d, &slt_sta, &slt_ctl);
+pcie_cap_slot_get(d, &slt_ctl, &slt_sta);
 pci_bridge_write_config(d, address, val, len);
 pcie_cap_flr_write_config(d, address, val, len);
 pcie_cap_slot_write_config(d, slt_ctl, slt_sta, address, val, len);
-- 
MST

[Qemu-devel] [PATCH] pcie: consistent names for function args

2019-07-11 Thread Michael S. Tsirkin

The function declarations for pcie_cap_slot_get and
pcie_cap_slot_write_config call the argument "slot_ctl", but the function
definitions and all the call sites drop the 'o' and call it "slt_ctl".
Let's be consistent.

Reported-by: Peter Maydell 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/pci/pcie.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 34f277735c..c7f0388b26 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -107,7 +107,7 @@ void pcie_cap_lnkctl_reset(PCIDevice *dev);
 
 void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot);
 void pcie_cap_slot_reset(PCIDevice *dev);
-void pcie_cap_slot_get(PCIDevice *dev, uint16_t *slot_ctl, uint16_t *slt_sta);
+void pcie_cap_slot_get(PCIDevice *dev, uint16_t *slt_ctl, uint16_t *slt_sta);
 void pcie_cap_slot_write_config(PCIDevice *dev,
 uint16_t old_slot_ctl, uint16_t old_slt_sta,
 uint32_t addr, uint32_t val, int len);
-- 
MST

Re: [Qemu-devel] [PATCH 10/11] audio: remove read and write pcm_ops

2019-07-11 Thread Zoltán Kővágó

On 2019-07-10 21:57, Marc-André Lureau wrote:
> On Tue, Jul 9, 2019 at 10:57 PM Kővágó, Zoltán  wrote:
>>
>> They just called audio_pcm_sw_read/write anyway, so it makes no sense
>> to have them too.  (The noaudio's read is the only exception, but it
>> should work with the generic code too.)
> 
> It works with the generic code, but wouldn't it be more expensive?

It's a bit more expensive, but only if the guest has a sound card and
the guest is playing some sound, because otherwise the output is suspended.

> Perhaps there can be something in audio_pcm_sw_write() to skip the
> work if noaudio is the backend?

It feels like a hacky solution to the same problem that the function
pointer currently solves.  On the other hand, it's unlikely any other
backend would drop every sample without processing them.

>>
>> Signed-off-by: Kővágó, Zoltán 
>> ---
>>  audio/audio_int.h   |  5 -
>>  audio/alsaaudio.c   | 12 
>>  audio/audio.c   |  8 
>>  audio/coreaudio.c   |  6 --
>>  audio/dsoundaudio.c | 12 
>>  audio/noaudio.c | 19 ---
>>  audio/ossaudio.c| 12 
>>  audio/paaudio.c | 12 
>>  audio/sdlaudio.c|  6 --
>>  audio/spiceaudio.c  | 12 
>>  audio/wavaudio.c|  6 --
>>  11 files changed, 4 insertions(+), 106 deletions(-)
>>
>> diff --git a/audio/audio_int.h b/audio/audio_int.h
>> index 7e00c1332e..003b7ab8cc 100644
>> --- a/audio/audio_int.h
>> +++ b/audio/audio_int.h
>> @@ -150,13 +150,11 @@ struct audio_pcm_ops {
>>  int  (*init_out)(HWVoiceOut *hw, struct audsettings *as, void *drv_opaque);
>>  void (*fini_out)(HWVoiceOut *hw);
>>  int  (*run_out) (HWVoiceOut *hw, int live);
>> -int  (*write)   (SWVoiceOut *sw, void *buf, int size);
>>  int  (*ctl_out) (HWVoiceOut *hw, int cmd, ...);
>>
>>  int  (*init_in) (HWVoiceIn *hw, struct audsettings *as, void *drv_opaque);
>>  void (*fini_in) (HWVoiceIn *hw);
>>  int  (*run_in)  (HWVoiceIn *hw);
>> -int  (*read)(SWVoiceIn *sw, void *buf, int size);
>>  int  (*ctl_in)  (HWVoiceIn *hw, int cmd, ...);
>>  };
>>
>> @@ -210,11 +208,8 @@ audio_driver *audio_driver_lookup(const char *name);
>>  void audio_pcm_init_info (struct audio_pcm_info *info, struct audsettings *as);
>>  void audio_pcm_info_clear_buf (struct audio_pcm_info *info, void *buf, int len);
>>
>> -int  audio_pcm_sw_write (SWVoiceOut *sw, void *buf, int len);
>>  int  audio_pcm_hw_get_live_in (HWVoiceIn *hw);
>>
>> -int  audio_pcm_sw_read (SWVoiceIn *sw, void *buf, int len);
>> -
>>  int audio_pcm_hw_clip_out (HWVoiceOut *hw, void *pcm_buf,
>> int live, int pending);
>>
>> diff --git a/audio/alsaaudio.c b/audio/alsaaudio.c
>> index 3daa7c8f8f..e9e3a4819c 100644
>> --- a/audio/alsaaudio.c
>> +++ b/audio/alsaaudio.c
>> @@ -270,11 +270,6 @@ static int alsa_poll_in (HWVoiceIn *hw)
>>  return alsa_poll_helper (alsa->handle, &alsa->pollhlp, POLLIN);
>>  }
>>
>> -static int alsa_write (SWVoiceOut *sw, void *buf, int len)
>> -{
>> -return audio_pcm_sw_write (sw, buf, len);
>> -}
>> -
>>  static snd_pcm_format_t aud_to_alsafmt (AudioFormat fmt, int endianness)
>>  {
>>  switch (fmt) {
>> @@ -988,11 +983,6 @@ static int alsa_run_in (HWVoiceIn *hw)
>>  return read_samples;
>>  }
>>
>> -static int alsa_read (SWVoiceIn *sw, void *buf, int size)
>> -{
>> -return audio_pcm_sw_read (sw, buf, size);
>> -}
>> -
>>  static int alsa_ctl_in (HWVoiceIn *hw, int cmd, ...)
>>  {
>>  ALSAVoiceIn *alsa = (ALSAVoiceIn *) hw;
>> @@ -1076,13 +1066,11 @@ static struct audio_pcm_ops alsa_pcm_ops = {
>>  .init_out = alsa_init_out,
>>  .fini_out = alsa_fini_out,
>>  .run_out  = alsa_run_out,
>> -.write= alsa_write,
>>  .ctl_out  = alsa_ctl_out,
>>
>>  .init_in  = alsa_init_in,
>>  .fini_in  = alsa_fini_in,
>>  .run_in   = alsa_run_in,
>> -.read = alsa_read,
>>  .ctl_in   = alsa_ctl_in,
>>  };
>>
>> diff --git a/audio/audio.c b/audio/audio.c
>> index d73cc086b6..b79f56fe64 100644
>> --- a/audio/audio.c
>> +++ b/audio/audio.c
>> @@ -594,7 +594,7 @@ static int audio_pcm_sw_get_rpos_in (SWVoiceIn *sw)
>>  }
>>  }
>>
>> -int audio_pcm_sw_read (SWVoiceIn *sw, void *buf, int size)
>> +static int audio_pcm_sw_read(SWVoiceIn *sw, void *buf, int size)
>>  {
>>  HWVoiceIn *hw = sw->hw;
>>  int samples, live, ret = 0, swlim, isamp, osamp, rpos, total = 0;
>> @@ -696,7 +696,7 @@ static int audio_pcm_hw_get_live_out (HWVoiceOut *hw, 
>> int *nb_live)
>>  /*
>>   * Soft voice (playback)
>>   */
>> -int audio_pcm_sw_write (SWVoiceOut *sw, void *buf, int size)
>> +static int audio_pcm_sw_write(SWVoiceOut *sw, void *buf, int size)
>>  {
>>  int hwsamples, samples, isamp, osamp, wpos, live, dead, left, swlim, 
>> blck;
>>  int ret = 0, pos = 0, total = 0;
>> @@ -854,7 +854,7 @@ int AUD_write (SWVoiceOut *sw, void *buf, int size)
>>  return 0;
>>  }
>>
>> -

Re: [Qemu-devel] [PATCH v7 06/13] vfio: Add VM state change handler to know state of VM

2019-07-11 Thread Kirti Wankhede




On 7/11/2019 5:43 PM, Dr. David Alan Gilbert wrote:
> * Kirti Wankhede (kwankh...@nvidia.com) wrote:
>> The VM state change handler is called when the VM's state changes. It is
>> used to set the VFIO device state to _RUNNING.
>> The VM state change handler, migration state change handler and log_sync
>> listener are called asynchronously, which sometimes leads to data corruption
>> in the migration region. Initialise a mutex that is used to serialize
>> operations on the migration data region while saving state.
>>
>> Signed-off-by: Kirti Wankhede 
>> Reviewed-by: Neo Jia 
>> ---
>>  hw/vfio/migration.c   | 64 
>> +++
>>  hw/vfio/trace-events  |  2 ++
>>  include/hw/vfio/vfio-common.h |  4 +++
>>  3 files changed, 70 insertions(+)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index a2cfbd5af2e1..c01f08b659d0 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -78,6 +78,60 @@ err:
>>  return ret;
>>  }
>>  
>> +static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t state)
>> +{
>> +VFIOMigration *migration = vbasedev->migration;
>> +VFIORegion *region = &migration->region.buffer;
>> +uint32_t device_state;
>> +int ret = 0;
>> +
>> +device_state = (state & VFIO_DEVICE_STATE_MASK) |
>> +   (vbasedev->device_state & ~VFIO_DEVICE_STATE_MASK);
>> +
>> +if ((device_state & VFIO_DEVICE_STATE_MASK) == 
>> VFIO_DEVICE_STATE_INVALID) {
>> +return -EINVAL;
>> +}
>> +
>> +ret = pwrite(vbasedev->fd, &device_state, sizeof(device_state),
>> + region->fd_offset + offsetof(struct 
>> vfio_device_migration_info,
>> +  device_state));
>> +if (ret < 0) {
>> +error_report("%s: Failed to set device state %d %s",
>> + vbasedev->name, ret, strerror(errno));
>> +return ret;
>> +}
>> +
>> +vbasedev->device_state = device_state;
>> +trace_vfio_migration_set_state(vbasedev->name, device_state);
>> +return 0;
>> +}
>> +
>> +static void vfio_vmstate_change(void *opaque, int running, RunState state)
>> +{
>> +VFIODevice *vbasedev = opaque;
>> +
>> +if ((vbasedev->vm_running != running)) {
>> +int ret;
>> +uint32_t dev_state;
>> +
>> +if (running) {
>> +dev_state = VFIO_DEVICE_STATE_RUNNING;
>> +} else {
>> +dev_state = (vbasedev->device_state & VFIO_DEVICE_STATE_MASK) &
>> + ~VFIO_DEVICE_STATE_RUNNING;
>> +}
>> +
>> +ret = vfio_migration_set_state(vbasedev, dev_state);
>> +if (ret) {
>> +error_report("%s: Failed to set device state 0x%x",
>> + vbasedev->name, dev_state);
>> +}
>> +vbasedev->vm_running = running;
>> +trace_vfio_vmstate_change(vbasedev->name, running, 
>> RunState_str(state),
>> +  dev_state);
>> +}
>> +}
>> +
>>  static int vfio_migration_init(VFIODevice *vbasedev,
>> struct vfio_region_info *info)
>>  {
>> @@ -93,6 +147,11 @@ static int vfio_migration_init(VFIODevice *vbasedev,
>>  return ret;
>>  }
>>  
>> +qemu_mutex_init(&vbasedev->migration->lock);
> 
> Does this and its friend below belong in this patch?  As far as I can
> tell you init/deinit the lock here but don't use it which is strange.
> 

This lock is used in
0009-vfio-Add-save-state-functions-to-SaveVMHandlers.patch and
0011-vfio-Add-function-to-get-dirty-page-list.patch

Hm. I'll move this init/deinit to patch 0009 in next iteration.

Thanks,
Kirti


> Dave
> 
>> +vbasedev->vm_state = 
>> qemu_add_vm_change_state_handler(vfio_vmstate_change,
>> +  vbasedev);
>> +
>>  return 0;
>>  }
>>  
>> @@ -135,11 +194,16 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
>>  return;
>>  }
>>  
>> +if (vbasedev->vm_state) {
>> +qemu_del_vm_change_state_handler(vbasedev->vm_state);
>> +}
>> +
>>  if (vbasedev->migration_blocker) {
>>  migrate_del_blocker(vbasedev->migration_blocker);
>>  error_free(vbasedev->migration_blocker);
>>  }
>>  
>> +qemu_mutex_destroy(&vbasedev->migration->lock);
>>  vfio_migration_region_exit(vbasedev);
>>  g_free(vbasedev->migration);
>>  }
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index 191a726a1312..3d15bacd031a 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -146,3 +146,5 @@ vfio_display_edid_write_error(void) ""
>>  
>>  # migration.c
>>  vfio_migration_probe(char *name, uint32_t index) " (%s) Region %d"
>> +vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d"
>> +vfio_vmstate_change(char *name, int running, const char *reason, uint32_t 
>> dev_state) " (%s) running %d reason %s device state %d"
>> diff --git a/include/hw/vfio/vfio-common.h

Re: [Qemu-devel] [PATCH 04/11] audio: audiodev= parameters no longer optional when -audiodev present

2019-07-11 Thread Zoltán Kővágó

On 2019-07-10 21:30, Marc-André Lureau wrote:
> On Tue, Jul 9, 2019 at 11:00 PM Kővágó, Zoltán  wrote:
>>
>> This means you should probably stop using -soundhw (as it doesn't allow
>> you to specify any options) and add the device manually with -device.
>> The exception is pcspk, it's currently not possible to manually add it.
>> To use it with audiodev, use something like this:
>>
>> -audiodev id=foo,... -global isa-pcspk.audiodev=foo -soundhw pcspk
> 
> Can you separate the paaudio changes to ease review?

I'll look into it, unfortunately the two changes are more intermixed
than I'd like them to be.

>>
>> Signed-off-by: Kővágó, Zoltán 
>> ---
>>  audio/audio.c   |  24 ++--
>>  audio/paaudio.c | 329 +++-
>>  2 files changed, 203 insertions(+), 150 deletions(-)
>>
>> diff --git a/audio/audio.c b/audio/audio.c
>> index e9dd7c8b32..82dd0e3e13 100644
>> --- a/audio/audio.c
>> +++ b/audio/audio.c
>> @@ -101,6 +101,8 @@ const struct mixeng_volume nominal_volume = {
>>  #endif
>>  };
>>
>> +static bool legacy_config = true;
>> +
>>  #ifdef AUDIO_IS_FLAWLESS_AND_NO_CHECKS_ARE_REQURIED
>>  #error No its not
>>  #else
>> @@ -1392,7 +1394,7 @@ static AudiodevListEntry *audiodev_find(
>>   * if dev == NULL => legacy implicit initialization, return the already 
>> created
>>   *   state or create a new one
>>   */
>> -static AudioState *audio_init(Audiodev *dev)
>> +static AudioState *audio_init(Audiodev *dev, const char *name)
>>  {
>>  static bool atexit_registered;
>>  size_t i;
>> @@ -1406,12 +1408,13 @@ static AudioState *audio_init(Audiodev *dev)
>>
>>  if (dev) {
>>  /* -audiodev option */
>> +legacy_config = false;
>>  drvname = AudiodevDriver_str(dev->driver);
>>  } else if (!QTAILQ_EMPTY(&audio_states)) {
>> -/*
>> - * todo: check for -audiodev once we have normal audiodev selection
>> - * support
>> - */
>> +if (!legacy_config) {
>> +dolog("You must specify an audiodev= for the device %s\n", 
>> name);
>> +exit(1);
>> +}
>>  return QTAILQ_FIRST(&audio_states);
>>  } else {
>>  /* legacy implicit initialization */
>> @@ -1518,7 +1521,7 @@ void audio_free_audiodev_list(AudiodevListHead *head)
>>  void AUD_register_card (const char *name, QEMUSoundCard *card)
>>  {
>>  if (!card->state) {
>> -card->state = audio_init(NULL);
>> +card->state = audio_init(NULL, name);
>>  }
>>
>>  card->name = g_strdup (name);
>> @@ -1544,8 +1547,11 @@ CaptureVoiceOut *AUD_add_capture(
>>  struct capture_callback *cb;
>>
>>  if (!s) {
>> -/* todo: remove when we have normal audiodev selection support */
>> -s = audio_init(NULL);
>> +if (!legacy_config) {
>> +dolog("You must specify audiodev when trying to capture\n");
>> +return NULL;
>> +}
>> +s = audio_init(NULL, NULL);
>>  }
>>
>>  if (audio_validate_settings (as)) {
>> @@ -1776,7 +1782,7 @@ void audio_init_audiodevs(void)
>>  AudiodevListEntry *e;
>>
>>  QSIMPLEQ_FOREACH(e, &audiodevs, next) {
>> -audio_init(e->dev);
>> +audio_init(e->dev, NULL);
>>  }
>>  }
>>
>> diff --git a/audio/paaudio.c b/audio/paaudio.c
>> index 5fc886bb33..cc3a34c2ea 100644
>> --- a/audio/paaudio.c
>> +++ b/audio/paaudio.c
>> @@ -11,10 +11,21 @@
>>  #include "audio_int.h"
>>  #include "audio_pt_int.h"
>>
>> -typedef struct {
>> -Audiodev *dev;
>> +typedef struct PAConnection {
>> +char *server;
>> +int refcount;
>> +QTAILQ_ENTRY(PAConnection) list;
>> +
>>  pa_threaded_mainloop *mainloop;
>>  pa_context *context;
>> +} PAConnection;
>> +
>> +static QTAILQ_HEAD(PAConnectionHead, PAConnection) pa_conns =
>> +QTAILQ_HEAD_INITIALIZER(pa_conns);
>> +
>> +typedef struct {
>> +Audiodev *dev;
>> +PAConnection *conn;
>>  } paaudio;
>>
>>  typedef struct {
>> @@ -45,7 +56,7 @@ typedef struct {
>>  int samples;
>>  } PAVoiceIn;
>>
>> -static void qpa_audio_fini(void *opaque);
>> +static void qpa_conn_fini(PAConnection *c);
>>
>>  static void GCC_FMT_ATTR (2, 3) qpa_logerr (int err, const char *fmt, ...)
>>  {
>> @@ -108,11 +119,11 @@ static inline int PA_STREAM_IS_GOOD(pa_stream_state_t 
>> x)
>>
>>  static int qpa_simple_read (PAVoiceIn *p, void *data, size_t length, int 
>> *rerror)
>>  {
>> -paaudio *g = p->g;
>> +PAConnection *c = p->g->conn;
>>
>> -pa_threaded_mainloop_lock (g->mainloop);
>> +pa_threaded_mainloop_lock(c->mainloop);
>>
>> -CHECK_DEAD_GOTO (g, p->stream, rerror, unlock_and_fail);
>> +CHECK_DEAD_GOTO(c, p->stream, rerror, unlock_and_fail);
>>
>>  while (length > 0) {
>>  size_t l;
>> @@ -121,11 +132,11 @@ static int qpa_simple_read (PAVoiceIn *p, void *data, 
>> size_t length, int *rerror
>>  int r;
>>
>>  r = pa_stream_peek (p->stream, &p->read_data, &p->read_length);
>> -CHECK_SUCCESS_GOTO (g,

Re: [Qemu-devel] [PATCH v7 00/13] Add migration support for VFIO device

2019-07-11 Thread Kirti Wankhede

On 7/11/2019 9:53 PM, Dr. David Alan Gilbert wrote:
> * Yan Zhao (yan.y.z...@intel.com) wrote:
>> On Thu, Jul 11, 2019 at 06:50:12PM +0800, Dr. David Alan Gilbert wrote:
>>> * Yan Zhao (yan.y.z...@intel.com) wrote:
>>>> Hi Kirti,
>>>> There are still unaddressed comments to your patches v4.
>>>> Would you mind addressing them?
>>>>
>>>> 1. should we register two migration interfaces simultaneously
>>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04750.html)
>>>
>>> Please don't do this.
>>> As far as I'm aware we currently only have one device that does that
>>> (vmxnet3) and a patch has just been posted that fixes/removes that.
>>>
>>> Dave
>>>
>> hi Dave,
>> Thanks for notifying this. but if we want to support postcopy in future,
>> after device stops, what interface could we use to transfer data of
>> device state only?
>> for postcopy, when source device stops, we need to transfer only
>> necessary device state to target vm before target vm starts, and we
>> don't want to transfer device memory as we'll do that after target vm
>> resuming.
> 
> Hmm ok, let's see; that's got to happen in the call to:
> qemu_savevm_state_complete_precopy(fb, false, false);
> that's made from postcopy_start.
>  (the false's are iterable_only and inactivate_disks)
> 
> and at that time I believe the state is POSTCOPY_ACTIVE, so in_postcopy
> is true.
> 
> If you're doing postcopy, then you'll probably define a has_postcopy()
> function, so qemu_savevm_state_complete_precopy will skip the
> save_live_complete_precopy call from its loop for at least two of the
> reasons in its big if.
> 
> So you're right; you need the VMSD for this to happen in the second
> loop in qemu_savevm_state_complete_precopy.  Hmm.
> 
> Now, what worries me, and I don't know the answer, is how the section
> header for the vmstate and the section header for an iteration look
> on the stream; how are they different?
> 

I don't have a way to test postcopy migration. That is one of the major
reasons I have not included postcopy support in this patchset, as clearly
called out in the cover letter.
This patchset is thoroughly tested for precopy migration.
If anyone has hardware that supports fault, then I would prefer to add
postcopy support later as an incremental change, which can be tested before
submitting.

Just a suggestion, instead of using VMSD, is it possible to have some
additional check to call save_live_complete_precopy from
qemu_savevm_state_complete_precopy?

>>
>>>> 2. in each save iteration, how much data is to be saved
>>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04683.html)
>>>>
>>>>> how big is the data_size ?
>>>>> if this size is too big, it may take too much time and block others.

I did mention this in the comment about the structure in the vfio.h
header. data_size will be provided by the vendor driver and obviously will
not be greater than the migration region size. The vendor driver should be
responsible for keeping its solution optimized.

>>>> 3. do we need extra interface to get data for device state only
>>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04812.html)

I don't think so. Opaque Device data from vendor driver can include
device state and device memory. Vendor driver who is managing his device
can decide how to place data over the stream.

>>>> 4. definition of dirty page copied_pfn
>>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg05592.html)

This was in line with the discussion with Alex, and I addressed the concern
there. Please check the current patchset, which addresses the concerns raised.

>>>> Also, I'm glad to see that you updated code by following my comments below,
>>>> but please don't forget to reply my comments next time:)

I tried to reply at the top of the threads and addressed the common concerns
raised there. Sorry if I missed any; I'll make sure to point you to my replies
going forward.

Thanks,
Kirti

>>>> https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg05357.html
>>>> https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg06454.html
>>>>
>>>> Thanks
>>>> Yan
>>>>
>>>> On Tue, Jul 09, 2019 at 05:49:07PM +0800, Kirti Wankhede wrote:
>>>>> Add migration support for VFIO device
>>>>>
>>>>> This Patch set include patches as below:
>>>>> - Define KABI for VFIO device for migration support.
>>>>> - Added save and restore functions for PCI configuration space
>>>>> - Generic migration functionality for VFIO device.
>>>>>   * This patch set adds functionality only for PCI devices, but can be
>>>>> extended to other VFIO devices.
>>>>>   * Added all the basic functions required for pre-copy, stop-and-copy and
>>>>> resume phases of migration.
>>>>>   * Added state change notifier and from that notifier function, VFIO
>>>>> device's state changed is conveyed to VFIO device driver.
>>>>>   * During save setup phase and resume/load setup phase, migration region
>>>>> is queried and is used to read/write VFIO device data.
>>>>>   * .save_live_pending and .save_live_iterate

Re: [Qemu-devel] [PATCH 02/11] audio: basic support for multi backend audio

2019-07-11 Thread Zoltán Kővágó

On 2019-07-11 16:37, Markus Armbruster wrote:
> "Zoltán Kővágó"  writes:
> 
>> On 2019-07-10 06:06, Markus Armbruster wrote:
>>> "Kővágó, Zoltán"  writes:
>>>
>>>> Audio functions no longer access glob_audio_state, instead they get an
>>>> AudioState as a parameter.  This is required in order to support
>>>> multiple backends.
>>>>
>>>> glob_audio_state is also gone, and replaced with a tailq so we can store
>>>> more than one states.
>>>>
>>>> Signed-off-by: Kővágó, Zoltán 
>>>> ---
>>> [...]
>>>> diff --git a/hmp-commands.hx b/hmp-commands.hx
>>>> index bfa5681dd2..23196da3fe 100644
>>>> --- a/hmp-commands.hx
>>>> +++ b/hmp-commands.hx
>>>> @@ -819,16 +819,17 @@ ETEXI
>>>>  
>>>>  {
>>>>  .name   = "wavcapture",
>>>> -.args_type  = "path:F,freq:i?,bits:i?,nchannels:i?",
>>>> -.params = "path [frequency [bits [channels]]]",
>>>> +.args_type  = "path:F,freq:i?,bits:i?,nchannels:i?,audiodev:s?",
>>>> +.params = "path [frequency [bits [channels [audiodev]]]]",
>>>>  .help   = "capture audio to a wave file (default 
>>>> frequency=44100 bits=16 channels=2)",
>>>>  .cmd= hmp_wavcapture,
>>>>  },
>>>>  STEXI
>>>> -@item wavcapture @var{filename} [@var{frequency} [@var{bits} 
>>>> [@var{channels}]]]
>>>> +@item wavcapture @var{filename} [@var{frequency} [@var{bits} 
>>>> [@var{channels} [@var{audiodev}]]]]
>>>>  @findex wavcapture
>>>> -Capture audio into @var{filename}. Using sample rate @var{frequency}
>>>> -bits per sample @var{bits} and number of channels @var{channels}.
>>>> +Capture audio into @var{filename} from @var{audiodev}. Using sample rate
>>>> +@var{frequency} bits per sample @var{bits} and number of channels
>>>> +@var{channels}.
>>>>  
>>>>  Defaults:
>>>>  @itemize @minus
>>>@item Sample rate = 44100 Hz - CD quality
>>>@item Bits = 16
>>>@item Number of channels = 2 - Stereo
>>>@end itemize
>>>ETEXI
>>>
>>> Defaults for the other optional arguments are listed here.  Why not for
>>> @audiodev?
>>
>> There's no default listed because there's no default when you use the
>> -audiodev options, since there's no good default.  When you don't use
>> -audiodev, it'll use the implicitly created audiodev which doesn't have
>> a name, so it can't be specified.
> 
> Double-checking to avoid misunderstandings: there is a default
> *behavior*, but no default *value*, i.e. there is no VALUE that makes
> audiodev=VALUE give you the same behavior as no audiodev.  Correct?

Yes.  If there is no audiodev=VALUE, and no -audiodev on the command
line, use the legacy config.  If there is audiodev=VALUE and -audiodev
id=VALUE, use that device.  Otherwise, it's an error.
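
To make the three cases concrete, here is a rough standalone sketch of the selection rule. It is not the code from this series; pick_audiodev and the AudiodevPick enum are made-up names, and the list of ids merely stands in for whatever -audiodev created on the command line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Outcome of resolving an optional audiodev=VALUE argument (hypothetical). */
typedef enum {
    PICK_LEGACY,   /* implicit, unnamed legacy backend */
    PICK_NAMED,    /* backend created by -audiodev id=VALUE */
    PICK_ERROR     /* no usable backend */
} AudiodevPick;

/*
 * ids/n_ids: the ids given with -audiodev on the command line.
 * wanted:    the audiodev=VALUE argument, or NULL when it was omitted.
 * On PICK_NAMED, *index is set to the position of the match in ids.
 */
static AudiodevPick pick_audiodev(const char *const *ids, size_t n_ids,
                                  const char *wanted, size_t *index)
{
    assert(index != NULL);

    if (wanted == NULL) {
        /* No audiodev=VALUE: legacy config only if -audiodev was never used. */
        return n_ids == 0 ? PICK_LEGACY : PICK_ERROR;
    }
    for (size_t i = 0; i < n_ids; i++) {
        if (strcmp(ids[i], wanted) == 0) {
            *index = i;
            return PICK_NAMED;
        }
    }
    /* audiodev=VALUE without a matching -audiodev id=VALUE. */
    return PICK_ERROR;
}
```

Note that the legacy backend is reachable only by omitting the argument, which is why no default value can be documented for it.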

> 
>>But I agree that this situation
>> should be documented somehow.
> 
> Yes, please.
> 
>>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>>> index 9621e934c0..0111055aa4 100644
>>>> --- a/qemu-options.hx
>>>> +++ b/qemu-options.hx
>>>> @@ -1978,6 +1978,11 @@ can help the device and guest to keep up and not 
>>>> lose events in case
>>>>  events are arriving in bulk.  Possible causes for the latter are flaky
>>>>  network connections, or scripts for automated testing.
>>>>  
>>>> +@item audiodev=@var{audiodev}
>>>> +
>>>> +Use the specified @var{audiodev} when the VNC client requests audio
>>>> +transmission.
>>>> +
>>>
>>> What's the default?
>>
>> It's the same story as wav_capture.
>>
>> Regards,
>> Zoltan

Re: [Qemu-devel] [PATCH v2 04/13] kvm: add support to sync the page encryption state bitmap

2019-07-11 Thread Dr. David Alan Gilbert

* Singh, Brijesh (brijesh.si...@amd.com) wrote:
> The SEV VMs have the concept of private and shared memory. The private memory
> is encrypted with a guest-specific key, while shared memory may be encrypted
> with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can be used to get a
> bitmap indicating whether the guest page is private or shared. A private
> page must be transmitted using the SEV migration commands.
> 
> Add a cpu_physical_memory_sync_encrypted_bitmap() which can be used to sync
> the page encryption bitmap for a given memory region.
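
As a side note on the sizing arithmetic in the hunk below, here is a small sketch of how a one-bit-per-page bitmap can be sized and read. It assumes the bitmap is padded to whole 64-bit words and is little-endian as the "lebitmap" name suggests; enc_bitmap_bytes and enc_bitmap_test are hypothetical helpers, not part of the patch, and which bit value means "private" is deliberately left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of y; y must be a power of two. */
#define ALIGN_UP(x, y) (((x) + (y) - 1) & ~((y) - 1))

/* Bytes needed for a one-bit-per-page bitmap, padded to whole 64-bit words. */
static size_t enc_bitmap_bytes(uint64_t memory_size, unsigned page_bits)
{
    uint64_t pages = memory_size >> page_bits;

    return (size_t)(ALIGN_UP(pages, (uint64_t)64) / 8);
}

/* Read the bit for `page` from a little-endian bitmap, one byte at a time. */
static bool enc_bitmap_test(const uint8_t *bitmap, uint64_t page)
{
    assert(bitmap != NULL);
    return (bitmap[page / 8] >> (page % 8)) & 1;
}
```

Reading the bitmap byte by byte gives the same answer on big-endian and little-endian hosts, which is the reason for a little-endian wire layout in the first place.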
> 
> Signed-off-by: Brijesh Singh 
> ---
>  accel/kvm/kvm-all.c |  38 ++
>  include/exec/ram_addr.h | 161 ++--
>  include/exec/ramlist.h  |   3 +-
>  migration/ram.c |  28 ++-
>  4 files changed, 222 insertions(+), 8 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 162a2d5085..c935e9366c 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -504,6 +504,37 @@ static int 
> kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
>  
>  #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
>  
> +/* sync page_enc bitmap */
> +static int kvm_sync_page_enc_bitmap(KVMMemoryListener *kml,
> +MemoryRegionSection *section,
> +KVMSlot *mem)

How AMD/SEV specific is this? i.e. should this be in a target/ specific
place? 

> +{
> +unsigned long size;
> +KVMState *s = kvm_state;
> +struct kvm_page_enc_bitmap e = {};
> +ram_addr_t pages = int128_get64(section->size) / getpagesize();
> +ram_addr_t start = section->offset_within_region +
> +   memory_region_get_ram_addr(section->mr);
> +
> +size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
> + /*HOST_LONG_BITS*/ 64) / 8;
> +e.enc_bitmap = g_malloc0(size);
> +e.start_gfn = mem->start_addr >> TARGET_PAGE_BITS;
> +e.num_pages = pages;
> +if (kvm_vm_ioctl(s, KVM_GET_PAGE_ENC_BITMAP, &e) == -1) {
> +DPRINTF("KVM_GET_PAGE_ENC_BITMAP ioctl failed %d\n", errno);
> +g_free(e.enc_bitmap);
> +return 1;
> +}
> +
> +cpu_physical_memory_set_encrypted_lebitmap(e.enc_bitmap,
> +   start, pages);
> +
> +g_free(e.enc_bitmap);
> +
> +return 0;
> +}
> +
>  /**
>   * kvm_physical_sync_dirty_bitmap - Grab dirty bitmap from kernel space
>   * This function updates qemu's dirty bitmap using
> @@ -553,6 +584,13 @@ static int 
> kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
>  }
>  
>  kvm_get_dirty_pages_log_range(section, d.dirty_bitmap);
> +
> +if (kvm_memcrypt_enabled() &&
> +kvm_sync_page_enc_bitmap(kml, section, mem)) {
> +g_free(d.dirty_bitmap);
> +return -1;
> +}
> +
>  g_free(d.dirty_bitmap);
>  }
>  
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index f96777bb99..6fc6864194 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -51,6 +51,8 @@ struct RAMBlock {
>  unsigned long *unsentmap;
>  /* bitmap of already received pages in postcopy */
>  unsigned long *receivedmap;
> +/* bitmap of page encryption state for an encrypted guest */
> +unsigned long *encbmap;
>  };
>  
>  static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
> @@ -314,9 +316,41 @@ static inline void 
> cpu_physical_memory_set_dirty_range(ram_addr_t start,
>  }
>  
>  #if !defined(_WIN32)
> -static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long 
> *bitmap,
> +
> +static inline void cpu_physical_memory_set_encrypted_range(ram_addr_t start,
> +   ram_addr_t length,
> +   unsigned long val)
> +{
> +unsigned long end, page;
> +unsigned long * const *src;
> +
> +if (length == 0) {
> +return;
> +}
> +
> +end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
> +page = start >> TARGET_PAGE_BITS;
> +
> +rcu_read_lock();
> +
> +src = 
> atomic_rcu_read(&ram_list.dirty_memory[DIRTY_MEMORY_ENCRYPTED])->blocks;
> +
> +while (page < end) {
> +unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> +unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> +unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - 
> offset);
> +
> +atomic_xchg(&src[idx][BIT_WORD(offset)], val);
> +page += num;
> +}
> +
> +rcu_read_unlock();
> +}
> +
> +static inline void cpu_physical_memory_set_dirty_enc_lebitmap(unsigned long 
> *bitmap,
>ram_addr_t start,
> -  ram_addr_t pages)
> +  ram_addr_t pages,
> +

Re: [Qemu-devel] [PATCH 2/2] create_config: remove $(CONFIG_SOFTMMU) hack

2019-07-11 Thread Montes, Julio

lgtm, thanks Paolo


Reviewed-by: Julio Montes 
Tested-by: Julio Montes 

On Thu, 2019-07-11 at 19:22 +0200, Paolo Bonzini wrote:
> CONFIG_TPM is defined to a rather weird $(CONFIG_SOFTMMU) so that it
> expands to the right thing in hw/Makefile.objs.  This however is not
> needed anymore and it has a corresponding hack in create_config
> to turn it into "#define CONFIG_TPM 1".  Clean up.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  configure | 2 +-
>  scripts/create_config | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/configure b/configure
> index 4983c8b..eb635c3 100755
> --- a/configure
> +++ b/configure
> @@ -7159,7 +7159,7 @@ if test "$live_block_migration" = "yes" ; then
>  fi
>  
>  if test "$tpm" = "yes"; then
> -  echo 'CONFIG_TPM=$(CONFIG_SOFTMMU)' >> $config_host_mak
> +  echo 'CONFIG_TPM=y' >> $config_host_mak
>  fi
>  
>  echo "TRACE_BACKENDS=$trace_backends" >> $config_host_mak
> diff --git a/scripts/create_config b/scripts/create_config
> index 00e86c8..6d8f08b 100755
> --- a/scripts/create_config
> +++ b/scripts/create_config
> @@ -54,7 +54,7 @@ case $line in
>  done
>  echo "NULL"
>  ;;
> - CONFIG_*='$(CONFIG_SOFTMMU)'|CONFIG_*=y) # configuration
> + CONFIG_*=y) # configuration
>  name=${line%=*}
>  echo "#define $name 1"
>  ;;

[Qemu-devel] [RFC v4 16/29] vfio: Introduce helpers to DMA map/unmap a RAM section

2019-07-11 Thread Eric Auger

Introduce two helpers that DMA map/unmap a RAM section. These helpers
will also be called from another call site for nested stage setup.
In addition, the structure of vfio_listener_region_add/del() may
become clearer.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 177 ++-
 hw/vfio/trace-events |   4 +-
 2 files changed, 108 insertions(+), 73 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 81d29ce908..ef8452a4bc 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -510,13 +510,115 @@ hostwin_from_range(VFIOContainer *container, hwaddr 
iova, hwaddr end)
 return NULL;
 }
 
+static int vfio_dma_map_ram_section(VFIOContainer *container,
+MemoryRegionSection *section)
+{
+VFIOHostDMAWindow *hostwin;
+Int128 llend, llsize;
+hwaddr iova, end;
+void *vaddr;
+int ret;
+
+assert(memory_region_is_ram(section->mr));
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+end = int128_get64(int128_sub(llend, int128_one()));
+
+vaddr = memory_region_get_ram_ptr(section->mr) +
+section->offset_within_region +
+(iova - section->offset_within_address_space);
+
+hostwin = hostwin_from_range(container, iova, end);
+if (!hostwin) {
+error_report("vfio: IOMMU container %p can't map guest IOVA region"
+ " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
+ container, iova, end);
+return -EFAULT;
+}
+
+trace_vfio_dma_map_ram(iova, end, vaddr);
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+
+if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
+trace_vfio_listener_region_add_no_dma_map(
+memory_region_name(section->mr),
+section->offset_within_address_space,
+int128_getlo(section->size),
+pgmask + 1);
+return 0;
+}
+}
+
+ret = vfio_dma_map(container, iova, int128_get64(llsize),
+   vaddr, section->readonly);
+if (ret) {
+error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ container, iova, int128_get64(llsize), vaddr, ret);
+if (memory_region_is_ram_device(section->mr)) {
+/* Allow unexpected mappings not to be fatal for RAM devices */
+return 0;
+}
+return ret;
+}
+return 0;
+}
+
+static void vfio_dma_unmap_ram_section(VFIOContainer *container,
+   MemoryRegionSection *section)
+{
+Int128 llend, llsize;
+hwaddr iova, end;
+bool try_unmap = true;
+int ret;
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+if (int128_ge(int128_make64(iova), llend)) {
+return;
+}
+end = int128_get64(int128_sub(llend, int128_one()));
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+trace_vfio_dma_unmap_ram(iova, end);
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask;
+VFIOHostDMAWindow *hostwin = hostwin_from_range(container, iova, end);
+
+assert(hostwin); /* or region_add() would have failed */
+
+pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
+}
+
+if (try_unmap) {
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
+if (ret) {
+error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iova, int128_get64(llsize), ret);
+}
+}
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
 VFIOContainer *container = container_of(listener, VFIOContainer, listener);
 hwaddr iova, end;
-Int128 llend, llsize;
-void *vaddr;
+Int128 llend;
 int ret;
 VFIOHostDMAWindow *hostwin;
 
@@ -650,41 +752,9 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 }
 
 /* Here we assume that memory_region_is_ram(section->mr)==true */
-
-vaddr = memory_region_get_ram_ptr(section->mr) +
-section->offset_within_region +
-(iova - section->offset_within_address_space);
-
-trace_vfio_listener_region_add_ram(iova, end, vaddr);
-
-llsize =

Re: [Qemu-devel] [PATCH v2 05/13] doc: update AMD SEV API spec web link

2019-07-11 Thread Dr. David Alan Gilbert

* Singh, Brijesh (brijesh.si...@amd.com) wrote:
> Signed-off-by: Brijesh Singh 
> ---
>  docs/amd-memory-encryption.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/docs/amd-memory-encryption.txt b/docs/amd-memory-encryption.txt
> index 43bf3ee6a5..abb9a976f5 100644
> --- a/docs/amd-memory-encryption.txt
> +++ b/docs/amd-memory-encryption.txt
> @@ -98,7 +98,7 @@ AMD Memory Encryption whitepaper:
>  
> http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
>  
>  Secure Encrypted Virtualization Key Management:
> -[1] http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf
> +[1] https://developer.amd.com/sev/ (Secure Encrypted Virtualization API)

No; that reference [1] is used a few lines higher up for:

See SEV KM API Spec [1] 'Launching a guest' usage flow (Appendix A) for the
complete flow chart.


so that needs fixing up to actually point to that flowchart or
equivalent.

That site is useful to include, but I guess it also needs a pointer
to the Volume2 section 15.34 or the like.

Dave


>  KVM Forum slides:
>  
> http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 8/8] iotests/257: test traditional sync modes

2019-07-11 Thread John Snow




On 7/11/19 8:37 AM, Max Reitz wrote:
> On 11.07.19 05:21, John Snow wrote:
>>
>> On 7/10/19 4:46 PM, Max Reitz wrote:
>>> On 10.07.19 21:00, John Snow wrote:
>>>> On 7/10/19 1:14 PM, Max Reitz wrote:
>>>>> On 10.07.19 03:05, John Snow wrote:
>>>>>
>>>>> Hm.  How useful is bitmap support for 'top' then, anyway?  That means
>>>>> that if you want to resume a top backup, you always have to resume it
>>>>> like it was a full backup.  Which sounds kind of useless.
>>>>>
>>>>> Max
>>>>>
>>>>
>>>> Good point!
>>>>
>>>> I think this can be fixed by doing an initialization pass of the
>>>> copy_bitmap when sync=top to set only the allocated regions in the bitmap.
>>>>
>>>> This means that the write notifier won't copy out regions that are
>>>> written to that weren't already in the top layer. I believe this is
>>>> actually a bugfix; the data we'd copy out in such cases is actually in
>>>> the backing layer and shouldn't be copied with sync=top.
>>>
>>> Now that you mention it...  I didn’t realize that.  Yes, you’re right.
>>>
>>>> So this would have two effects:
>>>> (1) sync=top gets a little more judicious about what it copies out on
>>>> sync=top, and
>>>> (2) the bitmap return value is more meaningful again.
>>>>
>>
>> This might be harder than I first thought.
>>
>> initializing the copy_bitmap generally happens before we install the
>> write notifier, which means that it occurs before the first yield.
>>
>> However, checking the allocation status can potentially be very slow,
>> can't it? I can't just hog the thread while I check.
> 
> I was thinking about that myself.  It isn’t that bad, because you aren’t
> doing the full block_status dance but just checking allocation status,
> which is reasonably quick (it just needs to look at the image format
> metadata, it doesn’t go down to the protocol layer).
> 
> But it’s probably not so good to halt the monitor for this, yes.
> 
>> There are ways to cooperatively process write notifier interrupts and
>> continue to check allocated status once we enter the main loop, but the
>> problem there becomes: if we fail early, how can we tell if the backup
>> is worth resuming?
>>
>> We might not have reached a convergence point for the copy_bitmap before
>> we failed, and still have a lot of extra bits set.
> 
> Is that so bad?
> 
>> I suppose at least in the case where we aren't trying to save the
>> copy_bitmap and need it to mean something specific, this is a reasonable
>> approach to fixing sync=TOP.
>>
>> As far as resume is concerned, I don't think I have good ideas. I could
>> emit an event or something if you're using sync=top with a bitmap for
>> output, but that feels *so* specialized for a niche(?) command that I
>> don't know if it's worth pursuing.
>>
>> (Plus, even then, what do you do if it fails before you see that event?
>> You just have to give up on what we copied out? That seems like a waste
>> and not the point of this exercise.)
> 
> Before that event, the bitmap can still be usable, as long as all
> “unknown” areas are set to dirty.  Sure, your resumed backup will then
> copy too much data.  But who cares.
> 
> So I don’t think you even need an event.
> 
>> The only way I can think of at all to get a retry on sync=top is to take
>> an always policy, and to allow a special invocation with something like
>> mode=bitmap+top:
> 
> Yes, that was my first idea, too.  But I didn’t even write about it,
> because of...
> 
>> "Assume we need to copy anything set in the bitmap, unless it's not in
>> the top layer, and then skip it."
>>
>> Which seems awful, because it would be a specialty mode for the
>> exclusive purpose of re-trying sync=top backups.
> 
> ...exactly this.
> 
>> Meh.
> 
> I don’t think it’s all that bad.
> 

No, it's just not ideal and it's something I'd have to defend in a
patch. It's a caveat that would need documenting.

"Hey, depending on how far the job got before it failed, you might not
want to resume it because it may not have finished determining which
segments held allocated data. There's no way to tell if this happened or
not."

It makes the decision making process by e.g. libvirt harder, though
there are still some heuristics you could use, like:

- Is the bitmap count less than the size of the top image?
- Is it bigger?

And that might be good enough when deciding how to proceed. I suppose if
we want to give more precise mechanisms for this we'd always be within
our right to continue refining it and just document that it MIGHT have
extra bits set.

--js
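
The initialization pass and the resume heuristic discussed in this thread can be sketched with a toy model. This is only an illustration under stated assumptions: it uses one bool per cluster instead of a real dirty bitmap, and the names init_copy_bitmap_top() and bitmap_has_extra_bits() are hypothetical, not QEMU block-layer functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one bool per cluster stands in for a real dirty bitmap. */

/*
 * Initialization pass for sync=top: mark a cluster for copying only if it
 * is allocated in the top layer. Returns the number of clusters marked.
 */
static size_t init_copy_bitmap_top(bool *copy_bitmap,
                                   const bool *allocated_in_top,
                                   size_t nb_clusters)
{
    size_t dirty = 0;

    assert(copy_bitmap && allocated_in_top);
    for (size_t i = 0; i < nb_clusters; i++) {
        copy_bitmap[i] = allocated_in_top[i];
        dirty += copy_bitmap[i];
    }
    return dirty;
}

/*
 * Resume heuristic (hypothetical): if more clusters are still marked than
 * the top layer has allocated, the bitmap must still carry extra bits.
 * A smaller count proves nothing either way.
 */
static bool bitmap_has_extra_bits(size_t dirty_clusters,
                                  size_t top_allocated_clusters)
{
    return dirty_clusters > top_allocated_clusters;
}
```

A management tool could apply the second check to a bitmap left behind by a failed job before deciding whether a resume is worthwhile; as noted above, a resumed backup that carries extra bits only copies too much data.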

Re: [Qemu-devel] [PATCH 2/2] create_config: remove $(CONFIG_SOFTMMU) hack

2019-07-11 Thread Philippe Mathieu-Daudé

On 7/11/19 7:22 PM, Paolo Bonzini wrote:
> CONFIG_TPM is defined to a rather weird $(CONFIG_SOFTMMU) so that it
> expands to the right thing in hw/Makefile.objs.  This however is not
> needed anymore and it has a corresponding hack in create_config
> to turn it into "#define CONFIG_TPM 1".  Clean up.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  configure | 2 +-
>  scripts/create_config | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/configure b/configure
> index 4983c8b..eb635c3 100755
> --- a/configure
> +++ b/configure
> @@ -7159,7 +7159,7 @@ if test "$live_block_migration" = "yes" ; then
>  fi
>  
>  if test "$tpm" = "yes"; then
> -  echo 'CONFIG_TPM=$(CONFIG_SOFTMMU)' >> $config_host_mak
> +  echo 'CONFIG_TPM=y' >> $config_host_mak
>  fi
>  
>  echo "TRACE_BACKENDS=$trace_backends" >> $config_host_mak
> diff --git a/scripts/create_config b/scripts/create_config
> index 00e86c8..6d8f08b 100755
> --- a/scripts/create_config
> +++ b/scripts/create_config
> @@ -54,7 +54,7 @@ case $line in
>  done
>  echo "NULL"
>  ;;
> - CONFIG_*='$(CONFIG_SOFTMMU)'|CONFIG_*=y) # configuration
> + CONFIG_*=y) # configuration
>  name=${line%=*}
>  echo "#define $name 1"
>  ;;
> 

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé

[Qemu-devel] [RFC v5 28/29] hw/arm/smmuv3: Implement fault injection

2019-07-11 Thread Eric Auger

We convert iommu_fault structs received from the kernel
into the data struct used by the emulation code and record
the events into the virtual event queue.

Signed-off-by: Eric Auger 

---

v3 -> v4:
- fix compile issue on mingw

Exhaustive mapping remains to be done
---
 hw/arm/smmuv3.c | 71 +
 1 file changed, 71 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 4474682a33..bca7ecb147 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1553,6 +1553,76 @@ static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
 return -EINVAL;
 }
 
+struct iommu_fault;
+
+static inline int
+smmuv3_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+ struct iommu_fault *buf)
+{
+#ifdef __linux__
+SMMUDevice *sdev = container_of(iommu_mr, SMMUDevice, iommu);
+SMMUv3State *s3 = sdev->smmu;
+uint32_t sid = smmu_get_sid(sdev);
+int i;
+
+for (i = 0; i < count; i++) {
+SMMUEventInfo info = {};
+struct iommu_fault_unrecoverable *record;
+
+if (buf[i].type != IOMMU_FAULT_DMA_UNRECOV) {
+continue;
+}
+
+info.sid = sid;
+record = &buf[i].event;
+
+switch (record->reason) {
+case IOMMU_FAULT_REASON_PASID_INVALID:
+info.type = SMMU_EVT_C_BAD_SUBSTREAMID;
+/* TODO further fill info.u.c_bad_substream */
+break;
+case IOMMU_FAULT_REASON_PASID_FETCH:
+info.type = SMMU_EVT_F_CD_FETCH;
+break;
+case IOMMU_FAULT_REASON_BAD_PASID_ENTRY:
+info.type = SMMU_EVT_C_BAD_CD;
+/* TODO further fill info.u.c_bad_cd */
+break;
+case IOMMU_FAULT_REASON_WALK_EABT:
+info.type = SMMU_EVT_F_WALK_EABT;
+info.u.f_walk_eabt.addr = record->addr;
+info.u.f_walk_eabt.addr2 = record->fetch_addr;
+break;
+case IOMMU_FAULT_REASON_PTE_FETCH:
+info.type = SMMU_EVT_F_TRANSLATION;
+info.u.f_translation.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_OOR_ADDRESS:
+info.type = SMMU_EVT_F_ADDR_SIZE;
+info.u.f_addr_size.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_ACCESS:
+info.type = SMMU_EVT_F_ACCESS;
+info.u.f_access.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_PERMISSION:
+info.type = SMMU_EVT_F_PERMISSION;
+info.u.f_permission.addr = record->addr;
+break;
+default:
+warn_report("%s Unexpected fault reason received from host: %d",
+__func__, record->reason);
+continue;
+}
+
+smmuv3_record_event(s3, &info);
+}
+return 0;
+#else
+return -1;
+#endif
+}
+
 static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
   void *data)
 {
@@ -1561,6 +1631,7 @@ static void 
smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
 imrc->translate = smmuv3_translate;
 imrc->notify_flag_changed = smmuv3_notify_flag_changed;
 imrc->get_attr = smmuv3_get_attr;
+imrc->inject_faults = smmuv3_inject_faults;
 }
 
 static const TypeInfo smmuv3_type_info = {
-- 
2.20.1

Re: [Qemu-devel] [PATCH v2 02/13] kvm: introduce high-level API to support encrypted page migration

2019-07-11 Thread Dr. David Alan Gilbert

* Singh, Brijesh (brijesh.si...@amd.com) wrote:
> When memory encryption is enabled in VM, the guest pages will be
> encrypted with the guest-specific key, to protect the confidentiality
> of data in transit. To support the live migration we need to use
> platform specific hooks to access the guest memory.
> 
> The kvm_memcrypt_save_outgoing_page() can be used by the sender to write
> the encrypted pages and metadata associated with it on the socket.
> 
> The kvm_memcrypt_load_incoming_page() can be used by receiver to read the
> incoming encrypted pages from the socket and load into the guest memory.
> 
> Signed-off-by: Brijesh Singh <>
> ---
>  accel/kvm/kvm-all.c| 27 +++
>  accel/kvm/sev-stub.c   | 11 +++
>  accel/stubs/kvm-stub.c | 12 
>  include/sysemu/kvm.h   | 12 
>  include/sysemu/sev.h   |  3 +++
>  5 files changed, 65 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 3d86ae5052..162a2d5085 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -110,6 +110,10 @@ struct KVMState
>  /* memory encryption */
>  void *memcrypt_handle;
>  int (*memcrypt_encrypt_data)(void *handle, uint8_t *ptr, uint64_t len);
> +int (*memcrypt_save_outgoing_page)(void *ehandle, QEMUFile *f,
> +uint8_t *ptr, uint32_t sz, uint64_t *bytes_sent);
> +int (*memcrypt_load_incoming_page)(void *ehandle, QEMUFile *f,
> +uint8_t *ptr);
>  };
>  
>  KVMState *kvm_state;
> @@ -165,6 +169,29 @@ int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
>  return 1;
>  }
>  
> +int kvm_memcrypt_save_outgoing_page(QEMUFile *f, uint8_t *ptr,
> +uint32_t size, uint64_t *bytes_sent)
> +{
> +if (kvm_state->memcrypt_handle &&
> +kvm_state->memcrypt_save_outgoing_page) {
> +return 
> kvm_state->memcrypt_save_outgoing_page(kvm_state->memcrypt_handle,
> +f, ptr, size, bytes_sent);
> +}
> +
> +return 1;

This needs to be commented saying what the return values mean.
I'm not sure what '1' means for the case when this didn't have
encryption support.

> +}
> +
> +int kvm_memcrypt_load_incoming_page(QEMUFile *f, uint8_t *ptr)
> +{
> +if (kvm_state->memcrypt_handle &&
> +kvm_state->memcrypt_load_incoming_page) {
> +return 
> kvm_state->memcrypt_load_incoming_page(kvm_state->memcrypt_handle,
> +f, ptr);
> +}
> +
> +return 1;
> +}
> +
>  static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
>  {
>  KVMState *s = kvm_state;
> diff --git a/accel/kvm/sev-stub.c b/accel/kvm/sev-stub.c
> index 4f97452585..c12a8e005e 100644
> --- a/accel/kvm/sev-stub.c
> +++ b/accel/kvm/sev-stub.c
> @@ -24,3 +24,14 @@ void *sev_guest_init(const char *id)
>  {
>  return NULL;
>  }
> +
> +int sev_save_outgoing_page(void *handle, QEMUFile *f, uint8_t *ptr,
> +   uint32_t size, uint64_t *bytes_sent)
> +{
> +return 1;
> +}
> +
> +int sev_load_incoming_page(void *handle, QEMUFile *f, uint8_t *ptr)
> +{
> +return 1;
> +}
> diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
> index 6feb66ed80..e14b879531 100644
> --- a/accel/stubs/kvm-stub.c
> +++ b/accel/stubs/kvm-stub.c
> @@ -114,6 +114,18 @@ int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
>return 1;
>  }
>  
> +int kvm_memcrypt_save_outgoing_page(QEMUFile *f, uint8_t *ptr,
> +uint32_t size, uint64_t *bytes_sent)
> +{
> +return 1;
> +}
> +
> +int kvm_memcrypt_load_incoming_page(QEMUFile *f, uint8_t *ptr)
> +{
> +return 1;
> +}
> +
> +
>  #ifndef CONFIG_USER_ONLY
>  int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
>  {
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index acd90aebb6..bb6bcc143c 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -247,6 +247,18 @@ bool kvm_memcrypt_enabled(void);
>   */
>  int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len);
>  
> +/**
> + * kvm_memcrypt_save_outgoing_buffer - encrypt the outgoing buffer
> + * and write to the wire.
> + */
> +int kvm_memcrypt_save_outgoing_page(QEMUFile *f, uint8_t *ptr, uint32_t size,
> +uint64_t *bytes_sent);
> +
> +/**
> + * kvm_memcrypt_load_incoming_buffer - read the encrypt incoming buffer and 
> copy
> + * the buffer into the guest memory space.
> + */
> +int kvm_memcrypt_load_incoming_page(QEMUFile *f, uint8_t *ptr);
>  
>  #ifdef NEED_CPU_H
>  #include "cpu.h"
> diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
> index 98c1ec8d38..752a71b1c0 100644
> --- a/include/sysemu/sev.h
> +++ b/include/sysemu/sev.h
> @@ -18,4 +18,7 @@
>  
>  void *sev_guest_init(const char *id);
>  int sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len);
> +int sev_save_outgoing_page(void *handle, QEMUFile *f, uint8_t *ptr,
> +   uint32_t size, uint64_t

[Qemu-devel] [PATCH] target-i386: add CPUID bit for MSR_KVM_POLL_CONTROL

2019-07-11 Thread Paolo Bonzini

Cc: Marcelo Tosatti 
Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f2d868f..bc8853d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -902,7 +902,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 "kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock",
 "kvm-asyncpf", "kvm-steal-time", "kvm-pv-eoi", "kvm-pv-unhalt",
 NULL, "kvm-pv-tlb-flush", NULL, "kvm-pv-ipi",
-NULL, "kvm-pv-sched-yield", NULL, NULL,
+"kvm-poll-control", "kvm-pv-sched-yield", NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 "kvmclock-stable-bit", NULL, NULL, NULL,
-- 
1.8.3.1
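
In the table touched by this patch, the position of a name is the bit number of the feature within the feature word, so filling the first slot of the fourth row gives "kvm-poll-control" bit 12. A minimal sketch of that lookup, using only the sixteen entries visible in the hunk; the array name kvm_feat_names and the helper kvm_feat_bit() are hypothetical and not part of target/i386/cpu.c:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First 16 entries of the feature word, as they read after the patch. */
static const char *const kvm_feat_names[16] = {
    "kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock",
    "kvm-asyncpf", "kvm-steal-time", "kvm-pv-eoi", "kvm-pv-unhalt",
    NULL, "kvm-pv-tlb-flush", NULL, "kvm-pv-ipi",
    "kvm-poll-control", "kvm-pv-sched-yield", NULL, NULL,
};

/* Return the bit index of the first matching name, or -1 if none matches. */
static int kvm_feat_bit(const char *name)
{
    assert(name);
    for (int i = 0; i < 16; i++) {
        if (kvm_feat_names[i] && strcmp(kvm_feat_names[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

With this table, kvm_feat_bit("kvm-poll-control") yields 12 and kvm_feat_bit("kvm-pv-sched-yield") yields 13.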

[Qemu-devel] [RFC v5 21/29] vfio/pci: Set up the DMA FAULT region

2019-07-11 Thread Eric Auger

Set up the fault region which is composed of the actual fault
queue (mmappable) and a header used to handle it. The fault
queue is mmapped.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- use a single DMA FAULT region. No version selection anymore
---
 hw/vfio/pci.c | 64 +++
 hw/vfio/pci.h |  1 +
 2 files changed, 65 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 19702cdbbf..8c8647c4b5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2511,11 +2511,67 @@ int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
 return 0;
 }
 
+static void vfio_init_fault_regions(VFIOPCIDevice *vdev, Error **errp)
+{
+struct vfio_region_info *fault_region_info = NULL;
+struct vfio_region_info_cap_fault *cap_fault;
+VFIODevice *vbasedev = &vdev->vbasedev;
+struct vfio_info_cap_header *hdr;
+char *fault_region_name;
+int ret;
+
+ret = vfio_get_dev_region_info(&vdev->vbasedev,
+   VFIO_REGION_TYPE_NESTED,
+   VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT,
+   &fault_region_info);
+if (ret) {
+goto out;
+}
+
+hdr = vfio_get_region_info_cap(fault_region_info,
+   VFIO_REGION_INFO_CAP_DMA_FAULT);
+if (!hdr) {
+error_setg(errp, "failed to retrieve DMA FAULT capability");
+goto out;
+}
+cap_fault = container_of(hdr, struct vfio_region_info_cap_fault,
+ header);
+if (cap_fault->version != 1) {
+error_setg(errp, "Unsupported DMA FAULT API version %d",
+   cap_fault->version);
+goto out;
+}
+
+fault_region_name = g_strdup_printf("%s DMA FAULT %d",
+vbasedev->name,
+fault_region_info->index);
+
+ret = vfio_region_setup(OBJECT(vdev), vbasedev,
+&vdev->dma_fault_region,
+fault_region_info->index,
+fault_region_name);
+g_free(fault_region_name);
+if (ret) {
+error_setg_errno(errp, -ret,
+ "failed to set up the DMA FAULT region %d",
+ fault_region_info->index);
+goto out;
+}
+
+ret = vfio_region_mmap(&vdev->dma_fault_region);
+if (ret) {
+error_setg_errno(errp, -ret, "Failed to mmap the DMA FAULT queue");
+}
+out:
+g_free(fault_region_info);
+}
+
 static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 {
 VFIODevice *vbasedev = &vdev->vbasedev;
 struct vfio_region_info *reg_info;
 struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+Error *err = NULL;
 int i, ret = -1;
 
 /* Sanity check device */
@@ -2579,6 +2635,12 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, 
Error **errp)
 }
 }
 
+vfio_init_fault_regions(vdev, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
 irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
 
 ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
@@ -3159,6 +3221,7 @@ static void vfio_instance_finalize(Object *obj)
 
 vfio_display_finalize(vdev);
 vfio_bars_finalize(vdev);
+vfio_region_finalize(&vdev->dma_fault_region);
 g_free(vdev->emulated_config_bits);
 g_free(vdev->rom);
 /*
@@ -3179,6 +3242,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
 vfio_unregister_ext_irq_notifiers(vdev);
+vfio_region_exit(&vdev->dma_fault_region);
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
 vfio_disable_interrupts(vdev);
 if (vdev->intx.mmap_timer) {
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 893d074375..815154656c 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -141,6 +141,7 @@ typedef struct VFIOPCIDevice {
 EventNotifier err_notifier;
 EventNotifier req_notifier;
 VFIOPCIExtIRQ *ext_irqs;
+VFIORegion dma_fault_region;
 int (*resetfn)(struct VFIOPCIDevice *);
 uint32_t vendor_id;
 uint32_t device_id;
-- 
2.20.1

[Qemu-devel] [RFC v5 16/29] vfio: Introduce helpers to DMA map/unmap a RAM section

2019-07-11 Thread Eric Auger

Let's introduce two helpers that DMA map/unmap a RAM section.
These helpers will also be called from another call site for
nested stage setup. This may also make the structure of
vfio_listener_region_add/del() clearer.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 177 ++-
 hw/vfio/trace-events |   4 +-
 2 files changed, 108 insertions(+), 73 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 81d29ce908..ef8452a4bc 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -510,13 +510,115 @@ hostwin_from_range(VFIOContainer *container, hwaddr 
iova, hwaddr end)
 return NULL;
 }
 
+static int vfio_dma_map_ram_section(VFIOContainer *container,
+MemoryRegionSection *section)
+{
+VFIOHostDMAWindow *hostwin;
+Int128 llend, llsize;
+hwaddr iova, end;
+void *vaddr;
+int ret;
+
+assert(memory_region_is_ram(section->mr));
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+end = int128_get64(int128_sub(llend, int128_one()));
+
+vaddr = memory_region_get_ram_ptr(section->mr) +
+section->offset_within_region +
+(iova - section->offset_within_address_space);
+
+hostwin = hostwin_from_range(container, iova, end);
+if (!hostwin) {
+error_report("vfio: IOMMU container %p can't map guest IOVA region"
+ " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
+ container, iova, end);
+return -EFAULT;
+}
+
+trace_vfio_dma_map_ram(iova, end, vaddr);
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+
+if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
+trace_vfio_listener_region_add_no_dma_map(
+memory_region_name(section->mr),
+section->offset_within_address_space,
+int128_getlo(section->size),
+pgmask + 1);
+return 0;
+}
+}
+
+ret = vfio_dma_map(container, iova, int128_get64(llsize),
+   vaddr, section->readonly);
+if (ret) {
+error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ container, iova, int128_get64(llsize), vaddr, ret);
+if (memory_region_is_ram_device(section->mr)) {
+/* Allow unexpected mappings not to be fatal for RAM devices */
+return 0;
+}
+return ret;
+}
+return 0;
+}
+
+static void vfio_dma_unmap_ram_section(VFIOContainer *container,
+   MemoryRegionSection *section)
+{
+Int128 llend, llsize;
+hwaddr iova, end;
+bool try_unmap = true;
+int ret;
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+if (int128_ge(int128_make64(iova), llend)) {
+return;
+}
+end = int128_get64(int128_sub(llend, int128_one()));
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+trace_vfio_dma_unmap_ram(iova, end);
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask;
+VFIOHostDMAWindow *hostwin = hostwin_from_range(container, iova, end);
+
+assert(hostwin); /* or region_add() would have failed */
+
+pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
+}
+
+if (try_unmap) {
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
+if (ret) {
+error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iova, int128_get64(llsize), ret);
+}
+}
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
 VFIOContainer *container = container_of(listener, VFIOContainer, listener);
 hwaddr iova, end;
-Int128 llend, llsize;
-void *vaddr;
+Int128 llend;
 int ret;
 VFIOHostDMAWindow *hostwin;
 
@@ -650,41 +752,9 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 }
 
 /* Here we assume that memory_region_is_ram(section->mr)==true */
-
-vaddr = memory_region_get_ram_ptr(section->mr) +
-section->offset_within_region +
-(iova - section->offset_within_address_space);
-
-trace_vfio_listener_region_add_ram(iova, end, vaddr);
-
-llsize =
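
The alignment arithmetic used by vfio_dma_map_ram_section() and vfio_dma_unmap_ram_section() in the patch above can be sketched in isolation: the start is rounded up to a page boundary, the end is rounded down, and only the whole pages in between are mapped. The sketch makes several assumptions that are not in the patch: a fixed 4 KiB page size instead of TARGET_PAGE_SIZE, plain uint64_t instead of Int128 (so a section ending at the very top of the 64-bit address space is not handled), and the hypothetical names struct dma_range and page_aligned_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumed page size for the sketch */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

struct dma_range {
    uint64_t iova;   /* first byte, rounded up to a page boundary */
    uint64_t size;   /* length in bytes, whole pages only */
};

/*
 * Shrink [offset, offset + len) to the largest page-aligned range it
 * contains. Returns false when no whole page fits.
 */
static bool page_aligned_range(uint64_t offset, uint64_t len,
                               struct dma_range *out)
{
    uint64_t iova = (offset + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
    uint64_t end = (offset + len) & SKETCH_PAGE_MASK;   /* exclusive */

    assert(out);
    if (iova >= end) {
        return false;
    }
    out->iova = iova;
    out->size = end - iova;
    return true;
}
```

For a section covering 0x1800..0x47ff this yields iova 0x2000 and size 0x2000; a section smaller than one aligned page yields nothing to map, which corresponds to the early return in the unmap helper.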

[Qemu-devel] [RFC v5 18/29] vfio: Pass stage 1 MSI bindings to the host

2019-07-11 Thread Eric Auger

We register the stage1 MSI bindings when enabling the vectors
and we unregister them on container disconnection.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- use VFIO_IOMMU_SET_MSI_BINDING

v2 -> v3:
- only register the notifier if the IOMMU translates MSIs
- record the msi bindings in a container list and unregister on
  container release
---
 hw/vfio/common.c  | 52 +++
 hw/vfio/pci.c | 51 +-
 hw/vfio/trace-events  |  2 ++
 include/hw/vfio/vfio-common.h |  9 ++
 4 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index bd975c5b83..4bbce6a43a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -490,6 +490,56 @@ static void vfio_iommu_unmap_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
 }
 }
 
+int vfio_iommu_set_msi_binding(VFIOContainer *container,
+   IOMMUTLBEntry *iotlb)
+{
+struct vfio_iommu_type1_set_msi_binding ustruct;
+VFIOMSIBinding *binding;
+int ret;
+
+QLIST_FOREACH(binding, &container->msibinding_list, next) {
+if (binding->iova == iotlb->iova) {
+return 0;
+}
+}
+
+ustruct.argsz = sizeof(struct vfio_iommu_type1_set_msi_binding);
+ustruct.iova = iotlb->iova;
+ustruct.flags = VFIO_IOMMU_BIND_MSI;
+ustruct.gpa = iotlb->translated_addr;
+ustruct.size = iotlb->addr_mask + 1;
+ret = ioctl(container->fd, VFIO_IOMMU_SET_MSI_BINDING , &ustruct);
+if (ret) {
+error_report("%s: failed to register the stage1 MSI binding (%m)",
+ __func__);
+return ret;
+}
+binding =  g_new0(VFIOMSIBinding, 1);
+binding->iova = ustruct.iova;
+binding->gpa = ustruct.gpa;
+binding->size = ustruct.size;
+
+QLIST_INSERT_HEAD(&container->msibinding_list, binding, next);
+return 0;
+}
+
+static void vfio_container_unbind_msis(VFIOContainer *container)
+{
+VFIOMSIBinding *binding, *tmp;
+
+QLIST_FOREACH_SAFE(binding, &container->msibinding_list, next, tmp) {
+struct vfio_iommu_type1_set_msi_binding ustruct;
+
+/* the MSI doorbell is not used anymore, unregister it */
+ustruct.argsz = sizeof(struct vfio_iommu_type1_set_msi_binding);
+ustruct.flags = VFIO_IOMMU_UNBIND_MSI;
+ustruct.iova = binding->iova;
+ioctl(container->fd, VFIO_IOMMU_SET_MSI_BINDING , &ustruct);
+QLIST_REMOVE(binding, next);
+g_free(binding);
+}
+}
+
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
@@ -1589,6 +1639,8 @@ static void vfio_disconnect_container(VFIOGroup *group)
 g_free(giommu);
 }
 
+vfio_container_unbind_msis(container);
+
 trace_vfio_disconnect_container(container->fd);
 close(container->fd);
 g_free(container);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 729f1f353e..45e007575e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -359,6 +359,49 @@ static void vfio_msi_interrupt(void *opaque)
 notify(&vdev->pdev, nr);
 }
 
+static int vfio_register_msi_binding(VFIOPCIDevice *vdev, int vector_n)
+{
+VFIOContainer *container = vdev->vbasedev.group->container;
+PCIDevice *dev = &vdev->pdev;
+AddressSpace *as = pci_device_iommu_address_space(dev);
+MSIMessage msg = pci_get_msi_message(dev, vector_n);
+IOMMUMemoryRegionClass *imrc;
+IOMMUMemoryRegion *iommu_mr;
+bool msi_translate = false, nested = false;
+IOMMUTLBEntry entry;
+
+if (as == &address_space_memory) {
+return 0;
+}
+
+iommu_mr = IOMMU_MEMORY_REGION(as->root);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_MSI_TRANSLATE,
+ (void *)&msi_translate);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
+ (void *)&nested);
+imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
+
+if (!nested || !msi_translate) {
+return 0;
+}
+
+/* MSI doorbell address is translated by an IOMMU */
+
+rcu_read_lock();
+entry = imrc->translate(iommu_mr, msg.address, IOMMU_WO, 0);
+rcu_read_unlock();
+
+if (entry.perm == IOMMU_NONE) {
+return -ENOENT;
+}
+
+trace_vfio_register_msi_binding(vdev->vbasedev.name, vector_n,
+msg.address, entry.translated_addr);
+
+vfio_iommu_set_msi_binding(container, &entry);
+return 0;
+}
+
 static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
 {
 struct vfio_irq_set *irq_set;
@@ -376,7 +419,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool 
msix)
 fds = (int32_t *)&irq_set->data;
 
 for (i = 0; i < vdev->nr_vectors; i++) {
-int fd = -1;
+int ret, fd = -1;
 
 /*
  * MSI vs MSI-X - The guest has direct access to MSI mask and pending
@@ -391,6 +434,12 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool 
msix)

[Qemu-devel] [RFC v5 15/29] vfio: Introduce hostwin_from_range helper

2019-07-11 Thread Eric Auger

Let's introduce a hostwin_from_range() helper that returns the
hostwin encapsulating an IOVA range or NULL if none is found.

This improves the readability of callers and removes the usage
of hostwin_found.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 46a1a089a4..81d29ce908 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -497,6 +497,19 @@ out:
 rcu_read_unlock();
 }
 
+static VFIOHostDMAWindow *
+hostwin_from_range(VFIOContainer *container, hwaddr iova, hwaddr end)
+{
+VFIOHostDMAWindow *hostwin;
+
+QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
+return hostwin;
+}
+}
+return NULL;
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -506,7 +519,6 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 void *vaddr;
 int ret;
 VFIOHostDMAWindow *hostwin;
-bool hostwin_found;
 
 if (vfio_listener_skipped_section(section)) {
 trace_vfio_listener_region_add_skip(
@@ -583,15 +595,8 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 #endif
 }
 
-hostwin_found = false;
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-hostwin_found = true;
-break;
-}
-}
-
-if (!hostwin_found) {
+hostwin = hostwin_from_range(container, iova, end);
+if (!hostwin) {
 error_report("vfio: IOMMU container %p can't map guest IOVA region"
  " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
  container, iova, end);
@@ -763,16 +768,9 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 
 if (memory_region_is_ram_device(section->mr)) {
 hwaddr pgmask;
-VFIOHostDMAWindow *hostwin;
-bool hostwin_found = false;
+VFIOHostDMAWindow *hostwin = hostwin_from_range(container, iova, end);
 
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-hostwin_found = true;
-break;
-}
-}
-assert(hostwin_found); /* or region_add() would have failed */
+assert(hostwin); /* or region_add() would have failed */
 
 pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
 try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
-- 
2.20.1
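
The containment test at the heart of hostwin_from_range() can be tried outside QEMU with a plain array in place of the QLIST. This is a sketch only: struct host_window and window_from_range() are hypothetical stand-ins for VFIOHostDMAWindow and the helper in the patch, and both bounds are treated as inclusive, as in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct host_window {
    uint64_t min_iova;
    uint64_t max_iova;   /* inclusive upper bound */
};

/* Return the first window that fully contains [iova, end], or NULL. */
static const struct host_window *
window_from_range(const struct host_window *wins, size_t n,
                  uint64_t iova, uint64_t end)
{
    assert(wins || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (wins[i].min_iova <= iova && end <= wins[i].max_iova) {
            return &wins[i];
        }
    }
    return NULL;
}
```

Returning the window itself, rather than setting a separate found flag, lets a caller test the result for NULL and use it directly, which is the simplification the patch makes in vfio_listener_region_add() and vfio_listener_region_del().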

[Qemu-devel] [RFC v5 27/29] hw/arm/smmuv3: Pass stage 1 configurations to the host

2019-07-11 Thread Eric Auger

In case PASID PciOps are set for the device we call
the set_pasid_table() callback on each STE update.

This allows passing the guest stage 1 configuration
to the host and applying it at the physical level.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- Use PciOps instead of config notifiers

v3 -> v4:
- fix compile issue with mingw

v2 -> v3:
- adapt to pasid_cfg field changes. Use local variable
- add trace event
- set version fields
- use CONFIG_PASID

v1 -> v2:
- do not notify anymore on CD change. Anyway the smmuv3 linux
  driver is not sending any CD invalidation commands. If we were
  to propagate CD invalidation commands, we would use the
  CACHE_INVALIDATE VFIO ioctl.
- notify a precise config flags to prepare for addition of new
  flags
---
 hw/arm/smmuv3.c | 77 +++--
 hw/arm/trace-events |  1 +
 2 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 2a6bf78a8e..4474682a33 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -16,6 +16,10 @@
  * with this program; if not, see .
  */
 
+#ifdef __linux__
+#include "linux/iommu.h"
+#endif
+
 #include "qemu/osdep.h"
 #include "hw/boards.h"
 #include "sysemu/sysemu.h"
@@ -847,6 +851,60 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int 
asid,
 }
 }
 
+static void smmuv3_notify_config_change(SMMUState *bs, uint32_t sid)
+{
+#ifdef __linux__
+IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, sid);
+SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
+   .inval_ste_allowed = true};
+IOMMUConfig iommu_config;
+SMMUTransCfg *cfg;
+SMMUDevice *sdev;
+
+if (!mr) {
+return;
+}
+
+sdev = container_of(mr, SMMUDevice, iommu);
+
+/* flush QEMU config cache */
+smmuv3_flush_config(sdev);
+
+if (!pci_device_is_pasid_ops_set(sdev->bus, sdev->devfn)) {
+return;
+}
+
+cfg = smmuv3_get_config(sdev, &event);
+
+if (!cfg) {
+return;
+}
+
+iommu_config.pasid_cfg.version = PASID_TABLE_CFG_VERSION_1;
+iommu_config.pasid_cfg.format = IOMMU_PASID_FORMAT_SMMUV3;
+iommu_config.pasid_cfg.base_ptr = cfg->s1ctxptr;
+iommu_config.pasid_cfg.pasid_bits = 0;
+iommu_config.pasid_cfg.smmuv3.version = PASID_TABLE_SMMUV3_CFG_VERSION_1;
+
+if (cfg->disabled || cfg->bypassed) {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_BYPASS;
+} else if (cfg->aborted) {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_ABORT;
+} else {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_TRANSLATE;
+}
+
+trace_smmuv3_notify_config_change(mr->parent_obj.name,
+  iommu_config.pasid_cfg.config,
+  iommu_config.pasid_cfg.base_ptr);
+
+if (pci_device_set_pasid_table(sdev->bus, sdev->devfn, &iommu_config)) {
+error_report("Failed to pass PASID table to host for iommu mr %s (%m)",
+ mr->parent_obj.name);
+}
+#endif
+}
+
 static int smmuv3_cmdq_consume(SMMUv3State *s)
 {
 SMMUState *bs = ARM_SMMU(s);
@@ -897,22 +955,14 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 case SMMU_CMD_CFGI_STE:
 {
 uint32_t sid = CMD_SID(&cmd);
-IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, sid);
-SMMUDevice *sdev;
 
 if (CMD_SSEC(&cmd)) {
 cmd_error = SMMU_CERROR_ILL;
 break;
 }
 
-if (!mr) {
-break;
-}
-
 trace_smmuv3_cmdq_cfgi_ste(sid);
-sdev = container_of(mr, SMMUDevice, iommu);
-smmuv3_flush_config(sdev);
-
+smmuv3_notify_config_change(bs, sid);
 break;
 }
 case SMMU_CMD_CFGI_STE_RANGE: /* same as SMMU_CMD_CFGI_ALL */
@@ -929,14 +979,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 trace_smmuv3_cmdq_cfgi_ste_range(start, end);
 
 for (i = start; i <= end; i++) {
-IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, i);
-SMMUDevice *sdev;
-
-if (!mr) {
-continue;
-}
-sdev = container_of(mr, SMMUDevice, iommu);
-smmuv3_flush_config(sdev);
+smmuv3_notify_config_change(bs, i);
 }
 break;
 }
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 3809005cba..741e645ae2 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -52,4 +52,5 @@ smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for 
sid %d"
 smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier node for iommu 
mr=%s"
 smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu 
mr=%s"
 smmuv3_inv_notifiers_iova(const char *name, uint16_t asid, uint64_t iova) 
"iommu mr=%s asid=%d iova=0x%"PRIx64
+smmuv3_notify_config_change(const char *name,

[Qemu-devel] [RFC v5 26/29] hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation

2019-07-11 Thread Eric Auger

Let's propagate the leaf attribute throughout the invalidation path.
This hint is used to reduce the scope of the invalidations to the
last level of translation. Not enforcing it induces large performance
penalties in nested mode.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 16 +---
 hw/arm/trace-events |  2 +-
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 8c88923f73..2a6bf78a8e 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -797,8 +797,7 @@ epilogue:
  */
 static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
IOMMUNotifier *n,
-   int asid,
-   dma_addr_t iova)
+   int asid, dma_addr_t iova, bool leaf)
 {
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
 SMMUEventInfo event = {.inval_ste_allowed = true};
@@ -825,12 +824,14 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 entry.addr_mask = (1 << tt->granule_sz) - 1;
 entry.perm = IOMMU_NONE;
 entry.arch_id = asid;
+entry.leaf = leaf;
 
 memory_region_notify_one(n, &entry);
 }
 
 /* invalidate an asid/iova tuple in all mr's */
-static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, dma_addr_t iova)
+static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid,
+  dma_addr_t iova, bool leaf)
 {
 SMMUDevice *sdev;
 
@@ -841,7 +842,7 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int 
asid, dma_addr_t iova)
 trace_smmuv3_inv_notifiers_iova(mr->parent_obj.name, asid, iova);
 
 IOMMU_NOTIFIER_FOREACH(n, mr) {
-smmuv3_notify_iova(mr, n, asid, iova);
+smmuv3_notify_iova(mr, n, asid, iova, leaf);
 }
 }
 }
@@ -979,9 +980,10 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 {
 dma_addr_t addr = CMD_ADDR();
 uint16_t vmid = CMD_VMID();
+bool leaf = CMD_LEAF();
 
-trace_smmuv3_cmdq_tlbi_nh_vaa(vmid, addr);
-smmuv3_inv_notifiers_iova(bs, -1, addr);
+trace_smmuv3_cmdq_tlbi_nh_vaa(vmid, addr, leaf);
+smmuv3_inv_notifiers_iova(bs, -1, addr, leaf);
 smmu_iotlb_inv_all(bs);
 break;
 }
@@ -993,7 +995,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 bool leaf = CMD_LEAF();
 
 trace_smmuv3_cmdq_tlbi_nh_va(vmid, asid, addr, leaf);
-smmuv3_inv_notifiers_iova(bs, asid, addr);
+smmuv3_inv_notifiers_iova(bs, asid, addr, leaf);
 smmu_iotlb_inv_iova(bs, asid, addr);
 break;
 }
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 0acedcedc6..3809005cba 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -43,7 +43,7 @@ smmuv3_cmdq_cfgi_cd(uint32_t sid) "streamid = %d"
 smmuv3_config_cache_hit(uint32_t sid, uint32_t hits, uint32_t misses, uint32_t perc) "Config cache HIT for sid %d (hits=%d, misses=%d, hit rate=%d)"
 smmuv3_config_cache_miss(uint32_t sid, uint32_t hits, uint32_t misses, uint32_t perc) "Config cache MISS for sid %d (hits=%d, misses=%d, hit rate=%d)"
 smmuv3_cmdq_tlbi_nh_va(int vmid, int asid, uint64_t addr, bool leaf) "vmid =%d asid =%d addr=0x%"PRIx64" leaf=%d"
-smmuv3_cmdq_tlbi_nh_vaa(int vmid, uint64_t addr) "vmid =%d addr=0x%"PRIx64
+smmuv3_cmdq_tlbi_nh_vaa(int vmid, uint64_t addr, bool leaf) "vmid =%d addr=0x%"PRIx64" leaf=%d"
 smmuv3_cmdq_tlbi_nh(void) ""
 smmuv3_cmdq_tlbi_nh_asid(uint16_t asid) "asid=%d"
 smmu_iotlb_cache_hit(uint16_t asid, uint64_t addr, uint32_t hit, uint32_t miss, uint32_t p) "IOTLB cache HIT asid=%d addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
-- 
2.20.1

[Qemu-devel] [RFC v5 29/29] vfio: Remove VFIO/SMMUv3 assert

2019-07-11 Thread Eric Auger

Now that all the bricks are in place, let's allow the VFIO/SMMUv3 use case.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 8a2d201058..c849b084bf 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -834,17 +834,9 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 IOMMUNotify notify;
 VFIOGuestIOMMU *giommu;
 IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
-bool nested;
 int iommu_idx, flags;
 
 trace_vfio_listener_region_add_iommu(iova, end);
-
-if (!memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
-  (void *)&nested) && nested) {
-error_report("VFIO/vIOMMU integration based on HW nested paging "
- "is not yet supported");
-abort();
-}
 /*
  * FIXME: For VFIO iommu types which have KVM acceleration to
  * avoid bouncing all map/unmaps through qemu this way, this
-- 
2.20.1

[Qemu-devel] [RFC v5 14/29] vfio: Force nested if iommu requires it

2019-07-11 Thread Eric Auger

In case we detect the address space is translated by
a virtual IOMMU which requires HW nested paging to
integrate with VFIO, let's set up the container with
the VFIO_TYPE1_NESTING_IOMMU iommu_type.
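
The selection rule described above can be sketched in isolation. This is
only an illustration, not the patch itself: the probe callback stands in
for the VFIO_CHECK_EXTENSION ioctl, and the enum values, pick_iommu_type()
and the two host_*() helpers are hypothetical names introduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VFIO IOMMU type constants. */
enum { TYPE1 = 1, TYPE1V2 = 3, TYPE1_NESTING = 6 };

/* Hypothetical probe, standing in for ioctl(fd, VFIO_CHECK_EXTENSION, type). */
typedef bool (*probe_fn)(int type);

/*
 * Pick the richest supported type. The nesting type is only eligible
 * when the caller asked for it, and asking for it without getting it
 * is an error rather than a silent fallback.
 */
static int pick_iommu_type(probe_fn supported, bool want_nested)
{
    static const int candidates[] = { TYPE1_NESTING, TYPE1V2, TYPE1 };
    int ret = -EINVAL;
    size_t i;

    assert(supported != NULL);

    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        if (!supported(candidates[i])) {
            continue;
        }
        if (candidates[i] == TYPE1_NESTING && !want_nested) {
            continue;
        }
        ret = candidates[i];
        break;
    }

    if (want_nested && ret != TYPE1_NESTING) {
        ret = -EINVAL;
    }
    return ret;
}

/* Two mock hosts: one supports every type, one lacks nesting. */
static bool host_with_nesting(int type) { (void)type; return true; }
static bool host_without_nesting(int type) { return type != TYPE1_NESTING; }
```

With these mocks, a host without nesting support still yields a type 1
variant for a non-nested request, but a nested request fails outright.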

Signed-off-by: Eric Auger 

---

v4 -> v5:
- fail immediately if nested is wanted but not supported

v2 -> v3:
- add "nested only is selected if requested by @force_nested"
  comment in this patch
---
 hw/vfio/common.c | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index d622191fe6..46a1a089a4 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1144,27 +1144,38 @@ static void vfio_put_address_space(VFIOAddressSpace 
*space)
  * vfio_get_iommu_type - selects the richest iommu_type (v2 first)
  */
 static int vfio_get_iommu_type(VFIOContainer *container,
+   bool want_nested,
Error **errp)
 {
-int iommu_types[] = { VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
+int iommu_types[] = { VFIO_TYPE1_NESTING_IOMMU,
+  VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
   VFIO_SPAPR_TCE_v2_IOMMU, VFIO_SPAPR_TCE_IOMMU };
-int i;
+int i, ret = -EINVAL;
 
 for (i = 0; i < ARRAY_SIZE(iommu_types); i++) {
 if (ioctl(container->fd, VFIO_CHECK_EXTENSION, iommu_types[i])) {
-return iommu_types[i];
+if (iommu_types[i] == VFIO_TYPE1_NESTING_IOMMU && !want_nested) {
+continue;
+}
+ret = iommu_types[i];
+break;
 }
 }
-error_setg(errp, "No available IOMMU models");
-return -EINVAL;
+if (ret < 0) {
+error_setg(errp, "No available IOMMU models");
+} else if (want_nested && ret != VFIO_TYPE1_NESTING_IOMMU) {
+error_setg(errp, "Nested mode requested but not supported");
+ret = -EINVAL;
+}
+return ret;
 }
 
 static int vfio_init_container(VFIOContainer *container, int group_fd,
-   Error **errp)
+   bool want_nested, Error **errp)
 {
 int iommu_type, ret;
 
-iommu_type = vfio_get_iommu_type(container, errp);
+iommu_type = vfio_get_iommu_type(container, want_nested, errp);
 if (iommu_type < 0) {
 return iommu_type;
 }
@@ -1200,6 +1211,14 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 VFIOContainer *container;
 int ret, fd;
 VFIOAddressSpace *space;
+IOMMUMemoryRegion *iommu_mr;
+bool nested = false;
+
+if (as != &address_space_memory && memory_region_is_iommu(as->root)) {
+iommu_mr = IOMMU_MEMORY_REGION(as->root);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
+ (void *)&nested);
+}
 
 space = vfio_get_address_space(as);
 
@@ -1260,12 +1279,13 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 QLIST_INIT(&container->giommu_list);
 QLIST_INIT(&container->hostwin_list);
 
-ret = vfio_init_container(container, group->fd, errp);
+ret = vfio_init_container(container, group->fd, nested, errp);
 if (ret) {
 goto free_container_exit;
 }
 
 switch (container->iommu_type) {
+case VFIO_TYPE1_NESTING_IOMMU:
 case VFIO_TYPE1v2_IOMMU:
 case VFIO_TYPE1_IOMMU:
 {
-- 
2.20.1

[Qemu-devel] [RFC v5 25/29] hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation

2019-07-11 Thread Eric Auger

When the guest invalidates one S1 entry, it passes the asid.
When propagating this invalidation down to the host, the asid
information must also be passed. So let's fill the arch_id field
introduced for that purpose.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index f7497de9e4..8c88923f73 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -824,6 +824,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 entry.iova = iova;
 entry.addr_mask = (1 << tt->granule_sz) - 1;
 entry.perm = IOMMU_NONE;
+entry.arch_id = asid;
 
 memory_region_notify_one(n, &entry);
 }
-- 
2.20.1

[Qemu-devel] [RFC v5 23/29] hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute

2019-07-11 Thread Eric Auger

The SMMUv3 has the peculiarity of translating MSI
transactions. Let's advertise the corresponding
attribute.

Signed-off-by: Eric Auger 

---
 hw/arm/smmuv3.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 517755aed5..9372b15b34 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1499,6 +1499,9 @@ static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
 if (attr == IOMMU_ATTR_VFIO_NESTED) {
 *(bool *) data = true;
 return 0;
+} else if (attr == IOMMU_ATTR_MSI_TRANSLATE) {
+*(bool *) data = true;
+return 0;
 }
 return -EINVAL;
 }
-- 
2.20.1

[Qemu-devel] [RFC v5 22/29] vfio/pci: Implement the DMA fault handler

2019-07-11 Thread Eric Auger

Whenever the eventfd is triggered, we retrieve the DMA fault(s)
from the mmapped fault region and inject them in the iommu
memory region.
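
The consumption loop can be illustrated with a small standalone ring
buffer. This is a sketch under assumptions: struct fault, drain_faults()
and record_fault() are hypothetical names, and the real records are
struct iommu_fault entries read from the fault region.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fault record standing in for struct iommu_fault. */
struct fault {
    uint32_t id;
};

/*
 * Consume every entry from *tail up to, but not including, head,
 * wrapping around at nb_entries. Returns the number of entries handed
 * to the inject callback and leaves *tail equal to head.
 */
static unsigned drain_faults(const struct fault *queue, uint32_t nb_entries,
                             uint32_t head, uint32_t *tail,
                             void (*inject)(const struct fault *))
{
    unsigned count = 0;

    assert(nb_entries > 0);

    while (*tail != head) {
        inject(&queue[*tail]);
        *tail = (*tail + 1) % nb_entries;
        count++;
    }
    return count;
}

/* Example consumer: accumulate the ids of the injected faults. */
static uint32_t seen_sum;

static void record_fault(const struct fault *f)
{
    seen_sum += f->id;
}
```

The producer owns the head index and the consumer owns the tail, so the
consumer only has to publish its updated tail once the loop is done.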

Signed-off-by: Eric Auger 
---
 hw/vfio/pci.c | 50 ++
 hw/vfio/pci.h |  1 +
 2 files changed, 51 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8c8647c4b5..081e964788 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2826,10 +2826,60 @@ static PCIPASIDOps vfio_pci_pasid_ops = {
 static void vfio_dma_fault_notifier_handler(void *opaque)
 {
 VFIOPCIExtIRQ *ext_irq = opaque;
+VFIOPCIDevice *vdev = ext_irq->vdev;
+PCIDevice *pdev = &vdev->pdev;
+AddressSpace *as = pci_device_iommu_address_space(pdev);
+IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(as->root);
+struct vfio_region_dma_fault header;
+struct iommu_fault *queue;
+char *queue_buffer = NULL;
+ssize_t bytes;
 
 if (!event_notifier_test_and_clear(&ext_irq->notifier)) {
 return;
 }
+
+bytes = pread(vdev->vbasedev.fd, &header, sizeof(header),
+  vdev->dma_fault_region.fd_offset);
+if (bytes != sizeof(header)) {
+error_report("%s unable to read the fault region header (0x%lx)",
+ __func__, bytes);
+return;
+}
+
+/* Normally the fault queue is mmapped */
+queue = (struct iommu_fault *)vdev->dma_fault_region.mmaps[0].mmap;
+if (!queue) {
+size_t queue_size = header.nb_entries * header.entry_size;
+
+error_report("%s: fault queue not mmapped: slower fault handling",
+ vdev->vbasedev.name);
+
+queue_buffer = g_malloc(queue_size);
+bytes =  pread(vdev->vbasedev.fd, queue_buffer, queue_size,
+   vdev->dma_fault_region.fd_offset + header.offset);
+if (bytes != queue_size) {
+error_report("%s unable to read the fault queue (0x%lx)",
+ __func__, bytes);
+return;
+}
+
+queue = (struct iommu_fault *)queue_buffer;
+}
+
+while (vdev->fault_tail_index != header.head) {
+memory_region_inject_faults(iommu_mr, 1,
+&queue[vdev->fault_tail_index]);
+vdev->fault_tail_index =
+(vdev->fault_tail_index + 1) % header.nb_entries;
+}
+bytes = pwrite(vdev->vbasedev.fd, &vdev->fault_tail_index, 4,
+   vdev->dma_fault_region.fd_offset);
+if (bytes != 4) {
+error_report("%s unable to write the fault region tail index (0x%lx)",
+ __func__, bytes);
+}
+g_free(queue_buffer);
 }
 
 static int vfio_register_ext_irq_handler(VFIOPCIDevice *vdev,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 815154656c..e31bc0173a 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -142,6 +142,7 @@ typedef struct VFIOPCIDevice {
 EventNotifier req_notifier;
 VFIOPCIExtIRQ *ext_irqs;
 VFIORegion dma_fault_region;
+uint32_t fault_tail_index;
 int (*resetfn)(struct VFIOPCIDevice *);
 uint32_t vendor_id;
 uint32_t device_id;
-- 
2.20.1

[Qemu-devel] [RFC v5 11/29] memory: Add arch_id and leaf fields in IOTLBEntry

2019-07-11 Thread Eric Auger

TLB entries are usually tagged with some ids such as the asid
or pasid. When propagating an invalidation command from the
guest to the host, we need to pass this id.

We also add a leaf field which indicates, for an invalidation
notification, whether only cache entries for the last level of
translation need to be invalidated.
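
To make the role of the two fields concrete, here is a minimal sketch of
how a software TLB might decide whether a cached entry is hit by such an
invalidation. The struct tlb_inval type and inval_matches() are
hypothetical; they only mirror the iova, addr_mask, arch_id and leaf
fields discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical invalidation descriptor mirroring the IOMMUTLBEntry fields. */
struct tlb_inval {
    uint64_t iova;      /* start of the invalidated range */
    uint64_t addr_mask; /* 0xfff means a 4K range */
    uint32_t arch_id;   /* tag such as the asid or pasid */
    bool leaf;          /* only last-level entries need invalidation */
};

/*
 * A cached entry is hit when it carries the same tag, lies in the
 * invalidated range and, if leaf is set, caches a last-level translation.
 */
static bool inval_matches(const struct tlb_inval *inv, uint32_t entry_id,
                          uint64_t entry_iova, bool entry_is_leaf)
{
    assert(inv != NULL);

    if (entry_id != inv->arch_id) {
        return false;
    }
    if ((entry_iova & ~inv->addr_mask) != (inv->iova & ~inv->addr_mask)) {
        return false;
    }
    return !inv->leaf || entry_is_leaf;
}
```

Without the leaf hint, intermediate-level cache entries for the range
would have to be dropped as well, which is the wider scope the field
is meant to avoid.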

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index d0de192887..8dd4d787d4 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -68,12 +68,30 @@ typedef enum {
 
 #define IOMMU_ACCESS_FLAG(r, w) (((r) ? IOMMU_RO : 0) | ((w) ? IOMMU_WO : 0))
 
+/**
+ * IOMMUTLBEntry - IOMMU TLB entry
+ *
+ * Structure used when performing a translation or when notifying MAP or
+ * UNMAP (invalidation) events
+ *
+ * @target_as: target address space
+ * @iova: IO virtual address (input)
+ * @translated_addr: translated address (output)
+ * @addr_mask: address mask (0xfff means 4K binding), must be multiple of 2
+ * @perm: permission flag of the mapping (NONE encodes no mapping or
+ * invalidation notification)
+ * @arch_id: architecture specific ID tagging the TLB
+ * @leaf: when @perm is NONE, indicates whether only caches for the last
+ * level of translation need to be invalidated.
+ */
 struct IOMMUTLBEntry {
 AddressSpace*target_as;
 hwaddr   iova;
 hwaddr   translated_addr;
-hwaddr   addr_mask;  /* 0xfff = 4k translation */
+hwaddr   addr_mask;
 IOMMUAccessFlags perm;
+uint32_t arch_id;
+bool leaf;
 };
 
 /*
-- 
2.20.1

[Qemu-devel] [RFC v5 24/29] hw/arm/smmuv3: Store the PASID table GPA in the translation config

2019-07-11 Thread Eric Auger

For VFIO integration we will need to pass the Context Descriptor (CD)
table GPA to the host. The CD table is also referred to as the PASID
table. Its GPA corresponds to the s1ctrptr field of the Stream Table
Entry. So let's decode and store it in the configuration structure.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c  | 1 +
 include/hw/arm/smmu-common.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 9372b15b34..f7497de9e4 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -354,6 +354,7 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
   "SMMUv3 S1 stalling fault model not allowed yet\n");
 goto bad_ste;
 }
+cfg->s1ctxptr = STE_CTXPTR(ste);
 return 0;
 
 bad_ste:
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 1f37844e5c..353668f4ea 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -68,6 +68,7 @@ typedef struct SMMUTransCfg {
 uint8_t tbi;   /* Top Byte Ignore */
 uint16_t asid;
 SMMUTransTableInfo tt[2];
+dma_addr_t s1ctxptr;
 uint32_t iotlb_hits;   /* counts IOTLB hits for this asid */
 uint32_t iotlb_misses; /* counts IOTLB misses for this asid */
 } SMMUTransCfg;
-- 
2.20.1

Re: [Qemu-devel] [PATCH v4] linux-user: fix to handle variably sized SIOCGSTAMP with new kernels

2019-07-11 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20190711173131.6347-1-laur...@vivier.eu/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v4] linux-user: fix to handle variably sized 
SIOCGSTAMP with new kernels
Message-id: 20190711173131.6347-1-laur...@vivier.eu

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
adb3405 linux-user: fix to handle variably sized SIOCGSTAMP with new kernels

=== OUTPUT BEGIN ===
ERROR: line over 90 characters
#79: FILE: linux-user/syscall_defs.h:756:
+#define TARGET_SIOCGSTAMP_NEW   TARGET_IOR(0x89, 0x06, abi_llong[2]) /* Get stamp (timeval64) */

ERROR: line over 90 characters
#80: FILE: linux-user/syscall_defs.h:757:
+#define TARGET_SIOCGSTAMPNS_NEW TARGET_IOR(0x89, 0x07, abi_llong[2]) /* Get stamp (timespec64) */

total: 2 errors, 0 warnings, 50 lines checked

Commit adb3405a06a4 (linux-user: fix to handle variably sized SIOCGSTAMP with 
new kernels) has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190711173131.6347-1-laur...@vivier.eu/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Qemu-devel] [RFC v5 20/29] vfio/pci: Register handler for iommu fault

2019-07-11 Thread Eric Auger

We use the new extended IRQ VFIO_IRQ_TYPE_NESTED type and
VFIO_IRQ_SUBTYPE_DMA_FAULT subtype to set/unset
a notifier for physical DMA faults. The associated eventfd is
triggered, in nested mode, whenever a fault is detected at IOMMU
physical level.

The actual handler will be implemented in subsequent patches.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- index_to_str now returns the index name, ie. DMA_FAULT
- use the extended IRQ

v3 -> v4:
- check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
  before attempting to set signaling for it.
---
 hw/vfio/pci.c | 81 ++-
 hw/vfio/pci.h |  7 +
 2 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 45e007575e..19702cdbbf 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2761,6 +2761,76 @@ static PCIPASIDOps vfio_pci_pasid_ops = {
 .set_pasid_table = vfio_iommu_set_pasid_table,
 };
 
+static void vfio_dma_fault_notifier_handler(void *opaque)
+{
+VFIOPCIExtIRQ *ext_irq = opaque;
+
+if (!event_notifier_test_and_clear(&ext_irq->notifier)) {
+return;
+}
+}
+
+static int vfio_register_ext_irq_handler(VFIOPCIDevice *vdev,
+ uint32_t type, uint32_t subtype,
+ IOHandler *handler)
+{
+int32_t fd, ext_irq_index, index;
+struct vfio_irq_info *irq_info;
+Error *err = NULL;
+EventNotifier *n;
+int ret;
+
+ret = vfio_get_dev_irq_info(&vdev->vbasedev, type, subtype, &irq_info);
+if (ret) {
+return ret;
+}
+index = irq_info->index;
+ext_irq_index = irq_info->index - VFIO_PCI_NUM_IRQS;
+g_free(irq_info);
+
+vdev->ext_irqs[ext_irq_index].vdev = vdev;
+vdev->ext_irqs[ext_irq_index].index = index;
+n = &vdev->ext_irqs[ext_irq_index].notifier;
+
+ret = event_notifier_init(n, 0);
+if (ret) {
+error_report("vfio: Unable to init event notifier for ext irq %d(%d)",
+ ext_irq_index, ret);
+return ret;
+}
+
+fd = event_notifier_get_fd(n);
+qemu_set_fd_handler(fd, vfio_dma_fault_notifier_handler, NULL,
+&vdev->ext_irqs[ext_irq_index]);
+
+ret = vfio_set_irq_signaling(&vdev->vbasedev, index, 0,
+ VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err);
+if (ret) {
+error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+qemu_set_fd_handler(fd, NULL, NULL, vdev);
+event_notifier_cleanup(n);
+}
+return ret;
+}
+
+static void vfio_unregister_ext_irq_notifiers(VFIOPCIDevice *vdev)
+{
+VFIODevice *vbasedev = &vdev->vbasedev;
+Error *err = NULL;
+int i;
+
+for (i = 0; i < vbasedev->num_irqs - VFIO_PCI_NUM_IRQS; i++) {
+if (vfio_set_irq_signaling(vbasedev, i + VFIO_PCI_NUM_IRQS , 0,
+   VFIO_IRQ_SET_ACTION_TRIGGER, -1, &err)) {
+error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+}
+qemu_set_fd_handler(event_notifier_get_fd(&vdev->ext_irqs[i].notifier),
+NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->ext_irqs[i].notifier);
+}
+g_free(vdev->ext_irqs);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -2771,7 +2841,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 ssize_t len;
 struct stat st;
 int groupid;
-int i, ret;
+int i, ret, nb_ext_irqs;
 bool is_mdev;
 
 if (!vdev->vbasedev.sysfsdev) {
@@ -2859,6 +2929,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 goto error;
 }
 
+nb_ext_irqs = vdev->vbasedev.num_irqs - VFIO_PCI_NUM_IRQS;
+if (nb_ext_irqs > 0) {
+vdev->ext_irqs = g_new0(VFIOPCIExtIRQ, nb_ext_irqs);
+}
+
 vfio_populate_device(vdev, &err);
 if (err) {
 error_propagate(errp, err);
@@ -3060,6 +3135,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
+vfio_register_ext_irq_handler(vdev, VFIO_IRQ_TYPE_NESTED,
+  VFIO_IRQ_SUBTYPE_DMA_FAULT,
+  vfio_dma_fault_notifier_handler);
 vfio_setup_resetfn_quirk(vdev);
 
 pci_setup_pasid_ops(pdev, &vfio_pci_pasid_ops);
@@ -3100,6 +3178,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
+vfio_unregister_ext_irq_notifiers(vdev);
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
 vfio_disable_interrupts(vdev);
 if (vdev->intx.mmap_timer) {
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 834a90d646..893d074375 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -113,6 +113,12 @@ typedef struct VFIOMSIXInfo {
 unsigned long *pending;
 } VFIOMSIXInfo;
 
+typedef struct VFIOPCIExtIRQ {
+struct VFIOPCIDevice *vdev;
+EventNotifier notifier;
+

[Qemu-devel] [RFC v5 17/29] vfio: Set up nested stage mappings

2019-07-11 Thread Eric Auger

In nested mode, legacy vfio_iommu_map_notify cannot be used as
there is no "caching" mode and we do not trap on map.

On Intel, vfio_iommu_map_notify was used to DMA map the RAM
through the host single stage.

With nested mode, we need to set up the stage 2 and the stage 1
separately. This patch introduces a prereg_listener to set up
the stage 2 mapping.

The stage 1 mapping, owned by the guest, is passed to the host
when the guest invalidates the stage 1 configuration, through
a dedicated PCIPASIDOps callback. Guest IOTLB invalidations
are cascaded down to the host through another IOMMU MR UNMAP
notifier.
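
As an illustration of how a guest invalidation might be turned into a
host request, the sketch below chooses between an address-based and an
ASID-wide invalidation from the size of the notified range. All names
here (struct inv_request, build_inv_request(), the enum) are
hypothetical, and max_range_size is an assumed tunable rather than the
threshold used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical granularities: by address range or for a whole ASID. */
enum inv_granularity { INV_GRANU_ADDR, INV_GRANU_ASID };

/* Hypothetical host-side invalidation request. */
struct inv_request {
    enum inv_granularity granularity;
    uint32_t arch_id;
    uint64_t addr;
    uint64_t granule_size;
    bool leaf;
};

/*
 * Small ranges are invalidated by address, carrying the leaf hint;
 * anything larger falls back to invalidating the whole ASID.
 */
static struct inv_request build_inv_request(uint64_t iova, uint64_t addr_mask,
                                            uint32_t arch_id, bool leaf,
                                            uint64_t max_range_size)
{
    struct inv_request req = { .arch_id = arch_id };
    uint64_t size = addr_mask + 1;

    assert(max_range_size > 0);

    if (size <= max_range_size) {
        req.granularity = INV_GRANU_ADDR;
        req.addr = iova;
        req.granule_size = size;
        req.leaf = leaf;
    } else {
        req.granularity = INV_GRANU_ASID;
    }
    return req;
}
```

Both request kinds carry the arch_id, so the host can restrict the
invalidation to the address space the guest actually targeted.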

Signed-off-by: Eric Auger 

---

v4 -> v5:
- use VFIO_IOMMU_SET_PASID_TABLE
- use PCIPASIDOps for config notification

v3 -> v4:
- use iommu_inv_pasid_info for ASID invalidation

v2 -> v3:
- use VFIO_IOMMU_ATTACH_PASID_TABLE
- new user API
- handle leaf

v1 -> v2:
- adapt to uapi changes
- pass the asid
- pass IOMMU_NOTIFIER_S1_CFG when initializing the config notifier
---
 hw/vfio/common.c | 123 +++
 hw/vfio/pci.c|  21 
 hw/vfio/trace-events |   2 +
 3 files changed, 136 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ef8452a4bc..bd975c5b83 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -444,6 +444,52 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void 
**vaddr,
 return true;
 }
 
+/* Propagate a guest IOTLB invalidation to the host (nested mode) */
+static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+hwaddr start = iotlb->iova + giommu->iommu_offset;
+
+VFIOContainer *container = giommu->container;
+struct vfio_iommu_type1_cache_invalidate ustruct;
+size_t size = iotlb->addr_mask + 1;
+int ret;
+
+assert(iotlb->perm == IOMMU_NONE);
+
+ustruct.argsz = sizeof(ustruct);
+ustruct.flags = 0;
+ustruct.info.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
+
+if (size <= 0x1) {
+ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
+ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
+if (iotlb->leaf) {
+ustruct.info.addr_info.flags |= IOMMU_INV_ADDR_FLAGS_LEAF;
+}
+ustruct.info.addr_info.archid = iotlb->arch_id;
+ustruct.info.addr_info.addr = start;
+ustruct.info.addr_info.granule_size = size;
+ustruct.info.addr_info.nb_granules = 1;
+trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
+iotlb->leaf);
+} else {
+ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
+ustruct.info.pasid_info.archid = iotlb->arch_id;
+ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;
+trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
+}
+
+ret = ioctl(container->fd, VFIO_IOMMU_CACHE_INVALIDATE, &ustruct);
+if (ret) {
+error_report("%p: failed to invalidate CACHE for 0x%"PRIx64
+ " mask=0x%"PRIx64" (%d)",
+ container, start, iotlb->addr_mask, ret);
+}
+}
+
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
@@ -613,6 +659,32 @@ static void vfio_dma_unmap_ram_section(VFIOContainer 
*container,
 }
 }
 
+static void vfio_prereg_listener_region_add(MemoryListener *listener,
+MemoryRegionSection *section)
+{
+VFIOContainer *container =
+container_of(listener, VFIOContainer, prereg_listener);
+
+if (!memory_region_is_ram(section->mr)) {
+return;
+}
+
+vfio_dma_map_ram_section(container, section);
+
+}
+static void vfio_prereg_listener_region_del(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+VFIOContainer *container =
+container_of(listener, VFIOContainer, prereg_listener);
+
+if (!memory_region_is_ram(section->mr)) {
+return;
+}
+
+vfio_dma_unmap_ram_section(container, section);
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -709,10 +781,11 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 memory_region_ref(section->mr);
 
 if (memory_region_is_iommu(section->mr)) {
+IOMMUNotify notify;
 VFIOGuestIOMMU *giommu;
 IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
 bool nested;
-int iommu_idx;
+int iommu_idx, flags;
 
 trace_vfio_listener_region_add_iommu(iova, end);
 
@@ -738,15 +811,26 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 llend = int128_sub(llend, int128_one());

[Qemu-devel] [RFC v5 05/29] hw/arm/smmuv3: Remove spurious error messages on IOVA invalidations

2019-07-11 Thread Eric Auger

An IOVA/ASID invalidation is notified to all IOMMU Memory Regions
through smmuv3_inv_notifiers_iova/smmuv3_notify_iova.

When the notification occurs, it is possible that some of the
PCIe devices associated with the notified regions do not have a
valid stream table entry. In that case we output a LOG_GUEST_ERROR
message, for example:

invalid sid= (L1STD span=0)
"smmuv3_notify_iova error decoding the configuration for iommu mr=

This is unfortunate as the user gets the impression that there
are some translation decoding errors when there are none.

This patch adds a new field in SMMUEventInfo that tells whether
the detection of an invalid STE must lead to an error report.
inval_ste_allowed is set before doing the invalidations and
kept unset on actual translation.

The other configuration decoding error messages are kept since if the
STE is valid then the rest of the config must be correct.
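
The effect of the flag can be sketched as follows. This is a simplified
stand-in, not the SMMUv3 code: struct event_info, decode_ste_sketch()
and the guest_errors_logged counter are hypothetical, and fprintf()
replaces qemu_log_mask().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical subset of SMMUEventInfo. */
struct event_info {
    bool inval_ste_allowed;
};

/* Counts the guest error messages emitted, for illustration only. */
static int guest_errors_logged;

/*
 * Decoding still fails on an invalid STE in both cases; the flag only
 * decides whether the failure is reported as a guest error.
 */
static int decode_ste_sketch(bool ste_valid, const struct event_info *event)
{
    assert(event != NULL);

    if (!ste_valid) {
        if (!event->inval_ste_allowed) {
            fprintf(stderr, "invalid STE\n");
            guest_errors_logged++;
        }
        return -1;
    }
    return 0;
}
```

An invalidation path would pass an event with the flag set, while the
translation path keeps it unset and therefore still reports the error.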

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3-internal.h |  1 +
 hw/arm/smmuv3.c  | 15 ---
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index b160289cd1..d190181ef1 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -381,6 +381,7 @@ typedef struct SMMUEventInfo {
 uint32_t sid;
 bool recorded;
 bool record_trans_faults;
+bool inval_ste_allowed;
 union {
 struct {
 uint32_t ssid;
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 2e270a0f07..517755aed5 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -320,7 +320,9 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
 uint32_t config;
 
 if (!STE_VALID(ste)) {
-qemu_log_mask(LOG_GUEST_ERROR, "invalid STE\n");
+if (!event->inval_ste_allowed) {
+qemu_log_mask(LOG_GUEST_ERROR, "invalid STE\n");
+}
 goto bad_ste;
 }
 
@@ -405,7 +407,7 @@ static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE 
*ste,
 
 span = L1STD_SPAN();
 
-if (!span) {
+if (!span && !event->inval_ste_allowed) {
 /* l2ptr is not valid */
 qemu_log_mask(LOG_GUEST_ERROR,
   "invalid sid=%d (L1STD span=0)\n", sid);
@@ -603,7 +605,9 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
 SMMUv3State *s = sdev->smmu;
 uint32_t sid = smmu_get_sid(sdev);
-SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid};
+SMMUEventInfo event = {.type = SMMU_EVT_NONE,
+   .sid = sid,
+   .inval_ste_allowed = false};
 SMMUPTWEventInfo ptw_info = {};
 SMMUTranslationStatus status;
 SMMUState *bs = ARM_SMMU(s);
@@ -796,16 +800,13 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
dma_addr_t iova)
 {
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
-SMMUEventInfo event = {};
+SMMUEventInfo event = {.inval_ste_allowed = true};
 SMMUTransTableInfo *tt;
 SMMUTransCfg *cfg;
 IOMMUTLBEntry entry;
 
 cfg = smmuv3_get_config(sdev, &event);
 if (!cfg) {
-qemu_log_mask(LOG_GUEST_ERROR,
-  "%s error decoding the configuration for iommu mr=%s\n",
-  __func__, mr->parent_obj.name);
 return;
 }
 
-- 
2.20.1

[Qemu-devel] [RFC v5 19/29] vfio: Helper to get IRQ info including capabilities

2019-07-11 Thread Eric Auger

As done for vfio regions, add helpers to retrieve irq info
including their optional capabilities.
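
Capabilities are laid out as a chain of headers inside the info buffer,
each one giving the offset of the next. A minimal sketch of the walk,
assuming a header of two 16-bit fields and one 32-bit offset and a next
offset of 0 as the end marker; struct cap_header and find_cap_offset()
are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capability header: id, version, offset of the next header. */
struct cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;
};

/*
 * Walk the chain starting at cap_offset within the info buffer and
 * return the offset of the first header with the given id, or -1 if
 * the chain ends (offset 0) without a match.
 */
static long find_cap_offset(const void *info, uint32_t cap_offset, uint16_t id)
{
    const unsigned char *base = info;
    uint32_t off = cap_offset;

    assert(info != NULL);

    while (off != 0) {
        struct cap_header hdr;

        /* Copy out the header to avoid alignment and aliasing issues. */
        memcpy(&hdr, base + off, sizeof(hdr));
        if (hdr.id == id) {
            return (long)off;
        }
        off = hdr.next;
    }
    return -1;
}
```

All offsets are relative to the start of the info structure, which is
why the walk needs the base pointer as well as the first offset.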

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c  | 97 +++
 hw/vfio/trace-events  |  1 +
 include/hw/vfio/vfio-common.h |  7 +++
 3 files changed, 105 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4bbce6a43a..8a2d201058 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1007,6 +1007,25 @@ vfio_get_region_info_cap(struct vfio_region_info *info, 
uint16_t id)
 return NULL;
 }
 
+struct vfio_info_cap_header *
+vfio_get_irq_info_cap(struct vfio_irq_info *info, uint16_t id)
+{
+struct vfio_info_cap_header *hdr;
+void *ptr = info;
+
+if (!(info->flags & VFIO_IRQ_INFO_FLAG_CAPS)) {
+return NULL;
+}
+
+for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+if (hdr->id == id) {
+return hdr;
+}
+}
+
+return NULL;
+}
+
 static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
   struct vfio_region_info *info)
 {
@@ -1833,6 +1852,33 @@ retry:
 return 0;
 }
 
+int vfio_get_irq_info(VFIODevice *vbasedev, int index,
+  struct vfio_irq_info **info)
+{
+size_t argsz = sizeof(struct vfio_irq_info);
+
+*info = g_malloc0(argsz);
+
+(*info)->index = index;
+retry:
+(*info)->argsz = argsz;
+
+if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, *info)) {
+g_free(*info);
+*info = NULL;
+return -errno;
+}
+
+if ((*info)->argsz > argsz) {
+argsz = (*info)->argsz;
+*info = g_realloc(*info, argsz);
+
+goto retry;
+}
+
+return 0;
+}
+
 int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
  uint32_t subtype, struct vfio_region_info **info)
 {
@@ -1868,6 +1914,42 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, 
uint32_t type,
 return -ENODEV;
 }
 
+int vfio_get_dev_irq_info(VFIODevice *vbasedev, uint32_t type,
+  uint32_t subtype, struct vfio_irq_info **info)
+{
+int i;
+
+for (i = 0; i < vbasedev->num_irqs; i++) {
+struct vfio_info_cap_header *hdr;
+struct vfio_irq_info_cap_type *cap_type;
+
+if (vfio_get_irq_info(vbasedev, i, info)) {
+continue;
+}
+
+hdr = vfio_get_irq_info_cap(*info, VFIO_IRQ_INFO_CAP_TYPE);
+if (!hdr) {
+g_free(*info);
+continue;
+}
+
+cap_type = container_of(hdr, struct vfio_irq_info_cap_type, header);
+
+trace_vfio_get_dev_irq(vbasedev->name, i,
+   cap_type->type, cap_type->subtype);
+
+if (cap_type->type == type && cap_type->subtype == subtype) {
+return 0;
+}
+
+g_free(*info);
+}
+
+*info = NULL;
+return -ENODEV;
+}
+
+
 bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
 {
 struct vfio_region_info *info = NULL;
@@ -1883,6 +1965,21 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int 
region, uint16_t cap_type)
 return ret;
 }
 
+bool vfio_has_irq_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
+{
+struct vfio_region_info *info = NULL;
+bool ret = false;
+
+if (!vfio_get_region_info(vbasedev, region, &info)) {
+if (vfio_get_region_info_cap(info, cap_type)) {
+ret = true;
+}
+g_free(info);
+}
+
+return ret;
+}
+
 /*
  * Interfaces for IBM EEH (Enhanced Error Handling)
  */
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 5de97a8882..c04a8c12d8 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -114,6 +114,7 @@ vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps e
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
+vfio_get_dev_irq(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
 vfio_dma_unmap_overflow_workaround(void) ""
 vfio_iommu_addr_inv_iotlb(int asid, uint64_t addr, uint64_t size, uint64_t nb_granules, bool leaf) "nested IOTLB invalidate asid=%d, addr=0x%"PRIx64" granule_size=0x%"PRIx64" nb_granules=0x%"PRIx64" leaf=%d"
 vfio_iommu_asid_inv_iotlb(int asid) "nested IOTLB invalidate asid=%d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index a88b4dd986..79d48df351 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -200,6 +200,13 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, 
uint32_t type,
 bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type);
 struct
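
The lookup loop in vfio_get_dev_irq_info() follows an ownership convention that is easy to miss in the diff: on a match the caller ends up owning *info and must free it, while on -ENODEV *info is reset to NULL. Below is a minimal, self-contained sketch of that convention. It is not the VFIO code; struct irq_info, irq_table, get_irq_info() and find_irq_info() are hypothetical stand-ins, and plain malloc()/free() replace the glib allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct vfio_irq_info plus its
 * type capability. */
struct irq_info {
    uint32_t index;
    uint32_t type;
    uint32_t subtype;
};

/* Hypothetical table standing in for what the kernel would report. */
static const struct irq_info irq_table[] = {
    { 0, 1, 0 },
    { 1, 1, 1 },
    { 2, 2, 7 },
};
#define NUM_IRQS (sizeof(irq_table) / sizeof(irq_table[0]))

/* Allocate a copy of entry @index; the caller owns *info on success. */
static int get_irq_info(uint32_t index, struct irq_info **info)
{
    struct irq_info *copy;

    assert(info);
    if (index >= NUM_IRQS) {
        return -EINVAL;
    }
    copy = malloc(sizeof(*copy));
    if (!copy) {
        return -ENOMEM;
    }
    *copy = irq_table[index];
    *info = copy;
    return 0;
}

/*
 * Return 0 and leave the matching entry in *info (caller frees it),
 * or return -ENODEV with *info set to NULL.
 */
int find_irq_info(uint32_t type, uint32_t subtype, struct irq_info **info)
{
    uint32_t i;

    for (i = 0; i < NUM_IRQS; i++) {
        if (get_irq_info(i, info)) {
            continue;           /* nothing was allocated, nothing to free */
        }
        if ((*info)->type == type && (*info)->subtype == subtype) {
            return 0;           /* ownership passes to the caller */
        }
        free(*info);            /* not a match: release before retrying */
    }

    *info = NULL;
    return -ENODEV;
}
```

A caller would free the result only after a zero return; after -ENODEV the pointer is already NULL, so an unconditional free is also safe.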

[Qemu-devel] [RFC v5 07/29] update-linux-headers: Add sve_context.h to asm-arm64

2019-07-11 Thread Eric Auger

From: Andrew Jones 

Signed-off-by: Andrew Jones 
---
 scripts/update-linux-headers.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index dfdfdfddcf..c97d485b08 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -99,6 +99,9 @@ for arch in $ARCHLIST; do
 cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
 done
 
+if [ $arch = arm64 ]; then
+cp "$tmpdir/include/asm/sve_context.h" 
"$output/linux-headers/asm-arm64/"
+fi
 if [ $arch = mips ]; then
 cp "$tmpdir/include/asm/sgidefs.h" "$output/linux-headers/asm-mips/"
 cp "$tmpdir/include/asm/unistd_o32.h" "$output/linux-headers/asm-mips/"
-- 
2.20.1

[Qemu-devel] [RFC v5 10/29] memory: Introduce IOMMU Memory Region inject_faults API

2019-07-11 Thread Eric Auger

This new API allows injecting @count iommu_faults into
the IOMMU memory region.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 25 +
 memory.c  | 10 ++
 2 files changed, 35 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 593ef947c6..d0de192887 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -54,6 +54,8 @@ struct MemoryRegionMmio {
 CPUWriteMemoryFunc *write[3];
 };
 
+struct iommu_fault;
+
 typedef struct IOMMUTLBEntry IOMMUTLBEntry;
 
 /* See address_space_translate: bit 0 is read, bit 1 is write.  */
@@ -342,6 +344,19 @@ typedef struct IOMMUMemoryRegionClass {
  * @iommu: the IOMMUMemoryRegion
  */
 int (*num_indexes)(IOMMUMemoryRegion *iommu);
+
+/*
+ * Inject @count faults into the IOMMU memory region
+ *
+ * Optional method: if this method is not provided, then
+ * memory_region_inject_faults() will return -ENOENT
+ *
+ * @iommu: the IOMMU memory region to inject the faults in
+ * @count: number of faults to inject
+ * @buf: fault buffer
+ */
+int (*inject_faults)(IOMMUMemoryRegion *iommu, int count,
+ struct iommu_fault *buf);
 } IOMMUMemoryRegionClass;
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -1123,6 +1138,16 @@ int memory_region_iommu_attrs_to_index(IOMMUMemoryRegion 
*iommu_mr,
  */
 int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr);
 
+/**
+ * memory_region_inject_faults : inject @count faults stored in @buf
+ *
+ * @iommu_mr: the IOMMU memory region
+ * @count: number of faults to be injected
+ * @buf: buffer containing the faults
+ */
+int memory_region_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+struct iommu_fault *buf);
+
 /**
  * memory_region_name: get a memory region's name
  *
diff --git a/memory.c b/memory.c
index 90967b579d..d81525fe11 100644
--- a/memory.c
+++ b/memory.c
@@ -2000,6 +2000,16 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion 
*iommu_mr)
 return imrc->num_indexes(iommu_mr);
 }
 
+int memory_region_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+struct iommu_fault *buf)
+{
+IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+if (!imrc->inject_faults) {
+return -ENOENT;
+}
+return imrc->inject_faults(iommu_mr, count, buf);
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
 uint8_t mask = 1 << client;
-- 
2.20.1
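
The patch above relies on an "optional class method" pattern: the generic wrapper returns -ENOENT when the implementation does not provide the hook, and otherwise forwards the call. The following is a minimal sketch of that pattern outside QEMU. All type and function names here (struct region, struct region_class, region_inject_faults, counting_inject) are hypothetical stand-ins, not the QEMU definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iommu_fault. */
struct fault {
    int code;
};

struct region;

/* Hypothetical stand-in for IOMMUMemoryRegionClass. */
struct region_class {
    /* Optional: NULL when the implementation cannot accept faults. */
    int (*inject_faults)(struct region *r, int count, struct fault *buf);
};

struct region {
    const struct region_class *klass;
    int injected;               /* bookkeeping used only by this example */
};

/* Generic wrapper with the same shape as memory_region_inject_faults(). */
int region_inject_faults(struct region *r, int count, struct fault *buf)
{
    assert(r && r->klass);
    if (!r->klass->inject_faults) {
        return -ENOENT;         /* hook not provided */
    }
    return r->klass->inject_faults(r, count, buf);
}

/* Example hook that only counts the faults it receives. */
static int counting_inject(struct region *r, int count, struct fault *buf)
{
    (void)buf;
    r->injected += count;
    return 0;
}

static const struct region_class with_hook = {
    .inject_faults = counting_inject,
};

static const struct region_class without_hook = {
    .inject_faults = NULL,
};
```

With this shape, callers can distinguish "not supported" (-ENOENT) from an error returned by the hook itself, provided implementations do not return -ENOENT for other reasons.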

[Qemu-devel] [RFC v5 12/29] iommu: Introduce generic header

2019-07-11 Thread Eric Auger

This header is meant to expose data types used by
several IOMMU devices, such as structs for SVA and
nested stage configuration.

Signed-off-by: Eric Auger 
---
 include/hw/iommu/iommu.h | 25 +
 1 file changed, 25 insertions(+)
 create mode 100644 include/hw/iommu/iommu.h

diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
new file mode 100644
index 00..9e60ece160
--- /dev/null
+++ b/include/hw/iommu/iommu.h
@@ -0,0 +1,25 @@
+/*
+ * common header for iommu devices
+ *
+ * Copyright Red Hat, Inc. 2019
+ *
+ * Authors:
+ *  Eric Auger 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_HW_IOMMU_IOMMU_H
+#define QEMU_HW_IOMMU_IOMMU_H
+
+typedef struct IOMMUConfig {
+union {
+#ifdef __linux__
+struct iommu_pasid_table_config pasid_cfg;
+#endif
+  };
+} IOMMUConfig;
+
+
+#endif /* QEMU_HW_IOMMU_IOMMU_H */
-- 
2.20.1

[Qemu-devel] [RFC v5 09/29] memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute

2019-07-11 Thread Eric Auger

We introduce a new IOMMU Memory Region attribute, IOMMU_ATTR_MSI_TRANSLATE,
which tells whether the virtual IOMMU translates MSIs. ARM SMMU
will expose this attribute since, as opposed to Intel DMAR, MSIs
are translated like any other DMA request.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index e477a630a8..593ef947c6 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -206,6 +206,7 @@ struct MemoryRegionOps {
 enum IOMMUMemoryRegionAttr {
 IOMMU_ATTR_SPAPR_TCE_FD,
 IOMMU_ATTR_VFIO_NESTED,
+IOMMU_ATTR_MSI_TRANSLATE,
 };
 
 /**
-- 
2.20.1

[Qemu-devel] [RFC v5 13/29] pci: introduce PCIPASIDOps to PCIDevice

2019-07-11 Thread Eric Auger

From: Liu Yi L 

This patch introduces PCIPASIDOps for IOMMU related operations.

https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00078.html
https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00940.html

So far, setting up virt-SVA for an assigned SVA-capable device requires
configuring host translation structures for a specific PASID (e.g. binding
the guest page table to the host and enabling nested translation in the
host). Besides, the vIOMMU emulator needs to forward the guest's cache
invalidations to the host since host nested translation is enabled. E.g. on
VT-d, the guest owns the 1st level translation table, thus cache
invalidations for the 1st level should be propagated to the host.

This patch adds two functions: alloc_pasid and free_pasid to support
guest pasid allocation and freeing. The callbacks would be implemented
by device passthrough modules such as vfio.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Signed-off-by: Liu Yi L 
Signed-off-by: Yi Sun 
---
 hw/pci/pci.c | 34 ++
 include/hw/iommu/iommu.h |  3 +++
 include/hw/pci/pci.h | 11 +++
 3 files changed, 48 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8076a80ab3..43c0cef2f6 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2626,6 +2626,40 @@ void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void 
*opaque)
 bus->iommu_opaque = opaque;
 }
 
+void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops)
+{
+assert(ops && !dev->pasid_ops);
+dev->pasid_ops = ops;
+}
+
+bool pci_device_is_pasid_ops_set(PCIBus *bus, int32_t devfn)
+{
+PCIDevice *dev;
+
+if (!bus) {
+return false;
+}
+
+dev = bus->devices[devfn];
+return !!(dev && dev->pasid_ops);
+}
+
+int pci_device_set_pasid_table(PCIBus *bus, int32_t devfn,
+   IOMMUConfig *config)
+{
+PCIDevice *dev;
+
+if (!bus) {
+return -EINVAL;
+}
+
+dev = bus->devices[devfn];
+if (dev && dev->pasid_ops && dev->pasid_ops->set_pasid_table) {
+return dev->pasid_ops->set_pasid_table(bus, devfn, config);
+}
+return -ENOENT;
+}
+
 static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
 {
 Range *range = opaque;
diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
index 9e60ece160..12092bda7b 100644
--- a/include/hw/iommu/iommu.h
+++ b/include/hw/iommu/iommu.h
@@ -12,6 +12,9 @@
 
 #ifndef QEMU_HW_IOMMU_IOMMU_H
 #define QEMU_HW_IOMMU_IOMMU_H
+#ifdef __linux__
+#include 
+#endif
 
 typedef struct IOMMUConfig {
 union {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index aaf1b9f70d..84be2847a5 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -9,6 +9,7 @@
 #include "hw/isa/isa.h"
 
 #include "hw/pci/pcie.h"
+#include "hw/iommu/iommu.h"
 
 extern bool pci_available;
 
@@ -263,6 +264,11 @@ struct PCIReqIDCache {
 };
 typedef struct PCIReqIDCache PCIReqIDCache;
 
+typedef struct PCIPASIDOps PCIPASIDOps;
+struct PCIPASIDOps {
+int (*set_pasid_table)(PCIBus *bus, int32_t devfn, IOMMUConfig *config);
+};
+
 struct PCIDevice {
 DeviceState qdev;
 
@@ -352,6 +358,7 @@ struct PCIDevice {
 MSIVectorUseNotifier msix_vector_use_notifier;
 MSIVectorReleaseNotifier msix_vector_release_notifier;
 MSIVectorPollNotifier msix_vector_poll_notifier;
+PCIPASIDOps *pasid_ops;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
@@ -485,6 +492,10 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, 
int);
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
 void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
 
+void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops);
+bool pci_device_is_pasid_ops_set(PCIBus *bus, int32_t devfn);
+int pci_device_set_pasid_table(PCIBus *bus, int32_t devfn, IOMMUConfig 
*config);
+
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
 {
-- 
2.20.1
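
To make the call flow of the diff above easier to follow, here is a small self-contained sketch of the same register-then-dispatch idea: a passthrough module registers an ops table on a device once, and the bus-level helper looks up the device by devfn and forwards the call. All names (struct bus, struct device, struct pasid_ops, setup_pasid_ops, bus_set_pasid_table, stub_set_table) are hypothetical stand-ins for the QEMU types. The range check on devfn is an addition of this sketch and is not present in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVFN 256           /* 32 slots x 8 functions */

struct bus;

/* Hypothetical stand-in for IOMMUConfig. */
struct config {
    int table_id;
};

/* Hypothetical stand-in for PCIPASIDOps. */
struct pasid_ops {
    int (*set_pasid_table)(struct bus *bus, int32_t devfn, struct config *cfg);
};

struct device {
    struct pasid_ops *pasid_ops;
};

struct bus {
    struct device *devices[MAX_DEVFN];
};

/* Register the ops once; registering twice is a programming error. */
void setup_pasid_ops(struct device *dev, struct pasid_ops *ops)
{
    assert(dev && ops && !dev->pasid_ops);
    dev->pasid_ops = ops;
}

bool is_pasid_ops_set(struct bus *bus, int32_t devfn)
{
    struct device *dev;

    if (!bus || devfn < 0 || devfn >= MAX_DEVFN) {
        return false;
    }
    dev = bus->devices[devfn];
    return dev && dev->pasid_ops;
}

/* Forward to the device hook, or report why that is not possible. */
int bus_set_pasid_table(struct bus *bus, int32_t devfn, struct config *cfg)
{
    struct device *dev;

    if (!bus || devfn < 0 || devfn >= MAX_DEVFN) {
        return -EINVAL;
    }
    dev = bus->devices[devfn];
    if (dev && dev->pasid_ops && dev->pasid_ops->set_pasid_table) {
        return dev->pasid_ops->set_pasid_table(bus, devfn, cfg);
    }
    return -ENOENT;
}

/* Example hook: records the table id and reports success. */
static int last_table_id = -1;

static int stub_set_table(struct bus *bus, int32_t devfn, struct config *cfg)
{
    (void)bus;
    (void)devfn;
    last_table_id = cfg->table_id;
    return 0;
}

static struct pasid_ops stub_ops = {
    .set_pasid_table = stub_set_table,
};
```

As in the patch, -EINVAL signals a bad bus argument and -ENOENT signals that no device or no hook is registered at that devfn.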

[Qemu-devel] [RFC v5 06/29] update-linux-headers: Import iommu.h

2019-07-11 Thread Eric Auger

Update the script to import the new iommu.h uapi header.

Signed-off-by: Eric Auger 
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index f76d77363b..dfdfdfddcf 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -141,7 +141,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h vfio.h vfio_ccw.h vhost.h \
+for header in kvm.h vfio.h vfio_ccw.h vhost.h iommu.h \
   psci.h psp-sev.h userfaultfd.h mman.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
-- 
2.20.1

[Qemu-devel] [RFC v5 08/29] header update against 5.3.0-rc0 and IOMMU/VFIO nested stage APIs

2019-07-11 Thread Eric Auger

This is an update against
https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9.

Signed-off-by: Eric Auger 
---
 include/standard-headers/asm-x86/bootparam.h |   2 +
 include/standard-headers/linux/virtio_ids.h  |   1 -
 include/standard-headers/linux/virtio_pmem.h |  34 --
 linux-headers/asm-arm/unistd-common.h|   1 +
 linux-headers/asm-arm64/kvm.h|   7 +
 linux-headers/asm-generic/unistd.h   |   4 +-
 linux-headers/asm-mips/unistd_n32.h  |   1 +
 linux-headers/asm-mips/unistd_n64.h  |   1 +
 linux-headers/asm-mips/unistd_o32.h  |   1 +
 linux-headers/asm-powerpc/unistd_32.h|   1 +
 linux-headers/asm-powerpc/unistd_64.h|   1 +
 linux-headers/asm-s390/unistd_32.h   |   1 +
 linux-headers/asm-s390/unistd_64.h   |   1 +
 linux-headers/asm-x86/kvm.h  |   6 +-
 linux-headers/asm-x86/unistd_32.h|   1 +
 linux-headers/asm-x86/unistd_64.h|   1 +
 linux-headers/asm-x86/unistd_x32.h   |   1 +
 linux-headers/linux/iommu.h  | 316 +++
 linux-headers/linux/psp-sev.h|   5 +-
 linux-headers/linux/vfio.h   | 109 ++-
 20 files changed, 451 insertions(+), 44 deletions(-)
 delete mode 100644 include/standard-headers/linux/virtio_pmem.h
 create mode 100644 linux-headers/linux/iommu.h

diff --git a/include/standard-headers/asm-x86/bootparam.h 
b/include/standard-headers/asm-x86/bootparam.h
index 67d4f0119f..a6f7cf535e 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -29,6 +29,8 @@
 #define XLF_EFI_HANDOVER_32(1<<2)
 #define XLF_EFI_HANDOVER_64(1<<3)
 #define XLF_EFI_KEXEC  (1<<4)
+#define XLF_5LEVEL (1<<5)
+#define XLF_5LEVEL_ENABLED (1<<6)
 
 
 #endif /* _ASM_X86_BOOTPARAM_H */
diff --git a/include/standard-headers/linux/virtio_ids.h 
b/include/standard-headers/linux/virtio_ids.h
index 32b2f94d1f..6d5c3b2d4f 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -43,6 +43,5 @@
 #define VIRTIO_ID_INPUT18 /* virtio input */
 #define VIRTIO_ID_VSOCK19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO   20 /* virtio crypto */
-#define VIRTIO_ID_PMEM 27 /* virtio pmem */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/standard-headers/linux/virtio_pmem.h 
b/include/standard-headers/linux/virtio_pmem.h
deleted file mode 100644
index 7e3d43b121..00
--- a/include/standard-headers/linux/virtio_pmem.h
+++ /dev/null
@@ -1,34 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
-/*
- * Definitions for virtio-pmem devices.
- *
- * Copyright (C) 2019 Red Hat, Inc.
- *
- * Author(s): Pankaj Gupta 
- */
-
-#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
-#define _UAPI_LINUX_VIRTIO_PMEM_H
-
-#include "standard-headers/linux/types.h"
-#include "standard-headers/linux/virtio_ids.h"
-#include "standard-headers/linux/virtio_config.h"
-
-struct virtio_pmem_config {
-   uint64_t start;
-   uint64_t size;
-};
-
-#define VIRTIO_PMEM_REQ_TYPE_FLUSH  0
-
-struct virtio_pmem_resp {
-   /* Host return status corresponding to flush request */
-   uint32_t ret;
-};
-
-struct virtio_pmem_req {
-   /* command type */
-   uint32_t type;
-};
-
-#endif
diff --git a/linux-headers/asm-arm/unistd-common.h 
b/linux-headers/asm-arm/unistd-common.h
index 27a9b6da27..fe1d2e5334 100644
--- a/linux-headers/asm-arm/unistd-common.h
+++ b/linux-headers/asm-arm/unistd-common.h
@@ -388,5 +388,6 @@
 #define __NR_fsconfig (__NR_SYSCALL_BASE + 431)
 #define __NR_fsmount (__NR_SYSCALL_BASE + 432)
 #define __NR_fspick (__NR_SYSCALL_BASE + 433)
+#define __NR_pidfd_open (__NR_SYSCALL_BASE + 434)
 
 #endif /* _ASM_ARM_UNISTD_COMMON_H */
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 2431ec35a9..9d701b6cbd 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -260,6 +260,13 @@ struct kvm_vcpu_events {
 KVM_REG_SIZE_U256 |\
 ((i) & (KVM_ARM64_SVE_MAX_SLICES - 1)))
 
+/*
+ * Register values for KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() and
+ * KVM_REG_ARM64_SVE_FFR() are represented in memory in an endianness-
+ * invariant layout which differs from the layout used for the FPSIMD
+ * V-registers on big-endian systems: see sigcontext.h for more explanation.
+ */
+
 #define KVM_ARM64_SVE_VQ_MIN __SVE_VQ_MIN
 #define KVM_ARM64_SVE_VQ_MAX __SVE_VQ_MAX
 
diff --git a/linux-headers/asm-generic/unistd.h 
b/linux-headers/asm-generic/unistd.h
index a87904daf1..e5684a4512 100644
--- a/linux-headers/asm-generic/unistd.h
+++ b/linux-headers/asm-generic/unistd.h
@@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
 __SYSCALL(__NR_fsmount, sys_fsmount)
 #define __NR_fspick 433

Re: [Qemu-devel] [PATCH RESEND v2] target-i386: adds PV_SCHED_YIELD CPUID feature bit

2019-07-11 Thread Paolo Bonzini

On 10/07/19 10:02, Wanpeng Li wrote:
> From: Wanpeng Li 
> 
> Adds PV_SCHED_YIELD CPUID feature bit.
> 
> Cc: Eduardo Habkost 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Signed-off-by: Wanpeng Li 
> ---
> Note: kvm part is merged
> v1 -> v2:
>  * use bit 13 instead of bit 12 since bit 12 has user now
> 
>  target/i386/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 5f07d68..f4c4b6b 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -902,7 +902,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = 
> {
>  "kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock",
>  "kvm-asyncpf", "kvm-steal-time", "kvm-pv-eoi", "kvm-pv-unhalt",
>  NULL, "kvm-pv-tlb-flush", NULL, "kvm-pv-ipi",
> -NULL, NULL, NULL, NULL,
> +NULL, "kvm-pv-sched-yield", NULL, NULL,
>  NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
>  "kvmclock-stable-bit", NULL, NULL, NULL,
> 

Queued for 4.2, thanks.

Paolo
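
For readers checking the "bit 13 instead of bit 12" note in the quoted patch: in the name table, the position of a string is the CPUID bit it names. The sketch below copies the names from the quoted hunk into a standalone array and derives the bit index and mask from the position. The array name and the helpers feature_bit_index() and feature_bit_mask() are hypothetical and are not QEMU functions. The hunk shows only 28 entries, so the remaining ones are left to default to NULL here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FEATURE_BITS 32

/* Names as they appear in the quoted hunk; entries not listed are NULL. */
static const char *const kvm_feat_names[FEATURE_BITS] = {
    "kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock",
    "kvm-asyncpf", "kvm-steal-time", "kvm-pv-eoi", "kvm-pv-unhalt",
    NULL, "kvm-pv-tlb-flush", NULL, "kvm-pv-ipi",
    NULL, "kvm-pv-sched-yield", NULL, NULL,
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL,
    "kvmclock-stable-bit", NULL, NULL, NULL,
};

/* Return the lowest bit index whose name equals @name, or -1 if none. */
int feature_bit_index(const char *name)
{
    int i;

    assert(name);
    for (i = 0; i < FEATURE_BITS; i++) {
        if (kvm_feat_names[i] && strcmp(kvm_feat_names[i], name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Return the single-bit mask for @name, or 0 if the name is unknown. */
uint32_t feature_bit_mask(const char *name)
{
    int bit = feature_bit_index(name);

    return bit < 0 ? 0 : (uint32_t)1 << bit;
}
```

Since "kvm-pv-sched-yield" is the second entry of the fourth row, the lookup yields index 13, matching the changelog note.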
