date:20201018

Re: [PATCH RESEND v2 0/9] iOS and Apple Silicon host support

2020-10-18 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20201019051953.90107-...@getutm.app/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20201019051953.90107-...@getutm.app
Subject: [PATCH RESEND v2 0/9] iOS and Apple Silicon host support

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20201019051953.90107-...@getutm.app -> patchew/20201019051953.90107-...@getutm.app
Switched to a new branch 'test'
9faec40 block: check availablity for preadv/pwritev on mac
86ad651 tcg: support JIT on Apple Silicon
29dfbc5 tcg: mirror mapping RWX pages for iOS optional
b114cca tcg: implement mirror mapped JIT for iOS
88d6dc4 tcg: add const hints for code pointers
dd2e464 coroutine: add libucontext as external library
49f3648 qemu: add support for iOS host
90e8d82 configure: cross-compiling without cross_prefix
01c26cd configure: option to disable host block devices

=== OUTPUT BEGIN ===
1/9 Checking commit 01c26cd84fe6 (configure: option to disable host block devices)
WARNING: architecture specific defines should be avoided
#22: FILE: block/file-posix.c:44:
+#if defined(CONFIG_HOST_BLOCK_DEVICE) && defined(__APPLE__) && (__MACH__)

total: 0 errors, 1 warnings, 61 lines checked

Patch 1/9 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/9 Checking commit 90e8d8255c14 (configure: cross-compiling without cross_prefix)
3/9 Checking commit 49f3648f7dad (qemu: add support for iOS host)
WARNING: architecture specific defines should be avoided
#27: FILE: block.c:56:
+#if !defined(__DragonFly__) && !defined(CONFIG_IOS)

ERROR: braces {} are necessary for all arms of this statement
#45: FILE: block/file-posix.c:189:
+if (s->fd >= 0)
[...]

WARNING: architecture specific defines should be avoided
#79: FILE: block/file-posix.c:2325:
+#if !defined(CONFIG_IOS) && defined(__APPLE__) && defined(__MACH__)

WARNING: architecture specific defines should be avoided
#363: FILE: tcg/aarch64/tcg-target.h:151:
+#if defined(__APPLE__)

WARNING: architecture specific defines should be avoided
#369: FILE: tcg/aarch64/tcg-target.h:157:
+#if defined(__APPLE__)

total: 1 errors, 4 warnings, 316 lines checked

Patch 3/9 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/9 Checking commit dd2e4646744e (coroutine: add libucontext as external library)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#97: 
new file mode 160000

total: 0 errors, 1 warnings, 140 lines checked

Patch 4/9 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/9 Checking commit 88d6dc4d22eb (tcg: add const hints for code pointers)
6/9 Checking commit b114cca7a292 (tcg: implement mirror mapped JIT for iOS)
ERROR: externs should be avoided in .c files
#51: FILE: accel/tcg/translate-all.c:65:
+extern kern_return_t mach_vm_remap(vm_map_t target_task,

WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#187: 
new file mode 100644

WARNING: architecture specific defines should be avoided
#416: FILE: tcg/aarch64/tcg-target.h:171:
+#if defined(__APPLE__)

WARNING: architecture specific defines should be avoided
#608: FILE: tcg/i386/tcg-target.h:209:
+#ifdef __APPLE__

WARNING: architecture specific defines should be avoided
#619: FILE: tcg/i386/tcg-target.h:220:
+#if defined(__APPLE__)

total: 1 errors, 4 warnings, 1285 lines checked

Patch 6/9 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/9 Checking commit 29dfbc56da64 (tcg: mirror mapping RWX pages for iOS optional)
8/9 Checking commit 86ad651f4ca1 (tcg: support JIT on Apple Silicon)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#206: 
new file mode 100644

WARNING: architecture specific defines should be avoided
#238: FILE: include/tcg/tcg-apple-jit.h:28:
+#if defined(__aarch64__) && defined(CONFIG_DARWIN)

total: 0 errors, 2 warnings, 259 lines checked

Patch 8/9 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
9/9 Checking commit 9faec405cec8 (block: check availablity for preadv/pwritev on mac)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20201019051953.90107-...@getutm.app/testing.checkpatch/?type=message.

[PATCH RESEND v2 8/9] tcg: support JIT on Apple Silicon

2020-10-18 Thread Joelle van Dyne

From: osy 

https://developer.apple.com/documentation/apple_silicon/porting_just-in-time_compilers_to_apple_silicon

For iOS versions before 14, functions reverse engineered from
libsystem_pthread.dylib are implemented to handle SoCs that support APRR.

The following rules apply for JIT write protect:
  * JIT write-protect is enabled before tcg_qemu_tb_exec()
  * JIT write-protect is disabled after tcg_qemu_tb_exec() returns
  * JIT write-protect is disabled inside do_tb_phys_invalidate() but if it
is called inside of tcg_qemu_tb_exec() then write-protect will be
enabled again before returning.
  * JIT write-protect is disabled by cpu_loop_exit() for interrupt handling.
  * JIT write-protect is disabled everywhere else.
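
The rules above can be sketched roughly as follows. This is a minimal
illustration, not the code from this patch: all `sketch_*` and `demo_*` names
and the `jit_write_protected` bookkeeping flag are hypothetical, and the real
`pthread_jit_write_protect_np()` call is only compiled on macOS/arm64 (it is
per-thread and only affects MAP_JIT mappings). On other hosts only the flag
is toggled, so the ordering of the calls can still be inspected.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__APPLE__) && defined(__aarch64__)
#include <TargetConditionals.h>
#if TARGET_OS_OSX
#include <pthread.h>
#define SKETCH_HAVE_JIT_PROTECT 1
#endif
#endif

/* Hypothetical bookkeeping flag mirroring the current protection state. */
static bool jit_write_protected;

/* Plays the role of jit_write_protect() in the patch (name is illustrative). */
static void sketch_jit_write_protect(bool enabled)
{
#ifdef SKETCH_HAVE_JIT_PROTECT
    pthread_jit_write_protect_np(enabled);
#endif
    jit_write_protected = enabled;
}

typedef int (*sketch_tb_fn)(void *opaque);

/* Rules 1 and 2: protect before running translated code, unprotect after. */
static int sketch_tb_exec(sketch_tb_fn fn, void *opaque)
{
    sketch_jit_write_protect(true);
    int ret = fn(opaque);
    sketch_jit_write_protect(false);
    return ret;
}

/* Rule 3: unprotect to patch code, then restore protection if nested. */
static void sketch_tb_invalidate(void (*patch)(void *), void *opaque)
{
    bool was_protected = jit_write_protected;
    sketch_jit_write_protect(false);
    patch(opaque);
    if (was_protected) {
        sketch_jit_write_protect(true);
    }
}

/* Demo callback: record the state seen while "patching". */
static void demo_patch(void *opaque)
{
    *(bool *)opaque = jit_write_protected;
}

/* Demo "translated code": records the state before, during, after a patch. */
static int demo_body(void *opaque)
{
    bool *seen = opaque;
    seen[0] = jit_write_protected;
    sketch_tb_invalidate(demo_patch, &seen[1]);
    seen[2] = jit_write_protected;
    return 0;
}
```

Calling `sketch_tb_exec(demo_body, seen)` with a three-element `bool` array
records protected, unprotected, protected, and leaves protection disabled on
return, which is the nesting behaviour described for do_tb_phys_invalidate().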

Signed-off-by: Joelle van Dyne 
---
 configure   | 20 +
 include/exec/exec-all.h |  2 +
 include/tcg/tcg-apple-jit.h | 85 +
 include/tcg/tcg.h   |  3 ++
 accel/tcg/cpu-exec-common.c |  2 +
 accel/tcg/cpu-exec.c|  2 +
 accel/tcg/translate-all.c   | 51 ++
 tcg/tcg.c   |  4 ++
 8 files changed, 169 insertions(+)
 create mode 100644 include/tcg/tcg-apple-jit.h

diff --git a/configure b/configure
index 2470be6790..f56000bb64 100755
--- a/configure
+++ b/configure
@@ -5739,6 +5739,22 @@ but not implemented on your system"
 fi
 fi
 
+##
+# check for Apple Silicon JIT function
+
+if [ "$darwin" = "yes" ] ; then
+  cat > $TMPC << EOF
+#include <pthread.h>
+int main() { pthread_jit_write_protect_np(0); return 0; }
+EOF
+  if ! compile_prog ""; then
+have_pthread_jit_protect='no'
+  else
+have_pthread_jit_protect='yes'
+  fi
+fi
+
+
 ##
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -6847,6 +6863,10 @@ if test "$secret_keyring" = "yes" ; then
   echo "CONFIG_SECRET_KEYRING=y" >> $config_host_mak
 fi
 
+if test "$have_pthread_jit_protect" = "yes" ; then
+  echo "HAVE_PTHREAD_JIT_PROTECT=y" >> $config_host_mak
+fi
+
 if test "$tcg_interpreter" = "yes"; then
   QEMU_INCLUDES="-iquote ${source_path}/tcg/tci $QEMU_INCLUDES"
 elif test "$ARCH" = "sparc64" ; then
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 2db155a772..253af30a2e 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -521,6 +521,8 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
target_ulong cs_base, uint32_t flags,
uint32_t cf_mask);
 void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr);
+void tb_exec_lock(void);
+void tb_exec_unlock(void);
 
 /* GETPC is the true target of the return instruction that we'll execute.  */
 #if defined(CONFIG_TCG_INTERPRETER)
diff --git a/include/tcg/tcg-apple-jit.h b/include/tcg/tcg-apple-jit.h
new file mode 100644
index 0000000000..1e70bf3afe
--- /dev/null
+++ b/include/tcg/tcg-apple-jit.h
@@ -0,0 +1,85 @@
+/*
+ * Apple Silicon APRR functions for JIT handling
+ *
+ * Copyright (c) 2020 osy
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Credits to: https://siguza.github.io/APRR/
+ * Reversed from /usr/lib/system/libsystem_pthread.dylib
+ */
+
+#ifndef TCG_APPLE_JIT_H
+#define TCG_APPLE_JIT_H
+
+#if defined(__aarch64__) && defined(CONFIG_DARWIN)
+
+#define _COMM_PAGE_START_ADDRESS        (0x0000000FFFFFC000ULL) /* In TTBR0 */
+#define _COMM_PAGE_APRR_SUPPORT         (_COMM_PAGE_START_ADDRESS + 0x10C)
+#define _COMM_PAGE_APPR_WRITE_ENABLE    (_COMM_PAGE_START_ADDRESS + 0x110)
+#define _COMM_PAGE_APRR_WRITE_DISABLE   (_COMM_PAGE_START_ADDRESS + 0x118)
+
+static __attribute__((__always_inline__)) bool jit_write_protect_supported(void)
+{
+/* Access shared kernel page at fixed memory location. */
+uint8_t aprr_support = *(volatile uint8_t *)_COMM_PAGE_APRR_SUPPORT;
+return aprr_support > 0;
+}
+
+/* write protect enable = write disable */
+static __attribute__((__always_inline__)) void jit_write_protect(int enabled)
+{
+/* Access shared kernel page at fixed memory location. */
+uint8_t aprr_support = *(volatile uint8_t *)_COMM_PAGE_APRR_SUPPORT;
+if (aprr_support == 0 || aprr_support > 3) {
+return;
+} else if (aprr_support == 1) {
+__asm__ __volatile__ (
+"mov x0, %0\n"
+

[PATCH RESEND v2 7/9] tcg: mirror mapping RWX pages for iOS optional

2020-10-18 Thread Joelle van Dyne

From: osy 

This allows jailbroken devices with entitlements to switch the option off.

Signed-off-by: Joelle van Dyne 
---
 include/sysemu/tcg.h  |  2 +-
 accel/tcg/tcg-all.c   | 27 +-
 accel/tcg/translate-all.c | 60 +--
 bsd-user/main.c   |  2 +-
 linux-user/main.c |  2 +-
 qemu-options.hx   | 11 +++
 6 files changed, 79 insertions(+), 25 deletions(-)

diff --git a/include/sysemu/tcg.h b/include/sysemu/tcg.h
index d9d3ca8559..569f90b11d 100644
--- a/include/sysemu/tcg.h
+++ b/include/sysemu/tcg.h
@@ -8,7 +8,7 @@
 #ifndef SYSEMU_TCG_H
 #define SYSEMU_TCG_H
 
-void tcg_exec_init(unsigned long tb_size);
+void tcg_exec_init(unsigned long tb_size, bool mirror_rwx);
 #ifdef CONFIG_TCG
 extern bool tcg_allowed;
 #define tcg_enabled() (tcg_allowed)
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index fa1208158f..5845744396 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -39,6 +39,7 @@ struct TCGState {
 
 bool mttcg_enabled;
 unsigned long tb_size;
+bool mirror_rwx;
 };
 typedef struct TCGState TCGState;
 
@@ -94,6 +95,7 @@ static void tcg_accel_instance_init(Object *obj)
 TCGState *s = TCG_STATE(obj);
 
 s->mttcg_enabled = default_mttcg_enabled();
+s->mirror_rwx = false;
 }
 
 bool mttcg_enabled;
@@ -102,7 +104,7 @@ static int tcg_init(MachineState *ms)
 {
 TCGState *s = TCG_STATE(current_accel());
 
-tcg_exec_init(s->tb_size * 1024 * 1024);
+tcg_exec_init(s->tb_size * 1024 * 1024, s->mirror_rwx);
 mttcg_enabled = s->mttcg_enabled;
 cpus_register_accel(&tcg_cpus);
 
@@ -168,6 +170,22 @@ static void tcg_set_tb_size(Object *obj, Visitor *v,
 s->tb_size = value;
 }
 
+#ifdef CONFIG_IOS_JIT
+static bool tcg_get_mirror_rwx(Object *obj, Error **errp)
+{
+TCGState *s = TCG_STATE(obj);
+
+return s->mirror_rwx;
+}
+
+static void tcg_set_mirror_rwx(Object *obj, bool value, Error **errp)
+{
+TCGState *s = TCG_STATE(obj);
+
+s->mirror_rwx = value;
+}
+#endif
+
 static void tcg_accel_class_init(ObjectClass *oc, void *data)
 {
 AccelClass *ac = ACCEL_CLASS(oc);
@@ -185,6 +203,13 @@ static void tcg_accel_class_init(ObjectClass *oc, void *data)
 object_class_property_set_description(oc, "tb-size",
 "TCG translation block cache size");
 
+#ifdef CONFIG_IOS_JIT
+object_class_property_add_bool(oc, "mirror-rwx",
+tcg_get_mirror_rwx, tcg_set_mirror_rwx);
+object_class_property_set_description(oc, "mirror-rwx",
+"mirror map executable pages for TCG on iOS");
+#endif
+
 }
 
 static const TypeInfo tcg_accel_type = {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index eb1d8fbe2f..1675951b75 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1042,12 +1042,15 @@ static inline void *split_cross_256mb(void *buf1, size_t size1)
 static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
 __attribute__((aligned(CODE_GEN_ALIGN)));
 
-static inline void *alloc_code_gen_buffer(void)
+static inline void *alloc_code_gen_buffer(bool no_rwx_pages)
 {
 void *buf = static_code_gen_buffer;
 void *end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
 size_t size;
 
+/* not applicable */
+assert(!no_rwx_pages);
+
 /* page-align the beginning and end of the buffer */
 buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
 end = QEMU_ALIGN_PTR_DOWN(end, qemu_real_host_page_size);
@@ -1076,24 +1079,32 @@ static inline void *alloc_code_gen_buffer(void)
 return buf;
 }
 #elif defined(_WIN32)
-static inline void *alloc_code_gen_buffer(void)
+static inline void *alloc_code_gen_buffer(bool no_rwx_pages)
 {
 size_t size = tcg_ctx->code_gen_buffer_size;
+assert(!no_rwx_pages); /* not applicable */
 return VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
 PAGE_EXECUTE_READWRITE);
 }
 #else
-static inline void *alloc_code_gen_buffer(void)
+static inline void *alloc_code_gen_buffer(bool no_rwx_pages)
 {
-#if defined(CONFIG_IOS_JIT)
 int prot = PROT_READ | PROT_EXEC;
-#else
-int prot = PROT_WRITE | PROT_READ | PROT_EXEC;
-#endif
 int flags = MAP_PRIVATE | MAP_ANONYMOUS;
 size_t size = tcg_ctx->code_gen_buffer_size;
 void *buf;
 
+#if defined(CONFIG_DARWIN) /* both iOS and macOS (Apple Silicon) applicable */
+if (!no_rwx_pages) {
+prot |= PROT_WRITE;
+flags |= MAP_JIT;
+}
+#else
+/* not applicable */
+assert(!no_rwx_pages);
+prot |= PROT_WRITE;
+#endif
+
 buf = mmap(NULL, size, prot, flags, -1, 0);
 if (buf == MAP_FAILED) {
 return NULL;
@@ -1173,10 +1184,10 @@ static inline void *alloc_jit_rw_mirror(void *base, size_t size)
 }
 #endif /* CONFIG_IOS_JIT */
 
-static inline void code_gen_alloc(size_t tb_size)
+static inline void code_gen_alloc(size_t tb_size, bool mirror_rwx)
 {
 tcg_ctx->code_gen_buffer_size = size_code_gen_buffer(tb_size);
-

[PATCH RESEND v2 4/9] coroutine: add libucontext as external library

2020-10-18 Thread Joelle van Dyne

From: osy 

iOS does not support ucontext natively on aarch64, and sigaltstack is also
unsupported (worse, it fails silently; see
https://openradar.appspot.com/13002712).

As a workaround we include a library implementation of ucontext and add it
as a build option.
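
For readers unfamiliar with the ucontext API that the coroutine backend
relies on, here is a minimal sketch of a coroutine that yields once and then
finishes. It is not QEMU's coroutine-ucontext.c; the names `co_entry`,
`run_coroutine_demo` and the 64 KiB stack size are made up for illustration,
and it assumes a libc that ships <ucontext.h> (as glibc does, or as
libucontext is meant to provide on hosts that lack it).

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

static ucontext_t caller_ctx;       /* context of the code driving the demo */
static ucontext_t co_ctx;           /* context of the coroutine */
static char co_stack[64 * 1024];    /* hypothetical stack size */
static volatile int co_steps;       /* how many slices have run so far */

/* Coroutine body: run one slice, yield to the caller, run a second slice. */
static void co_entry(void)
{
    co_steps++;
    swapcontext(&co_ctx, &caller_ctx);  /* yield */
    co_steps++;
    /* Returning here resumes uc_link, i.e. caller_ctx. */
}

/* Enter the coroutine twice and encode the observed progress in one int. */
static int run_coroutine_demo(void)
{
    co_steps = 0;
    if (getcontext(&co_ctx) != 0) {
        return -1;
    }
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &caller_ctx;
    makecontext(&co_ctx, co_entry, 0);

    swapcontext(&caller_ctx, &co_ctx);  /* returns at the first yield */
    int after_first = co_steps;
    swapcontext(&caller_ctx, &co_ctx);  /* returns when co_entry finishes */
    return after_first * 10 + co_steps;
}
```

The demo returns 12: one slice had run at the first yield and two had run by
the time the coroutine finished.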

Signed-off-by: Joelle van Dyne 
---
 configure | 23 ---
 meson.build   | 29 -
 util/coroutine-ucontext.c |  9 +
 .gitmodules   |  3 +++
 libucontext   |  1 +
 meson_options.txt |  2 ++
 6 files changed, 63 insertions(+), 4 deletions(-)
 create mode 160000 libucontext

diff --git a/configure b/configure
index 12d65397b1..e7e6ba2c45 100755
--- a/configure
+++ b/configure
@@ -1757,7 +1757,7 @@ Advanced options (experts only):
  --oss-lib  path to OSS library
  --cpu=CPU  Build for host CPU [$cpu]
   --with-coroutine=BACKEND coroutine backend. Supported options:
-   ucontext, sigaltstack, windows
+   ucontext, libucontext, sigaltstack, windows
  --enable-gcov  enable test coverage analysis with gcov
   --disable-blobs  disable installing provided firmware blobs
   --with-vss-sdk=SDK-path  enable Windows VSS support in QEMU Guest Agent
@@ -4929,6 +4929,8 @@ if test "$coroutine" = ""; then
 coroutine=win32
   elif test "$ucontext_works" = "yes"; then
 coroutine=ucontext
+  elif test "$ios" = "yes"; then
+coroutine=libucontext
   else
 coroutine=sigaltstack
   fi
@@ -4952,12 +4954,27 @@ else
   error_exit "only the 'windows' coroutine backend is valid for Windows"
 fi
 ;;
+  libucontext)
+  ;;
   *)
 error_exit "unknown coroutine backend $coroutine"
 ;;
   esac
 fi
 
+case $coroutine in
+libucontext)
+  git_submodules="${git_submodules} libucontext"
+  mkdir -p libucontext
+  coroutine_impl=ucontext
+  libucontext="enabled"
+  ;;
+*)
+  coroutine_impl=$coroutine
+  libucontext="disabled"
+  ;;
+esac
+
 if test "$coroutine_pool" = ""; then
   coroutine_pool=yes
 fi
@@ -6534,7 +6551,7 @@ if test "$rbd" = "yes" ; then
   echo "RBD_LIBS=$rbd_libs" >> $config_host_mak
 fi
 
-echo "CONFIG_COROUTINE_BACKEND=$coroutine" >> $config_host_mak
+echo "CONFIG_COROUTINE_BACKEND=$coroutine_impl" >> $config_host_mak
 if test "$coroutine_pool" = "yes" ; then
   echo "CONFIG_COROUTINE_POOL=1" >> $config_host_mak
 else
@@ -7133,7 +7150,7 @@ NINJA=${ninja:-$PWD/ninjatool} $meson setup \
 -Dvnc=$vnc -Dvnc_sasl=$vnc_sasl -Dvnc_jpeg=$vnc_jpeg -Dvnc_png=$vnc_png \
 -Dgettext=$gettext -Dxkbcommon=$xkbcommon -Du2f=$u2f \
 -Dcapstone=$capstone -Dslirp=$slirp -Dfdt=$fdt \
--Diconv=$iconv -Dcurses=$curses \
+-Diconv=$iconv -Dcurses=$curses -Ducontext=$libucontext \
 $cross_arg \
 "$PWD" "$source_path"
 
diff --git a/meson.build b/meson.build
index ea7c10ba08..888a3b156d 100644
--- a/meson.build
+++ b/meson.build
@@ -1170,9 +1170,35 @@ if not fdt.found() and fdt_required.length() > 0
   error('fdt not available but required by targets ' + ', '.join(fdt_required))
 endif
 
+ucontext = not_found
+slirp_opt = 'disabled'
+if get_option('ucontext').enabled()
+  if not fs.is_dir(meson.current_source_dir() / 'libucontext/arch' / cpu)
+error('libucontext is wanted but not implemented for host ' + cpu)
+  endif
+  arch = host_machine.cpu()
+  ucontext_cargs = ['-DG_LOG_DOMAIN="ucontext"', '-DCUSTOM_IMPL']
+  ucontext_files = [
+'libucontext/arch' / arch / 'getcontext.S',
+'libucontext/arch' / arch / 'setcontext.S',
+'libucontext/arch' / arch / 'makecontext.c',
+'libucontext/arch' / arch / 'startcontext.S',
+'libucontext/arch' / arch / 'swapcontext.S',
+  ]
+
+  ucontext_inc = include_directories('libucontext/include')
+  libucontext = static_library('ucontext',
+   sources: ucontext_files,
+   c_args: ucontext_cargs,
+   include_directories: ucontext_inc)
+  ucontext = declare_dependency(link_with: libucontext,
+include_directories: ucontext_inc)
+endif
+
 config_host_data.set('CONFIG_CAPSTONE', capstone.found())
 config_host_data.set('CONFIG_FDT', fdt.found())
 config_host_data.set('CONFIG_SLIRP', slirp.found())
+config_host_data.set('CONFIG_LIBUCONTEXT', ucontext.found())
 
 #
 # Generated sources #
@@ -1399,7 +1425,7 @@ util_ss.add_all(trace_ss)
 util_ss = util_ss.apply(config_all, strict: false)
 libqemuutil = static_library('qemuutil',
  sources: util_ss.sources() + stub_ss.sources() + genh,
- dependencies: [util_ss.dependencies(), m, glib, socket, malloc])
+ dependencies: [util_ss.dependencies(), m, glib, socket, malloc, ucontext])
 qemuutil = declare_dependency(link_with: libqemuutil,
   sources: genh +

[PATCH RESEND v2 2/9] configure: cross-compiling without cross_prefix

2020-10-18 Thread Joelle van Dyne

From: osy 

The iOS toolchain does not use the host prefix naming convention. We add a
new option `--enable-cross-compile` that forces cross-compile even without
a cross_prefix.

Signed-off-by: Joelle van Dyne 
---
 configure | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index ea1753c117..ced6d2e961 100755
--- a/configure
+++ b/configure
@@ -234,6 +234,7 @@ cpu=""
 iasl="iasl"
 interp_prefix="/usr/gnemul/qemu-%M"
 static="no"
+cross_compile="no"
 cross_prefix=""
 audio_drv_list=""
 block_drv_rw_whitelist=""
@@ -457,6 +458,11 @@ for opt do
   optarg=$(expr "x$opt" : 'x[^=]*=\(.*\)')
   case "$opt" in
   --cross-prefix=*) cross_prefix="$optarg"
+cross_compile="yes"
+  ;;
+  --enable-cross-compile) cross_compile="yes"
+  ;;
+  --disable-cross-compile) cross_compile="no"
   ;;
   --cc=*) CC="$optarg"
   ;;
@@ -879,6 +885,10 @@ for opt do
   ;;
   --cross-prefix=*)
   ;;
+  --enable-cross-compile)
+  ;;
+  --disable-cross-compile)
+  ;;
   --cc=*)
   ;;
   --host-cc=*) host_cc="$optarg"
@@ -1688,6 +1698,7 @@ Advanced options (experts only):
   --efi-aarch64=PATH   PATH of efi file to use for aarch64 VMs.
  --with-suffix=SUFFIX suffix for QEMU data inside datadir/libdir/sysconfdir/docdir [$qemu_suffix]
  --with-pkgversion=VERS   use specified string as sub-version of the package
+  --enable-cross-compile   enable cross compiling (set automatically if $cross_prefix is set)
  --enable-debug   enable common debug build options
  --enable-sanitizers  enable default sanitizers
  --enable-tsan   enable thread sanitizer
@@ -7023,7 +7034,7 @@ if has $sdl2_config; then
 fi
 echo "strip = [$(meson_quote $strip)]" >> $cross
 echo "windres = [$(meson_quote $windres)]" >> $cross
-if test -n "$cross_prefix"; then
+if test "$cross_compile" = "yes"; then
 cross_arg="--cross-file config-meson.cross"
 echo "[host_machine]" >> $cross
 if test "$mingw32" = "yes" ; then
-- 
2.24.3 (Apple Git-128)

[PATCH RESEND v2 9/9] block: check availablity for preadv/pwritev on mac

2020-10-18 Thread Joelle van Dyne

From: osy 

macOS 11/iOS 14 added preadv/pwritev APIs. Due to weak linking, configure
will succeed with CONFIG_PREADV even when targeting a lower OS version. We
therefore need to check at run time if we can actually use these APIs.
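
A rough sketch of the run-time check plus fallback, for illustration only.
The `sketch_*` names are hypothetical and this is not the code in
block/file-posix.c: the fallback here is a plain loop over pread(), and the
sketch reports "not available" as -1 with errno set to ENOSYS rather than
returning -ENOSYS. `__builtin_available` is a clang feature and is only
compiled on Apple hosts; elsewhere preadv() is called directly.

```c
#define _DEFAULT_SOURCE /* preadv() on glibc */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>      /* only needed by the usage example */
#include <string.h>     /* only needed by the usage example */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Cached result of the availability probe, like preadv_present in QEMU. */
static bool sketch_preadv_present = true;

/* Call preadv() only if the running OS really provides it. */
static ssize_t sketch_preadv(int fd, const struct iovec *iov, int n, off_t off)
{
#if defined(__APPLE__)
    if (__builtin_available(macOS 11, iOS 14, watchOS 7, tvOS 14, *)) {
        return preadv(fd, iov, n, off);
    }
    errno = ENOSYS;     /* weakly linked symbol is absent at run time */
    return -1;
#else
    return preadv(fd, iov, n, off);
#endif
}

/* Fallback: fill each buffer in turn with pread(). */
static ssize_t sketch_pread_fallback(int fd, const struct iovec *iov, int n,
                                     off_t off)
{
    ssize_t total = 0;
    for (int i = 0; i < n; i++) {
        ssize_t r = pread(fd, iov[i].iov_base, iov[i].iov_len, off + total);
        if (r < 0) {
            return total ? total : -1;
        }
        total += r;
        if ((size_t)r < iov[i].iov_len) {
            break;      /* short read or end of file */
        }
    }
    return total;
}

/* Try preadv() first; remember ENOSYS so later calls skip straight ahead. */
static ssize_t sketch_read_vectored(int fd, const struct iovec *iov, int n,
                                    off_t off)
{
    if (sketch_preadv_present) {
        ssize_t r = sketch_preadv(fd, iov, n, off);
        if (r >= 0 || errno != ENOSYS) {
            return r;
        }
        sketch_preadv_present = false;
    }
    return sketch_pread_fallback(fd, iov, n, off);
}
```

Reading an 11-byte file into a 5-byte and a 6-byte buffer through
`sketch_read_vectored()` fills both buffers whether or not preadv() is
present; clearing `sketch_preadv_present` by hand exercises the fallback.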

Signed-off-by: Joelle van Dyne 
---
 block/file-posix.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index cdc73b5f1d..d7482036a3 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1393,12 +1393,24 @@ static bool preadv_present = true;
 static ssize_t
 qemu_preadv(int fd, const struct iovec *iov, int nr_iov, off_t offset)
 {
+#ifdef CONFIG_DARWIN /* preadv introduced in macOS 11 */
+if (!__builtin_available(macOS 11, iOS 14, watchOS 7, tvOS 14, *)) {
+preadv_present = false;
+return -ENOSYS;
+} else
+#endif
 return preadv(fd, iov, nr_iov, offset);
 }
 
 static ssize_t
 qemu_pwritev(int fd, const struct iovec *iov, int nr_iov, off_t offset)
 {
+#ifdef CONFIG_DARWIN /* pwritev introduced in macOS 11 */
+if (!__builtin_available(macOS 11, iOS 14, watchOS 7, tvOS 14, *)) {
+preadv_present = false;
+return -ENOSYS;
+} else
+#endif
 return pwritev(fd, iov, nr_iov, offset);
 }
 
-- 
2.24.3 (Apple Git-128)

[PATCH RESEND v2 1/9] configure: option to disable host block devices

2020-10-18 Thread Joelle van Dyne

From: osy 

Some hosts (iOS) have a sandboxed filesystem and do not provide low-level
APIs for interfacing with host block devices.

Signed-off-by: Joelle van Dyne 
---
 configure  | 4 
 meson.build| 1 +
 block/file-posix.c | 8 +++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index f498a37f9a..ea1753c117 100755
--- a/configure
+++ b/configure
@@ -447,6 +447,7 @@ meson=""
 ninja=""
 skip_meson=no
 gettext=""
+host_block_device_support="yes"
 
 bogus_os="no"
 malloc_trim="auto"
@@ -5969,6 +5970,9 @@ if test "$default_devices" = "yes" ; then
 else
   echo "CONFIG_MINIKCONF_MODE=--allnoconfig" >> $config_host_mak
 fi
+if test "$host_block_device_support" = "yes" ; then
+  echo "CONFIG_HOST_BLOCK_DEVICE=y" >> $config_host_mak
+fi
 if test "$debug_tcg" = "yes" ; then
   echo "CONFIG_DEBUG_TCG=y" >> $config_host_mak
 fi
diff --git a/meson.build b/meson.build
index 2c6169fab0..75967914dc 100644
--- a/meson.build
+++ b/meson.build
@@ -2080,6 +2080,7 @@ summary_info += {'vvfat support': config_host.has_key('CONFIG_VVFAT')}
 summary_info += {'qed support':   config_host.has_key('CONFIG_QED')}
 summary_info += {'parallels support': config_host.has_key('CONFIG_PARALLELS')}
 summary_info += {'sheepdog support':  config_host.has_key('CONFIG_SHEEPDOG')}
+summary_info += {'host block dev support': config_host.has_key('CONFIG_HOST_BLOCK_DEVICE')}
 summary_info += {'capstone':  capstone_opt == 'disabled' ? false : capstone_opt}
 summary_info += {'libpmem support':   config_host.has_key('CONFIG_LIBPMEM')}
 summary_info += {'libdaxctl support': config_host.has_key('CONFIG_LIBDAXCTL')}
diff --git a/block/file-posix.c b/block/file-posix.c
index c63926d592..52f7c20525 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -41,7 +41,7 @@
 #include "scsi/pr-manager.h"
 #include "scsi/constants.h"
 
-#if defined(__APPLE__) && (__MACH__)
+#if defined(CONFIG_HOST_BLOCK_DEVICE) && defined(__APPLE__) && (__MACH__)
 #include <paths.h>
 #include <sys/param.h>
 #include <IOKit/IOKitLib.h>
@@ -3247,6 +3247,8 @@ BlockDriver bdrv_file = {
 /***/
 /* host device */
 
+#if defined(CONFIG_HOST_BLOCK_DEVICE)
+
 #if defined(__APPLE__) && defined(__MACH__)
 static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
 CFIndex maxPathSize, int flags);
@@ -3872,6 +3874,8 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
+#endif /* CONFIG_HOST_BLOCK_DEVICE */
+
 static void bdrv_file_init(void)
 {
 /*
@@ -3879,6 +3883,7 @@ static void bdrv_file_init(void)
  * registered last will get probed first.
  */
 bdrv_register(&bdrv_file);
+#if defined(CONFIG_HOST_BLOCK_DEVICE)
 bdrv_register(&bdrv_host_device);
 #ifdef __linux__
 bdrv_register(&bdrv_host_cdrom);
@@ -3886,6 +3891,7 @@ static void bdrv_file_init(void)
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
 bdrv_register(&bdrv_host_cdrom);
 #endif
+#endif /* CONFIG_HOST_BLOCK_DEVICE */
 }
 
 block_init(bdrv_file_init);
-- 
2.24.3 (Apple Git-128)

[PATCH RESEND v2 5/9] tcg: add const hints for code pointers

2020-10-18 Thread Joelle van Dyne

From: osy 

We will introduce mirror mapping for the JIT segment with separate RX and RW
access. Adding 'const' hints makes read-only accesses easier to identify and
lets us catch bugs more easily at compile time in the future.
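
As a small illustration of the idea (not taken from the patch), the sketch
below uses a hypothetical `insn_unit` type in place of `tcg_insn_unit` and a
made-up 26-bit placeholder encoding. The pointer that is written through
stays non-const, while the branch target is only ever read and is therefore
const-qualified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t insn_unit; /* stand-in for tcg_insn_unit */

/* Byte distance between two code addresses; neither side is written. */
static ptrdiff_t sketch_ptr_byte_diff(const void *a, const void *b)
{
    return (const char *)a - (const char *)b;
}

/*
 * Store a placeholder 26-bit offset at code_ptr pointing toward target.
 * Both pointers are assumed to lie in the same code buffer.
 */
static int sketch_reloc(insn_unit *code_ptr, const insn_unit *target)
{
    ptrdiff_t offset = target - code_ptr;   /* in instruction units */

    if (offset < -(1 << 25) || offset >= (1 << 25)) {
        return 0;                           /* does not fit in 26 bits */
    }
    *code_ptr = (insn_unit)offset & 0x03ffffffu;
    /* "*target = 0;" would not compile: target points to const data. */
    return 1;
}
```

With the target const-qualified, an accidental store through it is a
compile-time error instead of a fault on a read-only mapping at run time.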

Signed-off-by: Joelle van Dyne 
---
 include/tcg/tcg.h|  8 
 tcg/tcg.c|  4 ++--
 tcg/aarch64/tcg-target.c.inc | 19 +++
 tcg/arm/tcg-target.c.inc | 12 +++-
 tcg/i386/tcg-target.c.inc| 10 +-
 tcg/mips/tcg-target.c.inc| 33 +++--
 tcg/ppc/tcg-target.c.inc | 21 +
 tcg/riscv/tcg-target.c.inc   | 11 ++-
 tcg/s390/tcg-target.c.inc|  9 +
 tcg/sparc/tcg-target.c.inc   | 10 +-
 tcg/tcg-ldst.c.inc   |  2 +-
 tcg/tci/tcg-target.c.inc |  2 +-
 12 files changed, 79 insertions(+), 62 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 8804a8c4a2..79c5ff8dab 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -261,7 +261,7 @@ struct TCGLabel {
 unsigned refs : 16;
 union {
 uintptr_t value;
-tcg_insn_unit *value_ptr;
+const tcg_insn_unit *value_ptr;
 } u;
 QSIMPLEQ_HEAD(, TCGRelocation) relocs;
 QSIMPLEQ_ENTRY(TCGLabel) next;
@@ -593,7 +593,7 @@ struct TCGContext {
 int nb_ops;
 
 /* goto_tb support */
-tcg_insn_unit *code_buf;
+const tcg_insn_unit *code_buf;
 uint16_t *tb_jmp_reset_offset; /* tb->jmp_reset_offset */
 uintptr_t *tb_jmp_insn_offset; /* tb->jmp_target_arg if direct_jump */
 uintptr_t *tb_jmp_target_addr; /* tb->jmp_target_arg if !direct_jump */
@@ -1099,7 +1099,7 @@ static inline TCGLabel *arg_label(TCGArg i)
  * correct result.
  */
 
-static inline ptrdiff_t tcg_ptr_byte_diff(void *a, void *b)
+static inline ptrdiff_t tcg_ptr_byte_diff(const void *a, const void *b)
 {
 return a - b;
 }
@@ -1113,7 +1113,7 @@ static inline ptrdiff_t tcg_ptr_byte_diff(void *a, void *b)
  * to the destination address.
  */
 
-static inline ptrdiff_t tcg_pcrel_diff(TCGContext *s, void *target)
+static inline ptrdiff_t tcg_pcrel_diff(TCGContext *s, const void *target)
 {
 return tcg_ptr_byte_diff(target, s->code_ptr);
 }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index a8c28440e2..bb890c506d 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -148,7 +148,7 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
intptr_t arg2);
 static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
 TCGReg base, intptr_t ofs);
-static void tcg_out_call(TCGContext *s, tcg_insn_unit *target);
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *target);
 static int tcg_target_const_match(tcg_target_long val, TCGType type,
   const TCGArgConstraint *arg_ct);
 #ifdef TCG_TARGET_NEED_LDST_LABELS
@@ -295,7 +295,7 @@ static void tcg_out_reloc(TCGContext *s, tcg_insn_unit *code_ptr, int type,
 QSIMPLEQ_INSERT_TAIL(&l->relocs, r, next);
 }
 
-static void tcg_out_label(TCGContext *s, TCGLabel *l, tcg_insn_unit *ptr)
+static void tcg_out_label(TCGContext *s, TCGLabel *l, const tcg_insn_unit *ptr)
 {
 tcg_debug_assert(!l->has_value);
 l->has_value = 1;
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 26f71cb599..1aa5f37fc6 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -78,7 +78,8 @@ static const int tcg_target_call_oarg_regs[1] = {
 #define TCG_REG_GUEST_BASE TCG_REG_X28
 #endif
 
-static inline bool reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static inline bool reloc_pc26(tcg_insn_unit *code_ptr,
+  const tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - code_ptr;
 if (offset == sextract64(offset, 0, 26)) {
@@ -90,7 +91,8 @@ static inline bool reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 return false;
 }
 
-static inline bool reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static inline bool reloc_pc19(tcg_insn_unit *code_ptr,
+  const tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - code_ptr;
 if (offset == sextract64(offset, 0, 19)) {
@@ -1306,14 +1308,14 @@ static void tcg_out_cmp(TCGContext *s, TCGType ext, TCGReg a,
 }
 }
 
-static inline void tcg_out_goto(TCGContext *s, tcg_insn_unit *target)
+static inline void tcg_out_goto(TCGContext *s, const tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - s->code_ptr;
 tcg_debug_assert(offset == sextract64(offset, 0, 26));
 tcg_out_insn(s, 3206, B, offset);
 }
 
-static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target)
-static inline void tcg_out_goto_long(TCGContext *s, const tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - s->code_ptr;
 if (offset == sextract64(offset, 0, 26)) {
@@ -1329,7 +1331,7 @@ static inline void tcg_out_callr(TCGContext

[PATCH RESEND v2 6/9] tcg: implement mirror mapped JIT for iOS

2020-10-18 Thread Joelle van Dyne

From: osy 

On iOS, we cannot allocate RWX pages without special entitlements. As a
workaround, we can allocate an RX region and then mirror map it to a separate
RW region. Then we can write to one region and execute from the other one.

We also define `tcg_mirror_ptr_rw` and `tcg_code_ptr_rw` to return a pointer
to RW memory. The difference between the RW and RX regions is stored in the
TCG context.

To ensure cache coherency, we flush the data cache in the RW mapping and
then invalidate the instruction cache in the RX mapping (where applicable).
Because the data cache flush is OS-defined on some architectures, we do not
provide implementations for non-iOS platforms (ARM/x86).
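
To make the "two views, one stored difference" idea concrete, here is a
portable sketch. It is not the patch's implementation: the patch remaps with
mach_vm_remap() on iOS, whereas this sketch maps one temporary file twice
with mmap(), and the second view is only PROT_READ as a stand-in for RX so
that it runs on hosts that refuse executable file mappings. `struct
sketch_jit` and the `sketch_*` functions are hypothetical names.

```c
#define _DEFAULT_SOURCE /* fileno(), ftruncate() on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct sketch_jit {
    const uint8_t *rx;  /* "execute" view; read-only in this sketch */
    intptr_t rw_diff;   /* address of RW view minus address of RX view */
    size_t size;
};

/* Map the same backing pages twice: once read-only, once read-write. */
static int sketch_jit_alloc(struct sketch_jit *j, size_t size)
{
    FILE *backing = tmpfile();
    if (backing == NULL) {
        return -1;
    }
    int fd = fileno(backing);
    if (ftruncate(fd, (off_t)size) != 0) {
        fclose(backing);
        return -1;
    }
    void *rx = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    void *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(backing);    /* the mappings outlive the descriptor */
    if (rx == MAP_FAILED || rw == MAP_FAILED) {
        if (rx != MAP_FAILED) {
            munmap(rx, size);
        }
        if (rw != MAP_FAILED) {
            munmap(rw, size);
        }
        return -1;
    }
    j->rx = rx;
    j->rw_diff = (intptr_t)((uintptr_t)rw - (uintptr_t)rx);
    j->size = size;
    return 0;
}

/* Translate a pointer into the RX view to the matching RW address. */
static uint8_t *sketch_rw_ptr(const struct sketch_jit *j, const uint8_t *rx_ptr)
{
    return (uint8_t *)((uintptr_t)rx_ptr + (uintptr_t)j->rw_diff);
}
```

Code is emitted through `sketch_rw_ptr(&j, p)` and read back (or, in the
real scheme, executed) through `p` itself. The data cache flush and
instruction cache invalidation described above are deliberately left out of
this sketch.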

Signed-off-by: Joelle van Dyne 
---
 docs/devel/ios.rst   | 40 +++
 configure|  1 +
 include/exec/exec-all.h  |  8 
 include/tcg/tcg.h| 17 
 tcg/aarch64/tcg-target.h | 13 +-
 tcg/arm/tcg-target.h |  9 -
 tcg/i386/tcg-target.h| 24 ++-
 tcg/mips/tcg-target.h|  8 +++-
 tcg/ppc/tcg-target.h |  8 +++-
 tcg/riscv/tcg-target.h   |  9 -
 tcg/s390/tcg-target.h| 13 +-
 tcg/sparc/tcg-target.h   |  8 +++-
 tcg/tci/tcg-target.h |  9 -
 accel/tcg/cpu-exec.c |  7 +++-
 accel/tcg/translate-all.c| 77 ++--
 tcg/tcg.c| 56 +-
 tcg/aarch64/tcg-target.c.inc | 33 ++--
 tcg/arm/tcg-target.c.inc | 25 ++--
 tcg/i386/tcg-target.c.inc| 18 -
 tcg/mips/tcg-target.c.inc| 35 +---
 tcg/ppc/tcg-target.c.inc | 38 +++---
 tcg/riscv/tcg-target.c.inc   | 40 +++
 tcg/s390/tcg-target.c.inc| 16 
 tcg/sparc/tcg-target.c.inc   | 23 +++
 tcg/tcg-pool.c.inc   |  9 +++--
 tcg/tci/tcg-target.c.inc |  6 +--
 26 files changed, 416 insertions(+), 134 deletions(-)
 create mode 100644 docs/devel/ios.rst

diff --git a/docs/devel/ios.rst b/docs/devel/ios.rst
new file mode 100644
index 0000000000..dba9fdd868
--- /dev/null
+++ b/docs/devel/ios.rst
@@ -0,0 +1,40 @@
+===
+iOS Support
+===
+
+To run qemu on the iOS platform, some modifications were required. Most of the
+modifications are conditioned on the ``CONFIG_IOS`` and ``CONFIG_IOS_JIT``
+configuration variables.
+
+Build support
+-
+
+For the code to compile, certain changes in the block driver and the slirp
+driver had to be made. There is no ``system()`` call, so code requiring it had
+to be disabled.
+
+``ucontext`` support is broken on iOS. The implementation from ``libucontext``
+is used instead.
+
+Because ``fork()`` is not allowed on iOS apps, the option to build qemu and the
+utilities as shared libraries is added. Note that because qemu does not perform
+resource cleanup in most cases (open files, allocated memory, etc), it is
+advisable that the user implements a proxy layer for syscalls so resources can
+be kept track by the app that uses qemu as a shared library.
+
+JIT support
+---
+
+On iOS, allocating RWX pages require special entitlements not usually granted to
+apps. However, it is possible to use `bulletproof JIT`_ with a development
+certificate. This means that we need to allocate one chunk of memory with RX
+permissions and then mirror map the same memory with RW permissions. We generate
+code to the mirror mapping and execute the original mapping.
+
+With ``CONFIG_IOS_JIT`` defined, we store inside the TCG context the difference
+between the two mappings. Then, we make sure that any writes to JIT memory is
+done to the pointer + the difference (in order to get a pointer to the mirror
+mapped space). Additionally, we make sure to flush the data cache before we
+invalidate the instruction cache so the changes are seen in both mappings.
+
+.. _bulletproof JIT: https://www.blackhat.com/docs/us-16/materials/us-16-Krstic.pdf
diff --git a/configure b/configure
index e7e6ba2c45..2470be6790 100755
--- a/configure
+++ b/configure
@@ -6085,6 +6085,7 @@ fi
 
 if test "$ios" = "yes" ; then
   echo "CONFIG_IOS=y" >> $config_host_mak
+  echo "CONFIG_IOS_JIT=y" >> $config_host_mak
 fi
 
 if test "$solaris" = "yes" ; then
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 66f9b4cca6..2db155a772 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -483,6 +483,14 @@ struct TranslationBlock {
 uintptr_t jmp_list_head;
 uintptr_t jmp_list_next[2];
 uintptr_t jmp_dest[2];
+
+#if defined(CONFIG_IOS_JIT)
+/*
+ * Store difference to writable mirror
+ * We need this when patching the jump instructions
+ */
+ptrdiff_t code_rw_mirror_diff;
+#endif
 };
 
 extern bool parallel_cpus;
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 79c5ff8dab..ade01d2e41 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -627,6 +627,9 @@ struct TCGContext {
 size_t

[PATCH RESEND v2 0/9] iOS and Apple Silicon host support

2020-10-18 Thread Joelle van Dyne

This set of changes brings QEMU TCG to iOS devices and future Apple Silicon
devices. The changes were originally developed last year and have been working
in the UTM app. Recently, we ported them to master, rewrote many of the build
script changes for meson, and broke up the patches into more distinct units.

A summary of the changes:

* `CONFIG_IOS` and `CONFIG_IOS_JIT` are defined when building for iOS, and
  iOS-specific changes (as well as unsupported code) are gated behind them.
* A new dependency, libucontext, is added since iOS does not have native
  ucontext and has broken support for sigaltstack. libucontext is available as
  a new option for the coroutine backend.
* On stock iOS devices, there is a workaround for running JIT code without
  any special entitlement. It requires the JIT region to be mirror mapped with
  one region RW and another one RX. To support this style of JIT, TCG is changed
  to support writing to a different code_ptr. These changes are gated by
  `CONFIG_IOS_JIT`.
* For (recent) jailbroken iOS devices as well as upcoming Apple Silicon devices,
  there are new rules for applications supporting JIT (with the proper
  entitlement). These rules are implemented as well.
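
To make the "one region RW and another one RX" idea concrete, here is a
minimal POSIX sketch that maps the same backing memory twice. It is an
assumption-laden stand-in, not the mechanism used in this series: it backs
the mapping with `tmpfile()`, the names `ExampleMirror`,
`example_mirror_map` and `example_mirror_unmap` are hypothetical, and the
second view is mapped read-only rather than executable so that the sketch
runs on systems that forbid executable file mappings. A real JIT would
request `PROT_READ | PROT_EXEC` for that view and would also need the cache
maintenance described in the patches.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Two views of the same physical pages (hypothetical helper type). */
typedef struct {
    void *rw;        /* writable view: code is generated here */
    const void *rx;  /* second view: stands in for the executable mapping */
    size_t size;
} ExampleMirror;

/* Map `size` bytes twice. Returns 0 on success, -1 on failure. */
static int example_mirror_map(ExampleMirror *m, size_t size)
{
    FILE *f = tmpfile();
    if (f == NULL) {
        return -1;
    }
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)size) != 0) {
        fclose(f);
        return -1;
    }
    void *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *rx = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    fclose(f); /* the mappings stay valid after the file is closed */
    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED) {
            munmap(rw, size);
        }
        if (rx != MAP_FAILED) {
            munmap(rx, size);
        }
        return -1;
    }
    m->rw = rw;
    m->rx = rx;
    m->size = size;
    return 0;
}

/* Release both views. */
static void example_mirror_unmap(ExampleMirror *m)
{
    munmap(m->rw, m->size);
    munmap((void *)m->rx, m->size);
}
```

Bytes stored through `rw` become visible through `rx` because both views
share the same pages; the difference `(char *)rw - (char *)rx` is the kind of
offset that the TCG changes keep in their context.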

Since v2:

* Changed getting mirror pointer from a macro to inline functions
* Split constification of TCG code pointers to separate patch
* Removed slirp updates (will send future patch once slirp changes are in)
* Removed shared library patch (will send future patch)

-j

osy (9):
  configure: option to disable host block devices
  configure: cross-compiling without cross_prefix
  qemu: add support for iOS host
  coroutine: add libucontext as external library
  tcg: add const hints for code pointers
  tcg: implement mirror mapped JIT for iOS
  tcg: mirror mapping RWX pages for iOS optional
  tcg: support JIT on Apple Silicon
  block: check availablity for preadv/pwritev on mac

 docs/devel/ios.rst   |  40 +
 configure| 104 --
 meson.build  |  32 ++-
 include/exec/exec-all.h  |  10 +++
 include/sysemu/tcg.h |   2 +-
 include/tcg/tcg-apple-jit.h  |  85 ++
 include/tcg/tcg.h|  28 +-
 tcg/aarch64/tcg-target.h |  23 -
 tcg/arm/tcg-target.h |   9 +-
 tcg/i386/tcg-target.h|  24 -
 tcg/mips/tcg-target.h|   8 +-
 tcg/ppc/tcg-target.h |   8 +-
 tcg/riscv/tcg-target.h   |   9 +-
 tcg/s390/tcg-target.h|  13 ++-
 tcg/sparc/tcg-target.h   |   8 +-
 tcg/tci/tcg-target.h |   9 +-
 accel/tcg/cpu-exec-common.c  |   2 +
 accel/tcg/cpu-exec.c |   9 +-
 accel/tcg/tcg-all.c  |  27 +-
 accel/tcg/translate-all.c| 168 ---
 block.c  |   2 +-
 block/file-posix.c   |  50 ---
 bsd-user/main.c  |   2 +-
 linux-user/main.c|   2 +-
 net/slirp.c  |  16 ++--
 qga/commands-posix.c |   6 ++
 target/arm/arm-semi.c|   2 +
 target/m68k/m68k-semi.c  |   2 +
 target/nios2/nios2-semi.c|   2 +
 tcg/tcg.c|  64 -
 util/coroutine-ucontext.c|   9 ++
 .gitmodules  |   3 +
 libucontext  |   1 +
 meson_options.txt|   2 +
 qemu-options.hx  |  11 +++
 tcg/aarch64/tcg-target.c.inc |  48 ++
 tcg/arm/tcg-target.c.inc |  33 ---
 tcg/i386/tcg-target.c.inc|  28 +++---
 tcg/mips/tcg-target.c.inc|  64 +++--
 tcg/ppc/tcg-target.c.inc |  55 +++-
 tcg/riscv/tcg-target.c.inc   |  51 ++-
 tcg/s390/tcg-target.c.inc|  25 +++---
 tcg/sparc/tcg-target.c.inc   |  33 ---
 tcg/tcg-ldst.c.inc   |   2 +-
 tcg/tcg-pool.c.inc   |   9 +-
 tcg/tci/tcg-target.c.inc |   8 +-
 tests/qtest/meson.build  |   7 +-
 47 files changed, 919 insertions(+), 236 deletions(-)
 create mode 100644 docs/devel/ios.rst
 create mode 100644 include/tcg/tcg-apple-jit.h
 create mode 160000 libucontext

-- 
2.24.3 (Apple Git-128)

[PATCH RESEND v2 3/9] qemu: add support for iOS host

2020-10-18 Thread Joelle van Dyne

From: osy 

This introduces support for building for iOS hosts. When the correct Xcode
toolchain is used, iOS host will be detected automatically.

block: disable features not supported by iOS sandbox
slirp: disable SMB features for iOS
target: disable system() calls for iOS
tcg: use sys_icache_invalidate() instead of GCC builtin for iOS
tests: disable tests on iOS that use system()
Signed-off-by: Joelle van Dyne 
---
 configure | 43 ++-
 meson.build   |  2 +-
 tcg/aarch64/tcg-target.h  | 10 +
 block.c   |  2 +-
 block/file-posix.c| 30 ---
 net/slirp.c   | 16 +++
 qga/commands-posix.c  |  6 ++
 target/arm/arm-semi.c |  2 ++
 target/m68k/m68k-semi.c   |  2 ++
 target/nios2/nios2-semi.c |  2 ++
 tests/qtest/meson.build   |  7 +++
 11 files changed, 95 insertions(+), 27 deletions(-)

diff --git a/configure b/configure
index ced6d2e961..12d65397b1 100755
--- a/configure
+++ b/configure
@@ -562,6 +562,19 @@ EOF
   compile_object
 }
 
+check_ios() {
+  cat > $TMPC < $TMPC <
@@ -604,7 +617,11 @@ elif check_define __DragonFly__ ; then
 elif check_define __NetBSD__; then
   targetos='NetBSD'
 elif check_define __APPLE__; then
-  targetos='Darwin'
+  if check_ios ; then
+targetos='iOS'
+  else
+targetos='Darwin'
+  fi
 else
   # This is a fatal error, but don't report it yet, because we
   # might be going to just print the --help text, or it might
@@ -781,6 +798,22 @@ Darwin)
   # won't work when we're compiling with gcc as a C compiler.
   QEMU_CFLAGS="-DOS_OBJECT_USE_OBJC=0 $QEMU_CFLAGS"
 ;;
+iOS)
+  bsd="yes"
+  darwin="yes"
+  ios="yes"
+  if [ "$cpu" = "x86_64" ] ; then
+QEMU_CFLAGS="-arch x86_64 $QEMU_CFLAGS"
+QEMU_LDFLAGS="-arch x86_64 $QEMU_LDFLAGS"
+  fi
+  host_block_device_support="no"
+  audio_drv_list=""
+  audio_possible_drivers=""
+  QEMU_LDFLAGS="-framework CoreFoundation $QEMU_LDFLAGS"
+  # Disable attempts to use ObjectiveC features in os/object.h since they
+  # won't work when we're compiling with gcc as a C compiler.
+  QEMU_CFLAGS="-DOS_OBJECT_USE_OBJC=0 $QEMU_CFLAGS"
+;;
 SunOS)
   solaris="yes"
   make="${MAKE-gmake}"
@@ -6033,6 +6066,10 @@ if test "$darwin" = "yes" ; then
   echo "CONFIG_DARWIN=y" >> $config_host_mak
 fi
 
+if test "$ios" = "yes" ; then
+  echo "CONFIG_IOS=y" >> $config_host_mak
+fi
+
 if test "$solaris" = "yes" ; then
   echo "CONFIG_SOLARIS=y" >> $config_host_mak
 fi
@@ -7025,6 +7062,7 @@ echo "cpp_link_args = [${LDFLAGS:+$(meson_quote 
$LDFLAGS)}]" >> $cross
 echo "[binaries]" >> $cross
 echo "c = [$(meson_quote $cc)]" >> $cross
 test -n "$cxx" && echo "cpp = [$(meson_quote $cxx)]" >> $cross
+test -n "$objcc" && echo "objc = [$(meson_quote $objcc)]" >> $cross
 echo "ar = [$(meson_quote $ar)]" >> $cross
 echo "nm = [$(meson_quote $nm)]" >> $cross
 echo "pkgconfig = [$(meson_quote $pkg_config_exe)]" >> $cross
@@ -7043,6 +7081,9 @@ if test "$cross_compile" = "yes"; then
 if test "$linux" = "yes" ; then
 echo "system = 'linux'" >> $cross
 fi
+if test "$darwin" = "yes" ; then
+echo "system = 'darwin'" >> $cross
+fi
 case "$ARCH" in
 i386|x86_64)
 echo "cpu_family = 'x86'" >> $cross
diff --git a/meson.build b/meson.build
index 75967914dc..ea7c10ba08 100644
--- a/meson.build
+++ b/meson.build
@@ -142,7 +142,7 @@ if targetos == 'windows'
   include_directories: 
include_directories('.'))
 elif targetos == 'darwin'
   coref = dependency('appleframeworks', modules: 'CoreFoundation')
-  iokit = dependency('appleframeworks', modules: 'IOKit')
+  iokit = dependency('appleframeworks', modules: 'IOKit', required: 
'CONFIG_IOS' not in config_host)
   cocoa = dependency('appleframeworks', modules: 'Cocoa', required: 
get_option('cocoa'))
 elif targetos == 'sunos'
   socket = [cc.find_library('socket'),
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 663dd0b95e..a2b22b4305 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -148,9 +148,19 @@ typedef enum {
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP 1
 
+#if defined(__APPLE__)
+void sys_icache_invalidate(void *start, size_t len);
+#endif
+
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
+#if defined(__APPLE__)
+sys_icache_invalidate((char *)start, stop - start);
+#elif defined(__GNUC__)
 __builtin___clear_cache((char *)start, (char *)stop);
+#else
+#error "Missing builtin to flush instruction cache"
+#endif
 }
 
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
diff --git a/block.c b/block.c
index 430edf79bb..5d49869d02 100644
--- a/block.c
+++ b/block.c
@@ -53,7 +53,7 @@
 #ifdef CONFIG_BSD
 #include 
 #include 
-#ifndef __DragonFly__
+#if !defined(__DragonFly__) && !defined(CONFIG_IOS)
 #include 
 #endif
 #endif
diff --git a/block/file-posix.c

[Bug 1253563] Re: bad performance with rng-egd backend

2020-10-18 Thread Launchpad Bug Tracker

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1253563

Title:
  bad performance with rng-egd backend

Status in QEMU:
  Expired

Bug description:
  
  1. create listen socket
  # cat /dev/random | nc -l localhost 1024

  2. start vm with rng-egd backend

  ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -mon 
chardev=qmp,mode=control,pretty=on -chardev 
socket,id=qmp,host=localhost,port=1234,server,nowait -m 2000 -device 
virtio-net-pci,netdev=h1,id=vnet0 -netdev tap,id=h1 -vnc :0 -drive 
file=/images/RHEL-64-virtio.qcow2 \
  -chardev socket,host=localhost,port=1024,id=chr0 \
  -object rng-egd,chardev=chr0,id=rng0 \
  -device virtio-rng-pci,rng=rng0,max-bytes=1024000,period=1000

  (guest) # dd if=/dev/hwrng of=/dev/null

  note: cancelling the dd process with Ctrl+C makes it report the read speed.

  Problem:   the speed is around 1k/s

  ===

  If I use the rng-random backend (filename=/dev/random), the speed is about
  350k/s.

  It seems that when the request entry is added to the list, we don't read the
  data from the queue list immediately. The chr_read() is delayed, so the
  virtio_notify() is delayed, and the next request will also be delayed. This
  affects the speed.

  I tried changing rng_egd_chr_can_read() to always return 1; the speed
  improved to about 400k/s.

  Problem: we can't poll the content in time currently

  
  Any thoughts?

  Thanks, Amos

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1253563/+subscriptions

Re: [PATCH v1] migration: using trace_ to replace DPRINTF

2020-10-18 Thread Bihong Yu

Thank you for your review. OK, I will adapt them.

On 2020/10/17 17:57, Philippe Mathieu-Daudé wrote:
> On 10/17/20 11:35 AM, Bihong Yu wrote:
>> Signed-off-by: Bihong Yu 
>> ---
>>   migration/block.c  | 36 ++--
>>   migration/page_cache.c | 13 +++--
>>   migration/trace-events | 13 +
>>   3 files changed, 34 insertions(+), 28 deletions(-)
> ...
>> diff --git a/migration/trace-events b/migration/trace-events
>> index 338f38b..772bb81 100644
>> --- a/migration/trace-events
>> +++ b/migration/trace-events
>> @@ -325,3 +325,16 @@ get_ramblock_vfn_hash(const char *idstr, uint64_t vfn, 
>> uint32_t crc) "ramblock n
>>   calc_page_dirty_rate(const char *idstr, uint32_t new_crc, uint32_t 
>> old_crc) "ramblock name: %s, new crc: %" PRIu32 ", old crc: %" PRIu32
>>   skip_sample_ramblock(const char *idstr, uint64_t ramblock_size) "ramblock 
>> name: %s, ramblock size: %" PRIu64
>>   find_page_matched(const char *idstr) "ramblock %s addr or size changed"
>> +
>> +# block.c
>> +init_blk_migration_shared(const char *blk_device_name) "Start migration for 
>> %s with shared base image"
>> +init_blk_migration_full(const char *blk_device_name) "Start full migration 
>> for %s"
>> +mig_save_device_dirty(int64_t sector) "Error reading sector %" PRId64
>> +flush_blks(const char *action, int submitted, int read_done, int 
>> transferred) "%s submitted %d read_done %d transferred %d"
>> +block_save(const char *mig_stage, int submitted, int transferred) "Enter 
>> save live %s submitted %d transferred %d"
>> +block_save_complete(void) "Block migration completed"
>> +block_save_pending(uint64_t pending) "Enter save live pending  %" PRIu64
>> +
>> +# page_cache.c
>> +cache_init(int64_t max_num_items) "Setting cache buckets to %" PRId64
>> +cache_insert(void) "Error allocating page"
> 
> The patch is good, but I strongly recommend to have trace events
> starting with the subsystem prefix (here migration). So we can
> keep using the 'block*' rule to match all events from the block
> subsystem, without including the migration events.
> 
> Thanks,
> 
> Phil.
> 
> .

[PATCH v7 10/11] hw/block/nvme: Separate read and write handlers

2020-10-18 Thread Dmitry Fomichev

With ZNS support in place, the majority of code in nvme_rw() has
become read- or write-specific. Move these parts to two separate
handlers, nvme_read() and nvme_write() to make the code more
readable and to remove multiple is_write checks that so far existed
in the i/o path.

This is a refactoring patch, no change in functionality.

Signed-off-by: Dmitry Fomichev 
---
 hw/block/nvme.c   | 191 +-
 hw/block/trace-events |   3 +-
 2 files changed, 114 insertions(+), 80 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 3b9ea326d7..5ec4ce5e28 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1162,10 +1162,10 @@ typedef struct NvmeReadFillCtx {
 uint32_t  post_rd_fill_nlb;
 } NvmeReadFillCtx;
 
-static uint16_t nvme_check_zone_read(NvmeNamespace *ns, NvmeZone *zone,
- uint64_t slba, uint32_t nlb,
- NvmeReadFillCtx *rfc)
+static uint16_t nvme_check_zone_read(NvmeNamespace *ns, uint64_t slba,
+ uint32_t nlb, NvmeReadFillCtx *rfc)
 {
+NvmeZone *zone = nvme_get_zone_by_slba(ns, slba);
 NvmeZone *next_zone;
 uint64_t bndry = nvme_zone_rd_boundary(ns, zone);
 uint64_t end = slba + nlb, wp1, wp2;
@@ -1449,6 +1449,86 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
+static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+NvmeNamespace *ns = req->ns;
+uint64_t slba = le64_to_cpu(rw->slba);
+uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
+uint32_t fill_len;
+uint64_t data_size = nvme_l2b(ns, nlb);
+uint64_t data_offset, fill_ofs;
+NvmeReadFillCtx rfc;
+BlockBackend *blk = ns->blkconf.blk;
+uint16_t status;
+
+trace_pci_nvme_read(nvme_cid(req), nvme_nsid(ns), nlb, data_size, slba);
+
+status = nvme_check_mdts(n, data_size);
+if (status) {
+trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
+goto invalid;
+}
+
+status = nvme_check_bounds(n, ns, slba, nlb);
+if (status) {
+trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
+goto invalid;
+}
+
+if (ns->params.zoned) {
+status = nvme_check_zone_read(ns, slba, nlb, &rfc);
+if (status != NVME_SUCCESS) {
+trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status);
+goto invalid;
+}
+}
+
+status = nvme_map_dptr(n, data_size, req);
+if (status) {
+goto invalid;
+}
+
+if (ns->params.zoned) {
+if (rfc.pre_rd_fill_nlb) {
+fill_ofs = nvme_l2b(ns, rfc.pre_rd_fill_slba - slba);
+fill_len = nvme_l2b(ns, rfc.pre_rd_fill_nlb);
+nvme_fill_read_data(req, fill_ofs, fill_len,
+n->params.fill_pattern);
+}
+if (!rfc.read_nlb) {
+/* No backend I/O necessary, only needed to fill the buffer */
+req->status = NVME_SUCCESS;
+return NVME_SUCCESS;
+}
+if (rfc.post_rd_fill_nlb) {
+req->fill_ofs = nvme_l2b(ns, rfc.post_rd_fill_slba - slba);
+req->fill_len = nvme_l2b(ns, rfc.post_rd_fill_nlb);
+} else {
+req->fill_len = 0;
+}
+slba = rfc.read_slba;
+data_size = nvme_l2b(ns, rfc.read_nlb);
+}
+
+data_offset = nvme_l2b(ns, slba);
+
+block_acct_start(blk_get_stats(blk), &req->acct, data_size,
+ BLOCK_ACCT_READ);
+if (req->qsg.sg) {
+req->aiocb = dma_blk_read(blk, &req->qsg, data_offset,
+  BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+} else {
+req->aiocb = blk_aio_preadv(blk, data_offset, &req->iov, 0,
+nvme_rw_cb, req);
+}
+return NVME_NO_COMPLETE;
+
+invalid:
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ);
+return status | NVME_DNR;
+}
+
 static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
@@ -1495,25 +1575,20 @@ invalid:
 return status | NVME_DNR;
 }
 
-static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req, bool append)
+static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append)
 {
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
 NvmeNamespace *ns = req->ns;
-uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
 uint64_t slba = le64_to_cpu(rw->slba);
+uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
 uint64_t data_size = nvme_l2b(ns, nlb);
-uint64_t data_offset, fill_ofs;
-
+uint64_t data_offset;
 NvmeZone *zone;
-uint32_t fill_len;
-NvmeReadFillCtx rfc;
-bool is_write = rw->opcode == NVME_CMD_WRITE || append;
-enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
 BlockBackend *blk = ns->blkconf.blk;
 uint16_t status;
 
-trace_pci_nvme_rw(nvme_cid(req),

[PATCH v7 11/11] hw/block/nvme: Merge nvme_write_zeroes() with nvme_write()

2020-10-18 Thread Dmitry Fomichev

nvme_write() now handles WRITE, WRITE ZEROES and ZONE_APPEND.

Signed-off-by: Dmitry Fomichev 
---
 hw/block/nvme.c   | 95 +--
 hw/block/trace-events |  1 -
 2 files changed, 28 insertions(+), 68 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5ec4ce5e28..aa929d1edf 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1529,53 +1529,7 @@ invalid:
 return status | NVME_DNR;
 }
 
-static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
-{
-NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
-NvmeNamespace *ns = req->ns;
-uint64_t slba = le64_to_cpu(rw->slba);
-uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
-NvmeZone *zone;
-uint64_t offset = nvme_l2b(ns, slba);
-uint32_t count = nvme_l2b(ns, nlb);
-BlockBackend *blk = ns->blkconf.blk;
-uint16_t status;
-
-trace_pci_nvme_write_zeroes(nvme_cid(req), nvme_nsid(ns), slba, nlb);
-
-status = nvme_check_bounds(n, ns, slba, nlb);
-if (status) {
-trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
-return status;
-}
-
-if (ns->params.zoned) {
-zone = nvme_get_zone_by_slba(ns, slba);
-
-status = nvme_check_zone_write(n, ns, zone, slba, nlb, false);
-if (status != NVME_SUCCESS) {
-goto invalid;
-}
-
-status = nvme_auto_open_zone(ns, zone);
-if (status != NVME_SUCCESS) {
-goto invalid;
-}
-
-req->cqe.result64 = nvme_advance_zone_wp(ns, zone, nlb);
-}
-
-block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE);
-req->aiocb = blk_aio_pwrite_zeroes(blk, offset, count,
-   BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req);
-return NVME_NO_COMPLETE;
-
-invalid:
-block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE);
-return status | NVME_DNR;
-}
-
-static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append)
+static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append, bool 
wrz)
 {
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
 NvmeNamespace *ns = req->ns;
@@ -1590,10 +1544,12 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest 
*req, bool append)
 trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode),
  nvme_nsid(ns), nlb, data_size, slba);
 
-status = nvme_check_mdts(n, data_size);
-if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
-goto invalid;
+if (!wrz) {
+status = nvme_check_mdts(n, data_size);
+if (status) {
+trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
+goto invalid;
+}
 }
 
 status = nvme_check_bounds(n, ns, slba, nlb);
@@ -1628,21 +1584,26 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest 
*req, bool append)
 
 data_offset = nvme_l2b(ns, slba);
 
-status = nvme_map_dptr(n, data_size, req);
-if (status) {
-goto invalid;
-}
+if (!wrz) {
+status = nvme_map_dptr(n, data_size, req);
+if (status) {
+goto invalid;
+}
 
-data_offset = nvme_l2b(ns, slba);
-
-block_acct_start(blk_get_stats(blk), &req->acct, data_size,
- BLOCK_ACCT_WRITE);
-if (req->qsg.sg) {
-req->aiocb = dma_blk_write(blk, &req->qsg, data_offset,
-   BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+block_acct_start(blk_get_stats(blk), &req->acct, data_size,
+ BLOCK_ACCT_WRITE);
+if (req->qsg.sg) {
+req->aiocb = dma_blk_write(blk, &req->qsg, data_offset,
+   BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+} else {
+req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0,
+ nvme_rw_cb, req);
+}
 } else {
-req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0,
- nvme_rw_cb, req);
+block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE);
+req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size,
+   BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
+   req);
 }
 return NVME_NO_COMPLETE;
 
@@ -2126,11 +2087,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 case NVME_CMD_FLUSH:
 return nvme_flush(n, req);
 case NVME_CMD_WRITE_ZEROES:
-return nvme_write_zeroes(n, req);
+return nvme_write(n, req, false, true);
 case NVME_CMD_ZONE_APPEND:
-return nvme_write(n, req, true);
+return nvme_write(n, req, true, false);
 case NVME_CMD_WRITE:
-return nvme_write(n, req, false);
+return nvme_write(n, req, false, false);
 case NVME_CMD_READ:
 return nvme_read(n, req);
 case NVME_CMD_ZONE_MGMT_SEND:
diff --git a/hw/block/trace-events b/hw/block/trace-events
index

[PATCH v7 05/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-10-18 Thread Dmitry Fomichev

The emulation code has been changed to advertise NVM Command Set when
"zoned" device property is not set (default) and Zoned Namespace
Command Set otherwise.

Define values and structures that are needed to support Zoned
Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
Define trace events where needed in newly introduced code.

In order to improve scalability, all open, closed and full zones
are organized in separate linked lists. Consequently, almost all
zone operations don't require scanning of the entire zone array
(which potentially can be quite large) - it is only necessary to
enumerate one or more zone lists.

Handlers for three new NVMe commands introduced in Zoned Namespace
Command Set specification are added, namely for Zone Management
Receive, Zone Management Send and Zone Append.

Device initialization code has been extended to create a proper
configuration for zoned operation using device properties.

Read/Write command handler is modified to only allow writes at the
write pointer if the namespace is zoned. For Zone Append command,
writes implicitly happen at the write pointer and the starting write
pointer value is returned as the result of the command. Write Zeroes
handler is modified to add zoned checks that are identical to those
done as a part of Write flow.
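
The write-pointer rule in the paragraph above can be sketched roughly as
follows. This is a simplified model under stated assumptions, not the
emulator's code: `ExampleZone`, `example_zone_write` and the `EX_*` status
values are hypothetical, zone state transitions are ignored, and the sketch
assumes that a Zone Append command addresses the zone by its start LBA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified zone descriptor. */
typedef struct {
    uint64_t zslba;     /* first LBA of the zone */
    uint64_t capacity;  /* number of writable blocks in the zone */
    uint64_t wp;        /* current write pointer (absolute LBA) */
} ExampleZone;

enum { EX_OK = 0, EX_INVALID_WRITE = 1, EX_BOUNDARY = 2 };

/*
 * Regular write: only accepted when it starts exactly at the write pointer.
 * Append: the data implicitly lands at the write pointer.
 * On success, *result receives the LBA where the data was placed and the
 * write pointer advances by nlb blocks.
 */
static int example_zone_write(ExampleZone *z, uint64_t slba, uint32_t nlb,
                              bool append, uint64_t *result)
{
    if (append) {
        if (slba != z->zslba) {
            return EX_INVALID_WRITE;
        }
        slba = z->wp;
    } else if (slba != z->wp) {
        return EX_INVALID_WRITE;
    }
    if (slba + nlb > z->zslba + z->capacity) {
        return EX_BOUNDARY;
    }
    *result = slba;
    z->wp += nlb;
    return EX_OK;
}
```

The value returned in `*result` for an append corresponds to the "starting
write pointer value" that the command reports back to the host.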

Subsequent commits in this series add ZDE support and checks for
active and open zone limits.

Signed-off-by: Niklas Cassel 
Signed-off-by: Hans Holmberg 
Signed-off-by: Ajay Joshi 
Signed-off-by: Chaitanya Kulkarni 
Signed-off-by: Matias Bjorling 
Signed-off-by: Aravind Ramesh 
Signed-off-by: Shin'ichiro Kawasaki 
Signed-off-by: Adam Manzanares 
Signed-off-by: Dmitry Fomichev 
---
 block/nvme.c  |   2 +-
 hw/block/nvme-ns.c| 193 +
 hw/block/nvme-ns.h|  54 +++
 hw/block/nvme.c   | 975 --
 hw/block/nvme.h   |   9 +
 hw/block/trace-events |  21 +
 include/block/nvme.h  | 113 -
 7 files changed, 1339 insertions(+), 28 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 05485fdd11..7a513c9a17 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
 {
 uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
 if (status) {
-trace_nvme_error(le32_to_cpu(c->result),
+trace_nvme_error(le32_to_cpu(c->result32),
  le16_to_cpu(c->sq_head),
  le16_to_cpu(c->sq_id),
  le16_to_cpu(c->cid),
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 974aea33f7..fedfad595c 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -25,6 +25,7 @@
 #include "hw/qdev-properties.h"
 #include "hw/qdev-core.h"
 
+#include "trace.h"
 #include "nvme.h"
 #include "nvme-ns.h"
 
@@ -76,6 +77,171 @@ static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, 
Error **errp)
 return 0;
 }
 
+static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp)
+{
+uint64_t zone_size, zone_cap;
+uint32_t nz, lbasz = ns->blkconf.logical_block_size;
+
+if (ns->params.zone_size_bs) {
+zone_size = ns->params.zone_size_bs;
+} else {
+zone_size = NVME_DEFAULT_ZONE_SIZE;
+}
+if (ns->params.zone_cap_bs) {
+zone_cap = ns->params.zone_cap_bs;
+} else {
+zone_cap = zone_size;
+}
+if (zone_cap > zone_size) {
+error_setg(errp, "zone capacity %luB exceeds zone size %luB",
+   zone_cap, zone_size);
+return -1;
+}
+if (zone_size < lbasz) {
+error_setg(errp, "zone size %luB too small, must be at least %uB",
+   zone_size, lbasz);
+return -1;
+}
+if (zone_cap < lbasz) {
+error_setg(errp, "zone capacity %luB too small, must be at least %uB",
+   zone_cap, lbasz);
+return -1;
+}
+ns->zone_size = zone_size / lbasz;
+ns->zone_capacity = zone_cap / lbasz;
+
+nz = DIV_ROUND_UP(ns->size / lbasz, ns->zone_size);
+ns->num_zones = nz;
+ns->zone_array_size = sizeof(NvmeZone) * nz;
+ns->zone_size_log2 = 0;
+if (is_power_of_2(ns->zone_size)) {
+ns->zone_size_log2 = 63 - clz64(ns->zone_size);
+}
+
+return 0;
+}
+
+static void nvme_init_zone_state(NvmeNamespace *ns)
+{
+uint64_t start = 0, zone_size = ns->zone_size;
+uint64_t capacity = ns->num_zones * zone_size;
+NvmeZone *zone;
+int i;
+
+ns->zone_array = g_malloc0(ns->zone_array_size);
+
+QTAILQ_INIT(&ns->exp_open_zones);
+QTAILQ_INIT(&ns->imp_open_zones);
+QTAILQ_INIT(&ns->closed_zones);
+QTAILQ_INIT(&ns->full_zones);
+
+zone = ns->zone_array;
+for (i = 0; i < ns->num_zones; i++, zone++) {
+if (start + zone_size > capacity) {
+zone_size = capacity - start;
+}
+zone->d.zt = NVME_ZONE_TYPE_SEQ_WRITE;
+nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY);
+zone->d.za = 0;
+

[PATCH v7 06/11] hw/block/nvme: Introduce max active and open zone limits

2020-10-18 Thread Dmitry Fomichev

Add two module properties, "max_active" and "max_open" to control
the maximum number of zones that can be active or open. Once these
variables are set to non-default values, these limits are checked
during I/O and Too Many Active or Too Many Open command status is
returned if they are exceeded.
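
As a rough sketch of the limit check described above (not the code in this
patch), the logic could look like the following. The names
`ExampleZoneLimits`, `example_check_limits` and `example_encode_zero_based`
are hypothetical. The sketch assumes that a limit of 0 means "no limit", as
the property defaults suggest, and that the limits are reported to the host
as 0's based values in which 0xffffffff means "no limit".

```c
#include <assert.h>
#include <stdint.h>

enum { EX_LIMIT_OK = 0, EX_TOO_MANY_ACTIVE = 1, EX_TOO_MANY_OPEN = 2 };

/* Hypothetical per-namespace counters and configured limits. */
typedef struct {
    uint32_t max_active;  /* 0 = no limit */
    uint32_t max_open;    /* 0 = no limit */
    uint32_t nr_active;   /* zones currently active */
    uint32_t nr_open;     /* zones currently open */
} ExampleZoneLimits;

/* Would activating `act` and opening `opn` more zones exceed a limit? */
static int example_check_limits(const ExampleZoneLimits *l,
                                uint32_t act, uint32_t opn)
{
    if (l->max_active != 0 && l->nr_active + act > l->max_active) {
        return EX_TOO_MANY_ACTIVE;
    }
    if (l->max_open != 0 && l->nr_open + opn > l->max_open) {
        return EX_TOO_MANY_OPEN;
    }
    return EX_LIMIT_OK;
}

/* 0's based encoding: a limit of 0 wraps to 0xffffffff ("no limit"). */
static uint32_t example_encode_zero_based(uint32_t max)
{
    return max - 1u;
}
```

The unsigned wraparound in `example_encode_zero_based` is what lets a single
expression of the form `max - 1` cover both the limited and the unlimited
case.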

Signed-off-by: Hans Holmberg 
Signed-off-by: Dmitry Fomichev 
---
 hw/block/nvme-ns.c | 28 -
 hw/block/nvme-ns.h | 41 +++
 hw/block/nvme.c| 99 ++
 3 files changed, 166 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index fedfad595c..8d9e11eef2 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -118,6 +118,20 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, 
Error **errp)
 ns->zone_size_log2 = 63 - clz64(ns->zone_size);
 }
 
+/* Make sure that the values of all ZNS properties are sane */
+if (ns->params.max_open_zones > nz) {
+error_setg(errp,
+   "max_open_zones value %u exceeds the number of zones %u",
+   ns->params.max_open_zones, nz);
+return -1;
+}
+if (ns->params.max_active_zones > nz) {
+error_setg(errp,
+   "max_active_zones value %u exceeds the number of zones %u",
+   ns->params.max_active_zones, nz);
+return -1;
+}
+
 return 0;
 }
 
@@ -172,8 +186,8 @@ static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace 
*ns, int lba_index,
 id_ns_z = g_malloc0(sizeof(NvmeIdNsZoned));
 
 /* MAR/MOR are zeroes-based, 0x means no limit */
-id_ns_z->mar = 0x;
-id_ns_z->mor = 0x;
+id_ns_z->mar = cpu_to_le32(ns->params.max_active_zones - 1);
+id_ns_z->mor = cpu_to_le32(ns->params.max_open_zones - 1);
 id_ns_z->zoc = 0;
 id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00;
 
@@ -199,6 +213,9 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns)
 uint32_t set_state;
 int i;
 
+ns->nr_active_zones = 0;
+ns->nr_open_zones = 0;
+
 zone = ns->zone_array;
 for (i = 0; i < ns->num_zones; i++, zone++) {
 switch (nvme_get_zone_state(zone)) {
@@ -209,6 +226,7 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns)
 QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry);
 break;
 case NVME_ZONE_STATE_CLOSED:
+nvme_aor_inc_active(ns);
 /* fall through */
 default:
 continue;
@@ -216,6 +234,9 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns)
 
 if (zone->d.wp == zone->d.zslba) {
 set_state = NVME_ZONE_STATE_EMPTY;
+} else if (ns->params.max_active_zones == 0 ||
+   ns->nr_active_zones < ns->params.max_active_zones) {
+set_state = NVME_ZONE_STATE_CLOSED;
 } else {
 set_state = NVME_ZONE_STATE_CLOSED;
 }
@@ -224,6 +245,7 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns)
 case NVME_ZONE_STATE_CLOSED:
 trace_pci_nvme_clear_ns_close(nvme_get_zone_state(zone),
   zone->d.zslba);
+nvme_aor_inc_active(ns);
 QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry);
 break;
 case NVME_ZONE_STATE_EMPTY:
@@ -326,6 +348,8 @@ static Property nvme_ns_props[] = {
 DEFINE_PROP_SIZE("zone_capacity", NvmeNamespace, params.zone_cap_bs, 0),
 DEFINE_PROP_BOOL("cross_zone_read", NvmeNamespace,
  params.cross_zone_read, false),
+DEFINE_PROP_UINT32("max_active", NvmeNamespace, params.max_active_zones, 
0),
+DEFINE_PROP_UINT32("max_open", NvmeNamespace, params.max_open_zones, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 170cbb8cdc..b0633d0def 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -34,6 +34,8 @@ typedef struct NvmeNamespaceParams {
 bool cross_zone_read;
 uint64_t zone_size_bs;
 uint64_t zone_cap_bs;
+uint32_t max_active_zones;
+uint32_t max_open_zones;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
@@ -56,6 +58,8 @@ typedef struct NvmeNamespace {
 uint64_t zone_capacity;
 uint64_t zone_array_size;
 uint32_t zone_size_log2;
+int32_t nr_open_zones;
+int32_t nr_active_zones;
 
 NvmeNamespaceParams params;
 } NvmeNamespace;
@@ -123,6 +127,43 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone)
st != NVME_ZONE_STATE_OFFLINE;
 }
 
+static inline void nvme_aor_inc_open(NvmeNamespace *ns)
+{
+assert(ns->nr_open_zones >= 0);
+if (ns->params.max_open_zones) {
+ns->nr_open_zones++;
+assert(ns->nr_open_zones <= ns->params.max_open_zones);
+}
+}
+
+static inline void nvme_aor_dec_open(NvmeNamespace *ns)
+{
+if (ns->params.max_open_zones) {
+assert(ns->nr_open_zones > 0);
+ns->nr_open_zones--;
+}
+

[PATCH v7 08/11] hw/block/nvme: Add injection of Offline/Read-Only zones

2020-10-18 Thread Dmitry Fomichev

ZNS specification defines two zone conditions for the zones that no
longer can function properly, possibly because of flash wear or other
internal fault. It is useful to be able to "inject" a small number of
such zones for testing purposes.

This commit defines two optional device properties, "offline_zones"
and "rdonly_zones". Users can assign non-zero values to these variables
to specify the number of zones to be initialized as Offline or
Read-Only. The actual number of injected zones may be smaller than the
requested amount - Read-Only and Offline counts are expected to be much
smaller than the total number of zones on a drive.
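
The selection step described above can be sketched in isolation. This is a minimal sketch, not the code from the patch: the `zone_cond` enum and `inject_zones()` are hypothetical names, `rand()` stands in for `qcrypto_random_bytes()`, and the request is clamped to the number of eligible zones so the loop cannot spin when too many zones are requested.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical zone conditions for this sketch (not the QEMU enums). */
enum zone_cond { ZONE_OK = 0, ZONE_OFFLINE, ZONE_READ_ONLY };

/*
 * Mark up to `count` distinct zones with `cond`, skipping the first
 * `reserved` zones and any zone that already has a condition.
 * Returns the number of zones actually marked.
 */
static uint32_t inject_zones(enum zone_cond *zones, uint32_t num_zones,
                             uint32_t reserved, uint32_t count,
                             enum zone_cond cond)
{
    uint32_t eligible = 0, marked = 0;

    for (uint32_t i = reserved; i < num_zones; i++) {
        if (zones[i] == ZONE_OK) {
            eligible++;
        }
    }
    if (count > eligible) {
        count = eligible;   /* the injected count may be smaller than asked */
    }

    while (marked < count) {
        /* rand() is an assumption; the patch draws from qcrypto. */
        uint32_t idx = reserved + (uint32_t)rand() % (num_zones - reserved);

        if (zones[idx] == ZONE_OK) {
            zones[idx] = cond;
            marked++;
        }
    }
    return marked;
}
```

Calling it once for Offline and once for Read-Only on the same array keeps the two sets disjoint, because the second pass only considers zones that are still unmarked.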

Signed-off-by: Dmitry Fomichev 
---
 hw/block/nvme-ns.c | 64 ++
 hw/block/nvme-ns.h |  2 ++
 2 files changed, 66 insertions(+)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 255ded2b43..d050f97909 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -21,6 +21,7 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
+#include "crypto/random.h"
 
 #include "hw/qdev-properties.h"
 #include "hw/qdev-core.h"
@@ -132,6 +133,32 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, 
Error **errp)
 return -1;
 }
 
+if (ns->params.zd_extension_size) {
+if (ns->params.zd_extension_size & 0x3f) {
+error_setg(errp,
+"zone descriptor extension size must be a multiple of 64B");
+return -1;
+}
+if ((ns->params.zd_extension_size >> 6) > 0xff) {
+error_setg(errp, "zone descriptor extension size is too large");
+return -1;
+}
+}
+
+if (ns->params.max_open_zones < nz) {
+if (ns->params.nr_offline_zones > nz - ns->params.max_open_zones) {
+error_setg(errp, "offline_zones value %u is too large",
+ns->params.nr_offline_zones);
+return -1;
+}
+if (ns->params.nr_rdonly_zones >
+nz - ns->params.max_open_zones - ns->params.nr_offline_zones) {
+error_setg(errp, "rdonly_zones value %u is too large",
+ns->params.nr_rdonly_zones);
+return -1;
+}
+}
+
 return 0;
 }
 
@@ -140,7 +167,9 @@ static void nvme_init_zone_state(NvmeNamespace *ns)
 uint64_t start = 0, zone_size = ns->zone_size;
 uint64_t capacity = ns->num_zones * zone_size;
 NvmeZone *zone;
+uint32_t rnd;
 int i;
+uint16_t zs;
 
 ns->zone_array = g_malloc0(ns->zone_array_size);
 if (ns->params.zd_extension_size) {
@@ -167,6 +196,37 @@ static void nvme_init_zone_state(NvmeNamespace *ns)
 zone->w_ptr = start;
 start += zone_size;
 }
+
+/* If required, make some zones Offline or Read Only */
+
+for (i = 0; i < ns->params.nr_offline_zones; i++) {
+do {
+qcrypto_random_bytes(&rnd, sizeof(rnd), NULL);
+rnd %= ns->num_zones;
+} while (rnd < ns->params.max_open_zones);
+zone = &ns->zone_array[rnd];
+zs = nvme_get_zone_state(zone);
+if (zs != NVME_ZONE_STATE_OFFLINE) {
+nvme_set_zone_state(zone, NVME_ZONE_STATE_OFFLINE);
+} else {
+i--;
+}
+}
+
+for (i = 0; i < ns->params.nr_rdonly_zones; i++) {
+do {
+qcrypto_random_bytes(&rnd, sizeof(rnd), NULL);
+rnd %= ns->num_zones;
+} while (rnd < ns->params.max_open_zones);
+zone = &ns->zone_array[rnd];
+zs = nvme_get_zone_state(zone);
+if (zs != NVME_ZONE_STATE_OFFLINE &&
+zs != NVME_ZONE_STATE_READ_ONLY) {
+nvme_set_zone_state(zone, NVME_ZONE_STATE_READ_ONLY);
+} else {
+i--;
+}
+}
 }
 
 static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index,
@@ -360,6 +420,10 @@ static Property nvme_ns_props[] = {
 DEFINE_PROP_UINT32("max_open", NvmeNamespace, params.max_open_zones, 0),
 DEFINE_PROP_UINT32("zone_descr_ext_size", NvmeNamespace,
params.zd_extension_size, 0),
+DEFINE_PROP_UINT32("offline_zones", NvmeNamespace,
+   params.nr_offline_zones, 0),
+DEFINE_PROP_UINT32("rdonly_zones", NvmeNamespace,
+   params.nr_rdonly_zones, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 2d70a13701..d65d8b0930 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -37,6 +37,8 @@ typedef struct NvmeNamespaceParams {
 uint32_t max_active_zones;
 uint32_t max_open_zones;
 uint32_t zd_extension_size;
+uint32_t nr_offline_zones;
+uint32_t nr_rdonly_zones;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
-- 
2.21.0

[PATCH v7 07/11] hw/block/nvme: Support Zone Descriptor Extensions

2020-10-18 Thread Dmitry Fomichev

Zone Descriptor Extension is a label that can be assigned to a zone.
It can be set to an Empty zone and it stays assigned until the zone
is reset.

This commit adds a new optional module property, "zone_descr_ext_size".
Its value must be a multiple of 64 bytes. If this value is non-zero,
it becomes possible to assign extensions of that size to any Empty
zones. The default value for this property is 0, therefore setting
extensions is disabled by default.
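
The size rules for this property can be illustrated with a small standalone sketch. It assumes the property is given in bytes and reported to the host in 64-byte units, as the validation and `zdes` assignment in this series suggest; `zd_ext_size_to_zdes()` and `zd_ext_offset()` are hypothetical helper names, not functions from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Convert an extension size in bytes to the 8-bit ZDES value (64B units).
 * Returns -1 if the size is not a multiple of 64 or does not fit in 8 bits.
 */
static int zd_ext_size_to_zdes(uint32_t size_bytes)
{
    if (size_bytes & 0x3f) {
        return -1;              /* not a multiple of 64 bytes */
    }
    if ((size_bytes >> 6) > 0xff) {
        return -1;              /* too large for the 8-bit field */
    }
    return (int)(size_bytes >> 6);
}

/* Byte offset of one zone's extension inside a single flat buffer. */
static size_t zd_ext_offset(uint32_t zone_idx, uint32_t size_bytes)
{
    return (size_t)zone_idx * size_bytes;
}
```

Keeping all extensions in one flat allocation indexed by zone number means a single allocation and a single free cover every zone.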

Signed-off-by: Hans Holmberg 
Signed-off-by: Dmitry Fomichev 
Reviewed-by: Klaus Jensen 
---
 hw/block/nvme-ns.c| 14 ++--
 hw/block/nvme-ns.h|  8 +++
 hw/block/nvme.c   | 51 +--
 hw/block/trace-events |  2 ++
 4 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 8d9e11eef2..255ded2b43 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -143,6 +143,10 @@ static void nvme_init_zone_state(NvmeNamespace *ns)
 int i;
 
 ns->zone_array = g_malloc0(ns->zone_array_size);
+if (ns->params.zd_extension_size) {
+ns->zd_extensions = g_malloc0(ns->params.zd_extension_size *
+  ns->num_zones);
+}
 
 QTAILQ_INIT(&ns->exp_open_zones);
 QTAILQ_INIT(&ns->imp_open_zones);
@@ -192,7 +196,8 @@ static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace 
*ns, int lba_index,
 id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00;
 
 id_ns_z->lbafe[lba_index].zsze = cpu_to_le64(ns->zone_size);
-id_ns_z->lbafe[lba_index].zdes = 0;
+id_ns_z->lbafe[lba_index].zdes =
+ns->params.zd_extension_size >> 6; /* Units of 64B */
 
 ns->csi = NVME_CSI_ZONED;
 ns->id_ns.nsze = cpu_to_le64(ns->zone_size * ns->num_zones);
@@ -232,7 +237,9 @@ static void nvme_zoned_clear_ns(NvmeNamespace *ns)
 continue;
 }
 
-if (zone->d.wp == zone->d.zslba) {
+if (zone->d.za & NVME_ZA_ZD_EXT_VALID) {
+set_state = NVME_ZONE_STATE_CLOSED;
+} else if (zone->d.wp == zone->d.zslba) {
 set_state = NVME_ZONE_STATE_EMPTY;
 } else if (ns->params.max_active_zones == 0 ||
ns->nr_active_zones < ns->params.max_active_zones) {
@@ -320,6 +327,7 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
 if (ns->params.zoned) {
 g_free(ns->id_ns_zoned);
 g_free(ns->zone_array);
+g_free(ns->zd_extensions);
 }
 }
 
@@ -350,6 +358,8 @@ static Property nvme_ns_props[] = {
  params.cross_zone_read, false),
 DEFINE_PROP_UINT32("max_active", NvmeNamespace, params.max_active_zones, 0),
 DEFINE_PROP_UINT32("max_open", NvmeNamespace, params.max_open_zones, 0),
+DEFINE_PROP_UINT32("zone_descr_ext_size", NvmeNamespace,
+   params.zd_extension_size, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index b0633d0def..2d70a13701 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -36,6 +36,7 @@ typedef struct NvmeNamespaceParams {
 uint64_t zone_cap_bs;
 uint32_t max_active_zones;
 uint32_t max_open_zones;
+uint32_t zd_extension_size;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
@@ -58,6 +59,7 @@ typedef struct NvmeNamespace {
 uint64_t zone_capacity;
 uint64_t zone_array_size;
 uint32_t zone_size_log2;
+uint8_t *zd_extensions;
 int32_t nr_open_zones;
 int32_t nr_active_zones;
 
@@ -127,6 +129,12 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone)
st != NVME_ZONE_STATE_OFFLINE;
 }
 
+static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns,
+ uint32_t zone_idx)
+{
+return &ns->zd_extensions[zone_idx * ns->params.zd_extension_size];
+}
+
 static inline void nvme_aor_inc_open(NvmeNamespace *ns)
 {
 assert(ns->nr_open_zones >= 0);
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index b3cdfccdfb..fbf27a5098 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1747,6 +1747,26 @@ static bool nvme_cond_offline_all(uint8_t state)
 return state == NVME_ZONE_STATE_READ_ONLY;
 }
 
+static uint16_t nvme_set_zd_ext(NvmeNamespace *ns, NvmeZone *zone,
+uint8_t state)
+{
+uint16_t status;
+
+if (state == NVME_ZONE_STATE_EMPTY) {
+nvme_auto_transition_zone(ns, false, true);
+status = nvme_aor_check(ns, 1, 0);
+if (status != NVME_SUCCESS) {
+return status;
+}
+nvme_aor_inc_active(ns);
+zone->d.za |= NVME_ZA_ZD_EXT_VALID;
+nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED);
+return NVME_SUCCESS;
+}
+
+return NVME_ZONE_INVAL_TRANSITION;
+}
+
 typedef uint16_t (*op_handler_t)(NvmeNamespace *, NvmeZone *,
  uint8_t);
 typedef bool (*need_to_proc_zone_t)(uint8_t);
@@ -1787,6 +1807,7 @@ static uint16_t

[PATCH v7 09/11] hw/block/nvme: Document zoned parameters in usage text

2020-10-18 Thread Dmitry Fomichev

Added brief descriptions of the new device properties that are
now available to users to configure features of Zoned Namespace
Command Set in the emulator.

This patch is for documentation only, no functionality change.
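
The usage text added by this patch notes that the zone append size limit is kept internally as a log2 value, so some settings are rounded down. The following is a minimal sketch of that rounding, assuming the limit is expressed in units of a memory page size; `zasl_from_limit()` and `zasl_effective_bytes()` are hypothetical helpers, and the page size is an input rather than a value taken from the patch.

```c
#include <stdint.h>

/*
 * floor(log2(limit_bytes / page_size)).
 * Returns 0 when the limit is below two pages or page_size is 0; per the
 * usage text, a property value of 0 means "use MDTS", which a caller
 * would have to handle separately.
 */
static uint8_t zasl_from_limit(uint64_t limit_bytes, uint64_t page_size)
{
    uint64_t pages;
    uint8_t zasl = 0;

    if (page_size == 0) {
        return 0;
    }
    pages = limit_bytes / page_size;
    while (pages > 1) {
        pages >>= 1;
        zasl++;
    }
    return zasl;
}

/* Size in bytes that a given log2 value actually permits. */
static uint64_t zasl_effective_bytes(uint8_t zasl, uint64_t page_size)
{
    return page_size << zasl;
}
```

With a 4096-byte page, a requested limit of 100000 bytes yields a log2 value of 4, so the effective limit drops to 65536 bytes.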

Signed-off-by: Dmitry Fomichev 
---
 hw/block/nvme.c | 41 +++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index fbf27a5098..3b9ea326d7 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -9,7 +9,7 @@
  */
 
 /**
- * Reference Specs: http://www.nvmexpress.org, 1.2, 1.1, 1.0e
+ * Reference Specs: http://www.nvmexpress.org, 1.4, 1.3, 1.2, 1.1, 1.0e
  *
  *  https://nvmexpress.org/developers/nvme-specification/
  */
@@ -23,7 +23,8 @@
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
  *  mdts=
- *  -device nvme-ns,drive=,bus=bus_name,nsid=
+ *  -device nvme-ns,drive=,bus=,nsid=, \
+ *  zoned=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
@@ -49,6 +50,42 @@
  *   completion when there are no outstanding AERs. When the maximum number of
  *   enqueued events are reached, subsequent events will be dropped.
  *
+ * Setting `zoned` to true selects Zoned Command Set at the namespace.
+ * In this case, the following options are available to configure zoned
+ * operation:
+ * zone_size=
+ * The number may be followed by K, M, G as in kilo-, mega- or giga.
+ *
+ * zone_capacity=
+ * The value 0 (default) forces zone capacity to be the same as zone
+ * size. The value of this property may not exceed zone size.
+ *
+ * zone_descr_ext_size=
+ * This value must be a multiple of 64 bytes. If it is zero,
+ * namespace(s) will not support zone descriptor extensions.
+ *
+ * max_active=
+ *
+ * max_open=
+ *
+ * zone_append_size_limit=
+ * The maximum I/O size that can be supported by Zone Append
+ * command. Since internally this value is maintained as
+ * ZASL = log2(<maximum append size> / <page size>), some
+ * values assigned to this property may be rounded down and
+ * result in a lower maximum ZA data size being in effect.
+ * By setting this property to 0, the user can make ZASL
+ * equal to MDTS.
+ *
+ * offline_zones=
+ *
+ * rdonly_zones=
+ *
+ * cross_zone_read=
+ *
+ * fill_pattern=
+ * The byte pattern to return for any portions of unwritten data
+ * during read.
  */
 
 #include "qemu/osdep.h"
-- 
2.21.0

[PATCH v7 03/11] hw/block/nvme: Add support for Namespace Types

2020-10-18 Thread Dmitry Fomichev

From: Niklas Cassel 

Define the structures and constants required to implement
Namespace Types support.

Namespace Types introduce a new command set, "I/O Command Sets",
that allows the host to retrieve the command sets associated with
a namespace. Introduce support for the command set and enable
detection for the NVM Command Set.

The new workflows for identify commands rely heavily on zero-filled
identify structs. E.g., certain CNS commands are defined to return
a zero-filled identify struct when an inactive namespace NSID
is supplied.

Add a helper function in order to avoid code duplication when
reporting zero-filled identify structures.
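
The zero-filled reply rule can be sketched as a small standalone function. This is an illustration only: `identify_ns_csi()`, the `id_status` enum and the constant names are hypothetical, and the sketch assumes a 4096-byte identify payload and a CSI value of 0 for the NVM Command Set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IDENTIFY_DATA_SIZE 4096  /* assumed identify payload size */
#define CSI_NVM            0x00  /* assumed CSI of the NVM Command Set */

enum id_status { ID_OK, ID_INVALID_FIELD };

/*
 * Command-set-specific Identify Namespace, reduced to its decision logic:
 * an inactive namespace and the NVM command set both get an all-zero
 * structure; any other CSI is rejected.
 */
static enum id_status identify_ns_csi(bool ns_active, uint8_t csi,
                                      uint8_t out[IDENTIFY_DATA_SIZE])
{
    if (!ns_active || csi == CSI_NVM) {
        memset(out, 0, IDENTIFY_DATA_SIZE);
        return ID_OK;
    }
    return ID_INVALID_FIELD;
}
```

Routing both cases through one zero-fill path is what makes a shared helper worthwhile: each identify variant only has to decide whether the reply is empty.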

Signed-off-by: Niklas Cassel 
Signed-off-by: Dmitry Fomichev 
---
 hw/block/nvme-ns.c|   2 +
 hw/block/nvme-ns.h|   1 +
 hw/block/nvme.c   | 169 +++---
 hw/block/trace-events |   7 ++
 include/block/nvme.h  |  65 
 5 files changed, 202 insertions(+), 42 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index de735eb9f3..c0362426cc 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -41,6 +41,8 @@ static void nvme_ns_init(NvmeNamespace *ns)
 
 id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
 
+ns->csi = NVME_CSI_NVM;
+
 /* no thin provisioning */
 id_ns->ncap = id_ns->nsze;
 id_ns->nuse = id_ns->ncap;
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index a38071884a..d795e44bab 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -31,6 +31,7 @@ typedef struct NvmeNamespace {
 int64_t  size;
 NvmeIdNs id_ns;
 const uint32_t *iocs;
+uint8_t  csi;
 
 NvmeNamespaceParams params;
 } NvmeNamespace;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 29139d8a17..ca0d0abf5c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1503,6 +1503,13 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_SUCCESS;
 }
 
+static uint16_t nvme_rpt_empty_id_struct(NvmeCtrl *n, NvmeRequest *req)
+{
+uint8_t id[NVME_IDENTIFY_DATA_SIZE] = {};
+
+return nvme_dma(n, id, sizeof(id), DMA_DIRECTION_FROM_DEVICE, req);
+}
+
 static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req)
 {
 trace_pci_nvme_identify_ctrl();
@@ -1511,11 +1518,23 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, 
NvmeRequest *req)
 DMA_DIRECTION_FROM_DEVICE, req);
 }
 
+static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
+
+trace_pci_nvme_identify_ctrl_csi(c->csi);
+
+if (c->csi == NVME_CSI_NVM) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
 static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
-NvmeIdNs *id_ns, inactive = { 0 };
 uint32_t nsid = le32_to_cpu(c->nsid);
 
 trace_pci_nvme_identify_ns(nsid);
@@ -1526,23 +1545,46 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
NvmeRequest *req)
 
 ns = nvme_ns(n, nsid);
 if (unlikely(!ns)) {
-id_ns = &inactive;
-} else {
-id_ns = &ns->id_ns;
+return nvme_rpt_empty_id_struct(n, req);
 }
 
-return nvme_dma(n, (uint8_t *)id_ns, sizeof(NvmeIdNs),
+return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs),
 DMA_DIRECTION_FROM_DEVICE, req);
 }
 
+static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeNamespace *ns;
+NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
+uint32_t nsid = le32_to_cpu(c->nsid);
+
+trace_pci_nvme_identify_ns_csi(nsid, c->csi);
+
+if (!nvme_nsid_valid(n, nsid) || nsid == NVME_NSID_BROADCAST) {
+return NVME_INVALID_NSID | NVME_DNR;
+}
+
+ns = nvme_ns(n, nsid);
+if (unlikely(!ns)) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+
+if (c->csi == NVME_CSI_NVM) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
 static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
 {
+NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
-static const int data_len = NVME_IDENTIFY_DATA_SIZE;
 uint32_t min_nsid = le32_to_cpu(c->nsid);
-uint32_t *list;
-uint16_t ret;
-int j = 0;
+uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {};
+static const int data_len = sizeof(list);
+uint32_t *list_ptr = (uint32_t *)list;
+int i, j = 0;
 
 trace_pci_nvme_identify_nslist(min_nsid);
 
@@ -1556,20 +1598,54 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_INVALID_NSID | NVME_DNR;
 }
 
-list = g_malloc0(data_len);
-for (int i = 1; i <= n->num_namespaces; i++) {
-if (i <= min_nsid || !nvme_ns(n, i)) {
+for (i = 1; i <= n->num_namespaces; i++) {
+ns = nvme_ns(n, i);
+if (!ns) {
 continue;
 }
-list[j++] =

[PATCH v7 04/11] hw/block/nvme: Support allocated CNS command variants

2020-10-18 Thread Dmitry Fomichev

From: Niklas Cassel 

Many CNS commands have "allocated" command variants. These include
a namespace as long as it is allocated; that is, a namespace is
included regardless of whether it is active (attached) or not.

While these commands are optional (they are mandatory for controllers
supporting the namespace attachment command), our QEMU implementation
is more complete by actually providing support for these CNS values.

However, since our QEMU model currently does not support the namespace
attachment command, these new allocated CNS commands will return the
same result as the active CNS command variants.

In NVMe, a namespace is active if it exists and is attached to the
controller.

CAP.CSS (together with the I/O Command Set data structure) defines
what command sets are supported by the controller.

CC.CSS (together with Set Profile) can be set to enable a subset of
the available command sets.

Even if a user configures CC.CSS to e.g. Admin only, NVM namespaces
will still be attached (and thus marked as active).
Similarly, if a user configures CC.CSS to e.g. NVM, ZNS namespaces
will still be attached (and thus marked as active).

However, any operation from a disabled command set will result in an
Invalid Command Opcode.

Add a new Boolean namespace property, "attached", to provide the most
basic namespace attachment support. The default value for this new
property is true. Also, implement the logic in the new CNS values to
include/exclude namespaces based on this new property. The only thing
missing is hooking up the actual Namespace Attachment command opcode,
which will allow a user to toggle the "attached" flag per namespace.

The reason for not hooking up this command completely is because the
NVMe specification requires the namespace management command to be
supported if the namespace attachment command is supported.
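
The difference between the "active" and "allocated" list variants comes down to one extra filter. Below is a minimal sketch under simplified assumptions: `struct ns_slot` and `build_ns_list()` are hypothetical and only model whether a namespace exists and whether it is attached.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified namespace record for this sketch. */
struct ns_slot {
    bool allocated;  /* namespace exists */
    bool attached;   /* namespace is attached, i.e. active */
};

/*
 * Fill `list` with NSIDs greater than `min_nsid`. With only_active set,
 * unattached namespaces are skipped; otherwise every allocated namespace
 * is reported. slots[i] describes NSID i + 1. Returns the entry count.
 */
static size_t build_ns_list(const struct ns_slot *slots, uint32_t num_ns,
                            uint32_t min_nsid, bool only_active,
                            uint32_t *list, size_t max_entries)
{
    size_t n = 0;

    for (uint32_t nsid = 1; nsid <= num_ns && n < max_entries; nsid++) {
        const struct ns_slot *s = &slots[nsid - 1];

        if (nsid <= min_nsid || !s->allocated) {
            continue;
        }
        if (only_active && !s->attached) {
            continue;
        }
        list[n++] = nsid;
    }
    return n;
}
```

When every namespace is attached, which is the default for the new property, both variants return the same list.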

Signed-off-by: Niklas Cassel 
Signed-off-by: Dmitry Fomichev 
---
 hw/block/nvme-ns.c   |  1 +
 hw/block/nvme-ns.h   |  1 +
 hw/block/nvme.c  | 68 
 include/block/nvme.h | 20 +++--
 4 files changed, 70 insertions(+), 20 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index c0362426cc..974aea33f7 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -132,6 +132,7 @@ static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
+DEFINE_PROP_BOOL("attached", NvmeNamespace, params.attached, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index d795e44bab..d6b2808b97 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -21,6 +21,7 @@
 
 typedef struct NvmeNamespaceParams {
 uint32_t nsid;
+bool attached;
 QemuUUID uuid;
 } NvmeNamespaceParams;
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ca0d0abf5c..93728e51b3 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1062,6 +1062,9 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 if (unlikely(!req->ns)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
+if (!req->ns->params.attached) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
 
 if (!(req->ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
 trace_pci_nvme_err_invalid_opc(req->cmd.opcode);
@@ -1222,6 +1225,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, 
uint32_t buf_len,
 uint32_t trans_len;
 NvmeNamespace *ns;
 time_t current_ms;
+int i;
 
 if (off >= sizeof(smart)) {
 return NVME_INVALID_FIELD | NVME_DNR;
@@ -1232,15 +1236,18 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t 
rae, uint32_t buf_len,
 if (!ns) {
 return NVME_INVALID_NSID | NVME_DNR;
 }
-nvme_set_blk_stats(ns, &stats);
+if (ns->params.attached) {
+nvme_set_blk_stats(ns, &stats);
+}
 } else {
-int i;
-
 for (i = 1; i <= n->num_namespaces; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
 continue;
 }
+if (!ns->params.attached) {
+continue;
+}
 nvme_set_blk_stats(ns, &stats);
 }
 }
@@ -1531,7 +1538,8 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req,
+ bool only_active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
@@ -1548,11 +1556,16 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
NvmeRequest *req)
 return nvme_rpt_empty_id_struct(n, req);
 }
 
+if (only_active && !ns->params.attached) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+
 return nvme_dma(n, (uint8_t

[PATCH v7 01/11] hw/block/nvme: Add Commands Supported and Effects log

2020-10-18 Thread Dmitry Fomichev

Implementing this log page is necessary to allow checking for
Zone Append command support in the Zoned Namespace Command Set.

This commit adds the code to report this log page for NVM Command
Set only. The parts that are specific to zoned operation will be
added later in the series.

All incoming admin and i/o commands are now only processed if their
corresponding support bits are set in this log. This provides an
easy way to control which commands are supported, depending on the
current CC.CSS setting.
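
The table-driven check described here can be sketched on its own. The names below are hypothetical stand-ins for the QEMU definitions; the bit positions (CSUPP in bit 0, LBCC in bit 1) and the opcode values are those of the NVMe base specification as far as this sketch assumes.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_EFF_CSUPP (1u << 0)  /* command supported */
#define CMD_EFF_LBCC  (1u << 1)  /* logical block content change */

/* NVM command set opcodes used in this sketch. */
enum {
    OPC_FLUSH        = 0x00,
    OPC_WRITE        = 0x01,
    OPC_READ         = 0x02,
    OPC_WRITE_ZEROES = 0x08,
};

/* One 32-bit effects entry per opcode; unset entries stay 0. */
static const uint32_t iocs_none[256] = { 0 };

static const uint32_t iocs_nvm[256] = {
    [OPC_FLUSH]        = CMD_EFF_CSUPP | CMD_EFF_LBCC,
    [OPC_WRITE_ZEROES] = CMD_EFF_CSUPP | CMD_EFF_LBCC,
    [OPC_WRITE]        = CMD_EFF_CSUPP | CMD_EFF_LBCC,
    [OPC_READ]         = CMD_EFF_CSUPP,
};

/* A command is dispatched only if its support bit is set in the table. */
static bool opcode_supported(const uint32_t *iocs, uint8_t opcode)
{
    return (iocs[opcode] & CMD_EFF_CSUPP) != 0;
}
```

Because the same table both answers the log page request and gates dispatch, the reported set of commands cannot drift from the set that is actually handled. Pointing a namespace at `iocs_none` disables all of its I/O commands without touching the dispatcher.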

Signed-off-by: Dmitry Fomichev 
---
 hw/block/nvme-ns.h|  1 +
 hw/block/nvme.c   | 98 +++
 hw/block/trace-events |  2 +
 include/block/nvme.h  | 19 +
 4 files changed, 111 insertions(+), 9 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 83734f4606..ea8c2f785d 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -29,6 +29,7 @@ typedef struct NvmeNamespace {
 int32_t  bootindex;
 int64_t  size;
 NvmeIdNs id_ns;
+const uint32_t *iocs;
 
 NvmeNamespaceParams params;
 } NvmeNamespace;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 9d30ca69dc..5a9493d89f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -111,6 +111,28 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
 [NVME_TIMESTAMP]= NVME_FEAT_CAP_CHANGE,
 };
 
+static const uint32_t nvme_cse_acs[256] = {
+[NVME_ADM_CMD_DELETE_SQ]= NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_CREATE_SQ]= NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_DELETE_CQ]= NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_CREATE_CQ]= NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_IDENTIFY] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_GET_LOG_PAGE] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
+};
+
+static const uint32_t nvme_cse_iocs_none[256] = {
+};
+
+static const uint32_t nvme_cse_iocs_nvm[256] = {
+[NVME_CMD_FLUSH]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_WRITE]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_READ] = NVME_CMD_EFF_CSUPP,
+};
+
 static void nvme_process_sq(void *opaque);
 
 static uint16_t nvme_cid(NvmeRequest *req)
@@ -1032,10 +1054,6 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_io_cmd(nvme_cid(req), nsid, nvme_sqid(req),
   req->cmd.opcode, nvme_io_opc_str(req->cmd.opcode));
 
-if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ADMIN_ONLY) {
-return NVME_INVALID_OPCODE | NVME_DNR;
-}
-
 if (!nvme_nsid_valid(n, nsid)) {
 return NVME_INVALID_NSID | NVME_DNR;
 }
@@ -1045,6 +1063,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
+if (!(req->ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
+trace_pci_nvme_err_invalid_opc(req->cmd.opcode);
+return NVME_INVALID_OPCODE | NVME_DNR;
+}
+
 switch (req->cmd.opcode) {
 case NVME_CMD_FLUSH:
 return nvme_flush(n, req);
@@ -1054,8 +1077,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 case NVME_CMD_READ:
 return nvme_rw(n, req);
 default:
-trace_pci_nvme_err_invalid_opc(req->cmd.opcode);
-return NVME_INVALID_OPCODE | NVME_DNR;
+assert(false);
 }
 }
 
@@ -1291,6 +1313,39 @@ static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t 
rae, uint32_t buf_len,
 DMA_DIRECTION_FROM_DEVICE, req);
 }
 
+static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint32_t buf_len,
+ uint64_t off, NvmeRequest *req)
+{
+NvmeEffectsLog log = {};
+const uint32_t *src_iocs = NULL;
+uint32_t trans_len;
+
+trace_pci_nvme_cmd_supp_and_effects_log_read();
+
+if (off >= sizeof(log)) {
+trace_pci_nvme_err_invalid_effects_log_offset(off);
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+switch (NVME_CC_CSS(n->bar.cc)) {
+case NVME_CC_CSS_NVM:
+src_iocs = nvme_cse_iocs_nvm;
+case NVME_CC_CSS_ADMIN_ONLY:
+break;
+}
+
+memcpy(log.acs, nvme_cse_acs, sizeof(nvme_cse_acs));
+
+if (src_iocs) {
+memcpy(log.iocs, src_iocs, sizeof(log.iocs));
+}
+
+trans_len = MIN(sizeof(log) - off, buf_len);
+
+return nvme_dma(n, ((uint8_t *)&log) + off, trans_len,
+DMA_DIRECTION_FROM_DEVICE, req);
+}
+
 static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeCmd *cmd = &req->cmd;
@@ -1334,6 +1389,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_smart_info(n, rae, len, off, req);
 case NVME_LOG_FW_SLOT_INFO:
 return nvme_fw_log_info(n, len, off, req);
+

[PATCH v7 02/11] hw/block/nvme: Generate namespace UUIDs

2020-10-18 Thread Dmitry Fomichev

In NVMe 1.4, a namespace must report an ID descriptor of UUID type
if it doesn't support EUI64 or NGUID. Add a new namespace property,
"uuid", that provides the user the option to either specify the UUID
explicitly or have a UUID generated automatically every time a
namespace is initialized.
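
How a UUID ends up in the identification descriptor list can be shown with a short sketch. The struct layout below (type, length, two reserved bytes, then the identifier) follows the NVMe Namespace Identification Descriptor format as this sketch understands it; `fill_uuid_descr()` and the constant names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NIDT_UUID     0x3   /* descriptor type: UUID */
#define NIDT_UUID_LEN 16    /* identifier length in bytes */

/* Simplified descriptor: 4-byte header followed by the identifier. */
struct ns_id_descr_uuid {
    uint8_t nidt;
    uint8_t nidl;
    uint8_t rsvd[2];
    uint8_t uuid[NIDT_UUID_LEN];
};

/*
 * Zero the reply buffer and place one UUID descriptor at its start.
 * Returns the number of descriptor bytes written, or 0 if buf is too small.
 */
static size_t fill_uuid_descr(uint8_t *buf, size_t buf_len,
                              const uint8_t uuid[NIDT_UUID_LEN])
{
    struct ns_id_descr_uuid d = { .nidt = NIDT_UUID, .nidl = NIDT_UUID_LEN };

    if (buf_len < sizeof(d)) {
        return 0;
    }
    memcpy(d.uuid, uuid, NIDT_UUID_LEN);
    memset(buf, 0, buf_len);     /* the rest of the list stays zero */
    memcpy(buf, &d, sizeof(d));
    return sizeof(d);
}
```

The all-zero bytes after the descriptor act as the list terminator, which is why the buffer is cleared before the descriptor is copied in.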

Suggested-by: Klaus Jensen 
Signed-off-by: Dmitry Fomichev 
Reviewed-by: Klaus Jensen 
---
 hw/block/nvme-ns.c | 1 +
 hw/block/nvme-ns.h | 1 +
 hw/block/nvme.c| 9 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index b69cdaf27e..de735eb9f3 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -129,6 +129,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
+DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index ea8c2f785d..a38071884a 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -21,6 +21,7 @@
 
 typedef struct NvmeNamespaceParams {
 uint32_t nsid;
+QemuUUID uuid;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5a9493d89f..29139d8a17 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1574,6 +1574,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 
 static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
 {
+NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
 uint32_t nsid = le32_to_cpu(c->nsid);
 uint8_t list[NVME_IDENTIFY_DATA_SIZE];
@@ -1593,7 +1594,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_INVALID_NSID | NVME_DNR;
 }
 
-if (unlikely(!nvme_ns(n, nsid))) {
+ns = nvme_ns(n, nsid);
+if (unlikely(!ns)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
@@ -1602,12 +1604,11 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl 
*n, NvmeRequest *req)
 /*
  * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data
  * structure, a Namespace UUID (nidt = 0x3) must be reported in the
- * Namespace Identification Descriptor. Add a very basic Namespace UUID
- * here.
+ * Namespace Identification Descriptor. Add the namespace UUID here.
  */
 ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID;
 ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN;
-stl_be_p(&ns_descrs->uuid.v, nsid);
+memcpy(&ns_descrs->uuid.v, ns->params.uuid.data, NVME_NIDT_UUID_LEN);
 
 return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE,
 DMA_DIRECTION_FROM_DEVICE, req);
-- 
2.21.0

[PATCH v7 00/11] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set

2020-10-18 Thread Dmitry Fomichev

v6 -> v7:

 - Introduce ns->iocs initialization function earlier in the series,
   in CSE Log patch.

 - Set NVM iocs for zoned namespaces when CC.CSS is set to
   NVME_CC_CSS_NVM.

 - Clean up code in CSE log handler.
 
v5 -> v6:

 - Remove zoned state persistence code. Replace position-independent
   zone lists with QTAILQs.

 - Close all open zones upon clearing of the controller. This is
   a similar procedure to the one previously performed upon powering
   up with zone persistence. 

 - Squash NS Types and ZNS triplets of commits to keep definitions
   and trace event definitions together with the implementation code.

 - Move namespace UUID generation to a separate patch. Add the new
   "uuid" property as suggested by Klaus.

 - Rework Commands and Effects patch to make sure that the log is
   always in sync with the actual set of commands supported.

 - Add two refactoring commits at the end of the series to
   optimize read and write i/o path.

- Incorporate feedback from Keith, Klaus and Niklas:

  * fix rebase errors in nvme_identify_ns_descr_list()
  * remove unnecessary code from nvme_write_bar()
  * move csi to NvmeNamespace and use it from the beginning in NSTypes
patch
  * change zone read processing to cover all corner cases with RAZB=1
  * sync w_ptr and d.wp in case of an i/o error at the preceding zone
  * reword the commit message in active/inactive patch with the new
text from Niklas
  * correct dlfeat reporting depending on the fill pattern set
  * add more checks for "attached" n/s parameter to prevent i/o and
get/set features on inactive namespaces
  * Use DEFINE_PROP_SIZE and DEFINE_PROP_SIZE32 for zone size/capacity
and ZASL respectively
  * Improve zone size and capacity validation
  * Correctly report NSZE

v4 -> v5:

 - Rebase to the current qemu-nvme.

 - Use HostMemoryBackendFile as the backing storage for persistent
   zone metadata.

 - Fix the issue with filling the valid data in the next zone if RAZB
   is enabled.

v3 -> v4:

 - Fix bugs introduced in v2/v3 for QD > 1 operation. Now, all writes
   to a zone happen at the new write pointer variable, zone->w_ptr,
   that is advanced right after submitting the backend i/o. The existing
   zone->d.wp variable is updated upon the successful write completion
   and it is used for zone reporting. Some code has been split from
   nvme_finalize_zoned_write() function to a new function,
   nvme_advance_zone_wp().

 - Make the code compile under mingw. Switch to using QEMU API for
   mmap/msync, i.e. memory_region...(). Since mmap is not available in
   mingw (even though there is mman-win32 library available on Github),
   conditional compilation is added around these calls to avoid
   undefined symbols under mingw. A better fix would be to add stub
   functions to softmmu/memory.c for the case when CONFIG_POSIX is not
   defined, but such change is beyond the scope of this patchset and it
   can be made in a separate patch.

 - Correct permission mask used to open zone metadata file.

 - Fold "Define 64 bit cqe.result" patch into ZNS commit.

 - Use clz64/clz32 instead of defining nvme_ilog2() function.

 - Simplify rpt_empty_id_struct() code, move nvme_fill_data() back
   to ZNS patch.

 - Fix a power-on processing bug.

 - Rename NVME_CMD_ZONE_APND to NVME_CMD_ZONE_APPEND.

 - Make the list of review comments addressed in v2 of the series
   (see below).

v2 -> v3:

 - Moved nvme_fill_data() function to the NSTypes patch as it is
   now used there to output empty namespace identify structs.
 - Fixed typo in Maxim's email address.

v1 -> v2:

 - Rebased on top of qemu-nvme/next branch.
 - Incorporated feedback from Klaus and Alistair.
* Allow a subset of CSE log to be read, not the entire log
* Assign admin command entries in CSE log to ACS fields
* Set LPA bit 1 to indicate support of CSE log page
* Rename CC.CSS value CSS_ALL_NSTYPES (110b) to CSS_CSI
* Move the code to assign lbaf.ds to a separate patch
* Remove the change in firmware revision
* Change "driver" to "device" in comments and annotations
* Rename ZAMDS to ZASL
* Correct a few format expressions and some wording in
  trace event definitions
* Remove validation code to return NVME_CAP_EXCEEDED error
* Make ZASL to be equal to MDTS if "zone_append_size_limit"
  module parameter is not set
* Clean up nvme_zoned_init_ctrl() to make size calculations
  less confusing
* Avoid changing module parameters, use separate n/s variables
  if additional calculations are necessary to convert parameters
  to running values
* Use NVME_DEFAULT_ZONE_SIZE to assign the default zone size value
* Use default 0 for zone capacity meaning that zone capacity will
  be equal to zone size by default
* Issue warnings if user MAR/MOR values are too large and have
  to be adjusted
* Use unsigned values for MAR/MOR
 - Dropped "Simulate Zone Active excursions" patch.
   Excursion

Re: [PATCH] hw/riscv: microchip_pfsoc: IOSCBCTRL memmap entry

2020-10-18 Thread Bin Meng

Hi Ivan,

On Sat, Oct 17, 2020 at 12:31 AM Ivan Griffin  wrote:
>
> I don't know why it isn't documented in that PDF (or in the register map), 
> but if you check 
> https://github.com/polarfire-soc/polarfire-soc-bare-metal-library/blob/master/src/platform/drivers/mss_sys_services/mss_sys_services.h
>  you'll see the following
>
> ```
> typedef struct
> {
> volatile uint32_t SOFT_RESET;
> volatile uint32_t VDETECTOR;
> volatile uint32_t TVS_CONTROL;
> volatile uint32_t TVS_TEMP_A;
> volatile uint32_t TVS_TEMP_B;
> volatile uint32_t TVS_TEMP_C;
> volatile uint32_t TVS_VOLT_A;
> volatile uint32_t TVS_VOLT_B;
> volatile uint32_t TVS_VOLT_C;
> volatile uint32_t TVS_OUTPUT0;
> volatile uint32_t TVS_OUTPUT1;
> volatile uint32_t TVS_TRIGGER;
> volatile uint32_t TRIM_VDET1P05;
> volatile uint32_t TRIM_VDET1P8;
> volatile uint32_t TRIM_VDET2P5;
> volatile uint32_t TRIM_TVS;
> volatile uint32_t TRIM_GDET1P05;
> volatile uint32_t RESERVED0;
> volatile uint32_t RESERVED1;
> volatile uint32_t RESERVED2;
> volatile uint32_t SERVICES_CR;
> volatile uint32_t SERVICES_SR;
> volatile uint32_t USER_DETECTOR_SR;
> volatile uint32_t USER_DETECTOR_CR;
> volatile uint32_t MSS_SPI_CR;
>
> } SCBCTRL_TypeDef;
>
> #define MSS_SCBCTRL ((SCBCTRL_TypeDef*) (0x37020000UL))
>
> /*2kB bytes long mailbox.*/
> #define MSS_SCBMAILBOX ((uint32_t*) (0x37020800UL))
> ```
>
> And in 
> https://github.com/polarfire-soc/polarfire-soc-bare-metal-library/blob/master/src/platform/drivers/mss_sys_services/mss_sys_services.c
>  you'll see MSS_SCB and MSS_SCBMAILBOX used in many places to interact with 
> the FPGA system controller to perform various services.

It's actually documented, but not in the PDF file. I also spent some
time locating the doc while doing the DDR controller modeling work.

See Register Map/PF_SoC_RegMap_V1_1/MPFS250T/pfsoc_control_scb.htm in
https://www.microsemi.com/document-portal/doc_download/1244581-polarfire-soc-register-map

Regards,
Bin

RE: [PATCH v6 03/11] hw/block/nvme: Add support for Namespace Types

2020-10-18 Thread Dmitry Fomichev

> -----Original Message-----
> From: Niklas Cassel 
> Sent: Wednesday, October 14, 2020 9:01 AM
> To: Dmitry Fomichev 
> Cc: Keith Busch ; Klaus Jensen
> ; Kevin Wolf ; Philippe
> Mathieu-Daudé ; Maxim Levitsky
> ; Fam Zheng ; Alistair Francis
> ; Matias Bjorling ;
> Damien Le Moal ; qemu-bl...@nongnu.org;
> qemu-devel@nongnu.org
> Subject: Re: [PATCH v6 03/11] hw/block/nvme: Add support for Namespace
> Types
> 
> On Wed, Oct 14, 2020 at 06:42:04AM +0900, Dmitry Fomichev wrote:
> > From: Niklas Cassel 
> >
> > Define the structures and constants required to implement
> > Namespace Types support.
> >
> > Namespace Types introduce a new command set, "I/O Command Sets",
> > that allows the host to retrieve the command sets associated with
> > a namespace. Introduce support for the command set and enable
> > detection for the NVM Command Set.
> >
> > The new workflows for identify commands rely heavily on zero-filled
> > identify structs. E.g., certain CNS commands are defined to return
> > a zero-filled identify struct when an inactive namespace NSID
> > is supplied.
> >
> > Add a helper function in order to avoid code duplication when
> > reporting zero-filled identify structures.
> >
> > Signed-off-by: Niklas Cassel 
> > Signed-off-by: Dmitry Fomichev 
> > ---
> 
> (snip)
> 
> > @@ -2090,6 +2199,27 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
> >      n->bar.cc = 0;
> >  }
> >
> > +static void nvme_select_ns_iocs(NvmeCtrl *n)
> > +{
> > +    NvmeNamespace *ns;
> > +    int i;
> > +
> > +    for (i = 1; i <= n->num_namespaces; i++) {
> > +        ns = nvme_ns(n, i);
> > +        if (!ns) {
> > +            continue;
> > +        }
> > +        ns->iocs = nvme_cse_iocs_none;
> > +        switch (ns->csi) {
> > +        case NVME_CSI_NVM:
> > +            if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
> > +                ns->iocs = nvme_cse_iocs_nvm;
> > +            }
> > +            break;
> > +        }
> > +    }
> > +}
> > +
> >  static int nvme_start_ctrl(NvmeCtrl *n)
> >  {
> >      uint32_t page_bits = NVME_CC_MPS(n->bar.cc) + 12;
> > @@ -2188,6 +2318,8 @@ static int nvme_start_ctrl(NvmeCtrl *n)
> >
> >      QTAILQ_INIT(&n->aer_queue);
> >
> > +    nvme_select_ns_iocs(n);
> > +
> >      return 0;
> >  }
> >
> > @@ -2655,7 +2787,6 @@ int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
> >      trace_pci_nvme_register_namespace(nsid);
> >
> >      n->namespaces[nsid - 1] = ns;
> > -    ns->iocs = nvme_cse_iocs_nvm;
> >
> >      return 0;
> >  }
> 
> Considering how tightly coupled the three above diffs are with the
> Commands Supported and Effects log, and since patch 1 already adds
> the ns->iocs checking in nvme_admin_cmd() and nvme_io_cmd(),
> and since these three diffs are not really related to NS types,
> I think they should be moved to patch 1.
> 
> It really helps the reviewer if both the ns->iocs assignment
> and checking is done in the same patch, and introduced as early
> as possible. And since this code is needed/valid simply to check
> if ADMIN_ONLY is selected (even before NS Types were introduced),
> I don't see any reason not to introduce them in to patch 1
> together with the other ns->iocs stuff.
> 
> (We were always able to select an I/O Command Set using CC.CSS
> (Admin only/None, or NVM), NS types simply introduced the ability
> to select/enable more than one command set at the same time.)
> 

OK, perhaps it is better to introduce nvme_select_ns_iocs() earlier,
in the CSE Log patch. This way there will be no need to modify
nvme_start_ctrl() again in this patch. Since ns->csi is not defined
until this commit, the initial code for nvme_select_ns_iocs() will
be simpler.
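
A minimal sketch of what that simpler initial version might look like, before
ns->csi exists. The types, table contents and names below are local stand-ins
for illustration only and are not the QEMU definitions; the real function
would operate on NvmeCtrl and NvmeNamespace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CC.CSS encodings: 000b selects NVM, 111b is Admin only. */
enum { CSS_NVM = 0x0, CSS_ADMIN_ONLY = 0x7 };

static const uint32_t iocs_none[256];                            /* no I/O commands */
static const uint32_t iocs_nvm[256] = { [0x01] = 1, [0x02] = 1 }; /* Write, Read */

typedef struct {
    const uint32_t *iocs; /* supported I/O commands for this namespace */
} Namespace;

typedef struct {
    uint8_t css;              /* selected command set, as in CC.CSS */
    Namespace *namespaces[4]; /* NULL entries are unattached namespaces */
    size_t num_namespaces;
} Ctrl;

/* Without namespace types, the choice depends on CC.CSS alone. */
static void select_ns_iocs(Ctrl *n)
{
    assert(n != NULL);
    for (size_t i = 0; i < n->num_namespaces; i++) {
        Namespace *ns = n->namespaces[i];
        if (!ns) {
            continue;
        }
        ns->iocs = (n->css == CSS_ADMIN_ONLY) ? iocs_none : iocs_nvm;
    }
}
```

The switch on ns->csi would then be added by the Namespace Types patch, on top
of this loop.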

> 
> Kind regards,
> Niklas

RE: [PATCH v6 05/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-10-18 Thread Dmitry Fomichev

> -----Original Message-----
> From: Niklas Cassel 
> Sent: Wednesday, October 14, 2020 7:59 AM
> To: Dmitry Fomichev 
> Cc: Keith Busch ; Klaus Jensen
> ; Kevin Wolf ; Philippe
> Mathieu-Daudé ; Maxim Levitsky
> ; Fam Zheng ; Alistair Francis
> ; Matias Bjorling ;
> Damien Le Moal ; qemu-bl...@nongnu.org;
> qemu-devel@nongnu.org
> Subject: Re: [PATCH v6 05/11] hw/block/nvme: Support Zoned Namespace
> Command Set
> 
> On Wed, Oct 14, 2020 at 06:42:06AM +0900, Dmitry Fomichev wrote:
> > The emulation code has been changed to advertise NVM Command Set
> when
> > "zoned" device property is not set (default) and Zoned Namespace
> > Command Set otherwise.
> >
> > Define values and structures that are needed to support Zoned
> > Namespace Command Set (NVMe TP 4053) in PCI NVMe controller
> emulator.
> > Define trace events where needed in newly introduced code.
> >
> > In order to improve scalability, all open, closed and full zones
> > are organized in separate linked lists. Consequently, almost all
> > zone operations don't require scanning of the entire zone array
> > (which potentially can be quite large) - it is only necessary to
> > enumerate one or more zone lists.
> >
> > Handlers for three new NVMe commands introduced in Zoned Namespace
> > Command Set specification are added, namely for Zone Management
> > Receive, Zone Management Send and Zone Append.
> >
> > Device initialization code has been extended to create a proper
> > configuration for zoned operation using device properties.
> >
> > Read/Write command handler is modified to only allow writes at the
> > write pointer if the namespace is zoned. For Zone Append command,
> > writes implicitly happen at the write pointer and the starting write
> > pointer value is returned as the result of the command. Write Zeroes
> > handler is modified to add zoned checks that are identical to those
> > done as a part of Write flow.
> >
> > Subsequent commits in this series add ZDE support and checks for
> > active and open zone limits.
> >
> > Signed-off-by: Niklas Cassel 
> > Signed-off-by: Hans Holmberg 
> > Signed-off-by: Ajay Joshi 
> > Signed-off-by: Chaitanya Kulkarni 
> > Signed-off-by: Matias Bjorling 
> > Signed-off-by: Aravind Ramesh 
> > Signed-off-by: Shin'ichiro Kawasaki 
> > Signed-off-by: Adam Manzanares 
> > Signed-off-by: Dmitry Fomichev 
> 
> (snip)
> 
> > @@ -2260,6 +3155,11 @@ static void nvme_select_ns_iocs(NvmeCtrl *n)
> >                  ns->iocs = nvme_cse_iocs_nvm;
> >              }
> >              break;
> > +        case NVME_CSI_ZONED:
> > +            if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
> > +                ns->iocs = nvme_cse_iocs_zoned;
> > +            }
> > +            break;
> >          }
> >      }
> >  }
> 
> Who knows how this whole command set mess is supposed to work,
> since e.g. the Key Value Command Set assigns opcodes for new commands
> (Delete, Exist) with opcode values (0x10,0x14) smaller than the
> current highest opcode value (0x15) in the NVM Command Set,
> while those opcodes (0x10,0x14) are reserved in the NVM Command Set.
> 
> At least for Zoned Command Set, they defined the new commands
> (Zone Mgmt Send, Zone Mgmt Recv) to opcode values (0x79,0x7a)
> that are higher than the current highest opcode value in the
> NVM Command Set.
> 
> So since we know that the Zoned Command Set is a strict superset of
> the NVM Command Set, I guess it might be nice to do something like:
> 
> case NVME_CSI_ZONED:
>     if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
>         ns->iocs = nvme_cse_iocs_zoned;
>     } else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
>         ns->iocs = nvme_cse_iocs_nvm;
>     }
>     break;
> 
> 
> Since I assume that the spec people intended reads/writes
> to a ZNS namespace to still be possible when CC_CSS == NVM,
> but who knows?

Yes, I think it should be this way, thanks. Now it is matched with what
CSE log reports in this case.
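
For reference, the resulting selection logic can be written as a small
standalone function. This is a sketch under assumptions: the enum values and
the contents of the tables are illustrative stand-ins (only the Zone Mgmt
Send/Recv opcodes 0x79/0x7a and the 110b encoding of CSS_CSI come from this
thread), and select_iocs() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* CC.CSS values: 000b NVM, 110b all supported I/O command sets, 111b Admin only. */
enum css { CSS_NVM = 0x0, CSS_CSI = 0x6, CSS_ADMIN_ONLY = 0x7 };
/* Command set identifiers; the numeric values are assumed for the example. */
enum csi { CSI_NVM = 0x00, CSI_ZONED = 0x02 };

static const uint32_t iocs_none[256];
static const uint32_t iocs_nvm[256] = { [0x01] = 1, [0x02] = 1 };
/* Zoned is treated as a superset of NVM: same entries plus the zone commands. */
static const uint32_t iocs_zoned[256] = {
    [0x01] = 1, [0x02] = 1, [0x79] = 1, [0x7a] = 1,
};

static const uint32_t *select_iocs(enum csi csi, enum css css)
{
    assert(css <= CSS_ADMIN_ONLY); /* CC.CSS is a 3-bit field */
    switch (csi) {
    case CSI_NVM:
        return css != CSS_ADMIN_ONLY ? iocs_nvm : iocs_none;
    case CSI_ZONED:
        if (css == CSS_CSI) {
            return iocs_zoned;
        }
        if (css == CSS_NVM) {
            return iocs_nvm; /* fall back to plain reads and writes */
        }
        return iocs_none;
    }
    return iocs_none;
}
```

The fallback branch is the behavioral change discussed above: a zoned
namespace stays readable and writable when the host selects CC.CSS = NVM.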

> 
> 
> Kind regards,
> Niklas

RE: [PATCH v6 01/11] hw/block/nvme: Add Commands Supported and Effects log

2020-10-18 Thread Dmitry Fomichev

> -----Original Message-----
> From: Keith Busch 
> Sent: Tuesday, October 13, 2020 8:51 PM
> To: Dmitry Fomichev 
> Cc: Klaus Jensen ; Kevin Wolf
> ; Philippe Mathieu-Daudé ;
> Maxim Levitsky ; Fam Zheng ;
> Alistair Francis ; Matias Bjorling
> ; Niklas Cassel ;
> Damien Le Moal ; qemu-bl...@nongnu.org;
> qemu-devel@nongnu.org
> Subject: Re: [PATCH v6 01/11] hw/block/nvme: Add Commands Supported
> and Effects log
> 
> On Wed, Oct 14, 2020 at 06:42:02AM +0900, Dmitry Fomichev wrote:
> > +{
> > +    NvmeEffectsLog log = {};
> > +    uint32_t *dst_acs = log.acs, *dst_iocs = log.iocs;
> > +    uint32_t trans_len;
> > +    int i;
> > +
> > +    trace_pci_nvme_cmd_supp_and_effects_log_read();
> > +
> > +    if (off >= sizeof(log)) {
> > +        trace_pci_nvme_err_invalid_effects_log_offset(off);
> > +        return NVME_INVALID_FIELD | NVME_DNR;
> > +    }
> > +
> > +    for (i = 0; i < 256; i++) {
> > +        dst_acs[i] = nvme_cse_acs[i];
> > +    }
> > +
> > +    for (i = 0; i < 256; i++) {
> > +        dst_iocs[i] = nvme_cse_iocs_nvm[i];
> > +    }
> 
> You're just copying the array, so let's do it like this:
> 
> memcpy(log.acs, nvme_cse_acs, sizeof(nvme_cse_acs));
> memcpy(log.iocs, nvme_cse_iocs_nvm, sizeof(nvme_cse_iocs_nvm));
> 

Sure, this is a lot cleaner.

> I think you also need to check
> 
> if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY)
> 
> before copying iocs.

Yes, thanks.
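
Putting both suggestions together, the log could be filled roughly as below.
This is a self-contained sketch, not the patch itself: EffectsLog and
fill_effects_log() are hypothetical stand-ins, and the admin_only flag
represents the NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ADMIN_ONLY condition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified log layout: 256 admin entries followed by 256 I/O entries. */
typedef struct {
    uint32_t acs[256];
    uint32_t iocs[256];
} EffectsLog;

static void fill_effects_log(EffectsLog *log, const uint32_t acs[256],
                             const uint32_t iocs[256], int admin_only)
{
    assert(log != NULL);
    memset(log, 0, sizeof(*log));
    /* One memcpy per table replaces the element-by-element loops. */
    memcpy(log->acs, acs, sizeof(log->acs));
    if (!admin_only) {
        /* I/O commands are only reported when an I/O command set is selected. */
        memcpy(log->iocs, iocs, sizeof(log->iocs));
    }
}
```

In the admin-only case the I/O half of the log simply stays zero-filled.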

Re: [PATCH v2] hw/riscv: microchip_pfsoc: IOSCBCTRL memmap entry

2020-10-18 Thread Bin Meng

Hi Ivan,

On Sat, Oct 17, 2020 at 1:10 AM Ivan Griffin  wrote:
>
> Adding the PolarFire SoC IOSCBCTRL memory region to prevent QEMU
> reporting a STORE/AMO Access Fault.
>
> This region is used by the PolarFire SoC port of U-Boot to
> interact with the FPGA system controller.
>
> Signed-off-by: Ivan Griffin 
> ---
>  hw/riscv/microchip_pfsoc.c | 10 ++
>  include/hw/riscv/microchip_pfsoc.h |  1 +
>  2 files changed, 11 insertions(+)
>
> diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
> index 4627179cd3..9aaa276ee2 100644
> --- a/hw/riscv/microchip_pfsoc.c
> +++ b/hw/riscv/microchip_pfsoc.c
> @@ -97,6 +97,7 @@ static const struct MemmapEntry {
>      [MICROCHIP_PFSOC_GPIO2] =           { 0x20122000,     0x1000 },
>      [MICROCHIP_PFSOC_ENVM_CFG] =        { 0x20200000,     0x1000 },
>      [MICROCHIP_PFSOC_ENVM_DATA] =       { 0x20220000,    0x20000 },
> +    [MICROCHIP_PFSOC_IOSCB_CTRL] =      { 0x37020000,     0x1000 },
>      [MICROCHIP_PFSOC_IOSCB_CFG] =       { 0x37080000,     0x1000 },
>      [MICROCHIP_PFSOC_DRAM] =            { 0x80000000,        0x0 },
>  };
> @@ -341,6 +342,15 @@ static void microchip_pfsoc_soc_realize(DeviceState *dev, Error **errp)
>      create_unimplemented_device("microchip.pfsoc.ioscb.cfg",
>          memmap[MICROCHIP_PFSOC_IOSCB_CFG].base,
>          memmap[MICROCHIP_PFSOC_IOSCB_CFG].size);
> +
> +    /* IOSCBCTRL
> +     *
> +     * These registers are not documented in the official documentation
> +     * but used by the polarfire-soc-bare-metal-library code
> +     */
> +    create_unimplemented_device("microchip.pfsoc.ioscb.ctrl",
> +        memmap[MICROCHIP_PFSOC_IOSCB_CTRL].base,
> +        memmap[MICROCHIP_PFSOC_IOSCB_CTRL].size);
>  }
>
>  static void microchip_pfsoc_soc_class_init(ObjectClass *oc, void *data)
> diff --git a/include/hw/riscv/microchip_pfsoc.h 
> b/include/hw/riscv/microchip_pfsoc.h
> index 8bfc7e1a85..3f1874b162 100644
> --- a/include/hw/riscv/microchip_pfsoc.h
> +++ b/include/hw/riscv/microchip_pfsoc.h
> @@ -95,6 +95,7 @@ enum {
>  MICROCHIP_PFSOC_ENVM_CFG,
>  MICROCHIP_PFSOC_ENVM_DATA,
>  MICROCHIP_PFSOC_IOSCB_CFG,
> +MICROCHIP_PFSOC_IOSCB_CTRL,
>  MICROCHIP_PFSOC_DRAM,
>  };

Thank you for the patch!

I am currently adding the DDR controller modeling support to PolarFire
SoC which will cover this memory map. With my patch series, your patch
is no longer needed.
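
As a quick sanity check of the region in question, the sketch below tests
whether a 0x1000-byte window would cover both the control registers and the
2 kB mailbox at 0x37020800 mentioned in the earlier thread. The base address
0x37020000 is an assumption inferred from that mailbox address, and Region and
region_contains() are hypothetical helpers, not QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A memory map entry: start address and length in bytes. */
typedef struct {
    uint64_t base;
    uint64_t size;
} Region;

/* True if the access [addr, addr + len) lies entirely inside the region. */
static bool region_contains(Region r, uint64_t addr, uint64_t len)
{
    assert(len > 0);
    return addr >= r.base && len <= r.size && addr - r.base <= r.size - len;
}
```

Under that assumption, region_contains() holds for the whole mailbox
(0x37020800, length 0x800) and fails for the first byte past the window at
0x37021000.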

Regards,
Bin

[PATCH v4 4/4] Jobs based on custom runners: add job definitions for QEMU's machines

2020-10-18 Thread Cleber Rosa

The QEMU project has two machines (aarch64 and s390x) that can be used
for jobs that do build and run tests.  This introduces those jobs,
which are a mapping of custom scripts used for the same purpose.

Signed-off-by: Cleber Rosa 
---
 .gitlab-ci.d/custom-runners.yml | 192 
 1 file changed, 192 insertions(+)

diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
index 3004da2bda..5b51d1b336 100644
--- a/.gitlab-ci.d/custom-runners.yml
+++ b/.gitlab-ci.d/custom-runners.yml
@@ -12,3 +12,195 @@
 # strategy.
 variables:
   GIT_SUBMODULE_STRATEGY: recursive
+
+# All ubuntu-18.04 jobs should run successfully in an environment
+# setup by the scripts/ci/setup/build-environment.yml task
+# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
+ubuntu-18.04-s390x-all-linux-static:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ # --disable-libssh is needed because of https://bugs.launchpad.net/qemu/+bug/1838763
+ # --disable-glusterfs is needed because there's no static version of those libs in distro supplied packages
+ - mkdir build
+ - cd build
+ - ../configure --enable-debug --static --disable-system --disable-glusterfs --disable-libssh
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+ - make --output-sync -j`nproc` check-tcg V=1
+
+ubuntu-18.04-s390x-all:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-18.04-s390x-alldbg:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --enable-debug --disable-libssh
+ - make clean
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-18.04-s390x-clang:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh --cc=clang --cxx=clang++ --enable-sanitizers
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-18.04-s390x-tci:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh --enable-tcg-interpreter
+ - make --output-sync -j`nproc`
+
+ubuntu-18.04-s390x-notcg:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh --disable-tcg
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+# All ubuntu-20.04 jobs should run successfully in an environment
+# setup by the scripts/ci/setup/build-environment.yml task
+# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
+ubuntu-20.04-aarch64-all-linux-static:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch64
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ # --disable-libssh is needed because of https://bugs.launchpad.net/qemu/+bug/1838763
+ # --disable-glusterfs is needed because there's no static version of those libs in distro supplied packages
+ - mkdir build
+ - cd build
+ - ../configure --enable-debug --static --disable-system --disable-glusterfs --disable-libssh
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+ - make --output-sync -j`nproc` check-tcg V=1
+
+ubuntu-20.04-aarch64-all:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch64
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-20.04-aarch64-alldbg:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch64
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --enable-debug --disable-libssh
+ - make clean
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-20.04-aarch64-clang:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch64
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh --cc=clang --cxx=clang++ --enable-sanitizers
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-20.04-aarch64-tci:
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch64
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh --enable-tcg-interpreter
+ - make --output-sync -j`nproc`
+

[PATCH v4 3/4] Jobs based on custom runners: docs and gitlab-runner setup playbook

2020-10-18 Thread Cleber Rosa

To have the jobs dispatched to custom runners, gitlab-runner must
be installed, active as a service and properly configured.  The
variables file and playbook introduced here should help with those
steps.

The playbook introduced here covers a number of different Linux
distributions and FreeBSD, and is intended to provide a reproducible
environment.

Signed-off-by: Cleber Rosa 
Reviewed-by: Daniel P. Berrangé 
---
 docs/devel/ci.rst  | 63 ++
 scripts/ci/setup/.gitignore|  1 +
 scripts/ci/setup/gitlab-runner.yml | 72 ++
 scripts/ci/setup/vars.yml.template | 13 ++
 4 files changed, 149 insertions(+)
 create mode 100644 scripts/ci/setup/.gitignore
 create mode 100644 scripts/ci/setup/gitlab-runner.yml
 create mode 100644 scripts/ci/setup/vars.yml.template

diff --git a/docs/devel/ci.rst b/docs/devel/ci.rst
index 208b5e399b..a234a5e24c 100644
--- a/docs/devel/ci.rst
+++ b/docs/devel/ci.rst
@@ -84,3 +84,66 @@ To run the playbook, execute::
 
   cd scripts/ci/setup
   ansible-playbook -i inventory build-environment.yml
+
+gitlab-runner setup and registration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The gitlab-runner agent needs to be installed on each machine that
+will run jobs.  The association between a machine and a GitLab project
+happens with a registration token.  To find the registration token for
+your repository/project, navigate on GitLab's web UI to:
+
+ * Settings (the gears like icon), then
+ * CI/CD, then
+ * Runners, and click on the "Expand" button, then
+ * Under "Set up a specific Runner manually", look for the value under
+   "Use the following registration token during setup"
+
+Copy the ``scripts/ci/setup/vars.yml.template`` file to
+``scripts/ci/setup/vars.yml``.  Then, set the
+``gitlab_runner_registration_token`` variable to the value obtained
+earlier.
+
+.. note:: gitlab-runner is not available from the standard location
+  for all OS and architectures combinations.  For some systems,
+  a custom build may be necessary.  Some builds are available
+  at https://cleber.fedorapeople.org/gitlab-runner/ and this
+  URI may be used as a value on ``vars.yml``
+
+To run the playbook, execute::
+
+  cd scripts/ci/setup
+  ansible-playbook -i inventory gitlab-runner.yml
+
+.. note:: there are currently limitations to gitlab-runner itself when
+  setting up a service under FreeBSD systems.  You will need to
+  perform steps 4 to 10 manually, as described at
+  https://docs.gitlab.com/runner/install/freebsd.html
+
+Following the registration, it's necessary to configure the runner tags,
+and optionally other configurations on the GitLab UI.  Navigate to:
+
+ * Settings (the gears like icon), then
+ * CI/CD, then
+ * Runners, and click on the "Expand" button, then
+ * "Runners activated for this project", then
+ * Click on the "Edit" icon (next to the "Lock" Icon)
+
+Under tags, add values matching the jobs a runner should run.  For a
+FreeBSD 12.1 x86_64 system, the tags should be set as::
+
+  freebsd_12.1,x86_64
+
+Because the job definition at ``.gitlab-ci.d/custom-runners.yml``
+would contain::
+
+  freebsd-12.1-x86_64-all:
+   tags:
+   - freebsd_12.1
+   - x86_64
+
+It's also recommended to:
+
+ * increase the "Maximum job timeout" to something like ``2h``
+ * uncheck the "Run untagged jobs" check box
+ * give it a better Description
diff --git a/scripts/ci/setup/.gitignore b/scripts/ci/setup/.gitignore
new file mode 100644
index 0000000000..f112d05dd0
--- /dev/null
+++ b/scripts/ci/setup/.gitignore
@@ -0,0 +1 @@
+vars.yml
\ No newline at end of file
diff --git a/scripts/ci/setup/gitlab-runner.yml 
b/scripts/ci/setup/gitlab-runner.yml
new file mode 100644
index 0000000000..c2f52dad10
--- /dev/null
+++ b/scripts/ci/setup/gitlab-runner.yml
@@ -0,0 +1,72 @@
+---
+- name: Installation of gitlab-runner
+  hosts: all
+  vars_files:
+    - vars.yml
+  tasks:
+    - debug:
+        msg: 'Checking for a valid GitLab registration token'
+      failed_when: "gitlab_runner_registration_token == 'PLEASE_PROVIDE_A_VALID_TOKEN'"
+
+    - name: Checks the availability of official gitlab-runner builds in the archive
+      uri:
+        url: https://s3.amazonaws.com/gitlab-runner-downloads/v{{ gitlab_runner_version  }}/binaries/gitlab-runner-linux-386
+        method: HEAD
+        status_code:
+          - 200
+          - 403
+      register: gitlab_runner_available_archive
+
+    - name: Update base url
+      set_fact:
+        gitlab_runner_base_url: https://s3.amazonaws.com/gitlab-runner-downloads/v{{ gitlab_runner_version  }}/binaries/gitlab-runner-
+      when: gitlab_runner_available_archive.status == 200
+    - debug:
+        msg: Base gitlab-runner url is {{ gitlab_runner_base_url  }}
+
+    - name: Set OS name (FreeBSD)
+      set_fact:
+        gitlab_runner_os: freebsd
+      when: "ansible_facts['system'] == 'FreeBSD'"
+
+    - name: Create a group for

[PATCH v4 1/4] Jobs based on custom runners: documentation and configuration placeholder

2020-10-18 Thread Cleber Rosa

As described in the included documentation, the "custom runner" jobs
extend the GitLab CI jobs already in place.

Those jobs are intended to run on hardware and/or Operating Systems
not provided by GitLab's shared runners.

Signed-off-by: Cleber Rosa 
Reviewed-by: Daniel P. Berrangé 
---
 .gitlab-ci.d/custom-runners.yml | 14 +
 .gitlab-ci.yml  |  1 +
 docs/devel/ci.rst   | 54 +
 docs/devel/index.rst|  1 +
 4 files changed, 70 insertions(+)
 create mode 100644 .gitlab-ci.d/custom-runners.yml
 create mode 100644 docs/devel/ci.rst

diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
new file mode 100644
index 0000000000..3004da2bda
--- /dev/null
+++ b/.gitlab-ci.d/custom-runners.yml
@@ -0,0 +1,14 @@
+# The CI jobs defined here require GitLab runners installed and
+# registered on machines that match their operating system names,
+# versions and architectures.  This is in contrast to the other CI
+# jobs that are intended to run on GitLab's "shared" runners.
+
+# Different than the default approach on "shared" runners, based on
+# containers, the custom runners have no such *requirement*, as those
+# jobs should be capable of running on operating systems with no
+# compatible container implementation, or no support from
+# gitlab-runner.  To avoid problems that gitlab-runner can cause while
+# reusing the GIT repository, let's enable the recursive submodule
+# strategy.
+variables:
+  GIT_SUBMODULE_STRATEGY: recursive
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 8ffd415ca5..b33c433fd7 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -18,6 +18,7 @@ include:
   - local: '/.gitlab-ci.d/opensbi.yml'
   - local: '/.gitlab-ci.d/containers.yml'
   - local: '/.gitlab-ci.d/crossbuilds.yml'
+  - local: '/.gitlab-ci.d/custom-runners.yml'
 
 .native_build_job_template: &native_build_job_definition
   stage: build
diff --git a/docs/devel/ci.rst b/docs/devel/ci.rst
new file mode 100644
index 0000000000..41a4bbddad
--- /dev/null
+++ b/docs/devel/ci.rst
@@ -0,0 +1,54 @@
+==
+CI
+==
+
+QEMU has configurations enabled for a number of different CI services.
+The most up to date information about them and their status can be
+found at::
+
+   https://wiki.qemu.org/Testing/CI
+
+Jobs on Custom Runners
+======================
+
+Besides the jobs run under the various CI systems listed before, there
+are a number of additional jobs that will run before an actual merge.
+These use the same GitLab CI's service/framework already used for all
+other GitLab based CI jobs, but rely on additional systems, not the
+ones provided by GitLab as "shared runners".
+
+The architecture of GitLab's CI service allows different machines to
+be set up with GitLab's "agent", called gitlab-runner, which will take
+care of running jobs created by events such as a push to a branch.
+The combination of a machine, properly configured with GitLab's
+gitlab-runner, is called a "custom runner" here.
+
+The GitLab CI jobs definition for the custom runners are located under::
+
+  .gitlab-ci.d/custom-runners.yml
+
+Current Jobs
+------------
+
+The current CI jobs based on custom runners have the primary goal of
+catching and preventing regressions on a wider number of host systems
+than the ones provided by GitLab's shared runners.
+
+Also, the mechanics of reliability, capacity and overall maintenance
+of the machines provided by the QEMU project itself for those jobs
+will be evaluated.
+
+Future Plans and Jobs
+---------------------
+
+Once the CI Jobs based on custom runners have been proved mature with
+the initial set of jobs run on machines from the QEMU project, other
+members in the community should be able to provide their own machine
+configuration documentation/scripts, and accompanying job definitions.
+
+As a general rule, those newly added contributed jobs should run as
+"non-gating", until their reliability is verified.
+
+The precise minimum requirements and exact rules for machine
+configuration documentation/scripts, and the success rate of jobs are
+still to be defined.
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 77baae5c77..2fdd36e751 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -21,6 +21,7 @@ Contents:
    atomics
    stable-process
    testing
+   ci
    qtest
    decodetree
    secure-coding-practices
-- 
2.25.4

Re: [PATCH v3 2/4] Jobs based on custom runners: build environment docs and playbook

2020-10-18 Thread Cleber Rosa

On Thu, Oct 15, 2020 at 09:29:40AM +0100, Daniel P. Berrangé wrote:
> On Wed, Oct 14, 2020 at 03:19:47PM -0400, Cleber Rosa wrote:
> > On Wed, Oct 14, 2020 at 02:59:58PM -0400, Cleber Rosa wrote:
> > > On Wed, Oct 14, 2020 at 06:30:09PM +0100, Daniel P. Berrangé wrote:
> > > > 
> > > > This needs updating to add meson, and with Paolo's series today you
> > > > might as well go ahead and add ninja-build immediately too
> > > >
> > 
> > I replied too quickly, but allow me to get this right: meson is *not*
> > included in the dockerfiles (and other similar configurations), and
> > all setups I found rely on the submodule.  Are you suggesting to add meson
> > and diverge from the dockerfiles?
> 
> Doh, right, I forgot that we use the submodule for now, since we need
> such a new meson. So ignore this...
> 
> > > > https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg04025.html
> > > >
> > 
> > ^ I'll add meson according to this, of course.
> 
> Just ninja is needed
>

Right, I meant ninja there!

Thanks,
- Cleber.


signature.asc
Description: PGP signature

[PATCH v4 2/4] Jobs based on custom runners: build environment docs and playbook

2020-10-18 Thread Cleber Rosa

To run basic jobs on custom runners, the environment needs to be
properly set up.  The most common requirement is having the right
packages installed.

The playbook introduced here covers a number of different Linux
distributions and FreeBSD, and is intended to provide a reproducible
environment.

Signed-off-by: Cleber Rosa 
---
 docs/devel/ci.rst  |  32 
 scripts/ci/setup/build-environment.yml | 233 +
 scripts/ci/setup/inventory |   2 +
 3 files changed, 267 insertions(+)
 create mode 100644 scripts/ci/setup/build-environment.yml
 create mode 100644 scripts/ci/setup/inventory

diff --git a/docs/devel/ci.rst b/docs/devel/ci.rst
index 41a4bbddad..208b5e399b 100644
--- a/docs/devel/ci.rst
+++ b/docs/devel/ci.rst
@@ -52,3 +52,35 @@ As a general rule, those newly added contributed jobs should run as
 The precise minimum requirements and exact rules for machine
 configuration documentation/scripts, and the success rate of jobs are
 still to be defined.
+
+Machine Setup Howto
+-------------------
+
+For all Linux based systems, the setup can be mostly automated by the
+execution of two Ansible playbooks.  Start by adding your machines to
+the ``inventory`` file under ``scripts/ci/setup``, such as this::
+
+  [local]
+  fully.qualified.domain
+  other.machine.hostname
+
+You may need to set some variables in the inventory file itself.  One
+very common need is to tell Ansible to use a Python 3 interpreter on
+those hosts.  This would look like::
+
+  [local]
+  fully.qualified.domain ansible_python_interpreter=/usr/bin/python3
+  other.machine.hostname ansible_python_interpreter=/usr/bin/python3
+
+Build environment
+~~~~~~~~~~~~~~~~~
+
+The ``scripts/ci/setup/build-environment.yml`` Ansible playbook will
+set up machines with the environment needed to perform builds and run
+QEMU tests.  It covers a number of different Linux distributions and
+FreeBSD.
+
+To run the playbook, execute::
+
+  cd scripts/ci/setup
+  ansible-playbook -i inventory build-environment.yml
diff --git a/scripts/ci/setup/build-environment.yml b/scripts/ci/setup/build-environment.yml
new file mode 100644
index 00..074aaca927
--- /dev/null
+++ b/scripts/ci/setup/build-environment.yml
@@ -0,0 +1,233 @@
+---
+- name: Installation of basic packages to build QEMU
+  hosts: all
+  tasks:
+- name: Install basic packages to build QEMU on Ubuntu 18.04/20.04
+  apt:
+update_cache: yes
+# Originally from tests/docker/dockerfiles/ubuntu1804.docker
+pkg:
+  - ccache
+  - clang
+  - gcc
+  - gettext
+  - git
+  - glusterfs-common
+  - libaio-dev
+  - libattr1-dev
+  - libbrlapi-dev
+  - libbz2-dev
+  - libcacard-dev
+  - libcap-ng-dev
+  - libcurl4-gnutls-dev
+  - libdrm-dev
+  - libepoxy-dev
+  - libfdt-dev
+  - libgbm-dev
+  - libgtk-3-dev
+  - libibverbs-dev
+  - libiscsi-dev
+  - libjemalloc-dev
+  - libjpeg-turbo8-dev
+  - liblzo2-dev
+  - libncurses5-dev
+  - libncursesw5-dev
+  - libnfs-dev
+  - libnss3-dev
+  - libnuma-dev
+  - libpixman-1-dev
+  - librados-dev
+  - librbd-dev
+  - librdmacm-dev
+  - libsasl2-dev
+  - libsdl2-dev
+  - libseccomp-dev
+  - libsnappy-dev
+  - libspice-protocol-dev
+  - libssh-dev
+  - libusb-1.0-0-dev
+  - libusbredirhost-dev
+  - libvdeplug-dev
+  - libvte-2.91-dev
+  - libzstd-dev
+  - make
+  - ninja-build
+  - python3-yaml
+  - python3-sphinx
+  - sparse
+  - xfslibs-dev
+state: present
+  when: "ansible_facts['distribution'] == 'Ubuntu'"
+
+- name: Install packages to build QEMU on Ubuntu 18.04/20.04 on non-s390x
+  apt:
+update_cache: yes
+pkg:
+ - libspice-server-dev
+ - libxen-dev
+state: present
+  when:
+- "ansible_facts['distribution'] == 'Ubuntu'"
+- "ansible_facts['architecture'] != 's390x'"
+
+- name: Install basic packages to build QEMU on FreeBSD 12.x
+  pkgng:
+# Originally from packages on .cirrus.yml under the freebsd_12_task
+name: bash,curl,cyrus-sasl,git,glib,gmake,gnutls,gsed,nettle,ninja,perl5,pixman,pkgconf,png,usbredir
+state: present
+  when: "ansible_facts['os_family'] == 'FreeBSD'"
+
+- name: Install basic packages to build QEMU on Fedora 30, 31 and 32
+  dnf:
+# Originally from tests/docker/dockerfiles/fedora.docker
+name:
+  - SDL2-devel
+  - bc
+  - brlapi-devel
+  - bzip2
+  - bzip2-devel
+  - ccache
+  - clang
+  - cyrus-sasl-devel
+  - dbus-daemon
+  -

[PATCH v4 0/4] GitLab Custom Runners and Jobs (was: QEMU Gating CI)

2020-10-18 Thread Cleber Rosa

TL;DR: this should allow the QEMU maintainer to push to the staging
branch, and have custom jobs running on the project's aarch64 and
s390x machines.  Simple usage looks like:

   git push remote staging
   ./scripts/ci/gitlab-pipeline-status --verbose --wait

Long version:

The idea about a public facing Gating CI for QEMU was summarized in an
RFC[1].  Since then, it was decided that a simpler version should be
attempted first.

At this point, there are two specific runners (an aarch64 and an s390x)
registered with GitLab, at https://gitlab.com/qemu-project, currently
set up for the "qemu" repository.

Changes from v3:

- Applied changes to match <20201014135416.1290679-1-pbonz...@redhat.com>,
  that is, added ninja-build to "build-environment.yml" list of packages
  and enabled PowerTools repository on CentOS 8.

Changes from v2:

- The overall idea of "Gating CI" has been re-worded as "custom runners",
  given that the other jobs running on shared runners are also
  considered gating (Daniel)

- Fixed wording and typos on the documentation, including:
 * update -> up to date (Erik)
 * a different set of CI jobs -> different CI jobs (Erik)
 * Pull requests will only be merged -> code will only be merged (Stefan)
 * Setup -> set up (Stefan)
 * them -> they (Stefan)
 * the -> where the (Stefan)
 * dropped "in the near future" (Stefan)

- Changed comment on "build-environment.yml" regarding the origin of
  the package list (Stefan)

- Removed inclusion of "vars.yml" from "build-environment.yml", given that
  no external variable is used there

- Updated package list in "build-environment.yml" from current
  dockerfiles

- Tested "build-environment" on Fedora 31 and 32 (in addition to Fedora 30),
  and noted that it's possible to use it on those distros

- Moved CI documentation from "testing.rst" to its own file (Philippe)

- Split "GitLab Gating CI: initial set of jobs, documentation and scripts"
  into (Philippe):
  1) Basic documentation and configuration (gitlab-ci.yml) placeholder
  2) Playbooks for setting up a build environment
  3) Playbooks for setting up gitlab-runner
  4) Actual GitLab CI jobs configuration

- Set custom jobs to be on the "build" stage, given that they combine
  build and test.

- Set custom jobs to not depend on any other job, so they can start
  right away.

- Set rules for starting jobs so that they run on pushes to any branch
  whose name starts with "staging".  This allows the project maintainer to
  use the "push to staging" workflow, while also allowing others to
  generate similar jobs.  If the project has custom runners configured,
  the jobs will run; if not, the pipeline will be marked as "stuck".

- Changed "scripts" on custom jobs to follow the now common pattern
  (on other jobs) of creating a "build" directory.

Changes from v1:

- Added jobs that require specific GitLab runners already available
  (Ubuntu 20.04 on aarch64, and Ubuntu 18.04 on s390x)
- Removed jobs that require specific GitLab runners not yet available
  (Fedora 30, FreeBSD 12.1)
- Updated documentation
- Added copyright and license to new scripts
- Moved script from "contrib" to "scripts/ci/"
- Moved setup playbooks from "contrib" to "scripts/ci/setup"
- Moved "gating.yml" to ".gitlab-ci.d" directory
- Removed "staging" only branch restriction on jobs defined in
  ".gitlab-ci.yml", assumes that the additional jobs on the staging
  branch running on the freely available gitlab shared runner are
  positive
- Dropped patches 1-3 (already merged)
- Reduced the version specificity on Ubuntu, from 18.04.3 to
  simply 18.04 (assumes no diverse minor levels will be used or
  specific runners)

Changes from the RFC patches[2] accompanying the RFC document:

- Moved gating job definitions to .gitlab-ci-gating.yml
- Added info on "--disable-libssh" build option requirement
  (https://bugs.launchpad.net/qemu/+bug/1838763) to Ubuntu 18.04 jobs
- Added info on "--disable-glusterfs" build option requirement
  (there's no static version of those libs in distro supplied
  packages) to one
- Dropped ubuntu-18.04.3-x86_64-notools job definition, because it
  doesn't fall into the general scope of gating job described by PMM
  (and it did not run any test)
- Added w32 and w64 cross builds based on Fedora 30
- Added a FreeBSD based job that builds all targets and runs `make
  check`
- Added "-j`nproc`" and "-j`sysctl -n hw.ncpu`" options to make as a
  simple but effective way of speeding up the builds and tests by
  using a number of make jobs matching the number of CPUs
- Because the Ansible playbooks reference the content on Dockerfiles,
  some fixes to some Dockerfiles caught in the process were included
- New patch with script to check or wait on a pipeline execution

[1] - https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg00231.html
[2] - https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg00154.html

Cleber Rosa (4):
  Jobs based on custom runners: documentation and configuration
    placeholder

[PATCH v2 9/9] block: check availablity for preadv/pwritev on mac

2020-10-18 Thread Joelle van Dyne

From: osy 

macOS 11/iOS 14 added preadv/pwritev APIs. Due to weak linking, configure
will succeed with CONFIG_PREADV even when targeting a lower OS version. We
therefore need to check at run time if we can actually use these APIs.
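
The run-time check described above can be sketched in isolation as follows.
This is only an illustration and not the code in this patch: the names
`try_preadv`, `read_vectored` and `vectored_io_present` are hypothetical, and
the sketch reports ENOSYS through errno rather than returning -ENOSYS as the
patch does. It assumes clang on Darwin for `__builtin_available`.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose preadv()/pread() on glibc */
#endif
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical flag: cleared once the OS reports that preadv() is missing. */
static bool vectored_io_present = true;

/* Call preadv() only if the running OS actually provides it. */
static ssize_t try_preadv(int fd, const struct iovec *iov, int iovcnt,
                          off_t offset)
{
#if defined(__APPLE__)
    /* The symbol is weakly linked, so it must be probed at run time. */
    if (__builtin_available(macOS 11, iOS 14, watchOS 7, tvOS 14, *)) {
        return preadv(fd, iov, iovcnt, offset);
    }
    errno = ENOSYS;
    return -1;
#else
    return preadv(fd, iov, iovcnt, offset);
#endif
}

/* Vectored read that falls back to one pread() per segment. */
static ssize_t read_vectored(int fd, const struct iovec *iov, int iovcnt,
                             off_t offset)
{
    if (vectored_io_present) {
        ssize_t ret = try_preadv(fd, iov, iovcnt, offset);
        if (ret >= 0 || errno != ENOSYS) {
            return ret;
        }
        vectored_io_present = false;    /* do not probe again */
    }

    ssize_t total = 0;
    for (int i = 0; i < iovcnt; i++) {
        ssize_t n = pread(fd, iov[i].iov_base, iov[i].iov_len,
                          offset + total);
        if (n < 0) {
            return total > 0 ? total : -1;
        }
        total += n;
        if ((size_t)n < iov[i].iov_len) {
            break;                      /* short read, stop here */
        }
    }
    return total;
}
```

Remembering the result in a flag means that an older OS pays for the failed
probe only once.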

Signed-off-by: Joelle van Dyne 
---
 block/file-posix.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index cdc73b5f1d..d7482036a3 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1393,12 +1393,24 @@ static bool preadv_present = true;
 static ssize_t
 qemu_preadv(int fd, const struct iovec *iov, int nr_iov, off_t offset)
 {
+#ifdef CONFIG_DARWIN /* preadv introduced in macOS 11 */
+if (!__builtin_available(macOS 11, iOS 14, watchOS 7, tvOS 14, *)) {
+preadv_present = false;
+return -ENOSYS;
+} else
+#endif
 return preadv(fd, iov, nr_iov, offset);
 }
 
 static ssize_t
 qemu_pwritev(int fd, const struct iovec *iov, int nr_iov, off_t offset)
 {
+#ifdef CONFIG_DARWIN /* pwritev introduced in macOS 11 */
+if (!__builtin_available(macOS 11, iOS 14, watchOS 7, tvOS 14, *)) {
+preadv_present = false;
+return -ENOSYS;
+} else
+#endif
 return pwritev(fd, iov, nr_iov, offset);
 }
 
-- 
2.24.3 (Apple Git-128)

[PATCH v2 6/9] tcg: implement mirror mapped JIT for iOS

2020-10-18 Thread Joelle van Dyne

From: osy 

On iOS, we cannot allocate RWX pages without special entitlements. As a
workaround, we can allocate an RX region and then mirror map it to a separate
RW region. Then we can write to the RW region and execute from the RX one.

We also define `tcg_mirror_ptr_rw` and `tcg_code_ptr_rw` to return a pointer
to RW memory. The difference between the RW and RX regions is stored in the
TCG context.

To ensure cache coherency, we flush the data cache in the RW mapping and
then invalidate the instruction cache in the RX mapping (where applicable).
Because data cache flush is OS defined on some architectures, we provide
implementations only for the architectures iOS runs on (ARM/x86).
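
To make the pointer arithmetic concrete, here is a minimal sketch of a mirror
mapping using plain POSIX calls. It is an analogue only, and every name in it
is hypothetical: the patch itself does not map a temporary file, and `MirrorBuf`,
`mirror_alloc` and `mirror_ptr_rw` do not appear in QEMU. The read-only view
stands in for the RX region (PROT_EXEC is omitted so the sketch runs anywhere),
and the stored difference plays the role of the value kept in the TCG context.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose fileno()/ftruncate() on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct MirrorBuf {
    uint8_t *rx;        /* read-only view, stands in for the RX region */
    intptr_t rw_diff;   /* address of RW view minus address of RX view */
    size_t size;
} MirrorBuf;

/* Map the same backing pages twice: once read-only, once writable. */
static int mirror_alloc(MirrorBuf *m, size_t size)
{
    FILE *f = tmpfile();
    if (!f) {
        return -1;
    }
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)size) != 0) {
        fclose(f);
        return -1;
    }
    void *rx = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    void *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(f);          /* the mappings keep the pages alive */
    if (rx == MAP_FAILED || rw == MAP_FAILED) {
        if (rx != MAP_FAILED) {
            munmap(rx, size);
        }
        if (rw != MAP_FAILED) {
            munmap(rw, size);
        }
        return -1;
    }
    m->rx = rx;
    m->rw_diff = (intptr_t)((uintptr_t)rw - (uintptr_t)rx);
    m->size = size;
    return 0;
}

/* Translate a pointer into the read-only view to its writable mirror. */
static void *mirror_ptr_rw(const MirrorBuf *m, const void *rx_ptr)
{
    return (void *)((uintptr_t)rx_ptr + (uintptr_t)m->rw_diff);
}
```

A write through `mirror_ptr_rw(&m, m.rx + 16)` becomes visible at `m.rx[16]`.
A real JIT would additionally flush the data cache and invalidate the
instruction cache between the write and the execution.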

Signed-off-by: Joelle van Dyne 
---
 docs/devel/ios.rst   | 40 +++
 configure|  1 +
 include/exec/exec-all.h  |  8 
 include/tcg/tcg.h| 17 
 tcg/aarch64/tcg-target.h | 13 +-
 tcg/arm/tcg-target.h |  9 -
 tcg/i386/tcg-target.h| 24 ++-
 tcg/mips/tcg-target.h|  8 +++-
 tcg/ppc/tcg-target.h |  8 +++-
 tcg/riscv/tcg-target.h   |  9 -
 tcg/s390/tcg-target.h| 13 +-
 tcg/sparc/tcg-target.h   |  8 +++-
 tcg/tci/tcg-target.h |  9 -
 accel/tcg/cpu-exec.c |  7 +++-
 accel/tcg/translate-all.c| 77 ++--
 tcg/tcg.c| 56 +-
 tcg/aarch64/tcg-target.c.inc | 33 ++--
 tcg/arm/tcg-target.c.inc | 25 ++--
 tcg/i386/tcg-target.c.inc| 18 -
 tcg/mips/tcg-target.c.inc| 35 +---
 tcg/ppc/tcg-target.c.inc | 38 +++---
 tcg/riscv/tcg-target.c.inc   | 40 +++
 tcg/s390/tcg-target.c.inc| 16 
 tcg/sparc/tcg-target.c.inc   | 23 +++
 tcg/tcg-pool.c.inc   |  9 +++--
 tcg/tci/tcg-target.c.inc |  6 +--
 26 files changed, 416 insertions(+), 134 deletions(-)
 create mode 100644 docs/devel/ios.rst

diff --git a/docs/devel/ios.rst b/docs/devel/ios.rst
new file mode 100644
index 00..dba9fdd868
--- /dev/null
+++ b/docs/devel/ios.rst
@@ -0,0 +1,40 @@
+===
+iOS Support
+===
+
+To run qemu on the iOS platform, some modifications were required. Most of the
+modifications are conditioned on the ``CONFIG_IOS`` and ``CONFIG_IOS_JIT``
+configuration variables.
+
+Build support
+-
+
+For the code to compile, certain changes in the block driver and the slirp
+driver had to be made. There is no ``system()`` call, so code requiring it had
+to be disabled.
+
+``ucontext`` support is broken on iOS. The implementation from ``libucontext``
+is used instead.
+
+Because ``fork()`` is not allowed on iOS apps, the option to build qemu and the
+utilities as shared libraries is added. Note that because qemu does not perform
+resource cleanup in most cases (open files, allocated memory, etc), it is
+advisable that the user implements a proxy layer for syscalls so resources can
+be kept track by the app that uses qemu as a shared library.
+
+JIT support
+---
+
+On iOS, allocating RWX pages require special entitlements not usually granted to
+apps. However, it is possible to use `bulletproof JIT`_ with a development
+certificate. This means that we need to allocate one chunk of memory with RX
+permissions and then mirror map the same memory with RW permissions. We generate
+code to the mirror mapping and execute the original mapping.
+
+With ``CONFIG_IOS_JIT`` defined, we store inside the TCG context the difference
+between the two mappings. Then, we make sure that any writes to JIT memory is
+done to the pointer + the difference (in order to get a pointer to the mirror
+mapped space). Additionally, we make sure to flush the data cache before we
+invalidate the instruction cache so the changes are seen in both mappings.
+
+.. _bulletproof JIT: https://www.blackhat.com/docs/us-16/materials/us-16-Krstic.pdf
diff --git a/configure b/configure
index 0b7e25e7a5..93d6fd5ce2 100755
--- a/configure
+++ b/configure
@@ -6214,6 +6214,7 @@ fi
 
 if test "$ios" = "yes" ; then
   echo "CONFIG_IOS=y" >> $config_host_mak
+  echo "CONFIG_IOS_JIT=y" >> $config_host_mak
 fi
 
 if test "$solaris" = "yes" ; then
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 66f9b4cca6..2db155a772 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -483,6 +483,14 @@ struct TranslationBlock {
 uintptr_t jmp_list_head;
 uintptr_t jmp_list_next[2];
 uintptr_t jmp_dest[2];
+
+#if defined(CONFIG_IOS_JIT)
+/*
+ * Store difference to writable mirror
+ * We need this when patching the jump instructions
+ */
+ptrdiff_t code_rw_mirror_diff;
+#endif
 };
 
 extern bool parallel_cpus;
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 79c5ff8dab..ade01d2e41 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -627,6 +627,9 @@ struct TCGContext {
 size_t

[PATCH v2 5/9] tcg: add const hints for code pointers

2020-10-18 Thread Joelle van Dyne

From: osy 

We will introduce mirror mapping for the JIT segment with separate RX and RW
access. Adding 'const' hints will make it easier to identify read-only
accesses and will let us catch bugs more easily at compile time in the future.
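
As a small illustration of what the const hints buy, consider the sketch below.
The names `insn_unit`, `ptr_byte_diff` and `pcrel_diff` are hypothetical
stand-ins modelled on the helpers touched by this patch, not the QEMU
definitions themselves.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t insn_unit;     /* hypothetical stand-in for tcg_insn_unit */

/* Byte distance between two addresses; neither argument is written to. */
static inline ptrdiff_t ptr_byte_diff(const void *a, const void *b)
{
    return (const char *)a - (const char *)b;
}

/* Displacement from the current output position to a branch target. */
static inline ptrdiff_t pcrel_diff(const insn_unit *code_ptr,
                                   const insn_unit *target)
{
    return ptr_byte_diff(target, code_ptr);
}

/*
 * With 'target' declared const, a stray store such as
 *     *target = 0;
 * is rejected by the compiler instead of faulting on a read-only
 * mapping at run time.
 */
```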

Signed-off-by: Joelle van Dyne 
---
 include/tcg/tcg.h|  8 
 tcg/tcg.c|  4 ++--
 tcg/aarch64/tcg-target.c.inc | 19 +++
 tcg/arm/tcg-target.c.inc | 12 +++-
 tcg/i386/tcg-target.c.inc| 10 +-
 tcg/mips/tcg-target.c.inc| 33 +++--
 tcg/ppc/tcg-target.c.inc | 21 +
 tcg/riscv/tcg-target.c.inc   | 11 ++-
 tcg/s390/tcg-target.c.inc|  9 +
 tcg/sparc/tcg-target.c.inc   | 10 +-
 tcg/tcg-ldst.c.inc   |  2 +-
 tcg/tci/tcg-target.c.inc |  2 +-
 12 files changed, 79 insertions(+), 62 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 8804a8c4a2..79c5ff8dab 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -261,7 +261,7 @@ struct TCGLabel {
 unsigned refs : 16;
 union {
 uintptr_t value;
-tcg_insn_unit *value_ptr;
+const tcg_insn_unit *value_ptr;
 } u;
 QSIMPLEQ_HEAD(, TCGRelocation) relocs;
 QSIMPLEQ_ENTRY(TCGLabel) next;
@@ -593,7 +593,7 @@ struct TCGContext {
 int nb_ops;
 
 /* goto_tb support */
-tcg_insn_unit *code_buf;
+const tcg_insn_unit *code_buf;
 uint16_t *tb_jmp_reset_offset; /* tb->jmp_reset_offset */
 uintptr_t *tb_jmp_insn_offset; /* tb->jmp_target_arg if direct_jump */
 uintptr_t *tb_jmp_target_addr; /* tb->jmp_target_arg if !direct_jump */
@@ -1099,7 +1099,7 @@ static inline TCGLabel *arg_label(TCGArg i)
  * correct result.
  */
 
-static inline ptrdiff_t tcg_ptr_byte_diff(void *a, void *b)
+static inline ptrdiff_t tcg_ptr_byte_diff(const void *a, const void *b)
 {
 return a - b;
 }
@@ -1113,7 +1113,7 @@ static inline ptrdiff_t tcg_ptr_byte_diff(void *a, void *b)
  * to the destination address.
  */
 
-static inline ptrdiff_t tcg_pcrel_diff(TCGContext *s, void *target)
+static inline ptrdiff_t tcg_pcrel_diff(TCGContext *s, const void *target)
 {
 return tcg_ptr_byte_diff(target, s->code_ptr);
 }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index a8c28440e2..bb890c506d 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -148,7 +148,7 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
intptr_t arg2);
 static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
 TCGReg base, intptr_t ofs);
-static void tcg_out_call(TCGContext *s, tcg_insn_unit *target);
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *target);
 static int tcg_target_const_match(tcg_target_long val, TCGType type,
   const TCGArgConstraint *arg_ct);
 #ifdef TCG_TARGET_NEED_LDST_LABELS
@@ -295,7 +295,7 @@ static void tcg_out_reloc(TCGContext *s, tcg_insn_unit *code_ptr, int type,
 QSIMPLEQ_INSERT_TAIL(&l->relocs, r, next);
 }
 
-static void tcg_out_label(TCGContext *s, TCGLabel *l, tcg_insn_unit *ptr)
+static void tcg_out_label(TCGContext *s, TCGLabel *l, const tcg_insn_unit *ptr)
 {
 tcg_debug_assert(!l->has_value);
 l->has_value = 1;
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 26f71cb599..1aa5f37fc6 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -78,7 +78,8 @@ static const int tcg_target_call_oarg_regs[1] = {
 #define TCG_REG_GUEST_BASE TCG_REG_X28
 #endif
 
-static inline bool reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static inline bool reloc_pc26(tcg_insn_unit *code_ptr,
+  const tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - code_ptr;
 if (offset == sextract64(offset, 0, 26)) {
@@ -90,7 +91,8 @@ static inline bool reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 return false;
 }
 
-static inline bool reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static inline bool reloc_pc19(tcg_insn_unit *code_ptr,
+  const tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - code_ptr;
 if (offset == sextract64(offset, 0, 19)) {
@@ -1306,14 +1308,14 @@ static void tcg_out_cmp(TCGContext *s, TCGType ext, TCGReg a,
 }
 }
 
-static inline void tcg_out_goto(TCGContext *s, tcg_insn_unit *target)
+static inline void tcg_out_goto(TCGContext *s, const tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - s->code_ptr;
 tcg_debug_assert(offset == sextract64(offset, 0, 26));
 tcg_out_insn(s, 3206, B, offset);
 }
 
-static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target)
+static inline void tcg_out_goto_long(TCGContext *s, const tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - s->code_ptr;
 if (offset == sextract64(offset, 0, 26)) {
@@ -1329,7 +1331,7 @@ static inline void tcg_out_callr(TCGContext

[PATCH v2 8/9] tcg: support JIT on Apple Silicon

2020-10-18 Thread Joelle van Dyne

From: osy 

https://developer.apple.com/documentation/apple_silicon/porting_just-in-time_compilers_to_apple_silicon

For iOS versions before 14, functions reverse engineered from
libsystem_pthread.dylib are implemented to handle SoCs that support APRR.

The following rules apply for JIT write protect:
  * JIT write-protect is enabled before tcg_qemu_tb_exec()
  * JIT write-protect is disabled after tcg_qemu_tb_exec() returns
  * JIT write-protect is disabled inside do_tb_phys_invalidate() but if it
is called inside of tcg_qemu_tb_exec() then write-protect will be
enabled again before returning.
  * JIT write-protect is disabled by cpu_loop_exit() for interrupt handling.
  * JIT write-protect is disabled everywhere else.
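
The first two rules can be pictured with the following sketch. It is not the
patch's implementation: `jit_set_write_protect`, `run_translated` and
`probe_state` are hypothetical names, and the thread-local flag exists only so
the behaviour can be observed on hosts without the Apple API. The
`HAVE_PTHREAD_JIT_PROTECT` guard is assumed to reach C code as a preprocessor
define.

```c
#include <stdbool.h>
#ifdef HAVE_PTHREAD_JIT_PROTECT
#include <pthread.h>
#endif

/* Per-thread shadow of the current protection state (illustrative only). */
static _Thread_local bool jit_write_protected;

/* true: JIT pages are executable but not writable; false: the reverse. */
static void jit_set_write_protect(bool enable)
{
#ifdef HAVE_PTHREAD_JIT_PROTECT
    pthread_jit_write_protect_np(enable);
#endif
    jit_write_protected = enable;
}

/* Hypothetical signature for a piece of generated code. */
typedef int (*tb_fn)(void *opaque);

/* Enable write-protect only for the duration of the generated code. */
static int run_translated(tb_fn fn, void *opaque)
{
    jit_set_write_protect(true);
    int ret = fn(opaque);
    jit_set_write_protect(false);
    return ret;
}

/* Example callback: records the protection state seen while "executing". */
static int probe_state(void *opaque)
{
    *(bool *)opaque = jit_write_protected;
    return 0;
}
```

Because the toggle is per thread, each vCPU thread has to apply it around its
own execution loop; the remaining rules cover the paths that leave generated
code without returning normally.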

Signed-off-by: Joelle van Dyne 
---
 configure   | 20 +
 include/exec/exec-all.h |  2 +
 include/tcg/tcg-apple-jit.h | 85 +
 include/tcg/tcg.h   |  3 ++
 accel/tcg/cpu-exec-common.c |  2 +
 accel/tcg/cpu-exec.c|  2 +
 accel/tcg/translate-all.c   | 51 ++
 tcg/tcg.c   |  4 ++
 8 files changed, 169 insertions(+)
 create mode 100644 include/tcg/tcg-apple-jit.h

diff --git a/configure b/configure
index 93d6fd5ce2..2221c276f4 100755
--- a/configure
+++ b/configure
@@ -5868,6 +5868,22 @@ but not implemented on your system"
 fi
 fi
 
+##
+# check for Apple Silicon JIT function
+
+if [ "$darwin" = "yes" ] ; then
+  cat > $TMPC << EOF
+#include <pthread.h>
+int main() { pthread_jit_write_protect_np(0); return 0; }
+EOF
+  if ! compile_prog ""; then
+have_pthread_jit_protect='no'
+  else
+have_pthread_jit_protect='yes'
+  fi
+fi
+
+
 ##
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -6988,6 +7004,10 @@ if test "$secret_keyring" = "yes" ; then
   echo "CONFIG_SECRET_KEYRING=y" >> $config_host_mak
 fi
 
+if test "$have_pthread_jit_protect" = "yes" ; then
+  echo "HAVE_PTHREAD_JIT_PROTECT=y" >> $config_host_mak
+fi
+
 if test "$tcg_interpreter" = "yes"; then
   QEMU_INCLUDES="-iquote ${source_path}/tcg/tci $QEMU_INCLUDES"
 elif test "$ARCH" = "sparc64" ; then
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 2db155a772..253af30a2e 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -521,6 +521,8 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
target_ulong cs_base, uint32_t flags,
uint32_t cf_mask);
 void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr);
+void tb_exec_lock(void);
+void tb_exec_unlock(void);
 
 /* GETPC is the true target of the return instruction that we'll execute.  */
 #if defined(CONFIG_TCG_INTERPRETER)
diff --git a/include/tcg/tcg-apple-jit.h b/include/tcg/tcg-apple-jit.h
new file mode 100644
index 00..1e70bf3afe
--- /dev/null
+++ b/include/tcg/tcg-apple-jit.h
@@ -0,0 +1,85 @@
+/*
+ * Apple Silicon APRR functions for JIT handling
+ *
+ * Copyright (c) 2020 osy
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+/*
+ * Credits to: https://siguza.github.io/APRR/
+ * Reversed from /usr/lib/system/libsystem_pthread.dylib
+ */
+
+#ifndef TCG_APPLE_JIT_H
+#define TCG_APPLE_JIT_H
+
+#if defined(__aarch64__) && defined(CONFIG_DARWIN)
+
+#define _COMM_PAGE_START_ADDRESS        (0x000FC000ULL) /* In TTBR0 */
+#define _COMM_PAGE_APRR_SUPPORT         (_COMM_PAGE_START_ADDRESS + 0x10C)
+#define _COMM_PAGE_APPR_WRITE_ENABLE    (_COMM_PAGE_START_ADDRESS + 0x110)
+#define _COMM_PAGE_APRR_WRITE_DISABLE   (_COMM_PAGE_START_ADDRESS + 0x118)
+
+static __attribute__((__always_inline__)) bool jit_write_protect_supported(void)
+{
+/* Access shared kernel page at fixed memory location. */
+uint8_t aprr_support = *(volatile uint8_t *)_COMM_PAGE_APRR_SUPPORT;
+return aprr_support > 0;
+}
+
+/* write protect enable = write disable */
+static __attribute__((__always_inline__)) void jit_write_protect(int enabled)
+{
+/* Access shared kernel page at fixed memory location. */
+uint8_t aprr_support = *(volatile uint8_t *)_COMM_PAGE_APRR_SUPPORT;
+if (aprr_support == 0 || aprr_support > 3) {
+return;
+} else if (aprr_support == 1) {
+__asm__ __volatile__ (
+"mov x0, %0\n"
+

[PATCH v2 2/9] configure: cross-compiling without cross_prefix

2020-10-18 Thread Joelle van Dyne

From: osy 

The iOS toolchain does not use the host prefix naming convention. We add a
new option `--enable-cross-compile` that forces cross-compile even without
a cross_prefix.

Signed-off-by: Joelle van Dyne 
---
 configure | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 3c63879750..46d5db63e8 100755
--- a/configure
+++ b/configure
@@ -234,6 +234,7 @@ cpu=""
 iasl="iasl"
 interp_prefix="/usr/gnemul/qemu-%M"
 static="no"
+cross_compile="no"
 cross_prefix=""
 audio_drv_list=""
 block_drv_rw_whitelist=""
@@ -456,6 +457,11 @@ for opt do
   optarg=$(expr "x$opt" : 'x[^=]*=\(.*\)')
   case "$opt" in
   --cross-prefix=*) cross_prefix="$optarg"
+cross_compile="yes"
+  ;;
+  --enable-cross-compile) cross_compile="yes"
+  ;;
+  --disable-cross-compile) cross_compile="no"
   ;;
   --cc=*) CC="$optarg"
   ;;
@@ -878,6 +884,10 @@ for opt do
   ;;
   --cross-prefix=*)
   ;;
+  --enable-cross-compile)
+  ;;
+  --disable-cross-compile)
+  ;;
   --cc=*)
   ;;
   --host-cc=*) host_cc="$optarg"
@@ -1687,6 +1697,7 @@ Advanced options (experts only):
   --efi-aarch64=PATH   PATH of efi file to use for aarch64 VMs.
   --with-suffix=SUFFIX suffix for QEMU data inside datadir/libdir/sysconfdir/docdir [$qemu_suffix]
   --with-pkgversion=VERS   use specified string as sub-version of the package
+  --enable-cross-compile   enable cross compiling (set automatically if $cross_prefix is set)
   --enable-debug   enable common debug build options
   --enable-sanitizers  enable default sanitizers
   --enable-tsanenable thread sanitizer
@@ -7164,7 +7175,7 @@ if has $sdl2_config; then
 fi
 echo "strip = [$(meson_quote $strip)]" >> $cross
 echo "windres = [$(meson_quote $windres)]" >> $cross
-if test -n "$cross_prefix"; then
+if test "$cross_compile" = "yes"; then
 cross_arg="--cross-file config-meson.cross"
 echo "[host_machine]" >> $cross
 if test "$mingw32" = "yes" ; then
-- 
2.24.3 (Apple Git-128)

[PATCH v2 7/9] tcg: mirror mapping RWX pages for iOS optional

2020-10-18 Thread Joelle van Dyne

From: osy 

This allows jailbroken devices with entitlements to switch the option off.

Signed-off-by: Joelle van Dyne 
---
 include/sysemu/tcg.h  |  2 +-
 accel/tcg/tcg-all.c   | 27 +-
 accel/tcg/translate-all.c | 60 +--
 bsd-user/main.c   |  2 +-
 linux-user/main.c |  2 +-
 qemu-options.hx   | 11 +++
 6 files changed, 79 insertions(+), 25 deletions(-)

diff --git a/include/sysemu/tcg.h b/include/sysemu/tcg.h
index d9d3ca8559..569f90b11d 100644
--- a/include/sysemu/tcg.h
+++ b/include/sysemu/tcg.h
@@ -8,7 +8,7 @@
 #ifndef SYSEMU_TCG_H
 #define SYSEMU_TCG_H
 
-void tcg_exec_init(unsigned long tb_size);
+void tcg_exec_init(unsigned long tb_size, bool mirror_rwx);
 #ifdef CONFIG_TCG
 extern bool tcg_allowed;
 #define tcg_enabled() (tcg_allowed)
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index fa1208158f..5845744396 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -39,6 +39,7 @@ struct TCGState {
 
 bool mttcg_enabled;
 unsigned long tb_size;
+bool mirror_rwx;
 };
 typedef struct TCGState TCGState;
 
@@ -94,6 +95,7 @@ static void tcg_accel_instance_init(Object *obj)
 TCGState *s = TCG_STATE(obj);
 
 s->mttcg_enabled = default_mttcg_enabled();
+s->mirror_rwx = false;
 }
 
 bool mttcg_enabled;
@@ -102,7 +104,7 @@ static int tcg_init(MachineState *ms)
 {
 TCGState *s = TCG_STATE(current_accel());
 
-tcg_exec_init(s->tb_size * 1024 * 1024);
+tcg_exec_init(s->tb_size * 1024 * 1024, s->mirror_rwx);
 mttcg_enabled = s->mttcg_enabled;
 cpus_register_accel(&tcg_cpus);
 
@@ -168,6 +170,22 @@ static void tcg_set_tb_size(Object *obj, Visitor *v,
 s->tb_size = value;
 }
 
+#ifdef CONFIG_IOS_JIT
+static bool tcg_get_mirror_rwx(Object *obj, Error **errp)
+{
+TCGState *s = TCG_STATE(obj);
+
+return s->mirror_rwx;
+}
+
+static void tcg_set_mirror_rwx(Object *obj, bool value, Error **errp)
+{
+TCGState *s = TCG_STATE(obj);
+
+s->mirror_rwx = value;
+}
+#endif
+
 static void tcg_accel_class_init(ObjectClass *oc, void *data)
 {
 AccelClass *ac = ACCEL_CLASS(oc);
@@ -185,6 +203,13 @@ static void tcg_accel_class_init(ObjectClass *oc, void *data)
 object_class_property_set_description(oc, "tb-size",
 "TCG translation block cache size");
 
+#ifdef CONFIG_IOS_JIT
+object_class_property_add_bool(oc, "mirror-rwx",
+tcg_get_mirror_rwx, tcg_set_mirror_rwx);
+object_class_property_set_description(oc, "mirror-rwx",
+"mirror map executable pages for TCG on iOS");
+#endif
+
 }
 
 static const TypeInfo tcg_accel_type = {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index eb1d8fbe2f..1675951b75 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1042,12 +1042,15 @@ static inline void *split_cross_256mb(void *buf1, size_t size1)
 static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
 __attribute__((aligned(CODE_GEN_ALIGN)));
 
-static inline void *alloc_code_gen_buffer(void)
+static inline void *alloc_code_gen_buffer(bool no_rwx_pages)
 {
 void *buf = static_code_gen_buffer;
 void *end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
 size_t size;
 
+/* not applicable */
+assert(!no_rwx_pages);
+
 /* page-align the beginning and end of the buffer */
 buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
 end = QEMU_ALIGN_PTR_DOWN(end, qemu_real_host_page_size);
@@ -1076,24 +1079,32 @@ static inline void *alloc_code_gen_buffer(void)
 return buf;
 }
 #elif defined(_WIN32)
-static inline void *alloc_code_gen_buffer(void)
+static inline void *alloc_code_gen_buffer(bool no_rwx_pages)
 {
 size_t size = tcg_ctx->code_gen_buffer_size;
+assert(!no_rwx_pages); /* not applicable */
 return VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
 PAGE_EXECUTE_READWRITE);
 }
 #else
-static inline void *alloc_code_gen_buffer(void)
+static inline void *alloc_code_gen_buffer(bool no_rwx_pages)
 {
-#if defined(CONFIG_IOS_JIT)
 int prot = PROT_READ | PROT_EXEC;
-#else
-int prot = PROT_WRITE | PROT_READ | PROT_EXEC;
-#endif
 int flags = MAP_PRIVATE | MAP_ANONYMOUS;
 size_t size = tcg_ctx->code_gen_buffer_size;
 void *buf;
 
+#if defined(CONFIG_DARWIN) /* both iOS and macOS (Apple Silicon) applicable */
+if (!no_rwx_pages) {
+prot |= PROT_WRITE;
+flags |= MAP_JIT;
+}
+#else
+/* not applicable */
+assert(!no_rwx_pages);
+prot |= PROT_WRITE;
+#endif
+
 buf = mmap(NULL, size, prot, flags, -1, 0);
 if (buf == MAP_FAILED) {
 return NULL;
@@ -1173,10 +1184,10 @@ static inline void *alloc_jit_rw_mirror(void *base, size_t size)
 }
 #endif /* CONFIG_IOS_JIT */
 
-static inline void code_gen_alloc(size_t tb_size)
+static inline void code_gen_alloc(size_t tb_size, bool mirror_rwx)
 {
 tcg_ctx->code_gen_buffer_size = size_code_gen_buffer(tb_size);
-

[PATCH v2 3/9] qemu: add support for iOS host

2020-10-18 Thread Joelle van Dyne

From: osy 

This introduces support for building for iOS hosts. When the correct Xcode
toolchain is used, iOS host will be detected automatically.

block: disable features not supported by iOS sandbox
slirp: disable SMB features for iOS
target: disable system() calls for iOS
tcg: use sys_icache_invalidate() instead of GCC builtin for iOS
tests: disable tests on iOS which use system()
Signed-off-by: Joelle van Dyne 
---
 configure | 43 ++-
 meson.build   |  2 +-
 tcg/aarch64/tcg-target.h  | 10 +
 block.c   |  2 +-
 block/file-posix.c| 30 ---
 net/slirp.c   | 16 +++
 qga/commands-posix.c  |  6 ++
 target/arm/arm-semi.c |  2 ++
 target/m68k/m68k-semi.c   |  2 ++
 target/nios2/nios2-semi.c |  2 ++
 tests/qtest/meson.build   |  7 +++
 11 files changed, 95 insertions(+), 27 deletions(-)

diff --git a/configure b/configure
index 46d5db63e8..c474d7c221 100755
--- a/configure
+++ b/configure
@@ -561,6 +561,19 @@ EOF
   compile_object
 }
 
+check_ios() {
+  cat > $TMPC < $TMPC <
@@ -603,7 +616,11 @@ elif check_define __DragonFly__ ; then
 elif check_define __NetBSD__; then
   targetos='NetBSD'
 elif check_define __APPLE__; then
-  targetos='Darwin'
+  if check_ios ; then
+targetos='iOS'
+  else
+targetos='Darwin'
+  fi
 else
   # This is a fatal error, but don't report it yet, because we
   # might be going to just print the --help text, or it might
@@ -780,6 +797,22 @@ Darwin)
   # won't work when we're compiling with gcc as a C compiler.
   QEMU_CFLAGS="-DOS_OBJECT_USE_OBJC=0 $QEMU_CFLAGS"
 ;;
+iOS)
+  bsd="yes"
+  darwin="yes"
+  ios="yes"
+  if [ "$cpu" = "x86_64" ] ; then
+QEMU_CFLAGS="-arch x86_64 $QEMU_CFLAGS"
+QEMU_LDFLAGS="-arch x86_64 $QEMU_LDFLAGS"
+  fi
+  host_block_device_support="no"
+  audio_drv_list=""
+  audio_possible_drivers=""
+  QEMU_LDFLAGS="-framework CoreFoundation $QEMU_LDFLAGS"
+  # Disable attempts to use ObjectiveC features in os/object.h since they
+  # won't work when we're compiling with gcc as a C compiler.
+  QEMU_CFLAGS="-DOS_OBJECT_USE_OBJC=0 $QEMU_CFLAGS"
+;;
 SunOS)
   solaris="yes"
   make="${MAKE-gmake}"
@@ -6162,6 +6195,10 @@ if test "$darwin" = "yes" ; then
   echo "CONFIG_DARWIN=y" >> $config_host_mak
 fi
 
+if test "$ios" = "yes" ; then
+  echo "CONFIG_IOS=y" >> $config_host_mak
+fi
+
 if test "$solaris" = "yes" ; then
   echo "CONFIG_SOLARIS=y" >> $config_host_mak
 fi
@@ -7166,6 +7203,7 @@ echo "cpp_link_args = [${LDFLAGS:+$(meson_quote $LDFLAGS)}]" >> $cross
 echo "[binaries]" >> $cross
 echo "c = [$(meson_quote $cc)]" >> $cross
 test -n "$cxx" && echo "cpp = [$(meson_quote $cxx)]" >> $cross
+test -n "$objcc" && echo "objc = [$(meson_quote $objcc)]" >> $cross
 echo "ar = [$(meson_quote $ar)]" >> $cross
 echo "nm = [$(meson_quote $nm)]" >> $cross
 echo "pkgconfig = [$(meson_quote $pkg_config_exe)]" >> $cross
@@ -7184,6 +7222,9 @@ if test "$cross_compile" = "yes"; then
 if test "$linux" = "yes" ; then
 echo "system = 'linux'" >> $cross
 fi
+if test "$darwin" = "yes" ; then
+echo "system = 'darwin'" >> $cross
+fi
 case "$ARCH" in
 i386|x86_64)
 echo "cpu_family = 'x86'" >> $cross
diff --git a/meson.build b/meson.build
index 5d3a47784b..69a3c00cce 100644
--- a/meson.build
+++ b/meson.build
@@ -140,7 +140,7 @@ if targetos == 'windows'
   include_directories: 
include_directories('.'))
 elif targetos == 'darwin'
   coref = dependency('appleframeworks', modules: 'CoreFoundation')
-  iokit = dependency('appleframeworks', modules: 'IOKit')
+  iokit = dependency('appleframeworks', modules: 'IOKit', required: 
'CONFIG_IOS' not in config_host)
   cocoa = dependency('appleframeworks', modules: 'Cocoa', required: 
get_option('cocoa'))
 elif targetos == 'sunos'
   socket = [cc.find_library('socket'),
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 663dd0b95e..a2b22b4305 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -148,9 +148,19 @@ typedef enum {
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP 1
 
+#if defined(__APPLE__)
+void sys_icache_invalidate(void *start, size_t len);
+#endif
+
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
+#if defined(__APPLE__)
+sys_icache_invalidate((char *)start, stop - start);
+#elif defined(__GNUC__)
 __builtin___clear_cache((char *)start, (char *)stop);
+#else
+#error "Missing builtin to flush instruction cache"
+#endif
 }
 
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
diff --git a/block.c b/block.c
index 430edf79bb..5d49869d02 100644
--- a/block.c
+++ b/block.c
@@ -53,7 +53,7 @@
 #ifdef CONFIG_BSD
 #include 
 #include 
-#ifndef __DragonFly__
+#if !defined(__DragonFly__) && !defined(CONFIG_IOS)
 #include 
 #endif
 #endif
diff --git a/block/file-posix.c

[PATCH v2 4/9] coroutine: add libucontext as external library

2020-10-18 Thread Joelle van Dyne

From: osy 

iOS does not support ucontext natively on aarch64, and sigaltstack is
also unsupported (even worse, it fails silently; see
https://openradar.appspot.com/13002712 ).

As a workaround, we include a library implementation of ucontext and add it
as a build option.
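
For readers unfamiliar with the API being substituted, here is a minimal
sketch of the POSIX ucontext calls (getcontext, makecontext, swapcontext)
that a ucontext-style coroutine backend builds on. It is not the code in
util/coroutine-ucontext.c; the names co_entry and run_coroutine_demo are
made up for illustration, and it assumes a host libc that ships <ucontext.h>.

```c
#define _XOPEN_SOURCE 600   /* some hosts require this for <ucontext.h> */
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx;          /* context of the caller */
static ucontext_t co_ctx;            /* context of the coroutine */
static char co_stack[64 * 1024];     /* private stack for the coroutine */
static int progress;                 /* how far the coroutine has run */

/* Hypothetical coroutine body: run, yield once, run to completion. */
static void co_entry(void)
{
    progress = 1;
    swapcontext(&co_ctx, &main_ctx); /* yield back to the caller */
    progress = 2;
    /* Returning here resumes the context stored in uc_link. */
}

/* Returns the final value of progress, or -1 on failure. */
static int run_coroutine_demo(void)
{
    int seen_at_yield;

    progress = 0;
    if (getcontext(&co_ctx) != 0) {
        return -1;
    }
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &main_ctx;      /* where to go when co_entry returns */
    makecontext(&co_ctx, co_entry, 0);

    if (swapcontext(&main_ctx, &co_ctx) != 0) {   /* enter the coroutine */
        return -1;
    }
    seen_at_yield = progress;
    if (swapcontext(&main_ctx, &co_ctx) != 0) {   /* resume after the yield */
        return -1;
    }
    assert(seen_at_yield == 1);
    return progress;
}
```

A library such as libucontext supplies these same entry points where the
host libc does not, which is why the backend code itself needs few changes.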

Signed-off-by: Joelle van Dyne 
---
 configure | 23 ---
 meson.build   | 29 -
 util/coroutine-ucontext.c |  9 +
 .gitmodules   |  3 +++
 libucontext   |  1 +
 meson_options.txt |  2 ++
 6 files changed, 63 insertions(+), 4 deletions(-)
 create mode 160000 libucontext

diff --git a/configure b/configure
index c474d7c221..0b7e25e7a5 100755
--- a/configure
+++ b/configure
@@ -1756,7 +1756,7 @@ Advanced options (experts only):
   --oss-libpath to OSS library
   --cpu=CPUBuild for host CPU [$cpu]
   --with-coroutine=BACKEND coroutine backend. Supported options:
-   ucontext, sigaltstack, windows
+   ucontext, libucontext, sigaltstack, windows
   --enable-gcovenable test coverage analysis with gcov
   --disable-blobs  disable installing provided firmware blobs
   --with-vss-sdk=SDK-path  enable Windows VSS support in QEMU Guest Agent
@@ -5058,6 +5058,8 @@ if test "$coroutine" = ""; then
 coroutine=win32
   elif test "$ucontext_works" = "yes"; then
 coroutine=ucontext
+  elif test "$ios" = "yes"; then
+coroutine=libucontext
   else
 coroutine=sigaltstack
   fi
@@ -5081,12 +5083,27 @@ else
   error_exit "only the 'windows' coroutine backend is valid for Windows"
 fi
 ;;
+  libucontext)
+  ;;
   *)
 error_exit "unknown coroutine backend $coroutine"
 ;;
   esac
 fi
 
+case $coroutine in
+libucontext)
+  git_submodules="${git_submodules} libucontext"
+  mkdir -p libucontext
+  coroutine_impl=ucontext
+  libucontext="enabled"
+  ;;
+*)
+  coroutine_impl=$coroutine
+  libucontext="disabled"
+  ;;
+esac
+
 if test "$coroutine_pool" = ""; then
   coroutine_pool=yes
 fi
@@ -6676,7 +6693,7 @@ if test "$rbd" = "yes" ; then
   echo "RBD_LIBS=$rbd_libs" >> $config_host_mak
 fi
 
-echo "CONFIG_COROUTINE_BACKEND=$coroutine" >> $config_host_mak
+echo "CONFIG_COROUTINE_BACKEND=$coroutine_impl" >> $config_host_mak
 if test "$coroutine_pool" = "yes" ; then
   echo "CONFIG_COROUTINE_POOL=1" >> $config_host_mak
 else
@@ -7273,7 +7290,7 @@ NINJA=${ninja:-$PWD/ninjatool} $meson setup \
-Dcocoa=$cocoa -Dmpath=$mpath -Dsdl=$sdl -Dsdl_image=$sdl_image \
-Dvnc=$vnc -Dvnc_sasl=$vnc_sasl -Dvnc_jpeg=$vnc_jpeg -Dvnc_png=$vnc_png 
\
-Dgettext=$gettext -Dxkbcommon=$xkbcommon -Du2f=$u2f \
-   -Dcapstone=$capstone -Dslirp=$slirp -Dfdt=$fdt \
+   -Dcapstone=$capstone -Dslirp=$slirp -Dfdt=$fdt -Ducontext=$libucontext \
 $cross_arg \
 "$PWD" "$source_path"
 
diff --git a/meson.build b/meson.build
index 69a3c00cce..e3ff35f46b 100644
--- a/meson.build
+++ b/meson.build
@@ -1099,9 +1099,35 @@ if not fdt.found() and fdt_required.length() > 0
   error('fdt not available but required by targets ' + ', '.join(fdt_required))
 endif
 
+ucontext = not_found
+slirp_opt = 'disabled'
+if get_option('ucontext').enabled()
+  if not fs.is_dir(meson.current_source_dir() / 'libucontext/arch' / cpu)
+error('libucontext is wanted but not implemented for host ' + cpu)
+  endif
+  arch = host_machine.cpu()
+  ucontext_cargs = ['-DG_LOG_DOMAIN="ucontext"', '-DCUSTOM_IMPL']
+  ucontext_files = [
+'libucontext/arch' / arch / 'getcontext.S',
+'libucontext/arch' / arch / 'setcontext.S',
+'libucontext/arch' / arch / 'makecontext.c',
+'libucontext/arch' / arch / 'startcontext.S',
+'libucontext/arch' / arch / 'swapcontext.S',
+  ]
+
+  ucontext_inc = include_directories('libucontext/include')
+  libucontext = static_library('ucontext',
+   sources: ucontext_files,
+   c_args: ucontext_cargs,
+   include_directories: ucontext_inc)
+  ucontext = declare_dependency(link_with: libucontext,
+include_directories: ucontext_inc)
+endif
+
 config_host_data.set('CONFIG_CAPSTONE', capstone.found())
 config_host_data.set('CONFIG_FDT', fdt.found())
 config_host_data.set('CONFIG_SLIRP', slirp.found())
+config_host_data.set('CONFIG_LIBUCONTEXT', ucontext.found())
 
 genh += configure_file(output: 'config-host.h', configuration: 
config_host_data)
 
@@ -1321,7 +1347,7 @@ util_ss.add_all(trace_ss)
 util_ss = util_ss.apply(config_all, strict: false)
 libqemuutil = static_library('qemuutil',
  sources: util_ss.sources() + stub_ss.sources() + 
genh,
- dependencies: [util_ss.dependencies(), m, glib, 
socket, malloc])
+ dependencies: [util_ss.dependencies(), m, glib, 
socket, malloc, ucontext])
 qemuutil =

[PATCH v2 0/9] iOS and Apple Silicon host support

2020-10-18 Thread Joelle van Dyne

This set of changes brings QEMU TCG to iOS devices and future Apple Silicon
devices. They were originally developed last year and have been working in the
UTM app. Recently, we ported the changes to master, re-wrote a lot of the build
script changes for meson, and broke up the patches into more distinct units.

A summary of the changes:

* `CONFIG_IOS` and `CONFIG_IOS_JIT` are defined when building for iOS, and
  iOS-specific changes (as well as unsupported code) are gated behind them.
* A new dependency, libucontext, is added since iOS does not have native
  ucontext and has broken support for sigaltstack. libucontext is available as
  a new option for the coroutine backend.
* On stock iOS devices, there is a workaround for running JIT code without
  any special entitlement. It requires the JIT region to be mirror mapped with
  one region RW and another one RX. To support this style of JIT, TCG is changed
  to support writing to a different code_ptr. These changes are gated by
  `CONFIG_IOS_JIT`.
* For (recent) jailbroken iOS devices as well as upcoming Apple Silicon devices,
  there are new rules for applications supporting JIT (with the proper
  entitlement). These rules are implemented as well.
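
To make the mirror-mapping idea concrete, here is a rough sketch under
stated assumptions. It is not the series' implementation: it uses a
temporary file and two mmap() views on a generic POSIX host, the second view
is mapped read-only rather than executable so the sketch runs in restricted
environments, and MirrorRegion, mirror_region_init and mirror_rx_ptr are
hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    uint8_t *rw;        /* view the code generator writes through */
    const uint8_t *rx;  /* view that would be executed */
    intptr_t diff;      /* address of rx minus address of rw */
    size_t size;
} MirrorRegion;

/* Map the same backing pages twice. Returns 0 on success, -1 on failure. */
static int mirror_region_init(MirrorRegion *m, size_t size)
{
    FILE *backing = tmpfile();
    void *rw, *rx;

    if (backing == NULL) {
        return -1;
    }
    if (ftruncate(fileno(backing), (off_t)size) != 0) {
        fclose(backing);
        return -1;
    }
    rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
              fileno(backing), 0);
    /* A real JIT would request PROT_READ | PROT_EXEC for this view. */
    rx = mmap(NULL, size, PROT_READ, MAP_SHARED, fileno(backing), 0);
    fclose(backing);    /* the mappings keep the pages alive */

    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED) {
            munmap(rw, size);
        }
        if (rx != MAP_FAILED) {
            munmap(rx, size);
        }
        return -1;
    }
    m->rw = rw;
    m->rx = rx;
    m->diff = (intptr_t)rx - (intptr_t)rw;
    m->size = size;
    return 0;
}

/* Translate a pointer in the writable view to its executable mirror. */
static const uint8_t *mirror_rx_ptr(const MirrorRegion *m,
                                    const uint8_t *rw_ptr)
{
    assert(rw_ptr >= m->rw && rw_ptr < m->rw + m->size);
    return (const uint8_t *)((intptr_t)rw_ptr + m->diff);
}
```

Bytes stored through rw become visible at the matching offset in rx, so a
code generator only has to remember the constant difference between the two
views.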

Since v2:

* Changed getting mirror pointer from a macro to inline functions
* Split constification of TCG code pointers to separate patch
* Removed slirp updates (will send future patch once slirp changes are in)
* Removed shared library patch (will send future patch)

-j

osy (9):
  configure: option to disable host block devices
  configure: cross-compiling without cross_prefix
  qemu: add support for iOS host
  coroutine: add libucontext as external library
  tcg: add const hints for code pointers
  tcg: implement mirror mapped JIT for iOS
  tcg: mirror mapping RWX pages for iOS optional
  tcg: support JIT on Apple Silicon
  block: check availablity for preadv/pwritev on mac

 docs/devel/ios.rst   |  40 +
 configure| 104 --
 meson.build  |  32 ++-
 include/exec/exec-all.h  |  10 +++
 include/sysemu/tcg.h |   2 +-
 include/tcg/tcg-apple-jit.h  |  85 ++
 include/tcg/tcg.h|  28 +-
 tcg/aarch64/tcg-target.h |  23 -
 tcg/arm/tcg-target.h |   9 +-
 tcg/i386/tcg-target.h|  24 -
 tcg/mips/tcg-target.h|   8 +-
 tcg/ppc/tcg-target.h |   8 +-
 tcg/riscv/tcg-target.h   |   9 +-
 tcg/s390/tcg-target.h|  13 ++-
 tcg/sparc/tcg-target.h   |   8 +-
 tcg/tci/tcg-target.h |   9 +-
 accel/tcg/cpu-exec-common.c  |   2 +
 accel/tcg/cpu-exec.c |   9 +-
 accel/tcg/tcg-all.c  |  27 +-
 accel/tcg/translate-all.c| 168 ---
 block.c  |   2 +-
 block/file-posix.c   |  50 ---
 bsd-user/main.c  |   2 +-
 linux-user/main.c|   2 +-
 net/slirp.c  |  16 ++--
 qga/commands-posix.c |   6 ++
 target/arm/arm-semi.c|   2 +
 target/m68k/m68k-semi.c  |   2 +
 target/nios2/nios2-semi.c|   2 +
 tcg/tcg.c|  64 -
 util/coroutine-ucontext.c|   9 ++
 .gitmodules  |   3 +
 libucontext  |   1 +
 meson_options.txt|   2 +
 qemu-options.hx  |  11 +++
 tcg/aarch64/tcg-target.c.inc |  48 ++
 tcg/arm/tcg-target.c.inc |  33 ---
 tcg/i386/tcg-target.c.inc|  28 +++---
 tcg/mips/tcg-target.c.inc|  64 +++--
 tcg/ppc/tcg-target.c.inc |  55 +++-
 tcg/riscv/tcg-target.c.inc   |  51 ++-
 tcg/s390/tcg-target.c.inc|  25 +++---
 tcg/sparc/tcg-target.c.inc   |  33 ---
 tcg/tcg-ldst.c.inc   |   2 +-
 tcg/tcg-pool.c.inc   |   9 +-
 tcg/tci/tcg-target.c.inc |   8 +-
 tests/qtest/meson.build  |   7 +-
 47 files changed, 919 insertions(+), 236 deletions(-)
 create mode 100644 docs/devel/ios.rst
 create mode 100644 include/tcg/tcg-apple-jit.h
 create mode 160000 libucontext

-- 
2.24.3 (Apple Git-128)

[PATCH v2 1/9] configure: option to disable host block devices

2020-10-18 Thread Joelle van Dyne

From: osy 

Some hosts (iOS) have a sandboxed filesystem and do not provide low-level
APIs for interfacing with host block devices.
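
As an illustration of the gating pattern this patch applies, here is a small
self-contained sketch. The driver names and the helpers register_driver,
drivers_init and driver_is_registered are hypothetical stand-ins, not QEMU's
block layer API; only the CONFIG_HOST_BLOCK_DEVICE macro name comes from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DRIVERS 8

static const char *registered[MAX_DRIVERS];
static size_t registered_count;

/* Hypothetical stand-in for a driver registration call. */
static void register_driver(const char *name)
{
    assert(registered_count < MAX_DRIVERS);
    registered[registered_count++] = name;
}

/* Drivers that need host block device APIs are compiled in conditionally. */
static void drivers_init(void)
{
    registered_count = 0;
    register_driver("file");
#if defined(CONFIG_HOST_BLOCK_DEVICE)
    register_driver("host_device");
#endif
}

static int driver_is_registered(const char *name)
{
    size_t i;

    for (i = 0; i < registered_count; i++) {
        if (strcmp(registered[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Built without -DCONFIG_HOST_BLOCK_DEVICE, only the plain file driver is
registered, and no host-specific headers or symbols are referenced.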

Signed-off-by: Joelle van Dyne 
---
 configure  | 4 
 meson.build| 1 +
 block/file-posix.c | 8 +++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index b553288c5e..3c63879750 100755
--- a/configure
+++ b/configure
@@ -446,6 +446,7 @@ meson=""
 ninja=""
 skip_meson=no
 gettext=""
+host_block_device_support="yes"
 
 bogus_os="no"
 malloc_trim="auto"
@@ -6098,6 +6099,9 @@ if test "$default_devices" = "yes" ; then
 else
   echo "CONFIG_MINIKCONF_MODE=--allnoconfig" >> $config_host_mak
 fi
+if test "$host_block_device_support" = "yes" ; then
+  echo "CONFIG_HOST_BLOCK_DEVICE=y" >> $config_host_mak
+fi
 if test "$debug_tcg" = "yes" ; then
   echo "CONFIG_DEBUG_TCG=y" >> $config_host_mak
 fi
diff --git a/meson.build b/meson.build
index 17c89c87c6..5d3a47784b 100644
--- a/meson.build
+++ b/meson.build
@@ -1947,6 +1947,7 @@ summary_info += {'vvfat support': 
config_host.has_key('CONFIG_VVFAT')}
 summary_info += {'qed support':   config_host.has_key('CONFIG_QED')}
 summary_info += {'parallels support': config_host.has_key('CONFIG_PARALLELS')}
 summary_info += {'sheepdog support':  config_host.has_key('CONFIG_SHEEPDOG')}
+summary_info += {'host block dev support': 
config_host.has_key('CONFIG_HOST_BLOCK_DEVICE')}
 summary_info += {'capstone':  capstone_opt == 'disabled' ? false : 
capstone_opt}
 summary_info += {'libpmem support':   config_host.has_key('CONFIG_LIBPMEM')}
 summary_info += {'libdaxctl support': config_host.has_key('CONFIG_LIBDAXCTL')}
diff --git a/block/file-posix.c b/block/file-posix.c
index c63926d592..52f7c20525 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -41,7 +41,7 @@
 #include "scsi/pr-manager.h"
 #include "scsi/constants.h"
 
-#if defined(__APPLE__) && (__MACH__)
+#if defined(CONFIG_HOST_BLOCK_DEVICE) && defined(__APPLE__) && (__MACH__)
 #include 
 #include 
 #include 
@@ -3247,6 +3247,8 @@ BlockDriver bdrv_file = {
 /***/
 /* host device */
 
+#if defined(CONFIG_HOST_BLOCK_DEVICE)
+
 #if defined(__APPLE__) && defined(__MACH__)
 static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
 CFIndex maxPathSize, int flags);
@@ -3872,6 +3874,8 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
+#endif /* CONFIG_HOST_BLOCK_DEVICE */
+
 static void bdrv_file_init(void)
 {
 /*
@@ -3879,6 +3883,7 @@ static void bdrv_file_init(void)
  * registered last will get probed first.
  */
 bdrv_register(&bdrv_file);
+#if defined(CONFIG_HOST_BLOCK_DEVICE)
 bdrv_register(&bdrv_host_device);
 #ifdef __linux__
 bdrv_register(&bdrv_host_cdrom);
@@ -3886,6 +3891,7 @@ static void bdrv_file_init(void)
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
 bdrv_register(&bdrv_host_cdrom);
 #endif
+#endif /* CONFIG_HOST_BLOCK_DEVICE */
 }
 
 block_init(bdrv_file_init);
-- 
2.24.3 (Apple Git-128)

[PATCH] softfpu: Generalize pick_nan_muladd to opaque structures

2020-10-18 Thread Richard Henderson

This will allow us to share code between FloatParts and FloatParts128.

Signed-off-by: Richard Henderson 
---
Cc: Alex Bennee 

What do you think of this instead of inlining pick_nan_muladd
into the two muladd implementations?
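
To show the shape of the idea in isolation, here is a tiny sketch of a
selector that works on opaque pointers, so the same routine can serve two
differently sized structures. Parts64, Parts128 and pick_operand are
illustrative stand-ins and not the softfloat types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for two operand layouts of different width. */
typedef struct { int cls; uint64_t frac; } Parts64;
typedef struct { int cls; uint64_t frac_hi, frac_lo; } Parts128;

/*
 * Return the operand selected by 'which' (0, 1 or 2), or NULL for 3,
 * meaning the caller should produce its own default value.
 */
static void *pick_operand(int which, void *a, void *b, void *c)
{
    switch (which) {
    case 0:
        return a;
    case 1:
        return b;
    case 2:
        return c;
    case 3:
        return NULL;
    default:
        assert(!"unreachable");
        return NULL;
    }
}
```

The caller keeps the type-specific work (dereferencing, silencing, building
the default value), so the selector never needs to know the structure size.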


r~

---
 fpu/softfloat.c | 40 
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3e625c47cd..60f163 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -929,16 +929,23 @@ static FloatParts pick_nan(FloatParts a, FloatParts b, 
float_status *s)
 return a;
 }
 
-static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
-  bool inf_zero, float_status *s)
+/*
+ * Given pointers to A, B, C, and the respective classes, return the
+ * pointer to the structure that is the NaN result, or NULL to signal
+ * that the result is the default NaN.
+ */
+static inline void *
+pick_nan_muladd(FloatClass a_cls, FloatClass b_cls, FloatClass c_cls,
+void *a, void *b, void *c,
+bool inf_zero, int abc_mask, float_status *s)
 {
 int which;
 
-if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
+if (unlikely(abc_mask & float_cmask_snan)) {
 s->float_exception_flags |= float_flag_invalid;
 }
 
-which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
+which = pickNaNMulAdd(a_cls, b_cls, c_cls, inf_zero, s);
 
 if (s->default_nan_mode) {
 /* Note that this check is after pickNaNMulAdd so that function
@@ -949,23 +956,16 @@ static FloatParts pick_nan_muladd(FloatParts a, 
FloatParts b, FloatParts c,
 
 switch (which) {
 case 0:
-break;
+return a;
 case 1:
-a = b;
-break;
+return b;
 case 2:
-a = c;
-break;
+return c;
 case 3:
-return parts_default_nan(s);
+return NULL;
 default:
 g_assert_not_reached();
 }
-
-if (is_snan(a.cls)) {
-return parts_silence_nan(a, s);
-}
-return a;
 }
 
 /*
@@ -1366,7 +1366,15 @@ static FloatParts muladd_floats(FloatParts a, FloatParts 
b, FloatParts c,
  * off to the target-specific pick-a-NaN routine.
  */
 if (unlikely(abc_mask & float_cmask_anynan)) {
-return pick_nan_muladd(a, b, c, inf_zero, s);
+FloatParts *r = pick_nan_muladd(a.cls, b.cls, c.cls, &a, &b, &c,
+inf_zero, abc_mask, s);
+if (r == NULL) {
+return parts_default_nan(s);
+}
+if (is_snan(r->cls)) {
+return parts_silence_nan(*r, s);
+}
+return *r;
 }
 
 if (unlikely(inf_zero)) {
-- 
2.25.1

[Bug 1900352] [NEW] no sound in spice when VNC enabled

2020-10-18 Thread azrdev

Public bug reported:

Running Fedora32 with virt-manager → libvirt → qemu  I noticed that I
got no sound in my spice client. The VM is configured with a SPICE-
server and a QXL display, and in addition a VNC display.

Apparently when I remove the VNC display, then the sound is routed just
fine to the spice client: I can hear it, and `G_MESSAGES_DEBUG=all
remote-viewer --spice-debug  spice://localhost:5900` mentions
SpicePlaybackChannel and SpiceRecordChannel. With the VNC server
configured, such messages are missing, and I cannot hear the sound
(which is sent by the guest OS to the virtual hardware).

qemu-4.2.1-1.fc32

** Affects: qemu
 Importance: Undecided
 Status: New

** Description changed:

  Running Fedora32 with virt-manager → libvirt → qemu  I noticed that I
  got no sound in my spice client. The VM is configured with a SPICE-
  server and a QXL display, and in addition a VNC display.
  
  Apparently when I remove the VNC display, then the sound is routed just
  fine to the spice client: I can hear it, and `G_MESSAGES_DEBUG=all
  remote-viewer --spice-debug  spice://localhost:5900` mentions
  SpicePlaybackChannel and SpiceRecordChannel. With the VNC server
  configured, such messages are missing, and I cannot hear the sound
  (which is sent by the guest OS to the virtual hardware).
+ 
+ qemu-4.2.1-1.fc32

https://bugs.launchpad.net/bugs/1900352


[NOTFORMERGE PATCH v2 3/3] hw/arm/raspi: Remove unsupported raspi4 peripherals from device tree

2020-10-18 Thread Philippe Mathieu-Daudé

Kludge when using Linux kernels to reach userland.
No device in DT -> no hardware initialization.

Linux 5.9 uses RPI_FIRMWARE_GET_CLOCKS, so we now need to
implement that feature too. Looks like a cat-and-mouse game...

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 6a793766840..93eb6591ee8 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -25,6 +25,7 @@
 #include "hw/arm/boot.h"
 #include "sysemu/sysemu.h"
 #include "qom/object.h"
+#include 
 
 #define SMPBOOT_ADDR0x300 /* this should leave enough space for ATAGS */
 #define MVBAR_ADDR  0x400 /* secure vectors */
@@ -200,6 +201,29 @@ static void reset_secondary(ARMCPU *cpu, const struct 
arm_boot_info *info)
 cpu_set_pc(cs, info->smp_loader_start);
 }
 
+static void raspi4_modify_dtb(const struct arm_boot_info *info, void *fdt)
+{
+int offset;
+
+offset = fdt_node_offset_by_compatible(fdt, -1, "brcm,genet-v5");
+if (offset >= 0) {
+/* FIXME we shouldn't nop the parent */
+offset = fdt_parent_offset(fdt, offset);
+if (offset >= 0) {
+if (!fdt_nop_node(fdt, offset)) {
+warn_report("dtc: bcm2838-genet removed!");
+}
+}
+}
+
+offset = fdt_node_offset_by_compatible(fdt, -1, "brcm,avs-tmon-bcm2838");
+if (offset >= 0) {
+if (!fdt_nop_node(fdt, offset)) {
+warn_report("dtc: bcm2838-tmon removed!");
+}
+}
+}
+
 static void setup_boot(MachineState *machine, RaspiProcessorId processor_id,
size_t ram_size)
 {
@@ -234,6 +258,9 @@ static void setup_boot(MachineState *machine, 
RaspiProcessorId processor_id,
 }
 s->binfo.secondary_cpu_reset_hook = reset_secondary;
 }
+if (processor_id >= PROCESSOR_ID_BCM2838) {
+s->binfo.modify_dtb = raspi4_modify_dtb;
+}
 
 /* If the user specified a "firmware" image (e.g. UEFI), we bypass
  * the normal Linux boot process
-- 
2.26.2

[RFC PATCH v2 2/3] hw/arm/raspi: Add the Raspberry Pi 4 model B

2020-10-18 Thread Philippe Mathieu-Daudé

Add 2 variants of the raspi4:

- raspi4b1g: Raspberry Pi 4B (revision 1.1, with 1 GiB of RAM)
- raspi4b2g: Raspberry Pi 4B (revision 1.2, with 2 GiB of RAM)
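
The two variants differ only in their board revision code (0xa03111 and
0xb03112 in the patch below). As a hedged sketch, assuming the new-style
revision code layout from the Raspberry Pi documentation (memory size in bits
20-22, processor in bits 12-15, board type in bits 4-11, revision in bits
0-3), the fields can be decoded like this. The helper names are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 23 set means the code uses the new-style layout. */
static unsigned rev_new_style(uint32_t rev)
{
    return (rev >> 23) & 0x1;
}

/* Memory size field: 0 = 256 MiB, each step doubles it. */
static unsigned rev_memory_mib(uint32_t rev)
{
    assert(rev_new_style(rev));
    return 256u << ((rev >> 20) & 0x7);
}

static unsigned rev_processor(uint32_t rev)
{
    return (rev >> 12) & 0xf;
}

static unsigned rev_board_type(uint32_t rev)
{
    return (rev >> 4) & 0xff;
}

static unsigned rev_revision(uint32_t rev)
{
    return rev & 0xf;
}
```

Under that assumed layout, 0xa03111 decodes to 1024 MiB at revision 1 and
0xb03112 to 2048 MiB at revision 2, which matches the two machine names.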

Example booting the 2GiB machine using content from [*]:

  $ qemu-system-aarch64 -M raspi4b2g -serial stdio \
  -kernel raspberrypi/firmware/boot/kernel8.img \
  -dtb raspberrypi/firmware/boot/bcm2711-rpi-4-b.dtb \
  -append 'printk.time=0 earlycon=pl011,0xfe201000 console=ttyAMA0'
  [0.00] Booting Linux on physical CPU 0x00 [0x410fd083]
  [0.00] Linux version 5.4.51-v8+ (dom@buildbot) (gcc version 5.4.0 
20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9)) #1333 SMP PREEMPT Mon Aug 10 
16:58:35 BST 2020
  [0.00] Machine model: Raspberry Pi 4 Model B
  [0.00] earlycon: pl11 at MMIO 0xfe201000 (options '')
  [0.00] printk: bootconsole [pl11] enabled
  [0.00] efi: Getting EFI parameters from FDT:
  [0.00] efi: UEFI not found.
  [0.00] Reserved memory: created CMA memory pool at 
0x2c00, size 64 MiB
  [0.00] OF: reserved mem: initialized node linux,cma, compatible id 
shared-dma-pool
  [0.00] Detected PIPT I-cache on CPU0
  [0.00] CPU features: detected: EL2 vector hardening
  [0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
  [0.00] software IO TLB: mapped [mem 0x3bfff000-0x3000] (64MB)
  [0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
  [0.00] GIC: Using split EOI/Deactivate mode
  [0.633289] smp: Bringing up secondary CPUs ...
  [0.694226] Detected PIPT I-cache on CPU1
  [0.699002] CPU1: Booted secondary processor 0x01 [0x410fd083]
  [0.782443] Detected PIPT I-cache on CPU2
  [0.783511] CPU2: Booted secondary processor 0x02 [0x410fd083]
  [0.848854] Detected PIPT I-cache on CPU3
  [0.850003] CPU3: Booted secondary processor 0x03 [0x410fd083]
  [0.857099] smp: Brought up 1 node, 4 CPUs
  [0.863500] SMP: Total of 4 processors activated.
  [0.865446] CPU features: detected: 32-bit EL0 Support
  [0.87] CPU features: detected: CRC32 instructions
  [2.235648] CPU: All CPU(s) started at EL2
  ...

[*] 
http://archive.raspberrypi.org/debian/pool/main/r/raspberrypi-firmware/raspberrypi-kernel_1.20200512-2_armhf.deb

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 4ea200572ea..6a793766840 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -73,6 +73,7 @@ typedef enum RaspiProcessorId {
 PROCESSOR_ID_BCM2835 = 0,
 PROCESSOR_ID_BCM2836 = 1,
 PROCESSOR_ID_BCM2837 = 2,
+PROCESSOR_ID_BCM2838 = 3,
 } RaspiProcessorId;
 
 static const struct {
@@ -82,6 +83,7 @@ static const struct {
 [PROCESSOR_ID_BCM2835] = {TYPE_BCM2835, 1},
 [PROCESSOR_ID_BCM2836] = {TYPE_BCM2836, BCM283X_NCPUS},
 [PROCESSOR_ID_BCM2837] = {TYPE_BCM2837, BCM283X_NCPUS},
+[PROCESSOR_ID_BCM2838] = {TYPE_BCM2838, BCM283X_NCPUS},
 };
 
 static uint64_t board_ram_size(uint32_t board_rev)
@@ -366,6 +368,24 @@ static void raspi3b_machine_class_init(ObjectClass *oc, 
void *data)
 rmc->board_rev = 0xa02082;
 raspi_machine_class_common_init(mc, rmc->board_rev);
 };
+
+static void raspi4b1g_machine_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+RaspiMachineClass *rmc = RASPI_MACHINE_CLASS(oc);
+
+rmc->board_rev = 0xa03111;
+raspi_machine_class_common_init(mc, rmc->board_rev);
+};
+
+static void raspi4b2g_machine_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+RaspiMachineClass *rmc = RASPI_MACHINE_CLASS(oc);
+
+rmc->board_rev = 0xb03112;
+raspi_machine_class_common_init(mc, rmc->board_rev);
+};
 #endif /* TARGET_AARCH64 */
 
 static const TypeInfo raspi_machine_types[] = {
@@ -390,6 +410,14 @@ static const TypeInfo raspi_machine_types[] = {
 .name   = MACHINE_TYPE_NAME("raspi3b"),
 .parent = TYPE_RASPI_MACHINE,
 .class_init = raspi3b_machine_class_init,
+}, {
+.name   = MACHINE_TYPE_NAME("raspi4b1g"),
+.parent = TYPE_RASPI_MACHINE,
+.class_init = raspi4b1g_machine_class_init,
+}, {
+.name   = MACHINE_TYPE_NAME("raspi4b2g"),
+.parent = TYPE_RASPI_MACHINE,
+.class_init = raspi4b2g_machine_class_init,
 #endif
 }, {
 .name   = TYPE_RASPI_MACHINE,
-- 
2.26.2

[RFC PATCH v2 1/3] hw/arm/bcm2836: Add the ARMv8 BCM2838

2020-10-18 Thread Philippe Mathieu-Daudé

The BCM2838 shares the same peripheral base block as the BCM283x
family, but connects 4 Cortex-A72 cores via a GICv2.
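
The per-CPU timer lines are wired to the GIC through the PPI() macro defined
in the patch below. The following sketch restates that index calculation as a
function. The values 32 for GIC_INTERNAL and 16 for GIC_NR_SGIS are
assumptions about QEMU's GIC headers and should be checked against the tree;
gic_ppi_index is a hypothetical name.

```c
#include <assert.h>

#define GIC_NUM_IRQS  128   /* external interrupt lines, from the patch */
#define GIC_INTERNAL   32   /* assumed: per-CPU internal interrupts */
#define GIC_NR_SGIS    16   /* assumed: software-generated interrupts */

/* Input line index for private peripheral interrupt 'irq' of CPU 'cpu'. */
static int gic_ppi_index(int cpu, int irq)
{
    assert(cpu >= 0 && irq >= 0 && irq < GIC_INTERNAL - GIC_NR_SGIS);
    return GIC_NUM_IRQS + cpu * GIC_INTERNAL + GIC_NR_SGIS + irq;
}
```

With these assumed constants, the external lines occupy indexes 0 to 127 and
each CPU then gets its own block of 32 inputs, of which the last 16 are PPIs.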

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/arm/bcm2836.h |   3 +
 hw/arm/bcm2836.c | 179 +++
 hw/arm/trace-events  |   2 +
 3 files changed, 184 insertions(+)

diff --git a/include/hw/arm/bcm2836.h b/include/hw/arm/bcm2836.h
index 6f90cabfa3a..92561e96aa2 100644
--- a/include/hw/arm/bcm2836.h
+++ b/include/hw/arm/bcm2836.h
@@ -14,6 +14,7 @@
 
 #include "hw/arm/bcm2835_peripherals.h"
 #include "hw/intc/bcm2836_control.h"
+#include "hw/intc/arm_gic.h"
 #include "target/arm/cpu.h"
 #include "qom/object.h"
 
@@ -29,6 +30,7 @@ OBJECT_DECLARE_TYPE(BCM283XState, BCM283XClass, BCM283X)
 #define TYPE_BCM2835 "bcm2835"
 #define TYPE_BCM2836 "bcm2836"
 #define TYPE_BCM2837 "bcm2837"
+#define TYPE_BCM2838 "bcm2838"
 
 struct BCM283XState {
 /*< private >*/
@@ -40,6 +42,7 @@ struct BCM283XState {
 struct {
 ARMCPU core;
 } cpu[BCM283X_NCPUS];
+GICState gic;
 BCM2836ControlState control;
 BCM2835PeripheralState peripherals;
 };
diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index de7ade2878e..fe795217e26 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -16,6 +16,7 @@
 #include "hw/arm/bcm2836.h"
 #include "hw/arm/raspi_platform.h"
 #include "hw/sysbus.h"
+#include "trace.h"
 
 typedef struct BCM283XClass {
 /*< private >*/
@@ -26,6 +27,7 @@ typedef struct BCM283XClass {
 unsigned core_count;
 hwaddr peri_base; /* Peripheral base address seen by the CPU */
 hwaddr ctrl_base; /* Interrupt controller and mailboxes etc. */
+hwaddr gic_base;
 int clusterid;
 } BCM283XClass;
 
@@ -52,6 +54,10 @@ static void bcm2836_init(Object *obj)
 qdev_prop_set_uint32(DEVICE(obj), "enabled-cpus", bc->core_count);
 }
 
+if (bc->gic_base) {
+object_initialize_child(obj, "gic", &s->gic, TYPE_ARM_GIC);
+}
+
 if (bc->ctrl_base) {
 object_initialize_child(obj, "control", &s->control,
 TYPE_BCM2836_CONTROL);
@@ -170,6 +176,161 @@ static void bcm2836_realize(DeviceState *dev, Error 
**errp)
 }
 }
 
+#ifdef TARGET_AARCH64
+
+#define GIC400_MAINTAINANCE_IRQ  9
+#define GIC400_TIMER_NS_EL2_IRQ 10
+#define GIC400_TIMER_VIRT_IRQ   11
+#define GIC400_LEGACY_FIQ   12
+#define GIC400_TIMER_S_EL1_IRQ  13
+#define GIC400_TIMER_NS_EL1_IRQ 14
+#define GIC400_LEGACY_IRQ   15
+
+/* Number of external interrupt lines to configure the GIC with */
+#define GIC_NUM_IRQS128
+
+#define PPI(cpu, irq) (GIC_NUM_IRQS + (cpu) * GIC_INTERNAL + GIC_NR_SGIS + irq)
+
+#define GIC_BASE_OFS0x
+#define GIC_DIST_OFS0x1000
+#define GIC_CPU_OFS 0x2000
+#define GIC_VIFACE_THIS_OFS 0x4000
+#define GIC_VIFACE_OTHER_OFS(cpu)  (0x5000 + (cpu) * 0x200)
+#define GIC_VCPU_OFS0x6000
+
+#define VIRTUAL_PMU_IRQ 7
+
+static void bcm2838_gic_set_irq(void *opaque, int irq, int level)
+{
+BCM283XState *s = (BCM283XState *)opaque;
+
+trace_bcm2838_gic_set_irq(irq, level);
+qemu_set_irq(qdev_get_gpio_in(DEVICE(&s->gic), irq), level);
+}
+
+static void bcm2838_realize(DeviceState *dev, Error **errp)
+{
+BCM283XState *s = BCM283X(dev);
+BCM283XClass *bc = BCM283X_GET_CLASS(dev);
+int n;
+
+if (!bcm283x_common_realize(dev, errp)) {
+return;
+}
+
+sysbus_mmio_map_overlap(SYS_BUS_DEVICE(&s->peripherals), 0,
+bc->peri_base, 1);
+
+/* bcm2836 interrupt controller (and mailboxes, etc.) */
+if (!sysbus_realize(SYS_BUS_DEVICE(&s->control), errp)) {
+return;
+}
+
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->control), 0, bc->ctrl_base);
+
+/* Create cores */
+for (n = 0; n < bc->core_count; n++) {
+/* TODO: this should be converted to a property of ARM_CPU */
+s->cpu[n].core.mp_affinity = (bc->clusterid << 8) | n;
+
+/* set periphbase/CBAR value for CPU-local registers */
+if (!object_property_set_int(OBJECT(&s->cpu[n].core), "reset-cbar",
+ bc->peri_base, errp)) {
+return;
+}
+
+/* start powered off if not enabled */
+if (!object_property_set_bool(OBJECT(&s->cpu[n].core),
+  "start-powered-off",
+  n >= s->enabled_cpus,
+  errp)) {
+return;
+}
+
+if (!qdev_realize(DEVICE(&s->cpu[n].core), NULL, errp)) {
+return;
+}
+}
+
+if (!object_property_set_uint(OBJECT(&s->gic), "revision", 2, errp)) {
+return;
+}
+
+if (!object_property_set_uint(OBJECT(&s->gic), "num-cpu", BCM283X_NCPUS,
+  errp)) {
+return;
+}
+
+if (!object_property_set_uint(OBJECT(&s->gic),
+

[RFC PATCH v2 0/3] hw/arm: Add the Raspberry Pi 4B

2020-10-18 Thread Philippe Mathieu-Daudé

Still not complete, as we need to implement more firmware properties.
However, the state is good enough for review, or in case someone wants to
work on it and improve it.

Since RFC v1:
- Rebased
- Used recommendations from Luc
  https://www.mail-archive.com/qemu-devel@nongnu.org/msg642450.html

Based-on: <20201018203358.1530378-1-f4...@amsat.org>

Philippe Mathieu-Daudé (3):
  hw/arm/bcm2836: Add the ARMv8 BCM2838
  hw/arm/raspi: Add the Raspberry Pi 4 model B
  hw/arm/raspi: Remove unsupported raspi4 peripherals from device tree

 include/hw/arm/bcm2836.h |   3 +
 hw/arm/bcm2836.c | 179 +++
 hw/arm/raspi.c   |  55 
 hw/arm/trace-events  |   2 +
 4 files changed, 239 insertions(+)

-- 
2.26.2

Re: [PATCH v26 07/17] vfio: Register SaveVMHandlers for VFIO device

2020-10-18 Thread Kirti Wankhede





On 9/25/2020 5:23 PM, Cornelia Huck wrote:

On Wed, 23 Sep 2020 04:54:09 +0530
Kirti Wankhede  wrote:


Define flags to be used as delimiters in the migration file stream.
Added .save_setup and .save_cleanup functions. Mapped & unmapped migration
region from these functions at source during saving or pre-copy phase.
Set VFIO device state depending on VM's state. During live migration, VM is
running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO
device. During save-restore, VM is paused, _SAVING state is set for VFIO device.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/migration.c  | 91 
  hw/vfio/trace-events |  2 ++
  2 files changed, 93 insertions(+)



(...)


+/*
+ * Flags used as delimiter:
+ * 0xffffffff => MSB 32-bit all 1s
+ * 0xef10 => emulated (virtual) function IO


Where is this value coming from?



Delimiter flags should be unique; this is a magic number in which "ef"
stands for (e)mulated (f)unction and "10" represents IO.



+ * 0x0000 => 16-bits reserved for flags
+ */
+#define VFIO_MIG_FLAG_END_OF_STATE  (0xffffffffef100001ULL)
+#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
+#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
+#define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)


I think we need some more documentation on what these values mean and how
they are used. From reading ahead a bit, it seems there is always
supposed to be a pair of DEV_*_STATE and END_OF_STATE framing some kind
of data?



Adding comment as below, hope it helps.

/*
 * Flags used as delimiters for VFIO devices should be unique in the
 * migration stream.
 * These flags are composed as:
 * 0xffffffff => MSB 32-bit all 1s
 * 0xef10     => Magic ID, represents emulated (virtual) function IO
 * 0x0000     => 16-bits reserved for flags
 *
 * Flags _DEV_CONFIG_STATE, _DEV_SETUP_STATE and _DEV_DATA_STATE mark the
 * start of the respective states in the migration stream.
 * Flag _END_OF_STATE indicates the end of the current state, which could be
 * any of the above states.
 */
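
As a rough sketch of the layout this comment describes (upper 32 bits all
ones, then the 0xef10 magic, then 16 bits for the flag itself), the values
can be composed and recognised as follows. The helper names are hypothetical
and the numbering of individual flags is an assumption for illustration, not
taken from the final patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIG_FLAG_PREFIX 0xffffffffULL   /* MSB 32 bits, all ones */
#define MIG_FLAG_MAGIC  0xef10ULL       /* magic ID for VFIO delimiters */

/* Build a delimiter from a 16-bit flag number. */
static uint64_t mig_flag_make(uint16_t id)
{
    return (MIG_FLAG_PREFIX << 32) | (MIG_FLAG_MAGIC << 16) | id;
}

/* True if the upper 48 bits carry the prefix and the magic ID. */
static bool mig_flag_is_delimiter(uint64_t value)
{
    return (value >> 16) == ((MIG_FLAG_PREFIX << 16) | MIG_FLAG_MAGIC);
}

/* Extract the 16-bit flag number from a delimiter. */
static uint16_t mig_flag_id(uint64_t value)
{
    return (uint16_t)(value & 0xffff);
}
```

A reader of the stream can then check the upper 48 bits first and only
interpret the low 16 bits once the value is known to be a delimiter.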

Thanks,
Kirti

Re: [PATCH v26 12/17] vfio: Add function to start and stop dirty pages tracking

2020-10-18 Thread Kirti Wankhede





On 9/26/2020 3:25 AM, Alex Williamson wrote:

On Wed, 23 Sep 2020 04:54:14 +0530
Kirti Wankhede  wrote:


Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking
for VFIO devices.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Dr. David Alan Gilbert 
---
  hw/vfio/migration.c | 36 
  1 file changed, 36 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 4306f6316417..822b68b4e015 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -11,6 +11,7 @@
  #include "qemu/main-loop.h"
  #include "qemu/cutils.h"
  #include 
+#include 
  
  #include "sysemu/runstate.h"

  #include "hw/vfio/vfio-common.h"
@@ -355,6 +356,32 @@ static int vfio_load_device_config_state(QEMUFile *f, void 
*opaque)
  return qemu_file_get_error(f);
  }
  
+static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start)

+{
+int ret;
+VFIOContainer *container = vbasedev->group->container;
+struct vfio_iommu_type1_dirty_bitmap dirty = {
+.argsz = sizeof(dirty),
+};
+
+if (start) {
+if (vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) {
+dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
+} else {
+return -EINVAL;
+}
+} else {
+dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
+}
+
+ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
+if (ret) {
+error_report("Failed to set dirty tracking flag 0x%x errno: %d",
+ dirty.flags, errno);
+}


Maybe doesn't matter in the long run, but do you want to use -errno for
the return rather than -1 from the ioctl on error?  Thanks,



Makes sense. Changing it.
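
A minimal sketch of the -errno convention suggested above, assuming a generic
wrapper around ioctl(). The helper name ex_ioctl_errno() is hypothetical and
is not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: issue an ioctl and return 0 (or the non-negative
 * ioctl result) on success, or -errno on failure instead of the bare -1.
 */
static int ex_ioctl_errno(int fd, unsigned long request, void *arg)
{
    int ret = ioctl(fd, request, arg);

    if (ret < 0) {
        /* Capture errno immediately, before any other call can clobber it. */
        ret = -errno;
    }
    return ret;
}
```

With this convention the caller can propagate the failure reason directly,
for example pass it to strerror(-ret), rather than losing it behind -1.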

Thanks,
Kirti


Alex


+return ret;
+}
+
  /* -- */
  
  static int vfio_save_setup(QEMUFile *f, void *opaque)

@@ -386,6 +413,11 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
  return ret;
  }
  
+ret = vfio_set_dirty_page_tracking(vbasedev, true);

+if (ret) {
+return ret;
+}
+
  qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
  
  ret = qemu_file_get_error(f);

@@ -401,6 +433,8 @@ static void vfio_save_cleanup(void *opaque)
  VFIODevice *vbasedev = opaque;
  VFIOMigration *migration = vbasedev->migration;
  
+vfio_set_dirty_page_tracking(vbasedev, false);

+
  if (migration->region.mmaps) {
  vfio_region_unmap(&migration->region);
  }
@@ -734,6 +768,8 @@ static void vfio_migration_state_notifier(Notifier 
*notifier, void *data)
  if (ret) {
  error_report("%s: Failed to set state RUNNING", vbasedev->name);
  }
+
+vfio_set_dirty_page_tracking(vbasedev, false);
  }
  }

Re: [PATCH v26 09/17] vfio: Add load state functions to SaveVMHandlers

2020-10-18 Thread Kirti Wankhede





On 10/1/2020 3:37 PM, Cornelia Huck wrote:

On Wed, 23 Sep 2020 04:54:11 +0530
Kirti Wankhede  wrote:


Sequence during _RESUMING device state:
While data for this device is available, repeat below steps:
a. read data_offset from where user application should write data.
b. write data of data_size to migration region from data_offset.
c. write data_size, which indicates to the vendor driver that data has been
written to the staging buffer.

For user, data is opaque. User should write data in the same order as
received.
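
The three steps above can be sketched against an in-memory stand-in for the
migration region. The struct ex_region type, its fields and ex_resume_chunk()
are assumptions made for illustration only. The real code accesses the VFIO
migration region through the device file descriptor or an mmap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the device's migration region. */
struct ex_region {
    uint64_t data_offset;   /* set by the vendor driver: where to write */
    uint64_t data_size;     /* written last: tells the driver data is staged */
    uint8_t  staging[256];  /* staging buffer */
};

/* Write one opaque chunk; returns 0 on success, -1 if it does not fit. */
static int ex_resume_chunk(struct ex_region *r, const uint8_t *buf, uint64_t len)
{
    uint64_t off = r->data_offset;                  /* step a */

    if (off > sizeof(r->staging) || len > sizeof(r->staging) - off) {
        return -1;
    }
    memcpy(r->staging + off, buf, len);             /* step b */
    r->data_size = len;                             /* step c */
    return 0;
}
```

Writing data_size last matters in this sketch: it is the signal that the
chunk at data_offset is complete.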

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Dr. David Alan Gilbert 
---
  hw/vfio/migration.c  | 170 +++
  hw/vfio/trace-events |   3 +
  2 files changed, 173 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 4611bb972228..ffd70282dd0e 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -328,6 +328,33 @@ static int vfio_save_device_config_state(QEMUFile *f, void 
*opaque)
  return qemu_file_get_error(f);
  }
  
+static int vfio_load_device_config_state(QEMUFile *f, void *opaque)

+{
+VFIODevice *vbasedev = opaque;
+uint64_t data;
+
+if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
+int ret;
+
+ret = vbasedev->ops->vfio_load_config(vbasedev, f);
+if (ret) {
+error_report("%s: Failed to load device config space",
+ vbasedev->name);
+return ret;
+}
+}
+
+data = qemu_get_be64(f);
+if (data != VFIO_MIG_FLAG_END_OF_STATE) {
+error_report("%s: Failed loading device config space, "
+ "end flag incorrect 0x%"PRIx64, vbasedev->name, data);


I'm confused here: If we don't have a vfio_load_config callback, or if
that callback did not read everything, we also might end up with a
value that's not END_OF_STATE... in that case, the problem is not with
the stream, but rather with the consumer?


Right, hence "end flag incorrect" is reported.




+return -EINVAL;
+}
+
+trace_vfio_load_device_config_state(vbasedev->name);
+return qemu_file_get_error(f);
+}
+
  /* -- */
  
  static int vfio_save_setup(QEMUFile *f, void *opaque)

@@ -502,12 +529,155 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
  return ret;
  }
  
+static int vfio_load_setup(QEMUFile *f, void *opaque)

+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret = 0;
+
+if (migration->region.mmaps) {
+ret = vfio_region_mmap(&migration->region);
+if (ret) {
+error_report("%s: Failed to mmap VFIO migration region %d: %s",
+ vbasedev->name, migration->region.nr,
+ strerror(-ret));
+error_report("%s: Falling back to slow path", vbasedev->name);
+}
+}
+
+ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
+   VFIO_DEVICE_STATE_RESUMING);
+if (ret) {
+error_report("%s: Failed to set state RESUMING", vbasedev->name);
+}
+return ret;


If I follow the code correctly, the cleanup callback will not be
invoked if you return != 0 here... should you clean up possible
mappings on error here?



Makes sense, adding region unmap on error.
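
A rough sketch of that cleanup, using stub functions in place of the real
mmap and state-change calls. All ex_* names are hypothetical, and the actual
fix in the next revision may look different.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the migration region. */
struct ex_region {
    bool mapped;
};

static int ex_region_mmap(struct ex_region *r)
{
    r->mapped = true;
    return 0;
}

static void ex_region_unmap(struct ex_region *r)
{
    r->mapped = false;
}

/* Stub for the state change; fails with -EINVAL when asked to. */
static int ex_set_state_resuming(bool fail)
{
    return fail ? -EINVAL : 0;
}

static int ex_load_setup(struct ex_region *r, bool fail_state)
{
    int ret;

    /* A failed mmap is not fatal here: fall back to the slow path. */
    (void)ex_region_mmap(r);

    ret = ex_set_state_resuming(fail_state);
    if (ret && r->mapped) {
        /* The cleanup callback will not run, so undo the mapping here. */
        ex_region_unmap(r);
    }
    return ret;
}
```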


+}
+
+static int vfio_load_cleanup(void *opaque)
+{
+vfio_save_cleanup(opaque);
+return 0;
+}
+
+static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret = 0;
+uint64_t data, data_size;
+
+data = qemu_get_be64(f);
+while (data != VFIO_MIG_FLAG_END_OF_STATE) {
+
+trace_vfio_load_state(vbasedev->name, data);
+
+switch (data) {
+case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
+{
+ret = vfio_load_device_config_state(f, opaque);
+if (ret) {
+return ret;
+}
+break;
+}
+case VFIO_MIG_FLAG_DEV_SETUP_STATE:
+{
+data = qemu_get_be64(f);
+if (data == VFIO_MIG_FLAG_END_OF_STATE) {
+return ret;
+} else {
+error_report("%s: SETUP STATE: EOS not found 0x%"PRIx64,
+ vbasedev->name, data);
+return -EINVAL;
+}
+break;
+}
+case VFIO_MIG_FLAG_DEV_DATA_STATE:
+{
+VFIORegion *region = &migration->region;
+uint64_t data_offset = 0, size;


I think this function would benefit from splitting this off into a
function handling DEV_DATA_STATE. It is quite hard to follow through
all the checks and find out when we continue, and when we break off.



Each switch case has a break: we break on success, whereas we
return an error if we encounter any case where (ret < 0).
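
To make the control flow under discussion concrete, here is a toy version of
the loop with the data case split into its own helper, as suggested in the
review. The token values, the stream format (a word count followed by payload
words) and all ex_* names are assumptions for this sketch and do not reflect
the real migration stream layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tokens standing in for the VFIO_MIG_FLAG_* delimiters. */
enum { EX_END = 1, EX_CONFIG = 2, EX_DATA = 4 };

struct ex_stream {
    const uint64_t *tok;
    size_t len, pos;
};

/* Read the next word; returns 0 once the stream is exhausted. */
static uint64_t ex_get(struct ex_stream *s)
{
    return s->pos < s->len ? s->tok[s->pos++] : 0;
}

/* CONFIG section: nothing but the closing END marker in this toy format. */
static int ex_load_config(struct ex_stream *s)
{
    return ex_get(s) == EX_END ? 0 : -1;
}

/* DATA section: word count, payload words, then the closing END marker. */
static int ex_load_data(struct ex_stream *s, uint64_t *sum)
{
    uint64_t n = ex_get(s);

    if (n > s->len - s->pos) {
        return -1;
    }
    while (n--) {
        *sum += ex_get(s);
    }
    return ex_get(s) == EX_END ? 0 : -1;
}

/* Dispatch on each section flag until the final END marker is seen. */
static int ex_load_state(struct ex_stream *s, uint64_t *sum)
{
    uint64_t flag = ex_get(s);

    while (flag != EX_END) {
        int ret;

        switch (flag) {
        case EX_CONFIG:
            ret = ex_load_config(s);
            break;
        case EX_DATA:
            ret = ex_load_data(s, sum);
            break;
        default:
            ret = -1;
            break;
        }
        if (ret) {
            return ret;     /* any error ends the loop early */
        }
        flag = ex_get(s);   /* success: fetch the next section flag */
    }
    return 0;
}
```

With each section in its own helper, the loop body reduces to dispatch, one
error check and fetching the next flag.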




Some

Re: [PATCH] softfloat: Mark base int-to-float routines QEMU_FLATTEN

2020-10-18 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20201018203334.1229243-1-richard.hender...@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20201018203334.1229243-1-richard.hender...@linaro.org
Subject: [PATCH] softfloat: Mark base int-to-float routines QEMU_FLATTEN

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
4ffd616 softfloat: Mark base int-to-float routines QEMU_FLATTEN

=== OUTPUT BEGIN ===
ERROR: spaces required around that '*' (ctx:WxV)
#29: FILE: fpu/softfloat.c:2779:
+int64_to_float16_scalbn(int64_t a, int scale, float_status *status)
^

ERROR: spaces required around that '*' (ctx:WxV)
#39: FILE: fpu/softfloat.c:2816:
+int64_to_float32_scalbn(int64_t a, int scale, float_status *status)
^

ERROR: spaces required around that '*' (ctx:WxV)
#49: FILE: fpu/softfloat.c:2848:
+int64_to_float64_scalbn(int64_t a, int scale, float_status *status)
^

ERROR: spaces required around that '*' (ctx:WxV)
#59: FILE: fpu/softfloat.c:2885:
+int64_to_bfloat16_scalbn(int64_t a, int scale, float_status *status)
 ^

ERROR: spaces required around that '*' (ctx:WxV)
#69: FILE: fpu/softfloat.c:2948:
+uint64_to_float16_scalbn(uint64_t a, int scale, float_status *status)
  ^

ERROR: spaces required around that '*' (ctx:WxV)
#79: FILE: fpu/softfloat.c:2985:
+uint64_to_float32_scalbn(uint64_t a, int scale, float_status *status)
  ^

ERROR: spaces required around that '*' (ctx:WxV)
#89: FILE: fpu/softfloat.c:3017:
+uint64_to_float64_scalbn(uint64_t a, int scale, float_status *status)
  ^

ERROR: spaces required around that '*' (ctx:WxV)
#99: FILE: fpu/softfloat.c:3054:
+uint64_to_bfloat16_scalbn(uint64_t a, int scale, float_status *status)
   ^

total: 8 errors, 0 warnings, 72 lines checked

Commit 4ffd616f4c5d (softfloat: Mark base int-to-float routines QEMU_FLATTEN) 
has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20201018203334.1229243-1-richard.hender...@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Bug 1892081] Re: Performance improvement when using "QEMU_FLATTEN" with softfloat type conversions

2020-10-18 Thread Richard Henderson

Confirmed, although "65% decrease" is on 0.44% of the total
execution for this test case, so the decrease isn't actually
noticeable.

Nevertheless, it's a simple enough change.

** Changed in: qemu
   Status: New => In Progress

** Changed in: qemu
 Assignee: (unassigned) => Richard Henderson (rth)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1892081

Title:
  Performance improvement when using "QEMU_FLATTEN" with softfloat type
  conversions

Status in QEMU:
  In Progress

Bug description:
  Attached below is a matrix multiplication program for double data
  types. The program performs the casting operation "(double)rand()"
  when generating random numbers.

  This operation calls the integer to float softfloat conversion
  function "int32_to_float64".

  Adding the "QEMU_FLATTEN" attribute to the function definition
  decreases the instructions per call of the function by about 63%.

  Attached are before and after performance screenshots from
  KCachegrind.


[PATCH v3 7/9] hw/arm/raspi: Add the Raspberry Pi A+ machine

2020-10-18 Thread Philippe Mathieu-Daudé

The Pi A is almost the first machine released.
It uses a BCM2835 SoC which includes an ARMv6Z core.

Example booting the machine using content from [*]
(we use the device tree from the B model):

  $ qemu-system-arm -M raspi1ap -serial stdio \
  -kernel raspberrypi/firmware/boot/kernel.img \
  -dtb raspberrypi/firmware/boot/bcm2708-rpi-b-plus.dtb \
  -append 'earlycon=pl011,0x20201000 console=ttyAMA0'
  [0.00] Booting Linux on physical CPU 0x0
  [0.00] Linux version 4.19.118+ (dom@buildbot) (gcc version 4.9.3 
(crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1311 Mon Apr 27 14:16:15 BST 
2020
  [0.00] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), 
cr=00c5387d
  [0.00] CPU: VIPT aliasing data cache, unknown instruction cache
  [0.00] OF: fdt: Machine model: Raspberry Pi Model B+
  ...

[*] 
http://archive.raspberrypi.org/debian/pool/main/r/raspberrypi-firmware/raspberrypi-kernel_1.20200512-2_armhf.deb

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 30fafa59ecb..91a59d1d489 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -319,6 +319,15 @@ static void raspi_machine_class_common_init(MachineClass 
*mc,
 mc->default_ram_id = "ram";
 };
 
+static void raspi1ap_machine_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+RaspiMachineClass *rmc = RASPI_MACHINE_CLASS(oc);
+
+rmc->board_rev = 0x900021;
+raspi_machine_class_common_init(mc, rmc->board_rev);
+};
+
 static void raspi2b_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -343,6 +352,10 @@ static void raspi3b_machine_class_init(ObjectClass *oc, 
void *data)
 
 static const TypeInfo raspi_machine_types[] = {
 {
+.name   = MACHINE_TYPE_NAME("raspi1ap"),
+.parent = TYPE_RASPI_MACHINE,
+.class_init = raspi1ap_machine_class_init,
+}, {
 .name   = MACHINE_TYPE_NAME("raspi2b"),
 .parent = TYPE_RASPI_MACHINE,
 .class_init = raspi2b_machine_class_init,
-- 
2.26.2

[PATCH v3 9/9] hw/arm/raspi: Add the Raspberry Pi 3 model A+

2020-10-18 Thread Philippe Mathieu-Daudé

The Pi 3A+ is a stripped down version of the 3B:
- 512 MiB of RAM instead of 1 GiB
- no on-board ethernet chipset

Add it as it is a closer match to what we model.
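
For reference, a small sketch decoding the board_rev values used in this
series. Only the MEMORY_SIZE (bits 20-22) and STYLE (bit 23) positions appear
in the FIELD() declarations quoted in patch 6/9. The TYPE and PROCESSOR
positions and the "256 MiB << size field" rule are assumptions based on the
publicly documented Raspberry Pi new-style revision code layout. The ex_*
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 'len' bits starting at bit 'shift'. */
static uint32_t ex_field(uint32_t rev, unsigned shift, unsigned len)
{
    return (rev >> shift) & ((1u << len) - 1);
}

static uint32_t ex_rev_type(uint32_t rev)      { return ex_field(rev, 4, 8); }
static uint32_t ex_rev_processor(uint32_t rev) { return ex_field(rev, 12, 4); }
static uint32_t ex_rev_new_style(uint32_t rev) { return ex_field(rev, 23, 1); }

/* RAM size in MiB, assuming the field encodes 256 MiB << value. */
static uint32_t ex_rev_ram_mib(uint32_t rev)
{
    return 256u << ex_field(rev, 20, 3);
}
```

Under these assumptions 0x9020e0 decodes to processor 2 (BCM2837) with
512 MiB of RAM, which matches the description of the 3A+ above.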

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 1510ca01afe..4ea200572ea 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -348,6 +348,15 @@ static void raspi2b_machine_class_init(ObjectClass *oc, 
void *data)
 };
 
 #ifdef TARGET_AARCH64
+static void raspi3ap_machine_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+RaspiMachineClass *rmc = RASPI_MACHINE_CLASS(oc);
+
+rmc->board_rev = 0x9020e0;
+raspi_machine_class_common_init(mc, rmc->board_rev);
+};
+
 static void raspi3b_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -373,6 +382,10 @@ static const TypeInfo raspi_machine_types[] = {
 .parent = TYPE_RASPI_MACHINE,
 .class_init = raspi2b_machine_class_init,
 #ifdef TARGET_AARCH64
+}, {
+.name   = MACHINE_TYPE_NAME("raspi3ap"),
+.parent = TYPE_RASPI_MACHINE,
+.class_init = raspi3ap_machine_class_init,
 }, {
 .name   = MACHINE_TYPE_NAME("raspi3b"),
 .parent = TYPE_RASPI_MACHINE,
-- 
2.26.2

[PATCH v3 8/9] hw/arm/raspi: Add the Raspberry Pi Zero machine

2020-10-18 Thread Philippe Mathieu-Daudé

Similarly to the Pi A, the Pi Zero uses a BCM2835 SoC (ARMv6Z core).

Example booting the machine using content from [*]:

  $ qemu-system-arm -M raspi0 -serial stdio \
  -kernel raspberrypi/firmware/boot/kernel.img \
  -dtb raspberrypi/firmware/boot/bcm2708-rpi-zero.dtb \
  -append 'printk.time=0 earlycon=pl011,0x20201000 console=ttyAMA0'
  [0.00] Booting Linux on physical CPU 0x0
  [0.00] Linux version 4.19.118+ (dom@buildbot) (gcc version 4.9.3 
(crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1311 Mon Apr 27 14:16:15 BST 
2020
  [0.00] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), 
cr=00c5387d
  [0.00] CPU: VIPT aliasing data cache, unknown instruction cache
  [0.00] OF: fdt: Machine model: Raspberry Pi Zero
  ...

[*] 
http://archive.raspberrypi.org/debian/pool/main/r/raspberrypi-firmware/raspberrypi-kernel_1.20200512-2_armhf.deb

Reviewed-by: Luc Michel 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/raspi.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 91a59d1d489..1510ca01afe 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -319,6 +319,15 @@ static void raspi_machine_class_common_init(MachineClass 
*mc,
 mc->default_ram_id = "ram";
 };
 
+static void raspi0_machine_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+RaspiMachineClass *rmc = RASPI_MACHINE_CLASS(oc);
+
+rmc->board_rev = 0x900092;
+raspi_machine_class_common_init(mc, rmc->board_rev);
+};
+
 static void raspi1ap_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -352,6 +361,10 @@ static void raspi3b_machine_class_init(ObjectClass *oc, 
void *data)
 
 static const TypeInfo raspi_machine_types[] = {
 {
+.name   = MACHINE_TYPE_NAME("raspi0"),
+.parent = TYPE_RASPI_MACHINE,
+.class_init = raspi0_machine_class_init,
+}, {
 .name   = MACHINE_TYPE_NAME("raspi1ap"),
 .parent = TYPE_RASPI_MACHINE,
 .class_init = raspi1ap_machine_class_init,
-- 
2.26.2

[PATCH v3 6/9] hw/arm/bcm2836: Introduce the BCM2835 SoC

2020-10-18 Thread Philippe Mathieu-Daudé

Reviewed-by: Luc Michel 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/arm/bcm2836.h |  1 +
 hw/arm/bcm2836.c | 34 ++
 hw/arm/raspi.c   |  2 ++
 3 files changed, 37 insertions(+)

diff --git a/include/hw/arm/bcm2836.h b/include/hw/arm/bcm2836.h
index 43e9f8cd0ef..6f90cabfa3a 100644
--- a/include/hw/arm/bcm2836.h
+++ b/include/hw/arm/bcm2836.h
@@ -26,6 +26,7 @@ OBJECT_DECLARE_TYPE(BCM283XState, BCM283XClass, BCM283X)
  * them, code using these devices should always handle them via the
  * BCM283x base class, so they have no BCM2836(obj) etc macros.
  */
+#define TYPE_BCM2835 "bcm2835"
 #define TYPE_BCM2836 "bcm2836"
 #define TYPE_BCM2837 "bcm2837"
 
diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index 7d975cf2f53..de7ade2878e 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -89,6 +89,25 @@ static bool bcm283x_common_realize(DeviceState *dev, Error 
**errp)
 return true;
 }
 
+static void bcm2835_realize(DeviceState *dev, Error **errp)
+{
+BCM283XState *s = BCM283X(dev);
+
+if (!bcm283x_common_realize(dev, errp)) {
+return;
+}
+
+if (!qdev_realize(DEVICE(&s->cpu[0].core), NULL, errp)) {
+return;
+}
+
+/* Connect irq/fiq outputs from the interrupt controller. */
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->peripherals), 0,
+qdev_get_gpio_in(DEVICE(&s->cpu[0].core), ARM_CPU_IRQ));
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->peripherals), 1,
+qdev_get_gpio_in(DEVICE(&s->cpu[0].core), ARM_CPU_FIQ));
+}
+
 static void bcm2836_realize(DeviceState *dev, Error **errp)
 {
 BCM283XState *s = BCM283X(dev);
@@ -159,6 +178,17 @@ static void bcm283x_class_init(ObjectClass *oc, void *data)
 dc->user_creatable = false;
 }
 
+static void bcm2835_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+BCM283XClass *bc = BCM283X_CLASS(oc);
+
+bc->cpu_type = ARM_CPU_TYPE_NAME("arm1176");
+bc->core_count = 1;
+bc->peri_base = 0x2000;
+dc->realize = bcm2835_realize;
+};
+
 static void bcm2836_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
@@ -189,6 +219,10 @@ static void bcm2837_class_init(ObjectClass *oc, void *data)
 
 static const TypeInfo bcm283x_types[] = {
 {
+.name   = TYPE_BCM2835,
+.parent = TYPE_BCM283X,
+.class_init = bcm2835_class_init,
+}, {
 .name   = TYPE_BCM2836,
 .parent = TYPE_BCM283X,
 .class_init = bcm2836_class_init,
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index b5b30f0f38f..30fafa59ecb 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -70,6 +70,7 @@ FIELD(REV_CODE, MEMORY_SIZE,   20, 3);
 FIELD(REV_CODE, STYLE, 23, 1);
 
 typedef enum RaspiProcessorId {
+PROCESSOR_ID_BCM2835 = 0,
 PROCESSOR_ID_BCM2836 = 1,
 PROCESSOR_ID_BCM2837 = 2,
 } RaspiProcessorId;
@@ -78,6 +79,7 @@ static const struct {
 const char *type;
 int cores_count;
 } soc_property[] = {
+[PROCESSOR_ID_BCM2835] = {TYPE_BCM2835, 1},
 [PROCESSOR_ID_BCM2836] = {TYPE_BCM2836, BCM283X_NCPUS},
 [PROCESSOR_ID_BCM2837] = {TYPE_BCM2837, BCM283X_NCPUS},
 };
-- 
2.26.2

[PATCH v3 3/9] hw/arm/bcm2836: Introduce BCM283XClass::core_count

2020-10-18 Thread Philippe Mathieu-Daudé

The BCM2835 has only one core. Introduce the core_count field to
be able to use values different than BCM283X_NCPUS (4).

Reviewed-by: Luc Michel 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/bcm2836.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index 8f921d8e904..c5d46a8e805 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -23,6 +23,7 @@ typedef struct BCM283XClass {
 /*< public >*/
 const char *name;
 const char *cpu_type;
+unsigned core_count;
 hwaddr peri_base; /* Peripheral base address seen by the CPU */
 hwaddr ctrl_base; /* Interrupt controller and mailboxes etc. */
 int clusterid;
@@ -39,7 +40,7 @@ static void bcm2836_init(Object *obj)
 BCM283XClass *bc = BCM283X_GET_CLASS(obj);
 int n;
 
-for (n = 0; n < BCM283X_NCPUS; n++) {
+for (n = 0; n < bc->core_count; n++) {
 object_initialize_child(obj, "cpu[*]", &s->cpu[n].core,
 bc->cpu_type);
 }
@@ -149,6 +150,7 @@ static void bcm2836_class_init(ObjectClass *oc, void *data)
 BCM283XClass *bc = BCM283X_CLASS(oc);
 
 bc->cpu_type = ARM_CPU_TYPE_NAME("cortex-a7");
+bc->core_count = BCM283X_NCPUS;
 bc->peri_base = 0x3f00;
 bc->ctrl_base = 0x4000;
 bc->clusterid = 0xf;
@@ -163,6 +165,7 @@ static void bcm2837_class_init(ObjectClass *oc, void *data)
 BCM283XClass *bc = BCM283X_CLASS(oc);
 
 bc->cpu_type = ARM_CPU_TYPE_NAME("cortex-a53");
+bc->core_count = BCM283X_NCPUS;
 bc->peri_base = 0x3f00;
 bc->ctrl_base = 0x4000;
 bc->clusterid = 0x0;
-- 
2.26.2

[PATCH v3 5/9] hw/arm/bcm2836: Split out common realize() code

2020-10-18 Thread Philippe Mathieu-Daudé

The realize() function is clearly composed of two parts,
each described by a comment:

  void realize()
  {
 /* common peripherals from bcm2835 */
 ...
 /* bcm2836 interrupt controller (and mailboxes, etc.) */
 ...
   }

Split the two parts, so we can reuse the common part with other
SoCs from this family.
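
The shape of the refactoring can be sketched with plain structs instead of
QOM types. All ex_* names below are hypothetical stand-ins. The real
functions take a DeviceState and an Error pointer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SoC state. */
struct ex_soc {
    bool fail_periph;    /* simulate a peripherals realize failure */
    bool periph_ready;
    bool ctrl_ready;
};

/* Common part: returns false on error so that callers can bail out. */
static bool ex_common_realize(struct ex_soc *s)
{
    if (s->fail_periph) {
        return false;
    }
    s->periph_ready = true;
    return true;
}

/* SoC with an interrupt controller block on top of the common part. */
static void ex_multicore_realize(struct ex_soc *s)
{
    if (!ex_common_realize(s)) {
        return;
    }
    s->ctrl_ready = true;
}

/* Single-core SoC: reuses only the common part. */
static void ex_singlecore_realize(struct ex_soc *s)
{
    if (!ex_common_realize(s)) {
        return;
    }
}
```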

Reviewed-by: Luc Michel 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/bcm2836.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index fcb2c9c3e73..7d975cf2f53 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -52,7 +52,10 @@ static void bcm2836_init(Object *obj)
 qdev_prop_set_uint32(DEVICE(obj), "enabled-cpus", bc->core_count);
 }
 
-object_initialize_child(obj, "control", &s->control, TYPE_BCM2836_CONTROL);
+if (bc->ctrl_base) {
+object_initialize_child(obj, "control", &s->control,
+TYPE_BCM2836_CONTROL);
+}
 
 object_initialize_child(obj, "peripherals", &s->peripherals,
 TYPE_BCM2835_PERIPHERALS);
@@ -62,12 +65,11 @@ static void bcm2836_init(Object *obj)
   "vcram-size");
 }
 
-static void bcm2836_realize(DeviceState *dev, Error **errp)
+static bool bcm283x_common_realize(DeviceState *dev, Error **errp)
 {
 BCM283XState *s = BCM283X(dev);
 BCM283XClass *bc = BCM283X_GET_CLASS(dev);
 Object *obj;
-int n;
 
 /* common peripherals from bcm2835 */
 
@@ -76,7 +78,7 @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
 object_property_add_const_link(OBJECT(&s->peripherals), "ram", obj);
 
 if (!sysbus_realize(SYS_BUS_DEVICE(&s->peripherals), errp)) {
-return;
+return false;
 }
 
 object_property_add_alias(OBJECT(s), "sd-bus", OBJECT(&s->peripherals),
@@ -84,6 +86,18 @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
 
 sysbus_mmio_map_overlap(SYS_BUS_DEVICE(&s->peripherals), 0,
 bc->peri_base, 1);
+return true;
+}
+
+static void bcm2836_realize(DeviceState *dev, Error **errp)
+{
+BCM283XState *s = BCM283X(dev);
+BCM283XClass *bc = BCM283X_GET_CLASS(dev);
+int n;
+
+if (!bcm283x_common_realize(dev, errp)) {
+return;
+}
 
 /* bcm2836 interrupt controller (and mailboxes, etc.) */
 if (!sysbus_realize(SYS_BUS_DEVICE(&s->control), errp)) {
-- 
2.26.2

[PATCH v3 0/9] hw/arm: Add raspi Zero, 1A+ and 3A+ machines

2020-10-18 Thread Philippe Mathieu-Daudé

Add the raspi0/1/3A+ machines.

Missing review: #7 and #9

Since v2:
- Rebased
- Addressed Igor comment
- Added Luc R-b
- Added model 3A+

Since v1:
- Use more specific machine names

Based-on: <20201010135759.437903-1-...@lmichel.fr>
Supersedes: <20200217114533.17779-1-f4...@amsat.org>

Philippe Mathieu-Daudé (9):
  hw/arm/bcm2836: Restrict BCM283XInfo declaration to C source
  hw/arm/bcm2836: QOM'ify more by adding class_init() to each SoC type
  hw/arm/bcm2836: Introduce BCM283XClass::core_count
  hw/arm/bcm2836: Only provide "enabled-cpus" property to multicore SoCs
  hw/arm/bcm2836: Split out common realize() code
  hw/arm/bcm2836: Introduce the BCM2835 SoC
  hw/arm/raspi: Add the Raspberry Pi A+ machine
  hw/arm/raspi: Add the Raspberry Pi Zero machine
  hw/arm/raspi: Add the Raspberry Pi 3 model A+

 include/hw/arm/bcm2836.h |   9 +-
 hw/arm/bcm2836.c | 182 ++-
 hw/arm/raspi.c   |  41 +
 3 files changed, 162 insertions(+), 70 deletions(-)

-- 
2.26.2

[PATCH v3 4/9] hw/arm/bcm2836: Only provide "enabled-cpus" property to multicore SoCs

2020-10-18 Thread Philippe Mathieu-Daudé

It makes no sense to set enabled-cpus=0 on single core SoCs.

Reviewed-by: Luc Michel 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/bcm2836.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index c5d46a8e805..fcb2c9c3e73 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -34,6 +34,9 @@ typedef struct BCM283XClass {
 #define BCM283X_GET_CLASS(obj) \
 OBJECT_GET_CLASS(BCM283XClass, (obj), TYPE_BCM283X)
 
+static Property bcm2836_enabled_cores_property =
+DEFINE_PROP_UINT32("enabled-cpus", BCM283XState, enabled_cpus, 0);
+
 static void bcm2836_init(Object *obj)
 {
 BCM283XState *s = BCM283X(obj);
@@ -44,6 +47,10 @@ static void bcm2836_init(Object *obj)
 object_initialize_child(obj, "cpu[*]", &s->cpu[n].core,
 bc->cpu_type);
 }
+if (bc->core_count > 1) {
+qdev_property_add_static(DEVICE(obj), &bcm2836_enabled_cores_property);
+qdev_prop_set_uint32(DEVICE(obj), "enabled-cpus", bc->core_count);
+}
 
 object_initialize_child(obj, "control", &s->control, TYPE_BCM2836_CONTROL);
 
@@ -130,12 +137,6 @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
 }
 }
 
-static Property bcm2836_props[] = {
-DEFINE_PROP_UINT32("enabled-cpus", BCM283XState, enabled_cpus,
-   BCM283X_NCPUS),
-DEFINE_PROP_END_OF_LIST()
-};
-
 static void bcm283x_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
@@ -155,7 +156,6 @@ static void bcm2836_class_init(ObjectClass *oc, void *data)
 bc->ctrl_base = 0x4000;
 bc->clusterid = 0xf;
 dc->realize = bcm2836_realize;
-device_class_set_props(dc, bcm2836_props);
 };
 
 #ifdef TARGET_AARCH64
@@ -170,7 +170,6 @@ static void bcm2837_class_init(ObjectClass *oc, void *data)
 bc->ctrl_base = 0x4000;
 bc->clusterid = 0x0;
 dc->realize = bcm2836_realize;
-device_class_set_props(dc, bcm2836_props);
 };
 #endif
 
-- 
2.26.2

[PATCH v3 2/9] hw/arm/bcm2836: QOM'ify more by adding class_init() to each SoC type

2020-10-18 Thread Philippe Mathieu-Daudé

Remove usage of TypeInfo::class_data. Instead fill the fields in
the corresponding class_init().

So far all children use the same values for almost all fields,
but we are going to add the BCM2711/BCM2838 SoC for the raspi4
machine, which uses different values.

Reviewed-by: Igor Mammedov 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/bcm2836.c | 108 ++-
 1 file changed, 51 insertions(+), 57 deletions(-)

diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index e7cc2c930d9..8f921d8e904 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -17,57 +17,31 @@
 #include "hw/arm/raspi_platform.h"
 #include "hw/sysbus.h"
 
-typedef struct BCM283XInfo BCM283XInfo;
-
 typedef struct BCM283XClass {
 /*< private >*/
 DeviceClass parent_class;
 /*< public >*/
-const BCM283XInfo *info;
-} BCM283XClass;
-
-struct BCM283XInfo {
 const char *name;
 const char *cpu_type;
 hwaddr peri_base; /* Peripheral base address seen by the CPU */
 hwaddr ctrl_base; /* Interrupt controller and mailboxes etc. */
 int clusterid;
-};
+} BCM283XClass;
 
 #define BCM283X_CLASS(klass) \
 OBJECT_CLASS_CHECK(BCM283XClass, (klass), TYPE_BCM283X)
 #define BCM283X_GET_CLASS(obj) \
 OBJECT_GET_CLASS(BCM283XClass, (obj), TYPE_BCM283X)
 
-static const BCM283XInfo bcm283x_socs[] = {
-{
-.name = TYPE_BCM2836,
-.cpu_type = ARM_CPU_TYPE_NAME("cortex-a7"),
-.peri_base = 0x3f00,
-.ctrl_base = 0x4000,
-.clusterid = 0xf,
-},
-#ifdef TARGET_AARCH64
-{
-.name = TYPE_BCM2837,
-.cpu_type = ARM_CPU_TYPE_NAME("cortex-a53"),
-.peri_base = 0x3f00,
-.ctrl_base = 0x4000,
-.clusterid = 0x0,
-},
-#endif
-};
-
 static void bcm2836_init(Object *obj)
 {
 BCM283XState *s = BCM283X(obj);
 BCM283XClass *bc = BCM283X_GET_CLASS(obj);
-const BCM283XInfo *info = bc->info;
 int n;
 
 for (n = 0; n < BCM283X_NCPUS; n++) {
 object_initialize_child(obj, "cpu[*]", &s->cpu[n].core,
-info->cpu_type);
+bc->cpu_type);
 }
 
 object_initialize_child(obj, "control", &s->control, TYPE_BCM2836_CONTROL);
@@ -84,7 +58,6 @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
 {
 BCM283XState *s = BCM283X(dev);
 BCM283XClass *bc = BCM283X_GET_CLASS(dev);
-const BCM283XInfo *info = bc->info;
 Object *obj;
 int n;
 
@@ -102,14 +75,14 @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
   "sd-bus");
 
 sysbus_mmio_map_overlap(SYS_BUS_DEVICE(&s->peripherals), 0,
-info->peri_base, 1);
+bc->peri_base, 1);
 
 /* bcm2836 interrupt controller (and mailboxes, etc.) */
 if (!sysbus_realize(SYS_BUS_DEVICE(&s->control), errp)) {
 return;
 }
 
-sysbus_mmio_map(SYS_BUS_DEVICE(&s->control), 0, info->ctrl_base);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->control), 0, bc->ctrl_base);
 
 sysbus_connect_irq(SYS_BUS_DEVICE(&s->peripherals), 0,
 qdev_get_gpio_in_named(DEVICE(&s->control), "gpu-irq", 0));
@@ -118,11 +91,11 @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
 
 for (n = 0; n < BCM283X_NCPUS; n++) {
 /* TODO: this should be converted to a property of ARM_CPU */
-s->cpu[n].core.mp_affinity = (info->clusterid << 8) | n;
+s->cpu[n].core.mp_affinity = (bc->clusterid << 8) | n;
 
 /* set periphbase/CBAR value for CPU-local registers */
 if (!object_property_set_int(OBJECT(&s->cpu[n].core), "reset-cbar",
- info->peri_base, errp)) {
+ bc->peri_base, errp)) {
 return;
 }
 
@@ -165,38 +138,59 @@ static Property bcm2836_props[] = {
 static void bcm283x_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
-BCM283XClass *bc = BCM283X_CLASS(oc);
 
-bc->info = data;
-dc->realize = bcm2836_realize;
-device_class_set_props(dc, bcm2836_props);
 /* Reason: Must be wired up in code (see raspi_init() function) */
 dc->user_creatable = false;
 }
 
-static const TypeInfo bcm283x_type_info = {
-.name = TYPE_BCM283X,
-.parent = TYPE_DEVICE,
-.instance_size = sizeof(BCM283XState),
-.instance_init = bcm2836_init,
-.class_size = sizeof(BCM283XClass),
-.abstract = true,
+static void bcm2836_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+BCM283XClass *bc = BCM283X_CLASS(oc);
+
+bc->cpu_type = ARM_CPU_TYPE_NAME("cortex-a7");
+bc->peri_base = 0x3f00;
+bc->ctrl_base = 0x4000;
+bc->clusterid = 0xf;
+dc->realize = bcm2836_realize;
+device_class_set_props(dc, bcm2836_props);
 };
 
-static void bcm2836_register_types(void)
+#ifdef TARGET_AARCH64
+static void bcm2837_class_init(ObjectClass *oc, void *data)
 {
-int

[PATCH v3 1/9] hw/arm/bcm2836: Restrict BCM283XInfo declaration to C source

2020-10-18 Thread Philippe Mathieu-Daudé

No code outside of bcm2836.c uses (or requires) the BCM283XInfo
declarations. Move them into the C source file.

Reviewed-by: Luc Michel 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/arm/bcm2836.h |  8 
 hw/arm/bcm2836.c | 14 ++
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/hw/arm/bcm2836.h b/include/hw/arm/bcm2836.h
index 428c15d316e..43e9f8cd0ef 100644
--- a/include/hw/arm/bcm2836.h
+++ b/include/hw/arm/bcm2836.h
@@ -43,12 +43,4 @@ struct BCM283XState {
 BCM2835PeripheralState peripherals;
 };
 
-typedef struct BCM283XInfo BCM283XInfo;
-
-struct BCM283XClass {
-DeviceClass parent_class;
-const BCM283XInfo *info;
-};
-
-
 #endif /* BCM2836_H */
diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index f15cc3b4053..e7cc2c930d9 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -17,6 +17,15 @@
 #include "hw/arm/raspi_platform.h"
 #include "hw/sysbus.h"
 
+typedef struct BCM283XInfo BCM283XInfo;
+
+typedef struct BCM283XClass {
+/*< private >*/
+DeviceClass parent_class;
+/*< public >*/
+const BCM283XInfo *info;
+} BCM283XClass;
+
 struct BCM283XInfo {
 const char *name;
 const char *cpu_type;
@@ -25,6 +34,11 @@ struct BCM283XInfo {
 int clusterid;
 };
 
+#define BCM283X_CLASS(klass) \
+OBJECT_CLASS_CHECK(BCM283XClass, (klass), TYPE_BCM283X)
+#define BCM283X_GET_CLASS(obj) \
+OBJECT_GET_CLASS(BCM283XClass, (obj), TYPE_BCM283X)
+
 static const BCM283XInfo bcm283x_socs[] = {
 {
 .name = TYPE_BCM2836,
-- 
2.26.2

[PATCH] softfloat: Mark base int-to-float routines QEMU_FLATTEN

2020-10-18 Thread Richard Henderson

This merges the int_to_float routine and the round_pack_canonical
routine into the same function, allowing the FloatParts structure
to be decomposed by the compiler.

This results in a 60-75% speedup of the flattened function.

Leave the narrower integer inputs to tail-call the int64_t version.

Buglink: https://bugs.launchpad.net/qemu/+bug/1892081
Signed-off-by: Richard Henderson 
---
 fpu/softfloat.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 7b6aee9323..2cbcf5bf10 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2794,7 +2794,8 @@ static FloatParts int_to_float(int64_t a, int scale, 
float_status *status)
 return r;
 }
 
-float16 int64_to_float16_scalbn(int64_t a, int scale, float_status *status)
+float16 QEMU_FLATTEN
+int64_to_float16_scalbn(int64_t a, int scale, float_status *status)
 {
 FloatParts pa = int_to_float(a, scale, status);
 return float16_round_pack_canonical(pa, status);
@@ -2830,7 +2831,8 @@ float16 int8_to_float16(int8_t a, float_status *status)
 return int64_to_float16_scalbn(a, 0, status);
 }
 
-float32 int64_to_float32_scalbn(int64_t a, int scale, float_status *status)
+float32 QEMU_FLATTEN
+int64_to_float32_scalbn(int64_t a, int scale, float_status *status)
 {
 FloatParts pa = int_to_float(a, scale, status);
 return float32_round_pack_canonical(pa, status);
@@ -2861,7 +2863,8 @@ float32 int16_to_float32(int16_t a, float_status *status)
 return int64_to_float32_scalbn(a, 0, status);
 }
 
-float64 int64_to_float64_scalbn(int64_t a, int scale, float_status *status)
+float64 QEMU_FLATTEN
+int64_to_float64_scalbn(int64_t a, int scale, float_status *status)
 {
 FloatParts pa = int_to_float(a, scale, status);
 return float64_round_pack_canonical(pa, status);
@@ -2897,7 +2900,8 @@ float64 int16_to_float64(int16_t a, float_status *status)
  * to the bfloat16 format.
  */
 
-bfloat16 int64_to_bfloat16_scalbn(int64_t a, int scale, float_status *status)
+bfloat16 QEMU_FLATTEN
+int64_to_bfloat16_scalbn(int64_t a, int scale, float_status *status)
 {
 FloatParts pa = int_to_float(a, scale, status);
 return bfloat16_round_pack_canonical(pa, status);
@@ -2959,7 +2963,8 @@ static FloatParts uint_to_float(uint64_t a, int scale, 
float_status *status)
 return r;
 }
 
-float16 uint64_to_float16_scalbn(uint64_t a, int scale, float_status *status)
+float16 QEMU_FLATTEN
+uint64_to_float16_scalbn(uint64_t a, int scale, float_status *status)
 {
 FloatParts pa = uint_to_float(a, scale, status);
 return float16_round_pack_canonical(pa, status);
@@ -2995,7 +3000,8 @@ float16 uint8_to_float16(uint8_t a, float_status *status)
 return uint64_to_float16_scalbn(a, 0, status);
 }
 
-float32 uint64_to_float32_scalbn(uint64_t a, int scale, float_status *status)
+float32 QEMU_FLATTEN
+uint64_to_float32_scalbn(uint64_t a, int scale, float_status *status)
 {
 FloatParts pa = uint_to_float(a, scale, status);
 return float32_round_pack_canonical(pa, status);
@@ -3026,7 +3032,8 @@ float32 uint16_to_float32(uint16_t a, float_status 
*status)
 return uint64_to_float32_scalbn(a, 0, status);
 }
 
-float64 uint64_to_float64_scalbn(uint64_t a, int scale, float_status *status)
+float64 QEMU_FLATTEN
+uint64_to_float64_scalbn(uint64_t a, int scale, float_status *status)
 {
 FloatParts pa = uint_to_float(a, scale, status);
 return float64_round_pack_canonical(pa, status);
@@ -3062,7 +3069,8 @@ float64 uint16_to_float64(uint16_t a, float_status 
*status)
  * bfloat16 format.
  */
 
-bfloat16 uint64_to_bfloat16_scalbn(uint64_t a, int scale, float_status *status)
+bfloat16 QEMU_FLATTEN
+uint64_to_bfloat16_scalbn(uint64_t a, int scale, float_status *status)
 {
 FloatParts pa = uint_to_float(a, scale, status);
 return bfloat16_round_pack_canonical(pa, status);
-- 
2.25.1
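
The effect described in the commit message can be illustrated outside QEMU. The following is a minimal sketch, assuming that QEMU_FLATTEN corresponds to the GCC/Clang `flatten` function attribute; the names `Parts`, `decompose`, `recompose`, `int64_to_double` and `int32_to_double` are hypothetical and only mirror the shape of the softfloat code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an intermediate structure such as FloatParts. */
typedef struct {
    uint64_t mag;   /* magnitude of the input */
    int sign;       /* 1 if the input was negative */
} Parts;

/* First stage: split the integer into sign and magnitude. */
static Parts decompose(int64_t a)
{
    Parts p;

    p.sign = a < 0;
    /* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
    p.mag = p.sign ? -(uint64_t)a : (uint64_t)a;
    return p;
}

/* Second stage: rebuild a floating-point value from the parts. */
static double recompose(Parts p)
{
    double d = (double)p.mag;

    return p.sign ? -d : d;
}

/*
 * flatten asks the compiler to inline the calls made in this body where
 * possible, so Parts need not be materialised in memory between stages.
 */
__attribute__((flatten))
double int64_to_double(int64_t a)
{
    return recompose(decompose(a));
}

/* A narrower input simply forwards to the 64-bit version. */
double int32_to_double(int32_t a)
{
    return int64_to_double(a);
}
```

Only the widest entry point carries the attribute; the narrow wrappers stay small and forward to it, which matches the approach the patch describes.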

Re: [PATCH v3] util/oslib-win32: Use _aligned_malloc for qemu_try_memalign

2020-10-18 Thread Richard Henderson

On 10/18/20 11:34 AM, Philippe Mathieu-Daudé wrote:
>> +    g_assert(size != 0);
> 
> "The alignment value, which must be an integer power of 2.",
> so maybe:
> 
>    g_assert(size != 0 && is_power_of_2(alignment));

This is also true of posix_memalign.  If we are going to add this, we should
also assert in the other qemu_try_memalign.

r~
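
The assertion being discussed can be sketched for the POSIX side as follows. This is a hedged illustration, not the QEMU code: `is_pow2` and `checked_memalign` are hypothetical names, and the sketch relies only on the standard `posix_memalign` call, which additionally requires the alignment to be a multiple of `sizeof(void *)`.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true if x is a non-zero power of two. */
static bool is_pow2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical wrapper: validate the arguments, then allocate.
 * Returns NULL if the allocation fails.
 */
static void *checked_memalign(size_t alignment, size_t size)
{
    void *ptr = NULL;

    assert(size != 0 && is_pow2(alignment));

    /* posix_memalign needs at least pointer-sized alignment. */
    if (alignment < sizeof(void *)) {
        alignment = sizeof(void *);
    }
    if (posix_memalign(&ptr, alignment, size) != 0) {
        return NULL;
    }
    return ptr;
}
```

On Windows the analogous call is `_aligned_malloc`, which Microsoft documents as taking the size first and the alignment second.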

Re: [PATCH v3] util/oslib-win32: Use _aligned_malloc for qemu_try_memalign

2020-10-18 Thread Philippe Mathieu-Daudé


On 10/18/20 6:48 PM, Richard Henderson wrote:

We do not need or want to be allocating page sized quanta.

Signed-off-by: Richard Henderson 
---
v3: Include ; use g_assert not assert.
---
  util/oslib-win32.c | 11 ---
  1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index e99debfb8d..29dd05d59d 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -39,6 +39,7 @@
  #include "trace.h"
  #include "qemu/sockets.h"
  #include "qemu/cutils.h"
+#include 
  
  /* this must come after including "trace.h" */

  #include 
@@ -56,10 +57,8 @@ void *qemu_try_memalign(size_t alignment, size_t size)
  {
  void *ptr;
  
-if (!size) {

-abort();
-}
-ptr = VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE);
+g_assert(size != 0);


"The alignment value, which must be an integer power of 2.",
so maybe:

   g_assert(size != 0 && is_power_of_2(alignment));

Regardless:
Reviewed-by: Philippe Mathieu-Daudé 


+ptr = _aligned_malloc(size, alignment);
  trace_qemu_memalign(alignment, size, ptr);
  return ptr;
  }
@@ -93,9 +92,7 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool 
shared)
  void qemu_vfree(void *ptr)
  {
  trace_qemu_vfree(ptr);
-if (ptr) {
-VirtualFree(ptr, 0, MEM_RELEASE);
-}
+_aligned_free(ptr);
  }
  
  void qemu_anon_ram_free(void *ptr, size_t size)

Re: [PATCH v3 0/3] target/arm: Implement an IMPDEF pauth algorithm

2020-10-18 Thread Richard Henderson

Ping.

On 8/14/20 2:39 PM, Richard Henderson wrote:
> The architected pauth algorithm is quite slow without
> hardware support, and boot times for kernels that enable
> use of the feature have been significantly impacted.
> 
> Version 1 blurb at
>   https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg02172.html
> which contains larger study of the tradeoffs.
> 
> Version 2 changes:
>   * Use boolean properties, for qmp_query_cpu_model_expansion (drjones).
>   * Move XXH64 implementation to xxhash.h (ajb).
>   * Include a small cleanup to parsing the "sve" property
> that I noticed along the way.
> 
> Version 3 changes:
>   * Swap order of patches (drjones).
>   * Add properties test case (drjones).
> 
> 
> r~
> 
> Richard Henderson (3):
>   target/arm: Implement an IMPDEF pauth algorithm
>   target/arm: Add cpu properties to control pauth
>   target/arm: Use object_property_add_bool for "sve" property
> 
>  include/qemu/xxhash.h  | 82 ++
>  target/arm/cpu.h   | 25 +--
>  target/arm/cpu.c   | 13 ++
>  target/arm/cpu64.c | 64 ++
>  target/arm/monitor.c   |  1 +
>  target/arm/pauth_helper.c  | 41 ++---
>  tests/qtest/arm-cpu-features.c | 13 ++
>  7 files changed, 212 insertions(+), 27 deletions(-)
>

Re: [PATCH v26 08/17] vfio: Add save state functions to SaveVMHandlers

2020-10-18 Thread Kirti Wankhede





On 9/26/2020 2:32 AM, Alex Williamson wrote:

On Wed, 23 Sep 2020 04:54:10 +0530
Kirti Wankhede  wrote:


Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy
functions. These functions handle the pre-copy and stop-and-copy phases.

In _SAVING|_RUNNING device state or pre-copy phase:
- read pending_bytes. If pending_bytes > 0, go through below steps.
- read data_offset - indicates kernel driver to write data to staging
   buffer.
- read data_size - amount of data in bytes written by vendor driver in
   migration region.
- read data_size bytes of data from data_offset in the migration region.
- Write data packet to file stream as below:
{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
VFIO_MIG_FLAG_END_OF_STATE }

In _SAVING device state or stop-and-copy phase
a. read config space of device and save to migration file stream. This
doesn't need to be from vendor driver. Any other special config state
from driver can be saved as data in following iteration.
b. read pending_bytes. If pending_bytes > 0, go through below steps.
c. read data_offset - indicates kernel driver to write data to staging
buffer.
d. read data_size - amount of data in bytes written by vendor driver in
migration region.
e. read data_size bytes of data from data_offset in the migration region.
f. Write data packet as below:
{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
g. iterate through steps b to f while (pending_bytes > 0)
h. Write {VFIO_MIG_FLAG_END_OF_STATE}

When the data region is mapped, it is the user's responsibility to read
data_size bytes of data from data_offset before moving to the next steps.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/migration.c   | 273 ++
  hw/vfio/trace-events  |   6 +
  include/hw/vfio/vfio-common.h |   1 +
  3 files changed, 280 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 8e8adaa25779..4611bb972228 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -180,6 +180,154 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, 
uint32_t mask,
  return 0;
  }
  
+static void *get_data_section_size(VFIORegion *region, uint64_t data_offset,

+   uint64_t data_size, uint64_t *size)
+{
+void *ptr = NULL;
+uint64_t limit = 0;
+int i;
+
+if (!region->mmaps) {
+if (size) {
+*size = data_size;
+}
+return ptr;
+}
+
+for (i = 0; i < region->nr_mmaps; i++) {
+VFIOMmap *map = region->mmaps + i;
+
+if ((data_offset >= map->offset) &&
+(data_offset < map->offset + map->size)) {
+
+/* check if data_offset is within sparse mmap areas */
+ptr = map->mmap + data_offset - map->offset;
+if (size) {
+*size = MIN(data_size, map->offset + map->size - data_offset);
+}
+break;
+} else if ((data_offset < map->offset) &&
+   (!limit || limit > map->offset)) {
+/*
+ * data_offset is not within sparse mmap areas, find size of
+ * non-mapped area. Check through all list since region->mmaps list
+ * is not sorted.
+ */
+limit = map->offset;
+}
+}
+
+if (!ptr && size) {
+*size = limit ? limit - data_offset : data_size;


'limit - data_offset' doesn't take data_size into account, this should
be MIN(data_size, limit - data_offset).



Done.
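
The clamp requested in the review comment above can be sketched in isolation. This is a minimal illustration under stated assumptions: `unmapped_chunk_size` is a hypothetical helper, and `limit` is taken to be the offset of the nearest mmap'd area above `data_offset`, or 0 if there is none.

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Hypothetical helper: number of bytes to handle in one pass when
 * data_offset lies in a non-mapped part of the region.
 */
static uint64_t unmapped_chunk_size(uint64_t data_offset, uint64_t data_size,
                                    uint64_t limit)
{
    /* A non-zero limit must lie above the current offset. */
    assert(!limit || limit > data_offset);

    /* Never report more than the caller asked for. */
    return limit ? MIN(data_size, limit - data_offset) : data_size;
}
```

Without the MIN(), a request smaller than the gap up to the next mapped area would be reported as covering the whole gap.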


+}
+return ptr;
+}
+
+static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
+{
+VFIOMigration *migration = vbasedev->migration;
+VFIORegion *region = &migration->region;
+uint64_t data_offset = 0, data_size = 0, sz;
+int ret;
+
+ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+ data_offset));
+if (ret < 0) {
+return ret;
+}
+
+ret = vfio_mig_read(vbasedev, &data_size, sizeof(data_size),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+ data_size));
+if (ret < 0) {
+return ret;
+}
+
+trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
+   migration->pending_bytes);
+
+qemu_put_be64(f, data_size);
+sz = data_size;
+
+while (sz) {
+void *buf = NULL;


Unnecessary initialization.


+uint64_t sec_size;
+bool buf_allocated = false;
+
+buf = get_data_section_size(region, data_offset, sz, &sec_size);
+
+if (!buf) {
+buf = g_try_malloc(sec_size);
+if (!buf) {
+error_report("%s: Error allocating buffer ", __func__);
+return -ENOMEM;
+}
+buf_allocated = true;
+
+ret =

Re: [PATCH v26 05/17] vfio: Add VM state change handler to know state of VM

2020-10-18 Thread Kirti Wankhede





+vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d"
+vfio_vmstate_change(char *name, int running, const char *reason, uint32_t dev_state) 
" (%s) running %d reason %s device state %d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 8275c4c68f45..25e3b1a3b90a 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -29,6 +29,7 @@
   #ifdef CONFIG_LINUX
   #include 
   #endif
+#include "sysemu/sysemu.h"
   
   #define VFIO_MSG_PREFIX "vfio %s: "
   
@@ -119,6 +120,9 @@ typedef struct VFIODevice {

   unsigned int flags;
   VFIOMigration *migration;
   Error *migration_blocker;
+VMChangeStateEntry *vm_state;
+uint32_t device_state;
+int vm_running;


Could these be placed in VFIOMigration?  Thanks,
  


I think device_state should be part of VFIODevice since it's about the device
rather than only related to migration; the others can be moved to VFIOMigration.


But these are only valid when migration is supported and thus when
VFIOMigration exists.  Thanks,



Even though it is used only when migration is supported, it is the device's attribute.

Thanks,
Kirti

Re: [PATCH v26 07/17] vfio: Register SaveVMHandlers for VFIO device

2020-10-18 Thread Kirti Wankhede





On 9/26/2020 1:50 AM, Alex Williamson wrote:

On Wed, 23 Sep 2020 04:54:09 +0530
Kirti Wankhede  wrote:


Define flags to be used as delimiters in the migration file stream.
Added .save_setup and .save_cleanup functions. Mapped & unmapped migration
region from these functions at source during saving or pre-copy phase.
Set VFIO device state depending on VM's state. During live migration, VM is
running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO
device. During save-restore, VM is paused, _SAVING state is set for VFIO device.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/migration.c  | 91 
  hw/vfio/trace-events |  2 ++
  2 files changed, 93 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index f650fe9fc3c8..8e8adaa25779 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -8,12 +8,15 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/main-loop.h"
+#include "qemu/cutils.h"
  #include 
  
  #include "sysemu/runstate.h"

  #include "hw/vfio/vfio-common.h"
  #include "cpu.h"
  #include "migration/migration.h"
+#include "migration/vmstate.h"
  #include "migration/qemu-file.h"
  #include "migration/register.h"
  #include "migration/blocker.h"
@@ -25,6 +28,17 @@
  #include "trace.h"
  #include "hw/hw.h"
  
+/*

+ * Flags used as delimiter:
+ * 0xffffffff => MSB 32-bit all 1s
+ * 0xef10 => emulated (virtual) function IO
+ * 0x0000 => 16-bits reserved for flags
+ */
+#define VFIO_MIG_FLAG_END_OF_STATE  (0xffffffffef100001ULL)
+#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
+#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
+#define VFIO_MIG_FLAG_DEV_DATA_STATE(0xffffffffef100004ULL)
+
  static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
off_t off, bool iswrite)
  {
@@ -166,6 +180,65 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, 
uint32_t mask,
  return 0;
  }
  
+/* -- */

+
+static int vfio_save_setup(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret;
+
+trace_vfio_save_setup(vbasedev->name);
+
+qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
+
+if (migration->region.mmaps) {
+qemu_mutex_lock_iothread();
+ret = vfio_region_mmap(&migration->region);
+qemu_mutex_unlock_iothread();


Please add a comment identifying why the iothread mutex lock is
necessary here.


+if (ret) {
+error_report("%s: Failed to mmap VFIO migration region %d: %s",
+ vbasedev->name, migration->region.nr,


We don't support multiple migration regions, is it useful to include
the region index here?



Ok. Removing region.nr



+ strerror(-ret));
+error_report("%s: Falling back to slow path", vbasedev->name);
+}
+}
+
+ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
+   VFIO_DEVICE_STATE_SAVING);
+if (ret) {
+error_report("%s: Failed to set state SAVING", vbasedev->name);
+return ret;
+}


Again, doesn't match the function semantics that success only means the
device is in a non-error state, maybe the one that was asked for.



Fixed in patch 05.


+
+qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);


What's the overall purpose of writing these markers into the migration
stream?  vfio_load_state() doesn't do anything with this other than
validate that the end-of-state immediately follows.  Is this a
placeholder for something in the future?



It's not a placeholder; it is used in vfio_load_state() to determine up to
what point to loop fetching data for each state. Otherwise, how would we
know when to stop reading data from the stream for that VFIO device?
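
The role of the end-of-state marker on the load side can be sketched with a toy stream. This is a simplified illustration, not vfio_load_state(): `WordStream`, `next_word` and `load_device_state` are hypothetical, the stream is an in-memory array of 64-bit words instead of a QEMUFile, payload sizes are counted in words instead of bytes, and the two marker values are written out as full 64-bit constants on the assumption that they match the definitions in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed marker values; see the lead-in. */
#define FLAG_END_OF_STATE   0xffffffffef100001ULL
#define FLAG_DEV_DATA_STATE 0xffffffffef100004ULL

/* Hypothetical in-memory stream standing in for a QEMUFile. */
typedef struct {
    const uint64_t *words;
    size_t len;
    size_t pos;
} WordStream;

/* Read one word; returns -1 when the stream is exhausted. */
static int next_word(WordStream *s, uint64_t *out)
{
    if (s->pos >= s->len) {
        return -1;
    }
    *out = s->words[s->pos++];
    return 0;
}

/*
 * Consume {DATA marker, size, payload} packets until the end-of-state
 * marker. Returns the number of payload words seen, or -1 if the
 * stream is malformed or ends early.
 */
static long load_device_state(WordStream *s)
{
    long total = 0;
    uint64_t word, n;

    for (;;) {
        if (next_word(s, &word) < 0) {
            return -1;
        }
        if (word == FLAG_END_OF_STATE) {
            return total;           /* this device's state is complete */
        }
        if (word != FLAG_DEV_DATA_STATE || next_word(s, &n) < 0) {
            return -1;
        }
        if (n > s->len - s->pos) {
            return -1;              /* payload runs past the stream */
        }
        s->pos += n;                /* a real loader would copy the data */
        total += (long)n;
    }
}
```

The reader has no other way to tell where one device's data ends and the next section of the stream begins, which is the point made in the reply above.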



+
+ret = qemu_file_get_error(f);
+if (ret) {
+return ret;
+}
+
+return 0;
+}
+
+static void vfio_save_cleanup(void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+
+if (migration->region.mmaps) {
+vfio_region_unmap(&migration->region);
+}
+trace_vfio_save_cleanup(vbasedev->name);
+}
+
+static SaveVMHandlers savevm_vfio_handlers = {
+.save_setup = vfio_save_setup,
+.save_cleanup = vfio_save_cleanup,
+};
+
+/* -- */
+
  static void vfio_vmstate_change(void *opaque, int running, RunState state)
  {
  VFIODevice *vbasedev = opaque;
@@ -225,6 +298,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 struct vfio_region_info *info)
  {
  int ret = -EINVAL;
+char id[256] = "";
+Object *obj;
  
  if (!vbasedev->ops->vfio_get_object) {

  return ret;
@@ -241,6 +316,22 @@ static int

Re: [PATCH v3] util/oslib-win32: Use _aligned_malloc for qemu_try_memalign

2020-10-18 Thread Stefan Weil


Am 18.10.20 um 18:48 schrieb Richard Henderson:


We do not need or want to be allocating page sized quanta.

Signed-off-by: Richard Henderson 
---
v3: Include ; use g_assert not assert.
---
  util/oslib-win32.c | 11 ---
  1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index e99debfb8d..29dd05d59d 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -39,6 +39,7 @@
  #include "trace.h"
  #include "qemu/sockets.h"
  #include "qemu/cutils.h"
+#include 
  
  /* this must come after including "trace.h" */

  #include 
@@ -56,10 +57,8 @@ void *qemu_try_memalign(size_t alignment, size_t size)
  {
  void *ptr;
  
-if (!size) {

-abort();
-}
-ptr = VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE);
+g_assert(size != 0);
+ptr = _aligned_malloc(size, alignment);
  trace_qemu_memalign(alignment, size, ptr);
  return ptr;
  }
@@ -93,9 +92,7 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool 
shared)
  void qemu_vfree(void *ptr)
  {
  trace_qemu_vfree(ptr);
-if (ptr) {
-VirtualFree(ptr, 0, MEM_RELEASE);
-}
+_aligned_free(ptr);
  }
  
  void qemu_anon_ram_free(void *ptr, size_t size)



Thanks.

Reviewed-by: Stefan Weil

[PATCH v3] util/oslib-win32: Use _aligned_malloc for qemu_try_memalign

2020-10-18 Thread Richard Henderson

We do not need or want to be allocating page sized quanta.

Signed-off-by: Richard Henderson 
---
v3: Include ; use g_assert not assert.
---
 util/oslib-win32.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index e99debfb8d..29dd05d59d 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -39,6 +39,7 @@
 #include "trace.h"
 #include "qemu/sockets.h"
 #include "qemu/cutils.h"
+#include 
 
 /* this must come after including "trace.h" */
 #include 
@@ -56,10 +57,8 @@ void *qemu_try_memalign(size_t alignment, size_t size)
 {
 void *ptr;
 
-if (!size) {
-abort();
-}
-ptr = VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE);
+g_assert(size != 0);
+ptr = _aligned_malloc(size, alignment);
 trace_qemu_memalign(alignment, size, ptr);
 return ptr;
 }
@@ -93,9 +92,7 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool 
shared)
 void qemu_vfree(void *ptr)
 {
 trace_qemu_vfree(ptr);
-if (ptr) {
-VirtualFree(ptr, 0, MEM_RELEASE);
-}
+_aligned_free(ptr);
 }
 
 void qemu_anon_ram_free(void *ptr, size_t size)
-- 
2.25.1

Re: ERROR: glib-2.48 gthread-2.0 is required to compile QEMU

2020-10-18 Thread Peter Maydell

On Sun, 18 Oct 2020 at 15:38, Lee <380121...@163.com> wrote:
> Ubuntu 14.04.6 LTS, x86_64
> I built the source code of QEMU versions 5.1, 5.0 and 4.2, and found the
> error: glib-2.48 gthread-2.0 is required to compile QEMU
> I tried apt-get install libglib2.0-dev, and it succeeded:
> Reading state information... Done
> libglib2.0-dev is already the newest version.

I believe that Ubuntu 14.04 shipped with libglib2.0-dev
version 2.40.2 -- this is too old, as the QEMU error
message says.

The simplest thing to do would be for you to upgrade your Ubuntu
install -- 14.04 is now very old (it reached "end of standard
support", ie no more security fixes unless you're paying
Canonical for extended security maintenance, in April 2019).

QEMU's distro support policy is documented here:
https://www.qemu.org/docs/master/system/build-platforms.html
For distros like Ubuntu with a 'long-lifetime' type release
(like LTS), we support the most recent major version, and
the previous major version up until 2 years after the next
major version was released. So we support 20.04 LTS (the
most recent) and also will support 18.04 LTS until at least
April 2022 (since 20.04 was released in April 2020), but
anything older than that is not officially supported and
may or may not work.

> I found that QEMU version 4.1 is OK in the same environment

QEMU 4.1 did not require the newer version of glib. (As we
develop QEMU we want to be able to use the extra features
in newer versions of our dependent libraries and to be able
to remove backwards-compatibility code that is needed only
when using older versions of those libraries, so sometimes
when all the distros and versions we support ship with a
new enough version of a library we will increase the
minimum required version that QEMU needs to build.)

thanks
-- PMM

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-18 Thread Andy Lutomirski

On Sun, Oct 18, 2020 at 8:59 AM Michael S. Tsirkin  wrote:
>
> On Sun, Oct 18, 2020 at 08:54:36AM -0700, Andy Lutomirski wrote:
> > On Sun, Oct 18, 2020 at 8:52 AM Michael S. Tsirkin  wrote:
> > >
> > > On Sat, Oct 17, 2020 at 03:24:08PM +0200, Jason A. Donenfeld wrote:
> > > > 4c. The guest kernel maintains an array of physical addresses that are
> > > > MADV_WIPEONFORK. The hypervisor knows about this array and its
> > > > location through whatever protocol, and before resuming a
> > > > moved/snapshotted/duplicated VM, it takes the responsibility for
> > > > memzeroing this memory. The huge pro here would be that this
> > > > eliminates all races, and reduces complexity quite a bit, because the
> > > > hypervisor can perfectly synchronize its bringup (and SMP bringup)
> > > > with this, and it can even optimize things like on-disk memory
> > > > snapshots to simply not write out those pages to disk.
> > > >
> > > > A 4c-like approach seems like it'd be a lot of bang for the buck -- we
> > > > reuse the existing mechanism (MADV_WIPEONFORK), so there's no new
> > > > userspace API to deal with, and it'd be race free, and eliminate a lot
> > > > of kernel complexity.
> > >
> > > Clearly this has a chance to break applications, right?
> > > If there's an app that uses this as a non-system-calls way
> > > to find out whether there was a fork, it will break
> > > when wipe triggers without a fork ...
> > > For example, imagine:
> > >
> > > MADV_WIPEONFORK
> > > copy secret data to MADV_DONTFORK
> > > fork
> > >
> > >
> > > used to work, with this change it gets 0s instead of the secret data.
> > >
> > >
> > > I am also not sure it's wise to expose each guest process
> > > to the hypervisor like this. E.g. each process needs a
> > > guest physical address of its own then. This is a finite resource.
> > >
> > >
> > > The mmap interface proposed here is somewhat baroque, but it is
> > > certainly simple to implement ...
> >
> > Wipe of fork/vmgenid/whatever could end up being much more problematic
> > than it naively appears -- it could be wiped in the middle of a read.
> > Either the API needs to handle this cleanly, or we need something more
> > aggressive like signal-on-fork.
> >
> > --Andy
>
>
> Right, it's not on fork, it's actually when process is snapshotted.
>
> If we assume it's CRIU we care about, then I
> wonder what's wrong with something like
> MADV_CHANGEONPTRACE_SEIZE
> and basically say it's X bytes which change the value...

I feel like we may be approaching this from the wrong end.  Rather
than saying "what data structure can the kernel expose that might
plausibly be useful", how about we try identifying some specific
userspace needs and see what a good solution could look like.  I can
identify two major cryptographic use cases:

1. A userspace RNG.  The API exposed by the userspace end is a
function that generates random numbers.  The userspace code in turn
wants to know some things from the kernel: it wants some
best-quality-available random seed data from the kernel (and possibly
an indication of how good it is) as well as an indication of whether
the userspace memory may have been cloned or rolled back, or, failing
that, an indication of whether a reseed is needed.  Userspace could
implement a wide variety of algorithms on top depending on its goals
and compliance requirements, but the end goal is for the userspace
part to be very, very fast.

2. A userspace crypto stack that wants to avoid shooting itself in the
foot due to inadvertently doing the same thing twice.  For example, an
AES-GCM stack does not want to reuse an IV, *especially* if there is
even the slightest chance that it might reuse the IV for different
data.  This use case doesn't necessarily involve random numbers, but,
if anything, it needs to be even faster than #1.
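
The first use case can be sketched as a userspace generator that checks a cheap indicator before every output. This is purely illustrative: `clone_generation` is a hypothetical stand-in for whatever the kernel might expose (no such interface is implied here), `fetch_seed` is a placeholder for real kernel seed material, and the xorshift step is not cryptographically secure.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical indicator: imagined to change whenever the process memory
 * may have been cloned or rolled back. Here it is just a variable.
 */
static volatile uint64_t clone_generation;

typedef struct {
    uint64_t state;
    uint64_t seen_generation;   /* generation observed at the last reseed */
    unsigned reseeds;           /* number of reseeds, for demonstration */
} UserRng;

/* Placeholder seed source; a real one would ask the kernel. */
static uint64_t fetch_seed(void)
{
    static uint64_t n;

    return 0x9e3779b97f4a7c15ULL * ++n;
}

static void rng_reseed(UserRng *r)
{
    r->seen_generation = clone_generation;
    r->state = fetch_seed();
    r->reseeds++;
}

static void rng_init(UserRng *r)
{
    r->reseeds = 0;
    rng_reseed(r);
}

/* Fast path: one comparison, then a cheap state update. */
static uint64_t rng_next(UserRng *r)
{
    if (r->seen_generation != clone_generation) {
        rng_reseed(r);          /* memory may have been duplicated */
    }
    /* xorshift64 placeholder; NOT suitable for cryptographic use. */
    r->state ^= r->state << 13;
    r->state ^= r->state >> 7;
    r->state ^= r->state << 17;
    return r->state;
}
```

The check and the output are not atomic in this sketch, so a clone taken between them would still duplicate one output; that window is the kind of race the thread is concerned with.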

The threats here are not really the same.  For #1, a userspace RNG
should be able to recover from a scenario in which an adversary clones
the entire process *and gets to own the clone*.  For example, in
Android, an adversary can often gain complete control of a fork of the
zygote -- this shouldn't adversely affect the security properties of
other forks.  Similarly, a server farm could operate by having one
booted server that is cloned to create more workers.  Those clones
could be provisioned with secrets and permissions post-clone, and an
attacker gaining control of a fresh clone could be considered
acceptable.  For #2, in contrast, if an adversary gains control of a
clone of an AES-GCM session, they learn the key outright -- the
relevant attack scenario is that the adversary gets to interact with
two clones without compromising either clone per se.

It's worth noting that, in both cases, there could possibly be more
than one instance of an RNG or an AES-GCM session in the same process.
This means that using signals is awkward but not necessarily
impossible.  (This is an area in which Linux, and POSIX in general, is
much weaker than Windows.)

[PULL 12/13] mac_oldworld: Drop some variables

2020-10-18 Thread Mark Cave-Ayland

From: BALATON Zoltan 

Values not used frequently enough may not be worth putting in a local
variable, especially with names almost as long as the original value,
because that does not improve readability; on the contrary, it makes it
harder to see what value is used. Drop a few such variables.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: 

Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/mac_oldworld.c | 35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 1abf26b5b4..e34680f980 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -83,14 +83,11 @@ static void ppc_heathrow_reset(void *opaque)
 static void ppc_heathrow_init(MachineState *machine)
 {
 ram_addr_t ram_size = machine->ram_size;
-const char *kernel_filename = machine->kernel_filename;
-const char *kernel_cmdline = machine->kernel_cmdline;
-const char *initrd_filename = machine->initrd_filename;
 const char *boot_device = machine->boot_order;
 PowerPCCPU *cpu = NULL;
 CPUPPCState *env = NULL;
 char *filename;
-int linux_boot, i;
+int i;
 MemoryRegion *bios = g_new(MemoryRegion, 1);
 uint32_t kernel_base, initrd_base, cmdline_base = 0;
 int32_t kernel_size, initrd_size;
@@ -109,8 +106,6 @@ static void ppc_heathrow_init(MachineState *machine)
 void *fw_cfg;
 uint64_t tbfreq;
 
-linux_boot = (kernel_filename != NULL);
-
 /* init CPUs */
 for (i = 0; i < smp_cpus; i++) {
 cpu = POWERPC_CPU(cpu_create(machine->cpu_type));
@@ -147,7 +142,7 @@ static void ppc_heathrow_init(MachineState *machine)
 bios_addr = (uint32_t)bios_addr;
 
 if (bios_size <= 0) {
-/* or load binary ROM image */
+/* or if could not load ELF try loading a binary ROM image */
 bios_size = load_image_targphys(filename, PROM_BASE, PROM_SIZE);
 bios_addr = PROM_BASE;
 }
@@ -160,7 +155,7 @@ static void ppc_heathrow_init(MachineState *machine)
 exit(1);
 }
 
-if (linux_boot) {
+if (machine->kernel_filename) {
 int bswap_needed;
 
 #ifdef BSWAP_NEEDED
@@ -169,29 +164,32 @@ static void ppc_heathrow_init(MachineState *machine)
 bswap_needed = 0;
 #endif
 kernel_base = KERNEL_LOAD_ADDR;
-kernel_size = load_elf(kernel_filename, NULL,
+kernel_size = load_elf(machine->kernel_filename, NULL,
translate_kernel_address, NULL, NULL, NULL,
NULL, NULL, 1, PPC_ELF_MACHINE, 0, 0);
 if (kernel_size < 0)
-kernel_size = load_aout(kernel_filename, kernel_base,
+kernel_size = load_aout(machine->kernel_filename, kernel_base,
 ram_size - kernel_base, bswap_needed,
 TARGET_PAGE_SIZE);
 if (kernel_size < 0)
-kernel_size = load_image_targphys(kernel_filename,
+kernel_size = load_image_targphys(machine->kernel_filename,
   kernel_base,
   ram_size - kernel_base);
 if (kernel_size < 0) {
-error_report("could not load kernel '%s'", kernel_filename);
+error_report("could not load kernel '%s'",
+ machine->kernel_filename);
 exit(1);
 }
 /* load initrd */
-if (initrd_filename) {
-initrd_base = TARGET_PAGE_ALIGN(kernel_base + kernel_size + 
KERNEL_GAP);
-initrd_size = load_image_targphys(initrd_filename, initrd_base,
+if (machine->initrd_filename) {
+initrd_base = TARGET_PAGE_ALIGN(kernel_base + kernel_size +
+KERNEL_GAP);
+initrd_size = load_image_targphys(machine->initrd_filename,
+  initrd_base,
   ram_size - initrd_base);
 if (initrd_size < 0) {
 error_report("could not load initial ram disk '%s'",
- initrd_filename);
+ machine->initrd_filename);
 exit(1);
 }
 cmdline_base = TARGET_PAGE_ALIGN(initrd_base + initrd_size);
@@ -343,9 +341,10 @@ static void ppc_heathrow_init(MachineState *machine)
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, ARCH_HEATHROW);
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, kernel_base);
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
-if (kernel_cmdline) {
+if (machine->kernel_cmdline) {
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_CMDLINE, cmdline_base);
-pstrcpy_targphys("cmdline", cmdline_base, TARGET_PAGE_SIZE, 
kernel_cmdline);
+pstrcpy_targphys("cmdline", cmdline_base,

Re: [PATCH 0/6] m48t59: remove legacy init functions

2020-10-18 Thread Mark Cave-Ayland


On 17/10/2020 21:13, BALATON Zoltan via wrote:


This is inspired by Mark's series:

https://lists.nongnu.org/archive/html/qemu-ppc/2020-10/msg00251.html

and implements what I've suggested in review of that series to
simplify it and avoid code churn if implementing my suggestion later.

Regards,
BALATON Zoltan

BALATON Zoltan (4):
   mt48t59: Set default value of base-year property to 1968
   sun4m: use qdev instead of legacy m48t59_init() function
   sun4u: use qdev instead of legacy m48t59_init() function
   ppc405_boards: use qdev instead of legacy m48t59_init() function

Mark Cave-Ayland (2):
   m48t59-isa: remove legacy m48t59_init_isa() function
   m48t59: remove legacy m48t59_init() function

  hw/ppc/ppc405_boards.c  |  3 ++-
  hw/rtc/m48t59-isa.c | 25 -
  hw/rtc/m48t59.c | 37 +
  hw/sparc/sun4m.c|  5 +++--
  hw/sparc64/sun4u.c  |  6 --
  include/hw/rtc/m48t59.h |  6 --
  6 files changed, 10 insertions(+), 72 deletions(-)


Unfortunately this arrived too late - I'd already finished the tagging and local 
testing, but didn't get a chance to do the final PR before having to head out yesterday.


I think most people here agree that this code could be improved, but I'm not clear 
that this is the right solution given that Artyom has already pointed out that 40p 
uses 1900 as the base year. There would also be an overlap with the ideas that 
Philippe has expressed in this thread which would cause more code churn later, so if 
this is something that interests you I would suggest starting a separate thread to 
gain consensus as to the desired solution first before working on an updated series.



ATB,

Mark.

[PULL 09/13] mac_oldworld: Allow loading binary ROM image

2020-10-18 Thread Mark Cave-Ayland

From: BALATON Zoltan via 

The beige G3 Power Macintosh has a 4MB firmware ROM. Fix the size of
the rom region and fall back to loading a binary image with -bios if
loading ELF image failed. This allows testing emulation with a ROM
image from real hardware as well as using an ELF OpenBIOS image.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Mark Cave-Ayland 
Message-Id: <20201017155139.5a36a746...@zero.eik.bme.hu>
Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/mac_oldworld.c | 29 -
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 05e46ee6fe..0117ae17f5 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -59,6 +59,8 @@
 #define NDRV_VGA_FILENAME "qemu_vga.ndrv"
 
 #define GRACKLE_BASE 0xfec0
+#define PROM_BASE 0xffc0
+#define PROM_SIZE (4 * MiB)
 
 static void fw_cfg_boot_set(void *opaque, const char *boot_device,
 Error **errp)
@@ -100,6 +102,7 @@ static void ppc_heathrow_init(MachineState *machine)
 SysBusDevice *s;
 DeviceState *dev, *pic_dev;
 BusState *adb_bus;
+uint64_t bios_addr;
 int bios_size;
 unsigned int smp_cpus = machine->smp.cpus;
 uint16_t ppc_boot_device;
@@ -128,24 +131,32 @@ static void ppc_heathrow_init(MachineState *machine)
 
 memory_region_add_subregion(sysmem, 0, machine->ram);
 
-/* allocate and load BIOS */
-memory_region_init_rom(bios, NULL, "ppc_heathrow.bios", BIOS_SIZE,
+/* allocate and load firmware ROM */
+memory_region_init_rom(bios, NULL, "ppc_heathrow.bios", PROM_SIZE,
 &error_fatal);
+memory_region_add_subregion(sysmem, PROM_BASE, bios);
 
-if (bios_name == NULL)
+if (!bios_name) {
 bios_name = PROM_FILENAME;
+}
 filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
-memory_region_add_subregion(sysmem, PROM_ADDR, bios);
-
-/* Load OpenBIOS (ELF) */
 if (filename) {
-bios_size = load_elf(filename, NULL, 0, NULL, NULL, NULL, NULL, NULL,
- 1, PPC_ELF_MACHINE, 0, 0);
+/* Load OpenBIOS (ELF) */
+bios_size = load_elf(filename, NULL, NULL, NULL, NULL, &bios_addr,
+ NULL, NULL, 1, PPC_ELF_MACHINE, 0, 0);
+/* Unfortunately, load_elf sign-extends reading elf32 */
+bios_addr = (uint32_t)bios_addr;
+
+if (bios_size <= 0) {
+/* or load binary ROM image */
+bios_size = load_image_targphys(filename, PROM_BASE, PROM_SIZE);
+bios_addr = PROM_BASE;
+}
 g_free(filename);
 } else {
 bios_size = -1;
 }
-if (bios_size < 0 || bios_size > BIOS_SIZE) {
+if (bios_size < 0 || bios_addr - PROM_BASE + bios_size > PROM_SIZE) {
 error_report("could not load PowerPC bios '%s'", bios_name);
 exit(1);
 }
-- 
2.20.1
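
The size check in the patch above can be sketched outside QEMU as follows. This is only an illustration under stated assumptions: the PROM_BASE value assumes the 4 MiB ROM sits at the very top of the 32-bit address space, the helper names truncate_elf32_addr() and rom_image_fits() are hypothetical, and the range test is written slightly more defensively than the one-line comparison in the patch (it rejects addresses below the ROM window explicitly rather than relying on unsigned wraparound).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: a 4 MiB ROM mapped at the top of the 32-bit address space. */
#define PROM_SIZE (4u * 1024u * 1024u)
#define PROM_BASE (0x100000000ULL - PROM_SIZE)

/* Undo sign extension of a 32-bit load address that was widened to 64 bits. */
static uint64_t truncate_elf32_addr(uint64_t addr)
{
    return (uint32_t)addr;
}

/*
 * True when an image of `size` bytes loaded at `addr` lies entirely inside
 * the ROM window [PROM_BASE, PROM_BASE + PROM_SIZE).
 */
static bool rom_image_fits(uint64_t addr, int size)
{
    uint64_t offset;

    if (size < 0 || addr < PROM_BASE) {
        return false;
    }
    offset = addr - PROM_BASE;
    return offset <= PROM_SIZE && (uint64_t)size <= PROM_SIZE - offset;
}
```

With these definitions a full-size image at PROM_BASE is accepted, while the same image loaded one byte higher, or any image that starts below PROM_BASE, is rejected.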

[PULL 11/13] mac_oldworld: Drop a variable, use get_system_memory() directly

2020-10-18 Thread Mark Cave-Ayland

From: BALATON Zoltan 

Half of the occurrences already use get_system_memory() directly
instead of the sysmem variable. Convert the two other uses to
get_system_memory() too, which seems to be more common, and drop the
variable.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: 

Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/mac_oldworld.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 0117ae17f5..1abf26b5b4 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -87,7 +87,6 @@ static void ppc_heathrow_init(MachineState *machine)
 const char *kernel_cmdline = machine->kernel_cmdline;
 const char *initrd_filename = machine->initrd_filename;
 const char *boot_device = machine->boot_order;
-MemoryRegion *sysmem = get_system_memory();
 PowerPCCPU *cpu = NULL;
 CPUPPCState *env = NULL;
 char *filename;
@@ -129,12 +128,12 @@ static void ppc_heathrow_init(MachineState *machine)
 exit(1);
 }
 
-memory_region_add_subregion(sysmem, 0, machine->ram);
+memory_region_add_subregion(get_system_memory(), 0, machine->ram);
 
 /* allocate and load firmware ROM */
 memory_region_init_rom(bios, NULL, "ppc_heathrow.bios", PROM_SIZE,
 &error_fatal);
-memory_region_add_subregion(sysmem, PROM_BASE, bios);
+memory_region_add_subregion(get_system_memory(), PROM_BASE, bios);
 
 if (!bios_name) {
 bios_name = PROM_FILENAME;
-- 
2.20.1

[PULL 08/13] m48t59: remove legacy m48t59_init() function

2020-10-18 Thread Mark Cave-Ayland

Now that all of the callers of this function have been switched to use qdev
properties, this legacy init function can now be removed.

Signed-off-by: Mark Cave-Ayland 
Message-Id: <20201016182739.22875-6-mark.cave-ayl...@ilande.co.uk>
Reviewed-by: Hervé Poussineau 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Mark Cave-Ayland 
---
 hw/rtc/m48t59.c | 35 ---
 include/hw/rtc/m48t59.h |  4 
 2 files changed, 39 deletions(-)

diff --git a/hw/rtc/m48t59.c b/hw/rtc/m48t59.c
index 6525206976..d54929e861 100644
--- a/hw/rtc/m48t59.c
+++ b/hw/rtc/m48t59.c
@@ -564,41 +564,6 @@ const MemoryRegionOps m48t59_io_ops = {
 .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-/* Initialisation routine */
-Nvram *m48t59_init(qemu_irq IRQ, hwaddr mem_base,
-   uint32_t io_base, uint16_t size, int base_year,
-   int model)
-{
-DeviceState *dev;
-SysBusDevice *s;
-int i;
-
-for (i = 0; i < ARRAY_SIZE(m48txx_sysbus_info); i++) {
-if (m48txx_sysbus_info[i].size != size ||
-m48txx_sysbus_info[i].model != model) {
-continue;
-}
-
-dev = qdev_new(m48txx_sysbus_info[i].bus_name);
-qdev_prop_set_int32(dev, "base-year", base_year);
-s = SYS_BUS_DEVICE(dev);
-sysbus_realize_and_unref(s, &error_fatal);
-sysbus_connect_irq(s, 0, IRQ);
-if (io_base != 0) {
-memory_region_add_subregion(get_system_io(), io_base,
-sysbus_mmio_get_region(s, 1));
-}
-if (mem_base != 0) {
-sysbus_mmio_map(s, 0, mem_base);
-}
-
-return NVRAM(s);
-}
-
-assert(false);
-return NULL;
-}
-
 void m48t59_realize_common(M48t59State *s, Error **errp)
 {
 s->buffer = g_malloc0(s->size);
diff --git a/include/hw/rtc/m48t59.h b/include/hw/rtc/m48t59.h
index 9defe578d1..d9b45eb161 100644
--- a/include/hw/rtc/m48t59.h
+++ b/include/hw/rtc/m48t59.h
@@ -47,8 +47,4 @@ struct NvramClass {
 void (*toggle_lock)(Nvram *obj, int lock);
 };
 
-Nvram *m48t59_init(qemu_irq IRQ, hwaddr mem_base,
-   uint32_t io_base, uint16_t size, int base_year,
-   int type);
-
 #endif /* HW_M48T59_H */
-- 
2.20.1
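
For readers who did not know the removed helper: it picked a qdev type name by matching the (size, model) pair against a table, which is why callers can now name the type directly. A minimal stand-alone sketch of that lookup follows. The struct, the table contents and nvram_type_for() are hypothetical stand-ins, not the driver's real table; the "sysbus-m48t08" row reflects the calls replaced in this series, and the size given for "sysbus-m48t59" is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the driver's model table. */
struct nvram_model {
    const char *bus_name; /* qdev type name */
    int model;            /* 8 for M48T08, 59 for M48T59 */
    uint32_t size;        /* NVRAM size in bytes */
};

static const struct nvram_model nvram_models[] = {
    { "sysbus-m48t08", 8,  0x2000 },
    { "sysbus-m48t59", 59, 0x2000 }, /* size assumed for illustration */
};

/* Return the type name matching (size, model), or NULL if none does. */
static const char *nvram_type_for(uint32_t size, int model)
{
    size_t i;

    for (i = 0; i < sizeof(nvram_models) / sizeof(nvram_models[0]); i++) {
        if (nvram_models[i].size == size && nvram_models[i].model == model) {
            return nvram_models[i].bus_name;
        }
    }
    return NULL;
}
```

The removed function asserted on a failed lookup; the sketch returns NULL instead so that the miss can be observed by a caller.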

[PULL 07/13] ppc405_boards: use qdev properties instead of legacy m48t59_init() function

2020-10-18 Thread Mark Cave-Ayland

Signed-off-by: Mark Cave-Ayland 
Message-Id: <20201016182739.22875-5-mark.cave-ayl...@ilande.co.uk>
Reviewed-by: Hervé Poussineau 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/ppc405_boards.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index 6198ec1035..4687715b15 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -28,6 +28,8 @@
 #include "qemu-common.h"
 #include "cpu.h"
 #include "hw/ppc/ppc.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
 #include "ppc405.h"
 #include "hw/rtc/m48t59.h"
 #include "hw/block/flash.h"
@@ -145,6 +147,8 @@ static void ref405ep_init(MachineState *machine)
 char *filename;
 ppc4xx_bd_info_t bd;
 CPUPPCState *env;
+DeviceState *dev;
+SysBusDevice *s;
 qemu_irq *pic;
 MemoryRegion *bios;
 MemoryRegion *sram = g_new(MemoryRegion, 1);
@@ -227,7 +231,11 @@ static void ref405ep_init(MachineState *machine)
 /* Register FPGA */
 ref405ep_fpga_init(sysmem, 0xF030);
 /* Register NVRAM */
-m48t59_init(NULL, 0xF000, 0, 8192, 1968, 8);
+dev = qdev_new("sysbus-m48t08");
+qdev_prop_set_int32(dev, "base-year", 1968);
+s = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(s, &error_fatal);
+sysbus_mmio_map(s, 0, 0xF000);
 /* Load kernel */
 linux_boot = (kernel_filename != NULL);
 if (linux_boot) {
-- 
2.20.1

[PULL 13/13] mac_oldworld: Change PCI address of macio to match real hardware

2020-10-18 Thread Mark Cave-Ayland

From: BALATON Zoltan 

The board firmware expects these to be at fixed addresses and programs
them without probing. This patch puts the macio device at the expected
PCI address.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Mark Cave-Ayland 
Message-Id: 

Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/mac_oldworld.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index e34680f980..6c59aa5601 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -288,7 +288,7 @@ static void ppc_heathrow_init(MachineState *machine)
 ide_drive_get(hd, ARRAY_SIZE(hd));
 
 /* MacIO */
-macio = pci_new(-1, TYPE_OLDWORLD_MACIO);
+macio = pci_new(PCI_DEVFN(16, 0), TYPE_OLDWORLD_MACIO);
 dev = DEVICE(macio);
 qdev_prop_set_uint64(dev, "frequency", tbfreq);
 object_property_set_link(OBJECT(macio), "pic", OBJECT(pic_dev),
-- 
2.20.1
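
As background for the one-line change above: a PCI devfn value packs the slot (device) number into the upper five bits and the function number into the lower three, and passing -1 instead asks for the next free slot. The helpers below are an illustrative re-implementation of that encoding with hypothetical names, not QEMU's own macros.

```c
#include <stdint.h>

/* Pack a PCI slot (0-31) and function (0-7) into an 8-bit devfn value. */
static uint8_t pci_devfn(unsigned slot, unsigned func)
{
    return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}

/* Extract the slot number from a devfn value. */
static unsigned pci_slot(uint8_t devfn)
{
    return devfn >> 3;
}

/* Extract the function number from a devfn value. */
static unsigned pci_func(uint8_t devfn)
{
    return devfn & 0x07;
}
```

Slot 16, function 0 therefore encodes as 0x80.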

Re: [PATCH v8 0/5] Mac Old World ROM experiment (ppc/mac_* clean ups and loading binary ROM)

2020-10-18 Thread Mark Cave-Ayland


On 17/10/2020 16:56, BALATON Zoltan via wrote:

If you can send a v9 with the cast fixed I'll apply this to my qemu-macppc branch 
right away.


You could really have just edited the single cast in patch 1 before applying to change 
it back, but I've resent the changed patch 1 as v9, also adding your R-b for your 
convenience. The other patches are unchanged, so you can take v8 for those; I haven't 
resent them. Let me know if you want the whole series, but this is really getting to 
be much more work than it should be for such a simple change. (There is no cast in 
patch 2, as I've already stated several times.)


Thanks - this has been included in the PR I just sent.


ATB,

Mark.

[PULL 06/13] sun4u: use qdev properties instead of legacy m48t59_init() function

2020-10-18 Thread Mark Cave-Ayland

Signed-off-by: Mark Cave-Ayland 
Message-Id: <20201016182739.22875-4-mark.cave-ayl...@ilande.co.uk>
Reviewed-by: Hervé Poussineau 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Mark Cave-Ayland 
---
 hw/sparc64/sun4u.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c
index ad5ca2472a..05e659c8a4 100644
--- a/hw/sparc64/sun4u.c
+++ b/hw/sparc64/sun4u.c
@@ -671,10 +671,13 @@ static void sun4uv_init(MemoryRegion *address_space_mem,
 pci_ide_create_devs(pci_dev);
 
 /* Map NVRAM into I/O (ebus) space */
-nvram = m48t59_init(NULL, 0, 0, NVRAM_SIZE, 1968, 59);
-s = SYS_BUS_DEVICE(nvram);
+dev = qdev_new("sysbus-m48t59");
+qdev_prop_set_int32(dev, "base-year", 1968);
+s = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(s, &error_fatal);
 memory_region_add_subregion(pci_address_space_io(ebus), 0x2000,
 sysbus_mmio_get_region(s, 0));
+nvram = NVRAM(dev);
  
 initrd_size = 0;
 initrd_addr = 0;
-- 
2.20.1
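
The "base-year" property set to 1968 in these conversions matters because the RTC only has a two-digit year register. A rough sketch of the arithmetic follows, assuming the device stores the year as a BCD offset from the base year; the helper names are hypothetical and the real device code may differ in detail.

```c
#include <stdint.h>

/* Convert 0-99 to packed BCD and back. */
static uint8_t to_bcd(unsigned v)
{
    return (uint8_t)(((v / 10) << 4) | (v % 10));
}

static unsigned from_bcd(uint8_t v)
{
    return (v >> 4) * 10u + (v & 0x0fu);
}

/* Year register value for a calendar year (assumes year >= base_year). */
static uint8_t year_to_reg(int year, int base_year)
{
    return to_bcd((unsigned)(year - base_year) % 100u);
}

/* Calendar year implied by a register value, within 100 years of the base. */
static int reg_to_year(uint8_t reg, int base_year)
{
    return base_year + (int)from_bcd(reg);
}
```

Under this assumption, 2020 reads back as 0x52 with a base year of 1968 but as 0x20 with a base year of 1900, which is why a guest and the machine model have to agree on the base.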

[PULL 02/13] grackle: use qdev gpios for PCI IRQs

2020-10-18 Thread Mark Cave-Ayland

Currently an object link property is used to pass a reference to the Heathrow
PIC into the PCI host bridge so that grackle_init_irqs() can connect the PCI
IRQs to the PIC itself.

This can be simplified by defining the PCI IRQs as qdev gpios and then wiring
up the PCI IRQs to the PIC in the Old World machine init function.

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20201013114922.2946-3-mark.cave-ayl...@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland 
---
 hw/pci-host/grackle.c | 19 ++-
 hw/ppc/mac_oldworld.c |  7 +--
 2 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/hw/pci-host/grackle.c b/hw/pci-host/grackle.c
index 57c29b20af..b05facf463 100644
--- a/hw/pci-host/grackle.c
+++ b/hw/pci-host/grackle.c
@@ -28,7 +28,6 @@
 #include "hw/ppc/mac.h"
 #include "hw/qdev-properties.h"
 #include "hw/pci/pci.h"
-#include "hw/intc/heathrow_pic.h"
 #include "hw/irq.h"
 #include "qapi/error.h"
 #include "qemu/module.h"
@@ -41,7 +40,6 @@ struct GrackleState {
 PCIHostState parent_obj;
 
 uint32_t ofw_addr;
-HeathrowState *pic;
 qemu_irq irqs[4];
 MemoryRegion pci_mmio;
 MemoryRegion pci_hole;
@@ -62,15 +60,6 @@ static void pci_grackle_set_irq(void *opaque, int irq_num, int level)
 qemu_set_irq(s->irqs[irq_num], level);
 }
 
-static void grackle_init_irqs(GrackleState *s)
-{
-int i;
-
-for (i = 0; i < ARRAY_SIZE(s->irqs); i++) {
-s->irqs[i] = qdev_get_gpio_in(DEVICE(s->pic), 0x15 + i);
-}
-}
-
 static void grackle_realize(DeviceState *dev, Error **errp)
 {
 GrackleState *s = GRACKLE_PCI_HOST_BRIDGE(dev);
@@ -85,7 +74,6 @@ static void grackle_realize(DeviceState *dev, Error **errp)
  0, 4, TYPE_PCI_BUS);
 
 pci_create_simple(phb->bus, 0, "grackle");
-grackle_init_irqs(s);
 }
 
 static void grackle_init(Object *obj)
@@ -106,15 +94,12 @@ static void grackle_init(Object *obj)
 memory_region_init_io(>data_mem, obj, _host_data_le_ops,
   DEVICE(obj), "pci-data-idx", 0x1000);
 
-object_property_add_link(obj, "pic", TYPE_HEATHROW,
- (Object **) >pic,
- qdev_prop_allow_set_link_before_realize,
- 0);
-
 sysbus_init_mmio(sbd, >conf_mem);
 sysbus_init_mmio(sbd, >data_mem);
 sysbus_init_mmio(sbd, >pci_hole);
 sysbus_init_mmio(sbd, >pci_io);
+
+qdev_init_gpio_out(DEVICE(obj), s->irqs, ARRAY_SIZE(s->irqs));
 }
 
 static void grackle_pci_realize(PCIDevice *d, Error **errp)
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index d6a76d06dc..05e46ee6fe 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -253,10 +253,9 @@ static void ppc_heathrow_init(MachineState *machine)
 /* Grackle PCI host bridge */
 dev = qdev_new(TYPE_GRACKLE_PCI_HOST_BRIDGE);
 qdev_prop_set_uint32(dev, "ofw-addr", 0x8000);
-object_property_set_link(OBJECT(dev), "pic", OBJECT(pic_dev),
- &error_abort);
 s = SYS_BUS_DEVICE(dev);
 sysbus_realize_and_unref(s, &error_fatal);
+
 sysbus_mmio_map(s, 0, GRACKLE_BASE);
 sysbus_mmio_map(s, 1, GRACKLE_BASE + 0x20);
 /* PCI hole */
@@ -266,6 +265,10 @@ static void ppc_heathrow_init(MachineState *machine)
 memory_region_add_subregion(get_system_memory(), 0xfe00,
 sysbus_mmio_get_region(s, 3));
 
+for (i = 0; i < 4; i++) {
+qdev_connect_gpio_out(dev, i, qdev_get_gpio_in(pic_dev, 0x15 + i));
+}
+
 pci_bus = PCI_HOST_BRIDGE(dev)->bus;
 
 pci_vga_init(pci_bus);
-- 
2.20.1
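
The design change in this patch (the bridge exposes numbered outputs and the board decides where they go) can be modelled without any QEMU types. The sketch below is a toy model with hypothetical names; it is not the qdev GPIO API. Only the 0x15 offset and the count of four lines are taken from the patch.

```c
#include <stddef.h>

#define PIC_INPUTS   64   /* assumed size of the toy interrupt controller */
#define PCI_IRQS     4
#define PCI_IRQ_BASE 0x15 /* first PIC input used for PCI, from the patch */

/* Toy interrupt controller: just records the level of each input line. */
struct toy_pic {
    int level[PIC_INPUTS];
};

/* An output line points at wherever the board wired it, or is NULL. */
struct toy_bridge {
    int *irq_out[PCI_IRQS];
};

/* Board-level wiring: the bridge never learns which PIC it is attached to. */
static void board_wire_pci_irqs(struct toy_bridge *br, struct toy_pic *pic)
{
    int i;

    for (i = 0; i < PCI_IRQS; i++) {
        br->irq_out[i] = &pic->level[PCI_IRQ_BASE + i];
    }
}

/* Device-side code only drives its own numbered output. */
static void bridge_set_irq(struct toy_bridge *br, int irq_num, int level)
{
    if (br->irq_out[irq_num] != NULL) {
        *br->irq_out[irq_num] = level;
    }
}
```

An unwired output is simply ignored here, and the bridge no longer needs a reference to the interrupt controller at all.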

[PULL 04/13] m48t59-isa: remove legacy m48t59_init_isa() function

2020-10-18 Thread Mark Cave-Ayland

This function is no longer used within the codebase.

Signed-off-by: Mark Cave-Ayland 
Message-Id: <20201016182739.22875-2-mark.cave-ayl...@ilande.co.uk>
Reviewed-by: Hervé Poussineau 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Mark Cave-Ayland 
---
 hw/rtc/m48t59-isa.c | 25 -
 include/hw/rtc/m48t59.h |  2 --
 2 files changed, 27 deletions(-)

diff --git a/hw/rtc/m48t59-isa.c b/hw/rtc/m48t59-isa.c
index cae315e488..dc21fb10a5 100644
--- a/hw/rtc/m48t59-isa.c
+++ b/hw/rtc/m48t59-isa.c
@@ -58,31 +58,6 @@ static M48txxInfo m48txx_isa_info[] = {
 }
 };
 
-Nvram *m48t59_init_isa(ISABus *bus, uint32_t io_base, uint16_t size,
-   int base_year, int model)
-{
-ISADevice *isa_dev;
-DeviceState *dev;
-int i;
-
-for (i = 0; i < ARRAY_SIZE(m48txx_isa_info); i++) {
-if (m48txx_isa_info[i].size != size ||
-m48txx_isa_info[i].model != model) {
-continue;
-}
-
-isa_dev = isa_new(m48txx_isa_info[i].bus_name);
-dev = DEVICE(isa_dev);
-qdev_prop_set_uint32(dev, "iobase", io_base);
-qdev_prop_set_int32(dev, "base-year", base_year);
-isa_realize_and_unref(isa_dev, bus, &error_fatal);
-return NVRAM(dev);
-}
-
-assert(false);
-return NULL;
-}
-
 static uint32_t m48txx_isa_read(Nvram *obj, uint32_t addr)
 {
 M48txxISAState *d = M48TXX_ISA(obj);
diff --git a/include/hw/rtc/m48t59.h b/include/hw/rtc/m48t59.h
index 04abedf3b2..9defe578d1 100644
--- a/include/hw/rtc/m48t59.h
+++ b/include/hw/rtc/m48t59.h
@@ -47,8 +47,6 @@ struct NvramClass {
 void (*toggle_lock)(Nvram *obj, int lock);
 };
 
-Nvram *m48t59_init_isa(ISABus *bus, uint32_t io_base, uint16_t size,
-   int base_year, int type);
 Nvram *m48t59_init(qemu_irq IRQ, hwaddr mem_base,
uint32_t io_base, uint16_t size, int base_year,
int type);
-- 
2.20.1

[PULL 03/13] uninorth: use qdev gpios for PCI IRQs

2020-10-18 Thread Mark Cave-Ayland

Currently an object link property is used to pass a reference to the OpenPIC
into the PCI host bridge so that pci_unin_init_irqs() can connect the PCI
IRQs to the PIC itself.

This can be simplified by defining the PCI IRQs as qdev gpios and then wiring
up the PCI IRQs to the PIC in the New World machine init function.

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20201013114922.2946-4-mark.cave-ayl...@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland 
---
 hw/pci-host/uninorth.c | 45 +++---
 hw/ppc/mac_newworld.c  | 24 --
 include/hw/pci-host/uninorth.h |  2 --
 3 files changed, 25 insertions(+), 46 deletions(-)

diff --git a/hw/pci-host/uninorth.c b/hw/pci-host/uninorth.c
index 1ed1072eeb..0c0a9ecee1 100644
--- a/hw/pci-host/uninorth.c
+++ b/hw/pci-host/uninorth.c
@@ -32,8 +32,6 @@
 #include "hw/pci-host/uninorth.h"
 #include "trace.h"
 
-static const int unin_irq_line[] = { 0x1b, 0x1c, 0x1d, 0x1e };
-
 static int pci_unin_map_irq(PCIDevice *pci_dev, int irq_num)
 {
 return (irq_num + (pci_dev->devfn >> 3)) & 3;
@@ -43,7 +41,7 @@ static void pci_unin_set_irq(void *opaque, int irq_num, int level)
 {
 UNINHostState *s = opaque;
 
-trace_unin_set_irq(unin_irq_line[irq_num], level);
+trace_unin_set_irq(irq_num, level);
 qemu_set_irq(s->irqs[irq_num], level);
 }
 
@@ -112,15 +110,6 @@ static const MemoryRegionOps unin_data_ops = {
 .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void pci_unin_init_irqs(UNINHostState *s)
-{
-int i;
-
-for (i = 0; i < ARRAY_SIZE(s->irqs); i++) {
-s->irqs[i] = qdev_get_gpio_in(DEVICE(s->pic), unin_irq_line[i]);
-}
-}
-
 static char *pci_unin_main_ofw_unit_address(const SysBusDevice *dev)
 {
 UNINHostState *s = UNI_NORTH_PCI_HOST_BRIDGE(dev);
@@ -141,7 +130,6 @@ static void pci_unin_main_realize(DeviceState *dev, Error **errp)
PCI_DEVFN(11, 0), 4, TYPE_PCI_BUS);
 
 pci_create_simple(h->bus, PCI_DEVFN(11, 0), "uni-north-pci");
-pci_unin_init_irqs(s);
 
 /* DEC 21154 bridge */
 #if 0
@@ -172,15 +160,12 @@ static void pci_unin_main_init(Object *obj)
  "unin-pci-hole", >pci_mmio,
  0x8000ULL, 0x1000ULL);
 
-object_property_add_link(obj, "pic", TYPE_OPENPIC,
- (Object **) >pic,
- qdev_prop_allow_set_link_before_realize,
- 0);
-
 sysbus_init_mmio(sbd, >conf_mem);
 sysbus_init_mmio(sbd, >data_mem);
 sysbus_init_mmio(sbd, >pci_hole);
 sysbus_init_mmio(sbd, >pci_io);
+
+qdev_init_gpio_out(DEVICE(obj), s->irqs, ARRAY_SIZE(s->irqs));
 }
 
 static void pci_u3_agp_realize(DeviceState *dev, Error **errp)
@@ -196,7 +181,6 @@ static void pci_u3_agp_realize(DeviceState *dev, Error **errp)
PCI_DEVFN(11, 0), 4, TYPE_PCI_BUS);
 
 pci_create_simple(h->bus, PCI_DEVFN(11, 0), "u3-agp");
-pci_unin_init_irqs(s);
 }
 
 static void pci_u3_agp_init(Object *obj)
@@ -220,15 +204,12 @@ static void pci_u3_agp_init(Object *obj)
  "unin-pci-hole", >pci_mmio,
  0x8000ULL, 0x7000ULL);
 
-object_property_add_link(obj, "pic", TYPE_OPENPIC,
- (Object **) >pic,
- qdev_prop_allow_set_link_before_realize,
- 0);
-
 sysbus_init_mmio(sbd, >conf_mem);
 sysbus_init_mmio(sbd, >data_mem);
 sysbus_init_mmio(sbd, >pci_hole);
 sysbus_init_mmio(sbd, >pci_io);
+
+qdev_init_gpio_out(DEVICE(obj), s->irqs, ARRAY_SIZE(s->irqs));
 }
 
 static void pci_unin_agp_realize(DeviceState *dev, Error **errp)
@@ -244,7 +225,6 @@ static void pci_unin_agp_realize(DeviceState *dev, Error **errp)
PCI_DEVFN(11, 0), 4, TYPE_PCI_BUS);
 
 pci_create_simple(h->bus, PCI_DEVFN(11, 0), "uni-north-agp");
-pci_unin_init_irqs(s);
 }
 
 static void pci_unin_agp_init(Object *obj)
@@ -259,13 +239,10 @@ static void pci_unin_agp_init(Object *obj)
 memory_region_init_io(>data_mem, OBJECT(h), _host_data_le_ops,
   obj, "unin-agp-conf-data", 0x1000);
 
-object_property_add_link(obj, "pic", TYPE_OPENPIC,
- (Object **) >pic,
- qdev_prop_allow_set_link_before_realize,
- 0);
-
 sysbus_init_mmio(sbd, >conf_mem);
 sysbus_init_mmio(sbd, >data_mem);
+
+qdev_init_gpio_out(DEVICE(obj), s->irqs, ARRAY_SIZE(s->irqs));
 }
 
 static void pci_unin_internal_realize(DeviceState *dev, Error **errp)
@@ -281,7 +258,6 @@ static void pci_unin_internal_realize(DeviceState *dev, Error **errp)
PCI_DEVFN(14, 0), 4, TYPE_PCI_BUS);
 
 pci_create_simple(h->bus, PCI_DEVFN(14, 0), "uni-north-internal-pci");
-
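
(The archived copy of this message is cut off at this point.) As a rough illustration of the routing visible in the hunks above, the sketch below combines the slot-based rotation from pci_unin_map_irq() with the four PIC inputs from the removed unin_irq_line[] table. The function names are hypothetical, and the sketch assumes the board-level wiring keeps using those same four inputs, which the truncated message does not show.

```c
#include <stdint.h>

/* PIC inputs for the four PCI interrupt lines, as in the removed table. */
static const int unin_irq_line[4] = { 0x1b, 0x1c, 0x1d, 0x1e };

/* Rotate a device's interrupt pin (0 = INTA) by its slot number. */
static int unin_map_irq(uint8_t devfn, int pin)
{
    return (pin + (devfn >> 3)) & 3;
}

/* PIC input reached by a device pin, assuming the wiring described above. */
static int unin_pic_input(uint8_t devfn, int pin)
{
    return unin_irq_line[unin_map_irq(devfn, pin)];
}
```

One visible side effect of the patch is that the trace point now reports the bridge-local line number (0 to 3) rather than the PIC input number.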

[PULL 05/13] sun4m: use qdev properties instead of legacy m48t59_init() function

2020-10-18 Thread Mark Cave-Ayland

Signed-off-by: Mark Cave-Ayland 
Message-Id: <20201016182739.22875-3-mark.cave-ayl...@ilande.co.uk>
Reviewed-by: Hervé Poussineau 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Mark Cave-Ayland 
---
 hw/sparc/sun4m.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 54a2b2f9ef..38d1e0fd12 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -837,7 +837,7 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
 {
 DeviceState *slavio_intctl;
 unsigned int i;
-void *nvram;
+Nvram *nvram;
 qemu_irq *cpu_irqs[MAX_CPUS], slavio_irq[32], slavio_cpu_irq[MAX_CPUS];
 qemu_irq fdc_tc;
 unsigned long kernel_size;
@@ -966,7 +966,13 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
 create_unimplemented_device("SUNW,sx", hwdef->sx_base, 0x2000);
 }
 
-nvram = m48t59_init(slavio_irq[0], hwdef->nvram_base, 0, 0x2000, 1968, 8);
+dev = qdev_new("sysbus-m48t08");
+qdev_prop_set_int32(dev, "base-year", 1968);
+s = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(s, &error_fatal);
+sysbus_connect_irq(s, 0, slavio_irq[0]);
+sysbus_mmio_map(s, 0, hwdef->nvram_base);
+nvram = NVRAM(dev);
 
 slavio_timer_init_all(hwdef->counter_base, slavio_irq[19], slavio_cpu_irq, smp_cpus);
 
-- 
2.20.1

[PULL 10/13] mac_newworld: Allow loading binary ROM image

2020-10-18 Thread Mark Cave-Ayland

From: BALATON Zoltan 

Fall back to loading a binary ROM image if loading an ELF image fails.
This also moves the PROM_BASE and PROM_SIZE defines to the board, as
they match the ROM size and address on this board, and removes the now
unused PROM_ADDR and BIOS_SIZE defines from the common mac.h.

Signed-off-by: BALATON Zoltan 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: 
<4d58ffe7645a0c746c8fed6aa8775c0867b624e0.1602805637.git.bala...@eik.bme.hu>
Signed-off-by: Mark Cave-Ayland 
---
 hw/ppc/mac.h  |  2 --
 hw/ppc/mac_newworld.c | 22 ++
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/mac.h b/hw/ppc/mac.h
index f3976b9a45..22c8408078 100644
--- a/hw/ppc/mac.h
+++ b/hw/ppc/mac.h
@@ -39,10 +39,8 @@
 /* SMP is not enabled, for now */
 #define MAX_CPUS 1
 
-#define BIOS_SIZE(1 * MiB)
 #define NVRAM_SIZE0x2000
 #define PROM_FILENAME"openbios-ppc"
-#define PROM_ADDR 0xfff0
 
 #define KERNEL_LOAD_ADDR 0x0100
 #define KERNEL_GAP   0x0010
diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index 7a8dc09c8d..f9a1cc8944 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -82,6 +82,8 @@
 
 #define NDRV_VGA_FILENAME "qemu_vga.ndrv"
 
+#define PROM_BASE 0xfff0
+#define PROM_SIZE (1 * MiB)
 
 static void fw_cfg_boot_set(void *opaque, const char *boot_device,
 Error **errp)
@@ -100,7 +102,7 @@ static void ppc_core99_reset(void *opaque)
 
 cpu_reset(CPU(cpu));
 /* 970 CPUs want to get their initial IP as part of their boot protocol */
-cpu->env.nip = PROM_ADDR + 0x100;
+cpu->env.nip = PROM_BASE + 0x100;
 }
 
 /* PowerPC Mac99 hardware initialisation */
@@ -154,25 +156,29 @@ static void ppc_core99_init(MachineState *machine)
 /* allocate RAM */
 memory_region_add_subregion(get_system_memory(), 0, machine->ram);
 
-/* allocate and load BIOS */
-memory_region_init_rom(bios, NULL, "ppc_core99.bios", BIOS_SIZE,
+/* allocate and load firmware ROM */
+memory_region_init_rom(bios, NULL, "ppc_core99.bios", PROM_SIZE,
 &error_fatal);
+memory_region_add_subregion(get_system_memory(), PROM_BASE, bios);
 
-if (bios_name == NULL)
+if (!bios_name) {
 bios_name = PROM_FILENAME;
+}
 filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
-memory_region_add_subregion(get_system_memory(), PROM_ADDR, bios);
-
-/* Load OpenBIOS (ELF) */
 if (filename) {
+/* Load OpenBIOS (ELF) */
 bios_size = load_elf(filename, NULL, NULL, NULL, NULL,
  NULL, NULL, NULL, 1, PPC_ELF_MACHINE, 0, 0);
 
+if (bios_size <= 0) {
+/* or load binary ROM image */
+bios_size = load_image_targphys(filename, PROM_BASE, PROM_SIZE);
+}
 g_free(filename);
 } else {
 bios_size = -1;
 }
-if (bios_size < 0 || bios_size > BIOS_SIZE) {
+if (bios_size < 0 || bios_size > PROM_SIZE) {
 error_report("could not load PowerPC bios '%s'", bios_name);
 exit(1);
 }
-- 
2.20.1
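
The control flow added by this patch (try the ELF loader, and only if it reports failure treat the file as a flat ROM dump) can be sketched with stand-in loaders. Everything below is hypothetical: the stubs only mimic the return-value convention of "size on success, negative on failure" and do not resemble QEMU's real loaders beyond that.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in ELF loader: accepts only buffers starting with the ELF magic. */
static long load_elf_stub(const unsigned char *img, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < sizeof(magic) || memcmp(img, magic, sizeof(magic)) != 0) {
        return -1;
    }
    return (long)len;
}

/* Stand-in raw loader: copies the image if it fits into the ROM. */
static long load_raw_stub(const unsigned char *img, size_t len,
                          unsigned char *rom, size_t rom_size)
{
    if (len == 0 || len > rom_size) {
        return -1;
    }
    memcpy(rom, img, len);
    return (long)len;
}

/* Try ELF first, then fall back to treating the file as a flat ROM dump. */
static long load_firmware(const unsigned char *img, size_t len,
                          unsigned char *rom, size_t rom_size)
{
    long size = load_elf_stub(img, len);

    if (size <= 0) {
        size = load_raw_stub(img, len, rom, rom_size);
    }
    return size;
}
```

A file that is neither valid ELF nor small enough for the ROM still ends up with a negative size, so the caller's existing error path is unchanged.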

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-18 Thread Michael S. Tsirkin

On Sun, Oct 18, 2020 at 08:54:36AM -0700, Andy Lutomirski wrote:
> On Sun, Oct 18, 2020 at 8:52 AM Michael S. Tsirkin  wrote:
> >
> > On Sat, Oct 17, 2020 at 03:24:08PM +0200, Jason A. Donenfeld wrote:
> > > 4c. The guest kernel maintains an array of physical addresses that are
> > > MADV_WIPEONFORK. The hypervisor knows about this array and its
> > > location through whatever protocol, and before resuming a
> > > moved/snapshotted/duplicated VM, it takes the responsibility for
> > > memzeroing this memory. The huge pro here would be that this
> > > eliminates all races, and reduces complexity quite a bit, because the
> > > hypervisor can perfectly synchronize its bringup (and SMP bringup)
> > > with this, and it can even optimize things like on-disk memory
> > > snapshots to simply not write out those pages to disk.
> > >
> > > A 4c-like approach seems like it'd be a lot of bang for the buck -- we
> > > reuse the existing mechanism (MADV_WIPEONFORK), so there's no new
> > > userspace API to deal with, and it'd be race free, and eliminate a lot
> > > of kernel complexity.
> >
> > Clearly this has a chance to break applications, right?
> > If there's an app that uses this as a non-system-calls way
> > to find out whether there was a fork, it will break
> > when wipe triggers without a fork ...
> > For example, imagine:
> >
> > MADV_WIPEONFORK
> > copy secret data to MADV_DONTFORK
> > fork
> >
> >
> > used to work, with this change it gets 0s instead of the secret data.
> >
> >
> > I am also not sure it's wise to expose each guest process
> > to the hypervisor like this. E.g. each process needs a
> > guest physical address of its own then. This is a finite resource.
> >
> >
> > The mmap interface proposed here is somewhat baroque, but it is
> > certainly simple to implement ...
> 
> Wipe of fork/vmgenid/whatever could end up being much more problematic
> than it naively appears -- it could be wiped in the middle of a read.
> Either the API needs to handle this cleanly, or we need something more
> aggressive like signal-on-fork.
> 
> --Andy


Right, it's not on fork; it's actually when the process is snapshotted.

If we assume it's CRIU we care about, then I
wonder what's wrong with something like
MADV_CHANGEONPTRACE_SEIZE
and basically say it's X bytes which change the value...


-- 
MST
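
For context on the existing primitive discussed in this thread, the sketch below shows what MADV_WIPEONFORK does today on an ordinary fork: the child sees zeros while the parent's page is untouched. It assumes Linux 4.14 or later with a libc that defines MADV_WIPEONFORK; child_sees_wiped_page() is a hypothetical helper written for this illustration and says nothing about how a VM snapshot would be handled.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the child saw a zeroed page, 0 if it saw the parent's data,
 * and -1 if MADV_WIPEONFORK is unavailable or a system call failed.
 */
static int child_sees_wiped_page(void)
{
#ifdef MADV_WIPEONFORK
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p;
    pid_t pid;
    int status = 0;
    int result = -1;

    if (page <= 0) {
        return -1;
    }
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    memset(p, 0xAA, (size_t)page);
    if (madvise(p, (size_t)page, MADV_WIPEONFORK) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    pid = fork();
    if (pid == 0) {
        /* Child: report through the exit status what the page contains. */
        _exit(p[0] == 0 ? 1 : 0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
        result = WEXITSTATUS(status);
    }

    /* The parent's copy must be intact regardless of what the child saw. */
    assert(p[0] == 0xAA);
    munmap(p, (size_t)page);
    return result;
#else
    return -1;
#endif
}
```

The concern raised above is about software that relies on this wipe happening only at fork time; a hypervisor-triggered wipe would change that contract.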

[PULL 01/13] macio: don't reference serial_hd() directly within the device

2020-10-18 Thread Mark Cave-Ayland

Instead use qdev_prop_set_chr() to configure the ESCC serial chardevs at the
Mac Old World and New World machine level.

Also remove the now obsolete comment referring to the use of serial_hd() and
the setting of user_creatable to false accordingly.

Signed-off-by: Mark Cave-Ayland 
Message-Id: <20201013114922.2946-2-mark.cave-ayl...@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland 
---
 hw/misc/macio/macio.c | 4 
 hw/ppc/mac_newworld.c | 6 ++
 hw/ppc/mac_oldworld.c | 6 ++
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/misc/macio/macio.c b/hw/misc/macio/macio.c
index 679722628e..51368884d0 100644
--- a/hw/misc/macio/macio.c
+++ b/hw/misc/macio/macio.c
@@ -109,8 +109,6 @@ static void macio_common_realize(PCIDevice *d, Error **errp)
 qdev_prop_set_uint32(DEVICE(&s->escc), "disabled", 0);
 qdev_prop_set_uint32(DEVICE(&s->escc), "frequency", ESCC_CLOCK);
 qdev_prop_set_uint32(DEVICE(&s->escc), "it_shift", 4);
-qdev_prop_set_chr(DEVICE(&s->escc), "chrA", serial_hd(0));
-qdev_prop_set_chr(DEVICE(&s->escc), "chrB", serial_hd(1));
 qdev_prop_set_uint32(DEVICE(&s->escc), "chnBtype", escc_serial);
 qdev_prop_set_uint32(DEVICE(&s->escc), "chnAtype", escc_serial);
 if (!qdev_realize(DEVICE(&s->escc), BUS(&s->macio_bus), errp)) {
@@ -458,8 +456,6 @@ static void macio_class_init(ObjectClass *klass, void *data)
 k->class_id = PCI_CLASS_OTHERS << 8;
 device_class_set_props(dc, macio_properties);
 set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
-/* Reason: Uses serial_hds in macio_instance_init */
-dc->user_creatable = false;
 }
 
 static const TypeInfo macio_bus_info = {
diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index 4dfbeec0ca..6f5ef2e782 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -123,6 +123,7 @@ static void ppc_core99_init(MachineState *machine)
 UNINHostState *uninorth_pci;
 PCIBus *pci_bus;
 PCIDevice *macio;
+ESCCState *escc;
 bool has_pmu, has_adb;
 MACIOIDEState *macio_ide;
 BusState *adb_bus;
@@ -380,6 +381,11 @@ static void ppc_core99_init(MachineState *machine)
 qdev_prop_set_bit(dev, "has-adb", has_adb);
 object_property_set_link(OBJECT(macio), "pic", OBJECT(pic_dev),
  &error_abort);
+
+escc = ESCC(object_resolve_path_component(OBJECT(macio), "escc"));
+qdev_prop_set_chr(DEVICE(escc), "chrA", serial_hd(0));
+qdev_prop_set_chr(DEVICE(escc), "chrB", serial_hd(1));
+
 pci_realize_and_unref(macio, pci_bus, &error_fatal);
 
 /* We only emulate 2 out of 3 IDE controllers for now */
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index f8173934a2..d6a76d06dc 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -96,6 +96,7 @@ static void ppc_heathrow_init(MachineState *machine)
 PCIBus *pci_bus;
 PCIDevice *macio;
 MACIOIDEState *macio_ide;
+ESCCState *escc;
 SysBusDevice *s;
 DeviceState *dev, *pic_dev;
 BusState *adb_bus;
@@ -281,6 +282,11 @@ static void ppc_heathrow_init(MachineState *machine)
 qdev_prop_set_uint64(dev, "frequency", tbfreq);
 object_property_set_link(OBJECT(macio), "pic", OBJECT(pic_dev),
  &error_abort);
+
+escc = ESCC(object_resolve_path_component(OBJECT(macio), "escc"));
+qdev_prop_set_chr(DEVICE(escc), "chrA", serial_hd(0));
+qdev_prop_set_chr(DEVICE(escc), "chrB", serial_hd(1));
+
 pci_realize_and_unref(macio, pci_bus, &error_fatal);
 
 macio_ide = MACIO_IDE(object_resolve_path_component(OBJECT(macio),
-- 
2.20.1

[PULL 00/13] qemu-macppc queue 20201018

2020-10-18 Thread Mark Cave-Ayland

The following changes since commit e12ce85b2c79d83a340953291912875c30b3af06:

  Merge remote-tracking branch 'remotes/ehabkost/tags/x86-next-pull-request' into staging (2020-10-16 22:46:28 +0100)

are available in the Git repository at:

  git://github.com/mcayland/qemu.git tags/qemu-macppc-20201018

for you to fetch changes up to 45e6b0fe210dc8a08117e6ccbdc081348e21de09:

  mac_oldworld: Change PCI address of macio to match real hardware (2020-10-18 16:21:42 +0100)


qemu-macppc updates


BALATON Zoltan (4):
  mac_newworld: Allow loading binary ROM image
  mac_oldworld: Drop a variable, use get_system_memory() directly
  mac_oldworld: Drop some variables
  mac_oldworld: Change PCI address of macio to match real hardware

BALATON Zoltan via (1):
  mac_oldworld: Allow loading binary ROM image

Mark Cave-Ayland (8):
  macio: don't reference serial_hd() directly within the device
  grackle: use qdev gpios for PCI IRQs
  uninorth: use qdev gpios for PCI IRQs
  m48t59-isa: remove legacy m48t59_init_isa() function
  sun4m: use qdev properties instead of legacy m48t59_init() function
  sun4u: use qdev properties instead of legacy m48t59_init() function
  ppc405_boards: use qdev properties instead of legacy m48t59_init() function
  m48t59: remove legacy m48t59_init() function

 hw/misc/macio/macio.c          |  4 ---
 hw/pci-host/grackle.c          | 19 ++
 hw/pci-host/uninorth.c         | 45 +---
 hw/ppc/mac.h                   |  2 --
 hw/ppc/mac_newworld.c          | 52 ++-
 hw/ppc/mac_oldworld.c          | 80 ++
 hw/ppc/ppc405_boards.c         | 10 +-
 hw/rtc/m48t59-isa.c            | 25 -
 hw/rtc/m48t59.c                | 35 --
 hw/sparc/sun4m.c               | 10 --
 hw/sparc64/sun4u.c             |  7 ++--
 include/hw/pci-host/uninorth.h |  2 --
 include/hw/rtc/m48t59.h        |  6
 13 files changed, 118 insertions(+), 179 deletions(-)

Re: [PATCH] target/riscv: Adjust privilege level for HLV(X)/HSV instructions

2020-10-18 Thread Richard Henderson

On 10/18/20 5:03 AM, Georg Kotheimer wrote:
> According to the specification the "field SPVP of hstatus controls the
> privilege level of the access" for the hypervisor virtual-machine load
> and store instructions HLV, HLVX and HSV.
> 
> We introduce the new virtualization register field HS_HYP_LD_ST,
> similar to HS_TWO_STAGE, which tracks whether we are currently
> executing a hypervisor virtual-machine load or store instruction.
> 
> Signed-off-by: Georg Kotheimer 

Well, let me start by mentioning the existing implementation of hyp_load et al
is broken.  I guess I wasn't paying attention when Alistair implemented them.

Here's the problem: When you change how riscv_cpu_tlb_fill works, as you are by
modifying get_physical_address, you have to remember that those results are
*saved* in the qemu tlb.

So by setting some flags, performing one memory operation, and resetting the
flags, we have in fact corrupted the tlb for the next operation without those
flags.

You need to either:

(1) perform the memory operation *without* using the qemu tlb machinery (i.e.
use get_physical_address directly, then use address_space_ld/st), or

(2) add a new mmu index for the HLV/HSV operation, with all of the differences
implied.

The second is imo preferable, as it means that helper_hyp_load et al can go
away and become normal qemu_ld/st opcodes with the new mmu indexes.
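
To illustrate the stale-entry problem and option (2), here is a minimal toy sketch in C. It is not QEMU code: ToyCpu, toy_translate, toy_page_walk and the TOY_IDX_* indexes are hypothetical names, and the fake page walk merely returns different results depending on whether the hypervisor load/store rules apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_BITS 12
#define TOY_TLB_SIZE  16

/* Hypothetical mmu indexes: one normal, one dedicated to HLV/HSV. */
enum { TOY_IDX_NORMAL, TOY_IDX_HYP_LDST, TOY_IDX_COUNT };

typedef struct {
    bool valid;
    uint64_t vpage;
    uint64_t ppage;
} ToyTlbEntry;

typedef struct {
    ToyTlbEntry tlb[TOY_IDX_COUNT][TOY_TLB_SIZE];
    bool hyp_ld_st;             /* transient flag, set around one access */
} ToyCpu;

/* Slow path, a stand-in for get_physical_address(): the answer differs
 * depending on whether the hypervisor load/store rules are in effect. */
static uint64_t toy_page_walk(uint64_t vpage, bool hyp_rules)
{
    return vpage + (hyp_rules ? 0x1000 : 0x100);
}

/* Translate through a per-index software TLB, filling it on a miss. */
uint64_t toy_translate(ToyCpu *cpu, int mmu_idx, uint64_t vaddr)
{
    assert(mmu_idx >= 0 && mmu_idx < TOY_IDX_COUNT);
    uint64_t vpage = vaddr >> TOY_PAGE_BITS;
    ToyTlbEntry *e = &cpu->tlb[mmu_idx][vpage % TOY_TLB_SIZE];

    if (!e->valid || e->vpage != vpage) {
        /* Miss: walk once and *save* the result for later accesses. */
        bool hyp_rules = cpu->hyp_ld_st || mmu_idx == TOY_IDX_HYP_LDST;
        e->valid = true;
        e->vpage = vpage;
        e->ppage = toy_page_walk(vpage, hyp_rules);
    }
    return (e->ppage << TOY_PAGE_BITS) | (vaddr & ((1u << TOY_PAGE_BITS) - 1));
}

/* Flag approach: the entry filled here stays in the normal index. */
uint64_t toy_hyp_load_with_flag(ToyCpu *cpu, uint64_t vaddr)
{
    cpu->hyp_ld_st = true;
    uint64_t pa = toy_translate(cpu, TOY_IDX_NORMAL, vaddr);
    cpu->hyp_ld_st = false;
    return pa;
}

/* Separate mmu index: hypervisor results never mix with normal ones. */
uint64_t toy_hyp_load_with_mmuidx(ToyCpu *cpu, uint64_t vaddr)
{
    return toy_translate(cpu, TOY_IDX_HYP_LDST, vaddr);
}
```

In this toy model, calling toy_hyp_load_with_flag() and then toy_translate() with TOY_IDX_NORMAL on the same page returns the hypervisor translation for the normal access, because the saved entry outlives the flag. With toy_hyp_load_with_mmuidx() the two kinds of access hit different TLB arrays and each gets its own walk.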

Annoyingly, for the moment you wouldn't be able to remove helper_hyp_x_load,
because there's no qemu_ld variant that uses execute permission.  But it can be
done with helpers.  I'll note that the current implementation is broken,
because it uses cpu_lduw_data_ra, and not cpu_lduw_code_ra, which is exactly
the difference that uses execute permissions.  After the conversion to the new
mmuidx, you would use cpu_lduw_mmuidx_ra.  I would also split the function into
two, so that one performs HLVX.HU and the other HLVX.WU, so that you don't have
to pass the size as a parameter.

Finally, this patch, changing behaviour when hstatus.SPVP changes... depends on
how often this bit is expected to be toggled.

If the bit toggles frequently, e.g. around some small section of kernel code,
then one might copy the SPVP bit into tlb_flags and use two different mmu
indexes to imply the state of the SPVP bit.

If the bit does not toggle frequently, e.g. whatever bit of kernel code runs
infrequently, or it only happens around priv level changes, then simply
flushing the qemu tlb when the SPVP bit changes is sufficient.  You then get to
look at SPVP directly within tlb_fill.

r~

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-18 Thread Andy Lutomirski

On Sun, Oct 18, 2020 at 8:52 AM Michael S. Tsirkin  wrote:
>
> On Sat, Oct 17, 2020 at 03:24:08PM +0200, Jason A. Donenfeld wrote:
> > 4c. The guest kernel maintains an array of physical addresses that are
> > MADV_WIPEONFORK. The hypervisor knows about this array and its
> > location through whatever protocol, and before resuming a
> > moved/snapshotted/duplicated VM, it takes the responsibility for
> > memzeroing this memory. The huge pro here would be that this
> > eliminates all races, and reduces complexity quite a bit, because the
> > hypervisor can perfectly synchronize its bringup (and SMP bringup)
> > with this, and it can even optimize things like on-disk memory
> > snapshots to simply not write out those pages to disk.
> >
> > A 4c-like approach seems like it'd be a lot of bang for the buck -- we
> > reuse the existing mechanism (MADV_WIPEONFORK), so there's no new
> > userspace API to deal with, and it'd be race free, and eliminate a lot
> > of kernel complexity.
>
> Clearly this has a chance to break applications, right?
> If there's an app that uses this as a non-system-calls way
> to find out whether there was a fork, it will break
> when wipe triggers without a fork ...
> For example, imagine:
>
> MADV_WIPEONFORK
> copy secret data to MADV_DONTFORK
> fork
>
>
> used to work, with this change it gets 0s instead of the secret data.
>
>
> I am also not sure it's wise to expose each guest process
> to the hypervisor like this. E.g. each process needs a
> guest physical address of its own then. This is a finite resource.
>
>
> The mmap interface proposed here is somewhat baroque, but it is
> certainly simple to implement ...

Wipe on fork/vmgenid/whatever could end up being much more problematic
than it naively appears -- it could be wiped in the middle of a read.
Either the API needs to handle this cleanly, or we need something more
aggressive like signal-on-fork.

--Andy

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-18 Thread Michael S. Tsirkin

On Sat, Oct 17, 2020 at 03:24:08PM +0200, Jason A. Donenfeld wrote:
> 4c. The guest kernel maintains an array of physical addresses that are
> MADV_WIPEONFORK. The hypervisor knows about this array and its
> location through whatever protocol, and before resuming a
> moved/snapshotted/duplicated VM, it takes the responsibility for
> memzeroing this memory. The huge pro here would be that this
> eliminates all races, and reduces complexity quite a bit, because the
> hypervisor can perfectly synchronize its bringup (and SMP bringup)
> with this, and it can even optimize things like on-disk memory
> snapshots to simply not write out those pages to disk.
> 
> A 4c-like approach seems like it'd be a lot of bang for the buck -- we
> reuse the existing mechanism (MADV_WIPEONFORK), so there's no new
> userspace API to deal with, and it'd be race free, and eliminate a lot
> of kernel complexity.

Clearly this has a chance to break applications, right?
If there's an app that uses this as a non-system-calls way
to find out whether there was a fork, it will break
when wipe triggers without a fork ...
For example, imagine:

MADV_WIPEONFORK
copy secret data to MADV_DONTFORK
fork

used to work, with this change it gets 0s instead of the secret data.
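
For reference, the fork-detection idiom referred to above can be sketched roughly as follows. This is a minimal, Linux-only illustration; the function name child_sees_wiped_page is hypothetical, and the sketch assumes a kernel and libc that provide MADV_WIPEONFORK (it returns -1 otherwise).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a forked child sees the marked page zeroed, 0 if it still
 * sees the parent's marker, and -1 if MADV_WIPEONFORK is unavailable or
 * any setup step fails.
 */
int child_sees_wiped_page(void)
{
#ifndef MADV_WIPEONFORK
    return -1;
#else
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return -1;
    }

    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    if (madvise(p, (size_t)page, MADV_WIPEONFORK) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    p[0] = 1;                       /* marker: non-zero means "no fork yet" */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        /* Child: a zero marker tells it that it is on the far side of a fork. */
        _exit(p[0] == 0 ? 0 : 1);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(p, (size_t)page);
    if (waited < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status) == 0 ? 1 : 0;
#endif
}
```

An application relying on this idiom treats a zeroed marker as proof that a fork happened, which is why a wipe triggered by the hypervisor without any fork would confuse it.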

I am also not sure it's wise to expose each guest process
to the hypervisor like this. E.g. each process needs a
guest physical address of its own then. This is a finite resource.

The mmap interface proposed here is somewhat baroque, but it is
certainly simple to implement ...

-- 
MST

ERROR: glib-2.48 gthread-2.0 is required to compile QEMU

2020-10-18 Thread Lee



Ubuntu 14.04.6 LTS, x86_64
I built QEMU 5.1, 5.0 and 4.2 from source, and each one fails with the error: glib-2.48
gthread-2.0 is required to compile QEMU
I tried apt-get install libglib2.0-dev, and it succeeds:
Reading state information... Done
libglib2.0-dev is already the newest version.
But the error is not fixed. I found that QEMU 4.1 builds fine in the same
environment.
Hi all, do you have any suggestions for me?

Re: [PATCH] ati: mask x y display parameter values

2020-10-18 Thread BALATON Zoltan via


On Sun, 18 Oct 2020, P J P wrote:

> From: Prasad J Pandit
>
> The source and destination x,y display parameters in ati_2d_blt()
> may run off the vga limits if either of s->regs.[src|dst]_[xy] is
> zero. Mask the register values to avoid potential crash.
>
> Reported-by: Gaoning Pan
> Signed-off-by: Prasad J Pandit
> ---
> hw/display/ati_2d.c | 12 ++--
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/hw/display/ati_2d.c b/hw/display/ati_2d.c
> index 23a8ae0cd8..524bc03a83 100644
> --- a/hw/display/ati_2d.c
> +++ b/hw/display/ati_2d.c
> @@ -53,10 +53,10 @@ void ati_2d_blt(ATIVGAState *s)
> s->vga.vbe_start_addr, surface_data(ds), surface_stride(ds),
> surface_bits_per_pixel(ds),
> (s->regs.dp_mix & GMC_ROP3_MASK) >> 16);
> -unsigned dst_x = (s->regs.dp_cntl & DST_X_LEFT_TO_RIGHT ?
> -  s->regs.dst_x : s->regs.dst_x + 1 - s->regs.dst_width);
> -unsigned dst_y = (s->regs.dp_cntl & DST_Y_TOP_TO_BOTTOM ?
> -  s->regs.dst_y : s->regs.dst_y + 1 - s->regs.dst_height);
> +unsigned dst_x = (s->regs.dp_cntl & DST_X_LEFT_TO_RIGHT ? s->regs.dst_x
> +: (s->regs.dst_x + 1 - s->regs.dst_width) & 0x3fff);
> +unsigned dst_y = (s->regs.dp_cntl & DST_Y_TOP_TO_BOTTOM ? s->regs.dst_y
> +: (s->regs.dst_y + 1 - s->regs.dst_height) & 0x3fff);


I don't think that's the correct fix. VRAM size is settable via a property 
so we should check if the resulting values are inside VRAM for which a 
simple mask may not be enough. Rather, check the calculation in the if 
with the error that says "blt outside vram not implemented".


The s->regs.[src|dst]_[xy] values should not be over 0x3fff because we 
mask them on register write in ati.c and here [src|dst]_[x|y] local 
variables are declared unsigned so negative values come out as large 
integers that should be caught by the checks below as being over VRAM end 
but those checks may have an off by one error or some other mistake. Do 
you have a reproducer and did you test if this fixes the crash or more 
info on how this overflows?
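
As a rough illustration of the wraparound discussed above, here is a small standalone C sketch. It is not taken from ati_2d.c: blt_start, span_fits and the 0x10000 limit are hypothetical and only model one axis of the calculation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Start coordinate along one axis, mirroring the expression in the patch:
 * forward blits start at pos, reverse blits at pos + 1 - extent. */
uint32_t blt_start(uint32_t pos, uint32_t extent, bool forward)
{
    return forward ? pos : pos + 1 - extent;
}

/* Hypothetical range check, done in 64 bits so start + extent cannot wrap. */
bool span_fits(uint32_t start, uint32_t extent, uint64_t limit)
{
    return (uint64_t)start + extent <= limit;
}

/* Walks through one case: pos = 0, extent = 4, reverse direction. */
void blt_wraparound_demo(void)
{
    uint32_t raw = blt_start(0, 4, false);   /* 0 + 1 - 4 wraps around */
    uint32_t masked = raw & 0x3fff;

    assert(raw == 0xfffffffdu);              /* "negative" shows up as huge */
    assert(masked == 0x3ffd);                /* masking makes it look valid */

    /* With an arbitrary limit of 0x10000, the raw value is rejected while
     * the masked value passes, although it is not the intended start. */
    assert(!span_fits(raw, 4, 0x10000));
    assert(span_fits(masked, 4, 0x10000));
}
```

In this model the unmasked value is easy to reject with a range check, whereas masking first turns it into a different coordinate that may or may not lie inside VRAM depending on the configured size.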


Regards,
BALATON Zoltan


> int bpp = ati_bpp_from_datatype(s);
> if (!bpp) {
> qemu_log_mask(LOG_GUEST_ERROR, "Invalid bpp\n");
> @@ -91,9 +91,9 @@ void ati_2d_blt(ATIVGAState *s)
> case ROP3_SRCCOPY:
> {
> unsigned src_x = (s->regs.dp_cntl & DST_X_LEFT_TO_RIGHT ?
> -   s->regs.src_x : s->regs.src_x + 1 - s->regs.dst_width);
> +   s->regs.src_x : (s->regs.src_x + 1 - s->regs.dst_width) & 0x3fff);
> unsigned src_y = (s->regs.dp_cntl & DST_Y_TOP_TO_BOTTOM ?
> -   s->regs.src_y : s->regs.src_y + 1 - s->regs.dst_height);
> +   s->regs.src_y : (s->regs.src_y + 1 - s->regs.dst_height) & 0x3fff);
> int src_stride = DEFAULT_CNTL ?
>  s->regs.src_pitch : s->regs.default_pitch;
> if (!src_stride) {
