Re: [PATCH for-6.1 1/2] docs: Remove stale TODO comments about license and version

2021-07-22 Thread Cleber Rosa


Peter Maydell writes:

> Since commits 13f934e79fa and 3a50c8f3067aaf, our HTML docs include a
> footer to all pages stating the license and version.  We can
> therefore delete the TODO comments suggesting we should do that from
> our .rst files.
>
> Signed-off-by: Peter Maydell 
> ---
>  docs/interop/qemu-ga-ref.rst | 9 -
>  docs/interop/qemu-qmp-ref.rst| 9 -
>  docs/interop/qemu-storage-daemon-qmp-ref.rst | 9 -
>  3 files changed, 27 deletions(-)
>

Reviewed-by: Cleber Rosa 




Re: [PATCH for-6.1 2/2] docs: Move licence/copyright from HTML output to rST comments

2021-07-22 Thread Cleber Rosa


Peter Maydell writes:

> Our built HTML documentation now has a standard footer which
> gives the license for QEMU (and its documentation as a whole).
> In almost all pages, we either don't bother to state the
> copyright/license for the individual rST sources, or we put
> it in an rST comment. There are just three pages which render
> copyright or license information into the user-visible HTML.
>
> Quoting a specific (different) license for an individual HTML
> page within the manual is confusing. Downgrade the license
> and copyright info to a comment within the rST source, bringing
> these pages in line with the rest of our documents.
>
> Suggested-by: Markus Armbruster 
> Signed-off-by: Peter Maydell 
> ---
>  docs/interop/vhost-user-gpu.rst |  7 ---
>  docs/interop/vhost-user.rst | 12 +++-
>  docs/system/generic-loader.rst  |  4 ++--
>  3 files changed, 13 insertions(+), 10 deletions(-)

Reviewed-by: Cleber Rosa 




Re: [PATCH-for-6.1 v2] gitlab-ci: Extract OpenSBI job rules and fix 'when' condition

2021-07-22 Thread Cleber Rosa


Philippe Mathieu-Daudé writes:

> First, all jobs depending on 'docker-opensbi' job must use at most
> all the rules that triggers it. The simplest way to ensure that is
> to always use the same rules. Extract all the rules to a reusable
> section, and include this section (with the 'extends' keyword) in
> both 'docker-opensbi' and 'build-opensbi' jobs.
>
> Second, jobs depending on another should not use the 'when: always'
> condition, because if a dependency failed we should not keep running
> jobs depending on it. The correct condition is 'when: on_success'.
>
> The problems were introduced in commit c6fc0fc1a71 ("gitlab-ci.yml:
> Add jobs to build OpenSBI firmware binaries"), but were revealed in
> commit 91e9c47e50a ("docker: OpenSBI build job depends on OpenSBI
> container").
>
> This fix is similar to the one used with the EDK2 firmware job in
> commit ac0595cf6b3 ("gitlab-ci: Extract EDK2 job rules to reusable
> section").
>
> Reported-by: Daniel P. Berrangé 
> Reviewed-by: Daniel P. Berrangé 
> Reviewed-by: Willian Rampazzo 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> v2: when 'always' -> 'on_success' & reworded (danpb)
>
> Supersedes: <20210720164829.3949558-1-phi...@redhat.com>
> ---
>  .gitlab-ci.d/opensbi.yml | 30 ++
>  1 file changed, 18 insertions(+), 12 deletions(-)
>

Reviewed-by: Cleber Rosa 




[PATCH v2 0/1] Improve module accelerator error message

2021-07-22 Thread Jose R. Ziviani
v1 -> v2:
* Moved the code to module.c
* Simplified a lot by using current module DB to get info

The main objective is to improve the error message when trying to
load a not found/not installed module TCG.

For example:

$ qemu-system-x86_64 -accel tcg
ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces: assertion failed: 
(ops != NULL)
Bail out! ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces: 
assertion failed: (ops != NULL)
[1]31964 IOT instruction (core dumped)  ./qemu-system-x86_64 ...

To:
$ qemu-system-x86_64 -accel tcg
accel-tcg-x86_64 module is missing, install the package or config the library 
path correctly.

Jose R. Ziviani (1):
  modules: Improve error message when module is not found

 accel/accel-softmmu.c |  5 -
 util/module.c | 14 --
 2 files changed, 12 insertions(+), 7 deletions(-)

-- 
2.32.0




[PATCH v2 1/1] modules: Improve error message when module is not found

2021-07-22 Thread Jose R. Ziviani
When a module is not found, specially accelerators, QEMU displays
a error message that not easy to understand[1]. This patch improves
the readability by offering a user-friendly message[2].

This patch also moves the accelerator ops check to runtine (instead
of the original g_assert) because it works better with dynamic
modules.

[1] qemu-system-x86_64 -accel tcg
ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces: assertion failed:
(ops != NULL)
Bail out! ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces:
assertion failed: (ops != NULL)
31964 IOT instruction (core dumped)  ./qemu-system-x86_64 ...

[2] qemu-system-x86_64 -accel tcg
accel-tcg-x86_64 module is missing, install the package or config the library 
path correctly.

Signed-off-by: Jose R. Ziviani 
---
 accel/accel-softmmu.c |  5 -
 util/module.c | 14 --
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/accel/accel-softmmu.c b/accel/accel-softmmu.c
index 67276e4f52..52449ac2d0 100644
--- a/accel/accel-softmmu.c
+++ b/accel/accel-softmmu.c
@@ -79,7 +79,10 @@ void accel_init_ops_interfaces(AccelClass *ac)
  * all accelerators need to define ops, providing at least a mandatory
  * non-NULL create_vcpu_thread operation.
  */
-g_assert(ops != NULL);
+if (ops == NULL) {
+exit(1);
+}
+
 if (ops->ops_init) {
 ops->ops_init(ops);
 }
diff --git a/util/module.c b/util/module.c
index 6bb4ad915a..268a8563fd 100644
--- a/util/module.c
+++ b/util/module.c
@@ -206,13 +206,10 @@ static int module_load_file(const char *fname, bool 
mayfail, bool export_symbols
 out:
 return ret;
 }
-#endif
 
 bool module_load_one(const char *prefix, const char *lib_name, bool mayfail)
 {
 bool success = false;
-
-#ifdef CONFIG_MODULES
 char *fname = NULL;
 #ifdef CONFIG_MODULE_UPGRADES
 char *version_dir;
@@ -300,6 +297,9 @@ bool module_load_one(const char *prefix, const char 
*lib_name, bool mayfail)
 
 if (!success) {
 g_hash_table_remove(loaded_modules, module_name);
+fprintf(stderr, "%s module is missing, install the "
+"package or config the library path "
+"correctly.\n", module_name);
 g_free(module_name);
 }
 
@@ -307,12 +307,9 @@ bool module_load_one(const char *prefix, const char 
*lib_name, bool mayfail)
 g_free(dirs[i]);
 }
 
-#endif
 return success;
 }
 
-#ifdef CONFIG_MODULES
-
 static bool module_loaded_qom_all;
 
 void module_load_qom_one(const char *type)
@@ -384,4 +381,9 @@ void qemu_load_module_for_opts(const char *group) {}
 void module_load_qom_one(const char *type) {}
 void module_load_qom_all(void) {}
 
+bool module_load_one(const char *prefix, const char *lib_name, bool mayfail)
+{
+return false;
+}
+
 #endif
-- 
2.32.0




[PATCH for-6.1 1/2] docs: Remove stale TODO comments about license and version

2021-07-22 Thread Peter Maydell
Since commits 13f934e79fa and 3a50c8f3067aaf, our HTML docs include a
footer to all pages stating the license and version.  We can
therefore delete the TODO comments suggesting we should do that from
our .rst files.

Signed-off-by: Peter Maydell 
---
 docs/interop/qemu-ga-ref.rst | 9 -
 docs/interop/qemu-qmp-ref.rst| 9 -
 docs/interop/qemu-storage-daemon-qmp-ref.rst | 9 -
 3 files changed, 27 deletions(-)

diff --git a/docs/interop/qemu-ga-ref.rst b/docs/interop/qemu-ga-ref.rst
index db1e9461249..032d4924552 100644
--- a/docs/interop/qemu-ga-ref.rst
+++ b/docs/interop/qemu-ga-ref.rst
@@ -1,15 +1,6 @@
 QEMU Guest Agent Protocol Reference
 ===
 
-..
-   TODO: the old Texinfo manual used to note that this manual
-   is GPL-v2-or-later. We should make that reader-visible
-   both here and in our Sphinx manuals more generally.
-
-..
-   TODO: display the QEMU version, both here and in our Sphinx manuals
-   more generally.
-
 .. contents::
:depth: 3
 
diff --git a/docs/interop/qemu-qmp-ref.rst b/docs/interop/qemu-qmp-ref.rst
index b5bebf6b9a9..357effd64f3 100644
--- a/docs/interop/qemu-qmp-ref.rst
+++ b/docs/interop/qemu-qmp-ref.rst
@@ -1,15 +1,6 @@
 QEMU QMP Reference Manual
 =
 
-..
-   TODO: the old Texinfo manual used to note that this manual
-   is GPL-v2-or-later. We should make that reader-visible
-   both here and in our Sphinx manuals more generally.
-
-..
-   TODO: display the QEMU version, both here and in our Sphinx manuals
-   more generally.
-
 .. contents::
:depth: 3
 
diff --git a/docs/interop/qemu-storage-daemon-qmp-ref.rst 
b/docs/interop/qemu-storage-daemon-qmp-ref.rst
index d0ebb42ebd5..9fed68152f5 100644
--- a/docs/interop/qemu-storage-daemon-qmp-ref.rst
+++ b/docs/interop/qemu-storage-daemon-qmp-ref.rst
@@ -1,15 +1,6 @@
 QEMU Storage Daemon QMP Reference Manual
 
 
-..
-   TODO: the old Texinfo manual used to note that this manual
-   is GPL-v2-or-later. We should make that reader-visible
-   both here and in our Sphinx manuals more generally.
-
-..
-   TODO: display the QEMU version, both here and in our Sphinx manuals
-   more generally.
-
 .. contents::
:depth: 3
 
-- 
2.20.1




Re: [PATCH v2 1/1] hmp: synchronize cpu state for lapic info

2021-07-22 Thread Dongli Zhang
May I get feedback for this bugfix?

So far the "info lapic " returns stale data and could not accurate reflect
the status in KVM.

Thank you very much!

Dongli Zhang

On 7/1/21 2:40 PM, Dongli Zhang wrote:
> While the default "info lapic" always synchronizes cpu state ...
> 
> mon_get_cpu()
> -> mon_get_cpu_sync(mon, true)
>-> cpu_synchronize_state(cpu)
>   -> ioctl KVM_GET_LAPIC (taking KVM as example)
> 
> ... the cpu state is not synchronized when the apic-id is available as
> argument.
> 
> The cpu state should be synchronized when apic-id is available. Otherwise
> the "info lapic " always returns stale data.
> 
> Signed-off-by: Dongli Zhang 
> ---
> Changed since v1:
>   - I sent out wrong patch version in v1
> 
>  target/i386/monitor.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/target/i386/monitor.c b/target/i386/monitor.c
> index 119211f0b0..d833ab5b8e 100644
> --- a/target/i386/monitor.c
> +++ b/target/i386/monitor.c
> @@ -28,6 +28,7 @@
>  #include "monitor/hmp-target.h"
>  #include "monitor/hmp.h"
>  #include "qapi/qmp/qdict.h"
> +#include "sysemu/hw_accel.h"
>  #include "sysemu/kvm.h"
>  #include "sysemu/sev.h"
>  #include "qapi/error.h"
> @@ -656,7 +657,11 @@ void hmp_info_local_apic(Monitor *mon, const QDict 
> *qdict)
>  
>  if (qdict_haskey(qdict, "apic-id")) {
>  int id = qdict_get_try_int(qdict, "apic-id", 0);
> +
>  cs = cpu_by_arch_id(id);
> +if (cs) {
> +cpu_synchronize_state(cs);
> +}
>  } else {
>  cs = mon_get_cpu(mon);
>  }
> 



Re: Prefetches in buffer_zero_*

2021-07-22 Thread Dr. David Alan Gilbert
* Richard Henderson (richard.hender...@linaro.org) wrote:
> On 7/22/21 12:02 AM, Dr. David Alan Gilbert wrote:
> > Hi Richard,
> >I think you were the last person to fiddle with the prefetching
> > in buffer_zero_avx2 and friends; Joe (cc'd) wondered if explicit
> > prefetching still made sense on modern CPUs, and that their hardware
> > generally figures stuff out better on simple increments.
> > 
> >What was your thinking on this, and did you actually measure
> > any improvement?
> 
> Ah, well, that was 5 years ago so I have no particular memory of this.  It
> wouldn't surprise me if you can't measure any improvement on modern
> hardware.
> 
> Do you now measure an improvement with the prefetches gone?

Not tried, it just came from Joe's suggestion that it was generally a
bad idea these days; I do remember that the behaviour of those functions
is quite tricky because there performance is VERY data dependent - many
VMs actually have pages that are quite dirty so you never iterate the
loop, but then you hit others with big zero pages and you spend your
entire life in the loop.

Dave
> 
> r~
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH for-6.1 2/2] docs: Move licence/copyright from HTML output to rST comments

2021-07-22 Thread Peter Maydell
Our built HTML documentation now has a standard footer which
gives the license for QEMU (and its documentation as a whole).
In almost all pages, we either don't bother to state the
copyright/license for the individual rST sources, or we put
it in an rST comment. There are just three pages which render
copyright or license information into the user-visible HTML.

Quoting a specific (different) license for an individual HTML
page within the manual is confusing. Downgrade the license
and copyright info to a comment within the rST source, bringing
these pages in line with the rest of our documents.

Suggested-by: Markus Armbruster 
Signed-off-by: Peter Maydell 
---
 docs/interop/vhost-user-gpu.rst |  7 ---
 docs/interop/vhost-user.rst | 12 +++-
 docs/system/generic-loader.rst  |  4 ++--
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/docs/interop/vhost-user-gpu.rst b/docs/interop/vhost-user-gpu.rst
index 3268bf405ce..71a2c52b313 100644
--- a/docs/interop/vhost-user-gpu.rst
+++ b/docs/interop/vhost-user-gpu.rst
@@ -2,9 +2,10 @@
 Vhost-user-gpu Protocol
 ===
 
-:Licence: This work is licensed under the terms of the GNU GPL,
-  version 2 or later. See the COPYING file in the top-level
-  directory.
+..
+  Licence: This work is licensed under the terms of the GNU GPL,
+   version 2 or later. See the COPYING file in the top-level
+   directory.
 
 .. contents:: Table of Contents
 
diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index d6085f70452..d63f8d3f93a 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1,11 +1,13 @@
 ===
 Vhost-user Protocol
 ===
-:Copyright: 2014 Virtual Open Systems Sarl.
-:Copyright: 2019 Intel Corporation
-:Licence: This work is licensed under the terms of the GNU GPL,
-  version 2 or later. See the COPYING file in the top-level
-  directory.
+
+..
+  Copyright 2014 Virtual Open Systems Sarl.
+  Copyright 2019 Intel Corporation
+  Licence: This work is licensed under the terms of the GNU GPL,
+   version 2 or later. See the COPYING file in the top-level
+   directory.
 
 .. contents:: Table of Contents
 
diff --git a/docs/system/generic-loader.rst b/docs/system/generic-loader.rst
index 531ddbc8e34..4f9fb005f1d 100644
--- a/docs/system/generic-loader.rst
+++ b/docs/system/generic-loader.rst
@@ -1,8 +1,8 @@
 ..
Copyright (c) 2016, Xilinx Inc.
 
-This work is licensed under the terms of the GNU GPL, version 2 or later.  See
-the COPYING file in the top-level directory.
+   This work is licensed under the terms of the GNU GPL, version 2 or later.  
See
+   the COPYING file in the top-level directory.
 
 Generic Loader
 --
-- 
2.20.1




[PATCH for-6.1 0/2] docs license footer followon cleanups

2021-07-22 Thread Peter Maydell
This patchset makes a couple of followon cleanups now that
commits 13f934e79fa and 3a50c8f3067aaf are in master and our HTML
documentation has a footer to all pages stating the QEMU license
and version:
 * it removes the TODO comments, because we've now done them
 * three .rst files were rendering their own copyright or
   license information into the user-visible HTML, which is
   a bit confusing now that we also do this in the page footer;
   patch 2 brings those files into line with the others by having
   the comment/license only in a rST comment

thanks
-- PMM

Peter Maydell (2):
  docs: Remove stale TODO comments about license and version
  docs: Move licence/copyright from HTML output to rST comments

 docs/interop/qemu-ga-ref.rst |  9 -
 docs/interop/qemu-qmp-ref.rst|  9 -
 docs/interop/qemu-storage-daemon-qmp-ref.rst |  9 -
 docs/interop/vhost-user-gpu.rst  |  7 ---
 docs/interop/vhost-user.rst  | 12 +++-
 docs/system/generic-loader.rst   |  4 ++--
 6 files changed, 13 insertions(+), 37 deletions(-)

-- 
2.20.1




Re: [PATCH v3 2/5] migration: Make from_dst_file accesses thread-safe

2021-07-22 Thread Peter Xu
On Thu, Jul 22, 2021 at 01:58:38PM -0400, Peter Xu wrote:
> Accessing from_dst_file is potentially racy in current code base like below:
> 
>   if (s->from_dst_file)
> do_something(s->from_dst_file);
> 
> Because from_dst_file can be reset right after the check in another
> thread (rp_thread).  One example is migrate_fd_cancel().
> 
> Use the same qemu_file_lock to protect it too, just like to_dst_file.
> 
> When it's safe to access without lock, comment it.
> 
> There's one special reference in migration_thread() that can be replaced by
> the newly introduced rp_thread_created flag.
> 
> Reported-by: Dr. David Alan Gilbert 
> Reviewed-by: Dr. David Alan Gilbert 
> Signed-off-by: Peter Xu 
> ---

(Dave should have helped fixing this which I appreciated a lot, but just make
 it be together with the record..)

Below needs to be squashed into the patch:

diff --git a/migration/migration.c b/migration/migration.c
index a50330016c..041b8451a6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2696,7 +2696,7 @@ static void 
migration_release_from_dst_file(MigrationState *ms)
 {
 QEMUFile *file;

-WITH_QEMU_LOCK_GUARD(>qemu_file_lock) {
+WITH_QEMU_LOCK_GUARD(>qemu_file_lock) {
 /*
  * Reset the from_dst_file pointer first before releasing it, as we
  * can't block within lock section

-- 
Peter Xu




Re: [PATCH for-6.1 0/3] docs: Document arm mainstone, kzm, imx25-pdk

2021-07-22 Thread Richard Henderson

On 7/22/21 7:52 AM, Peter Maydell wrote:

Peter Maydell (3):
   docs: Add documentation of Arm 'mainstone' board
   docs: Add documentation of Arm 'kzm' board
   docs: Add documentation of Arm 'imx25-pdk' board


Reviewed-by: Richard Henderson 

r~



Re: [PATCH for-6.1 0/2] docs license footer followon cleanups

2021-07-22 Thread Marc-André Lureau
On Thu, Jul 22, 2021 at 11:23 PM Peter Maydell 
wrote:

> This patchset makes a couple of followon cleanups now that
> commits 13f934e79fa and 3a50c8f3067aaf are in master and our HTML
> documentation has a footer to all pages stating the QEMU license
> and version:
>  * it removes the TODO comments, because we've now done them
>  * three .rst files were rendering their own copyright or
>license information into the user-visible HTML, which is
>a bit confusing now that we also do this in the page footer;
>patch 2 brings those files into line with the others by having
>the comment/license only in a rST comment
>
> thanks
> -- PMM
>
> Peter Maydell (2):
>   docs: Remove stale TODO comments about license and version
>   docs: Move licence/copyright from HTML output to rST comments
>

Reviewed-by: Marc-André Lureau 


>  docs/interop/qemu-ga-ref.rst |  9 -
>  docs/interop/qemu-qmp-ref.rst|  9 -
>  docs/interop/qemu-storage-daemon-qmp-ref.rst |  9 -
>  docs/interop/vhost-user-gpu.rst  |  7 ---
>  docs/interop/vhost-user.rst  | 12 +++-
>  docs/system/generic-loader.rst   |  4 ++--
>  6 files changed, 13 insertions(+), 37 deletions(-)
>
> --
> 2.20.1
>
>
>

-- 
Marc-André Lureau


Re: [PATCH] MAINTAINERS: Don't list Andrzej Zaborowski for various components

2021-07-22 Thread Richard Henderson

On 7/22/21 8:09 AM, Peter Maydell wrote:

Andrzej Zaborowski is listed as an "Odd Fixes" maintainer for the
nSeries, Palm and PXA2XX boards, as well as the "Maintained" status
Arm 32-bit TCG backend.

Andrzej's last email to qemu-devel was back in 2017, and the email
before that was all the way back in 2013.  We don't really need to
fill his email up with CCs on QEMU patches any more...

Remove Andrzej from the various boards sections (leaving them still
Odd Fixes with me as the backup patch reviewer).  Add Richard
Henderson as the maintainer for the Arm TCG backend, since removing
Andrzej would otherwise leave that section with no M: line at all.

Signed-off-by: Peter Maydell 
---
Andrzej: if you're reading this, thanks for all the work you did
on QEMU back in the day; and if you do want to still be CCd on
patches let me know and I'll happily drop this MAINTAINERS update.

Richard: are you happy with (a) being listed for Arm TCG and
(b) it being "Maintained" status?


Yep.
Reviewed-by: Richard Henderson 

r~



Re: [PATCH for-6.1 v2] machine: Disallow specifying topology parameters as zero

2021-07-22 Thread Cleber Rosa


Yanan Wang writes:

> In the SMP configuration, we should either specify a topology
> parameter with a reasonable value (equal to or greater than 1)
> or just leave it omitted and QEMU will calculate its value.
> Configurations which explicitly specify the topology parameters
> as zero like "sockets=0" are meaningless, so disallow them.
>
> However, the commit 1e63fe685804d
> (machine: pass QAPI struct to mc->smp_parse) has documented that
> '0' has the same semantics as omitting a parameter in the qapi
> comment for SMPConfiguration. So this patch fixes the doc and
> also adds the corresponding sanity check in the smp parsers.
>
> Suggested-by: Andrew Jones 
> Signed-off-by: Yanan Wang 
> ---
>  hw/core/machine.c | 14 ++
>  qapi/machine.json |  6 +++---
>  qemu-options.hx   | 12 +++-
>  3 files changed, 24 insertions(+), 8 deletions(-)

Hi Yanan,

This looks somewhat similar to this very old patch of mine:

   https://mail.gnu.org/archive/html/qemu-devel/2020-10/msg03039.html

I'm putting a reference here because I believe the test can be salvaged
and slightly adapted for this patch of yours.

Let me know if I can help anyhow.

Thanks,
- Cleber.




Re: [PATCH v2 03/22] target/loongarch: Add core definition

2021-07-22 Thread Richard Henderson

On 7/20/21 11:52 PM, Song Gao wrote:

This patch add target state header, target definitions
and initialization routines.

Signed-off-by: Song Gao 
---
  target/loongarch/cpu-param.h |  21 
  target/loongarch/cpu-qom.h   |  40 ++
  target/loongarch/cpu.c   | 293 +++
  target/loongarch/cpu.h   | 265 ++
  4 files changed, 619 insertions(+)
  create mode 100644 target/loongarch/cpu-param.h
  create mode 100644 target/loongarch/cpu-qom.h
  create mode 100644 target/loongarch/cpu.c
  create mode 100644 target/loongarch/cpu.h

diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
new file mode 100644
index 000..582ee29
--- /dev/null
+++ b/target/loongarch/cpu-param.h
@@ -0,0 +1,21 @@
+/*
+ * LoongArch cpu parameters for qemu.
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#ifndef LOONGARCH_CPU_PARAM_H
+#define LOONGARCH_CPU_PARAM_H 1
+
+#ifdef TARGET_LOONGARCH64
+#define TARGET_LONG_BITS 64


Why the ifdef for TARGET_LOONGARCH64?
Nothing will compile without that set.


+#ifdef CONFIG_TCG
+static void loongarch_cpu_synchronize_from_tb(CPUState *cs,
+  const TranslationBlock *tb)
+{
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+CPULoongArchState *env = >env;
+
+env->active_tc.PC = tb->pc;
+env->hflags &= ~LOONGARCH_HFLAG_BMASK;
+env->hflags |= tb->flags & LOONGARCH_HFLAG_BMASK;
+}


Loongarch has no branch delay slots, so you should not have replicated the mips branch 
delay slot handling.  There should be no BMASK at all.



+#ifdef CONFIG_TCG
+#include "hw/core/tcg-cpu-ops.h"
+
+static struct TCGCPUOps loongarch_tcg_ops = {
+.initialize = loongarch_tcg_init,
+.synchronize_from_tb = loongarch_cpu_synchronize_from_tb,
+};
+#endif /* CONFIG_TCG */


May I presume that Loongarch has virtualization hardware, and will eventually support KVM? 
 If not, there is no need for CONFIG_TCG anywhere.



+#define TCG_GUEST_DEFAULT_MO (0)
+#define UNASSIGNED_CPU_ID 0x
+
+typedef union fpr_t fpr_t;
+union fpr_t {
+float64  fd;   /* ieee double precision */
+float32  fs[2];/* ieee single precision */
+uint64_t d;/* binary double fixed-point */
+uint32_t w[2]; /* binary single fixed-point */
+};


For what it's worth, we already have a CPU_DoubleU type that could be used.  But frankly, 
float64 *is* uint64_t, so there's very little use in putting them together into a union. 
It would seem that you don't even use fs and w for more than fpu_dump_state, and you're 
even doing it wrong there.



+typedef struct CPULoongArchFPUContext CPULoongArchFPUContext;
+struct CPULoongArchFPUContext {
+/* Floating point registers */
+fpr_t fpr[32];
+float_status fp_status;
+
+bool cf[8];
+/*
+ * fcsr0
+ * 31:29 |28:24 |23:21 |20:16 |15:10 |9:8 |7  |6  |5 |4:0
+ *Cause Flags RM   DAE TM Enables
+ */
+uint32_t fcsr0;
+uint32_t fcsr0_mask;
+uint32_t vcsr16;
+
+#define FCSR0_M10xdf /* FCSR1 mask, DAE, TM and Enables */
+#define FCSR0_M20x1f1f   /* FCSR2 mask, Cause and Flags */
+#define FCSR0_M30x300/* FCSR3 mask, Round Mode */
+#define FCSR0_RM8/* Round Mode bit num on fcsr0 */
+#define GET_FP_CAUSE(reg)(((reg) >> 24) & 0x1f)
+#define GET_FP_ENABLE(reg)   (((reg) >>  0) & 0x1f)
+#define GET_FP_FLAGS(reg)(((reg) >> 16) & 0x1f)
+#define SET_FP_CAUSE(reg, v)  do { (reg) = ((reg) & ~(0x1f << 24)) | \
+   ((v & 0x1f) << 24);   \
+ } while (0)
+#define SET_FP_ENABLE(reg, v) do { (reg) = ((reg) & ~(0x1f <<  0)) | \
+   ((v & 0x1f) << 0);\
+ } while (0)
+#define SET_FP_FLAGS(reg, v)  do { (reg) = ((reg) & ~(0x1f << 16)) | \
+   ((v & 0x1f) << 16);   \
+ } while (0)
+#define UPDATE_FP_FLAGS(reg, v)   do { (reg) |= ((v & 0x1f) << 16); } while (0)
+#define FP_INEXACT1
+#define FP_UNDERFLOW  2
+#define FP_OVERFLOW   4
+#define FP_DIV0   8
+#define FP_INVALID16
+};
+
+#define TARGET_INSN_START_EXTRA_WORDS 2
+#define LOONGARCH_FPU_MAX 1
+#define N_IRQS  14
+
+enum loongarch_feature {
+LA_FEATURE_3A5000,
+};
+
+typedef struct TCState TCState;
+struct TCState {
+target_ulong gpr[32];
+target_ulong PC;
+};
+
+typedef struct CPULoongArchState CPULoongArchState;
+struct CPULoongArchState {
+TCState active_tc;
+CPULoongArchFPUContext active_fpu;


Please don't replicate the mips foolishness with active_tc and active_fpu.  There is no 
inactive_fpu with which to contrast this.  Just include these fields directly into the 
main 

Re: [PATCH v2 04/22] target/loongarch: Add interrupt handling support

2021-07-22 Thread Richard Henderson

On 7/20/21 11:53 PM, Song Gao wrote:

+bool loongarch_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
+{
+if (interrupt_request & CPU_INTERRUPT_HARD) {
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+CPULoongArchState *env = >env;
+
+if (cpu_loongarch_hw_interrupts_enabled(env) &&
+cpu_loongarch_hw_interrupts_pending(env)) {
+cs->exception_index = EXCP_INTE;
+env->error_code = 0;
+loongarch_cpu_do_interrupt(cs);
+return true;
+}
+}
+return false;
+}


Not sure what you're doing here, with user-only.  None of these conditions 
apply.


r~



Re: [PATCH v2 05/22] target/loongarch: Add memory management support

2021-07-22 Thread Richard Henderson

On 7/20/21 11:53 PM, Song Gao wrote:

This patch introduces one memory-management-related functions
- loongarch_cpu_tlb_fill()

Signed-off-by: Song Gao 
---
  target/loongarch/cpu.c|   1 +
  target/loongarch/cpu.h|   9 
  target/loongarch/tlb_helper.c | 103 ++
  3 files changed, 113 insertions(+)
  create mode 100644 target/loongarch/tlb_helper.c

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 8eaa778..6269dd9 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -269,6 +269,7 @@ static struct TCGCPUOps loongarch_tcg_ops = {
  .initialize = loongarch_tcg_init,
  .synchronize_from_tb = loongarch_cpu_synchronize_from_tb,
  .cpu_exec_interrupt = loongarch_cpu_exec_interrupt,
+.tlb_fill = loongarch_cpu_tlb_fill,
  };
  #endif /* CONFIG_TCG */
  
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h

index 1db8bb5..5c06122 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -287,4 +287,13 @@ static inline void compute_hflags(CPULoongArchState *env)
  
  const char *loongarch_exception_name(int32_t exception);
  
+/* tlb_helper.c */

+bool loongarch_cpu_tlb_fill(CPUState *cs,
+vaddr address,
+int size,
+MMUAccessType access_type,
+int mmu_idx,
+bool probe,
+uintptr_t retaddr);
+
  #endif /* LOONGARCH_CPU_H */
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
new file mode 100644
index 000..b59a995
--- /dev/null
+++ b/target/loongarch/tlb_helper.c
@@ -0,0 +1,103 @@
+/*
+ * LoongArch tlb emulation helpers for qemu.
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "cpu-csr.h"
+#include "exec/helper-proto.h"
+#include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
+#include "exec/log.h"
+
+enum {
+TLBRET_PE = -7,
+TLBRET_XI = -6,
+TLBRET_RI = -5,
+TLBRET_DIRTY = -4,
+TLBRET_INVALID = -3,
+TLBRET_NOMATCH = -2,
+TLBRET_BADADDR = -1,
+TLBRET_MATCH = 0
+};
+
+static void raise_mmu_exception(CPULoongArchState *env, target_ulong address,
+MMUAccessType access_type, int tlb_error)
+{
+CPUState *cs = env_cpu(env);
+int exception = 0, error_code = 0;
+
+if (access_type == MMU_INST_FETCH) {
+error_code |= INST_INAVAIL;
+}
+
+switch (tlb_error) {
+default:
+case TLBRET_BADADDR:
+exception = EXCP_ADE;
+break;
+case TLBRET_NOMATCH:
+/* No TLB match for a mapped address */
+if (access_type == MMU_DATA_STORE) {
+exception = EXCP_TLBS;
+} else {
+exception = EXCP_TLBL;
+}
+error_code |= TLB_NOMATCH;
+break;
+case TLBRET_INVALID:
+/* TLB match with no valid bit */
+if (access_type == MMU_DATA_STORE) {
+exception = EXCP_TLBS;
+} else {
+exception = EXCP_TLBL;
+}
+break;
+case TLBRET_DIRTY:
+exception = EXCP_TLBM;
+break;
+case TLBRET_XI:
+/* Execute-Inhibit Exception */
+exception = EXCP_TLBXI;
+break;
+case TLBRET_RI:
+/* Read-Inhibit Exception */
+exception = EXCP_TLBRI;
+break;
+case TLBRET_PE:
+/* Privileged Exception */
+exception = EXCP_TLBPE;
+break;
+}
+
+if (tlb_error == TLBRET_NOMATCH) {
+env->CSR_TLBRBADV = address;
+env->CSR_TLBREHI = address & (TARGET_PAGE_MASK << 1);
+cs->exception_index = exception;
+env->error_code = error_code;
+return;
+}
+
+/* Raise exception */
+env->CSR_BADV = address;
+cs->exception_index = exception;
+env->error_code = error_code;
+env->CSR_TLBEHI = address & (TARGET_PAGE_MASK << 1);
+}
+
+bool loongarch_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
+   MMUAccessType access_type, int mmu_idx,
+   bool probe, uintptr_t retaddr)
+{
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+CPULoongArchState *env = >env;
+int ret = TLBRET_BADADDR;
+
+/* data access */
+raise_mmu_exception(env, address, access_type, ret);
+do_raise_exception_err(env, cs->exception_index, env->error_code, retaddr);
+}


Again, almost all of this does not apply for user-only.

r~








Re: [PATCH v2 06/22] target/loongarch: Add main translation routines

2021-07-22 Thread Richard Henderson

On 7/20/21 11:53 PM, Song Gao wrote:

+/* General purpose registers moves. */
+void gen_load_gpr(TCGv t, int reg)
+{
+if (reg == 0) {
+tcg_gen_movi_tl(t, 0);
+} else {
+tcg_gen_mov_tl(t, cpu_gpr[reg]);
+}
+}


Please have a look at

https://patchew.org/QEMU/20210709042608.883256-1-richard.hender...@linaro.org/

for a better way to handle the zero register.



+static inline void save_cpu_state(DisasContext *ctx, int do_save_pc)
+{
+if (do_save_pc && ctx->base.pc_next != ctx->saved_pc) {
+gen_save_pc(ctx->base.pc_next);
+ctx->saved_pc = ctx->base.pc_next;
+}
+if (ctx->hflags != ctx->saved_hflags) {
+tcg_gen_movi_i32(hflags, ctx->hflags);
+ctx->saved_hflags = ctx->hflags;
+switch (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
+case LOONGARCH_HFLAG_BR:
+break;
+case LOONGARCH_HFLAG_BC:
+case LOONGARCH_HFLAG_B:
+tcg_gen_movi_tl(btarget, ctx->btarget);
+break;
+}
+}
+}


Drop all the hflags handling.
It's all copied from mips delay slot handling.


+
+static inline void restore_cpu_state(CPULoongArchState *env, DisasContext *ctx)
+{
+ctx->saved_hflags = ctx->hflags;
+switch (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
+case LOONGARCH_HFLAG_BR:
+break;
+case LOONGARCH_HFLAG_BC:
+case LOONGARCH_HFLAG_B:
+ctx->btarget = env->btarget;
+break;
+}
+}


Likewise.


+static void gen_load_fpr32h(TCGv_i32 t, int reg)
+{
+tcg_gen_extrh_i64_i32(t, fpu_f64[reg]);
+}
+
+static void gen_store_fpr32h(TCGv_i32 t, int reg)
+{
+TCGv_i64 t64 = tcg_temp_new_i64();
+tcg_gen_extu_i32_i64(t64, t);
+tcg_gen_deposit_i64(fpu_f64[reg], fpu_f64[reg], t64, 32, 32);
+tcg_temp_free_i64(t64);
+}


There is no general-purpose high-part fpr stuff.  There's only movgr2frh and movfrh2gr, 
and you can simplify both if you drop the transition through TCGv_i32.



+void gen_op_addr_add(TCGv ret, TCGv arg0, TCGv arg1)
+{
+tcg_gen_add_tl(ret, arg0, arg1);
+}


No point in this, since loongarch has no 32-bit address mode.


+void gen_base_offset_addr(TCGv addr, int base, int offset)
+{
+if (base == 0) {
+tcg_gen_movi_tl(addr, offset);
+} else if (offset == 0) {
+gen_load_gpr(addr, base);
+} else {
+tcg_gen_movi_tl(addr, offset);
+gen_op_addr_add(addr, cpu_gpr[base], addr);
+}
+}


Using the interfaces I quote above from my riscv cleanup,
this can be tidied to

tcg_gen_addi_tl(addr, gpr_src(base), offset);


+static inline bool use_goto_tb(DisasContext *ctx, target_ulong dest)
+{
+return true;
+}


You must now use translate_use_goto_tb, which will not always return true.  You will see 
assertion failures otherwise.



+static inline void clear_branch_hflags(DisasContext *ctx)
+{
+ctx->hflags &= ~LOONGARCH_HFLAG_BMASK;
+if (ctx->base.is_jmp == DISAS_NEXT) {
+save_cpu_state(ctx, 0);
+} else {
+/*
+ * It is not safe to save ctx->hflags as hflags may be changed
+ * in execution time.
+ */
+tcg_gen_andi_i32(hflags, hflags, ~LOONGARCH_HFLAG_BMASK);
+}
+}


Not required.


+static void gen_branch(DisasContext *ctx, int insn_bytes)
+{
+if (ctx->hflags & LOONGARCH_HFLAG_BMASK) {
+int proc_hflags = ctx->hflags & LOONGARCH_HFLAG_BMASK;
+/* Branches completion */
+clear_branch_hflags(ctx);
+ctx->base.is_jmp = DISAS_NORETURN;
+switch (proc_hflags & LOONGARCH_HFLAG_BMASK) {
+case LOONGARCH_HFLAG_B:
+/* unconditional branch */
+gen_goto_tb(ctx, 0, ctx->btarget);
+break;
+case LOONGARCH_HFLAG_BC:
+/* Conditional branch */
+{
+TCGLabel *l1 = gen_new_label();
+
+tcg_gen_brcondi_tl(TCG_COND_NE, bcond, 0, l1);
+gen_goto_tb(ctx, 1, ctx->base.pc_next + insn_bytes);
+gen_set_label(l1);
+gen_goto_tb(ctx, 0, ctx->btarget);
+}
+break;
+case LOONGARCH_HFLAG_BR:
+/* unconditional branch to register */
+tcg_gen_mov_tl(cpu_PC, btarget);
+tcg_gen_lookup_and_goto_ptr();
+break;
+default:
+fprintf(stderr, "unknown branch 0x%x\n", proc_hflags);
+abort();
+}
+}
+}


Split this up into the various trans_* branch routines, without the setting of 
HFLAG.


+static void loongarch_tr_init_disas_context(DisasContextBase *dcbase,
+CPUState *cs)
+{
+DisasContext *ctx = container_of(dcbase, DisasContext, base);
+CPULoongArchState *env = cs->env_ptr;
+
+ctx->page_start = ctx->base.pc_first & TARGET_PAGE_MASK;
+ctx->saved_pc = -1;
+ctx->btarget = 0;
+/* Restore state from the tb context.  */
+ctx->hflags = (uint32_t)ctx->base.tb->flags;
+restore_cpu_state(env, ctx);
+

Re: [PATCH for-6.1 0/1] machine: Disallow specifying topology parameters as zero

2021-07-22 Thread wangyanan (Y)

On 2021/7/22 14:02, Cornelia Huck wrote:

On Thu, Jul 22 2021, Yanan Wang  wrote:


In the SMP configuration, we should either specify a topology
parameter with a reasonable value (equal to or greater than 1)
or just leave it omitted and QEMU will calculate its value.
Configurations which explicitly specify the topology parameters
as zero like "sockets=0" are meaningless, so disallow them.

However; the commit 1e63fe685804d
(machine: pass QAPI struct to mc->smp_parse) has documented that
'0' has the same semantics as omitting a parameter in the qapi
comment for SMPConfiguration. So this patch fixes the doc and
also adds the corresponding sanity check in the smp parsers.

Are we expecting any real users to have used that 'parameter=0'
behaviour? I expect that libvirt and other management software already
did the right thing; unfortunately, sometimes weird configuration lines
tend to persist in search results.

I think there may not any users who have already used a configuration
with explicit "parameter=0", instead it should have been just omitted.
Yes, libvirt now rejects "parameter=0" when parsing XML, but we now
still allows "parameter=0" in the direct QEMU cmdlines. If we hope to
disallow this kind of config thoroughly, we'd better also have a sanity
check in QEMU.

Thanks,
Yanan

This patch originly comes form [1], and it was suggested that
this patch fixing the doc should be sent for 6.1 to avoid a
deprecation process in the future.

[1] https://lore.kernel.org/qemu-devel/ypwsthpiza3mf...@redhat.com/

Yanan Wang (1):
   machine: Disallow specifying topology parameters as zero

  hw/core/machine.c | 30 ++
  hw/i386/pc.c  | 33 -
  qapi/machine.json |  6 +++---
  3 files changed; 49 insertions(+); 20 deletions(-)

.





[PATCH 0/2] acpi: pcihp: fix hotplug when bridge is wired to function > 0

2021-07-22 Thread Igor Mammedov
For full description see 2/2.
Tested hotplug on Q35 (see 2/2 for reproducer) and PC (with pci-bridge) machines

Igor Mammedov (2):
  acpi: x86: pcihp: cleanup devfn usage in
build_append_pci_bus_devices()
  acpi: x86: pcihp: add support hotplug on multifunction bridges

 hw/i386/acpi-build.c | 42 +-
 1 file changed, 29 insertions(+), 13 deletions(-)

-- 
2.27.0




[PATCH 1/2] acpi: x86: pcihp: cleanup devfn usage in build_append_pci_bus_devices()

2021-07-22 Thread Igor Mammedov
Signed-off-by: Igor Mammedov 
---
 hw/i386/acpi-build.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 17836149fe..b40e284b72 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -374,7 +374,7 @@ static void build_append_pci_bus_devices(Aml *parent_scope, 
PCIBus *bus,
 Aml *dev, *notify_method = NULL, *method;
 QObject *bsel;
 PCIBus *sec;
-int i;
+int devfn;
 
 bsel = object_property_get_qobject(OBJECT(bus), ACPI_PCIHP_PROP_BSEL, 
NULL);
 if (bsel) {
@@ -384,11 +384,11 @@ static void build_append_pci_bus_devices(Aml 
*parent_scope, PCIBus *bus,
 notify_method = aml_method("DVNT", 2, AML_NOTSERIALIZED);
 }
 
-for (i = 0; i < ARRAY_SIZE(bus->devices); i += PCI_FUNC_MAX) {
+for (devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
 DeviceClass *dc;
 PCIDeviceClass *pc;
-PCIDevice *pdev = bus->devices[i];
-int slot = PCI_SLOT(i);
+PCIDevice *pdev = bus->devices[devfn];
+int slot = PCI_SLOT(devfn);
 bool hotplug_enabled_dev;
 bool bridge_in_acpi;
 bool cold_plugged_bridge;
@@ -525,13 +525,12 @@ static void build_append_pci_bus_devices(Aml 
*parent_scope, PCIBus *bus,
 /* Notify about child bus events in any case */
 if (pcihp_bridge_en) {
 QLIST_FOREACH(sec, >child, sibling) {
-int32_t devfn = sec->parent_dev->devfn;
-
 if (pci_bus_is_root(sec)) {
 continue;
 }
 
-aml_append(method, aml_name("^S%.02X.PCNT", devfn));
+aml_append(method, aml_name("^S%.02X.PCNT",
+sec->parent_dev->devfn));
 }
 }
 
-- 
2.27.0




Re: [PATCH for-6.1?] iotest: Further enhance qemu-img-bitmaps

2021-07-22 Thread Vladimir Sementsov-Ogievskiy

21.07.2021 23:46, Eric Blake wrote:

Add a regression test to make sure we detect attempts to use 'qemu-img
bitmap' to modify an in-use local file.

Suggested-by: Nir Soffer 
Signed-off-by: Eric Blake 
---

Sadly, this missed my bitmaps pull request today.  If there's any
reason to respin that pull request, I'm inclined to add this in, as it
just touches the iotests; otherwise, if it slips to 6.2 it's not too
bad.

  tests/qemu-iotests/tests/qemu-img-bitmaps | 6 ++
  tests/qemu-iotests/tests/qemu-img-bitmaps.out | 5 +
  2 files changed, 11 insertions(+)

diff --git a/tests/qemu-iotests/tests/qemu-img-bitmaps 
b/tests/qemu-iotests/tests/qemu-img-bitmaps
index 7a3fe8c3d37a..3b6fade11735 100755
--- a/tests/qemu-iotests/tests/qemu-img-bitmaps
+++ b/tests/qemu-iotests/tests/qemu-img-bitmaps
@@ -129,6 +129,12 @@ $QEMU_IMG map --output=json --image-opts \
  $QEMU_IMG map --output=json --image-opts \
  "$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b3" | _filter_qemu_img_map

+echo
+echo "=== bitmap command fails to modify image already in use ==="
+echo
+
+$QEMU_IMG bitmap --add "$TEST_IMG" b4 2>&1 | _filter_testdir
+
  nbd_server_stop

  echo
diff --git a/tests/qemu-iotests/tests/qemu-img-bitmaps.out 
b/tests/qemu-iotests/tests/qemu-img-bitmaps.out
index e851f0320ecb..c6e12dd700aa 100644
--- a/tests/qemu-iotests/tests/qemu-img-bitmaps.out
+++ b/tests/qemu-iotests/tests/qemu-img-bitmaps.out
@@ -116,6 +116,11 @@ Format specific information:
  { "start": 2097152, "length": 1048576, "depth": 0, "present": false, "zero": false, 
"data": false},
  { "start": 3145728, "length": 7340032, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": OFFSET}]

+=== bitmap command fails to modify image already in use ===
+
+qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get "write" lock
+Is another process using the image [TEST_DIR/t.qcow2]?
+
  === Check handling of inconsistent bitmap ===

  image: TEST_DIR/t.IMGFMT




I'm not against, but why you decided to add such a test? Lock should work for 
any operation, what's the reason to check bitmaps separately? Or it was broken 
recently?

anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



[PATCH for-6.1? 0/6] mirror: Handle errors after READY cancel

2021-07-22 Thread Max Reitz
Hi,

This is a rather complex series with changes that aren’t exactly local
to the mirror job, so maybe it’s too complex for 6.1.

However, it is a bug fix, and not an insignificant one, though probably
not a regression of any kind.

Bug report:
https://gitlab.com/qemu-project/qemu/-/issues/462

(I didn’t put any “Fixes:” or “Resolves:” into the commit messages,
because there is no single patch here that fixes the bug.)

The root of the problem is that if you cancel a mirror job during its
READY phase, any kind of I/O error (with the error action 'stop') is
likely to not be handled gracefully, which means that perhaps the job
will just loop forever without pausing, doing nothing but emitting
errors.  There is no way to stop the job, or cancel it with force=true,
and so you also cannot quit qemu normally, because, well, cancelling the
job doesn’t do anything.  So you have to kill qemu to stop the mess.

If you’re lucky, the error is transient.  Then qemu will just kill
itself with a failed assertion, because it’ll try a READY -> READY
transition, which isn’t allowed.

There are a couple of problems contributing to it all:

(1) The READY -> READY transition comes from the fact that we will enter
the READY state whenever source and target are synced, and whenever
s->synced is false.  I/O errors reset s->synced.  I believe they
shouldn’t.
(Patch 1)

(2) Quitting qemu doesn’t force-cancel jobs.  I don’t understand why.
If for all jobs but mirror we want them to be cancelled and not
properly completed, why do we want mirror to get a consistent
result?  (Which is what cancel with force=false gives you.)
I believe we actually don’t care, and so on many occasions where we
invoke job_cancel_sync() and job_cancel_sync_all(), we want to
force-cancel the job(s) in question.
(Patch 2)

(3) Cancelling mirror post-READY with force=false is actually not really
cancelling the job.  It’s a different completion mode.  The job
should run like any normal job, it shouldn’t be special-cased.
However, we have a couple of places that special-case cancelled job
because they believe that such jobs are on their way to definite
termination.  For example, we don’t allow pausing cancelled jobs.
We definitely do want to allow pausing a mirror post-READY job that
is being non-force-cancelled.  The job may still take an arbitrary
amount of time, so it absolutely should be pausable.
(Patches 3, 4)

(4) Mirror only checks whether it’s been force-cancelled at the bottom
of its main loop, after several `continue`s.  Therefore, if flushing
fails (and it then `continue`s), that check will be skipped.  If
flushing fails continuously, the job cannot be force-cancelled.
(Patch 5)


Max Reitz (6):
  mirror: Keep s->synced on error
  job: @force parameter for job_cancel_sync{,_all}()
  jobs: Give Job.force_cancel more meaning
  job: Add job_cancel_requested()
  mirror: Check job_is_cancelled() earlier
  iotests: Add mirror-ready-cancel-error test

 include/qemu/job.h|  29 +++-
 block/backup.c|   3 +-
 block/mirror.c|  35 +++--
 block/replication.c   |   4 +-
 blockdev.c|   4 +-
 job.c |  46 --
 qemu-nbd.c|   2 +-
 softmmu/runstate.c|   2 +-
 storage-daemon/qemu-storage-daemon.c  |   2 +-
 tests/unit/test-block-iothread.c  |   2 +-
 tests/unit/test-blockjob.c|   2 +-
 tests/qemu-iotests/109.out|  60 +++-
 .../tests/mirror-ready-cancel-error   | 143 ++
 .../tests/mirror-ready-cancel-error.out   |   5 +
 tests/qemu-iotests/tests/qsd-jobs.out |   2 +-
 15 files changed, 262 insertions(+), 79 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/mirror-ready-cancel-error
 create mode 100644 tests/qemu-iotests/tests/mirror-ready-cancel-error.out

-- 
2.31.1




[PATCH for-6.1? 1/6] mirror: Keep s->synced on error

2021-07-22 Thread Max Reitz
An error does not take us out of the READY phase, which is what
s->synced signifies.  It does of course mean that source and target are
no longer in sync, but that is what s->actively_sync is for -- s->synced
never meant that source and target are in sync, only that they were at
some point (and at that point we transitioned into the READY phase).

The tangible problem is that we transition to READY once we are in sync
and s->synced is false.  By resetting s->synced here, we will transition
from READY to READY once the error is resolved (if the job keeps
running), and that transition is not allowed.

Signed-off-by: Max Reitz 
---
 block/mirror.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 98fc66eabf..d73b704473 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -121,7 +121,6 @@ typedef enum MirrorMethod {
 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
 int error)
 {
-s->synced = false;
 s->actively_synced = false;
 if (read) {
 return block_job_error_action(>common, s->on_source_error,
-- 
2.31.1




[PATCH v2 2/6] util/oslib-posix: Introduce and use MemsetContext for touch_all_pages()

2021-07-22 Thread David Hildenbrand
Let's minimize the number of global variables to prepare for
os_mem_prealloc() getting called concurrently and make the code a bit
easier to read.

The only consumer that really needs a global variable is the sigbus
handler, which will require protection via a mutex in the future either way
as we cannot concurrently mess with the SIGBUS handler.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: David Hildenbrand 
---
 util/oslib-posix.c | 81 --
 1 file changed, 50 insertions(+), 31 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 1cb80bf94c..2ae6c3aaab 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -73,22 +73,30 @@
 
 #define MAX_MEM_PREALLOC_THREAD_COUNT 16
 
+struct MemsetThread;
+
+typedef struct MemsetContext {
+bool all_threads_created;
+bool any_thread_failed;
+struct MemsetThread *threads;
+int num_threads;
+} MemsetContext;
+
 struct MemsetThread {
 char *addr;
 size_t numpages;
 size_t hpagesize;
 QemuThread pgthread;
 sigjmp_buf env;
+MemsetContext *context;
 };
 typedef struct MemsetThread MemsetThread;
 
-static MemsetThread *memset_thread;
-static int memset_num_threads;
-static bool memset_thread_failed;
+/* used by sigbus_handler() */
+static MemsetContext *sigbus_memset_context;
 
 static QemuMutex page_mutex;
 static QemuCond page_cond;
-static bool threads_created_flag;
 
 int qemu_get_thread_id(void)
 {
@@ -439,10 +447,13 @@ const char *qemu_get_exec_dir(void)
 static void sigbus_handler(int signal)
 {
 int i;
-if (memset_thread) {
-for (i = 0; i < memset_num_threads; i++) {
-if (qemu_thread_is_self(_thread[i].pgthread)) {
-siglongjmp(memset_thread[i].env, 1);
+
+if (sigbus_memset_context) {
+for (i = 0; i < sigbus_memset_context->num_threads; i++) {
+MemsetThread *thread = _memset_context->threads[i];
+
+if (qemu_thread_is_self(>pgthread)) {
+siglongjmp(thread->env, 1);
 }
 }
 }
@@ -459,7 +470,7 @@ static void *do_touch_pages(void *arg)
  * clearing until all threads have been created.
  */
 qemu_mutex_lock(_mutex);
-while(!threads_created_flag){
+while (!memset_args->context->all_threads_created) {
 qemu_cond_wait(_cond, _mutex);
 }
 qemu_mutex_unlock(_mutex);
@@ -470,7 +481,7 @@ static void *do_touch_pages(void *arg)
 pthread_sigmask(SIG_UNBLOCK, , );
 
 if (sigsetjmp(memset_args->env, 1)) {
-memset_thread_failed = true;
+memset_args->context->any_thread_failed = true;
 } else {
 char *addr = memset_args->addr;
 size_t numpages = memset_args->numpages;
@@ -506,14 +517,14 @@ static void *do_madv_populate_write_pages(void *arg)
 
 /* See do_touch_pages(). */
 qemu_mutex_lock(_mutex);
-while (!threads_created_flag) {
+while (!memset_args->context->all_threads_created) {
 qemu_cond_wait(_cond, _mutex);
 }
 qemu_mutex_unlock(_mutex);
 
 ret = qemu_madvise(addr, size, QEMU_MADV_POPULATE_WRITE);
 if (ret) {
-memset_thread_failed = true;
+memset_args->context->any_thread_failed = true;
 }
 return NULL;
 }
@@ -534,6 +545,9 @@ static bool touch_all_pages(char *area, size_t hpagesize, 
size_t numpages,
 int smp_cpus, bool use_madv_populate_write)
 {
 static gsize initialized = 0;
+MemsetContext context = {
+.num_threads = get_memset_num_threads(smp_cpus),
+};
 size_t numpages_per_thread, leftover;
 void *(*touch_fn)(void *);
 char *addr = area;
@@ -551,34 +565,39 @@ static bool touch_all_pages(char *area, size_t hpagesize, 
size_t numpages,
 touch_fn = do_touch_pages;
 }
 
-memset_thread_failed = false;
-threads_created_flag = false;
-memset_num_threads = get_memset_num_threads(smp_cpus);
-memset_thread = g_new0(MemsetThread, memset_num_threads);
-numpages_per_thread = numpages / memset_num_threads;
-leftover = numpages % memset_num_threads;
-for (i = 0; i < memset_num_threads; i++) {
-memset_thread[i].addr = addr;
-memset_thread[i].numpages = numpages_per_thread + (i < leftover);
-memset_thread[i].hpagesize = hpagesize;
-qemu_thread_create(_thread[i].pgthread, "touch_pages",
-   touch_fn, _thread[i],
+context.threads = g_new0(MemsetThread, context.num_threads);
+numpages_per_thread = numpages / context.num_threads;
+leftover = numpages % context.num_threads;
+for (i = 0; i < context.num_threads; i++) {
+context.threads[i].addr = addr;
+context.threads[i].numpages = numpages_per_thread + (i < leftover);
+context.threads[i].hpagesize = hpagesize;
+context.threads[i].context = 
+qemu_thread_create([i].pgthread, "touch_pages",
+   touch_fn, [i],

Re: [PULL 0/3] block bitmaps patches for rc1, 2021-07-21

2021-07-22 Thread Peter Maydell
On Wed, 21 Jul 2021 at 20:49, Eric Blake  wrote:
>
> The following changes since commit e77c8b8b8e933414ef07dbed04e02973fccffeb0:
>
>   Update version for v6.1.0-rc0 release (2021-07-21 17:10:15 +0100)
>
> are available in the Git repository at:
>
>   https://repo.or.cz/qemu/ericb.git tags/pull-bitmaps-2021-07-21
>
> for you to fetch changes up to 955171e4417bf39edb5503e694501e082a757731:
>
>   qemu-img: Add --skip-broken-bitmaps for 'convert --bitmaps' (2021-07-21 
> 14:14:41 -0500)
>
> 
> block bitmaps patches for 2021-07-21
>
> - fix 'qemu-img convert --bitmaps' handling of qcow2 files with
>   inconsistent bitmaps
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM



Re: [PATCH for-6.1 0/1] machine: Disallow specifying topology parameters as zero

2021-07-22 Thread Cornelia Huck
On Thu, Jul 22 2021, Yanan Wang  wrote:

> In the SMP configuration, we should either specify a topology
> parameter with a reasonable value (equal to or greater than 1)
> or just leave it omitted and QEMU will calculate its value.
> Configurations which explicitly specify the topology parameters
> as zero like "sockets=0" are meaningless, so disallow them.
>
> However; the commit 1e63fe685804d
> (machine: pass QAPI struct to mc->smp_parse) has documented that
> '0' has the same semantics as omitting a parameter in the qapi
> comment for SMPConfiguration. So this patch fixes the doc and
> also adds the corresponding sanity check in the smp parsers.

Are we expecting any real users to have used that 'parameter=0'
behaviour? I expect that libvirt and other management software already
did the right thing; unfortunately, sometimes weird configuration lines
tend to persist in search results.

>
> This patch originly comes form [1], and it was suggested that
> this patch fixing the doc should be sent for 6.1 to avoid a
> deprecation process in the future.
>
> [1] https://lore.kernel.org/qemu-devel/ypwsthpiza3mf...@redhat.com/
>
> Yanan Wang (1):
>   machine: Disallow specifying topology parameters as zero
>
>  hw/core/machine.c | 30 ++
>  hw/i386/pc.c  | 33 -
>  qapi/machine.json |  6 +++---
>  3 files changed; 49 insertions(+); 20 deletions(-)




[PATCH v3 05/13] plugins/lockstep: make socket path not positional & parse bool arg

2021-07-22 Thread Mahmoud Mandour
Signed-off-by: Mahmoud Mandour 
---
 contrib/plugins/lockstep.c | 31 ++-
 docs/devel/tcg-plugins.rst |  2 +-
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/contrib/plugins/lockstep.c b/contrib/plugins/lockstep.c
index 7fd35eb669..a41ffe83fa 100644
--- a/contrib/plugins/lockstep.c
+++ b/contrib/plugins/lockstep.c
@@ -319,22 +319,35 @@ QEMU_PLUGIN_EXPORT int 
qemu_plugin_install(qemu_plugin_id_t id,
int argc, char **argv)
 {
 int i;
-
-if (!argc || !argv[0]) {
-qemu_plugin_outs("Need a socket path to talk to other instance.");
-return -1;
-}
+g_autofree char *sock_path = NULL;
 
 for (i = 0; i < argc; i++) {
 char *p = argv[i];
-if (strcmp(p, "verbose") == 0) {
-verbose = true;
-} else if (!setup_unix_socket(argv[0])) {
-qemu_plugin_outs("Failed to setup socket for communications.");
+g_autofree char **tokens = g_strsplit(p, "=", 2);
+
+if (g_strcmp0(tokens[0], "verbose") == 0) {
+if (!qemu_plugin_bool_parse(tokens[0], tokens[1], )) {
+fprintf(stderr, "boolean argument parsing failed: %s\n", p);
+return -1;
+}
+} else if (g_strcmp0(tokens[0], "sockpath") == 0) {
+sock_path = tokens[1];
+} else {
+fprintf(stderr, "option parsing failed: %s\n", p);
 return -1;
 }
 }
 
+if (sock_path == NULL) {
+fprintf(stderr, "Need a socket path to talk to other instance.\n");
+return -1;
+}
+
+if (!setup_unix_socket(sock_path)) {
+fprintf(stderr, "Failed to setup socket for communications.\n");
+return -1;
+}
+
 our_id = id;
 
 qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
index d09c349234..1cb8dee240 100644
--- a/docs/devel/tcg-plugins.rst
+++ b/docs/devel/tcg-plugins.rst
@@ -285,7 +285,7 @@ communicate over::
 
   ./sparc-softmmu/qemu-system-sparc -monitor none -parallel none \
 -net none -M SS-20 -m 256 -kernel day11/zImage.elf \
--plugin ./contrib/plugins/liblockstep.so,arg=lockstep-sparc.sock \
+-plugin ./contrib/plugins/liblockstep.so,sockpath=lockstep-sparc.sock \
   -d plugin,nochain
 
 which will eventually report::
-- 
2.25.1




Re: [PATCH for-6.2 v2 02/11] machine: Make smp_parse generic enough for all arches

2021-07-22 Thread wangyanan (Y)

On 2021/7/20 0:53, Daniel P. Berrangé wrote:

On Mon, Jul 19, 2021 at 11:20:34AM +0800, Yanan Wang wrote:

Currently the only difference between smp_parse and pc_smp_parse
is the support of multi-dies and the related error reporting code.
With an arch compat variable "bool smp_dies_supported", we can
easily make smp_parse generic enough for all arches and the PC
specific one can be removed.

Making smp_parse() generic enough can reduce code duplication and
ease the code maintenance, and also allows extending the topology
with more arch specific members (e.g., clusters) in the future.

No functional change intended.

Suggested-by: Andrew Jones 
Signed-off-by: Yanan Wang 
---
  hw/core/machine.c   | 28 ++---
  hw/i386/pc.c| 76 +
  include/hw/boards.h |  1 +
  3 files changed, 19 insertions(+), 86 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index d73daa10f4..ed6712e964 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -743,6 +743,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
  
  static void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)

  {
+MachineClass *mc = MACHINE_GET_CLASS(ms);
  unsigned cpus= config->has_cpus ? config->cpus : 0;
  unsigned sockets = config->has_sockets ? config->sockets : 0;
  unsigned dies= config->has_dies ? config->dies : 1;
@@ -761,7 +762,7 @@ static void smp_parse(MachineState *ms, SMPConfiguration 
*config, Error **errp)
  return;
  }
  
-if (dies > 1) {

+if (!mc->smp_dies_supported && dies > 1) {
  error_setg(errp, "dies not supported by this machine's CPU topology");
  return;
  }
@@ -772,23 +773,25 @@ static void smp_parse(MachineState *ms, SMPConfiguration 
*config, Error **errp)
  threads = threads > 0 ? threads : 1;
  if (cpus == 0) {
  sockets = sockets > 0 ? sockets : 1;
-cpus = cores * threads * sockets;
+cpus = sockets * dies * cores * threads;
  } else {
  maxcpus = maxcpus > 0 ? maxcpus : cpus;
-sockets = maxcpus / (cores * threads);
+sockets = maxcpus / (dies * cores * threads);
  }
  } else if (cores == 0) {
  threads = threads > 0 ? threads : 1;
-cores = cpus / (sockets * threads);
+cores = cpus / (sockets * dies * threads);
  cores = cores > 0 ? cores : 1;
  } else if (threads == 0) {
-threads = cpus / (cores * sockets);
+threads = cpus / (sockets * dies * cores);
  threads = threads > 0 ? threads : 1;
-} else if (sockets * cores * threads < cpus) {
+} else if (sockets * dies * cores * threads < cpus) {
+g_autofree char *dies_msg = g_strdup_printf(
+mc->smp_dies_supported ? " * dies (%u)" : "", dies);
  error_setg(errp, "cpu topology: "
-   "sockets (%u) * cores (%u) * threads (%u) < "
+   "sockets (%u)%s * cores (%u) * threads (%u) < "
 "smp_cpus (%u)",
-   sockets, cores, threads, cpus);
+   sockets, dies_msg, cores, threads, cpus);

Since we're allowing dies=1 (but not greater), I'm not convinced we
need the conditionally built error message, and could just include
"* dies" all the time.

If we do want it to be conditionally different though, I'd just
sugest calling error_setg twice. Although this duplicates stuff,
it'll be clearer to read which I think is a net win.


The duplicates may increase quickly if more arch specific members
are introduced, I think the conditional error reporting may still be
necessary, but should be more clearer than current approach. :)

Thanks,
Yanan
.




Re: [PATCH for-6.2 v2 00/11] machine: smp parsing fixes and improvement

2021-07-22 Thread Pierre Morel




On 7/21/21 2:38 PM, wangyanan (Y) wrote:

On 2021/7/20 0:57, Cornelia Huck wrote:

On Mon, Jul 19 2021, Yanan Wang  wrote:


Hi,

This is v2 of the series [1] that I have posted to introduce some smp 
parsing
fixes and improvement, much more work has been processed compared to 
RFC v1.


[1] https://lists.gnu.org/archive/html/qemu-devel/2021-07/msg00259.html

The purpose of this series is to improve/fix the parsing logic. 
Explicitly

specifying a CPU topology parameter as zero is not allowed any more, and
maxcpus is now uniformly used to calculate the omitted parameters. 
It's also
suggested that we should start to prefer cores over sockets over 
threads on
the newer machine types, which will make the computed virtual 
topology more

reflective of the real hardware.

In order to reduce code duplication and ease the code maintenance, 
smp_parse
in now converted into a parser generic enough for all arches, so that 
the PC
specific one can be removed. It's also convenient to introduce more 
topology

members (e.g. cluster) to the generic parser in the future.

Cc:ing Pierre, as he also had been looking at the smp parsing code (for
s390x) recently.

Also, please keep me on cc: for patches that touch s390x.

Sure, I will. Sorry about the missing. :)

Thanks,
Yanan
.

And me too please.
Thanks
Pierre

--
Pierre Morel
IBM Lab Boeblingen



Re: [PATCH 08/16] whpx nvmm: Drop useless migrate_del_blocker()

2021-07-22 Thread Reinoud Zandijk
On Tue, Jul 20, 2021 at 02:54:00PM +0200, Markus Armbruster wrote:
> There is nothing to delete after migrate_add_blocker() failed.  Trying
> anyway is safe, but useless.  Don't.
> 
> Cc: Sunil Muthuswamy 
> Cc: Kamil Rytarowski 
> Cc: Reinoud Zandijk 
> Signed-off-by: Markus Armbruster 
> ---
>  target/i386/nvmm/nvmm-all.c | 1 -
>  target/i386/whpx/whpx-all.c | 1 -
>  2 files changed, 2 deletions(-)
> 
> diff --git a/target/i386/nvmm/nvmm-all.c b/target/i386/nvmm/nvmm-all.c
> index dfa690d65d..7bb0d9e30e 100644
> --- a/target/i386/nvmm/nvmm-all.c
> +++ b/target/i386/nvmm/nvmm-all.c
> @@ -929,7 +929,6 @@ nvmm_init_vcpu(CPUState *cpu)
>  (void)migrate_add_blocker(nvmm_migration_blocker, _error);
>  if (local_error) {
>  error_report_err(local_error);
> -migrate_del_blocker(nvmm_migration_blocker);
>  error_free(nvmm_migration_blocker);
>  return -EINVAL;
>  }

Reviewed-by: Reinoud Zandijk 

No problem with it




Re: [PATCH v3 2/3] hw/net: e1000e: Correct the initial value of VET register

2021-07-22 Thread Jason Wang



在 2021/7/21 下午12:15, Bin Meng 写道:

From: Christina Wang 

The initial value of VLAN Ether Type (VET) register is 0x8100, as per
the manual and real hardware.

While Linux e1000e driver always writes VET register to 0x8100, it is
not always the case for everyone. Drivers relying on the reset value
of VET won't be able to transmit and receive VLAN frames in QEMU.

Unlike e1000 in QEMU, e1000e uses a field 'vet' in "struct E1000Core"
to cache the value of VET register, but the cache only gets updated
when VET register is written. To always get a consistent VET value
no matter VET is written or remains its reset value, drop the 'vet'
field and use 'core->mac[VET]' directly.

Reported-by: Markus Carlstedt 
Signed-off-by: Christina Wang 
Signed-off-by: Bin Meng 

---

Changes in v3:
- add a "init-vet" property for versioned machines

Changes in v2:
- keep the 'vet' field in "struct E1000Core" for migration compatibility

  hw/core/machine.c| 21 +
  hw/net/e1000e.c  |  8 +++-
  hw/net/e1000e_core.c |  9 -
  3 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 29982c1ef1..8355df69c7 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -42,6 +42,7 @@ GlobalProperty hw_compat_6_0[] = {
  { "i8042", "extended-state", "false"},
  { "nvme-ns", "eui64-default", "off"},
  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };



It looks to me doing compat in hw_compat_6_0 is sufficient.

The codes will do it automatically for the versions before 6.0.

E.g virt_machine_5_2_options() will call virt_machine_6_0_options() etc.

Other looks good.

Thanks



  const size_t hw_compat_6_0_len = G_N_ELEMENTS(hw_compat_6_0);
  
@@ -51,6 +52,7 @@ GlobalProperty hw_compat_5_2[] = {

  { "virtio-blk-device", "report-discard-granularity", "off" },
  { "virtio-net-pci", "vectors", "3"},
  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };
  const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2);
  
@@ -65,6 +67,7 @@ GlobalProperty hw_compat_5_1[] = {

  { "pl011", "migrate-clk", "off" },
  { "virtio-pci", "x-ats-page-aligned", "off"},
  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };
  const size_t hw_compat_5_1_len = G_N_ELEMENTS(hw_compat_5_1);
  
@@ -77,6 +80,7 @@ GlobalProperty hw_compat_5_0[] = {

  { "vmport", "x-cmds-v2", "off" },
  { "virtio-device", "x-disable-legacy-check", "true" },
  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };
  const size_t hw_compat_5_0_len = G_N_ELEMENTS(hw_compat_5_0);
  
@@ -94,12 +98,14 @@ GlobalProperty hw_compat_4_2[] = {

  { "fw_cfg", "acpi-mr-restore", "false" },
  { "virtio-device", "use-disabled-flag", "false" },
  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };
  const size_t hw_compat_4_2_len = G_N_ELEMENTS(hw_compat_4_2);
  
  GlobalProperty hw_compat_4_1[] = {

  { "virtio-pci", "x-pcie-flr-init", "off" },
  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };
  const size_t hw_compat_4_1_len = G_N_ELEMENTS(hw_compat_4_1);
  
@@ -113,6 +119,7 @@ GlobalProperty hw_compat_4_0[] = {

  { "virtio-balloon-device", "qemu-4-0-config-size", "true" },
  { "pl031", "migrate-tick-offset", "false" },
  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };
  const size_t hw_compat_4_0_len = G_N_ELEMENTS(hw_compat_4_0);
  
@@ -131,11 +138,13 @@ GlobalProperty hw_compat_3_1[] = {

  { "virtio-balloon-device", "qemu-4-0-config-size", "false" },
  { "pcie-root-port-base", "disable-acs", "true" }, /* Added in 4.1 */
  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };
  const size_t hw_compat_3_1_len = G_N_ELEMENTS(hw_compat_3_1);
  
  GlobalProperty hw_compat_3_0[] = {

  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };
  const size_t hw_compat_3_0_len = G_N_ELEMENTS(hw_compat_3_0);
  
@@ -147,6 +156,7 @@ GlobalProperty hw_compat_2_12[] = {

  { "vmware-svga", "global-vmstate", "true" },
  { "qxl-vga", "global-vmstate", "true" },
  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };
  const size_t hw_compat_2_12_len = G_N_ELEMENTS(hw_compat_2_12);
  
@@ -156,6 +166,7 @@ GlobalProperty hw_compat_2_11[] = {

  { "vhost-user-blk-pci", "vectors", "2" },
  { "e1000", "migrate_tso_props", "off" },
  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };
  const size_t hw_compat_2_11_len = G_N_ELEMENTS(hw_compat_2_11);
  
@@ -163,6 +174,7 @@ GlobalProperty hw_compat_2_10[] = {

  { "virtio-mouse-device", "wheel-axis", "false" },
  { "virtio-tablet-device", "wheel-axis", "false" },
  { "e1000", "init-vet", "off" },
+{ "e1000e", "init-vet", "off" },
  };
  const size_t hw_compat_2_10_len = 

Re: [PATCH v3 1/3] hw/net: e1000: Correct the initial value of VET register

2021-07-22 Thread Jason Wang



在 2021/7/21 下午12:15, Bin Meng 写道:

From: Christina Wang 

The initial value of VLAN Ether Type (VET) register is 0x8100, as per
the manual and real hardware.

While Linux e1000 driver always writes VET register to 0x8100, it is
not always the case for everyone. Drivers relying on the reset value
of VET won't be able to transmit and receive VLAN frames in QEMU.

Reported-by: Markus Carlstedt 
Signed-off-by: Christina Wang 
Signed-off-by: Bin Meng 

---

Changes in v3:
- add a "init-vet" property for versioned machines

  hw/core/machine.c | 27 +--
  hw/net/e1000.c| 26 ++
  2 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 775add0795..29982c1ef1 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -41,6 +41,7 @@ GlobalProperty hw_compat_6_0[] = {
  { "gpex-pcihost", "allow-unmapped-accesses", "false" },
  { "i8042", "extended-state", "false"},
  { "nvme-ns", "eui64-default", "off"},
+{ "e1000", "init-vet", "off" },



Doing this for 6.0 is sufficient. See patch 2.



  };
  const size_t hw_compat_6_0_len = G_N_ELEMENTS(hw_compat_6_0);
  
@@ -49,6 +50,7 @@ GlobalProperty hw_compat_5_2[] = {

  { "PIIX4_PM", "smm-compat", "on"},
  { "virtio-blk-device", "report-discard-granularity", "off" },
  { "virtio-net-pci", "vectors", "3"},
+{ "e1000", "init-vet", "off" },
  };
  const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2);
  
@@ -62,6 +64,7 @@ GlobalProperty hw_compat_5_1[] = {

  { "pvpanic", "events", "1"}, /* PVPANIC_PANICKED */
  { "pl011", "migrate-clk", "off" },
  { "virtio-pci", "x-ats-page-aligned", "off"},
+{ "e1000", "init-vet", "off" },
  };
  const size_t hw_compat_5_1_len = G_N_ELEMENTS(hw_compat_5_1);
  
@@ -73,6 +76,7 @@ GlobalProperty hw_compat_5_0[] = {

  { "vmport", "x-report-vmx-type", "off" },
  { "vmport", "x-cmds-v2", "off" },
  { "virtio-device", "x-disable-legacy-check", "true" },
+{ "e1000", "init-vet", "off" },
  };
  const size_t hw_compat_5_0_len = G_N_ELEMENTS(hw_compat_5_0);
  
@@ -89,11 +93,13 @@ GlobalProperty hw_compat_4_2[] = {

  { "qxl-vga", "revision", "4" },
  { "fw_cfg", "acpi-mr-restore", "false" },
  { "virtio-device", "use-disabled-flag", "false" },
+{ "e1000", "init-vet", "off" },
  };
  const size_t hw_compat_4_2_len = G_N_ELEMENTS(hw_compat_4_2);
  
  GlobalProperty hw_compat_4_1[] = {

  { "virtio-pci", "x-pcie-flr-init", "off" },
+{ "e1000", "init-vet", "off" },
  };
  const size_t hw_compat_4_1_len = G_N_ELEMENTS(hw_compat_4_1);
  
@@ -106,6 +112,7 @@ GlobalProperty hw_compat_4_0[] = {

  { "virtio-device", "use-started", "false" },
  { "virtio-balloon-device", "qemu-4-0-config-size", "true" },
  { "pl031", "migrate-tick-offset", "false" },
+{ "e1000", "init-vet", "off" },
  };
  const size_t hw_compat_4_0_len = G_N_ELEMENTS(hw_compat_4_0);
  
@@ -123,10 +130,13 @@ GlobalProperty hw_compat_3_1[] = {

  { "virtio-blk-device", "write-zeroes", "false" },
  { "virtio-balloon-device", "qemu-4-0-config-size", "false" },
  { "pcie-root-port-base", "disable-acs", "true" }, /* Added in 4.1 */
+{ "e1000", "init-vet", "off" },
  };
  const size_t hw_compat_3_1_len = G_N_ELEMENTS(hw_compat_3_1);
  
-GlobalProperty hw_compat_3_0[] = {};

+GlobalProperty hw_compat_3_0[] = {
+{ "e1000", "init-vet", "off" },
+};
  const size_t hw_compat_3_0_len = G_N_ELEMENTS(hw_compat_3_0);
  
  GlobalProperty hw_compat_2_12[] = {

@@ -136,6 +146,7 @@ GlobalProperty hw_compat_2_12[] = {
  { "VGA", "global-vmstate", "true" },
  { "vmware-svga", "global-vmstate", "true" },
  { "qxl-vga", "global-vmstate", "true" },
+{ "e1000", "init-vet", "off" },
  };
  const size_t hw_compat_2_12_len = G_N_ELEMENTS(hw_compat_2_12);
  
@@ -144,12 +155,14 @@ GlobalProperty hw_compat_2_11[] = {

  { "virtio-blk-pci", "vectors", "2" },
  { "vhost-user-blk-pci", "vectors", "2" },
  { "e1000", "migrate_tso_props", "off" },
+{ "e1000", "init-vet", "off" },
  };
  const size_t hw_compat_2_11_len = G_N_ELEMENTS(hw_compat_2_11);
  
  GlobalProperty hw_compat_2_10[] = {

  { "virtio-mouse-device", "wheel-axis", "false" },
  { "virtio-tablet-device", "wheel-axis", "false" },
+{ "e1000", "init-vet", "off" },
  };
  const size_t hw_compat_2_10_len = G_N_ELEMENTS(hw_compat_2_10);
  
@@ -158,6 +171,7 @@ GlobalProperty hw_compat_2_9[] = {

  { "intel-iommu", "pt", "off" },
  { "virtio-net-device", "x-mtu-bypass-backend", "off" },
  { "pcie-root-port", "x-migrate-msix", "false" },
+{ "e1000", "init-vet", "off" },
  };
  const size_t hw_compat_2_9_len = G_N_ELEMENTS(hw_compat_2_9);
  
@@ -172,6 +186,7 @@ GlobalProperty hw_compat_2_8[] = {

  { "virtio-pci", "x-pcie-pm-init", "off" },
  { "cirrus-vga", "vgamem_mb", "8" },
  { "isa-cirrus-vga", "vgamem_mb", "8" },
+{ "e1000", "init-vet", "off" },
  };
  const 

Re: intermittent hang in qos-test for qemu-system-i386 on 32-bit arm host

2021-07-22 Thread Claudio Fontana
On 7/10/21 3:30 PM, Peter Maydell wrote:
> I've noticed recently that intermittently 'make check' will hang on
> my aarch32 test system (really an aarch64 box with an aarch32 chroot).
> 
> I think from grep that this must be the vhost-user-blk test.
> 
> Here's the process tree:
> 
> pmaydell 13126  0.0  0.0   8988  6416 ?SJul09   0:01 make
> -C build/all-a32 check V=1 GCC_COLORS= -j9
> pmaydell 19632  0.0  0.0   4432  2096 ?SJul09   0:00  \_
> bash -o pipefail -c echo 'MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((
> ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_IMG=./qemu-img
> G_TEST_DBUS_DAEMON=/home/peter.maydell/qemu/tests/dbus-vmstate-daemon.sh
> QTEST_QEMU_BINARY=./qemu-system-i386
> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> tests/qtest/qos-test --tap -k' &&
> MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> QTEST_QEMU_IMG=./qemu-img
> G_TEST_DBUS_DAEMON=/home/peter.maydell/qemu/tests/dbus-vmstate-daemon.sh
> QTEST_QEMU_BINARY=./qemu-system-i386
> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> tests/qtest/qos-test --tap -k -m quick < /dev/null |
> ./scripts/tap-driver.pl --test-name="qtest-i386/qos-test"
> pmaydell 19634  0.0  0.0  13608  3076 ?Sl   Jul09   0:02
> \_ tests/qtest/qos-test --tap -k -m quick
> pmaydell 20679  0.0  0.0 109076 16100 ?Sl   Jul09   0:00
> |   \_ ./storage-daemon/qemu-storage-daemon --blockdev
> driver=file,node-name=disk0,filename=qtest.X7RL2X --export
> type=vhost-user-blk,id=disk0,addr.type=unix,addr.path=/tmp/qtest-19634-sock.9LJoHn,node-name=disk0,writable=on,num-queues=1
> pmaydell 20681  0.0  0.2 447828 46544 ?Sl   Jul09   0:00
> |   \_ ./qemu-system-i386 -qtest unix:/tmp/qtest-19634.sock -qtest-log
> /dev/null -chardev socket,path=/tmp/qtest-19634.qmp,id=char0 -mon
> chardev=char0,mode=control -display none -M pc -device
> vhost-user-blk-pci,id=drv0,chardev=char1,addr=4.0 -object
> memory-backend-memfd,id=mem,size=256M,share=on -M memory-backend=mem
> -m 256M -chardev socket,id=char1,path=/tmp/qtest-19634-sock.9LJoHn
> -accel qtest
> pmaydell 19635  0.0  0.0  10256  7176 ?SJul09   0:00
> \_ perl ./scripts/tap-driver.pl --test-name=qtest-i386/qos-test
> 
> 
> Backtrace from tests/qtest/qos-test (not as helpful as it could
> be since this is an optimized build):
> 
> (gdb) thread apply all bt
> 
> Thread 2 (Thread 0xf76ff240 (LWP 19636)):
> #0  syscall () at ../sysdeps/unix/sysv/linux/arm/syscall.S:37
> #1  0x005206de in qemu_futex_wait (val=, f= out>) at /home/peter.maydell/qemu/include/qemu/futex.h:29
> #2  qemu_event_wait (ev=ev@entry=0x5816fc ) at
> ../../util/qemu-thread-posix.c:480
> #3  0x005469c2 in call_rcu_thread (opaque=) at
> ../../util/rcu.c:258
> #4  0x0051fbc2 in qemu_thread_start (args=) at
> ../../util/qemu-thread-posix.c:541
> #5  0xf785a614 in start_thread (arg=0xf6ce711c) at pthread_create.c:463
> #6  0xf77f57ec in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73
> from /lib/arm-linux-gnueabihf/libc.so.6
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> Thread 1 (Thread 0xf7a04010 (LWP 19634)):
> #0  __libc_do_syscall () at 
> ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
> #1  0xf7861d8c in __libc_read (fd=12, buf=buf@entry=0xff9ce8e4,
> nbytes=nbytes@entry=1024) at ../sysdeps/unix/sysv/linux/read.c:27
> #2  0x004ebc5a in read (__nbytes=1024, __buf=0xff9ce8e4,
> __fd=) at
> /usr/include/arm-linux-gnueabihf/bits/unistd.h:44
> #3  qtest_client_socket_recv_line (s=0x1a46cb8) at
> ../../tests/qtest/libqtest.c:494
> #4  0x004ebd4e in qtest_rsp_args (s=s@entry=0x1a46cb8,
> expected_args=expected_args@entry=1) at
> ../../tests/qtest/libqtest.c:521
> #5  0x004ec1ee in qtest_query_target_endianness (s=0x1a46cb8) at
> ../../tests/qtest/libqtest.c:570
> #6  0x004ec94a in qtest_init_without_qmp_handshake
> (extra_args=) at ../../tests/qtest/libqtest.c:332
> #7  0x004ecd7a in qtest_init (extra_args=) at
> ../../tests/qtest/libqtest.c:339
> #8  0x004ded10 in qtest_start (
> args=0x1a63710 "-M pc  -device
> vhost-user-blk-pci,id=drv0,chardev=char1,addr=4.0 -object
> memory-backend-memfd,id=mem,size=256M,share=on  -M memory-backend=mem
> -m 256M -chardev socket,id=char1,path=/tmp/qtest-19634-so"...) at
> ../../tests/qtest/libqtest-single.h:29
> #9  restart_qemu_or_continue (
> path=0x1a63710 "-M pc  -device
> vhost-user-blk-pci,id=drv0,chardev=char1,addr=4.0 -object
> memory-backend-memfd,id=mem,size=256M,share=on  -M memory-backend=mem
> -m 256M -chardev socket,id=char1,path=/tmp/qtest-19634-so"...) at
> ../../tests/qtest/qos-test.c:105
> #10 run_one_test (arg=) at ../../tests/qtest/qos-test.c:178
> #11 0xf794ee74 in ?? () from /usr/lib/arm-linux-gnueabihf/libglib-2.0.so.0
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> 
> Backtrace from qemu-system-i386:
> 
> (gdb) thread apply all bt
> 
> Thread 4 (Thread 0xdfd0cb90 (LWP 20734)):
> #0  0xf6f85206 in 

[PATCH] hw/pcie-root-port: Fix hotplug for PCI devices requiring IO

2021-07-22 Thread Marcel Apfelbaum
Q35 has now ACPI hotplug enabled by default for PCI(e) devices.
As opposed to native PCIe hotplug, some guests will not assign
IO range to pcie-root-ports not supporting native hotplug,
resulting into a regression.

Fix it by setting the "reserve-io" hint capability of the
pcie-root-ports so the firmware will allocate the IO range instead.

Signed-off-by: Marcel Apfelbaum 
---
 hw/pci-bridge/gen_pcie_root_port.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/pci-bridge/gen_pcie_root_port.c 
b/hw/pci-bridge/gen_pcie_root_port.c
index ec9907917e..20099a8ae3 100644
--- a/hw/pci-bridge/gen_pcie_root_port.c
+++ b/hw/pci-bridge/gen_pcie_root_port.c
@@ -28,6 +28,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(GenPCIERootPort, 
GEN_PCIE_ROOT_PORT)
 (GEN_PCIE_ROOT_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
 
 #define GEN_PCIE_ROOT_PORT_MSIX_NR_VECTOR   1
+#define GEN_PCIE_ROOT_DEFAULT_IO_RANGE  4096
 
 struct GenPCIERootPort {
 /*< private >*/
@@ -75,6 +76,7 @@ static bool gen_rp_test_migrate_msix(void *opaque, int 
version_id)
 static void gen_rp_realize(DeviceState *dev, Error **errp)
 {
 PCIDevice *d = PCI_DEVICE(dev);
+PCIESlot *s = PCIE_SLOT(d);
 GenPCIERootPort *grp = GEN_PCIE_ROOT_PORT(d);
 PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
 Error *local_err = NULL;
@@ -85,6 +87,9 @@ static void gen_rp_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+if (grp->res_reserve.io == -1 && s->hotplug && !s->native_hotplug) {
+grp->res_reserve.io = GEN_PCIE_ROOT_DEFAULT_IO_RANGE;
+}
 int rc = pci_bridge_qemu_reserve_cap_init(d, 0,
   grp->res_reserve, errp);
 
-- 
2.31.1




Re: [PATCH v2 0/6] migration/ram: Optimize for virtio-mem via RamDiscardManager

2021-07-22 Thread David Hildenbrand

On 22.07.21 13:29, Dr. David Alan Gilbert wrote:

* David Hildenbrand (da...@redhat.com) wrote:

virtio-mem exposes a dynamic amount of memory within RAMBlocks by
coordinating with the VM. Memory within a RAMBlock can either get
plugged and consequently used by the VM, or unplugged and consequently no
longer used by the VM. Logical unplug is realized by discarding the
physical memory backing for virtual memory ranges, similar to memory
ballooning.

However, important difference to virtio-balloon are:

a) A virtio-mem device only operates on its assigned memory region /
RAMBlock ("device memory")
b) Initially, all device memory is logically unplugged
c) Virtual machines will never accidentally reuse memory that is currently
logically unplugged. The spec defines most accesses to unplugged memory
as "undefined behavior" -- except reading unplugged memory, which is
currently expected to work, but that will change in the future.
d) The (un)plug granularity is in the range of megabytes -- "memory blocks"
e) The state (plugged/unplugged) of a memory block is always known and
properly tracked.

Whenever memory blocks within the RAMBlock get (un)plugged, changes are
communicated via the RamDiscardManager to other QEMU subsystems, most
prominently vfio which updates the DMA mapping accordingly. "Unplugging"
corresponds to "discarding" and "plugging" corresponds to "populating".

While migrating (precopy/postcopy) that state of such memory blocks cannot
change.


So no plugging/unplugging can happen during the migration?


Exactly:

static bool virtio_mem_is_busy(void)
{
/*
 * Postcopy cannot handle concurrent discards and we don't want to migrate
 * pages on-demand with stale content when plugging new blocks.
 *
 * For precopy, we don't want unplugged blocks in our migration stream, and
 * when plugging new blocks, the page content might differ between source
 * and destination (observable by the guest when not initializing pages
 * after plugging them) until we're running on the destination (as we didn't
 * migrate these blocks when they were unplugged).
 */
return migration_in_incoming_postcopy() || !migration_is_idle();
}

[...]



Let's reuse the RamDiscardManager infrastructure to essentially handle
precopy, postcopy and background snapshots cleanly, which means:

a) In precopy code, always clearing all dirty bits from the bitmap that
correspond to discarded range, whenever we update the dirty bitmap. This
results in logically unplugged memory to never get migrated.


Have you seen cases where discarded areas are being marked as dirty?
That suggests something somewhere is writing to them and shouldn't be.


I have due to sub-optimal clear_bmap handling to be sorted out by

https://lkml.kernel.org/r/20210722083055.23352-1-wei.w.w...@intel.com

Whereby the issue is rather that initially dirty bits don't get cleared in
lower layers and keep popping up as dirty.

The issue with postcopy recovery code setting discarded ranges dirty in
the dirty bitmap, I did not try reproducing. But from looking at the
code, it's pretty clear that it would happen.

Apart from that, nothing should dirty that memory. Of course,
malicious guests could trigger it for now, in which case we wouldn't catch it
and migrate such pages with postcopy, because the final bitmap sync in
ram_postcopy_send_discard_bitmap() is performed without calling notifiers
right now.

--
Thanks,

David / dhildenb




Re: [PATCH] hw/display: fix virgl reset regression

2021-07-22 Thread Gerd Hoffmann
On Fri, Jul 02, 2021 at 04:32:21PM +0400, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Before commit 49afbca3b00e8e517d54964229a794b51768deaf ("virtio-gpu: drop
> use_virgl_renderer"), use_virgl_renderer was preventing calling GL
> functions from non-GL context threads. The innocuously looking
> 
>   g->parent_obj.use_virgl_renderer = false;
> 
> was set the first time virtio_gpu_gl_reset() was called, during
> pc_machine_reset() in the main thread. Further virtio_gpu_gl_reset()
> calls in IO threads, without associated GL context, were thus skipping
> GL calls and avoided warnings or crashes (see also
> https://gitlab.freedesktop.org/virgl/virglrenderer/-/issues/226).

Conflicts with patch by Akihiko Odaki fixing the same thing or a
related issue:

virtio-gpu: Call Virgl only in the main thread

https://patchwork.ozlabs.org/project/qemu-devel/patch/20210617113520.25973-1-akihiko.od...@gmail.com/

Can you have a look please and suggest how to handle this?

thanks,
  Gerd




[PATCH 0/3] i386/kvm: Paravirtualized features usage enforcement

2021-07-22 Thread Vitaly Kuznetsov
[I know this is probably too late for 6.1 but maybe the first patch of the
series is good as it just adds a missing doc?]

By default, KVM doesn't limit the usage of paravirtualized feature (neither
native KVM nor Hyper-V) to what was exposed to the guest in CPUIDs making
it possible to use all of them. KVM_CAP_HYPERV_ENFORCE_CPUID and
KVM_CAP_ENFORCE_PV_FEATURE_CPUID features were recently introduced making
it possible to limit available features to what was actually exposed. Add
support for these to QEMU.

While on it, document all currently supported KVM PV features in
docs/kvm-pv.txt.

Vitaly Kuznetsov (3):
  docs: Briefly describe KVM PV features
  i386: Support KVM_CAP_ENFORCE_PV_FEATURE_CPUID
  i386: Support KVM_CAP_HYPERV_ENFORCE_CPUID

 docs/hyperv.txt   |  17 +--
 docs/kvm-pv.txt   | 103 ++
 target/i386/cpu.c |   3 ++
 target/i386/cpu.h |   4 ++
 target/i386/kvm/kvm.c |  19 
 5 files changed, 143 insertions(+), 3 deletions(-)
 create mode 100644 docs/kvm-pv.txt

-- 
2.31.1




[PATCH 2/3] i386: Support KVM_CAP_ENFORCE_PV_FEATURE_CPUID

2021-07-22 Thread Vitaly Kuznetsov
By default, KVM allows the guest to use all currently supported PV features
even when they were not announced in guest visible CPUIDs. Introduce a new
"kvm-pv-enforce-cpuid" flag to limit the supported feature set to the
exposed features. The feature is supported by Linux >= 5.10 and is not
enabled by default in QEMU.

Signed-off-by: Vitaly Kuznetsov 
---
 docs/kvm-pv.txt   | 13 -
 target/i386/cpu.c |  2 ++
 target/i386/cpu.h |  3 +++
 target/i386/kvm/kvm.c | 10 ++
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/docs/kvm-pv.txt b/docs/kvm-pv.txt
index 84ad7fa60f8d..d1aac533feea 100644
--- a/docs/kvm-pv.txt
+++ b/docs/kvm-pv.txt
@@ -87,6 +87,17 @@ the number of supported vCPUs for a given configuration 
lower).
 Tells the guest that guest visible TSC value can be fully trusted for kvmclock
 computations and no warps are expected.
 
-4. Useful links
+4. Supplementary features
+=
+
+4.1. kvm-pv-enforce-cpuid
+=
+By default, KVM allows the guest to use all currently supported PV features 
even
+when they were not announced in guest visible CPUIDs. 'kvm-pv-enforce-cpuid'
+feature alters this behavior and limits the supported feature set to the
+exposed features only.
+
+
+5. Useful links
 
 Please refer to Documentation/virt/kvm in Linux for additional detail.
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 48b55ebd0a67..0a0d2cddc9d2 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6668,6 +6668,8 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("l3-cache", X86CPU, enable_l3_cache, true),
 DEFINE_PROP_BOOL("kvm-no-smi-migration", X86CPU, kvm_no_smi_migration,
  false),
+DEFINE_PROP_BOOL("kvm-pv-enforce-cpuid", X86CPU, kvm_pv_enforce_cpuid,
+ false),
 DEFINE_PROP_BOOL("vmware-cpuid-freq", X86CPU, vmware_cpuid_freq, true),
 DEFINE_PROP_BOOL("tcg-cpuid", X86CPU, expose_tcg, true),
 DEFINE_PROP_BOOL("x-migrate-smi-count", X86CPU, migrate_smi_count,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5d98a4e7c025..31f1f7caf116 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1768,6 +1768,9 @@ struct X86CPU {
 /* Stop SMI delivery for migration compatibility with old machines */
 bool kvm_no_smi_migration;
 
+/* Forcefully disable KVM PV features not exposed in guest CPUIDs */
+bool kvm_pv_enforce_cpuid;
+
 /* Number of physical address bits supported */
 uint32_t phys_bits;
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 59ed8327ac13..452b04f469b5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1617,6 +1617,16 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 cpu_x86_cpuid(env, 0, 0, , , , );
 
+if (cpu->kvm_pv_enforce_cpuid) {
+r = kvm_vcpu_enable_cap(cs, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, 0, 1);
+if (r < 0) {
+fprintf(stderr,
+"failed to enable KVM_CAP_ENFORCE_PV_FEATURE_CPUID: %s",
+strerror(-r));
+abort();
+}
+}
+
 for (i = 0; i <= limit; i++) {
 if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
 fprintf(stderr, "unsupported level value: 0x%x\n", limit);
-- 
2.31.1




[PATCH 3/3] i386: Support KVM_CAP_HYPERV_ENFORCE_CPUID

2021-07-22 Thread Vitaly Kuznetsov
By default, KVM allows the guest to use all currently supported Hyper-V
enlightenments when Hyper-V CPUID interface was exposed, regardless of if
some features were not announced in guest visible CPUIDs. hv-enforce-cpuid
feature alters this behavior and only allows the guest to use exposed
Hyper-V enlightenments. The feature is supported by Linux >= 5.14 and is
not enabled by default in QEMU.

Signed-off-by: Vitaly Kuznetsov 
---
 docs/hyperv.txt   | 17 ++---
 target/i386/cpu.c |  1 +
 target/i386/cpu.h |  1 +
 target/i386/kvm/kvm.c |  9 +
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/docs/hyperv.txt b/docs/hyperv.txt
index 000638a2fd38..072709a68f47 100644
--- a/docs/hyperv.txt
+++ b/docs/hyperv.txt
@@ -203,8 +203,11 @@ When the option is set to 'on' QEMU will always enable the 
feature, regardless
 of host setup. To keep guests secure, this can only be used in conjunction with
 exposing correct vCPU topology and vCPU pinning.
 
-4. Development features
-
+4. Supplementary features
+=
+
+4.1. hv-passthrough
+===
 In some cases (e.g. during development) it may make sense to use QEMU in
 'pass-through' mode and give Windows guests all enlightenments currently
 supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU
@@ -215,8 +218,16 @@ values from KVM to QEMU. "hv-passthrough" overrides all 
other "hv-*" settings on
 the command line. Also, enabling this flag effectively prevents migration as 
the
 list of enabled enlightenments may differ between target and destination hosts.
 
+4.2. hv-enforce-cpuid
+=
+By default, KVM allows the guest to use all currently supported Hyper-V
+enlightenments when Hyper-V CPUID interface was exposed, regardless of if
+some features were not announced in guest visible CPUIDs. 'hv-enforce-cpuid'
+feature alters this behavior and only allows the guest to use exposed Hyper-V
+enlightenments.
+
 
-4. Useful links
+5. Useful links
 
 Hyper-V Top Level Functional specification and other information:
 https://github.com/MicrosoftDocs/Virtualization-Documentation
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0a0d2cddc9d2..1d4c44c8b762 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6642,6 +6642,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU,
 hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF),
 DEFINE_PROP_BOOL("hv-passthrough", X86CPU, hyperv_passthrough, false),
+DEFINE_PROP_BOOL("hv-enforce-cpuid", X86CPU, hyperv_enforce_cpuid, false),
 
 DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
 DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 31f1f7caf116..9539f57199fa 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1685,6 +1685,7 @@ struct X86CPU {
 uint32_t hyperv_version_id[4];
 uint32_t hyperv_limits[3];
 uint32_t hyperv_nested[4];
+bool hyperv_enforce_cpuid;
 
 bool check_cpuid;
 bool enforce_cpuid;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 452b04f469b5..ccbea88080fc 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1519,6 +1519,15 @@ static int hyperv_init_vcpu(X86CPU *cpu)
 cpu->hyperv_nested[0] = evmcs_version;
 }
 
+if (cpu->hyperv_enforce_cpuid) {
+ret = kvm_vcpu_enable_cap(cs, KVM_CAP_HYPERV_ENFORCE_CPUID, 0, 1);
+if (ret < 0) {
+error_report("failed to enable KVM_CAP_HYPERV_ENFORCE_CPUID: %s",
+ strerror(-ret));
+return ret;
+}
+}
+
 return 0;
 }
 
-- 
2.31.1




[PATCH for-6.1? 2/6] job: @force parameter for job_cancel_sync{, _all}()

2021-07-22 Thread Max Reitz
Callers should be able to specify whether they want job_cancel_sync() to
force-cancel the job or not.

In fact, almost all invocations do not care about consistency of the
result and just want the job to terminate as soon as possible, so they
should pass force=true.  The replication block driver is the exception.

This changes some iotest outputs, because quitting qemu while a mirror
job is active will now lead to it being cancelled instead of completed,
which is what we want.  (Cancelling a READY mirror job with force=false
may take an indefinite amount of time, which we do not want when
quitting.  If users want consistent results, they must have all jobs be
done before they quit qemu.)

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz 
---
 include/qemu/job.h| 10 ++---
 block/replication.c   |  4 +-
 blockdev.c|  4 +-
 job.c | 27 +---
 qemu-nbd.c|  2 +-
 softmmu/runstate.c|  2 +-
 storage-daemon/qemu-storage-daemon.c  |  2 +-
 tests/unit/test-block-iothread.c  |  2 +-
 tests/unit/test-blockjob.c|  2 +-
 tests/qemu-iotests/109.out| 60 +++
 tests/qemu-iotests/tests/qsd-jobs.out |  2 +-
 11 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 41162ed494..5e8edbc2c8 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -506,19 +506,19 @@ void job_user_cancel(Job *job, bool force, Error **errp);
 
 /**
  * Synchronously cancel the @job.  The completion callback is called
- * before the function returns.  The job may actually complete
- * instead of canceling itself; the circumstances under which this
- * happens depend on the kind of job that is active.
+ * before the function returns.  If @force is false, the job may
+ * actually complete instead of canceling itself; the circumstances
+ * under which this happens depend on the kind of job that is active.
  *
  * Returns the return value from the job if the job actually completed
  * during the call, or -ECANCELED if it was canceled.
  *
  * Callers must hold the AioContext lock of job->aio_context.
  */
-int job_cancel_sync(Job *job);
+int job_cancel_sync(Job *job, bool force);
 
 /** Synchronously cancels all jobs using job_cancel_sync(). */
-void job_cancel_sync_all(void);
+void job_cancel_sync_all(bool force);
 
 /**
  * @job: The job to be completed.
diff --git a/block/replication.c b/block/replication.c
index 32444b9a8f..e7a9327b12 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -149,7 +149,7 @@ static void replication_close(BlockDriverState *bs)
 if (s->stage == BLOCK_REPLICATION_FAILOVER) {
 commit_job = >commit_job->job;
 assert(commit_job->aio_context == qemu_get_current_aio_context());
-job_cancel_sync(commit_job);
+job_cancel_sync(commit_job, false);
 }
 
 if (s->mode == REPLICATION_MODE_SECONDARY) {
@@ -726,7 +726,7 @@ static void replication_stop(ReplicationState *rs, bool 
failover, Error **errp)
  * disk, secondary disk in backup_job_completed().
  */
 if (s->backup_job) {
-job_cancel_sync(>backup_job->job);
+job_cancel_sync(>backup_job->job, false);
 }
 
 if (!failover) {
diff --git a/blockdev.c b/blockdev.c
index 3d8ac368a1..aa95918c02 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1848,7 +1848,7 @@ static void drive_backup_abort(BlkActionState *common)
 aio_context = bdrv_get_aio_context(state->bs);
 aio_context_acquire(aio_context);
 
-job_cancel_sync(>job->job);
+job_cancel_sync(>job->job, true);
 
 aio_context_release(aio_context);
 }
@@ -1949,7 +1949,7 @@ static void blockdev_backup_abort(BlkActionState *common)
 aio_context = bdrv_get_aio_context(state->bs);
 aio_context_acquire(aio_context);
 
-job_cancel_sync(>job->job);
+job_cancel_sync(>job->job, true);
 
 aio_context_release(aio_context);
 }
diff --git a/job.c b/job.c
index e7a5d28854..9e971d64cf 100644
--- a/job.c
+++ b/job.c
@@ -763,7 +763,12 @@ static void job_completed_txn_abort(Job *job)
 if (other_job != job) {
 ctx = other_job->aio_context;
 aio_context_acquire(ctx);
-job_cancel_async(other_job, false);
+/*
+ * This is a transaction: If one job failed, no result will matter.
+ * Therefore, pass force=true to terminate all other jobs as 
quickly
+ * as possible.
+ */
+job_cancel_async(other_job, true);
 aio_context_release(ctx);
 }
 }
@@ -964,12 +969,24 @@ static void job_cancel_err(Job *job, Error **errp)
 job_cancel(job, false);
 }
 
-int job_cancel_sync(Job *job)
+/**
+ * Same as job_cancel_err(), but force-cancel.
+ */
+static 

[PATCH for-6.1? 6/6] iotests: Add mirror-ready-cancel-error test

2021-07-22 Thread Max Reitz
Test what happens when there is an I/O error after a mirror job in the
READY phase has been cancelled.

Signed-off-by: Max Reitz 
---
 .../tests/mirror-ready-cancel-error   | 143 ++
 .../tests/mirror-ready-cancel-error.out   |   5 +
 2 files changed, 148 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/mirror-ready-cancel-error
 create mode 100644 tests/qemu-iotests/tests/mirror-ready-cancel-error.out

diff --git a/tests/qemu-iotests/tests/mirror-ready-cancel-error 
b/tests/qemu-iotests/tests/mirror-ready-cancel-error
new file mode 100755
index 00..f2dc1f
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-ready-cancel-error
@@ -0,0 +1,143 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Test what happens when errors occur to a mirror job after it has
+# been cancelled in the READY phase
+#
+# Copyright (C) 2021 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+
+
+image_size = 1 * 1024 * 1024
+source = os.path.join(iotests.test_dir, 'source.img')
+target = os.path.join(iotests.test_dir, 'target.img')
+
+
+class TestMirrorReadyCancelError(iotests.QMPTestCase):
+def setUp(self) -> None:
+assert iotests.qemu_img_create('-f', iotests.imgfmt, source,
+   str(image_size)) == 0
+assert iotests.qemu_img_create('-f', iotests.imgfmt, target,
+   str(image_size)) == 0
+
+self.vm = iotests.VM()
+self.vm.launch()
+
+def tearDown(self) -> None:
+self.vm.shutdown()
+os.remove(source)
+os.remove(target)
+
+def add_blockdevs(self, once: bool) -> None:
+res = self.vm.qmp('blockdev-add',
+  **{'node-name': 'source',
+ 'driver': iotests.imgfmt,
+ 'file': {
+ 'driver': 'file',
+ 'filename': source
+ }})
+self.assert_qmp(res, 'return', {})
+
+# blkdebug notes:
+# Enter state 2 on the first flush, which happens before the
+# job enters the READY state.  The second flush will happen
+# when the job is about to complete, and we want that one to
+# fail.
+res = self.vm.qmp('blockdev-add',
+  **{'node-name': 'target',
+ 'driver': iotests.imgfmt,
+ 'file': {
+ 'driver': 'blkdebug',
+ 'image': {
+ 'driver': 'file',
+ 'filename': target
+ },
+ 'set-state': [{
+ 'event': 'flush_to_disk',
+ 'state': 1,
+ 'new_state': 2
+ }],
+ 'inject-error': [{
+ 'event': 'flush_to_disk',
+ 'once': once,
+ 'immediately': True,
+ 'state': 2
+ }]}})
+self.assert_qmp(res, 'return', {})
+
+def start_mirror(self) -> None:
+res = self.vm.qmp('blockdev-mirror',
+  job_id='mirror',
+  device='source',
+  target='target',
+  filter_node_name='mirror-top',
+  sync='full',
+  on_target_error='stop')
+self.assert_qmp(res, 'return', {})
+
+def cancel_mirror_with_error(self) -> None:
+self.vm.event_wait('BLOCK_JOB_READY')
+
+# Write something so will not leave the job immediately, but
+# flush first (which will fail, thanks to blkdebug)
+res = self.vm.qmp('human-monitor-command',
+  command_line='qemu-io mirror-top "write 0 64k"')
+self.assert_qmp(res, 'return', '')
+
+# Drain status change events
+while self.vm.event_wait('JOB_STATUS_CHANGE', timeout=0.0) is not None:
+pass
+
+res = self.vm.qmp('block-job-cancel', 

[PATCH v2 3/6] util/oslib-posix: Don't create too many threads with small memory or little pages

2021-07-22 Thread David Hildenbrand
Let's limit the number of threads to something sane, especially that
- We don't have more threads than the number of pages we have
- We don't have threads that initialize small (< 64 MiB) memory

Signed-off-by: David Hildenbrand 
---
 util/oslib-posix.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 2ae6c3aaab..a1d309d495 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -40,6 +40,7 @@
 #include 
 #include "qemu/cutils.h"
 #include "qemu/compiler.h"
+#include "qemu/units.h"
 
 #ifdef CONFIG_LINUX
 #include 
@@ -529,7 +530,8 @@ static void *do_madv_populate_write_pages(void *arg)
 return NULL;
 }
 
-static inline int get_memset_num_threads(int smp_cpus)
+static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
+ int smp_cpus)
 {
 long host_procs = sysconf(_SC_NPROCESSORS_ONLN);
 int ret = 1;
@@ -537,6 +539,12 @@ static inline int get_memset_num_threads(int smp_cpus)
 if (host_procs > 0) {
 ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), smp_cpus);
 }
+
+/* Especially with gigantic pages, don't create more threads than pages. */
+ret = MIN(ret, numpages);
+/* Don't start threads to prealloc comparatively little memory. */
+ret = MIN(ret, MAX(1, hpagesize * numpages / (64 * MiB)));
+
 /* In case sysconf() fails, we fall back to single threaded */
 return ret;
 }
@@ -546,7 +554,7 @@ static bool touch_all_pages(char *area, size_t hpagesize, 
size_t numpages,
 {
 static gsize initialized = 0;
 MemsetContext context = {
-.num_threads = get_memset_num_threads(smp_cpus),
+.num_threads = get_memset_num_threads(hpagesize, numpages, smp_cpus),
 };
 size_t numpages_per_thread, leftover;
 void *(*touch_fn)(void *);
-- 
2.31.1




Re: [PATCH 2/2] acpi: x86: pcihp: add support hotplug on multifunction bridges

2021-07-22 Thread Laurent Vivier
On 22/07/2021 12:59, Igor Mammedov wrote:
> Commit 17858a1695 (hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35)
> switched PCI hotplug from native to ACPI one by default.
> 
> That however breaks ihotplug on following CLI that used to work:
>-nodefaults -machine q35 \
>-device 
> pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1
>  \
>-device 
> pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2
> 
> where PCI device is hotplugged to pcie-root-port-1 with error on guest side:
> 
>   ACPI BIOS Error (bug): Could not resolve symbol [^S0B.PCNT], AE_NOT_FOUND 
> (20201113/psargs-330)
>   ACPI Error: Aborting method \_SB.PCI0.PCNT due to previous error 
> (AE_NOT_FOUND) (20201113/psparse-531)
>   ACPI Error: Aborting method \_GPE._E01 due to previous error (AE_NOT_FOUND) 
> (20201113/psparse-531)
>   ACPI Error: AE_NOT_FOUND, while evaluating GPE method [_E01] 
> (20201113/evgpe-515)
> 
> cause is that QEMU's ACPI hotplug never supported functions other then 0
> and due to bug it was generating notification entries for not described
> functions.
> 
> Technically there is no reason not to describe cold-plugged bridges
> (root ports) on functions other then 0, as they similaraly to bridge
> on function 0 are unpluggable.
> 
> Fix consists of describing cold-plugged bridges[root ports] on functions
> other than 0.
> 
> Fixes: 17858a169508609ca9063c544833e5a1adeb7b52
> Signed-off-by: Igor Mammedov 
> Reported-by: Laurent Vivier 
> ---
>  hw/i386/acpi-build.c | 31 ---
>  1 file changed, 24 insertions(+), 7 deletions(-)
> 

Tested-by: Laurent Vivier 




[PATCH] nbd/server: Suppress Broken pipe errors on abrupt disconnection

2021-07-22 Thread Richard W.M. Jones
This simple patch suppresses a warning message from qemu-nbd when a
client abruptly disconnects.  There is a way to work around this in
the client (by shutting down the connection properly).  Nevertheless
the "Broken pipe" error seems unnecessary to me as it does not convey
any useful information and might be used to fill up logs.

Rich.





[PATCH] nbd/server: Suppress Broken pipe errors on abrupt disconnection

2021-07-22 Thread Richard W.M. Jones
$ rm -f /tmp/sock /tmp/pid
$ qemu-img create -f qcow2 /tmp/disk.qcow2 1M
$ qemu-nbd -t --format=qcow2 --socket=/tmp/sock --pid-file=/tmp/pid 
/tmp/disk.qcow2 &
$ nbdsh -u 'nbd+unix:///?socket=/tmp/sock' -c 'h.get_size()'
qemu-nbd: Disconnect client, due to: Failed to send reply: Unable to write to 
socket: Broken pipe
$ killall qemu-nbd

nbdsh is abruptly dropping the NBD connection here which is a valid
way to close the connection.  It seems unnecessary to print an error
in this case so this commit suppresses it.

Note that if you call the nbdsh h.shutdown() method then the message
was not printed:

$ nbdsh -u 'nbd+unix:///?socket=/tmp/sock' -c 'h.get_size()' -c 'h.shutdown()'

Signed-off-by: Richard W.M. Jones 
---
 nbd/server.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index b60ebc3ab6..0f86535b88 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2668,7 +2668,11 @@ static coroutine_fn void nbd_trip(void *opaque)
 ret = nbd_handle_request(client, , req->data, _err);
 }
 if (ret < 0) {
-error_prepend(_err, "Failed to send reply: ");
+if (errno != EPIPE) {
+error_prepend(_err, "Failed to send reply: ");
+} else {
+local_err = NULL;
+}
 goto disconnect;
 }
 
-- 
2.32.0




Re: [PULL v3 05/19] hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35

2021-07-22 Thread Igor Mammedov
On Wed, 21 Jul 2021 12:37:40 -0400
"Michael S. Tsirkin"  wrote:

> On Wed, Jul 21, 2021 at 06:27:33PM +0200, Igor Mammedov wrote:
> > On Wed, 21 Jul 2021 12:09:01 -0400
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Wed, Jul 21, 2021 at 05:49:16PM +0200, Laurent Vivier wrote:  
> > > > On 21/07/2021 16:59, Igor Mammedov wrote:
> > > > > On Tue, 20 Jul 2021 14:56:06 +0200
> > > > > Laurent Vivier  wrote:
> > > > > 
> > > > >> On 20/07/2021 13:38, Laurent Vivier wrote:
> > > > >>> On 16/07/2021 17:15, Michael S. Tsirkin wrote:  
> > > >  From: Julia Suvorova 
> > > > 
> > > >  Q35 has three different types of PCI devices hot-plug: PCIe Native,
> > > >  SHPC Native and ACPI hot-plug. This patch changes the default 
> > > >  choice
> > > >  for cold-plugged bridges from PCIe Native to ACPI Hot-plug with
> > > >  ability to use SHPC and PCIe Native for hot-plugged bridges.
> > > > 
> > > >  This is a list of the PCIe Native hot-plug issues that led to this
> > > >  change:
> > > >  * no racy behavior during boot (see 110c477c2ed)
> > > >  * no delay during deleting - after the actual power off 
> > > >  software
> > > >    must wait at least 1 second before indicating about it. This 
> > > >  case
> > > >    is quite important for users, it even has its own bug:
> > > >    https://bugzilla.redhat.com/show_bug.cgi?id=1594168
> > > >  * no timer-based behavior - in addition to the previous 
> > > >  example,
> > > >    the attention button has a 5-second waiting period, during 
> > > >  which
> > > >    the operation can be canceled with a second press. While this
> > > >    looks fine for manual button control, automation will result 
> > > >  in
> > > >    the need to queue or drop events, and the software receiving
> > > >    events in all sort of unspecified combinations of 
> > > >  attention/power
> > > >    indicator states, which is racy and uppredictable.
> > > >  * fixes:
> > > >  * https://bugzilla.redhat.com/show_bug.cgi?id=1752465
> > > >  * https://bugzilla.redhat.com/show_bug.cgi?id=1690256
> > > > 
> > > >  To return to PCIe Native hot-plug:
> > > >  -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off
> > > > 
> > > >  Known issue: older linux guests need the following flag
> > > >  to allow hotplugged pci express devices to use io:
> > > >  -device pcie-root-port,io-reserve=4096.
> > > >  io is unusual for pci express so this seems minor.
> > > >  We'll fix this by a follow up patch.
> > > > 
> > > >  Signed-off-by: Julia Suvorova 
> > > >  Reviewed-by: Igor Mammedov 
> > > >  Message-Id: <20210713004205.775386-6-jus...@redhat.com>
> > > >  Reviewed-by: Michael S. Tsirkin 
> > > >  Signed-off-by: Michael S. Tsirkin 
> > > >  Reviewed-by: David Gibson 
> > > >  ---
> > > >   hw/acpi/ich9.c | 2 +-
> > > >   hw/i386/pc.c   | 1 +
> > > >   2 files changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > >  diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
> > > >  index 2f4eb453ac..778e27b659 100644
> > > >  --- a/hw/acpi/ich9.c
> > > >  +++ b/hw/acpi/ich9.c
> > > >  @@ -427,7 +427,7 @@ void ich9_pm_add_properties(Object *obj, 
> > > >  ICH9LPCPMRegs *pm)
> > > >   pm->disable_s3 = 0;
> > > >   pm->disable_s4 = 0;
> > > >   pm->s4_val = 2;
> > > >  -pm->use_acpi_hotplug_bridge = false;
> > > >  +pm->use_acpi_hotplug_bridge = true;
> > > >   
> > > >   object_property_add_uint32_ptr(obj, ACPI_PM_PROP_PM_IO_BASE,
> > > >  >pm_io_base, 
> > > >  OBJ_PROP_FLAG_READ);
> > > >  diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > >  index aa79c5e0e6..f4c7a78362 100644
> > > >  --- a/hw/i386/pc.c
> > > >  +++ b/hw/i386/pc.c
> > > >  @@ -99,6 +99,7 @@ GlobalProperty pc_compat_6_0[] = {
> > > >   { "qemu64" "-" TYPE_X86_CPU, "model", "6" },
> > > >   { "qemu64" "-" TYPE_X86_CPU, "stepping", "3" },
> > > >   { TYPE_X86_CPU, "x-vendor-cpuid-only", "off" },
> > > >  +{ "ICH9-LPC", "acpi-pci-hotplug-with-bridge-support", "off" },
> > > >   };
> > > >   const size_t pc_compat_6_0_len = G_N_ELEMENTS(pc_compat_6_0);
> > > >   
> > > >   
> > > > >>>
> > > > >>> There is an issue with this patch.
> > > > >>>
> > > > >>> When I try to unplug a VFIO device I have the following error and 
> > > > >>> the device is not unplugged:
> > > > >>>
> > > > >>> (qemu) device_del hostdev0
> > > > >>>
> > > > >>> [   34.116714] ACPI BIOS Error (bug): Could not resolve symbol 
> > > > >>> [^S0B.PCNT], AE_NOT_FOUND
> > > > >>> (20201113/psargs-330)
> > > > >>> [   34.117987] ACPI Error: Aborting method \_SB.PCI0.PCNT due to 
> > > > >>> previous 

[PATCH 2/2] acpi: x86: pcihp: add support hotplug on multifunction bridges

2021-07-22 Thread Igor Mammedov
Commit 17858a1695 (hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35)
switched PCI hotplug from native to ACPI one by default.

That however breaks ihotplug on following CLI that used to work:
   -nodefaults -machine q35 \
   -device 
pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1
 \
   -device 
pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2

where PCI device is hotplugged to pcie-root-port-1 with error on guest side:

  ACPI BIOS Error (bug): Could not resolve symbol [^S0B.PCNT], AE_NOT_FOUND 
(20201113/psargs-330)
  ACPI Error: Aborting method \_SB.PCI0.PCNT due to previous error 
(AE_NOT_FOUND) (20201113/psparse-531)
  ACPI Error: Aborting method \_GPE._E01 due to previous error (AE_NOT_FOUND) 
(20201113/psparse-531)
  ACPI Error: AE_NOT_FOUND, while evaluating GPE method [_E01] 
(20201113/evgpe-515)

cause is that QEMU's ACPI hotplug never supported functions other then 0
and due to bug it was generating notification entries for not described
functions.

Technically there is no reason not to describe cold-plugged bridges
(root ports) on functions other then 0, as they similaraly to bridge
on function 0 are unpluggable.

Fix consists of describing cold-plugged bridges[root ports] on functions
other than 0.

Fixes: 17858a169508609ca9063c544833e5a1adeb7b52
Signed-off-by: Igor Mammedov 
Reported-by: Laurent Vivier 
---
 hw/i386/acpi-build.c | 31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index b40e284b72..77cb7a338a 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -364,7 +364,7 @@ static void build_append_pcihp_notify_entry(Aml *method, 
int slot)
 int32_t devfn = PCI_DEVFN(slot, 0);
 
 if_ctx = aml_if(aml_and(aml_arg(0), aml_int(0x1U << slot), NULL));
-aml_append(if_ctx, aml_notify(aml_name("S%.02X", devfn), aml_arg(1)));
+aml_append(if_ctx, aml_notify(aml_name("^S%.03X", devfn), aml_arg(1)));
 aml_append(method, if_ctx);
 }
 
@@ -389,18 +389,26 @@ static void build_append_pci_bus_devices(Aml 
*parent_scope, PCIBus *bus,
 PCIDeviceClass *pc;
 PCIDevice *pdev = bus->devices[devfn];
 int slot = PCI_SLOT(devfn);
+int func = PCI_FUNC(devfn);
+/* ACPI spec: 1.0b: Table 6-2 _ADR Object Bus Types, PCI type */
+int adr = slot << 16 | func;
 bool hotplug_enabled_dev;
 bool bridge_in_acpi;
 bool cold_plugged_bridge;
 
 if (!pdev) {
-if (bsel) { /* add hotplug slots for non present devices */
+/*
+ * add hotplug slots for non present devices.
+ * hotplug is supported only for non-multifunction device
+ * so generate device description only for function 0
+ */
+if (bsel && !PCI_FUNC(devfn)) {
 if (pci_bus_is_express(bus) && slot > 0) {
 break;
 }
-dev = aml_device("S%.02X", PCI_DEVFN(slot, 0));
+dev = aml_device("S%.03X", devfn);
 aml_append(dev, aml_name_decl("_SUN", aml_int(slot)));
-aml_append(dev, aml_name_decl("_ADR", aml_int(slot << 16)));
+aml_append(dev, aml_name_decl("_ADR", aml_int(adr)));
 method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
 aml_append(method,
 aml_call2("PCEJ", aml_name("BSEL"), aml_name("_SUN"))
@@ -436,9 +444,18 @@ static void build_append_pci_bus_devices(Aml 
*parent_scope, PCIBus *bus,
 continue;
 }
 
+/*
+ * allow describing coldplugged bridges in ACPI even if they are not
+ * on function 0, as they are not unpluggale, for all other devices
+ * generate description only for function 0 per slot
+ */
+if (PCI_FUNC(devfn) && !bridge_in_acpi) {
+continue;
+}
+
 /* start to compose PCI slot descriptor */
-dev = aml_device("S%.02X", PCI_DEVFN(slot, 0));
-aml_append(dev, aml_name_decl("_ADR", aml_int(slot << 16)));
+dev = aml_device("S%.03X", devfn);
+aml_append(dev, aml_name_decl("_ADR", aml_int(adr)));
 
 if (bsel) {
 /*
@@ -529,7 +546,7 @@ static void build_append_pci_bus_devices(Aml *parent_scope, 
PCIBus *bus,
 continue;
 }
 
-aml_append(method, aml_name("^S%.02X.PCNT",
+aml_append(method, aml_name("^S%.03X.PCNT",
 sec->parent_dev->devfn));
 }
 }
-- 
2.27.0




[PATCH 1/3] docs: Briefly describe KVM PV features

2021-07-22 Thread Vitaly Kuznetsov
KVM PV features don't seem to be documented anywhere, in particular, the
fact that some of the features are enabled by default and some are not can
only be figured out from the code.

Signed-off-by: Vitaly Kuznetsov 
---
 docs/kvm-pv.txt | 92 +
 1 file changed, 92 insertions(+)
 create mode 100644 docs/kvm-pv.txt

diff --git a/docs/kvm-pv.txt b/docs/kvm-pv.txt
new file mode 100644
index ..84ad7fa60f8d
--- /dev/null
+++ b/docs/kvm-pv.txt
@@ -0,0 +1,92 @@
+KVM paravirtualized features
+
+
+
+1. Description
+===
+In some cases when implementing a hardware interface in software is slow, KVM
+implements its own paravirtualized interfaces.
+
+2. Setup
+=
+KVM PV features are represented as CPU flags. The following features are 
enabled
+by default for any CPU model when KVM is enabled:
+  kvmclock
+  kvm-nopiodelay
+  kvm-asyncpf
+  kvm-steal-time
+  kvm-pv-eoi
+  kvmclock-stable-bit
+
+'kvm-msi-ext-dest-id' feature is enabled by default in x2apic mode with split
+irqchip (e.g. "-machine ...,kernel-irqchip=split -cpu ...,x2apic").
+
+Note: when cpu model 'host' is used, QEMU passes through all KVM PV features
+exposed by KVM to the guest.
+
+3. Existing features
+
+
+3.1. kvmclock
+
+This feature exposes KVM specific PV clocksource to the guest.
+
+3.2. kvm-nopiodelay
+===
+The guest doesn't need to perform delays on PIO operations.
+
+3.3. kvm-mmu
+
+This feature is deprecated.
+
+3.4. kvm-asyncpf
+
+Enables asynchronous page fault mechanism. Note: since Linux-5.10 the feature 
is
+deprecated and not enabled by KVM. Use "kvm-asyncpf-int" instead.
+
+3.5. kvm-steal-time
+===
+Enables stolen (when guest vCPU is not running) time accounting.
+
+3.6. kvm-pv-eoi
+===
+Enables paravirtualized end-of-interrupt signaling.
+
+3.7. kvm-pv-unhalt
+==
+Enables paravirtualized spinlocks support.
+
+3.8. kvm-pv-tlb-flush
+=
+Enables paravirtualized TLB flush mechanism.
+
+3.9. kvm-pv-ipi
+===
+Enables paravirtualized IPI mechanism.
+
+3.10. kvm-poll-control
+==
+Enables host-side polling on HLT control from the guest.
+
+3.11. kvm-pv-sched-yield
+
+Enables paravirtualized sched yield feature.
+
+3.12. kvm-asyncpf-int
+=
+Enables interrupt based asynchronous page fault mechanism.
+
+3.13. kvm-msi-ext-dest-id
+=
+Support 'Extended Destination ID' for external interrupts. The feature allows
+to use up to 32768 CPUs without IRQ remapping (but other limits may apply 
making
+the number of supported vCPUs for a given configuration lower).
+
+3.14. kvmclock-stable-bit
+=
+Tells the guest that guest visible TSC value can be fully trusted for kvmclock
+computations and no warps are expected.
+
+4. Useful links
+
+Please refer to Documentation/virt/kvm in Linux for additional detail.
-- 
2.31.1




[PATCH for-6.1? 5/6] mirror: Check job_is_cancelled() earlier

2021-07-22 Thread Max Reitz
We must check whether the job is force-cancelled early in our main loop,
most importantly before any `continue` statement.  For example, we used
to have `continue`s before our current checking location that are
triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
failing, force-cancelling the job would not terminate it.

A job being force-cancelled should be treated the same as the job having
failed, so put the check in the same place where we check `s->ret < 0`.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz 
---
 block/mirror.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 291d2ed040..a993ed37d0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -995,7 +995,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 mirror_wait_for_any_operation(s, true);
 }
 
-if (s->ret < 0) {
+if (s->ret < 0 || job_is_cancelled(>common.job)) {
 ret = s->ret;
 goto immediate_exit;
 }
@@ -1081,17 +1081,12 @@ static int coroutine_fn mirror_run(Job *job, Error 
**errp)
 break;
 }
 
-ret = 0;
-
 if (s->synced && !should_complete) {
 delay_ns = (s->in_flight == 0 &&
 cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
 }
 trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
 job_sleep_ns(>common.job, delay_ns);
-if (job_is_cancelled(>common.job)) {
-break;
-}
 s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 }
 
-- 
2.31.1




[PATCH for-6.1? 4/6] job: Add job_cancel_requested()

2021-07-22 Thread Max Reitz
Most callers of job_is_cancelled() actually want to know whether the job
is on its way to immediate termination.  For example, we refuse to pause
jobs that are cancelled; but this only makes sense for jobs that are
really actually cancelled.

A mirror job that is cancelled during READY with force=false should
absolutely be allowed to pause.  This "cancellation" (which is actually
a kind of completion) may take an indefinite amount of time, and so
should behave like any job during normal operation.  For example, with
on-target-error=stop, the job should stop on write errors.  (In
contrast, force-cancelled jobs should not get write errors, as they
should just terminate and not do further I/O.)

Therefore, redefine job_is_cancelled() to only return true for jobs that
are force-cancelled (which as of HEAD^ means any job that interprets the
cancellation request as a request for immediate termination), and add
job_cancel_request() as the general variant, which returns true for any
jobs which have been requested to be cancelled, whether it be
immediately or after an arbitrarily long completion phase.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz 
---
 include/qemu/job.h |  8 +++-
 block/mirror.c |  9 -
 job.c  | 13 +
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 8aa90f7395..032edf3c5f 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
 /** Returns true if the job should not be visible to the management layer. */
 bool job_is_internal(Job *job);
 
-/** Returns whether the job is scheduled for cancellation. */
+/** Returns whether the job is being cancelled. */
 bool job_is_cancelled(Job *job);
 
+/**
+ * Returns whether the job is scheduled for cancellation (at an
+ * indefinite point).
+ */
+bool job_cancel_requested(Job *job);
+
 /** Returns whether the job is in a completed state. */
 bool job_is_completed(Job *job);
 
diff --git a/block/mirror.c b/block/mirror.c
index c3514f4196..291d2ed040 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -938,7 +938,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 job_transition_to_ready(>common.job);
 s->synced = true;
 s->actively_synced = true;
-while (!job_is_cancelled(>common.job) && !s->should_complete) {
+while (!job_cancel_requested(>common.job) && !s->should_complete) {
 job_yield(>common.job);
 }
 s->common.job.cancelled = false;
@@ -1046,7 +1046,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 }
 
 should_complete = s->should_complete ||
-job_is_cancelled(>common.job);
+job_cancel_requested(>common.job);
 cnt = bdrv_get_dirty_count(s->dirty_bitmap);
 }
 
@@ -1089,7 +1089,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 }
 trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
 job_sleep_ns(>common.job, delay_ns);
-if (job_is_cancelled(>common.job) && s->common.job.force_cancel) {
+if (job_is_cancelled(>common.job)) {
 break;
 }
 s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1101,8 +1101,7 @@ immediate_exit:
  * or it was cancelled prematurely so that we do not guarantee that
  * the target is a copy of the source.
  */
-assert(ret < 0 || (s->common.job.force_cancel &&
-   job_is_cancelled(>common.job)));
+assert(ret < 0 || job_is_cancelled(>common.job));
 assert(need_drain);
 mirror_wait_for_all_io(s);
 }
diff --git a/job.c b/job.c
index e78d893a9c..c51c8077cb 100644
--- a/job.c
+++ b/job.c
@@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
 }
 
 bool job_is_cancelled(Job *job)
+{
+return job->cancelled && job->force_cancel;
+}
+
+bool job_cancel_requested(Job *job)
 {
 return job->cancelled;
 }
@@ -650,7 +655,7 @@ static void job_conclude(Job *job)
 
 static void job_update_rc(Job *job)
 {
-if (!job->ret && job_is_cancelled(job)) {
+if (!job->ret && job_cancel_requested(job)) {
 job->ret = -ECANCELED;
 }
 if (job->ret) {
@@ -704,7 +709,7 @@ static int job_finalize_single(Job *job)
 
 /* Emit events only if we actually started */
 if (job_started(job)) {
-if (job_is_cancelled(job)) {
+if (job_cancel_requested(job)) {
 job_event_cancelled(job);
 } else {
 job_event_completed(job);
@@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
 if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
 return;
 }
-if (job_is_cancelled(job) || !job->driver->complete) {
+if (job_cancel_requested(job) || !job->driver->complete) {
 error_setg(errp, "The active block job '%s' cannot be 

[PATCH v2 1/6] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()

2021-07-22 Thread David Hildenbrand
Let's sense support and use it for preallocation. MADV_POPULATE_WRITE
does not require a SIGBUS handler, doesn't actually touch page content,
and avoids context switches; it is, therefore, faster and easier to handle
than our current approach.

While MADV_POPULATE_WRITE is, in general, faster than manual
prefaulting, and especially faster with 4k pages, there is still value in
prefaulting using multiple threads to speed up preallocation.

More details on MADV_POPULATE_WRITE can be found in the Linux commit
4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault
page tables") and in the man page proposal [1].

[1] https://lkml.kernel.org/r/20210712083917.16361-1-da...@redhat.com

This resolves the TODO in do_touch_pages().

In the future, we might want to look into using fallocate(), eventually
combined with MADV_POPULATE_READ, when dealing with shared file
mappings.

Reviewed-by: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
---
 include/qemu/osdep.h |  7 
 util/oslib-posix.c   | 88 +---
 2 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 60718fc342..d1660d67fa 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -471,6 +471,11 @@ static inline void qemu_cleanup_generic_vfree(void *p)
 #else
 #define QEMU_MADV_REMOVE QEMU_MADV_DONTNEED
 #endif
+#ifdef MADV_POPULATE_WRITE
+#define QEMU_MADV_POPULATE_WRITE MADV_POPULATE_WRITE
+#else
+#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID
+#endif
 
 #elif defined(CONFIG_POSIX_MADVISE)
 
@@ -484,6 +489,7 @@ static inline void qemu_cleanup_generic_vfree(void *p)
 #define QEMU_MADV_HUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_NOHUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_REMOVE QEMU_MADV_DONTNEED
+#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID
 
 #else /* no-op */
 
@@ -497,6 +503,7 @@ static inline void qemu_cleanup_generic_vfree(void *p)
 #define QEMU_MADV_HUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_NOHUGEPAGE  QEMU_MADV_INVALID
 #define QEMU_MADV_REMOVE QEMU_MADV_INVALID
+#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID
 
 #endif
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index e8bdb02e1d..1cb80bf94c 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -484,10 +484,6 @@ static void *do_touch_pages(void *arg)
  *
  * 'volatile' to stop compiler optimizing this away
  * to a no-op
- *
- * TODO: get a better solution from kernel so we
- * don't need to write at all so we don't cause
- * wear on the storage backing the region...
  */
 *(volatile char *)addr = *addr;
 addr += hpagesize;
@@ -497,6 +493,31 @@ static void *do_touch_pages(void *arg)
 return NULL;
 }
 
+static void *do_madv_populate_write_pages(void *arg)
+{
+MemsetThread *memset_args = (MemsetThread *)arg;
+const size_t size = memset_args->numpages * memset_args->hpagesize;
+char * const addr = memset_args->addr;
+int ret;
+
+if (!size) {
+return NULL;
+}
+
+/* See do_touch_pages(). */
+qemu_mutex_lock(_mutex);
+while (!threads_created_flag) {
+qemu_cond_wait(_cond, _mutex);
+}
+qemu_mutex_unlock(_mutex);
+
+ret = qemu_madvise(addr, size, QEMU_MADV_POPULATE_WRITE);
+if (ret) {
+memset_thread_failed = true;
+}
+return NULL;
+}
+
 static inline int get_memset_num_threads(int smp_cpus)
 {
 long host_procs = sysconf(_SC_NPROCESSORS_ONLN);
@@ -510,10 +531,11 @@ static inline int get_memset_num_threads(int smp_cpus)
 }
 
 static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
-int smp_cpus)
+int smp_cpus, bool use_madv_populate_write)
 {
 static gsize initialized = 0;
 size_t numpages_per_thread, leftover;
+void *(*touch_fn)(void *);
 char *addr = area;
 int i = 0;
 
@@ -523,6 +545,12 @@ static bool touch_all_pages(char *area, size_t hpagesize, 
size_t numpages,
 g_once_init_leave(, 1);
 }
 
+if (use_madv_populate_write) {
+touch_fn = do_madv_populate_write_pages;
+} else {
+touch_fn = do_touch_pages;
+}
+
 memset_thread_failed = false;
 threads_created_flag = false;
 memset_num_threads = get_memset_num_threads(smp_cpus);
@@ -534,7 +562,7 @@ static bool touch_all_pages(char *area, size_t hpagesize, 
size_t numpages,
 memset_thread[i].numpages = numpages_per_thread + (i < leftover);
 memset_thread[i].hpagesize = hpagesize;
 qemu_thread_create(_thread[i].pgthread, "touch_pages",
-   do_touch_pages, _thread[i],
+   touch_fn, _thread[i],
QEMU_THREAD_JOINABLE);
 addr += memset_thread[i].numpages * hpagesize;
 }
@@ -553,6 +581,12 @@ static bool touch_all_pages(char *area, size_t 

[PATCH v2 0/6] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()

2021-07-22 Thread David Hildenbrand
#1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
global variables and prepare for concurrency, #3 and #4 optimize thread
handling, #5 makes os_mem_prealloc() safe to be called from multiple
threads concurrently and #6 makes the SIGBUS handler coexist cleanly with
the MCE SIGBUS handler under Linux.

Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
Linux commit 4ca9b3859dac ("mm/madvise: introduce
MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
page patch [1].

v1 -> v2:
- "util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()"
-- Handle thread with no data to initialize
-- Always set use_madv_populate_write properly
-- Add comment regarding future fallocate() optimization
- "util/oslib-posix: Don't create too many threads with small memory or
   little pages"
-- Added
- "util/oslib-posix: Avoid creating a single thread with
   MADV_POPULATE_WRITE"
-- Added
- "util/oslib-posix: Support concurrent os_mem_prealloc() invocation"
-- Add missing g_once_init_leave()
-- Move g_once_init_enter() to the place where it is actually needed
- "util/oslib-posix: Forward SIGBUS to MCE handler under Linux"
-- Added

[1] https://lkml.kernel.org/r/20210712083917.16361-1-da...@redhat.com

Cc: Paolo Bonzini 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Eduardo Habkost 
Cc: Dr. David Alan Gilbert 
Cc: Marek Kedzierski 
Cc: Pankaj Gupta 
Cc: Daniel P. Berrangé 

David Hildenbrand (6):
  util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
  util/oslib-posix: Introduce and use MemsetContext for
touch_all_pages()
  util/oslib-posix: Don't create too many threads with small memory or
little pages
  util/oslib-posix: Avoid creating a single thread with
MADV_POPULATE_WRITE
  util/oslib-posix: Support concurrent os_mem_prealloc() invocation
  util/oslib-posix: Forward SIGBUS to MCE handler under Linux

 include/qemu/osdep.h |   7 ++
 softmmu/cpus.c   |   4 +
 util/oslib-posix.c   | 220 +--
 3 files changed, 181 insertions(+), 50 deletions(-)

-- 
2.31.1




[PATCH v2 5/6] util/oslib-posix: Support concurrent os_mem_prealloc() invocation

2021-07-22 Thread David Hildenbrand
Add a mutex to protect the SIGBUS case, as we cannot mess concurrently
with the sigbus handler and we have to manage the global variable
sigbus_memset_context. The MADV_POPULATE_WRITE path can run
concurrently.

Note that page_mutex and page_cond are shared between concurrent
invocations, which shouldn't be a problem.

This is a preparation for future virtio-mem prealloc code, which will call
os_mem_prealloc() asynchronously from an iothread when handling guest
requests.

Add a comment that messing with the SIGBUS handler is frowned upon and
can result in problems we fortunately haven't seen so far. Note that
forwarding signals to the already installed SIGBUS handler isn't clean
either, as that one might modify the SIGBUS handler again.

Reviewed-by: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
---
 util/oslib-posix.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 1483e985c6..7c75848a67 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -95,6 +95,7 @@ typedef struct MemsetThread MemsetThread;
 
 /* used by sigbus_handler() */
 static MemsetContext *sigbus_memset_context;
+static QemuMutex sigbus_mutex;
 
 static QemuMutex page_mutex;
 static QemuCond page_cond;
@@ -625,6 +626,7 @@ static bool madv_populate_write_possible(char *area, size_t 
pagesize)
 void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
  Error **errp)
 {
+static gsize initialized;
 int ret;
 struct sigaction act, oldact;
 size_t hpagesize = qemu_fd_getpagesize(fd);
@@ -638,6 +640,12 @@ void os_mem_prealloc(int fd, char *area, size_t memory, 
int smp_cpus,
 use_madv_populate_write = madv_populate_write_possible(area, hpagesize);
 
 if (!use_madv_populate_write) {
+if (g_once_init_enter()) {
+qemu_mutex_init(_mutex);
+g_once_init_leave(, 1);
+}
+
+qemu_mutex_lock(_mutex);
 memset(, 0, sizeof(act));
 act.sa_handler = _handler;
 act.sa_flags = 0;
@@ -664,6 +672,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int 
smp_cpus,
 perror("os_mem_prealloc: failed to reinstall signal handler");
 exit(1);
 }
+qemu_mutex_unlock(_mutex);
 }
 }
 
-- 
2.31.1




[PATCH v2 4/6] util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITE

2021-07-22 Thread David Hildenbrand
Let's simplify the case when we only want a single thread and don't have
to mess with signal handlers.

Signed-off-by: David Hildenbrand 
---
 util/oslib-posix.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index a1d309d495..1483e985c6 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -568,6 +568,14 @@ static bool touch_all_pages(char *area, size_t hpagesize, 
size_t numpages,
 }
 
 if (use_madv_populate_write) {
+/* Avoid creating a single thread for MADV_POPULATE_WRITE */
+if (context.num_threads == 1) {
+if (qemu_madvise(area, hpagesize * numpages,
+ QEMU_MADV_POPULATE_WRITE)) {
+return true;
+}
+return false;
+}
 touch_fn = do_madv_populate_write_pages;
 } else {
 touch_fn = do_touch_pages;
-- 
2.31.1




Re: [PATCH for-6.2 v2 05/11] machine: Improve the error reporting of smp parsing

2021-07-22 Thread Andrew Jones
On Thu, Jul 22, 2021 at 04:10:32PM +0800, wangyanan (Y) wrote:
> On 2021/7/20 0:53, Andrew Jones wrote:
> > On Mon, Jul 19, 2021 at 11:20:37AM +0800, Yanan Wang wrote:
> > > We totally have two requirements for a valid SMP configuration:
> > s/totally//
> > 
> > > the sum of "sockets * dies * cores * threads" must represent all
> > the product
> > 
> > > the possible cpus, i.e., max_cpus, and must include the initial
> > > present cpus, i.e., smp_cpus.
> > > 
> > > We only need to ensure "sockets * dies * cores * threads == maxcpus"
> > > at first and then ensure "sockets * dies * cores * threads >= cpus".
> > Or, "maxcpus >= cpus"
> > 
> > > With a reasonable order of the sanity-check, we can simplify the
> > > error reporting code.
> > > 
> > > Signed-off-by: Yanan Wang 
> > > ---
> > >   hw/core/machine.c | 25 ++---
> > >   1 file changed, 10 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index 668f0a1553..8b4d07d3fc 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -788,21 +788,6 @@ static void smp_parse(MachineState *ms, 
> > > SMPConfiguration *config, Error **errp)
> > >   cpus = cpus > 0 ? cpus : sockets * dies * cores * threads;
> > >   maxcpus = maxcpus > 0 ? maxcpus : cpus;
> > > -if (sockets * dies * cores * threads < cpus) {
> > > -g_autofree char *dies_msg = g_strdup_printf(
> > > -mc->smp_dies_supported ? " * dies (%u)" : "", dies);
> > > -error_setg(errp, "cpu topology: "
> > > -   "sockets (%u)%s * cores (%u) * threads (%u) < "
> > > -   "smp_cpus (%u)",
> > > -   sockets, dies_msg, cores, threads, cpus);
> > > -return;
> > > -}
> > > -
> > > -if (maxcpus < cpus) {
> > > -error_setg(errp, "maxcpus must be equal to or greater than smp");
> > > -return;
> > > -}
> > This may be redundant when determining a valid config, but by checking it
> > separately we can provide a more useful error message.
> Yes, this message is more useful. Can we also report the exact values of the
> parameters within this error message ?

sure

> How about the following:
> 
> if (sockets * cores * threads != maxcpus) {
>     error_setg("product of the topology must be equal to maxcpus"
>    "sockets (%u) * cores (%u) * threads (%u)"
>    "!= maxcpus (%u)",
>    sockets, cores, threads, maxcpus);
> return;
> }
> 
> if (maxcpus < cpus) {
>     error_setg("maxcpus must be equal to or greater than smp:"
>    "sockets (%u) * cores (%u) * threads (%u)"
>    "== maxcpus (%u) < smp_cpus (%u)",
>    sockets, cores, threads, maxcpus, cpus);
> return;
> }

OK by me

Thanks,
drew

> 
> Thanks,
> Yanan
> .
> > > -
> > >   if (sockets * dies * cores * threads != maxcpus) {
> > >   g_autofree char *dies_msg = g_strdup_printf(
> > >   mc->smp_dies_supported ? " * dies (%u)" : "", dies);
> > > @@ -814,6 +799,16 @@ static void smp_parse(MachineState *ms, 
> > > SMPConfiguration *config, Error **errp)
> > >   return;
> > >   }
> > > +if (sockets * dies * cores * threads < cpus) {
> > > +g_autofree char *dies_msg = g_strdup_printf(
> > > +mc->smp_dies_supported ? " * dies (%u)" : "", dies);
> > > +error_setg(errp, "Invalid CPU topology: "
> > > +   "sockets (%u)%s * cores (%u) * threads (%u) < "
> > > +   "smp_cpus (%u)",
> > > +   sockets, dies_msg, cores, threads, cpus);
> > > +return;
> > > +}
> > > +
> > >   ms->smp.cpus = cpus;
> > >   ms->smp.sockets = sockets;
> > >   ms->smp.dies = dies;
> > > -- 
> > > 2.19.1
> > > 
> > I'm not sure we need this patch.
> > 
> > Thanks,
> > drew
> > 
> > .
> 




Re: [PATCH for-6.2 03/23] target/avr: Drop checks for singlestep_enabled

2021-07-22 Thread Michael Rolnik
Reviewed-by: Michael Rolnik 
Tested-by: Michael Rolnik 

On Wed, Jul 21, 2021 at 9:00 PM Philippe Mathieu-Daudé 
wrote:

> +Michael/Alex/Pavel
>
> On 7/21/21 8:41 AM, Richard Henderson wrote:
> > GDB single-stepping is now handled generically.
> >
> > Signed-off-by: Richard Henderson 
> > ---
> >  target/avr/translate.c | 19 ---
> >  1 file changed, 4 insertions(+), 15 deletions(-)
> >
> > diff --git a/target/avr/translate.c b/target/avr/translate.c
> > index e08b83..0403470dd8 100644
> > --- a/target/avr/translate.c
> > +++ b/target/avr/translate.c
> > @@ -1089,11 +1089,7 @@ static void gen_goto_tb(DisasContext *ctx, int n,
> target_ulong dest)
> >  tcg_gen_exit_tb(tb, n);
> >  } else {
> >  tcg_gen_movi_i32(cpu_pc, dest);
> > -if (ctx->base.singlestep_enabled) {
> > -gen_helper_debug(cpu_env);
> > -} else {
> > -tcg_gen_lookup_and_goto_ptr();
> > -}
> > +tcg_gen_lookup_and_goto_ptr();
> >  }
> >  ctx->base.is_jmp = DISAS_NORETURN;
> >  }
> > @@ -3011,17 +3007,10 @@ static void avr_tr_tb_stop(DisasContextBase
> *dcbase, CPUState *cs)
> >  tcg_gen_movi_tl(cpu_pc, ctx->npc);
> >  /* fall through */
> >  case DISAS_LOOKUP:
> > -if (!ctx->base.singlestep_enabled) {
> > -tcg_gen_lookup_and_goto_ptr();
> > -break;
> > -}
> > -/* fall through */
> > +tcg_gen_lookup_and_goto_ptr();
> > +break;
> >  case DISAS_EXIT:
> > -if (ctx->base.singlestep_enabled) {
> > -gen_helper_debug(cpu_env);
> > -} else {
> > -tcg_gen_exit_tb(NULL, 0);
> > -}
> > +tcg_gen_exit_tb(NULL, 0);
> >  break;
> >  default:
> >  g_assert_not_reached();
> >
>
> Reviewed-by: Philippe Mathieu-Daudé 
>
> Not related to this patch, but looking at the last
> gen_helper_debug() use:
>
> /*
>  *  The BREAK instruction is used by the On-chip Debug system, and is
>  *  normally not used in the application software. When the BREAK
> instruction is
>  *  executed, the AVR CPU is set in the Stopped Mode. This gives the
> On-chip
>  *  Debugger access to internal resources.  If any Lock bits are set, or
> either
>  *  the JTAGEN or OCDEN Fuses are unprogrammed, the CPU will treat the
> BREAK
>  *  instruction as a NOP and will not enter the Stopped mode.  This
> instruction
>  *  is not available in all devices. Refer to the device specific
> instruction
>  *  set summary.
>  */
> static bool trans_BREAK(DisasContext *ctx, arg_BREAK *a)
> {
> if (!avr_have_feature(ctx, AVR_FEATURE_BREAK)) {
> return true;
> }
>
> #ifdef BREAKPOINT_ON_BREAK
> tcg_gen_movi_tl(cpu_pc, ctx->npc - 1);
> gen_helper_debug(cpu_env);
> ctx->base.is_jmp = DISAS_EXIT;
> #else
> /* NOP */
> #endif
>
> return true;
> }
>
> Shouldn't we have a generic 'bool gdbstub_is_attached()' in
> "exec/gdbstub.h", then use it in replay_gdb_attached() and
> trans_BREAK() instead of this BREAKPOINT_ON_BREAK build-time
> definitions?
>


-- 
Best Regards,
Michael Rolnik


Re: [PATCH v2 0/6] migration/ram: Optimize for virtio-mem via RamDiscardManager

2021-07-22 Thread Dr. David Alan Gilbert
* David Hildenbrand (da...@redhat.com) wrote:
> virtio-mem exposes a dynamic amount of memory within RAMBlocks by
> coordinating with the VM. Memory within a RAMBlock can either get
> plugged and consequently used by the VM, or unplugged and consequently no
> longer used by the VM. Logical unplug is realized by discarding the
> physical memory backing for virtual memory ranges, similar to memory
> ballooning.
> 
> However, important difference to virtio-balloon are:
> 
> a) A virtio-mem device only operates on its assigned memory region /
>RAMBlock ("device memory")
> b) Initially, all device memory is logically unplugged
> c) Virtual machines will never accidentally reuse memory that is currently
>logically unplugged. The spec defines most accesses to unplugged memory
>as "undefined behavior" -- except reading unplugged memory, which is
>currently expected to work, but that will change in the future.
> d) The (un)plug granularity is in the range of megabytes -- "memory blocks"
> e) The state (plugged/unplugged) of a memory block is always known and
>properly tracked.
> 
> Whenever memory blocks within the RAMBlock get (un)plugged, changes are
> communicated via the RamDiscardManager to other QEMU subsystems, most
> prominently vfio which updates the DMA mapping accordingly. "Unplugging"
> corresponds to "discarding" and "plugging" corresponds to "populating".
> 
> While migrating (precopy/postcopy) that state of such memory blocks cannot
> change.

So no plugging/unplugging can happen during the migration?

> We never ever want to migrate such logically unplugged memory,
> because it can result in an unintended memory consumption both, on the
> source (when reading memory from some memory backends) and on the
> destination (when writing memory). Further, migration time can be heavily
> reduced when skipping logically unplugged blocks and we avoid populating
> unnecessary page tables in Linux.
> 
> Right now, virtio-mem reuses the free page hinting infrastructure during
> precopy to exclude all logically unplugged ("discarded") parts from the
> migration stream. However, there are some scenarios that are not handled
> properly and need fixing. Further, there are some ugly corner cases in
> postcopy code and background snapshotting code that similarly have to
> handle such special RAMBlocks.
> 
> Let's reuse the RamDiscardManager infrastructure to essentially handle
> precopy, postcopy and background snapshots cleanly, which means:
> 
> a) In precopy code, always clearing all dirty bits from the bitmap that
>correspond to discarded range, whenever we update the dirty bitmap. This
>results in logically unplugged memory to never get migrated.

Have you seen cases where discarded areas are being marked as dirty?
That suggests something somewhere is writing to them and shouldn't be.

Dave

> b) In postcopy code, placing a zeropage when requested to handle a page
>falling into a discarded range -- because the source will never send it.
> c) In background snapshot code, never populating discarded ranges, not even
>with the shared zeropage, to avoid unintended memory consumption,
>especially in the future with hugetlb and shmem.
> 
> Detail: When realizing a virtio-mem devices, it will register the RAM
> for migration via vmstate_register_ram(). Further, it will
> set itself as the RamDiscardManager for the corresponding memory
> region of the RAMBlock via memory_region_set_ram_discard_manager().
> Last but not least, memory device code will actually map the
> memory region into guest physical address space. So migration
> code can always properly identify such RAMBlocks.
> 
> Tested with precopy/postcopy on shmem, where even reading unpopulated
> memory ranges will populate actual memory and not the shared zeropage.
> Tested with background snapshots on anonymous memory, because other
> backends are not supported yet with upstream Linux.
> 
> Idealy, this should all go via the migration tree.
> 
> v1 -> v2:
> - "migration/ram: Handle RAMBlocks with a RamDiscardManager on the
>migration source"
> -- Added a note how it interacts with the clear_bmap and what we might want
>to further optimize in the future when synchronizing bitmaps.
> 
> Cc: "Michael S. Tsirkin" 
> Cc: Paolo Bonzini 
> Cc: Juan Quintela 
> Cc: "Dr. David Alan Gilbert" 
> Cc: Eduardo Habkost 
> Cc: Peter Xu 
> Cc: Andrey Gruzdev 
> Cc: Marek Kedzierski 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Alex Williamson 
> Cc: Pankaj Gupta 
> 
> David Hildenbrand (6):
>   memory: Introduce replay_discarded callback for RamDiscardManager
>   virtio-mem: Implement replay_discarded RamDiscardManager callback
>   migration/ram: Handle RAMBlocks with a RamDiscardManager on the
> migration source
>   virtio-mem: Drop precopy notifier
>   migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the
> destination
>   migration/ram: Handle RAMBlocks with a 

[PATCH for-6.1? 3/6] jobs: Give Job.force_cancel more meaning

2021-07-22 Thread Max Reitz
We largely have two cancel modes for jobs:

First, there is actual cancelling.  The job is terminated as soon as
possible, without trying to reach a consistent result.

Second, we have mirror in the READY state.  Technically, the job is not
really cancelled, but it just is a different completion mode.  The job
can still run for an indefinite amount of time while it tries to reach a
consistent result.

We want to be able to clearly distinguish which cancel mode a job is in
(when it has been cancelled).  We can use Job.force_cancel for this, but
right now it only reflects cancel requests from the user with
force=true, but clearly, jobs that do not even distinguish between
force=false and force=true are effectively always force-cancelled.

So this patch has Job.force_cancel signify whether the job will
terminate as soon as possible (force_cancel=true) or whether it will
effectively remain running despite being "cancelled"
(force_cancel=false).

To this end, we let jobs that provide JobDriver.cancel() tell the
generic job code whether they will terminate as soon as possible or not,
and for jobs that do not provide that method we assume they will.

Signed-off-by: Max Reitz 
---
 include/qemu/job.h | 11 ++-
 block/backup.c |  3 ++-
 block/mirror.c | 24 ++--
 job.c  |  6 +-
 4 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 5e8edbc2c8..8aa90f7395 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -253,8 +253,17 @@ struct JobDriver {
 
 /**
  * If the callback is not NULL, it will be invoked in job_cancel_async
+ *
+ * This function must return true if the job will be cancelled
+ * immediately without any further I/O (mandatory if @force is
+ * true), and false otherwise.  This lets the generic job layer
+ * know whether a job has been truly (force-)cancelled, or whether
+ * it is just in a special completion mode (like mirror after
+ * READY).
+ * (If the callback is NULL, the job is assumed to terminate
+ * without I/O.)
  */
-void (*cancel)(Job *job, bool force);
+bool (*cancel)(Job *job, bool force);
 
 
 /** Called when the job is freed */
diff --git a/block/backup.c b/block/backup.c
index bd3614ce70..513e1c8a0b 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -331,11 +331,12 @@ static void coroutine_fn backup_set_speed(BlockJob *job, 
int64_t speed)
 }
 }
 
-static void backup_cancel(Job *job, bool force)
+static bool backup_cancel(Job *job, bool force)
 {
 BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
 
 bdrv_cancel_in_flight(s->target_bs);
+return true;
 }
 
 static const BlockJobDriver backup_job_driver = {
diff --git a/block/mirror.c b/block/mirror.c
index d73b704473..c3514f4196 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1089,9 +1089,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 }
 trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
 job_sleep_ns(>common.job, delay_ns);
-if (job_is_cancelled(>common.job) &&
-(!s->synced || s->common.job.force_cancel))
-{
+if (job_is_cancelled(>common.job) && s->common.job.force_cancel) {
 break;
 }
 s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1103,7 +1101,7 @@ immediate_exit:
  * or it was cancelled prematurely so that we do not guarantee that
  * the target is a copy of the source.
  */
-assert(ret < 0 || ((s->common.job.force_cancel || !s->synced) &&
+assert(ret < 0 || (s->common.job.force_cancel &&
job_is_cancelled(>common.job)));
 assert(need_drain);
 mirror_wait_for_all_io(s);
@@ -1189,14 +1187,27 @@ static bool mirror_drained_poll(BlockJob *job)
 return !!s->in_flight;
 }
 
-static void mirror_cancel(Job *job, bool force)
+static bool mirror_cancel(Job *job, bool force)
 {
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
 BlockDriverState *target = blk_bs(s->target);
 
-if (force || !job_is_ready(job)) {
+/*
+ * Before the job is READY, we treat any cancellation like a
+ * force-cancellation.
+ */
+force = force || !job_is_ready(job);
+
+if (force) {
 bdrv_cancel_in_flight(target);
 }
+return force;
+}
+
+static bool commit_active_cancel(Job *job, bool force)
+{
+/* Same as above in mirror_cancel() */
+return force || !job_is_ready(job);
 }
 
 static const BlockJobDriver mirror_job_driver = {
@@ -1226,6 +1237,7 @@ static const BlockJobDriver commit_active_job_driver = {
 .abort  = mirror_abort,
 .pause  = mirror_pause,
 .complete   = mirror_complete,
+.cancel = commit_active_cancel,
 },
 .drained_poll   = mirror_drained_poll,
 };
diff --git a/job.c 

Re: [PATCH for-6.2 v2 04/11] machine: Use the computed parameters to calculate omitted cpus

2021-07-22 Thread Andrew Jones
On Thu, Jul 22, 2021 at 12:42:55PM +0800, wangyanan (Y) wrote:
> On 2021/7/20 0:42, Andrew Jones wrote:
> > On Mon, Jul 19, 2021 at 11:20:36AM +0800, Yanan Wang wrote:
> > > Currently we directly calculate the omitted cpus based on the already
> > > provided parameters. This makes some cmdlines like:
> > >-smp maxcpus=16
> > >-smp sockets=2,maxcpus=16
> > >-smp sockets=2,dies=2,maxcpus=16
> > >-smp sockets=2,cores=4,maxcpus=16
> > > not work. We should probably use the computed paramters to calculate
> > > cpus when maxcpus is provided while cpus is omitted, which will make
> > > above configs start to work.
> > > 
> > > Note: change in this patch won't affect any existing working cmdlines
> > > but allows more incomplete configs to be valid.
> > > 
> > > Signed-off-by: Yanan Wang 
> > > ---
> > >   hw/core/machine.c | 17 +
> > >   1 file changed, 9 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index c9f15b15a5..668f0a1553 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -767,26 +767,27 @@ static void smp_parse(MachineState *ms, 
> > > SMPConfiguration *config, Error **errp)
> > >   return;
> > >   }
> > > -/* compute missing values, prefer sockets over cores over threads */
> > >   maxcpus = maxcpus > 0 ? maxcpus : cpus;
> > > -if (cpus == 0) {
> > > -sockets = sockets > 0 ? sockets : 1;
> > > -cores = cores > 0 ? cores : 1;
> > > -threads = threads > 0 ? threads : 1;
> > > -cpus = sockets * dies * cores * threads;
> > > -maxcpus = maxcpus > 0 ? maxcpus : cpus;
> > > -} else if (sockets == 0) {
> > > +/* compute missing values, prefer sockets over cores over threads */
> > > +if (sockets == 0) {
> > >   cores = cores > 0 ? cores : 1;
> > >   threads = threads > 0 ? threads : 1;
> > >   sockets = maxcpus / (dies * cores * threads);
> > > +sockets = sockets > 0 ? sockets : 1;
> > >   } else if (cores == 0) {
> > >   threads = threads > 0 ? threads : 1;
> > >   cores = maxcpus / (sockets * dies * threads);
> > > +cores = cores > 0 ? cores : 1;
> > >   } else if (threads == 0) {
> > >   threads = maxcpus / (sockets * dies * cores);
> > > +threads = threads > 0 ? threads : 1;
> > >   }
> > I didn't think we wanted this rounding which this patch adds back into
> > cores and threads and now also sockets.
> Firstly, I think we can agree that with or without the rounding, the invalid
> configs will still be reported as invalid. So the only difference is in the
> err
> message (either report 0 or 1 of a fractional parameter). :)

An error message that says the inputs produced a fractional parameter
would be even better. If the code gets too hairy because some parameters
may be zero and without additional checks we'd risk divide by zeros,
then maybe we should output ("fractional %s", param_name) and exit at the
same places we're currently rounding up.

> 
> I added back the rounding because this patch indeed need it, we start
> to use the computed parameters to calculate cpus, so we have to ensure
> that the computed parameters are at least 1. If both cpus and maxcpus
> are omitted (e.g. -smp sockets=16), without the rounding we will get
> zeroed cpus and maxcpus, and with the rounding we will get valid result
> like "cpus=16,sockets=16,cores=1,threads=1,maxcpus=16".
> 
> If a "0" or "1" in the error message doesn't make so much difference as
> assumed for the error reporting, I suggest that we probably can keep the
> rounding which makes the parser code concise.
> > > +/* use the computed parameters to calculate the omitted cpus */
> > > +cpus = cpus > 0 ? cpus : sockets * dies * cores * threads;
> > > +maxcpus = maxcpus > 0 ? maxcpus : cpus;
> > It doesn't really matter, but I think I'd rather write this like
> > 
> >   maxcpus = maxcpus > 0 ? maxcpus : sockets * dies * cores * threads;
> >   cpus = cpus > 0 ? cpus : maxcpus;
> Yes, there is no functional difference. But I think maybe we'd better keep
> some consistence with the "maxcpus = maxcpus > 0 ? maxcpus : cpus"
> at top this function and also with the smp doc in qemu-option.hx i.e.
> "If omitted the maximum number of CPUs will be set to match the initial
> CPU count" ?

I won't argue it too much, but I think we should be thinking about maxcpus
more than cpus if we're thinking about cpu topologies. I'd rather have
users keep in mind that whatever their topology generates is the max and
if they don't want to expose that many cpus to the guest then they should
provide an additional, smaller number 'cpus'. To get that mindset we may
need to adjust the documentation.

Thanks,
drew




[PATCH v2 6/6] util/oslib-posix: Forward SIGBUS to MCE handler under Linux

2021-07-22 Thread David Hildenbrand
Temporarily modifying the SIGBUS handler is really nasty, as we might be
unlucky and receive an MCE SIGBUS while having our handler registered.
Unfortunately, there is no way around messing with SIGBUS when
MADV_POPULATE_WRITE is not applicable or not around.

Let's forward SIGBUS that don't belong to us to the already registered
handler and document the situation.

Signed-off-by: David Hildenbrand 
---
 softmmu/cpus.c |  4 
 util/oslib-posix.c | 36 +---
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 071085f840..23bca46b07 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -352,6 +352,10 @@ static void qemu_init_sigbus(void)
 {
 struct sigaction action;
 
+/*
+ * ALERT: when modifying this, take care that SIGBUS forwarding in
+ * os_mem_prealloc() will continue working as expected.
+ */
 memset(, 0, sizeof(action));
 action.sa_flags = SA_SIGINFO;
 action.sa_sigaction = sigbus_handler;
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 7c75848a67..4f10108600 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -95,6 +95,7 @@ typedef struct MemsetThread MemsetThread;
 
 /* used by sigbus_handler() */
 static MemsetContext *sigbus_memset_context;
+struct sigaction sigbus_oldact;
 static QemuMutex sigbus_mutex;
 
 static QemuMutex page_mutex;
@@ -446,7 +447,11 @@ const char *qemu_get_exec_dir(void)
 return exec_dir;
 }
 
+#ifdef CONFIG_LINUX
+static void sigbus_handler(int signal, siginfo_t *siginfo, void *ctx)
+#else /* CONFIG_LINUX */
 static void sigbus_handler(int signal)
+#endif /* CONFIG_LINUX */
 {
 int i;
 
@@ -459,6 +464,26 @@ static void sigbus_handler(int signal)
 }
 }
 }
+
+#ifdef CONFIG_LINUX
+/*
+ * We assume that the MCE SIGBUS handler could have been registered. We
+ * should never receive BUS_MCEERR_AO on any of our threads, but only on
+ * the main thread registered for PR_MCE_KILL_EARLY. Further, we should not
+ * receive BUS_MCEERR_AR triggered by action of other threads on one of
+ * our threads. So, no need to check for unrelated SIGBUS when seeing one
+ * for our threads.
+ *
+ * We will forward to the MCE handler, which will either handle the SIGBUS
+ * or reinstall the default SIGBUS handler and reraise the SIGBUS. The
+ * default SIGBUS handler will crash the process, so we don't care.
+ */
+if (sigbus_oldact.sa_flags & SA_SIGINFO) {
+sigbus_oldact.sa_sigaction(signal, siginfo, ctx);
+return;
+}
+#endif /* CONFIG_LINUX */
+warn_report("os_mem_prealloc: unrelated SIGBUS detected and ignored");
 }
 
 static void *do_touch_pages(void *arg)
@@ -628,10 +653,10 @@ void os_mem_prealloc(int fd, char *area, size_t memory, 
int smp_cpus,
 {
 static gsize initialized;
 int ret;
-struct sigaction act, oldact;
 size_t hpagesize = qemu_fd_getpagesize(fd);
 size_t numpages = DIV_ROUND_UP(memory, hpagesize);
 bool use_madv_populate_write;
+struct sigaction act;
 
 /*
  * Sense on every invocation, as MADV_POPULATE_WRITE cannot be used for
@@ -647,10 +672,15 @@ void os_mem_prealloc(int fd, char *area, size_t memory, 
int smp_cpus,
 
 qemu_mutex_lock(_mutex);
 memset(, 0, sizeof(act));
+#ifdef CONFIG_LINUX
+act.sa_sigaction = _handler;
+act.sa_flags = SA_SIGINFO;
+#else /* CONFIG_LINUX */
 act.sa_handler = _handler;
 act.sa_flags = 0;
+#endif /* CONFIG_LINUX */
 
-ret = sigaction(SIGBUS, , );
+ret = sigaction(SIGBUS, , _oldact);
 if (ret) {
 error_setg_errno(errp, errno,
 "os_mem_prealloc: failed to install signal handler");
@@ -666,7 +696,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int 
smp_cpus,
 }
 
 if (!use_madv_populate_write) {
-ret = sigaction(SIGBUS, , NULL);
+ret = sigaction(SIGBUS, _oldact, NULL);
 if (ret) {
 /* Terminate QEMU since it can't recover from error */
 perror("os_mem_prealloc: failed to reinstall signal handler");
-- 
2.31.1




Re: [PATCH] hw/display: fix virgl reset regression

2021-07-22 Thread Marc-André Lureau
Hi

On Thu, Jul 22, 2021 at 4:12 PM Gerd Hoffmann  wrote:

> On Fri, Jul 02, 2021 at 04:32:21PM +0400, marcandre.lur...@redhat.com
> wrote:
> > From: Marc-André Lureau 
> >
> > Before commit 49afbca3b00e8e517d54964229a794b51768deaf ("virtio-gpu: drop
> > use_virgl_renderer"), use_virgl_renderer was preventing calling GL
> > functions from non-GL context threads. The innocuously looking
> >
> >   g->parent_obj.use_virgl_renderer = false;
> >
> > was set the first time virtio_gpu_gl_reset() was called, during
> > pc_machine_reset() in the main thread. Further virtio_gpu_gl_reset()
> > calls in IO threads, without associated GL context, were thus skipping
> > GL calls and avoided warnings or crashes (see also
> > https://gitlab.freedesktop.org/virgl/virglrenderer/-/issues/226).
>
> Conflicts with patch by Akihiko Odaki fixing the same thing or a
> related issue:
>
> virtio-gpu: Call Virgl only in the main thread
>
>
> https://patchwork.ozlabs.org/project/qemu-devel/patch/20210617113520.25973-1-akihiko.od...@gmail.com/
>
> Can you have a look please and suggest how to handle this?
>

Thanks, I didn't notice we were trying to solve the same issue.

Akihiko's patch indeed seems to solve the crash, but doesn't solve the
flood of asserts (on wayland):
qemu-system-x86_64: Gtk: gtk_gl_area_make_current: assertion
'gtk_widget_get_realized (widget)' failed
qemu-system-x86_64: Gdk: gdk_window_create_gl_context: assertion
'GDK_IS_WINDOW (window)' failed
qemu-system-x86_64: Gdk: gdk_gl_context_set_required_version: assertion
'GDK_IS_GL_CONTEXT (context)' failed
... and many more

My patch cleans it for me, I would suggest to take mine.

Fwiw, I just tested also on X11, and we have another regression that seems
unrelated:
qemu-system-x86_64: ../src/dispatch_common.c:858: epoxy_get_proc_address:
Assertion `0 && "Couldn't find current GLX or EGL context.\n"' failed.

sigh..



-- 
Marc-André Lureau


Re: [PATCH for-6.2 v2 11/11] tests/unit: Add a unit test for smp parsing

2021-07-22 Thread Andrew Jones
On Thu, Jul 22, 2021 at 02:15:18PM +0800, wangyanan (Y) wrote:
> On 2021/7/20 2:57, Andrew Jones wrote:
> > On Mon, Jul 19, 2021 at 11:20:43AM +0800, Yanan Wang wrote:
> > > Add a QEMU unit test for the parsing of given SMP configuration.
> > > Since all the parsing logic is in generic function smp_parse(),
> > > this test passes diffenent SMP configurations to the function
> > > and compare the parsing result with what is expected.
> > > 
> > > In the test, all possible collections of the topology parameters
> > > and the corressponding expected results are listed, including the
> > > valid and invalid ones.
> > > 
> > > The preference of sockets over cores and the preference of cores
> > > over sockets, and the support of multi-dies are also considered.
> > > 
> > > Signed-off-by: Yanan Wang 
> > > ---
> > >   MAINTAINERS |1 +
> > >   tests/unit/meson.build  |1 +
> > >   tests/unit/test-smp-parse.c | 1117 +++
> > >   3 files changed, 1119 insertions(+)
> > >   create mode 100644 tests/unit/test-smp-parse.c
> > > 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index 70633e3bf4..160dba2e57 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -1636,6 +1636,7 @@ F: include/hw/boards.h
> > >   F: include/hw/core/cpu.h
> > >   F: include/hw/cpu/cluster.h
> > >   F: include/sysemu/numa.h
> > > +F: tests/unit/test-smp-parse.c
> > >   T: git https://gitlab.com/ehabkost/qemu.git machine-next
> > >   Xtensa Machines
> > > diff --git a/tests/unit/meson.build b/tests/unit/meson.build
> > > index 3e0504dd21..694a924627 100644
> > > --- a/tests/unit/meson.build
> > > +++ b/tests/unit/meson.build
> > > @@ -44,6 +44,7 @@ tests = {
> > > 'test-uuid': [],
> > > 'ptimer-test': ['ptimer-test-stubs.c', meson.source_root() / 
> > > 'hw/core/ptimer.c'],
> > > 'test-qapi-util': [],
> > > +  'test-smp-parse': [qom, meson.source_root() / 'hw/core/machine-smp.c'],
> > >   }
> > >   if have_system or have_tools
> > > diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
> > > new file mode 100644
> > > index 00..bc1d324c3d
> > > --- /dev/null
> > > +++ b/tests/unit/test-smp-parse.c
> > > @@ -0,0 +1,1117 @@
> > > +/*
> > > + * SMP parsing unit-tests
> > > + *
> > > + * Copyright (C) 2021, Huawei, Inc.
> > > + *
> > > + * Authors:
> > > + *  Yanan Wang 
> > > + *
> > > + * This work is licensed under the terms of the GNU LGPL, version 2.1 or 
> > > later.
> > > + * See the COPYING.LIB file in the top-level directory.
> > > + */
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "qom/object.h"
> > > +#include "qemu/module.h"
> > > +#include "qapi/error.h"
> > > +
> > > +#include "hw/boards.h"
> > > +
> > > +#define T true
> > > +#define F false
> > > +
> > > +/**
> > > + * SMPTestData:
> > > + * @config - the given SMP configuration for parsing
> > > + * @should_be_valid - whether the given configuration is supposed to be 
> > > valid
> > > + * @expect - the CPU topology info expected to be parsed out
> > > + */
> > > +typedef struct SMPTestData {
> > > +SMPConfiguration config;
> > > +bool should_be_valid;
> > Long way to say 'valid'.
> Indeed..., "valid" should be enough.
> > > +CpuTopology expect;
> > > +} SMPTestData;
> > > +
> > > +/* the specific machine type info for this test */
> > > +static const TypeInfo smp_machine_info = {
> > > +.name = TYPE_MACHINE,
> > > +.parent = TYPE_OBJECT,
> > > +.class_size = sizeof(MachineClass),
> > > +.instance_size = sizeof(MachineState),
> > > +};
> > > +
> > > +/*
> > > + * prefer sockets over cores over threads before 6.2.
> > > + * all possible SMP configurations and the corressponding expected 
> > > outputs
> > corresponding (please run spell check on your commit messages)
> > 
> Ok, I missed the check.
> > > + * are listed for testing, including the valid and invalid ones.
> > > + */
> > > +static struct SMPTestData prefer_sockets[] = {
> > > +{
> > > +/* config: no smp configuration provided
> > > + * expect: cpus=1,sockets=1,dies=1,cores=1,threads=1,maxcpus=1 */
> > > +.config = (SMPConfiguration) { F, 0, F, 0, F, 0, F, 0, F, 0, F, 
> > > 0 },
> > SMPConfiguration and CpuTopology have named fields so we could drop the
> > 'expect: ...' comment line and instead do
> > 
> >   {
> >/* no configuration provided */
> >.config = { .has_cpus = F, .has_maxcpus = F, ... },
> >.valid = T,
> >.expect = { .sockets = 1, .cores = 1, ... },
> >   }, {
> >...
> >   }
> > 
> > which may be easier to maintain. OTOH, the concise form this approach has
> > is also nice.
> I tried the structure initialization with explicit name fields in it like
> above,
> actually we are supposed to do in this way so that we don't have to worry
> about the order change of the structure members.
> 
> But this would break the 80-char line limit or introduce more lines for

I wouldn't worry about 80-char (or even 90, which is when 

Re: [question] Shall we flush ITS tables into guest RAM when shutdown the VM?

2021-07-22 Thread Juan Quintela
Eric Auger  wrote:
> Hi Dave,
>
> On 7/6/21 4:19 PM, Dr. David Alan Gilbert wrote:
>> * Eric Auger (eric.au...@redhat.com) wrote:

...

>> Well, I initially wanted to know more about this scenario to determine
>> whether
>> a normal shutdown would fall into it.
> I think it was for save/restore use case. In that case you need to flush
> the KVM cache in memory on VM shutdown.
 Sorry for late reply.

 Can we distinguish from the 'RunState'?
 When we stop the VM, the RunState will be set. There are many types of
 RunState, such as RUN_STATE_FINISH_MIGRATE/RUN_STATE_SAVE_VM/
 RUN_STATE_SHUTDOWN/RUN_STATE_GUEST_PANICKED, etc.

 Maybe RUN_STATE_SHUTDOWN doesn't belong to save/restore use case,
 right?
>>> Adding Dave, Juan and Peter in the loop for migration expertise.
>>>
>>> At the moment we save the ARM ITS MSI controller tables whenever the VM
>>> gets stopped. Saving the tables from KVM caches into the guest RAM is
>>> needed for migration and save/restore use cases.
>>> However with GICv4 this fails at KVM level because some MSIs are
>>> forwarded and saving their state is not supported with GICv4.
>>>
>>> While GICv4 migration is not supported we would like the VM to work
>>> properly, ie. being stoppable without taking care of table saving.
>>>
>>> So could we be more precise and identifiy the save/restore and migration
>>> use cases instead of saving the tables on each VM shutdown.
>> During the precopy migration (not sure about others), we do:
>>
>> static void migration_completion(MigrationState *s)
>> {
>> 
>> ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>> ...
>> ret = qemu_savevm_state_complete_precopy(s->to_dst_file, 
>> false,
>>  inactivate);
>>
>> so I think we do have that state there to hook off.
>
> That's consistent with what you suggested in the past ans what is logged
> in the commit message of
>
> cddafd8f353d2d251b1a5c6c948a577a85838582 ("hw/intc/arm_gicv3_its:
> Implement state save/restore").

Hi

Ouch, it is really a mess.  Why do we need to save it to RAM instead of
saving it to anywhere else?

I guess that the answer is that we don't want to know what the state is,
so we are mgrating a opaque blob.

> However does the save/restore enters that state. If I remember
> correctly that's why I decided to do the save on each VM stop instead.

>>> The tables are saved into guest RAM so when need the CPUs and devices to
>>> be stopped but we need the guest RAM to be saved after the ITS save
>>> operation.

Saving this data into RAM dirties the bitmaps, right?


>> Yeh so what should happen is that you:
>>a) Iterate RAM a lot
>>b) You stop everything
>>  -> Flushes remaining changes into RAM
>>c) Transmit device state and last bits of RAM changes.
>>
>> so that flush should happen at (b).
> That's correct.

/* does a state transition even if the VM is already stopped,
   current state is forgotten forever */
int vm_stop_force_state(RunState state)
{
if (runstate_is_running()) {
return vm_stop(state);
} else {
int ret;
runstate_set(state);

bdrv_drain_all();
/* Make sure to return an error if the flush in a previous vm_stop()
 * failed. */
ret = bdrv_flush_all();
trace_vm_stop_flush_all(ret);
return ret;
}
}

You really want to hook here, like the block layer.
But as far as I can see, there is no generic way to put a hook there.

And the path is different if the machine is running or not.

Thinking about how to put a hook there.
Welcome if you have a good name for the hook.

Later, Juan.




Re: [PATCH for-6.1 0/1] machine: Disallow specifying topology parameters as zero

2021-07-22 Thread Andrew Jones
On Thu, Jul 22, 2021 at 08:02:16AM +0200, Cornelia Huck wrote:
> On Thu, Jul 22 2021, Yanan Wang  wrote:
> 
> > In the SMP configuration, we should either specify a topology
> > parameter with a reasonable value (equal to or greater than 1)
> > or just leave it omitted and QEMU will calculate its value.
> > Configurations which explicitly specify the topology parameters
> > as zero like "sockets=0" are meaningless, so disallow them.
> >
> > However; the commit 1e63fe685804d
> > (machine: pass QAPI struct to mc->smp_parse) has documented that
> > '0' has the same semantics as omitting a parameter in the qapi
> > comment for SMPConfiguration. So this patch fixes the doc and
> > also adds the corresponding sanity check in the smp parsers.
> 
> Are we expecting any real users to have used that 'parameter=0'
> behaviour? I expect that libvirt and other management software already
> did the right thing; unfortunately, sometimes weird configuration lines
> tend to persist in search results.

I understand this concern. I think the only documentation we had prior to
commit 1e63fe685804 was

DEF("smp", HAS_ARG, QEMU_OPTION_smp,
"-smp 
[cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
"set the number of CPUs to 'n' [default=1]\n"
"maxcpus= maximum number of total cpus, including\n"
"offline CPUs for hotplug, etc\n"
"cores= number of CPU cores on one socket (for PC, it's on 
one die)\n"
"threads= number of threads on one CPU core\n"
"dies= number of CPU dies on one socket (for PC only)\n"
"sockets= number of discrete sockets in the system\n",
QEMU_ARCH_ALL)
SRST
``-smp 
[cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
are supported. On Sparc32 target, Linux limits the number of usable
CPUs to 4. For the PC target, the number of cores per die, the
number of threads per cores, the number of dies per packages and the
total number of sockets can be specified. Missing values will be
computed. If any on the three values is given, the total number of
CPUs n can be omitted. maxcpus specifies the maximum number of
hotpluggable CPUs.
ERST

This doesn't mention zero inputs and even implies non-zero inputs.

I'm not sure if we need to worry about the odd command line that used zero
for some parameters. What do you think?

Thanks,
drew


> 
> >
> > This patch originly comes form [1], and it was suggested that
> > this patch fixing the doc should be sent for 6.1 to avoid a
> > deprecation process in the future.
> >
> > [1] https://lore.kernel.org/qemu-devel/ypwsthpiza3mf...@redhat.com/
> >
> > Yanan Wang (1):
> >   machine: Disallow specifying topology parameters as zero
> >
> >  hw/core/machine.c | 30 ++
> >  hw/i386/pc.c  | 33 -
> >  qapi/machine.json |  6 +++---
> >  3 files changed; 49 insertions(+); 20 deletions(-)
> 




Re: [PATCH 1/2] qapi: introduce forwarding visitor

2021-07-22 Thread Markus Armbruster
Paolo Bonzini  writes:

> This new adaptor visitor takes a single field of the adaptee, and exposes it
> with a different name.
>
> This will be used for QOM alias properties.  Alias targets can of course
> have a different name than the alias property itself (e.g. a machine's
> pflash0 might be an alias of a property named 'drive').  When the target's
> getter or setter invokes the visitor, it will use a different name than
> what the caller expects, and the visitor will not be able to find it
> (or will consume erroneously).
>
> The solution is for alias getters and setters to wrap the incoming
> visitor, and forward the sole field that the target is expecting while
> renaming it appropriately.

Double-checking: the other fields are not accessible via this visitor.
Correct?

>
> Signed-off-by: Paolo Bonzini 
> ---
>  include/qapi/forward-visitor.h|  27 +++
>  qapi/meson.build  |   1 +
>  qapi/qapi-forward-visitor.c   | 307 ++
>  tests/unit/meson.build|   1 +
>  tests/unit/test-forward-visitor.c | 165 
>  5 files changed, 501 insertions(+)
>  create mode 100644 include/qapi/forward-visitor.h
>  create mode 100644 qapi/qapi-forward-visitor.c
>  create mode 100644 tests/unit/test-forward-visitor.c

Missing: update of the big comment in include/qapi/visitor.h.  Can be
done on top.

>
> diff --git a/include/qapi/forward-visitor.h b/include/qapi/forward-visitor.h
> new file mode 100644
> index 00..c7002d53e6
> --- /dev/null
> +++ b/include/qapi/forward-visitor.h
> @@ -0,0 +1,27 @@
> +/*
> + * Forwarding visitor
> + *
> + * Copyright Red Hat, Inc. 2021
> + *
> + * Author: Paolo Bonzini 
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2.1 or 
> later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#ifndef FORWARD_VISITOR_H
> +#define FORWARD_VISITOR_H
> +
> +#include "qapi/visitor.h"
> +
> +typedef struct ForwardFieldVisitor ForwardFieldVisitor;
> +
> +/*
> + * The forwarding visitor only expects a single name, @from, to be passed for
> + * toplevel fields.  It is converted to @to and forward to the @target 
> visitor.
> + * Calls within a struct are forwarded without changing the name.
> + */
> +Visitor *visitor_forward_field(Visitor *target, const char *from, const char 
> *to);
> +
> +#endif
> diff --git a/qapi/meson.build b/qapi/meson.build
> index 376f4ceafe..c356a385e3 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -2,6 +2,7 @@ util_ss.add(files(
>'opts-visitor.c',
>'qapi-clone-visitor.c',
>'qapi-dealloc-visitor.c',
> +  'qapi-forward-visitor.c',
>'qapi-util.c',
>'qapi-visit-core.c',
>'qobject-input-visitor.c',
> diff --git a/qapi/qapi-forward-visitor.c b/qapi/qapi-forward-visitor.c
> new file mode 100644
> index 00..bc6412d52e
> --- /dev/null
> +++ b/qapi/qapi-forward-visitor.c
> @@ -0,0 +1,307 @@
> +/*
> + * Forward Visitor
> + *
> + * Copyright (C) 2021 Red Hat, Inc.
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2.1 or 
> later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include 
> +#include "qapi/compat-policy.h"
> +#include "qapi/error.h"
> +#include "qapi/forward-visitor.h"
> +#include "qapi/visitor-impl.h"
> +#include "qemu/queue.h"
> +#include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/qbool.h"
> +#include "qapi/qmp/qdict.h"
> +#include "qapi/qmp/qerror.h"
> +#include "qapi/qmp/qlist.h"
> +#include "qapi/qmp/qnull.h"
> +#include "qapi/qmp/qnum.h"
> +#include "qapi/qmp/qstring.h"
> +#include "qemu/cutils.h"
> +#include "qemu/option.h"
> +
> +struct ForwardFieldVisitor {
> +Visitor visitor;
> +
> +Visitor *target;
> +char *from;
> +char *to;
> +
> +int depth;
> +};

Comment the members?  In particular @depth.

> +
> +static ForwardFieldVisitor *to_ffv(Visitor *v)
> +{
> +return container_of(v, ForwardFieldVisitor, visitor);
> +}
> +
> +static bool forward_field_translate_name(ForwardFieldVisitor *v, const char 
> **name,
> + Error **errp)
> +{
> +if (v->depth) {
> +return true;
> +}

Succeed when we're in a sub-struct.

> +if (g_str_equal(*name, v->from)) {
> +*name = v->to;
> +return true;
> +}

Succeed when we're in the root struct and @name is the alias name.
Replace the alias name by the real one.

> +error_setg(errp, QERR_MISSING_PARAMETER, *name);
> +return false;

Fail when we're in the root struct and @name is not the alias name.

> +}

Can you explain why you treat names in sub-structs differently than
names other than the alias name in the root struct?

> +
> +static bool forward_field_check_struct(Visitor *v, Error **errp)
> +{
> +ForwardFieldVisitor *ffv = to_ffv(v);

Humor me: blank line between declarations and statements.

> +return visit_check_struct(ffv->target, errp);
> +}
> +
> 

Re: [PATCH v2 1/6] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()

2021-07-22 Thread David Hildenbrand

On 22.07.21 15:47, Daniel P. Berrangé wrote:

On Thu, Jul 22, 2021 at 03:39:50PM +0200, David Hildenbrand wrote:

On 22.07.21 15:31, Daniel P. Berrangé wrote:

On Thu, Jul 22, 2021 at 02:36:30PM +0200, David Hildenbrand wrote:

Let's sense support and use it for preallocation. MADV_POPULATE_WRITE
does not require a SIGBUS handler, doesn't actually touch page content,
and avoids context switches; it is, therefore, faster and easier to handle
than our current approach.

While MADV_POPULATE_WRITE is, in general, faster than manual
prefaulting, and especially faster with 4k pages, there is still value in
prefaulting using multiple threads to speed up preallocation.

More details on MADV_POPULATE_WRITE can be found in the Linux commit
4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault
page tables") and in the man page proposal [1].

[1] https://lkml.kernel.org/r/20210712083917.16361-1-da...@redhat.com

This resolves the TODO in do_touch_pages().

In the future, we might want to look into using fallocate(), eventually
combined with MADV_POPULATE_READ, when dealing with shared file
mappings.

Reviewed-by: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
---
   include/qemu/osdep.h |  7 
   util/oslib-posix.c   | 88 +---
   2 files changed, 74 insertions(+), 21 deletions(-)




@@ -497,6 +493,31 @@ static void *do_touch_pages(void *arg)
   return NULL;
   }
+static void *do_madv_populate_write_pages(void *arg)
+{
+MemsetThread *memset_args = (MemsetThread *)arg;
+const size_t size = memset_args->numpages * memset_args->hpagesize;
+char * const addr = memset_args->addr;
+int ret;
+
+if (!size) {
+return NULL;
+}
+
+/* See do_touch_pages(). */
+qemu_mutex_lock(_mutex);
+while (!threads_created_flag) {
+qemu_cond_wait(_cond, _mutex);
+}
+qemu_mutex_unlock(_mutex);
+
+ret = qemu_madvise(addr, size, QEMU_MADV_POPULATE_WRITE);
+if (ret) {
+memset_thread_failed = true;


I'm wondering if this use of memset_thread_failed is sufficient.

This is pre-existing from the current impl, and ends up being
used to set the bool result of 'touch_all_pages'. The caller
of that then does

  if (touch_all_pages(area, hpagesize, numpages, smp_cpus)) {
  error_setg(errp, "os_mem_prealloc: Insufficient free host memory "
  "pages available to allocate guest RAM");
  }

this was reasonable with the old impl, because the only reason
we ever see 'memset_thread_failed==true' is if we got SIGBUS
due to ENOMEM.

My concern is that madvise() has a bunch of possible errno
codes returned on failure, and we're not distinguishing
them. In the past this kind of thing has burnt us making
failures hard to debug.

Could we turn 'bool memset_thread_failed' into 'int memset_thread_errno'

Then, we can make 'touch_all_pages' have an 'Error **errp'
parameter, and it can directly call

   error_setg_errno(errp, memset_thead_errno, some message...)

when memset_thread_errno is non-zero, and thus we can remove
the generic message from the caller of touch_all_pages.

If you agree, it'd be best to refactor the existing code to
use this pattern in an initial patch.


We could also simply trace the return value, which should be comparatively
easy to add. We should be getting either -ENOMEM or -EHWPOISON. And the
latter is highly unlikely to happen when actually preallocating.

We made sure that we don't end up with -EINVAL as we're sensing of
MADV_POPULATE_WRITE works on the mapping.


Those are in the "normal" usage scenarios. I'm wondering about the
abnormal scenarios where QEMU code is mistakenly screwed up or
libvirt / mgmt app makes some config mistake. eg we can get
things like EPERM if selinux or seccomp block the madvise
syscall by mistake (common if EQMU is inside docker for example),
or can we get EINVAL if the 'addr' is not page aligned, and so on.


So when it comes to debugging, I'd actually prefer tracing -errno, as the
real error will be of little help to end users.


I don't care about the end users interpreting it, rather us as maintainers
who get a bug report containing insufficient info to diagnose the root
cause.


Well, okay. I'll have a look how this turns out.

--
Thanks,

David / dhildenb




Re: [PATCH 1/2] qxl: remove assert in qxl_pre_save.

2021-07-22 Thread Dr. David Alan Gilbert
* Gerd Hoffmann (kra...@redhat.com) wrote:
> Since commit 551dbd0846d2 ("migration: check pre_save return in
> vmstate_save_state") the pre_save hook can fail.  So lets finally
> use that to drop the guest-triggerable assert in qxl_pre_save().
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/display/qxl.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/display/qxl.c b/hw/display/qxl.c
> index 84f99088e0a0..3867b94fe236 100644
> --- a/hw/display/qxl.c
> +++ b/hw/display/qxl.c
> @@ -2283,7 +2283,9 @@ static int qxl_pre_save(void *opaque)
>  } else {
>  d->last_release_offset = (uint8_t *)d->last_release - ram_start;
>  }
> -assert(d->last_release_offset < d->vga.vram_size);
> +if (d->last_release_offset < d->vga.vram_size) {
> +return 1;

It would be great if there was an error_report or something there
so that we get some idea of what happened.

Dave

> +}
>  
>  return 0;
>  }
> -- 
> 2.31.1
> 
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v2 4/5] migration: Teach QEMUFile to be QIOChannel-aware

2021-07-22 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> migration uses QIOChannel typed qemufiles.  In follow up patches, we'll need
> the capability to identify this fact, so that we can get the backing 
> QIOChannel
> from a QEMUFile.
> 
> We can also define types for QEMUFile but so far since we only need to be able
> to identify QIOChannel, introduce a boolean which is simpler.
> 
> Introduce another helper qemu_file_get_ioc() to return the ioc backend of a
> qemufile if has_ioc is set.
> 
> No functional change.
> 
> Signed-off-by: Peter Xu 

Yep, one day we'll sort out the block case, but until then

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/qemu-file-channel.c |  4 ++--
>  migration/qemu-file.c | 17 -
>  migration/qemu-file.h |  4 +++-
>  migration/ram.c   |  2 +-
>  migration/savevm.c|  4 ++--
>  5 files changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
> index 867a5ed0c3..2f8b1fcd46 100644
> --- a/migration/qemu-file-channel.c
> +++ b/migration/qemu-file-channel.c
> @@ -187,11 +187,11 @@ static const QEMUFileOps channel_output_ops = {
>  QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc)
>  {
>  object_ref(OBJECT(ioc));
> -return qemu_fopen_ops(ioc, _input_ops);
> +return qemu_fopen_ops(ioc, _input_ops, true);
>  }
>  
>  QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc)
>  {
>  object_ref(OBJECT(ioc));
> -return qemu_fopen_ops(ioc, _output_ops);
> +return qemu_fopen_ops(ioc, _output_ops, true);
>  }
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 1eacf9e831..6338d8e2ff 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -55,6 +55,8 @@ struct QEMUFile {
>  Error *last_error_obj;
>  /* has the file has been shutdown */
>  bool shutdown;
> +/* Whether opaque points to a QIOChannel */
> +bool has_ioc;
>  };
>  
>  /*
> @@ -101,7 +103,7 @@ bool qemu_file_mode_is_not_valid(const char *mode)
>  return false;
>  }
>  
> -QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
> +QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops, bool has_ioc)
>  {
>  QEMUFile *f;
>  
> @@ -109,6 +111,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps 
> *ops)
>  
>  f->opaque = opaque;
>  f->ops = ops;
> +f->has_ioc = has_ioc;
>  return f;
>  }
>  
> @@ -851,3 +854,15 @@ void qemu_file_set_blocking(QEMUFile *f, bool block)
>  f->ops->set_blocking(f->opaque, block, NULL);
>  }
>  }
> +
> +/*
> + * Return the ioc object if it's a migration channel.  Note: it can return 
> NULL
> + * for callers passing in a non-migration qemufile.  E.g. see 
> qemu_fopen_bdrv()
> + * and its usage in e.g. load_snapshot().  So we need to check against NULL
> + * before using it.  If without the check, migration_incoming_state_destroy()
> + * could fail for load_snapshot().
> + */
> +QIOChannel *qemu_file_get_ioc(QEMUFile *file)
> +{
> +return file->has_ioc ? QIO_CHANNEL(file->opaque) : NULL;
> +}
> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> index a9b6d6ccb7..3f36d4dc8c 100644
> --- a/migration/qemu-file.h
> +++ b/migration/qemu-file.h
> @@ -27,6 +27,7 @@
>  
>  #include 
>  #include "exec/cpu-common.h"
> +#include "io/channel.h"
>  
>  /* Read a chunk of data from a file at the given position.  The pos argument
>   * can be ignored if the file is only be used for streaming.  The number of
> @@ -119,7 +120,7 @@ typedef struct QEMUFileHooks {
>  QEMURamSaveFunc *save_page;
>  } QEMUFileHooks;
>  
> -QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
> +QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops, bool has_ioc);
>  void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
>  int qemu_get_fd(QEMUFile *f);
>  int qemu_fclose(QEMUFile *f);
> @@ -179,5 +180,6 @@ void ram_control_load_hook(QEMUFile *f, uint64_t flags, 
> void *data);
>  size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
>   ram_addr_t offset, size_t size,
>   uint64_t *bytes_sent);
> +QIOChannel *qemu_file_get_ioc(QEMUFile *file);
>  
>  #endif
> diff --git a/migration/ram.c b/migration/ram.c
> index f728f5072f..08b3cb7a4a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -550,7 +550,7 @@ static int compress_threads_save_setup(void)
>  /* comp_param[i].file is just used as a dummy buffer to save data,
>   * set its ops to empty.
>   */
> -comp_param[i].file = qemu_fopen_ops(NULL, _ops);
> +comp_param[i].file = qemu_fopen_ops(NULL, _ops, false);
>  comp_param[i].done = true;
>  comp_param[i].quit = false;
>  qemu_mutex_init(_param[i].mutex);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 72848b946c..96b5e5d639 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -168,9 

Re: [PATCH v2 5/5] migration: Move the yank unregister of channel_close out

2021-07-22 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> It's efficient, but hackish to call yank unregister calls in channel_close(),
> especially it'll be hard to debug when qemu crashed with some yank function
> leaked.
> 
> Remove that hack, but instead explicitly unregister yank functions at the
> places where needed, they are:
> 
>   (on src)
>   - migrate_fd_cleanup
>   - postcopy_pause
> 
>   (on dst)
>   - migration_incoming_state_destroy
>   - postcopy_pause_incoming
> 
> Signed-off-by: Peter Xu 
> ---
>  migration/migration.c |  5 +
>  migration/qemu-file-channel.c |  3 ---
>  migration/savevm.c|  7 +++
>  migration/yank_functions.c| 14 ++
>  migration/yank_functions.h|  1 +
>  5 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index fa70400f98..bfeb65b8f7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -59,6 +59,7 @@
>  #include "multifd.h"
>  #include "qemu/yank.h"
>  #include "sysemu/cpus.h"
> +#include "yank_functions.h"
>  
>  #define MAX_THROTTLE  (128 << 20)  /* Migration transfer speed 
> throttling */
>  
> @@ -273,6 +274,7 @@ void migration_incoming_state_destroy(void)
>  }
>  
>  if (mis->from_src_file) {
> +migration_ioc_unregister_yank_from_file(mis->from_src_file);
>  qemu_fclose(mis->from_src_file);
>  mis->from_src_file = NULL;
>  }
> @@ -1811,6 +1813,7 @@ static void migrate_fd_cleanup(MigrationState *s)
>   * Close the file handle without the lock to make sure the
>   * critical section won't block for long.
>   */
> +migration_ioc_unregister_yank_from_file(tmp);
>  qemu_fclose(tmp);
>  }
>  
> @@ -3352,6 +3355,8 @@ static MigThrError postcopy_pause(MigrationState *s)
>  
>  /* Current channel is possibly broken. Release it. */
>  assert(s->to_dst_file);
> +/* Unregister yank for current channel */
> +migration_ioc_unregister_yank_from_file(s->to_dst_file);

Should this go inside the lock?

Dave

>  qemu_mutex_lock(>qemu_file_lock);
>  file = s->to_dst_file;
>  s->to_dst_file = NULL;
> diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
> index 2f8b1fcd46..bb5a5752df 100644
> --- a/migration/qemu-file-channel.c
> +++ b/migration/qemu-file-channel.c
> @@ -107,9 +107,6 @@ static int channel_close(void *opaque, Error **errp)
>  int ret;
>  QIOChannel *ioc = QIO_CHANNEL(opaque);
>  ret = qio_channel_close(ioc, errp);
> -if (OBJECT(ioc)->ref == 1) {
> -migration_ioc_unregister_yank(ioc);
> -}
>  object_unref(OBJECT(ioc));
>  return ret;
>  }
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 96b5e5d639..7b7b64bd13 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -65,6 +65,7 @@
>  #include "qemu/bitmap.h"
>  #include "net/announce.h"
>  #include "qemu/yank.h"
> +#include "yank_functions.h"
>  
>  const unsigned int postcopy_ram_discard_version;
>  
> @@ -2568,6 +2569,12 @@ static bool 
> postcopy_pause_incoming(MigrationIncomingState *mis)
>  /* Clear the triggered bit to allow one recovery */
>  mis->postcopy_recover_triggered = false;
>  
> +/*
> + * Unregister yank with either from/to src would work, since ioc behind 
> it
> + * is the same
> + */
> +migration_ioc_unregister_yank_from_file(mis->from_src_file);
> +
>  assert(mis->from_src_file);
>  qemu_file_shutdown(mis->from_src_file);
>  qemu_fclose(mis->from_src_file);
> diff --git a/migration/yank_functions.c b/migration/yank_functions.c
> index 23697173ae..8c08aef14a 100644
> --- a/migration/yank_functions.c
> +++ b/migration/yank_functions.c
> @@ -14,6 +14,7 @@
>  #include "qemu/yank.h"
>  #include "io/channel-socket.h"
>  #include "io/channel-tls.h"
> +#include "qemu-file.h"
>  
>  void migration_yank_iochannel(void *opaque)
>  {
> @@ -46,3 +47,16 @@ void migration_ioc_unregister_yank(QIOChannel *ioc)
>   QIO_CHANNEL(ioc));
>  }
>  }
> +
> +void migration_ioc_unregister_yank_from_file(QEMUFile *file)
> +{
> +QIOChannel *ioc = qemu_file_get_ioc(file);
> +
> +if (ioc) {
> +/*
> + * For migration qemufiles, we'll always reach here.  Though we'll 
> skip
> + * calls from e.g. savevm/loadvm as they don't use yank.
> + */
> +migration_ioc_unregister_yank(ioc);
> +}
> +}
> diff --git a/migration/yank_functions.h b/migration/yank_functions.h
> index 74c7f18c91..a7577955ed 100644
> --- a/migration/yank_functions.h
> +++ b/migration/yank_functions.h
> @@ -17,3 +17,4 @@
>  void migration_yank_iochannel(void *opaque);
>  void migration_ioc_register_yank(QIOChannel *ioc);
>  void migration_ioc_unregister_yank(QIOChannel *ioc);
> +void migration_ioc_unregister_yank_from_file(QEMUFile *file);
> -- 
> 2.31.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / 

[PULL 01/15] qemu-config: never call the callback after an error, fix leak

2021-07-22 Thread Paolo Bonzini
Ensure that the callback to qemu_config_foreach is never called upon
an error, by moving the invocation before the "out" label.

Cc: arm...@redhat.com
Fixes: 3770141139 ("qemu-config: parse configuration files to a QDict", 
2021-06-04)
Signed-off-by: Paolo Bonzini 
---
 util/qemu-config.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/qemu-config.c b/util/qemu-config.c
index 84ee6dc4ea..7db810f1e0 100644
--- a/util/qemu-config.c
+++ b/util/qemu-config.c
@@ -417,12 +417,12 @@ static int qemu_config_foreach(FILE *fp, QEMUConfigCB 
*cb, void *opaque,
 return res;
 }
 res = count;
-out:
 if (qdict) {
 cb(group, qdict, opaque, errp);
-qobject_unref(qdict);
 }
+out:
 loc_pop();
+qobject_unref(qdict);
 return res;
 }
 
-- 
2.31.1





[PULL 08/15] target/i386: Added consistency checks for CR4

2021-07-22 Thread Paolo Bonzini
From: Lara Lazier 

All MBZ bits in CR4 must be zero. (APM2 15.5)
Added reserved bitmask and added checks in both
helper_vmrun and helper_write_crN.

Signed-off-by: Lara Lazier 
Message-Id: <20210721152651.14683-2-laramglaz...@gmail.com>
Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.h| 39 
 target/i386/tcg/sysemu/misc_helper.c |  3 +++
 target/i386/tcg/sysemu/svm_helper.c  |  9 ---
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5d98a4e7c0..1f7e8d7f0a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -240,6 +240,7 @@ typedef enum X86Seg {
 #define CR4_OSFXSR_SHIFT 9
 #define CR4_OSFXSR_MASK (1U << CR4_OSFXSR_SHIFT)
 #define CR4_OSXMMEXCPT_MASK  (1U << 10)
+#define CR4_UMIP_MASK   (1U << 11)
 #define CR4_LA57_MASK   (1U << 12)
 #define CR4_VMXE_MASK   (1U << 13)
 #define CR4_SMXE_MASK   (1U << 14)
@@ -251,6 +252,14 @@ typedef enum X86Seg {
 #define CR4_PKE_MASK   (1U << 22)
 #define CR4_PKS_MASK   (1U << 24)
 
+#define CR4_RESERVED_MASK \
+(~(target_ulong)(CR4_VME_MASK | CR4_PVI_MASK | CR4_TSD_MASK \
+| CR4_DE_MASK | CR4_PSE_MASK | CR4_PAE_MASK \
+| CR4_MCE_MASK | CR4_PGE_MASK | CR4_PCE_MASK \
+| CR4_OSFXSR_MASK | CR4_OSXMMEXCPT_MASK |CR4_UMIP_MASK \
+| CR4_FSGSBASE_MASK | CR4_PCIDE_MASK | CR4_OSXSAVE_MASK \
+| CR4_SMEP_MASK | CR4_SMAP_MASK | CR4_PKE_MASK | CR4_PKS_MASK))
+
 #define DR6_BD  (1 << 13)
 #define DR6_BS  (1 << 14)
 #define DR6_BT  (1 << 15)
@@ -2196,6 +2205,36 @@ static inline bool hyperv_feat_enabled(X86CPU *cpu, int 
feat)
 return !!(cpu->hyperv_features & BIT(feat));
 }
 
+static inline uint64_t cr4_reserved_bits(CPUX86State *env)
+{
+uint64_t reserved_bits = CR4_RESERVED_MASK;
+if (!env->features[FEAT_XSAVE]) {
+reserved_bits |= CR4_OSXSAVE_MASK;
+}
+if (!(env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_SMEP)) {
+reserved_bits |= CR4_SMEP_MASK;
+}
+if (!(env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_SMAP)) {
+reserved_bits |= CR4_SMAP_MASK;
+}
+if (!(env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_FSGSBASE)) {
+reserved_bits |= CR4_FSGSBASE_MASK;
+}
+if (!(env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_PKU)) {
+reserved_bits |= CR4_PKE_MASK;
+}
+if (!(env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57)) {
+reserved_bits |= CR4_LA57_MASK;
+}
+if (!(env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_UMIP)) {
+reserved_bits |= CR4_UMIP_MASK;
+}
+if (!(env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_PKS)) {
+reserved_bits |= CR4_PKS_MASK;
+}
+return reserved_bits;
+}
+
 #if defined(TARGET_X86_64) && \
 defined(CONFIG_USER_ONLY) && \
 defined(CONFIG_LINUX)
diff --git a/target/i386/tcg/sysemu/misc_helper.c 
b/target/i386/tcg/sysemu/misc_helper.c
index db0d8a9d79..a2af2c9bba 100644
--- a/target/i386/tcg/sysemu/misc_helper.c
+++ b/target/i386/tcg/sysemu/misc_helper.c
@@ -99,6 +99,9 @@ void helper_write_crN(CPUX86State *env, int reg, target_ulong 
t0)
 cpu_x86_update_cr3(env, t0);
 break;
 case 4:
+if (t0 & cr4_reserved_bits(env)) {
+cpu_vmexit(env, SVM_EXIT_ERR, 0, GETPC());
+}
 if (((t0 ^ env->cr[4]) & CR4_LA57_MASK) &&
 (env->hflags & HF_CS64_MASK)) {
 raise_exception_ra(env, EXCP0D_GPF, GETPC());
diff --git a/target/i386/tcg/sysemu/svm_helper.c 
b/target/i386/tcg/sysemu/svm_helper.c
index 72b03a345d..d7d7a86aa9 100644
--- a/target/i386/tcg/sysemu/svm_helper.c
+++ b/target/i386/tcg/sysemu/svm_helper.c
@@ -85,6 +85,7 @@ void helper_vmrun(CPUX86State *env, int aflag, int 
next_eip_addend)
 uint32_t int_ctl;
 uint32_t asid;
 uint64_t new_cr0;
+uint64_t new_cr4;
 
 cpu_svm_check_intercept_param(env, SVM_EXIT_VMRUN, 0, GETPC());
 
@@ -225,14 +226,16 @@ void helper_vmrun(CPUX86State *env, int aflag, int 
next_eip_addend)
 if ((new_cr0 & CR0_NW_MASK) && !(new_cr0 & CR0_CD_MASK)) {
 cpu_vmexit(env, SVM_EXIT_ERR, 0, GETPC());
 }
+new_cr4 = x86_ldq_phys(cs, env->vm_vmcb + offsetof(struct vmcb, save.cr4));
+if (new_cr4 & cr4_reserved_bits(env)) {
+cpu_vmexit(env, SVM_EXIT_ERR, 0, GETPC());
+}
 /* clear exit_info_2 so we behave like the real hardware */
 x86_stq_phys(cs,
  env->vm_vmcb + offsetof(struct vmcb, control.exit_info_2), 0);
 
 cpu_x86_update_cr0(env, new_cr0);
-cpu_x86_update_cr4(env, x86_ldq_phys(cs,
- env->vm_vmcb + offsetof(struct vmcb,
- save.cr4)));
+cpu_x86_update_cr4(env, new_cr4);
 cpu_x86_update_cr3(env, x86_ldq_phys(cs,
  env->vm_vmcb + offsetof(struct vmcb,
  save.cr3)));
-- 
2.31.1


[PULL 00/15] Misc bugfix patches for 2021-07-22

2021-07-22 Thread Paolo Bonzini
The following changes since commit 143c2e0432859826c9e8d5b2baa307355f1a5332:

  Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2021-07-19' 
into staging (2021-07-19 19:06:05 +0100)

are available in the Git repository at:

  https://gitlab.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to 0848f8aca6f7b13f2a755c2593b0a1cbb39f658e:

  configure: Let --without-default-features disable vhost-kernel and vhost-vdpa 
(2021-07-22 14:44:51 +0200)


Bugfixes.


Gerd Hoffmann (1):
  usb: fix usb-host dependency check

Jason Andryuk (1):
  vl: Parse legacy default_machine_opts

Lara Lazier (3):
  target/i386: Added V_INTR_PRIO check to virtual interrupts
  target/i386: Added consistency checks for CR4
  target/i386: Added consistency checks for EFER

Paolo Bonzini (4):
  qemu-config: never call the callback after an error, fix leak
  qemu-config: fix memory leak on ferror()
  chardev-spice: add missing module_obj directive
  meson: fix dependencies for modinfo

Stefan Hajnoczi (1):
  qemu-config: restore "machine" in qmp_query_command_line_options()

Thomas Huth (5):
  configure: Drop obsolete check for the alloc_size attribute
  configure: Fix --without-default-features propagation to meson
  configure: Allow vnc to get disabled with --without-default-features
  configure: Fix the default setting of the "xen" feature
  configure: Let --without-default-features disable vhost-kernel and 
vhost-vdpa

 chardev/spice.c  |  1 +
 configure| 20 +++-
 hw/usb/meson.build   |  2 +-
 meson.build  |  6 ++--
 softmmu/vl.c |  1 +
 target/i386/cpu.h| 44 ++
 target/i386/tcg/sysemu/misc_helper.c |  3 ++
 target/i386/tcg/sysemu/svm_helper.c  | 60 +---
 util/qemu-config.c   | 16 +++---
 9 files changed, 125 insertions(+), 28 deletions(-)
-- 
2.31.1




[PULL 13/15] configure: Allow vnc to get disabled with --without-default-features

2021-07-22 Thread Paolo Bonzini
From: Thomas Huth 

There's no reason why we should keep VNC enabled when the user
specified --without-default-features.

Reported-by: Cole Robinson 
Signed-off-by: Thomas Huth 
Message-Id: <20210713093155.677589-3-th...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 configure   | 2 +-
 meson.build | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 19c4bc1192..468aac58e2 100755
--- a/configure
+++ b/configure
@@ -304,7 +304,7 @@ virtiofsd="auto"
 virtfs="auto"
 libudev="auto"
 mpath="auto"
-vnc="enabled"
+vnc="auto"
 sparse="auto"
 vde="$default_feature"
 vnc_sasl="auto"
diff --git a/meson.build b/meson.build
index eb85fe8e44..cb3856fc35 100644
--- a/meson.build
+++ b/meson.build
@@ -930,7 +930,7 @@ vnc = not_found
 png = not_found
 jpeg = not_found
 sasl = not_found
-if get_option('vnc').enabled()
+if not get_option('vnc').disabled()
   vnc = declare_dependency() # dummy dependency
   png = dependency('libpng', required: get_option('vnc_png'),
method: 'pkg-config', kwargs: static_kwargs)
-- 
2.31.1





Re: [PATCH-for-6.1] gitlab-ci: Extract OpenSBI job rules to reusable section

2021-07-22 Thread Daniel P . Berrangé
On Tue, Jul 20, 2021 at 06:48:29PM +0200, Philippe Mathieu-Daudé wrote:
> All jobs depending on 'docker-opensbi' job must use at most all
> the rules that triggers it. The simplest way to ensure that
> is to always use the same rules. Extract all the rules to a
> reusable section, and include this section (with the 'extends'
> keyword) in both 'docker-opensbi' and 'build-opensbi' jobs.
> 
> The problem was introduced in commit c6fc0fc1a71 ("gitlab-ci.yml:
> Add jobs to build OpenSBI firmware binaries"), but was revealed in
> commit 91e9c47e50a ("docker: OpenSBI build job depends on OpenSBI
> container").
> 
> This fix is similar to the one used with the EDK2 firmware job in
> commit ac0595cf6b3 ("gitlab-ci: Extract EDK2 job rules to reusable
> section").
> 
> Reported-by: Daniel P. Berrangé 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Latent bug on CI, safe for 6.1.
> ---
>  .gitlab-ci.d/opensbi.yml | 28 +---
>  1 file changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/.gitlab-ci.d/opensbi.yml b/.gitlab-ci.d/opensbi.yml
> index f66cd1d9089..d8a0456679e 100644
> --- a/.gitlab-ci.d/opensbi.yml
> +++ b/.gitlab-ci.d/opensbi.yml
> @@ -1,10 +1,23 @@
> -docker-opensbi:
> - stage: containers
> - rules: # Only run this job when the Dockerfile is modified
> +# All jobs needing docker-opensbi must use the same rules it uses.
> +.opensbi_job_rules:
> + rules: # Only run this job when ...
>   - changes:
> +   # this file is modified
> - .gitlab-ci.d/opensbi.yml
> +   # or the Dockerfile is modified
> - .gitlab-ci.d/opensbi/Dockerfile
> when: always
> + - changes: # or roms/opensbi/ is modified (submodule updated)
> +   - roms/opensbi/*
> +   when: always
> + - if: '$CI_COMMIT_REF_NAME =~ /^opensbi/' # or the branch/tag starts with 
> 'opensbi'
> +   when: always
> + - if: '$CI_COMMIT_MESSAGE =~ /opensbi/i' # or last commit description 
> contains 'OpenSBI'
> +   when: always

In debugging why the acceptance jobs rnu despite their prerequisite
jobs failing, I've realized we've been making a mistake in most of
'rules' sections.

'when: always'  will make the job run regardless of status of any
dependant jobs. IOW, if you have a 'needs: []', it is almost
always wrong to use 'when: always'. Instead  we need 'when: on_success'

So this patch needs to make that change, and likewise the edk2 patch
with the same logic.

Alex has queued this one, but I don't see it in a PULL yet, so I
guess we can just do a v2 of this.

> +
> +docker-opensbi:
> + extends: .opensbi_job_rules
> + stage: containers
>   image: docker:19.03.1
>   services:
>   - docker:19.03.1-dind
> @@ -24,16 +37,9 @@ docker-opensbi:
>   - docker push $IMAGE_TAG
>  
>  build-opensbi:
> + extends: .opensbi_job_rules
>   stage: build
>   needs: ['docker-opensbi']
> - rules: # Only run this job when ...
> - - changes: # ... roms/opensbi/ is modified (submodule updated)
> -   - roms/opensbi/*
> -   when: always
> - - if: '$CI_COMMIT_REF_NAME =~ /^opensbi/' # or the branch/tag starts with 
> 'opensbi'
> -   when: always
> - - if: '$CI_COMMIT_MESSAGE =~ /opensbi/i' # or last commit description 
> contains 'OpenSBI'
> -   when: always
>   artifacts:
> paths: # 'artifacts.zip' will contains the following files:
> - pc-bios/opensbi-riscv32-generic-fw_dynamic.bin
> -- 
> 2.31.1
> 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 0/2] qapi/qom: use correct field name when getting/setting alias properties

2021-07-22 Thread Paolo Bonzini

On 22/07/21 15:25, Markus Armbruster wrote:

Since patch submitters tend to submit code that works for the success
case, I like to test a few failure cases before anything else.  Gotcha:

 $ qemu-system-x86_64 -machine pc,pflash0=xxx
 qemu-system-x86_64: Property 'cfi.pflash01.drive' can't find value 'xxx'

The error message is misleading.


Indeed I knew about this, and even thought briefly about how to fix it 
before realizing it is not a regression (which is also why I didn't 
think of including it in the commit message).


All the ways I could think about for a fix involved looking at the error 
class, and possibly even adding a dictionary of key-value pairs for some 
error classes.  I know you don't really like error classes and you 
probably would like the idea of key-value pairs even less---and to be 
honest I didn't really have a plan to implement any of that.


Paolo


This is not a "must not commit" issue.  Fixing a regression in time for
the release at the price of a bad error message is still a win.  The bad
error message needs fixing all the same, just not necessarily before the
release.

Since mere thinking doesn't rock the release boat: any ideas on how this
could be fixed?







Re: [PULL 37/48] usb: build usb-host as module

2021-07-22 Thread Peter Krempa
adding libvirt-list

On Thu, Jul 08, 2021 at 17:17:37 +0200, Paolo Bonzini wrote:
> From: Gerd Hoffmann 
> 
> Drop one more shared library dependency (libusb) from core qemu.
> 
> Signed-off-by: Gerd Hoffmann 
> Reviewed-by: Jose R. Ziviani 
> Message-Id: <20210624103836.2382472-34-kra...@redhat.com>
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/usb/host-libusb.c | 1 +
>  hw/usb/meson.build   | 8 ++--
>  2 files changed, 7 insertions(+), 2 deletions(-)

After this commit libvirt is no longer detecting the 'hostdevice'
property of 'usb-host'. In fact 'device-list-properties' is returning:

"desc": "Device 'usb-host' not found"

> diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c
> index 2b7f87872c..c0f314462a 100644
> --- a/hw/usb/host-libusb.c
> +++ b/hw/usb/host-libusb.c
> @@ -1777,6 +1777,7 @@ static TypeInfo usb_host_dev_info = {
>  .class_init= usb_host_class_initfn,
>  .instance_init = usb_host_instance_init,
>  };
> +module_obj(TYPE_USB_HOST_DEVICE);
>  
>  static void usb_host_register_types(void)
>  {
> diff --git a/hw/usb/meson.build b/hw/usb/meson.build
> index 817f3e027a..3ca6127937 100644
> --- a/hw/usb/meson.build
> +++ b/hw/usb/meson.build
> @@ -72,8 +72,12 @@ if usbredir.found()
>  endif
>  
>  # usb pass-through
> -softmmu_ss.add(when: ['CONFIG_USB', libusb],
> -   if_true: files('host-libusb.c'))
> +if config_host.has_key('CONFIG_USB_LIBUSB')

The problem is in this condition as it's evaluated as false. When I
replace it with libusb.found(), everything works as it used to.

Unfortunately I don't know what the real goa behind using
CONFIG_USB_LIBUSB here was to see whether my approach is good.

> +  usbhost_ss = ss.source_set()
> +  usbhost_ss.add(when: ['CONFIG_USB', libusb],
> + if_true: files('host-libusb.c'))
> +  hw_usb_modules += {'host': usbhost_ss}
> +endif
>  
>  softmmu_ss.add(when: ['CONFIG_USB', 'CONFIG_XEN', libusb], if_true: 
> files('xen-usb.c'))
>  
> -- 
> 2.31.1
> 
> 
> 




Re: [PATCH] vl: Parse legacy default_machine_opts

2021-07-22 Thread Anthony PERARD via
On Mon, Jul 12, 2021 at 10:15:52PM -0400, Jason Andryuk wrote:
> qemu can't start a xen vm after commit d8fb7d0969d5
> "vl: switch -M parsing to keyval" with:
> 
> $ ./qemu-system-i386 -M xenfv
> Unexpected error in object_property_find_err() at ../qom/object.c:1298:
> qemu-system-i386: Property 'xenfv-3.1-machine.accel' not found
> Aborted (core dumped)
> 
> The default_machine_opts handling doesn't process the legacy machine
> options like "accel".  Call qemu_apply_legacy_machine_options to provide
> the legacy handling.
> 
> Signed-off-by: Jason Andryuk 

Reviewed-by: Anthony PERARD 

I can't find a different way to set a default "accelerator" to a
machine, so this patch seems necessary.

Thanks,

> ---
>  softmmu/vl.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 4df1496101..f4d8630fc6 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2126,6 +2126,7 @@ static void qemu_create_machine(QDict *qdict)
>  QDict *default_opts =
>  keyval_parse(machine_class->default_machine_opts, NULL, NULL,
>   _abort);
> +qemu_apply_legacy_machine_options(default_opts);
>  object_set_properties_from_keyval(OBJECT(current_machine), 
> default_opts,
>false, _abort);
>  qobject_unref(default_opts);

-- 
Anthony PERARD



Re: [PATCH for-6.2 v2 11/11] tests/unit: Add a unit test for smp parsing

2021-07-22 Thread wangyanan (Y)

On 2021/7/22 21:12, Andrew Jones wrote:

On Thu, Jul 22, 2021 at 02:15:18PM +0800, wangyanan (Y) wrote:

On 2021/7/20 2:57, Andrew Jones wrote:

On Mon, Jul 19, 2021 at 11:20:43AM +0800, Yanan Wang wrote:

Add a QEMU unit test for the parsing of given SMP configuration.
Since all the parsing logic is in generic function smp_parse(),
this test passes diffenent SMP configurations to the function
and compare the parsing result with what is expected.

In the test, all possible collections of the topology parameters
and the corressponding expected results are listed, including the
valid and invalid ones.

The preference of sockets over cores and the preference of cores
over sockets, and the support of multi-dies are also considered.

Signed-off-by: Yanan Wang 
---
   MAINTAINERS |1 +
   tests/unit/meson.build  |1 +
   tests/unit/test-smp-parse.c | 1117 +++
   3 files changed, 1119 insertions(+)
   create mode 100644 tests/unit/test-smp-parse.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 70633e3bf4..160dba2e57 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1636,6 +1636,7 @@ F: include/hw/boards.h
   F: include/hw/core/cpu.h
   F: include/hw/cpu/cluster.h
   F: include/sysemu/numa.h
+F: tests/unit/test-smp-parse.c
   T: git https://gitlab.com/ehabkost/qemu.git machine-next
   Xtensa Machines
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 3e0504dd21..694a924627 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -44,6 +44,7 @@ tests = {
 'test-uuid': [],
 'ptimer-test': ['ptimer-test-stubs.c', meson.source_root() / 
'hw/core/ptimer.c'],
 'test-qapi-util': [],
+  'test-smp-parse': [qom, meson.source_root() / 'hw/core/machine-smp.c'],
   }
   if have_system or have_tools
diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
new file mode 100644
index 00..bc1d324c3d
--- /dev/null
+++ b/tests/unit/test-smp-parse.c
@@ -0,0 +1,1117 @@
+/*
+ * SMP parsing unit-tests
+ *
+ * Copyright (C) 2021, Huawei, Inc.
+ *
+ * Authors:
+ *  Yanan Wang 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qom/object.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+
+#include "hw/boards.h"
+
+#define T true
+#define F false
+
+/**
+ * SMPTestData:
+ * @config - the given SMP configuration for parsing
+ * @should_be_valid - whether the given configuration is supposed to be valid
+ * @expect - the CPU topology info expected to be parsed out
+ */
+typedef struct SMPTestData {
+SMPConfiguration config;
+bool should_be_valid;

Long way to say 'valid'.

Indeed..., "valid" should be enough.

+CpuTopology expect;
+} SMPTestData;
+
+/* the specific machine type info for this test */
+static const TypeInfo smp_machine_info = {
+.name = TYPE_MACHINE,
+.parent = TYPE_OBJECT,
+.class_size = sizeof(MachineClass),
+.instance_size = sizeof(MachineState),
+};
+
+/*
+ * prefer sockets over cores over threads before 6.2.
+ * all possible SMP configurations and the corressponding expected outputs

corresponding (please run spell check on your commit messages)


Ok, I missed the check.

+ * are listed for testing, including the valid and invalid ones.
+ */
+static struct SMPTestData prefer_sockets[] = {
+{
+/* config: no smp configuration provided
+ * expect: cpus=1,sockets=1,dies=1,cores=1,threads=1,maxcpus=1 */
+.config = (SMPConfiguration) { F, 0, F, 0, F, 0, F, 0, F, 0, F, 0 },

SMPConfiguration and CpuTopology have named fields so we could drop the
'expect: ...' comment line and instead do

   {
/* no configuration provided */
.config = { .has_cpus = F, .has_maxcpus = F, ... },
.valid = T,
.expect = { .sockets = 1, .cores = 1, ... },
   }, {
...
   }

which may be easier to maintain. OTOH, the concise form this approach has
is also nice.

I tried the structure initialization with explicit name fields in it like
above,
actually we are supposed to do in this way so that we don't have to worry
about the order change of the structure members.

But this would break the 80-char line limit or introduce more lines for

I wouldn't worry about 80-char (or even 90, which is when checkpatch
switches from a warning to an error) for something like this, but that's
just my opinion. You'd have to get agreement from whomever would decide /
not decide to merge this.


Ok, I think I will try the format with explicit name fields.

Thanks,
Yanan



for each SMP configuration. If this is not a real problem, I also prefer
above format.

I don't think you should need the casts in the assignments
though.

Yes, the casts may be unnecessary, will remove them.

+.should_be_valid = true,
+.expect = (CpuTopology) { 1, 1, 1, 1, 1, 1 },
+}, {
+/* config: -smp 8
+ * expect: 

Re: [PATCH for-6.2 v2 04/11] machine: Use the computed parameters to calculate omitted cpus

2021-07-22 Thread wangyanan (Y)

On 2021/7/22 20:27, Andrew Jones wrote:

On Thu, Jul 22, 2021 at 12:42:55PM +0800, wangyanan (Y) wrote:

On 2021/7/20 0:42, Andrew Jones wrote:

On Mon, Jul 19, 2021 at 11:20:36AM +0800, Yanan Wang wrote:

Currently we directly calculate the omitted cpus based on the already
provided parameters. This makes some cmdlines like:
-smp maxcpus=16
-smp sockets=2,maxcpus=16
-smp sockets=2,dies=2,maxcpus=16
-smp sockets=2,cores=4,maxcpus=16
not work. We should probably use the computed paramters to calculate
cpus when maxcpus is provided while cpus is omitted, which will make
above configs start to work.

Note: change in this patch won't affect any existing working cmdlines
but allows more incomplete configs to be valid.

Signed-off-by: Yanan Wang 
---
   hw/core/machine.c | 17 +
   1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index c9f15b15a5..668f0a1553 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -767,26 +767,27 @@ static void smp_parse(MachineState *ms, SMPConfiguration 
*config, Error **errp)
   return;
   }
-/* compute missing values, prefer sockets over cores over threads */
   maxcpus = maxcpus > 0 ? maxcpus : cpus;
-if (cpus == 0) {
-sockets = sockets > 0 ? sockets : 1;
-cores = cores > 0 ? cores : 1;
-threads = threads > 0 ? threads : 1;
-cpus = sockets * dies * cores * threads;
-maxcpus = maxcpus > 0 ? maxcpus : cpus;
-} else if (sockets == 0) {
+/* compute missing values, prefer sockets over cores over threads */
+if (sockets == 0) {
   cores = cores > 0 ? cores : 1;
   threads = threads > 0 ? threads : 1;
   sockets = maxcpus / (dies * cores * threads);
+sockets = sockets > 0 ? sockets : 1;
   } else if (cores == 0) {
   threads = threads > 0 ? threads : 1;
   cores = maxcpus / (sockets * dies * threads);
+cores = cores > 0 ? cores : 1;
   } else if (threads == 0) {
   threads = maxcpus / (sockets * dies * cores);
+threads = threads > 0 ? threads : 1;
   }

I didn't think we wanted this rounding which this patch adds back into
cores and threads and now also sockets.

Firstly, I think we can agree that with or without the rounding, the invalid
configs will still be reported as invalid. So the only difference is in the
err
message (either report 0 or 1 of a fractional parameter). :)

An error message that says the inputs produced a fractional parameter
would be even better. If the code gets too hairy because some parameters
may be zero and without additional checks we'd risk divide by zeros,
then maybe we should output ("fractional %s", param_name) and exit at the
same places we're currently rounding up.

Ok. If we remove the rounding, then the calculation code has to be modified
to be like the following. We have to separately consider the case that cpus
and maxcpus are both omitted (e.g. -smp sockets=16).

maxcpus = maxcpus > 0 ? maxcpus : cpus;

if (cpus == 0 && maxcpus == 0) {
    sockets = sockets > 0 ? sockets : 1;
    cores = cores > 0 ? cores : 1;
    threads = threads > 0 ? threads : 1;
    goto cal;
}

if (sockets == 0) {
...
} else if (cores == 0) {
...
} else if (threads == 0) {
...
}

cal:
maxcpus = maxcpus > 0 ? maxcpus : sockets * cores * threads;
cpus = cpus > 0 ? cpus : maxcpus;

I added back the rounding because this patch indeed need it, we start
to use the computed parameters to calculate cpus, so we have to ensure
that the computed parameters are at least 1. If both cpus and maxcpus
are omitted (e.g. -smp sockets=16), without the rounding we will get
zeroed cpus and maxcpus, and with the rounding we will get valid result
like "cpus=16,sockets=16,cores=1,threads=1,maxcpus=16".

If a "0" or "1" in the error message doesn't make so much difference as
assumed for the error reporting, I suggest that we probably can keep the
rounding which makes the parser code concise.

+/* use the computed parameters to calculate the omitted cpus */
+cpus = cpus > 0 ? cpus : sockets * dies * cores * threads;
+maxcpus = maxcpus > 0 ? maxcpus : cpus;

It doesn't really matter, but I think I'd rather write this like

   maxcpus = maxcpus > 0 ? maxcpus : sockets * dies * cores * threads;
   cpus = cpus > 0 ? cpus : maxcpus;

Yes, there is no functional difference. But I think maybe we'd better keep
some consistence with the "maxcpus = maxcpus > 0 ? maxcpus : cpus"
at top this function and also with the smp doc in qemu-option.hx i.e.
"If omitted the maximum number of CPUs will be set to match the initial
CPU count" ?

I won't argue it too much, but I think we should be thinking about maxcpus
more than cpus if we're thinking about cpu topologies. I'd rather have
users keep in mind that whatever their topology generates is the max and
if they don't want to expose that many cpus to the guest then they should
provide an additional, smaller 

[PULL 09/15] target/i386: Added consistency checks for EFER

2021-07-22 Thread Paolo Bonzini
From: Lara Lazier 

EFER.SVME has to be set, and EFER reserved bits must
be zero.
In addition the combinations
 * EFER.LMA or EFER.LME is non-zero and the processor does not support LM
 * non-zero EFER.LME and CR0.PG and zero CR4.PAE
 * non-zero EFER.LME and CR0.PG and zero CR0.PE
 * non-zero EFER.LME, CR0.PG, CR4.PAE, CS.L and CS.D
are all invalid.
(AMD64 Architecture Programmer's Manual, V2, 15.5)

Signed-off-by: Lara Lazier 
Message-Id: <20210721152651.14683-3-laramglaz...@gmail.com>
Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.h   |  5 
 target/i386/tcg/sysemu/svm_helper.c | 39 +
 2 files changed, 44 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 1f7e8d7f0a..6c50d3ab4f 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -475,6 +475,11 @@ typedef enum X86Seg {
 #define MSR_EFER_SVME  (1 << 12)
 #define MSR_EFER_FFXSR (1 << 14)
 
+#define MSR_EFER_RESERVED\
+(~(target_ulong)(MSR_EFER_SCE | MSR_EFER_LME\
+| MSR_EFER_LMA | MSR_EFER_NXE | MSR_EFER_SVME\
+| MSR_EFER_FFXSR))
+
 #define MSR_STAR0xc081
 #define MSR_LSTAR   0xc082
 #define MSR_CSTAR   0xc083
diff --git a/target/i386/tcg/sysemu/svm_helper.c 
b/target/i386/tcg/sysemu/svm_helper.c
index d7d7a86aa9..4d64ec378e 100644
--- a/target/i386/tcg/sysemu/svm_helper.c
+++ b/target/i386/tcg/sysemu/svm_helper.c
@@ -75,6 +75,41 @@ static inline bool ctl_has_irq(uint32_t int_ctl)
 return (int_ctl & V_IRQ_MASK) && (int_prio >= tpr);
 }
 
+static inline bool is_efer_invalid_state (CPUX86State *env)
+{
+if (!(env->efer & MSR_EFER_SVME)) {
+return true;
+}
+
+if (env->efer & MSR_EFER_RESERVED) {
+return true;
+}
+
+if ((env->efer & (MSR_EFER_LMA | MSR_EFER_LME)) &&
+!(env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) {
+return true;
+}
+
+if ((env->efer & MSR_EFER_LME) && (env->cr[0] & CR0_PG_MASK)
+&& !(env->cr[4] & CR4_PAE_MASK)) {
+return true;
+}
+
+if ((env->efer & MSR_EFER_LME) && (env->cr[0] & CR0_PG_MASK)
+&& !(env->cr[0] & CR0_PE_MASK)) {
+return true;
+}
+
+if ((env->efer & MSR_EFER_LME) && (env->cr[0] & CR0_PG_MASK)
+&& (env->cr[4] & CR4_PAE_MASK)
+&& (env->segs[R_CS].flags & DESC_L_MASK)
+&& (env->segs[R_CS].flags & DESC_B_MASK)) {
+return true;
+}
+
+return false;
+}
+
 void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
 {
 CPUState *cs = env_cpu(env);
@@ -291,6 +326,10 @@ void helper_vmrun(CPUX86State *env, int aflag, int 
next_eip_addend)
 }
 #endif
 
+if (is_efer_invalid_state(env)) {
+cpu_vmexit(env, SVM_EXIT_ERR, 0, GETPC());
+}
+
 switch (x86_ldub_phys(cs,
   env->vm_vmcb + offsetof(struct vmcb, control.tlb_ctl))) {
 case TLB_CONTROL_DO_NOTHING:
-- 
2.31.1





[PULL 05/15] usb: fix usb-host dependency check

2021-07-22 Thread Paolo Bonzini
From: Gerd Hoffmann 

Fixes: 90540f3289 ("configure, meson: convert libusb detection to meson", 
2021-06-25)
Reported-by: Programmingkid 
Signed-off-by: Gerd Hoffmann 
Message-Id: <20210721081718.301343-1-kra...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Paolo Bonzini 
---
 hw/usb/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/meson.build b/hw/usb/meson.build
index 3ca6127937..de853d780d 100644
--- a/hw/usb/meson.build
+++ b/hw/usb/meson.build
@@ -72,7 +72,7 @@ if usbredir.found()
 endif
 
 # usb pass-through
-if config_host.has_key('CONFIG_USB_LIBUSB')
+if libusb.found()
   usbhost_ss = ss.source_set()
   usbhost_ss.add(when: ['CONFIG_USB', libusb],
  if_true: files('host-libusb.c'))
-- 
2.31.1





[PULL 07/15] target/i386: Added V_INTR_PRIO check to virtual interrupts

2021-07-22 Thread Paolo Bonzini
From: Lara Lazier 

The APM2 states that The processor takes a virtual INTR interrupt
if V_IRQ and V_INTR_PRIO indicate that there is a virtual interrupt pending
whose priority is greater than the value in V_TPR.

Signed-off-by: Lara Lazier 
Message-Id: <20210721152651.14683-1-laramglaz...@gmail.com>
Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/sysemu/svm_helper.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/target/i386/tcg/sysemu/svm_helper.c 
b/target/i386/tcg/sysemu/svm_helper.c
index 00618cff23..72b03a345d 100644
--- a/target/i386/tcg/sysemu/svm_helper.c
+++ b/target/i386/tcg/sysemu/svm_helper.c
@@ -65,6 +65,16 @@ static inline void svm_load_seg_cache(CPUX86State *env, 
hwaddr addr,
sc->base, sc->limit, sc->flags);
 }
 
+static inline bool ctl_has_irq(uint32_t int_ctl)
+{
+uint32_t int_prio;
+uint32_t tpr;
+
+int_prio = (int_ctl & V_INTR_PRIO_MASK) >> V_INTR_MASKING_SHIFT;
+tpr = int_ctl & V_TPR_MASK;
+return (int_ctl & V_IRQ_MASK) && (int_prio >= tpr);
+}
+
 void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
 {
 CPUState *cs = env_cpu(env);
@@ -290,7 +300,7 @@ void helper_vmrun(CPUX86State *env, int aflag, int 
next_eip_addend)
 
 env->hflags2 |= HF2_GIF_MASK;
 
-if (int_ctl & V_IRQ_MASK) {
+if (ctl_has_irq(int_ctl)) {
 CPUState *cs = env_cpu(env);
 
 cs->interrupt_request |= CPU_INTERRUPT_VIRQ;
-- 
2.31.1





[PULL 14/15] configure: Fix the default setting of the "xen" feature

2021-07-22 Thread Paolo Bonzini
From: Thomas Huth 

The "xen" variable should either contain "enabled", "disabled" or
nothing (for auto detection). But when the user currently runs the
configure script with --without-default-features, it gets set to
"no" instead. This does not work as expected, the feature will still
be enabled if the Xen headers are present. Thus set the variable
to "disabled" instead if default_feature switch has been set.

Reported-by: Cole Robinson 
Signed-off-by: Thomas Huth 
Message-Id: <20210713093155.677589-4-th...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 468aac58e2..40fa8cc26e 100755
--- a/configure
+++ b/configure
@@ -311,7 +311,7 @@ vnc_sasl="auto"
 vnc_jpeg="auto"
 vnc_png="auto"
 xkbcommon="auto"
-xen="$default_feature"
+xen=${default_feature:+disabled}
 xen_ctrl_version="$default_feature"
 xen_pci_passthrough="auto"
 linux_aio="$default_feature"
-- 
2.31.1





[PULL 12/15] configure: Fix --without-default-features propagation to meson

2021-07-22 Thread Paolo Bonzini
From: Thomas Huth 

A typo prevents that many features get disabled when the user
runs "configure" with the --without-default-features switch.

Reported-by: Cole Robinson 
Signed-off-by: Thomas Huth 
Message-Id: <20210713093155.677589-2-th...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 026704f15a..19c4bc1192 100755
--- a/configure
+++ b/configure
@@ -5206,7 +5206,7 @@ if test "$skip_meson" = no; then
 -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \
 -Dvhost_user_blk_server=$vhost_user_blk_server 
-Dmultiprocess=$multiprocess \
 -Dfuse=$fuse -Dfuse_lseek=$fuse_lseek 
-Dguest_agent_msi=$guest_agent_msi -Dbpf=$bpf\
-$(if test "$default_features" = no; then echo 
"-Dauto_features=disabled"; fi) \
+$(if test "$default_feature" = no; then echo 
"-Dauto_features=disabled"; fi) \
-Dtcg_interpreter=$tcg_interpreter \
 $cross_arg \
 "$PWD" "$source_path"
-- 
2.31.1





Re: [PATCH for-6.1 v2] machine: Disallow specifying topology parameters as zero

2021-07-22 Thread Daniel P . Berrangé
On Thu, Jul 22, 2021 at 11:43:26PM +0800, Yanan Wang wrote:
> In the SMP configuration, we should either specify a topology
> parameter with a reasonable value (equal to or greater than 1)
> or just leave it omitted and QEMU will calculate its value.
> Configurations which explicitly specify the topology parameters
> as zero like "sockets=0" are meaningless, so disallow them.
> 
> However, the commit 1e63fe685804d
> (machine: pass QAPI struct to mc->smp_parse) has documented that
> '0' has the same semantics as omitting a parameter in the qapi
> comment for SMPConfiguration. So this patch fixes the doc and
> also adds the corresponding sanity check in the smp parsers.
> 
> Suggested-by: Andrew Jones 
> Signed-off-by: Yanan Wang 
> ---
>  hw/core/machine.c | 14 ++
>  qapi/machine.json |  6 +++---
>  qemu-options.hx   | 12 +++-
>  3 files changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 775add0795..db129d937b 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -829,6 +829,20 @@ static void machine_set_smp(Object *obj, Visitor *v, 
> const char *name,
>  return;
>  }
>  
> +/*
> + * The topology parameters must be specified equal to or great than one
> + * or just omitted, explicit configuration like "cpus=0" is not allowed.
> + */
> +if ((config->has_cpus && config->cpus == 0) ||
> +(config->has_sockets && config->sockets == 0) ||
> +(config->has_dies && config->dies == 0) ||
> +(config->has_cores && config->cores == 0) ||
> +(config->has_threads && config->threads == 0) ||
> +(config->has_maxcpus && config->maxcpus == 0)) {
> +error_setg(errp, "parameters must be equal to or greater than one if 
> provided");

I'd suggest a slight tweak since when seen it lacks context:

$ ./qemu-system-x86_64 -smp 4,cores=0,sockets=2
qemu-system-x86_64: parameters must be equal to or greater than one if provided


error_setg(errp, "CPU topology parameters must be equal to or greater than 
one if provided");



> diff --git a/qemu-options.hx b/qemu-options.hx
> index 99ed5ec5f1..b0168f8c48 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -223,11 +223,13 @@ SRST
>  of computing the CPU maximum count.
>  
>  Either the initial CPU count, or at least one of the topology parameters
> -must be specified. Values for any omitted parameters will be computed
> -from those which are given. Historically preference was given to the
> -coarsest topology parameters when computing missing values (ie sockets
> -preferred over cores, which were preferred over threads), however, this
> -behaviour is considered liable to change.
> +must be specified. The specified parameters must be equal to or great

s/great/greater/

> +than one, explicit configuration like "cpus=0" is not allowed. Values
> +for any omitted parameters will be computed from those which are given.
> +Historically preference was given to the coarsest topology parameters
> +when computing missing values (ie sockets preferred over cores, which
> +were preferred over threads), however, this behaviour is considered
> +liable to change.
>  ERST


If you make the text changes, then feel free to add this when posting v2:

 Reviewed-by: Daniel P. Berrangé 
 Tested-by: Daniel P. Berrangé 
 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH for-6.2 v2 10/11] machine: Split out the smp parsing code

2021-07-22 Thread Andrew Jones
On Thu, Jul 22, 2021 at 02:24:03PM +0800, wangyanan (Y) wrote:
> On 2021/7/20 1:20, Andrew Jones wrote:
> > On Mon, Jul 19, 2021 at 11:20:42AM +0800, Yanan Wang wrote:
> > > We are going to introduce an unit test for the parser smp_parse()
> > > in hw/core/machine.c, but now machine.c is only built in softmmu.
> > > 
> > > In order to solve the build dependency on the smp parsing code and
> > > avoid building unrelated stuff for the unit tests, move the related
> > > code from machine.c into a new common file, i.e., machine-smp.c.
> > > 
> > > Signed-off-by: Yanan Wang 
> > > ---
> > >   MAINTAINERS   |   1 +
> > >   hw/core/machine-smp.c | 124 ++
> > >   hw/core/machine.c | 109 -
> > >   hw/core/meson.build   |   1 +
> > >   include/hw/boards.h   |   1 +
> > >   5 files changed, 127 insertions(+), 109 deletions(-)
> > >   create mode 100644 hw/core/machine-smp.c
> > > 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index 9100f9a043..70633e3bf4 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -1626,6 +1626,7 @@ F: cpu.c
> > >   F: hw/core/cpu.c
> > >   F: hw/core/machine-qmp-cmds.c
> > >   F: hw/core/machine.c
> > > +F: hw/core/machine-smp.c

I just noticed that the spacing in this change might not be right.

> > >   F: hw/core/null-machine.c
> > >   F: hw/core/numa.c
> > >   F: hw/cpu/cluster.c
> > > diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
> > > new file mode 100644
> > > index 00..6a00cfe44a
> > > --- /dev/null
> > > +++ b/hw/core/machine-smp.c
> > > @@ -0,0 +1,124 @@
> > > +/*
> > > + * QEMU Machine (related to SMP configuration)
> > > + *
> > > + * Copyright (C) 2014 Red Hat Inc
> > > + *
> > > + * Authors:
> > > + *   Marcel Apfelbaum 
> > This header was obviously copy+pasted without being updated.
> Yes, the header was kept unchanged.
> 
> But actually I'm not completely sure which field should be updated. :)
> Should I add "Copyright (C) 2021, Huawei, Inc." and also the authorship
> "Yanan Wang " behind the existing ones
> or just replace them?

I see what you were attempting to do now. You were deriving this new work
(a source file) from an existing work and you wanted to preserve the
original copyright and authorship. It's not so simple with these types of
projects though. In this case, smp_parse wasn't even part of the original
machine.c file (it came over with commit 6f479566a87d). I think it's
pretty common for these projects to just put whatever your preferred
(or your employer's preferred) copyright/authorship on new files. So, I'd
just replace the fields.

I'm interested in what others have to say about this though.

Thanks,
drew


> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or 
> > > later.
> > > + * See the COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "hw/boards.h"
> > > +#include "qapi/error.h"
> > > +
> > > +/*
> > > + * smp_parse - Generic function used to parse the given SMP configuration
> > > + *
> > > + * The topology parameters must be specified equal to or great than one
> > > + * or just omitted, explicit configuration like "cpus=0" is not allowed.
> > > + * The omitted parameters will be calculated based on the provided ones.
> > > + *
> > > + * maxcpus will default to the value of cpus if omitted and will be used
> > > + * to compute the missing sockets/cores/threads. cpus will be calculated
> > > + * from the computed parametrs if omitted.
> > > + *
> > > + * In calculation of omitted arch-netural sockets/cores/threads, we 
> > > prefer
> > > + * sockets over cores over threads before 6.2, while prefer cores over
> > > + * sockets over threads since 6.2 on. The arch-specific dies will 
> > > directly
> > > + * default to 1 if omitted.
> > > + */
> > > +void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
> > > +{
> > > +MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > +unsigned cpus= config->has_cpus ? config->cpus : 0;
> > > +unsigned sockets = config->has_sockets ? config->sockets : 0;
> > > +unsigned dies= config->has_dies ? config->dies : 1;
> > > +unsigned cores   = config->has_cores ? config->cores : 0;
> > > +unsigned threads = config->has_threads ? config->threads : 0;
> > > +unsigned maxcpus = config->has_maxcpus ? config->maxcpus : 0;
> > > +
> > > +if ((config->has_cpus && config->cpus == 0) ||
> > > +(config->has_sockets && config->sockets == 0) ||
> > > +(config->has_dies && config->dies == 0) ||
> > > +(config->has_cores && config->cores == 0) ||
> > > +(config->has_threads && config->threads == 0) ||
> > > +(config->has_maxcpus && config->maxcpus == 0)) {
> > > +error_setg(errp, "parameters must be equal to or greater than 
> > > one"
> > > +   "if provided");
> > > +return;
> > > +}
> > > +
> > 

Re: [question] Shall we flush ITS tables into guest RAM when shutdown the VM?

2021-07-22 Thread Peter Maydell
On Thu, 22 Jul 2021 at 14:19, Juan Quintela  wrote:
>
> Eric Auger  wrote:
> > Hi Dave,
> >
> > On 7/6/21 4:19 PM, Dr. David Alan Gilbert wrote:
> > That's consistent with what you suggested in the past ans what is logged
> > in the commit message of
> >
> > cddafd8f353d2d251b1a5c6c948a577a85838582 ("hw/intc/arm_gicv3_its:
> > Implement state save/restore").
>
> Hi
>
> Ouch, it is really a mess.  Why do we need to save it to RAM instead of
> saving it to anywhere else?

The ITS tables are in guest RAM because that is how the real
hardware works, and so it is also how the emulated version
has to behave...

-- PMM



Re: [PATCH v2 1/6] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()

2021-07-22 Thread David Hildenbrand

On 22.07.21 15:31, Daniel P. Berrangé wrote:

On Thu, Jul 22, 2021 at 02:36:30PM +0200, David Hildenbrand wrote:

Let's sense support and use it for preallocation. MADV_POPULATE_WRITE
does not require a SIGBUS handler, doesn't actually touch page content,
and avoids context switches; it is, therefore, faster and easier to handle
than our current approach.

While MADV_POPULATE_WRITE is, in general, faster than manual
prefaulting, and especially faster with 4k pages, there is still value in
prefaulting using multiple threads to speed up preallocation.

More details on MADV_POPULATE_WRITE can be found in the Linux commit
4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault
page tables") and in the man page proposal [1].

[1] https://lkml.kernel.org/r/20210712083917.16361-1-da...@redhat.com

This resolves the TODO in do_touch_pages().

In the future, we might want to look into using fallocate(), eventually
combined with MADV_POPULATE_READ, when dealing with shared file
mappings.

Reviewed-by: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
---
  include/qemu/osdep.h |  7 
  util/oslib-posix.c   | 88 +---
  2 files changed, 74 insertions(+), 21 deletions(-)




@@ -497,6 +493,31 @@ static void *do_touch_pages(void *arg)
  return NULL;
  }
  
+static void *do_madv_populate_write_pages(void *arg)

+{
+MemsetThread *memset_args = (MemsetThread *)arg;
+const size_t size = memset_args->numpages * memset_args->hpagesize;
+char * const addr = memset_args->addr;
+int ret;
+
+if (!size) {
+return NULL;
+}
+
+/* See do_touch_pages(). */
+qemu_mutex_lock(_mutex);
+while (!threads_created_flag) {
+qemu_cond_wait(_cond, _mutex);
+}
+qemu_mutex_unlock(_mutex);
+
+ret = qemu_madvise(addr, size, QEMU_MADV_POPULATE_WRITE);
+if (ret) {
+memset_thread_failed = true;


I'm wondering if this use of memset_thread_failed is sufficient.

This is pre-existing from the current impl, and ends up being
used to set the bool result of 'touch_all_pages'. The caller
of that then does

 if (touch_all_pages(area, hpagesize, numpages, smp_cpus)) {
 error_setg(errp, "os_mem_prealloc: Insufficient free host memory "
 "pages available to allocate guest RAM");
 }

this was reasonable with the old impl, because the only reason
we ever see 'memset_thread_failed==true' is if we got SIGBUS
due to ENOMEM.

My concern is that madvise() has a bunch of possible errno
codes returned on failure, and we're not distinguishing
them. In the past this kind of thing has burnt us making
failures hard to debug.

Could we turn 'bool memset_thread_failed' into 'int memset_thread_errno'

Then, we can make 'touch_all_pages' have an 'Error **errp'
parameter, and it can directly call

  error_setg_errno(errp, memset_thead_errno, some message...)

when memset_thread_errno is non-zero, and thus we can remove
the generic message from the caller of touch_all_pages.

If you agree, it'd be best to refactor the existing code to
use this pattern in an initial patch.


We could also simply trace the return value, which should be 
comparatively easy to add. We should be getting either -ENOMEM or 
-EHWPOISON. And the latter is highly unlikely to happen when actually 
preallocating.


We made sure that we don't end up with -EINVAL as we're sensing of 
MADV_POPULATE_WRITE works on the mapping.


So when it comes to debugging, I'd actually prefer tracing -errno, as 
the real error will be of little help to end users.


Makes sense?

--
Thanks,

David / dhildenb




Re: Disabling TLS address caching to help QEMU on GNU/Linux

2021-07-22 Thread Richard Biener
On Tue, Jul 20, 2021 at 4:54 PM Florian Weimer via Gcc  wrote:
>
> Currently, the GNU/Linux ABI does not really specify whether the thread
> pointer (the address of the TCB) may change at a function boundary.
>
> Traditionally, GCC assumes that the ABI allows caching addresses of
> thread-local variables across function calls.  Such caching varies in
> aggressiveness between targets, probably due to differences in the
> choice of -mtls-dialect=gnu and -mtls-dialect=gnu2 as the default for
> the targets.  (Caching with -mtls-dialect=gnu2 appears to be more
> aggressive.)
>
> In addition to that, glibc defines errno as this:
>
> extern int *__errno_location (void) __attribute__ ((__const__));
> #define errno (*__errno_location ())
>
> And the const attribute has the side effect of caching the address of
> errno within the same stack frame.
>
> With stackful coroutines, such address caching is only valid if
> coroutines are only ever resumed on the same thread on which they were
> suspended.  (The C++ coroutine implementation is not stackful and is not
> affected by this at the ABI level.)  Historically, I think we took the
> position that cross-thread resumption is undefined.  But the ABIs aren't
> crystal-clear on this matter.
>
> One important piece of software for GNU is QEMU (not just for GNU/Linux,
> Hurd development also benefits from virtualization).  QEMU uses stackful
> coroutines extensively.  There are some hard-to-change code areas where
> resumption happens across threads unfortunately.  These increasingly
> cause problems with more inlining, inter-procedural analysis, and a
> general push towards LTO (which is also needed for some security
> hardening features).
>
> Should the GNU toolchain offer something to help out the QEMU
> developers?  Maybe GCC could offer an option to disable the caching for
> all TLS models.  glibc could detect that mode based on a new
> preprocessor macro and adjust its __errno_location declaration and
> similar function declarations.  There will be a performance impact of
> this, of course, but it would make the QEMU usage well-defined (at the
> lowest levels).

But how does TLS usage transfer between threads?  On the gimple
level the TLS pointer is not visible and thus we'd happily CSE its address:

__thread int x[2];

void bar (int *);

int *foo(int i)
{
  int *p = [i];
  bar (p);
  return [i];
}

results in

int * foo (int i)
{
  int * p;
  sizetype _5;
  sizetype _6;

   [local count: 1073741824]:
  _5 = (sizetype) i_1(D);
  _6 = _5 * 4;
  p_2 =  + _6;
  bar (p_2);
  return p_2;
}

to make this work as expected one would need to expose the TLS pointer
access.

> If this is a programming model that should be supported, then restoring
> some of the optimizations would be possible, by annotating
> context-switching functions and TLS-address-dependent functions.  But I
> think QEMU would immediately benefit from just the simple approach that
> disables address caching of TLS variables.
>
> Thanks,
> Florian
>



Re: [PATCH for-6.1 0/1] machine: Disallow specifying topology parameters as zero

2021-07-22 Thread wangyanan (Y)

On 2021/7/22 21:55, Paolo Bonzini wrote:

On 22/07/21 15:37, Andrew Jones wrote:

This doesn't mention zero inputs and even implies non-zero inputs.

I'm not sure if we need to worry about the odd command line that used 
zero

for some parameters. What do you think?


I think I agree as well, however the patch that Yanan sent has 
unnecessary duplication between smp_parse and pc_smp_parse. 
machine_set_smp is a better place to implement this kind of check.



The smp_parse and pc_smp_parse are going to be converted into a
generic parser, and the added sanity-check in this patch will also be
tested in an unit test. So is it probably better to keep the check in the
parser instead of the caller? The duplication will be eliminated anyway
when there is one single parser.

But I can also implement the check in machine_set_smp as you mentioned
if it's more reasonable and preferred. :)

Thanks,
Yanan
.




Re: [PATCH for-6.1 0/1] machine: Disallow specifying topology parameters as zero

2021-07-22 Thread Paolo Bonzini

On 22/07/21 16:12, wangyanan (Y) wrote:

The smp_parse and pc_smp_parse are going to be converted into a
generic parser, and the added sanity-check in this patch will also be
tested in an unit test. So is it probably better to keep the check in the
parser instead of the caller? The duplication will be eliminated anyway
when there is one single parser.

But I can also implement the check in machine_set_smp as you mentioned
if it's more reasonable and preferred. :)


Yes, I would prefer to avoid having duplicate code.  There are some 
common checks already in machine_set_smp, e.g. comparing ms->smp.cpus 
against mc->min_cpus and mc->max_cpus.


Paolo




Re: [PULL 36/40] vl: switch -M parsing to keyval

2021-07-22 Thread Paolo Bonzini

On 22/07/21 10:19, Peter Krempa wrote:

This patch breaks detection of certain machine options features in
libvirt which were being detected from 'query-command-line-options'.

I presume the change simply removed this from the output of
query-command-line-options due to the historical cruft how the
aforementioned command works.

Unfortunately I didn't find any suitable replacement from what we are
querying.


Yep, there is already a patch queued for this.

Paolo




Re: [PATCH v2 2/5] migration: Make from_dst_file accesses thread-safe

2021-07-22 Thread Peter Xu
On Wed, Jul 21, 2021 at 04:15:27PM -0500, Eric Blake wrote:
> On Wed, Jul 21, 2021 at 03:34:06PM -0400, Peter Xu wrote:
> > Accessing from_dst_file is potentially racy in current code base like below:
> > 
> >   if (s->from_dst_file)
> > do_something(s->from_dst_file);
> > 
> > Because from_dst_file can be reset right after the check in another
> > thread (rp_thread).  One example is migrate_fd_cancel().
> > 
> > Use the same qemu_file_lock to protect it too, just like to_dst_file.
> > 
> > When it's safe to access without lock, comment it.
> > 
> > There's one special reference in migration_thread() that can be replaced by
> > the newly introduced rp_thread_created flag.
> > 
> > Reported-by: Dr. David Alan Gilbert 
> > Signed-off-by: Peter Xu 
> > ---
> >  migration/migration.c | 32 +---
> >  migration/migration.h |  8 +---
> >  migration/ram.c   |  1 +
> >  3 files changed, 31 insertions(+), 10 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 21b94f75a3..fa70400f98 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1879,10 +1879,12 @@ static void migrate_fd_cancel(MigrationState *s)
> >  QEMUFile *f = migrate_get_current()->to_dst_file;
> >  trace_migrate_fd_cancel();
> >  
> > +qemu_mutex_lock(>qemu_file_lock);
> >  if (s->rp_state.from_dst_file) {
> >  /* shutdown the rp socket, so causing the rp thread to shutdown */
> >  qemu_file_shutdown(s->rp_state.from_dst_file);
> >  }
> > +qemu_mutex_unlock(>qemu_file_lock);
> 
> Worth using WITH_QEMU_LOCK_GUARD?

Sure.

> 
> > @@ -2827,11 +2845,13 @@ out:
> >   * Maybe there is something we can do: it looks like a
> >   * network down issue, and we pause for a recovery.
> >   */
> > -qemu_fclose(rp);
> > -ms->rp_state.from_dst_file = NULL;
> > +migration_release_from_dst_file(ms);
> >  rp = NULL;
> >  if (postcopy_pause_return_path_thread(ms)) {
> > -/* Reload rp, reset the rest */
> > +/*
> > + * Reload rp, reset the rest.  Referencing it is save since
> 
> s/save/safe/

Will fix.

I'll wait for some more comments before I repost.

Thanks,

-- 
Peter Xu




Re: [PATCH v3] migration: clear the memory region dirty bitmap when skipping free pages

2021-07-22 Thread Peter Xu
On Thu, Jul 22, 2021 at 09:57:13AM +, Wang, Wei W wrote:
> On Thursday, July 22, 2021 5:48 PM, David Hildenbrand wrote:
> > On 22.07.21 10:30, Wei Wang wrote:
> > > When skipping free pages to send, their corresponding dirty bits in
> > > the memory region dirty bitmap need to be cleared. Otherwise the
> > > skipped pages will be sent in the next round after the migration
> > > thread syncs dirty bits from the memory region dirty bitmap.
> > >
> > > Cc: David Hildenbrand 
> > > Cc: Peter Xu 
> > > Cc: Michael S. Tsirkin 
> > > Reported-by: David Hildenbrand 
> > > Signed-off-by: Wei Wang 
> > > ---
> > >   migration/ram.c | 74
> > +
> > >   1 file changed, 56 insertions(+), 18 deletions(-)
> > >
> > 
> > LGTM, thanks
> > 
> > Reviewed-by: David Hildenbrand 
> > 
> 
> Thanks. Please remember to have a regression test together with Peterx's that 
> patch when you get a chance.

I can continue to try that; but it's slightly low priority to me so it could be
a bit delayed.  If either of you could help that would be great, as I still
don't know last time why I didn't use free-page-hint right (note: I definitely
checked lsmod for sure; so it's there).  So I'll need to figure that out first.

Thanks,

-- 
Peter Xu




Re: [PATCH for-6.2 v2 04/11] machine: Use the computed parameters to calculate omitted cpus

2021-07-22 Thread Andrew Jones
On Thu, Jul 22, 2021 at 10:59:11PM +0800, wangyanan (Y) wrote:
> Ok. If we remove the rounding, then the calculation code has to be modified
> to be like the following. We have to separately consider the case that cpus
> and maxcpus are both omitted (e.g. -smp sockets=16).
> 
> maxcpus = maxcpus > 0 ? maxcpus : cpus;
> 
> if (cpus == 0 && maxcpus == 0) {
>     sockets = sockets > 0 ? sockets : 1;
>     cores = cores > 0 ? cores : 1;
>     threads = threads > 0 ? threads : 1;
>     goto cal;
> }
> 
> if (sockets == 0) {
> ...
> } else if (cores == 0) {
> ...
> } else if (threads == 0) {
> ...
> }
> 
> cal:
> maxcpus = maxcpus > 0 ? maxcpus : sockets * cores * threads;
> cpus = cpus > 0 ? cpus : maxcpus;

Whatever works, but hopefully you can avoid an ugly goto.

Thanks,
drew




[PULL 15/15] configure: Let --without-default-features disable vhost-kernel and vhost-vdpa

2021-07-22 Thread Paolo Bonzini
From: Thomas Huth 

The vhost_kernel and vhost_vdpa variables should be pre-initialized with
the $default_feature setting so that these features get disabled when
the user runs the configure scripts with --without-default-features.

Reported-by: Cole Robinson 
Signed-off-by: Thomas Huth 
Message-Id: <20210713093155.677589-5-th...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 configure | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configure b/configure
index 40fa8cc26e..2a6d23a844 100755
--- a/configure
+++ b/configure
@@ -321,6 +321,7 @@ attr="auto"
 xfs="$default_feature"
 tcg="enabled"
 membarrier="$default_feature"
+vhost_kernel="$default_feature"
 vhost_net="$default_feature"
 vhost_crypto="$default_feature"
 vhost_scsi="$default_feature"
@@ -328,6 +329,7 @@ vhost_vsock="$default_feature"
 vhost_user="no"
 vhost_user_blk_server="auto"
 vhost_user_fs="$default_feature"
+vhost_vdpa="$default_feature"
 bpf="auto"
 kvm="auto"
 hax="auto"
-- 
2.31.1




Re: [PATCH] hw/display: fix virgl reset regression

2021-07-22 Thread Marc-André Lureau
Hi

On Thu, Jul 22, 2021 at 4:51 PM Marc-André Lureau <
marcandre.lur...@gmail.com> wrote:

> Hi
>
> On Thu, Jul 22, 2021 at 4:12 PM Gerd Hoffmann  wrote:
>
>> On Fri, Jul 02, 2021 at 04:32:21PM +0400, marcandre.lur...@redhat.com
>> wrote:
>> > From: Marc-André Lureau 
>> >
>> > Before commit 49afbca3b00e8e517d54964229a794b51768deaf ("virtio-gpu:
>> drop
>> > use_virgl_renderer"), use_virgl_renderer was preventing calling GL
>> > functions from non-GL context threads. The innocuously looking
>> >
>> >   g->parent_obj.use_virgl_renderer = false;
>> >
>> > was set the first time virtio_gpu_gl_reset() was called, during
>> > pc_machine_reset() in the main thread. Further virtio_gpu_gl_reset()
>> > calls in IO threads, without associated GL context, were thus skipping
>> > GL calls and avoided warnings or crashes (see also
>> > https://gitlab.freedesktop.org/virgl/virglrenderer/-/issues/226).
>>
>> Conflicts with patch by Akihiko Odaki fixing the same thing or a
>> related issue:
>>
>> virtio-gpu: Call Virgl only in the main thread
>>
>>
>> https://patchwork.ozlabs.org/project/qemu-devel/patch/20210617113520.25973-1-akihiko.od...@gmail.com/
>>
>> Can you have a look please and suggest how to handle this?
>>
>
> Thanks, I didn't notice we were trying to solve the same issue.
>
> Akihiko's patch indeed seems to solve the crash, but doesn't solve the
> flood of asserts (on wayland):
> qemu-system-x86_64: Gtk: gtk_gl_area_make_current: assertion
> 'gtk_widget_get_realized (widget)' failed
> qemu-system-x86_64: Gdk: gdk_window_create_gl_context: assertion
> 'GDK_IS_WINDOW (window)' failed
> qemu-system-x86_64: Gdk: gdk_gl_context_set_required_version: assertion
> 'GDK_IS_GL_CONTEXT (context)' failed
> ... and many more
>
> My patch cleans it for me, I would suggest to take mine.
>
> Fwiw, I just tested also on X11, and we have another regression that seems
> unrelated:
> qemu-system-x86_64: ../src/dispatch_common.c:858: epoxy_get_proc_address:
> Assertion `0 && "Couldn't find current GLX or EGL context.\n"' failed.
>
> sigh..
>
>
That assert is fixed with "vl: add virtio-vga-gl to the default_list" patch
(
https://patchew.org/QEMU/20210701062421.721414-1-marcandre.lur...@redhat.com/
).


-- 
Marc-André Lureau


Re: [PATCH 0/2] qapi/qom: use correct field name when getting/setting alias properties

2021-07-22 Thread Markus Armbruster
Since patch submitters tend to submit code that works for the success
case, I like to test a few failure cases before anything else.  Gotcha:

$ qemu-system-x86_64 -machine pc,pflash0=xxx qemu-system-x86_64:
Property 'cfi.pflash01.drive' can't find value 'xxx'

The error message is misleading.

This is not a "must not commit" issue.  Fixing a regression in time for
the release at the price of a bad error message is still a win.  The bad
error message needs fixing all the same, just not necessarily before the
release.

Since mere thinking doesn't rock the release boat: any ideas on how this
could be fixed?




  1   2   3   >