from:"Michal Privoznik"

[PATCH] hostmem: Round up memory size for qemu_madvise() in host_memory_backend_memory_complete()

2023-11-27 Thread Michal Privoznik

Simple reproducer:
qemu.git $ ./build/qemu-system-x86_64 \
-m size=8389632k,slots=16,maxmem=2560k \
-object 
'{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/hugepages2M/","prealloc":true,"size":8590983168,"host-nodes":[0],"policy":"bind"}'
 \
-numa node,nodeid=0,cpus=0,memdev=ram-node0

With current master I get:

qemu-system-x86_64: cannot bind memory to host NUMA nodes: Invalid argument

The problem is that memory size (8193MiB) is not an integer
multiple of underlying pagesize (2MiB) which triggers a check
inside of madvise(), since we can't really set a madvise() policy
just to a fraction of a page.

Signed-off-by: Michal Privoznik 
---
 backends/hostmem.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 747e7838c0..4e88d048de 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -326,9 +326,10 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
 HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
 Error *local_err = NULL;
 void *ptr;
-uint64_t sz;
 
 if (bc->alloc) {
+uint64_t sz;
+
 bc->alloc(backend, _err);
 if (local_err) {
 goto out;
@@ -337,6 +338,11 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
 ptr = memory_region_get_ram_ptr(>mr);
 sz = memory_region_size(>mr);
 
+/* Round up size to be an integer multiple of pagesize, because
+ * madvise() does not really like setting advices on a fraction of a
+ * page. */
+sz = ROUND_UP(sz, qemu_ram_pagesize(backend->mr.ram_block));
+
 if (backend->merge) {
 qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
 }
-- 
2.41.0

[PATCH] configure: check for $download value properly

2023-06-07 Thread Michal Privoznik

If configure was invoked with --disable-download and git
submodules were not checked out a warning is produced and the
configure script fails. But the $download variable (which
reflects the enable/disable download argument) is checked for in
a weird fashion:

  test -f "$download" = disabled

Drop the '-f' to check for the actual value of the variable.

Fixes: 2019cabfee0
Signed-off-by: Michal Privoznik 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 8765b88e12..8a638dd82a 100755
--- a/configure
+++ b/configure
@@ -767,7 +767,7 @@ if test "$plugins" = "yes" -a "$tcg" = "disabled"; then
 fi
 
 if ! test -f "$source_path/subprojects/keycodemapdb/README" \
-&& test -f "$download" = disabled
+&& test "$download" = disabled
 then
 echo
 echo "ERROR: missing subprojects"
-- 
2.39.3

[PATCH v3] meson: Avoid implicit declaration of functions

2023-05-30 Thread Michal Privoznik

While detecting a presence of a function via 'cc.links()'
gives desired result (i.e. detects whether function is present),
it also produces a warning on systems where the function is not
present (into meson-log.txt), e.g.:

  qemu.git/build/meson-private/tmph74x3p38/testfile.c:2:34: \
  warning: implicit declaration of function 'malloc_trim' 
[-Wimplicit-function-declaration]

And some distributions (e.g. Gentoo) parse the meson log and
consider these erroneous because it can lead to feature
misdetection (see [1]).

We can check whether given function exists via
'cc.has_function()' or whether STATX_* macros exist via
'cc.has_header_symbol()'.

1: https://wiki.gentoo.org/wiki/Modern_C_porting
Resolves: https://bugs.gentoo.org/898810
Signed-off-by: Michal Privoznik 
---

v3 of

https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg07210.html

diff to v2:
- Expanded commit message to explain why we need this.

 meson.build | 26 +-
 1 file changed, 5 insertions(+), 21 deletions(-)

diff --git a/meson.build b/meson.build
index 2d48aa1e2e..21061b19d4 100644
--- a/meson.build
+++ b/meson.build
@@ -1797,8 +1797,7 @@ malloc = []
 if get_option('malloc') == 'system'
   has_malloc_trim = \
 get_option('malloc_trim').allowed() and \
-cc.links('''#include 
-int main(void) { malloc_trim(0); return 0; }''')
+cc.has_function('malloc_trim', prefix: '#include ')
 else
   has_malloc_trim = false
   malloc = cc.find_library(get_option('malloc'), required: true)
@@ -1811,34 +1810,19 @@ if not has_malloc_trim and 
get_option('malloc_trim').enabled()
   endif
 endif
 
-# Check whether the glibc provides statx()
-
 gnu_source_prefix = '''
   #ifndef _GNU_SOURCE
   #define _GNU_SOURCE
   #endif
 '''
-statx_test = gnu_source_prefix + '''
-  #include 
-  int main(void) {
-struct statx statxbuf;
-statx(0, "", 0, STATX_BASIC_STATS, );
-return 0;
-  }'''
 
-has_statx = cc.links(statx_test)
+# Check whether the glibc provides STATX_BASIC_STATS
+
+has_statx = cc.has_header_symbol('sys/stat.h', 'STATX_BASIC_STATS', prefix: 
gnu_source_prefix)
 
 # Check whether statx() provides mount ID information
 
-statx_mnt_id_test = gnu_source_prefix + '''
-  #include 
-  int main(void) {
-struct statx statxbuf;
-statx(0, "", 0, STATX_BASIC_STATS | STATX_MNT_ID, );
-return statxbuf.stx_mnt_id;
-  }'''
-
-has_statx_mnt_id = cc.links(statx_mnt_id_test)
+has_statx_mnt_id = cc.has_header_symbol('sys/stat.h', 'STATX_MNT_ID', prefix: 
gnu_source_prefix)
 
 have_vhost_user_blk_server = get_option('vhost_user_blk_server') \
   .require(targetos == 'linux',
-- 
2.39.3

[PATCH v2] meson: Avoid implicit declaration of functions

2023-05-29 Thread Michal Privoznik

While detecting a presence of a function via 'cc.links()'
gives desired result (i.e. detects whether function is present),
it also produces a warning on systems where the function is not
present (into meson-log.txt), e.g.:

  qemu.git/build/meson-private/tmph74x3p38/testfile.c:2:34: \
  warning: implicit declaration of function 'malloc_trim' 
[-Wimplicit-function-declaration]

We can check whether given function exists via
'cc.has_function()' or whether STATX_* macros exist via
'cc.has_header_symbol()'.

Resolves: https://bugs.gentoo.org/898810
Signed-off-by: Michal Privoznik 
---

v2 of:

https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg07138.html

diff to v1:
- Drop cc.links() as it's redundant

 meson.build | 26 +-
 1 file changed, 5 insertions(+), 21 deletions(-)

diff --git a/meson.build b/meson.build
index 2d48aa1e2e..21061b19d4 100644
--- a/meson.build
+++ b/meson.build
@@ -1797,8 +1797,7 @@ malloc = []
 if get_option('malloc') == 'system'
   has_malloc_trim = \
 get_option('malloc_trim').allowed() and \
-cc.links('''#include 
-int main(void) { malloc_trim(0); return 0; }''')
+cc.has_function('malloc_trim', prefix: '#include ')
 else
   has_malloc_trim = false
   malloc = cc.find_library(get_option('malloc'), required: true)
@@ -1811,34 +1810,19 @@ if not has_malloc_trim and 
get_option('malloc_trim').enabled()
   endif
 endif
 
-# Check whether the glibc provides statx()
-
 gnu_source_prefix = '''
   #ifndef _GNU_SOURCE
   #define _GNU_SOURCE
   #endif
 '''
-statx_test = gnu_source_prefix + '''
-  #include 
-  int main(void) {
-struct statx statxbuf;
-statx(0, "", 0, STATX_BASIC_STATS, );
-return 0;
-  }'''
 
-has_statx = cc.links(statx_test)
+# Check whether the glibc provides STATX_BASIC_STATS
+
+has_statx = cc.has_header_symbol('sys/stat.h', 'STATX_BASIC_STATS', prefix: 
gnu_source_prefix)
 
 # Check whether statx() provides mount ID information
 
-statx_mnt_id_test = gnu_source_prefix + '''
-  #include 
-  int main(void) {
-struct statx statxbuf;
-statx(0, "", 0, STATX_BASIC_STATS | STATX_MNT_ID, );
-return statxbuf.stx_mnt_id;
-  }'''
-
-has_statx_mnt_id = cc.links(statx_mnt_id_test)
+has_statx_mnt_id = cc.has_header_symbol('sys/stat.h', 'STATX_MNT_ID', prefix: 
gnu_source_prefix)
 
 have_vhost_user_blk_server = get_option('vhost_user_blk_server') \
   .require(targetos == 'linux',
-- 
2.39.3

[PATCH] meson: Avoid implicit declaration of functions

2023-05-29 Thread Michal Privoznik

While detecting a presence of a function via 'cc.links()'
gives desired result (i.e. detects whether function is present),
it also produces a warning on systems where the function is not
present, e.g.:

  qemu.git/build/meson-private/tmph74x3p38/testfile.c:2:34: \
  warning: implicit declaration of function 'malloc_trim' 
[-Wimplicit-function-declaration]

We can check whether given function exists via
'cc.has_function()' firstly.

Resolves: https://bugs.gentoo.org/898810
Signed-off-by: Michal Privoznik 
---
 meson.build | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/meson.build b/meson.build
index 2d48aa1e2e..5da4dbac24 100644
--- a/meson.build
+++ b/meson.build
@@ -1797,6 +1797,7 @@ malloc = []
 if get_option('malloc') == 'system'
   has_malloc_trim = \
 get_option('malloc_trim').allowed() and \
+cc.has_function('malloc_trim', prefix: '#include ') and \
 cc.links('''#include 
 int main(void) { malloc_trim(0); return 0; }''')
 else
@@ -1818,27 +1819,29 @@ gnu_source_prefix = '''
   #define _GNU_SOURCE
   #endif
 '''
-statx_test = gnu_source_prefix + '''
-  #include 
+statx_prefix = gnu_source_prefix + '''
+  #include
+'''
+statx_test = statx_prefix + '''
   int main(void) {
 struct statx statxbuf;
 statx(0, "", 0, STATX_BASIC_STATS, );
 return 0;
   }'''
 
-has_statx = cc.links(statx_test)
+has_statx = cc.has_function('statx', prefix: statx_prefix) and \
+cc.links(statx_test)
 
 # Check whether statx() provides mount ID information
 
-statx_mnt_id_test = gnu_source_prefix + '''
-  #include 
+statx_mnt_id_test = statx_prefix + '''
   int main(void) {
 struct statx statxbuf;
 statx(0, "", 0, STATX_BASIC_STATS | STATX_MNT_ID, );
 return statxbuf.stx_mnt_id;
   }'''
 
-has_statx_mnt_id = cc.links(statx_mnt_id_test)
+has_statx_mnt_id = has_statx and cc.links(statx_mnt_id_test)
 
 have_vhost_user_blk_server = get_option('vhost_user_blk_server') \
   .require(targetos == 'linux',
-- 
2.39.3

[PATCH v2] hostmem: Honor multiple preferred nodes if possible

2022-12-15 Thread Michal Privoznik

If a memory-backend is configured with mode
HOST_MEM_POLICY_PREFERRED then
host_memory_backend_memory_complete() calls mbind() as:

  mbind(..., MPOL_PREFERRED, nodemask, ...);

Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
to the .host-nodes attribute. Therefore, there can be multiple
nodes specified. However, the documentation to MPOL_PREFERRED
says:

  MPOL_PREFERRED
This mode sets the preferred node for allocation. ...
If nodemask specifies more than one node ID, the first node
in the mask will be selected as the preferred node.

Therefore, only the first node is honored and the rest is
silently ignored. Well, with recent changes to the kernel and
numactl we can do better.

The Linux kernel added in v5.15 via commit cfcaa66f8032
("mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY")
support for MPOL_PREFERRED_MANY, which accepts multiple preferred
NUMA nodes instead.

Then, numa_has_preferred_many() API was introduced to numactl
(v2.0.15~26) allowing applications to query kernel support.

Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
mbind() call instead and stop ignoring multiple nodes, silently.

Signed-off-by: Michal Privoznik 
---

v2 of:

https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg01354.html

diff to v1 (thanks to David for his rewiew):
- Don't cache numa_has_preferred_many() retval
- Reword comments and commit message
- Switch compile time detection from numa_set_preferred_many() to
  numa_has_preferred_many()

 backends/hostmem.c | 19 +--
 meson.build|  5 +
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 8640294c10..163ea9af04 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -23,7 +23,12 @@
 
 #ifdef CONFIG_NUMA
 #include 
+#include 
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_DEFAULT != MPOL_DEFAULT);
+/*
+ * HOST_MEM_POLICY_PREFERRED may either translate to MPOL_PREFERRED or
+ * MPOL_PREFERRED_MANY, see comments further below.
+ */
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_PREFERRED != MPOL_PREFERRED);
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_BIND != MPOL_BIND);
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_INTERLEAVE != MPOL_INTERLEAVE);
@@ -346,6 +351,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
  * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
  * this doesn't catch hugepage case. */
 unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
+int mode = backend->policy;
 
 /* check for invalid host-nodes and policies and give more verbose
  * error messages than mbind(). */
@@ -369,9 +375,18 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
 assert(maxnode <= MAX_NODES);
 
+#ifdef HAVE_NUMA_SET_PREFERRED_MANY
+if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
+/*
+ * Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
+ * silently picks the first node.
+ */
+mode = MPOL_PREFERRED_MANY;
+}
+#endif
+
 if (maxnode &&
-mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode + 1,
-  flags)) {
+mbind(ptr, sz, mode, backend->host_nodes, maxnode + 1, flags)) {
 if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
 error_setg_errno(errp, errno,
  "cannot bind memory to host NUMA nodes");
diff --git a/meson.build b/meson.build
index 5c6b5a1c75..fb6979349d 100644
--- a/meson.build
+++ b/meson.build
@@ -1858,6 +1858,11 @@ config_host_data.set('CONFIG_LINUX_AIO', libaio.found())
 config_host_data.set('CONFIG_LINUX_IO_URING', linux_io_uring.found())
 config_host_data.set('CONFIG_LIBPMEM', libpmem.found())
 config_host_data.set('CONFIG_NUMA', numa.found())
+if numa.found()
+  config_host_data.set('HAVE_NUMA_HAS_PREFERRED_MANY',
+   cc.has_function('numa_has_preferred_many',
+   dependencies: numa))
+endif
 config_host_data.set('CONFIG_OPENGL', opengl.found())
 config_host_data.set('CONFIG_PROFILER', get_option('profiler'))
 config_host_data.set('CONFIG_RBD', rbd.found())
-- 
2.37.4

[PATCH] hostmem: Honour multiple preferred nodes if possible

2022-12-09 Thread Michal Privoznik

If a memory-backend is configured with mode
HOST_MEM_POLICY_PREFERRED then
host_memory_backend_memory_complete() calls mbind() as:

  mbind(..., MPOL_PREFERRED, nodemask, ...);

Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
to the .host-nodes attribute. Therefore, there can be multiple
nodes specified. However, the documentation to MPOL_PREFERRED
says:

  MPOL_PREFERRED
This mode sets the preferred node for allocation. ...
If nodemask specifies more than one node ID, the first node
in the mask will be selected as the preferred node.

Therefore, only the first node is honoured and the rest is
silently ignored. Well, with recent changes to the kernel and
numactl we can do better.

Firstly, new mode - MPOL_PREFERRED_MANY - was introduced to
kernel (v5.15-rc1~107^2~21) which now accepts multiple NUMA
nodes.

Then, numa_has_preferred_many() API was introduced to numactl
(v2.0.15~26) allowing applications to query kernel support.

Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
mbind() call instead and stop ignoring multiple nodes, silently.

Signed-off-by: Michal Privoznik 
---
 backends/hostmem.c | 28 
 meson.build|  5 +
 2 files changed, 33 insertions(+)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 8640294c10..e0d6cb6c8a 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -23,10 +23,22 @@
 
 #ifdef CONFIG_NUMA
 #include 
+#include 
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_DEFAULT != MPOL_DEFAULT);
+/*
+ * HOST_MEM_POLICY_PREFERRED may some time also by MPOL_PREFERRED_MANY, see
+ * below.
+ */
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_PREFERRED != MPOL_PREFERRED);
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_BIND != MPOL_BIND);
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_INTERLEAVE != MPOL_INTERLEAVE);
+
+/*
+ * -1 for uninitialized,
+ *  0 for MPOL_PREFERRED_MANY unsupported,
+ *  1 for supported.
+ */
+static int has_preferred_many = -1;
 #endif
 
 char *
@@ -346,6 +358,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
  * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
  * this doesn't catch hugepage case. */
 unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
+int mode = backend->policy;
 
 /* check for invalid host-nodes and policies and give more verbose
  * error messages than mbind(). */
@@ -369,6 +382,21 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
 assert(maxnode <= MAX_NODES);
 
+#ifdef HAVE_NUMA_SET_PREFERRED_MANY
+if (has_preferred_many < 0) {
+/* Check, whether kernel supports MPOL_PREFERRED_MANY. */
+has_preferred_many = numa_has_preferred_many() > 0 ? 1 : 0;
+}
+
+if (mode == MPOL_PREFERRED && has_preferred_many > 0) {
+/*
+ * Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
+ * silently picks the first node.
+ */
+mode = MPOL_PREFERRED_MANY;
+}
+#endif
+
 if (maxnode &&
 mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode + 1,
   flags)) {
diff --git a/meson.build b/meson.build
index 5c6b5a1c75..ebbff7a8ea 100644
--- a/meson.build
+++ b/meson.build
@@ -1858,6 +1858,11 @@ config_host_data.set('CONFIG_LINUX_AIO', libaio.found())
 config_host_data.set('CONFIG_LINUX_IO_URING', linux_io_uring.found())
 config_host_data.set('CONFIG_LIBPMEM', libpmem.found())
 config_host_data.set('CONFIG_NUMA', numa.found())
+if numa.found()
+  config_host_data.set('HAVE_NUMA_SET_PREFERRED_MANY',
+   cc.has_function('numa_set_preferred_many',
+   dependencies: numa))
+endif
 config_host_data.set('CONFIG_OPENGL', opengl.found())
 config_host_data.set('CONFIG_PROFILER', get_option('profiler'))
 config_host_data.set('CONFIG_RBD', rbd.found())
-- 
2.37.4

[PATCH v2] seccomp: Get actual errno value from failed seccomp functions

2022-10-26 Thread Michal Privoznik

Upon failure, a libseccomp API returns actual errno value very
rarely. Fortunately, after its commit 34bf78ab (contained in
2.5.0 release), the SCMP_FLTATR_API_SYSRAWRC attribute can be set
which makes subsequent APIs return true errno on failure.

This is especially critical when seccomp_load() fails, because
generic -ECANCELED says nothing.

Signed-off-by: Michal Privoznik 
Reviewed-by: Philippe Mathieu-Daudé 
---

v2 of:

https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg04509.html

diff to v1:
- added comment when setting SYSRAWRC attribute per Philippe's
  suggestion

 meson.build|  9 +
 softmmu/qemu-seccomp.c | 13 +
 2 files changed, 22 insertions(+)

diff --git a/meson.build b/meson.build
index b686dfef75..5f114c89d9 100644
--- a/meson.build
+++ b/meson.build
@@ -636,10 +636,16 @@ if vmnet.found() and not 
cc.has_header_symbol('vmnet/vmnet.h',
 endif
 
 seccomp = not_found
+seccomp_has_sysrawrc = false
 if not get_option('seccomp').auto() or have_system or have_tools
   seccomp = dependency('libseccomp', version: '>=2.3.0',
required: get_option('seccomp'),
method: 'pkg-config', kwargs: static_kwargs)
+  if seccomp.found()
+seccomp_has_sysrawrc = cc.has_header_symbol('seccomp.h',
+'SCMP_FLTATR_API_SYSRAWRC',
+dependencies: seccomp)
+  endif
 endif
 
 libcap_ng = not_found
@@ -1849,6 +1855,9 @@ config_host_data.set('CONFIG_RDMA', rdma.found())
 config_host_data.set('CONFIG_SDL', sdl.found())
 config_host_data.set('CONFIG_SDL_IMAGE', sdl_image.found())
 config_host_data.set('CONFIG_SECCOMP', seccomp.found())
+if seccomp.found()
+  config_host_data.set('CONFIG_SECCOMP_SYSRAWRC', seccomp_has_sysrawrc)
+endif
 config_host_data.set('CONFIG_SNAPPY', snappy.found())
 config_host_data.set('CONFIG_TPM', have_tpm)
 config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
diff --git a/softmmu/qemu-seccomp.c b/softmmu/qemu-seccomp.c
index deaf8a4ef5..d66a2a1226 100644
--- a/softmmu/qemu-seccomp.c
+++ b/softmmu/qemu-seccomp.c
@@ -312,6 +312,19 @@ static int seccomp_start(uint32_t seccomp_opts, Error 
**errp)
 goto seccomp_return;
 }
 
+#if defined(CONFIG_SECCOMP_SYSRAWRC)
+/*
+ * This must be the first seccomp_attr_set() call to have full
+ * error propagation from subsequent seccomp APIs.
+ */
+rc = seccomp_attr_set(ctx, SCMP_FLTATR_API_SYSRAWRC, 1);
+if (rc != 0) {
+error_setg_errno(errp, -rc,
+ "failed to set seccomp rawrc attribute");
+goto seccomp_return;
+}
+#endif
+
 rc = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_TSYNC, 1);
 if (rc != 0) {
 error_setg_errno(errp, -rc,
-- 
2.37.4

[PATCH] seccomp: Get actual errno value from failed seccomp functions

2022-10-25 Thread Michal Privoznik

Upon failure, a libseccomp API returns actual errno value very
rarely. Fortunately, after its commit 34bf78ab (contained in
2.5.0 release), the SCMP_FLTATR_API_SYSRAWRC attribute can be set
which makes subsequent APIs return true errno on failure.

This is especially critical when seccomp_load() fails, because
generic -ECANCELED says nothing.

Signed-off-by: Michal Privoznik 
---
 meson.build| 9 +
 softmmu/qemu-seccomp.c | 9 +
 2 files changed, 18 insertions(+)

diff --git a/meson.build b/meson.build
index b686dfef75..5f114c89d9 100644
--- a/meson.build
+++ b/meson.build
@@ -636,10 +636,16 @@ if vmnet.found() and not 
cc.has_header_symbol('vmnet/vmnet.h',
 endif
 
 seccomp = not_found
+seccomp_has_sysrawrc = false
 if not get_option('seccomp').auto() or have_system or have_tools
   seccomp = dependency('libseccomp', version: '>=2.3.0',
required: get_option('seccomp'),
method: 'pkg-config', kwargs: static_kwargs)
+  if seccomp.found()
+seccomp_has_sysrawrc = cc.has_header_symbol('seccomp.h',
+'SCMP_FLTATR_API_SYSRAWRC',
+dependencies: seccomp)
+  endif
 endif
 
 libcap_ng = not_found
@@ -1849,6 +1855,9 @@ config_host_data.set('CONFIG_RDMA', rdma.found())
 config_host_data.set('CONFIG_SDL', sdl.found())
 config_host_data.set('CONFIG_SDL_IMAGE', sdl_image.found())
 config_host_data.set('CONFIG_SECCOMP', seccomp.found())
+if seccomp.found()
+  config_host_data.set('CONFIG_SECCOMP_SYSRAWRC', seccomp_has_sysrawrc)
+endif
 config_host_data.set('CONFIG_SNAPPY', snappy.found())
 config_host_data.set('CONFIG_TPM', have_tpm)
 config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
diff --git a/softmmu/qemu-seccomp.c b/softmmu/qemu-seccomp.c
index deaf8a4ef5..05abf257c4 100644
--- a/softmmu/qemu-seccomp.c
+++ b/softmmu/qemu-seccomp.c
@@ -312,6 +312,15 @@ static int seccomp_start(uint32_t seccomp_opts, Error 
**errp)
 goto seccomp_return;
 }
 
+#if defined(CONFIG_SECCOMP_SYSRAWRC)
+rc = seccomp_attr_set(ctx, SCMP_FLTATR_API_SYSRAWRC, 1);
+if (rc != 0) {
+error_setg_errno(errp, -rc,
+ "failed to set seccomp rawrc attribute");
+goto seccomp_return;
+}
+#endif
+
 rc = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_TSYNC, 1);
 if (rc != 0) {
 error_setg_errno(errp, -rc,
-- 
2.37.4

[PATCH v3] configure: Avoid using strings binary

2022-10-14 Thread Michal Privoznik

When determining the endiandness of the target architecture we're
building for a small program is compiled, which in an obfuscated
way declares two strings. Then, we look which string is in
correct order (using strings binary) and deduct the endiandness.
But using the strings binary is problematic, because it's part of
toolchain (strings is just a symlink to
x86_64-pc-linux-gnu-strings or llvm-strings). And when
(cross-)compiling, it requires users to set the symlink to the
correct toolchain.

Fortunately, we have a better alternative anyways. We can mimic
what compiler.h is already doing: comparing __BYTE_ORDER__
against values for little/big endiandness.

Bug: https://bugs.gentoo.org/876933
Signed-off-by: Michal Privoznik 
---

v3 of:

https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg02149.html

diff to v2:
- Check whether __BYTE_ORDER__ is defined prior comparing it
- Switch from 'if compile_prog' to 'if ! compile_prog'

 configure | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/configure b/configure
index 45ee6f4eb3..d186944d3f 100755
--- a/configure
+++ b/configure
@@ -1423,30 +1423,31 @@ if test "$tcg" = "enabled"; then
 git_submodules="$git_submodules tests/fp/berkeley-softfloat-3"
 fi
 
-# ---
+##
 # big/little endian test
 cat > $TMPC << EOF
-#include 
-short big_endian[] = { 0x4269, 0x4765, 0x4e64, 0x4961, 0x4e00, 0, };
-short little_endian[] = { 0x694c, 0x7454, 0x654c, 0x6e45, 0x6944, 0x6e41, 0, };
-int main(int argc, char *argv[])
-{
-return printf("%s %s\n", (char *)big_endian, (char *)little_endian);
-}
+#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+# error LITTLE
+#endif
+int main(void) { return 0; }
 EOF
 
-if compile_prog ; then
-if strings -a $TMPE | grep -q BiGeNdIaN ; then
-bigendian="yes"
-elif strings -a $TMPE | grep -q LiTtLeEnDiAn ; then
-bigendian="no"
-else
-echo big/little test failed
-exit 1
-fi
+if ! compile_prog ; then
+  bigendian="no"
 else
+  cat > $TMPC << EOF
+#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+# error BIG
+#endif
+int main(void) { return 0; }
+EOF
+
+  if ! compile_prog ; then
+bigendian="yes"
+  else
 echo big/little test failed
 exit 1
+  fi
 fi
 
 ##
-- 
2.35.1

[PATCH v2] configure: Avoid using strings binary

2022-10-13 Thread Michal Privoznik

When determining the endiandness of the target architecture we're
building for a small program is compiled, which in an obfuscated
way declares two strings. Then, we look which string is in
correct order (using strings binary) and deduct the endiandness.
But using the strings binary is problematic, because it's part of
toolchain (strings is just a symlink to
x86_64-pc-linux-gnu-strings or llvm-strings). And when
(cross-)compiling, it requires users to set the symlink to the
correct toolchain.

Fortunately, we have a better alternative anyways. We can mimic
what compiler.h is already doing: comparing __BYTE_ORDER__
against values for little/big endiandness.

Bug: https://bugs.gentoo.org/876933
Signed-off-by: Michal Privoznik 
---

v2 of:

https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg02054.html

diff to v1:
- Fixed reversed logic
- Ditched custom compiler macros in favor of __BYTE_ORDER__

 configure | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/configure b/configure
index 45ee6f4eb3..2ac26c6978 100755
--- a/configure
+++ b/configure
@@ -1423,30 +1423,31 @@ if test "$tcg" = "enabled"; then
 git_submodules="$git_submodules tests/fp/berkeley-softfloat-3"
 fi
 
-# ---
+##
 # big/little endian test
 cat > $TMPC << EOF
-#include 
-short big_endian[] = { 0x4269, 0x4765, 0x4e64, 0x4961, 0x4e00, 0, };
-short little_endian[] = { 0x694c, 0x7454, 0x654c, 0x6e45, 0x6944, 0x6e41, 0, };
-int main(int argc, char *argv[])
-{
-return printf("%s %s\n", (char *)big_endian, (char *)little_endian);
-}
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+# error LITTLE
+#endif
+int main(void) { return 0; }
 EOF
 
 if compile_prog ; then
-if strings -a $TMPE | grep -q BiGeNdIaN ; then
-bigendian="yes"
-elif strings -a $TMPE | grep -q LiTtLeEnDiAn ; then
-bigendian="no"
-else
-echo big/little test failed
-exit 1
-fi
+  bigendian="yes"
 else
+  cat > $TMPC << EOF
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+# error BIG
+#endif
+int main(void) { return 0; }
+EOF
+
+  if compile_prog ; then
+bigendian="no"
+  else
 echo big/little test failed
 exit 1
+  fi
 fi
 
 ##
-- 
2.35.1

[PATCH] configure: Avoid using strings binary

2022-10-13 Thread Michal Privoznik

When determining the endiandness of the target architecture we're
building for a small program is compiled, which in an obfuscated
way declares two strings. Then, we look which string is in
correct order (using strings binary) and deduct the endiandness.
But using the strings binary is problematic, because it's part of
toolchain (strings is just a symlink to
x86_64-pc-linux-gnu-strings or llvm-strings). And when
(cross-)compiling, it requires users to set the symlink to the
correct toolchain.

Fortunately, we have a better alternative anyways. Since we
require either clang or gcc we can rely on macros they declare.

Bug: https://bugs.gentoo.org/876933
Signed-off-by: Michal Privoznik 
---
 configure | 33 ++---
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/configure b/configure
index 45ee6f4eb3..91e04635cb 100755
--- a/configure
+++ b/configure
@@ -1426,27 +1426,30 @@ fi
 # ---
 # big/little endian test
 cat > $TMPC << EOF
-#include 
-short big_endian[] = { 0x4269, 0x4765, 0x4e64, 0x4961, 0x4e00, 0, };
-short little_endian[] = { 0x694c, 0x7454, 0x654c, 0x6e45, 0x6944, 0x6e41, 0, };
-int main(int argc, char *argv[])
-{
-return printf("%s %s\n", (char *)big_endian, (char *)little_endian);
-}
+#if defined(__BYTE_ORDER) && __BYTE_ORDER == __BIG_ENDIAN || \
+defined(__BIG_ENDIAN__)
+# error BIG
+#endif
+int main(void) { return 0; }
 EOF
 
 if compile_prog ; then
-if strings -a $TMPE | grep -q BiGeNdIaN ; then
-bigendian="yes"
-elif strings -a $TMPE | grep -q LiTtLeEnDiAn ; then
-bigendian="no"
-else
-echo big/little test failed
-exit 1
-fi
+  bigendian="yes"
 else
+  cat > $TMPC << EOF
+#if defined(__BYTE_ORDER) && __BYTE_ORDER == __LITTLE_ENDIAN || \
+defined(__LITTLE_ENDIAN__)
+# error LITTLE
+#endif
+int main(void) { return 0; }
+EOF
+
+  if compile_prog ; then
+bigendian="no"
+  else
 echo big/little test failed
 exit 1
+  fi
 fi
 
 ##
-- 
2.35.1

[PATCH] util: NUMA aware memory preallocation

2022-05-10 Thread Michal Privoznik

When allocating large amounts of memory the task is offloaded
onto threads. These threads then use various techniques to
allocate the memory fully (madvise(), writing into the memory).
However, these threads are free to run on any CPU, which becomes
problematic on NUMA machines because it may happen that a thread
is running on a distant node.

Ideally, this is something that a management application would
resolve, but we are not anywhere close to that, Firstly, memory
allocation happens before monitor socket is even available. But
okay, that's what -preconfig is for. But then the problem is that
'object-add' would not return until all memory is preallocated.

Long story short, management application has no way of learning
TIDs of allocator threads so it can't make them run NUMA aware.

But what we can do is to propagate the 'host-nodes' attribute of
MemoryBackend object down to where preallocation threads are
created and set their affinity according to the attribute.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2074000
Signed-off-by: Michal Privoznik 
---
 backends/hostmem.c |  6 ++--
 hw/virtio/virtio-mem.c |  2 +-
 include/qemu/osdep.h   |  2 ++
 util/meson.build   |  2 +-
 util/oslib-posix.c | 74 --
 util/oslib-win32.c |  2 ++
 6 files changed, 82 insertions(+), 6 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index a7bae3d713..7373472c7e 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -232,7 +232,8 @@ static void host_memory_backend_set_prealloc(Object *obj, 
bool value,
 void *ptr = memory_region_get_ram_ptr(>mr);
 uint64_t sz = memory_region_size(>mr);
 
-os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads, _err);
+os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads,
+backend->host_nodes, MAX_NODES, _err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -394,7 +395,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
  */
 if (backend->prealloc) {
 os_mem_prealloc(memory_region_get_fd(>mr), ptr, sz,
-backend->prealloc_threads, _err);
+backend->prealloc_threads, backend->host_nodes,
+MAX_NODES, _err);
 if (local_err) {
 goto out;
 }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 5aca408726..48b104cdf6 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -467,7 +467,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, 
uint64_t start_gpa,
 int fd = memory_region_get_fd(>memdev->mr);
 Error *local_err = NULL;
 
-os_mem_prealloc(fd, area, size, 1, _err);
+os_mem_prealloc(fd, area, size, 1, NULL, 0, _err);
 if (local_err) {
 static bool warned;
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 1c1e7eca98..474cbf3b86 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -577,6 +577,8 @@ unsigned long qemu_getauxval(unsigned long type);
 void qemu_set_tty_echo(int fd, bool echo);
 
 void os_mem_prealloc(int fd, char *area, size_t sz, int smp_cpus,
+ const unsigned long *host_nodes,
+ unsigned long max_node,
  Error **errp);
 
 /**
diff --git a/util/meson.build b/util/meson.build
index 8f16018cd4..393ff74570 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -15,7 +15,7 @@ freebsd_dep = []
 if targetos == 'freebsd'
   freebsd_dep = util
 endif
-util_ss.add(when: 'CONFIG_POSIX', if_true: [files('oslib-posix.c'), 
freebsd_dep])
+util_ss.add(when: 'CONFIG_POSIX', if_true: [files('oslib-posix.c'), 
freebsd_dep, numa])
 util_ss.add(when: 'CONFIG_POSIX', if_true: files('qemu-thread-posix.c'))
 util_ss.add(when: 'CONFIG_POSIX', if_true: files('memfd.c'))
 util_ss.add(when: 'CONFIG_WIN32', if_true: files('aio-win32.c'))
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 477990f39b..1572b9b178 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -73,6 +73,10 @@
 #include "qemu/error-report.h"
 #endif
 
+#ifdef CONFIG_NUMA
+#include 
+#endif
+
 #define MAX_MEM_PREALLOC_THREAD_COUNT 16
 
 struct MemsetThread;
@@ -82,6 +86,9 @@ typedef struct MemsetContext {
 bool any_thread_failed;
 struct MemsetThread *threads;
 int num_threads;
+#ifdef CONFIG_NUMA
+struct bitmask *nodemask;
+#endif
 } MemsetContext;
 
 struct MemsetThread {
@@ -420,6 +427,12 @@ static void *do_touch_pages(void *arg)
 }
 qemu_mutex_unlock(_mutex);
 
+#ifdef CONFIG_NUMA
+if (memset_args->context->nodemask) {
+numa_run_on_node_mask(memset_args->context->nodemask);
+}
+#endif
+
 /* unblock SIGBUS */
 sigemptyset();
 sigaddset(, SIGBUS);
@@ -463,6 +476

[PATCH] qmp: Stabilize preconfig

2021-10-25 Thread Michal Privoznik

The -preconfig option and exit-preconfig command are around for
quite some time now. However, they are still marked as unstable.
This is suboptimal because it may block some upper layer in
consuming it. In this specific case - Libvirt avoids using
experimental features.

Signed-off-by: Michal Privoznik 
---
 docs/system/managed-startup.rst | 2 +-
 monitor/hmp-cmds.c  | 2 +-
 qapi/misc.json  | 6 +++---
 qemu-options.hx | 5 ++---
 softmmu/vl.c| 4 ++--
 tests/qtest/numa-test.c | 8 
 tests/qtest/qmp-test.c  | 6 +++---
 7 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/docs/system/managed-startup.rst b/docs/system/managed-startup.rst
index 9bcf98ea79..f3291b867b 100644
--- a/docs/system/managed-startup.rst
+++ b/docs/system/managed-startup.rst
@@ -32,4 +32,4 @@ machine, including but not limited to:
 - ``query-qmp-schema``
 - ``query-commands``
 - ``query-status``
-- ``x-exit-preconfig``
+- ``exit-preconfig``
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index bcaa41350e..5ccb823b97 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1001,7 +1001,7 @@ void hmp_exit_preconfig(Monitor *mon, const QDict *qdict)
 {
 Error *err = NULL;
 
-qmp_x_exit_preconfig();
+qmp_exit_preconfig();
 hmp_handle_error(mon, err);
 }
 
diff --git a/qapi/misc.json b/qapi/misc.json
index 5c2ca3b556..0f75d60996 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -175,7 +175,7 @@
 { 'command': 'cont' }
 
 ##
-# @x-exit-preconfig:
+# @exit-preconfig:
 #
 # Exit from "preconfig" state
 #
@@ -191,11 +191,11 @@
 #
 # Example:
 #
-# -> { "execute": "x-exit-preconfig" }
+# -> { "execute": "exit-preconfig" }
 # <- { "return": {} }
 #
 ##
-{ 'command': 'x-exit-preconfig', 'allow-preconfig': true }
+{ 'command': 'exit-preconfig', 'allow-preconfig': true }
 
 ##
 # @human-monitor-command:
diff --git a/qemu-options.hx b/qemu-options.hx
index 5f375bbfa6..d27778869b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3936,10 +3936,9 @@ SRST
 ``--preconfig``
 Pause QEMU for interactive configuration before the machine is
 created, which allows querying and configuring properties that will
-affect machine initialization. Use QMP command 'x-exit-preconfig' to
+affect machine initialization. Use QMP command 'exit-preconfig' to
 exit the preconfig state and move to the next state (i.e. run guest
-if -S isn't used or pause the second time if -S is used). This
-option is experimental.
+if -S isn't used or pause the second time if -S is used).
 ERST
 
 DEF("S", 0, QEMU_OPTION_S, \
diff --git a/softmmu/vl.c b/softmmu/vl.c
index af0c4cbd99..09a9ec06f9 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2730,7 +2730,7 @@ static void qemu_machine_creation_done(void)
 }
 }
 
-void qmp_x_exit_preconfig(Error **errp)
+void qmp_exit_preconfig(Error **errp)
 {
 if (phase_check(PHASE_MACHINE_INITIALIZED)) {
 error_setg(errp, "The command is permitted only before machine 
initialization");
@@ -3770,7 +3770,7 @@ void qemu_init(int argc, char **argv, char **envp)
 }
 
 if (!preconfig_requested) {
-qmp_x_exit_preconfig(_fatal);
+qmp_exit_preconfig(_fatal);
 }
 qemu_init_displays();
 accel_setup_post(current_machine);
diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c
index 90bf68a5b3..bea95b1832 100644
--- a/tests/qtest/numa-test.c
+++ b/tests/qtest/numa-test.c
@@ -285,7 +285,7 @@ static void pc_dynamic_cpu_cfg(const void *data)
 " 'arguments': { 'type': 'cpu', 'node-id': 1, 'socket-id': 0 } }")));
 
 /* let machine initialization to complete and run */
-g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'x-exit-preconfig' 
}")));
+g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'exit-preconfig' 
}")));
 qtest_qmp_eventwait(qs, "RESUME");
 
 /* check that CPUs are mapped as expected */
@@ -443,7 +443,7 @@ static void pc_hmat_build_cfg(const void *data)
 
 /* let machine initialization to complete and run */
 g_assert_false(qmp_rsp_is_err(qtest_qmp(qs,
-"{ 'execute': 'x-exit-preconfig' }")));
+"{ 'execute': 'exit-preconfig' }")));
 qtest_qmp_eventwait(qs, "RESUME");
 
 qtest_quit(qs);
@@ -482,7 +482,7 @@ static void pc_hmat_off_cfg(const void *data)
 
 /* let machine initialization to complete and run */
 g_assert_false(qmp_rsp_is_err(qtest_qmp(qs,
-"{ 'execute': 'x-exit-preconfig' }")));
+"{ 'execute': 'exit-preconfig' }")));
 qtest_qmp_eventwait(qs, "RESUME");
 
 qtest_quit(qs);
@@ -533,7 +533,7 @@ static void pc_hmat_erange_cfg(const void *data)
 
 /* let machine initialization to complete and run */
 g_assert_false(qmp_rsp_is_err(qtest_qmp(qs,
-

[PATCH 0/2] Two chardev with fdset fixes

2021-08-17 Thread Michal Privoznik

I was working on libvirt, trying to make it pass a pre-opened FD for a
chardev's logfile. I've spent non-negligible amount of time trying to
figure out why I was getting EACCESS error when QEMU definitely was able
to access the FD. Well, found the problem and here is my attempt to save
future me from figuring it out again.

NB, I'm not fully convinced that EBADFD is the best value, but anything
is better than EACCESS.

Michal Privoznik (2):
  chardev: Propagate error from logfile opening
  monitor: Report EBADFD if fdset contains invalid FD

 chardev/char.c | 7 ++-
 monitor/misc.c | 2 +-
 2 files changed, 3 insertions(+), 6 deletions(-)

-- 
2.31.1

[PATCH 1/2] chardev: Propagate error from logfile opening

2021-08-17 Thread Michal Privoznik

If a chardev has a logfile the file is opened using
qemu_open_old() which does the job, but since @errp is not
propagated into qemu_open_internal() we lose much more accurate
error and just report "Unable to open logfile $errno".  When
using plain files, it's probably okay as nothing complex is
happening behind the curtains. But the problem becomes more
prominent when passing an "/dev/fdset/XXX" path since much more
needs to be done.

The fix is to use qemu_create() which passes @errp further down.

Signed-off-by: Michal Privoznik 
---
 chardev/char.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/chardev/char.c b/chardev/char.c
index 4595a8d430..0169d8dde4 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -241,18 +241,15 @@ static void qemu_char_open(Chardev *chr, ChardevBackend 
*backend,
 ChardevCommon *common = backend ? backend->u.null.data : NULL;
 
 if (common && common->has_logfile) {
-int flags = O_WRONLY | O_CREAT;
+int flags = O_WRONLY;
 if (common->has_logappend &&
 common->logappend) {
 flags |= O_APPEND;
 } else {
 flags |= O_TRUNC;
 }
-chr->logfd = qemu_open_old(common->logfile, flags, 0666);
+chr->logfd = qemu_create(common->logfile, flags, 0666, errp);
 if (chr->logfd < 0) {
-error_setg_errno(errp, errno,
- "Unable to open logfile %s",
- common->logfile);
 return;
 }
 }
-- 
2.31.1

[PATCH 2/2] monitor: Report EBADFD if fdset contains invalid FD

2021-08-17 Thread Michal Privoznik

When opening a path that starts with "/dev/fdset/" the control
jumps into qemu_parse_fdset() and then into
monitor_fdset_dup_fd_add(). In here, corresponding fdset is found
and then all FDs from the set are iterated over trying to find an
FD that matches expected access mode. For instance, if caller
wants O_WRONLY then the FD set has to contain an O_WRONLY FD.

If no such FD is found then errno is set to EACCES which results
in very misleading error messages, for instance:

  Could not dup FD for /dev/fdset/3 flags 441: Permission denied

There is no permission issue, the problem is that there was no FD
within given fdset that was in expected access mode. Therefore,
let's set errno to EBADFD, which gives us somewhat better
error messages:

  Could not dup FD for /dev/fdset/3 flags 441: File descriptor in bad state

Signed-off-by: Michal Privoznik 
---
 monitor/misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monitor/misc.c b/monitor/misc.c
index ffe7966870..a0eda0d574 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -1347,7 +1347,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 }
 
 if (fd == -1) {
-errno = EACCES;
+errno = EBADFD;
 return -1;
 }
 
-- 
2.31.1

[PATCH 2/2] numa: Parse initiator= attribute before cpus= attribute

2021-07-07 Thread Michal Privoznik

When parsing cpus= attribute of -numa object couple of checks
is performed, such as correct initiator setting (see the if()
statement at the end of for() loop in
machine_set_cpu_numa_node()).

However, with the current code cpus= attribute is parsed before
initiator= attribute and thus the check may fail even though it
is not obvious why. But since parsing the initiator= attribute
does not depend on the cpus= attribute we can swap the order of
the two.

It's fairly easy to reproduce with the following command line
(snippet of an actual cmd line):

  -smp 4,sockets=4,cores=1,threads=1 \
  -object 
'{"qom-type":"memory-backend-ram","id":"ram-node0","size":2147483648}' \
  -numa node,nodeid=0,cpus=0-1,initiator=0,memdev=ram-node0 \
  -object 
'{"qom-type":"memory-backend-ram","id":"ram-node1","size":2147483648}' \
  -numa node,nodeid=1,cpus=2-3,initiator=1,memdev=ram-node1 \
  -numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5
 \
  -numa 
hmat-lb,initiator=0,target=0,hierarchy=first-level,data-type=access-latency,latency=10
 \
  -numa 
hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=5
 \
  -numa 
hmat-lb,initiator=1,target=1,hierarchy=first-level,data-type=access-latency,latency=10
 \
  -numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=204800K
 \
  -numa 
hmat-lb,initiator=0,target=0,hierarchy=first-level,data-type=access-bandwidth,bandwidth=208896K
 \
  -numa 
hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=204800K
 \
  -numa 
hmat-lb,initiator=1,target=1,hierarchy=first-level,data-type=access-bandwidth,bandwidth=208896K
 \
  -numa 
hmat-cache,node-id=0,size=10K,level=1,associativity=direct,policy=write-back,line=8
 \
  -numa 
hmat-cache,node-id=1,size=10K,level=1,associativity=direct,policy=write-back,line=8
 \

Signed-off-by: Michal Privoznik 
---
 hw/core/numa.c | 45 +++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 1058d3697b..510d096a88 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -88,6 +88,29 @@ static void parse_numa_node(MachineState *ms, 
NumaNodeOptions *node,
 return;
 }
 
+/*
+ * If not set the initiator, set it to MAX_NODES. And if
+ * HMAT is enabled and this node has no cpus, QEMU will raise error.
+ */
+numa_info[nodenr].initiator = MAX_NODES;
+if (node->has_initiator) {
+if (!ms->numa_state->hmat_enabled) {
+error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+   "(HMAT) is disabled, enable it with -machine hmat=on "
+   "before using any of hmat specific options");
+return;
+}
+
+if (node->initiator >= MAX_NODES) {
+error_report("The initiator id %" PRIu16 " expects an integer "
+ "between 0 and %d", node->initiator,
+ MAX_NODES - 1);
+return;
+}
+
+numa_info[nodenr].initiator = node->initiator;
+}
+
 for (cpus = node->cpus; cpus; cpus = cpus->next) {
 CpuInstanceProperties props;
 if (cpus->value >= max_cpus) {
@@ -142,28 +165,6 @@ static void parse_numa_node(MachineState *ms, 
NumaNodeOptions *node,
 numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
 }
 
-/*
- * If not set the initiator, set it to MAX_NODES. And if
- * HMAT is enabled and this node has no cpus, QEMU will raise error.
- */
-numa_info[nodenr].initiator = MAX_NODES;
-if (node->has_initiator) {
-if (!ms->numa_state->hmat_enabled) {
-error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
-   "(HMAT) is disabled, enable it with -machine hmat=on "
-   "before using any of hmat specific options");
-return;
-}
-
-if (node->initiator >= MAX_NODES) {
-error_report("The initiator id %" PRIu16 " expects an integer "
- "between 0 and %d", node->initiator,
- MAX_NODES - 1);
-return;
-}
-
-numa_info[nodenr].initiator = node->initiator;
-}
 numa_info[nodenr].present = true;
 max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
 ms->numa_state->num_nodes++;
-- 
2.31.1

[PATCH 1/2] numa: Report expected initiator

2021-07-07 Thread Michal Privoznik

When setting up NUMA with HMAT enabled there's a check performed
in machine_set_cpu_numa_node() that reports an error when a NUMA
node has a CPU but the node's initiator is not itself. The error
message reported contains only the expected value and not the
actual value (which is different because an error is being
reported). Report both values in the error message.

Signed-off-by: Michal Privoznik 
---
 hw/core/machine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 57c18f909a..6f59fb0b7f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -728,7 +728,8 @@ void machine_set_cpu_numa_node(MachineState *machine,
 if ((numa_info[props->node_id].initiator < MAX_NODES) &&
 (props->node_id != numa_info[props->node_id].initiator)) {
 error_setg(errp, "The initiator of CPU NUMA node %" PRId64
-" should be itself", props->node_id);
+   " should be itself (got %" PRIu16 ")",
+   props->node_id, 
numa_info[props->node_id].initiator);
 return;
 }
 numa_info[props->node_id].has_cpu = true;
-- 
2.31.1

[PATCH 0/2] numa: Parse initiator= attribute before cpus= attribute

2021-07-07 Thread Michal Privoznik

See 2/2 for explanation. The first patch is just cosmetics.

Michal Privoznik (2):
  numa: Report expected initiator
  numa: Parse initiator= attribute before cpus= attribute

 hw/core/machine.c |  3 ++-
 hw/core/numa.c| 45 +++--
 2 files changed, 25 insertions(+), 23 deletions(-)

-- 
2.31.1

Re: firmware selection for SEV-ES

2021-04-23 Thread Michal Privoznik


On 4/22/21 4:13 PM, Laszlo Ersek wrote:

On 04/21/21 13:51, Pavel Hrdina wrote:

On Wed, Apr 21, 2021 at 11:54:24AM +0200, Laszlo Ersek wrote:

Hi Brijesh, Tom,

in QEMU's "docs/interop/firmware.json", the @FirmwareFeature enumeration
has a constant called @amd-sev. We should introduce an @amd-sev-es
constant as well, minimally for the following reason:

AMD document #56421 ("SEV-ES Guest-Hypervisor Communication Block
Standardization") revision 1.40 says in "4.6 System Management Mode
(SMM)" that "SMM will not be supported in this version of the
specification". This is reflected in OVMF, so an OVMF binary that's
supposed to run in a SEV-ES guest must be built without "-D
SMM_REQUIRE". (As a consequence, such a binary should be built also
without "-D SECURE_BOOT_ENABLE".)

At the level of "docs/interop/firmware.json", this means that management
applications should be enabled to look for the @amd-sev-es feature (and
it also means, for OS distributors, that any firmware descriptor
exposing @amd-sev-es will currently have to lack all three of:
@requires-smm, @secure-boot, @enrolled-keys).

I have three questions:


(1) According to
, SEV-ES is
explicitly requested in the domain XML via setting bit#2 in the "policy"
element.

Can this setting be used by libvirt to look for such a firmware
descriptor that exposes @amd-sev-es?


Hi Laszlo and all,

Currently we use only  when selecting
firmware to make sure that it supports @amd-sev. Since we already have a
place in the VM XML where users can configure amd-sev-as we can use that
information when selecting correct firmware that should be used for the
VM.


Thanks!

Should we file a libvirtd Feature Request (where?) for recognizing the
@amd-sev-es feature flag?


Yes, we should. We can use RedHat bugzilla for that. Laszlo - do you 
want to do it yourself or shall I help you with that?


Michal

Re: [PATCH RESEND] hostmem: Don't report pmem attribute if unsupported

2021-02-16 Thread Michal Privoznik


On 2/17/21 12:07 AM, John Snow wrote:

On 2/16/21 5:23 PM, Eduardo Habkost wrote:

On Tue, Jan 26, 2021 at 08:48:25AM +0100, Michal Privoznik wrote:

When management applications (like Libvirt) want to check whether
memory-backend-file.pmem is supported they can list object
properties using 'qom-list-properties'. However, 'pmem' is
declared always (and thus reported always) and only at runtime
QEMU errors out if it was built without libpmem (and thus can not
guarantee write persistence). This is suboptimal since we have
ability to declare attributes at compile time.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1915216
Signed-off-by: Michal Privoznik 
Reviewed-by: Daniel Henrique Barboza 
Tested-by: Daniel Henrique Barboza 
Reviewed-by: Igor Mammedov 


I'm not a fan of making availability of properties conditional
(even if at compile time), but if this helps libvirt I guess it
makes sense.



Compile time might be OK, but if we want to describe everything via QAPI 
eventually, we just need to be able to describe that compile-time 
requisite appropriately.


Conditional at run-time is I think the thing that absolutely has to go 
wherever it surfaces.


I'm open for discussion. How do you think libvirt (or any other mgmt 
tool/user) should inspect whether given feature is available?
What libvirt currently does it issues 'qom-list-properties' with 
'typename=memory-backend-file' and inspects whether pmem attribute is 
available. Is 'qom-list' preferred?






CCing John, who has been thinking a lot about these questions.



Thanks for the heads up. Good reminder that libvirt uses the existence 
of properties as a bellwether for feature support. I don't think I like 
that idea, but I like breaking libvirt even less.


That was at hand solution. If libvirt's not doing it right, I'm happy to 
make things better.


Michal

Re: [PATCH RESEND] hostmem: Don't report pmem attribute if unsupported

2021-02-15 Thread Michal Privoznik


On 1/26/21 8:48 AM, Michal Privoznik wrote:

When management applications (like Libvirt) want to check whether
memory-backend-file.pmem is supported they can list object
properties using 'qom-list-properties'. However, 'pmem' is
declared always (and thus reported always) and only at runtime
QEMU errors out if it was built without libpmem (and thus can not
guarantee write persistence). This is suboptimal since we have
ability to declare attributes at compile time.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1915216
Signed-off-by: Michal Privoznik 
Reviewed-by: Daniel Henrique Barboza 
Tested-by: Daniel Henrique Barboza 
Reviewed-by: Igor Mammedov 
---

This is just a resend of a patch I've sent earlier with Reviewed-by and
Tested-by added:

https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg04558.html

  backends/hostmem-file.c | 13 -
  1 file changed, 4 insertions(+), 9 deletions(-)


Polite ping, please.

Michal

Re: [PATCH v3] machine: add missing doc for memory-backend option

2021-02-04 Thread Michal Privoznik


On 1/21/21 5:15 PM, Igor Mammedov wrote:
>

Ping, please? Is there anything I can help with to get this merged? 
Libvirt's migration is broken without this patch [1] and thus I'd like 
to have this merged sooner rather than later.


1: There's a libvirt patch required but depends on this one.

Thanks,
Michal

Re: [PATCH v3] machine: add missing doc for memory-backend option

2021-01-27 Thread Michal Privoznik


On 1/27/21 6:56 PM, Daniel P. Berrangé wrote:

On Wed, Jan 27, 2021 at 04:35:22PM +0100, Igor Mammedov wrote:

On Wed, 27 Jan 2021 15:24:26 +0100
Michal Privoznik  wrote:


On 1/27/21 11:54 AM, Daniel P. Berrangé wrote:

On Wed, Jan 27, 2021 at 10:45:11AM +, Daniel P. Berrangé wrote:

On Thu, Jan 21, 2021 at 11:15:04AM -0500, Igor Mammedov wrote:





How does a mgmt app know which machine types need to use this
option ? The machine type names are opaque strings, and apps
must not attempt to parse or interpret the version number
inside the machine type name, as they can be changed by
distros.  IOW, saying to use it for machine types 4.0 and
older isn't a valid usage strategy IMHO.

it's possible (but no necessary) to use knob with new machine types
(defaults for these match suggested property value).


IIUC, this means that setting the property has no impact on
migration ABI for new machine types > 4.0


Limiting knob usage to 4.0 and older would allow us to drop
without extra efforts once 4.0 is deprecated/removed.


...so, even if we set the property unconditionally for *all*
machine types, then we can still remove it in future, becuase
its removal won't affect ABI of the 5.x, 6.x machine types.


Alright, so after all you agree with proposed patch? I'm a bit confused.

At any rate, we have patches for nearly all available options (if not 
all), and I'd like to get things moving because both qemu and libvirt 
were already released meanwhile and our upstreams are broken. Not to 
mention users that started domain with libvirt 7.0.0 - they will be 
unable to migrate to libvirt 7.1.0+ (or whatever version we fix this), 
because libvirt 7.0.0 did not set the x-something-something attribute.


Michal

Re: [PATCH v3] machine: add missing doc for memory-backend option

2021-01-27 Thread Michal Privoznik


On 1/27/21 4:35 PM, Igor Mammedov wrote:

On Wed, 27 Jan 2021 15:24:26 +0100
Michal Privoznik  wrote:


On 1/27/21 11:54 AM, Daniel P. Berrangé wrote:

On Wed, Jan 27, 2021 at 10:45:11AM +, Daniel P. Berrangé wrote:

On Thu, Jan 21, 2021 at 11:15:04AM -0500, Igor Mammedov wrote:





How does a mgmt app know which machine types need to use this
option ? The machine type names are opaque strings, and apps
must not attempt to parse or interpret the version number
inside the machine type name, as they can be changed by
distros.  IOW, saying to use it for machine types 4.0 and
older isn't a valid usage strategy IMHO.

it's possible (but no necessary) to use knob with new machine types
(defaults for these match suggested property value).
Limiting knob usage to 4.0 and older would allow us to drop
without extra efforts once 4.0 is deprecated/removed.


Problem here is that libvirt treats machine type as an opaque string. 
Therefore, as could be seen in my patch for libvirt, the property is 
disabled for all started VMs, regardless of machine type:


https://www.redhat.com/archives/libvir-list/2021-January/msg00686.html

So it can't really go away ever, can it?




Looking at the libvirt patch, we do indeed use his property
unconditionally for all machine types, precisely because parsing
version numbers from the machine type is not allowed.

https://www.redhat.com/archives/libvir-list/2021-January/msg00633.html

So this doc is telling apps to do something that isn't viable


The other approach that I was suggesting was, that QEMU stops reporting
'default-ram-id' for affected machine types. The way the switch from '-m
XMB' to memory-backend-* was implemented in libvirt is that if libvirt
sees 'default-ram-id' attribute for given machine type it uses
memory-backend-* otherwise it falls back to -m.

Since we know which machine types are "broken", we can stop reporting
the attribute and thus stop tickling this bug. I agree that it puts more
burden on distro maintainers to backport the change, but I think it's
acceptable risk.


default-ram-id is already exposed in wild including old machine types
starting from 5.2


It is, but according to qapi/machine.json it is optional. Mgmt apps have 
to be able to deal with it missing.




if libvirt will take care this one quirk, then I guess we can
do as suggested. I can post an additional patch to this effect if there
is agreement to go this route.


The beauty of this solution is that libvirt wouldn't need to do anything 
:-)  As I said earlier, if no default-ram-id is found then libvirt falls 
back to '-m X'.


I've cooked a dirty patch that works in my testing:

diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index ae0c4a..2214782d72 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -238,8 +238,33 @@ MachineInfoList *qmp_query_machines(Error **errp)
 info->has_default_cpu_type = true;
 }
 if (mc->default_ram_id) {
-info->default_ram_id = g_strdup(mc->default_ram_id);
-info->has_default_ram_id = true;
+int i;
+bool broken = false;
+
+/* Default RAM ID is broken if 
x-use-canonical-path-for-ramblock-id
+ * property of memory-backend is on. That's why it's 
disabled in
+ * create_default_memdev(). However, some machine types 
turn it on

+ * for backwards compatibility. */
+for (i = 0; i < mc->compat_props->len; i++) {
+GlobalProperty *p = g_ptr_array_index(mc->compat_props, i);
+
+if (strcmp(p->driver, TYPE_MEMORY_BACKEND_FILE) != 0)
+continue;
+
+if (strcmp(p->property, 
"x-use-canonical-path-for-ramblock-id") != 0)

+continue;
+
+if (strcmp(p->value, "true") != 0)
+continue;
+
+broken = true;
+break;
+}
+
+if (!broken) {
+info->default_ram_id = g_strdup(mc->default_ram_id);
+info->has_default_ram_id = true;
+}
 }

 QAPI_LIST_PREPEND(mach_list, info);


Michal

Re: [PATCH v3] machine: add missing doc for memory-backend option

2021-01-27 Thread Michal Privoznik


On 1/27/21 11:54 AM, Daniel P. Berrangé wrote:

On Wed, Jan 27, 2021 at 10:45:11AM +, Daniel P. Berrangé wrote:

On Thu, Jan 21, 2021 at 11:15:04AM -0500, Igor Mammedov wrote:





How does a mgmt app know which machine types need to use this
option ? The machine type names are opaque strings, and apps
must not attempt to parse or interpret the version number
inside the machine type name, as they can be changed by
distros.  IOW, saying to use it for machine types 4.0 and
older isn't a valid usage strategy IMHO.


Looking at the libvirt patch, we do indeed use his property
unconditionally for all machine types, precisely because parsing
version numbers from the machine type is not allowed.

https://www.redhat.com/archives/libvir-list/2021-January/msg00633.html

So this doc is telling apps to do something that isn't viable


The other approach that I was suggesting was, that QEMU stops reporting 
'default-ram-id' for affected machine types. The way the switch from '-m 
XMB' to memory-backend-* was implemented in libvirt is that if libvirt 
sees 'default-ram-id' attribute for given machine type it uses 
memory-backend-* otherwise it falls back to -m.


Since we know which machine types are "broken", we can stop reporting 
the attribute and thus stop tickling this bug. I agree that it puts more 
burden on distro maintainers to backport the change, but I think it's 
acceptable risk.


Michal

Re: [PATCH v3] machine: add missing doc for memory-backend option

2021-01-27 Thread Michal Privoznik


On 1/21/21 5:15 PM, Igor Mammedov wrote:

Add documentation for '-machine memory-backend' CLI option and
how to use it.

And document that x-use-canonical-path-for-ramblock-id,
is considered to be stable to make sure it won't go away by accident.

x- was intended for unstable/iternal properties, and not supposed to
be stable option. However it's too late to rename (drop x-)
it as it would mean that users will have to mantain both
x-use-canonical-path-for-ramblock-id (for QEMU 5.0-5.2) versions
and prefix-less for later versions.

Signed-off-by: Igor Mammedov 
---
v2:
  - add doc that x-use-canonical-path-for-ramblock-id is considered stable,
  (Peter Krempa )
v3:
  - 
s/x-use-canonical-path-for-ramblock-id=on/x-use-canonical-path-for-ramblock-id=off/
  (Michal Privoznik )
  - add to commit message why x- prefix is preserved
  - drop clause about x-use-canonical-path-for-ramblock-id being stable
from help section, but keep it in code comment above
x-use-canonical-path-for-ramblock-id property. It's sufficient
to prevent option being changed/removed by accident.
  (Peter Maydell )
---
  backends/hostmem.c | 10 ++
  qemu-options.hx| 26 +-
  2 files changed, 35 insertions(+), 1 deletion(-)


Reviewed-by: Michal Privoznik 

Michal

[PATCH RESEND] hostmem: Don't report pmem attribute if unsupported

2021-01-25 Thread Michal Privoznik

When management applications (like Libvirt) want to check whether
memory-backend-file.pmem is supported they can list object
properties using 'qom-list-properties'. However, 'pmem' is
declared always (and thus reported always) and only at runtime
QEMU errors out if it was built without libpmem (and thus can not
guarantee write persistence). This is suboptimal since we have
ability to declare attributes at compile time.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1915216
Signed-off-by: Michal Privoznik 
Reviewed-by: Daniel Henrique Barboza 
Tested-by: Daniel Henrique Barboza 
Reviewed-by: Igor Mammedov 
---

This is just a resend of a patch I've sent earlier with Reviewed-by and
Tested-by added:

https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg04558.html

 backends/hostmem-file.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 40e1e5b3e3..7e30eb5985 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -123,6 +123,7 @@ static void file_memory_backend_set_align(Object *o, 
Visitor *v,
 fb->align = val;
 }
 
+#ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
 return MEMORY_BACKEND_FILE(o)->is_pmem;
@@ -139,17 +140,9 @@ static void file_memory_backend_set_pmem(Object *o, bool 
value, Error **errp)
 return;
 }
 
-#ifndef CONFIG_LIBPMEM
-if (value) {
-error_setg(errp, "Lack of libpmem support while setting the 'pmem=on'"
-   " of %s. We can't ensure data persistence.",
-   object_get_typename(o));
-return;
-}
-#endif
-
 fb->is_pmem = value;
 }
+#endif /* CONFIG_LIBPMEM */
 
 static void file_backend_unparent(Object *obj)
 {
@@ -180,8 +173,10 @@ file_backend_class_init(ObjectClass *oc, void *data)
 file_memory_backend_get_align,
 file_memory_backend_set_align,
 NULL, NULL);
+#ifdef CONFIG_LIBPMEM
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem);
+#endif
 }
 
 static void file_backend_instance_finalize(Object *o)
-- 
2.26.2

[PATCH] hostmem: Don't report pmem attribute if unsupported

2021-01-19 Thread Michal Privoznik

When management applications (like Libvirt) want to check whether
memory-backend-file.pmem is supported they can list object
properties using 'qom-list-properties'. However, 'pmem' is
declared always (and thus reported always) and only at runtime
QEMU errors out if it was built without libpmem (and thus can not
guarantee write persistence). This is suboptimal since we have
ability to declare attributes at compile time.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1915216
Signed-off-by: Michal Privoznik 
---
 backends/hostmem-file.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 40e1e5b3e3..7e30eb5985 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -123,6 +123,7 @@ static void file_memory_backend_set_align(Object *o, 
Visitor *v,
 fb->align = val;
 }
 
+#ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
 return MEMORY_BACKEND_FILE(o)->is_pmem;
@@ -139,17 +140,9 @@ static void file_memory_backend_set_pmem(Object *o, bool 
value, Error **errp)
 return;
 }
 
-#ifndef CONFIG_LIBPMEM
-if (value) {
-error_setg(errp, "Lack of libpmem support while setting the 'pmem=on'"
-   " of %s. We can't ensure data persistence.",
-   object_get_typename(o));
-return;
-}
-#endif
-
 fb->is_pmem = value;
 }
+#endif /* CONFIG_LIBPMEM */
 
 static void file_backend_unparent(Object *obj)
 {
@@ -180,8 +173,10 @@ file_backend_class_init(ObjectClass *oc, void *data)
 file_memory_backend_get_align,
 file_memory_backend_set_align,
 NULL, NULL);
+#ifdef CONFIG_LIBPMEM
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem);
+#endif
 }
 
 static void file_backend_instance_finalize(Object *o)
-- 
2.26.2

Re: [PATCH v2] machine: add missing doc for memory-backend option

2021-01-15 Thread Michal Privoznik


On 1/15/21 12:46 AM, Igor Mammedov wrote:

Add documentation for '-machine memory-backend' CLI option and
how to use it.

And document that x-use-canonical-path-for-ramblock-id,
is considered to be stable to make sure it won't go away by accident.

Signed-off-by: Igor Mammedov 
---
v2:
  - add doc that x-use-canonical-path-for-ramblock-id is considered stable,
(Peter Krempa )
---
  backends/hostmem.c | 10 ++
  qemu-options.hx| 28 +++-
  2 files changed, 37 insertions(+), 1 deletion(-)





@@ -96,6 +97,31 @@ SRST
  ``hmat=on|off``
  Enables or disables ACPI Heterogeneous Memory Attribute Table
  (HMAT) support. The default is off.
+
+ ``memory-backend='id'``
+An alternative to legacy ``-mem-path`` and ``mem-prealloc`` options.
+Allows to use a memory backend as main RAM.
+
+For example:
+::
+-object 
memory-backend-file,id=pc.ram,size=512M,mem-path=/hugetlbfs,prealloc=on,share=on
+-machine memory-backend=pc.ram
+-m 512M
+
+Migration compatibility note:
+a) as backend id one shall use value of 'default-ram-id', advertised by
+machine type (available via ``query-machines`` QMP command)
+b) for machine types 4.0 and older, user shall
+use ``x-use-canonical-path-for-ramblock-id=on`` backend option
+(this option must be considered stable, as if it didn't have the 'x-'
+prefix including deprecation period, as long as 4.0 and older machine
+types exists),
+if migration to/from old QEMU (<5.0) is expected.
+For example:
+::
+-object 
memory-backend-ram,id=pc.ram,size=512M,x-use-canonical-path-for-ramblock-id=on
+-machine memory-backend=pc.ram
+-m 512M


Igor, this doesn't correspond with your comment in bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=1836043#c31

In fact, we had to turn the attribute OFF so that canonical path is not 
used. Isn't ON the default state anyway?


Michal

Re: [PATCH] machine: add missing doc for memory-backend option

2021-01-11 Thread Michal Privoznik


On 1/11/21 11:27 PM, Igor Mammedov wrote:

Add documentation for '-machine memory-backend' CLI option and
how to use it.

PS:
While at it add a comment to x-use-canonical-path-for-ramblock-id,
to make sure it won't go away by accident.

Signed-off-by: Igor Mammedov 
---
  backends/hostmem.c |  8 
  qemu-options.hx| 25 -
  2 files changed, 32 insertions(+), 1 deletion(-)


Reviewed-by: Michal Privoznik 

Michal

Re: [PATCH 0/2] qemu-ga: add ssh-{add,remove}-authorized-keys

2020-10-14 Thread Michal Privoznik


On 10/13/20 10:25 PM, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

Hi,

Add two new commands to help modify ~/.ssh/authorized_keys.


Apart from Philippe's comments, this path is configurable in 
sshd.config. But I doubt we should care as ssh-copy-id doesn't care.




Although it's possible already to modify the authorized_keys files via
file-{read,write} or exec, the commands are often denied by default, and the
logic is left to the client. Let's add specific commands for this job.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1885332

Marc-André Lureau (2):
   glib-compat: add g_unix_get_passwd_entry_qemu()
   qga: add ssh-{add,remove}-authorized-keys

  include/glib-compat.h|  24 +++
  qga/commands-posix-ssh.c | 394 +++
  qga/commands-win32.c |  10 +
  qga/meson.build  |  17 +-
  qga/qapi-schema.json |  32 
  5 files changed, 476 insertions(+), 1 deletion(-)
  create mode 100644 qga/commands-posix-ssh.c



Reviewed-by: Michal Privoznik 

Michal

Re: [PULL v8 00/86] Misc QEMU patches for 2020-09-24

2020-10-07 Thread Michal Privoznik


On 10/2/20 8:24 PM, Eduardo Habkost wrote:

On Fri, Oct 02, 2020 at 06:27:35PM +0200, Paolo Bonzini wrote:

On 02/10/20 17:58, Michal Prívozník wrote:




cd442a45db60a1a72fcf980c24bd1227f13f8a87 is the first bad commit

Sorry for noticing this earlier, but is this known? The build starts
failing for me after this commit:

/usr/bin/sphinx-build -Dversion=5.1.50 -Drelease= -W
-Ddepfile=docs/devel.d -Ddepfile_stamp=docs/devel.stamp -b html -d
/home/zippy/work/qemu/qemu.git/build/docs/devel.p
/home/zippy/work/qemu/qemu.git/docs/devel
/home/zippy/work/qemu/qemu.git/build/docs/devel
Running Sphinx v3.2.1
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 20 source files that are out of date
updating environment: [new config] 20 added, 0 changed, 0 removed
reading sources... [100%] testing


No, this is new.  It works with older versions of Sphinx (I have 2.2.2
despite being on Fedora 32 which is pretty recent).

For now Sphinx 3 is not supported by kerneldoc, we probably should apply
a patch like

https://www.spinics.net/lists/linux-doc/msg83277.html


We already have Sphinx 3.x hacks inside our fork of kernel-doc,
see commit 152d1967f650 ("kernel-doc: Use c:struct for Sphinx 3.0
and later").

If we want to keep deviating from upstream kernel-doc, the
following patch seems to work.  Do we want to?

Signed-off-by: Eduardo Habkost 
---
diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index 40ad782e342..03b49380426 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -838,6 +838,13 @@ sub output_function_rst(%) {
$lineprefix = "";
output_highlight_rst($args{'purpose'});
$start = "\n\n**Syntax**\n\n  ``";
+} elsif ($args{'functiontype'} eq "") {
+   # this is a macro, Sphinx 3.x requires c:macro::
+   if ((split(/\./, $sphinx_version))[0] >= 3) {
+   print ".. c:macro:: ";
+   } else {
+   print ".. c:function:: ";
+   }
  } else {
print ".. c:function:: ";
  }


Paolo





I can confirm that this fixes the build for me.

Michal

Re: [PATCH v2] cphp: remove deprecated cpu-add command(s)

2020-09-14 Thread Michal Privoznik


On 9/14/20 9:46 AM, Igor Mammedov wrote:

theses were deprecated since 4.0, remove both HMP and QMP variants.

Users should use device_add command instead. To get list of
possible CPUs and options, use 'info hotpluggable-cpus' HMP
or query-hotpluggable-cpus QMP command.

Signed-off-by: Igor Mammedov 
Reviewed-by: Thomas Huth 
Acked-by: Dr. David Alan Gilbert 
---
  include/hw/boards.h |   1 -
  include/hw/i386/pc.h|   1 -
  include/monitor/hmp.h   |   1 -
  docs/system/deprecated.rst  |  25 +
  hmp-commands.hx |  15 --
  hw/core/machine-hmp-cmds.c  |  12 -
  hw/core/machine-qmp-cmds.c  |  12 -
  hw/i386/pc.c|  27 --
  hw/i386/pc_piix.c   |   1 -
  hw/s390x/s390-virtio-ccw.c  |  12 -
  qapi/machine.json   |  24 -
  tests/qtest/cpu-plug-test.c | 100 
  tests/qtest/test-hmp.c  |   1 -
  13 files changed, 21 insertions(+), 211 deletions(-)


Thanks to Peter Libvirt uses device_add instead cpu_add whenever 
possible. Hence this is okay from Libvirt's POV.


Reviewed-by: Michal Privoznik 

Michal

Re: [PATCH v2 0/2] qmp: Expose MachineClass::default_ram_id

2020-07-28 Thread Michal Privoznik


On 7/27/20 8:41 PM, Eduardo Habkost wrote:

Hi Michal,

It looks like this has fallen through the cracks, my apologies.

I'm queueing this for 5.2.  I assume this is the latest version,
correct?



Yes. No worries, I forgot about it too :-) If I remembered, I would have 
pinged.


Michal

Re: [PATCH v3 0/2] qga: Ditch g_get_host_name()

2020-07-10 Thread Michal Privoznik


On 6/22/20 8:19 PM, Michal Privoznik wrote:

v3 of:

https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg06913.html

diff to v2:
- don't leak @hostname in util/oslib-posix.c:qemu_get_host_name()
- document why we are allocating one byte more than needed
- switch to g_new0() from g_malloc0().

Michal Privoznik (2):
   util: Introduce qemu_get_host_name()
   qga: Use qemu_get_host_name() instead of g_get_host_name()

  include/qemu/osdep.h | 10 ++
  qga/commands.c   | 17 +
  util/oslib-posix.c   | 35 +++
  util/oslib-win32.c   | 13 +
  4 files changed, 71 insertions(+), 4 deletions(-)



Ping? How can I get these merged please?

Michal

[PATCH v3 0/2] qga: Ditch g_get_host_name()

2020-06-22 Thread Michal Privoznik

v3 of:

https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg06913.html

diff to v2:
- don't leak @hostname in util/oslib-posix.c:qemu_get_host_name()
- document why we are allocating one byte more than needed
- switch to g_new0() from g_malloc0().

Michal Privoznik (2):
  util: Introduce qemu_get_host_name()
  qga: Use qemu_get_host_name() instead of g_get_host_name()

 include/qemu/osdep.h | 10 ++
 qga/commands.c   | 17 +
 util/oslib-posix.c   | 35 +++
 util/oslib-win32.c   | 13 +
 4 files changed, 71 insertions(+), 4 deletions(-)

-- 
2.26.2

[PATCH v3 2/2] qga: Use qemu_get_host_name() instead of g_get_host_name()

2020-06-22 Thread Michal Privoznik

Problem with g_get_host_name() is that on the first call it saves
the hostname into a global variable and from then on, every
subsequent call returns the saved hostname. Even if the hostname
changes. This doesn't play nicely with guest agent, because if
the hostname is acquired before the guest is set up (e.g. on the
first boot, or before DHCP) we will report old, invalid hostname.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1845127

Signed-off-by: Michal Privoznik 
---
 qga/commands.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/qga/commands.c b/qga/commands.c
index efc8b90281..d3fec807c1 100644
--- a/qga/commands.c
+++ b/qga/commands.c
@@ -515,11 +515,20 @@ int ga_parse_whence(GuestFileWhence *whence, Error **errp)
 GuestHostName *qmp_guest_get_host_name(Error **errp)
 {
 GuestHostName *result = NULL;
-gchar const *hostname = g_get_host_name();
-if (hostname != NULL) {
-result = g_new0(GuestHostName, 1);
-result->host_name = g_strdup(hostname);
+g_autofree char *hostname = qemu_get_host_name(errp);
+
+/*
+ * We want to avoid using g_get_host_name() because that
+ * caches the result and we wouldn't reflect changes in the
+ * host name.
+ */
+
+if (!hostname) {
+hostname = g_strdup("localhost");
 }
+
+result = g_new0(GuestHostName, 1);
+result->host_name = g_steal_pointer();
 return result;
 }
 
-- 
2.26.2

[PATCH v3 1/2] util: Introduce qemu_get_host_name()

2020-06-22 Thread Michal Privoznik

This function offers operating system agnostic way to fetch host
name. It is implemented for both POSIX-like and Windows systems.

Signed-off-by: Michal Privoznik 
---
 include/qemu/osdep.h | 10 ++
 util/oslib-posix.c   | 35 +++
 util/oslib-win32.c   | 13 +
 3 files changed, 58 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index ff7c17b857..a795d46b28 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -607,4 +607,14 @@ static inline void qemu_reset_optind(void)
 #endif
 }
 
+/**
+ * qemu_get_host_name:
+ * @errp: Error object
+ *
+ * Operating system agnostic way of querying host name.
+ *
+ * Returns allocated hostname (caller should free), NULL on failure.
+ */
+char *qemu_get_host_name(Error **errp);
+
 #endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 916f1be224..3997ee0442 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -761,3 +761,38 @@ void sigaction_invoke(struct sigaction *action,
 }
 action->sa_sigaction(info->ssi_signo, , NULL);
 }
+
+#ifndef HOST_NAME_MAX
+# ifdef _POSIX_HOST_NAME_MAX
+#  define HOST_NAME_MAX _POSIX_HOST_NAME_MAX
+# else
+#  define HOST_NAME_MAX 255
+# endif
+#endif
+
+char *qemu_get_host_name(Error **errp)
+{
+long len = -1;
+g_autofree char *hostname = NULL;
+
+#ifdef _SC_HOST_NAME_MAX
+len = sysconf(_SC_HOST_NAME_MAX);
+#endif /* _SC_HOST_NAME_MAX */
+
+if (len < 0) {
+len = HOST_NAME_MAX;
+}
+
+/* Unfortunately, gethostname() below does not guarantee a
+ * NULL terminated string. Therefore, allocate one byte more
+ * to be sure. */
+hostname = g_new0(char, len + 1);
+
+if (gethostname(hostname, len) < 0) {
+error_setg_errno(errp, errno,
+ "cannot get hostname");
+return NULL;
+}
+
+return g_steal_pointer();
+}
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index e9b14ab178..3b49d27297 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -808,3 +808,16 @@ bool qemu_write_pidfile(const char *filename, Error **errp)
 }
 return true;
 }
+
+char *qemu_get_host_name(Error **errp)
+{
+wchar_t tmp[MAX_COMPUTERNAME_LENGTH + 1];
+DWORD size = G_N_ELEMENTS(tmp);
+
+if (GetComputerNameW(tmp, ) == 0) {
+error_setg_win32(errp, GetLastError(), "failed close handle");
+return NULL;
+}
+
+return g_utf16_to_utf8(tmp, size, NULL, NULL, NULL);
+}
-- 
2.26.2

Re: [PATCH v2 1/2] util: Introduce qemu_get_host_name()

2020-06-22 Thread Michal Privoznik


On 6/22/20 7:38 PM, Daniel P. Berrangé wrote:

On Mon, Jun 22, 2020 at 07:26:44PM +0200, Michal Privoznik wrote:

This function offers operating system agnostic way to fetch host
name. It is implemented for both POSIX-like and Windows systems.

Signed-off-by: Michal Privoznik 
---
  include/qemu/osdep.h | 10 ++
  util/oslib-posix.c   | 32 
  util/oslib-win32.c   | 13 +
  3 files changed, 55 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index ff7c17b857..a795d46b28 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -607,4 +607,14 @@ static inline void qemu_reset_optind(void)
  #endif
  }
  
+/**

+ * qemu_get_host_name:
+ * @errp: Error object
+ *
+ * Operating system agnostic way of querying host name.
+ *
+ * Returns allocated hostname (caller should free), NULL on failure.
+ */
+char *qemu_get_host_name(Error **errp);
+
  #endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 916f1be224..865a3d71a7 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -761,3 +761,35 @@ void sigaction_invoke(struct sigaction *action,
  }
  action->sa_sigaction(info->ssi_signo, , NULL);
  }
+
+#ifndef HOST_NAME_MAX
+# ifdef _POSIX_HOST_NAME_MAX
+#  define HOST_NAME_MAX _POSIX_HOST_NAME_MAX
+# else
+#  define HOST_NAME_MAX 255
+# endif
+#endif
+
+char *qemu_get_host_name(Error **errp)
+{
+long len = -1;
+char *hostname;
+
+#ifdef _SC_HOST_NAME_MAX
+len = sysconf(_SC_HOST_NAME_MAX);
+#endif /* _SC_HOST_NAME_MAX */
+
+if (len < 0) {
+len = HOST_NAME_MAX;
+}
+
+hostname = g_malloc0(len + 1);


Nitpick, generally qemu prefers g_new0


+
+if (gethostname(hostname, len) < 0) {
+error_setg_errno(errp, errno,
+ "cannot get hostname");
+return NULL;
+}


According to my man page, it is undefined by POSIX whether there's a
trailing NUL when  hostname exceeds the buffer, so the paranoid thing
todo is to add

   hostname[len] = '\0';


Isn't this guaranteed by allocating len + 1 bytes? I mean, g_malloc0() 
and g_new0() will memset() the memory to zero. And since I tell 
gethostname() the buf is only len bytes long I am guaranteed to have 0 
at the end of it, aren't I? Maybe I should put a comment just before 
g_malloc0() or g_new0() that documents this thought.


Michal

[PATCH v2 2/2] qga: Use qemu_get_host_name() instead of g_get_host_name()

2020-06-22 Thread Michal Privoznik

Problem with g_get_host_name() is that on the first call it saves
the hostname into a global variable and from then on, every
subsequent call returns the saved hostname. Even if the hostname
changes. This doesn't play nicely with guest agent, because if
the hostname is acquired before the guest is set up (e.g. on the
first boot, or before DHCP) we will report old, invalid hostname.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1845127

Signed-off-by: Michal Privoznik 
---
 qga/commands.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/qga/commands.c b/qga/commands.c
index efc8b90281..d3fec807c1 100644
--- a/qga/commands.c
+++ b/qga/commands.c
@@ -515,11 +515,20 @@ int ga_parse_whence(GuestFileWhence *whence, Error **errp)
 GuestHostName *qmp_guest_get_host_name(Error **errp)
 {
 GuestHostName *result = NULL;
-gchar const *hostname = g_get_host_name();
-if (hostname != NULL) {
-result = g_new0(GuestHostName, 1);
-result->host_name = g_strdup(hostname);
+g_autofree char *hostname = qemu_get_host_name(errp);
+
+/*
+ * We want to avoid using g_get_host_name() because that
+ * caches the result and we wouldn't reflect changes in the
+ * host name.
+ */
+
+if (!hostname) {
+hostname = g_strdup("localhost");
 }
+
+result = g_new0(GuestHostName, 1);
+result->host_name = g_steal_pointer();
 return result;
 }
 
-- 
2.26.2

[PATCH v2 1/2] util: Introduce qemu_get_host_name()

2020-06-22 Thread Michal Privoznik

This function offers operating system agnostic way to fetch host
name. It is implemented for both POSIX-like and Windows systems.

Signed-off-by: Michal Privoznik 
---
 include/qemu/osdep.h | 10 ++
 util/oslib-posix.c   | 32 
 util/oslib-win32.c   | 13 +
 3 files changed, 55 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index ff7c17b857..a795d46b28 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -607,4 +607,14 @@ static inline void qemu_reset_optind(void)
 #endif
 }
 
+/**
+ * qemu_get_host_name:
+ * @errp: Error object
+ *
+ * Operating system agnostic way of querying host name.
+ *
+ * Returns allocated hostname (caller should free), NULL on failure.
+ */
+char *qemu_get_host_name(Error **errp);
+
 #endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 916f1be224..865a3d71a7 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -761,3 +761,35 @@ void sigaction_invoke(struct sigaction *action,
 }
 action->sa_sigaction(info->ssi_signo, , NULL);
 }
+
+#ifndef HOST_NAME_MAX
+# ifdef _POSIX_HOST_NAME_MAX
+#  define HOST_NAME_MAX _POSIX_HOST_NAME_MAX
+# else
+#  define HOST_NAME_MAX 255
+# endif
+#endif
+
+char *qemu_get_host_name(Error **errp)
+{
+long len = -1;
+char *hostname;
+
+#ifdef _SC_HOST_NAME_MAX
+len = sysconf(_SC_HOST_NAME_MAX);
+#endif /* _SC_HOST_NAME_MAX */
+
+if (len < 0) {
+len = HOST_NAME_MAX;
+}
+
+hostname = g_malloc0(len + 1);
+
+if (gethostname(hostname, len) < 0) {
+error_setg_errno(errp, errno,
+ "cannot get hostname");
+return NULL;
+}
+
+return hostname;
+}
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index e9b14ab178..3b49d27297 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -808,3 +808,16 @@ bool qemu_write_pidfile(const char *filename, Error **errp)
 }
 return true;
 }
+
+char *qemu_get_host_name(Error **errp)
+{
+wchar_t tmp[MAX_COMPUTERNAME_LENGTH + 1];
+DWORD size = G_N_ELEMENTS(tmp);
+
+if (GetComputerNameW(tmp, ) == 0) {
+error_setg_win32(errp, GetLastError(), "failed close handle");
+return NULL;
+}
+
+return g_utf16_to_utf8(tmp, size, NULL, NULL, NULL);
+}
-- 
2.26.2

[PATCH v2 0/2] qga: Ditch g_get_host_name()

2020-06-22 Thread Michal Privoznik

v2 of:

https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg04457.html

diff to v1:
- Move implementation out from qga/ to util/oslib-*

Michal Privoznik (2):
  util: Introduce qemu_get_host_name()
  qga: Use qemu_get_host_name() instead of g_get_host_name()

 include/qemu/osdep.h | 10 ++
 qga/commands.c   | 17 +
 util/oslib-posix.c   | 32 
 util/oslib-win32.c   | 13 +
 4 files changed, 68 insertions(+), 4 deletions(-)

-- 
2.26.2

Re: [PATCH] qga: Use gethostname() instead of g_get_host_name()

2020-06-22 Thread Michal Privoznik


On 6/22/20 12:14 PM, Philippe Mathieu-Daudé wrote:

Hi Michal,

On 6/16/20 10:34 AM, Michal Privoznik wrote:

Problem with g_get_host_name() is that on the first call it saves
the hostname into a global variable and from then on, every
subsequent call returns the saved hostname. Even if the hostname
changes. This doesn't play nicely with guest agent, because if
the hostname is acquired before the guest is set up (e.g. on the
first boot, or before DHCP) we will report old, invalid hostname.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1845127

Signed-off-by: Michal Privoznik 
---
  qga/commands.c | 56 ++
  1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/qga/commands.c b/qga/commands.c
index efc8b90281..ce3c2041a6 100644
--- a/qga/commands.c
+++ b/qga/commands.c
@@ -512,14 +512,62 @@ int ga_parse_whence(GuestFileWhence *whence, Error **errp)
  return -1;
  }
  
+#ifndef HOST_NAME_MAX

+# ifdef _POSIX_HOST_NAME_MAX
+#  define HOST_NAME_MAX _POSIX_HOST_NAME_MAX
+# else
+#  define HOST_NAME_MAX 255
+# endif
+#endif
+
  GuestHostName *qmp_guest_get_host_name(Error **errp)
  {
  GuestHostName *result = NULL;
-gchar const *hostname = g_get_host_name();
-if (hostname != NULL) {
-result = g_new0(GuestHostName, 1);
-result->host_name = g_strdup(hostname);
+g_autofree char *hostname = NULL;
+
+/*
+ * We want to avoid using g_get_host_name() because that
+ * caches the result and we wouldn't reflect changes in the
+ * host name.
+ */


I see there is only one g_get_host_name() call in the
codebase, but can we have a generic qemu_get_host_name()
helper implemented in util/oslib-*c instead?


Sure. Let me post a v2 so that I can include Richard's suggestion too.

Michal

Re: [PATCH] qga: Use gethostname() instead of g_get_host_name()

2020-06-22 Thread Michal Privoznik


On 6/19/20 11:54 PM, Richard Henderson wrote:

On 6/16/20 1:34 AM, Michal Privoznik wrote:

+#ifndef G_OS_WIN32


Nit: positive tests are easier to reason with and extend than negative tests.
I would reverse these two blocks and use a positive test for windows.

Also, CONFIG_WIN32 is what we use elsewhere for this test.


r~



Fair enough. Do you want me to send v2 or is it something that committer 
can fix before merging?


Michal

[PATCH] qga: Use gethostname() instead of g_get_host_name()

2020-06-16 Thread Michal Privoznik

Problem with g_get_host_name() is that on the first call it saves
the hostname into a global variable and from then on, every
subsequent call returns the saved hostname. Even if the hostname
changes. This doesn't play nicely with guest agent, because if
the hostname is acquired before the guest is set up (e.g. on the
first boot, or before DHCP) we will report old, invalid hostname.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1845127

Signed-off-by: Michal Privoznik 
---
 qga/commands.c | 56 ++
 1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/qga/commands.c b/qga/commands.c
index efc8b90281..ce3c2041a6 100644
--- a/qga/commands.c
+++ b/qga/commands.c
@@ -512,14 +512,62 @@ int ga_parse_whence(GuestFileWhence *whence, Error **errp)
 return -1;
 }
 
+#ifndef HOST_NAME_MAX
+# ifdef _POSIX_HOST_NAME_MAX
+#  define HOST_NAME_MAX _POSIX_HOST_NAME_MAX
+# else
+#  define HOST_NAME_MAX 255
+# endif
+#endif
+
 GuestHostName *qmp_guest_get_host_name(Error **errp)
 {
 GuestHostName *result = NULL;
-gchar const *hostname = g_get_host_name();
-if (hostname != NULL) {
-result = g_new0(GuestHostName, 1);
-result->host_name = g_strdup(hostname);
+g_autofree char *hostname = NULL;
+
+/*
+ * We want to avoid using g_get_host_name() because that
+ * caches the result and we wouldn't reflect changes in the
+ * host name.
+ */
+
+#ifndef G_OS_WIN32
+long len = -1;
+
+#ifdef _SC_HOST_NAME_MAX
+len = sysconf(_SC_HOST_NAME_MAX);
+#endif /* _SC_HOST_NAME_MAX */
+
+if (len < 0) {
+len = HOST_NAME_MAX;
 }
+
+hostname = g_malloc0(len + 1);
+
+if (gethostname(hostname, len) < 0) {
+return NULL;
+}
+
+#else /* G_OS_WIN32 */
+
+wchar_t tmp[MAX_COMPUTERNAME_LENGTH + 1];
+DWORD size = G_N_ELEMENTS(tmp);
+
+if (GetComputerNameW(tmp, ) != 0) {
+/*
+ * Indeed, on Windows retval of zero means failure
+ * and nonzero means success.
+ */
+hostname = g_utf16_to_utf8(tmp, size, NULL, NULL, NULL);
+}
+#endif /* G_OS_WIN32 */
+
+if (!hostname) {
+hostname = g_strdup("localhost");
+}
+
+result = g_new0(GuestHostName, 1);
+result->host_name = g_steal_pointer();
 return result;
 }
 
-- 
2.26.2

Re: [PATCH v2 1/2] qemu-options.hx: Mark all hmat-cache attributes required

2020-06-16 Thread Michal Privoznik


On 6/15/20 10:00 AM, Markus Armbruster wrote:

Cc: the people involved in commit c412a48d4d "numa: Extend CLI to
provide memory side cache information".

Michal Privoznik  writes:


The documentation to `-numa hmat-cache` says that @node-id, @size
and @level are the only required attributes. The rest
(@associativity, @policy and @line) is optional. Well, not quite
- if I try to start QEMU with only the three required attributes
defined the QAPI code is complaining about associativity missing.


Only because @associativity visited first.


According to QAPI all attributes are required. Make the docs
reflect that.


Correct.


Signed-off-by: Michal Privoznik 
---
  qemu-options.hx | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)





Assuming non-optional is what we want:
Reviewed-by: Markus Armbruster 



Indeed, it is:

https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg08411.html

Thanks,
Michal

Re: [PATCH v2 2/2] qemu-options.hx: Document hmat-lb and hmat-cache order

2020-06-16 Thread Michal Privoznik


On 6/15/20 10:02 AM, Markus Armbruster wrote:

Michal Privoznik  writes:


To simplify internal implementation the hmat-cache parsing code
expects hmat-lb to be already parsed. This means, that hmat-lb
arguments must come before hmat-cache. Document this restriction
so that management applications can follow it.

Signed-off-by: Michal Privoznik 
---
  qemu-options.hx | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index b1a399079a..3fe9e6d6a0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -319,6 +319,9 @@ SRST
  'none/direct(direct-mapped)/complex(complex cache indexing)'. policy
  is the write policy. line is the cache Line size in bytes.
  
+Please note, that due to internal implementation, '\ ``hmat-cache``\ '

+must be configured only after '\ ``hmat-lb``\ ' option.
+
  For example, the following options describe 2 NUMA nodes. Node 0 has
  2 cpus and a ram, node 1 has only a ram. The processors in node 0
  access memory in node 0 with access-latency 5 nanoseconds,


What happens when we do it wrong?



We error out.

https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg08409.html


Can we catch doing it wrong somehow?  I figure that would be much better
than merely documenting it.



Sure, but that would require some more code (according to Igor's e-mail 
and discussion linked from the linked document). And frankly, to libvirt 
it doesn't matter. For everybody else, let's document it at least and if 
somebody ever writes the missing piece we can remove the restriction 
from the docs.


Michal

Re: [PATCH 3/3] numa: Initialize node initiator with respect to .has_cpu

2020-06-11 Thread Michal Privoznik


On 6/5/20 3:52 AM, Tao Xu wrote:

On 6/3/20 5:16 PM, Michal Privoznik wrote:

On 6/2/20 10:00 AM, Tao Xu wrote:


On 6/1/2020 4:10 PM, Michal Privoznik wrote:

On 5/29/20 5:09 PM, Igor Mammedov wrote:

On Fri, 29 May 2020 15:33:48 +0200
Michal Privoznik  wrote:


The initiator attribute of a NUMA node is documented as the 'NUMA
node that has best performance to given NUMA node'. If a NUMA
node has at least one CPU there can hardly be a different node
with better performace and thus all NUMA nodes which have a CPU
are initiators to themselves. Reflect this fact when initializing
the attribute.


It is not true in case of the node is memory-less


Are you saying that if there's a memory-less NUMA node, then it 
needs to

have initiator set too? Asking mostly out of curiosity because we don't
allow memory-less NUMA nodes in Libvirt just yet. Nor cpu-less, but my
patches that I'm referring to in cover letter will allow at least
cpu-less nodes. Should I allow both?

QEMU now is not support memory-less NUMA node, but in hardware may be
supported. So we reserve this type of NUMA node for future usage. And
QEMU now can support cpu-less NUMA node, for emulating some "slow"
memory(like some NVDIMM).


Oh yeah, I understand that. But it doesn't explain why initiator needs
to be specified for NUMA nodes with cpus and memory, or does it? Maybe
I'm still misunderstanding what the initiator is.



Yes, the initiator NUMA nodes with cpus and memory should be itself. In 
ACPI 6.3 spec, initiator is defined as:


This field is valid only if the memory controller
responsible for satisfying the access to memory
belonging to the specified memory proximity
domain is directly attached to an initiator that
belongs to a proximity domain. In that case, this
field contains the integer that represents the
proximity domain to which the initiator (Generic
Initiator or Processor) belongs. This number shall
match the corresponding entry in the SRAT table’s
processor affinity structure (e.g., Processor Local
APIC/SAPIC Affinity Structure, Processor Local
x2APIC Affinity Structure, GICC Affinity Structure) if
the initiator is a processor, or the Generic Initiator
Affinity Structure if the initator is a generic
initiator.
Note: this field provides additional information as
to the initiator node that is closest (as in directly
attached) to the memory address ranges within
the specified memory proximity domain, and
therefore should provide the best performance.

And if in the future, there is a memory-less NUMA node. Because in HMAT 
we describe "Memory" Proximity Domain Attributes Structure, I think we 
should not add memory-less NUMA node into HMAT.


Then I guess something else must be broken. Because as reported here [1] 
if I configure two numa nodes, both with two vCPUs and set initiators of 
each node to itselfs I get the following error message:


qemu-system-x86_64: -numa 
node,nodeid=1,cpus=2-3,initiator=1,memdev=ram-node1: The initiator of 
CPU NUMA node 1 should be itself


Funny about this error message is how contradictory it is. The cmd line 
showed in the error shows the initiator of the node 1 is indeed node 1.


1: https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg00071.html

Michal

[PATCH v2 2/2] qemu-options.hx: Document hmat-lb and hmat-cache order

2020-06-10 Thread Michal Privoznik

To simplify internal implementation the hmat-cache parsing code
expects hmat-lb to be already parsed. This means, that hmat-lb
arguments must come before hmat-cache. Document this restriction
so that management applications can follow it.

Signed-off-by: Michal Privoznik 
---
 qemu-options.hx | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index b1a399079a..3fe9e6d6a0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -319,6 +319,9 @@ SRST
 'none/direct(direct-mapped)/complex(complex cache indexing)'. policy
 is the write policy. line is the cache Line size in bytes.
 
+Please note, that due to internal implementation, '\ ``hmat-cache``\ '
+must be configured only after '\ ``hmat-lb``\ ' option.
+
 For example, the following options describe 2 NUMA nodes. Node 0 has
 2 cpus and a ram, node 1 has only a ram. The processors in node 0
 access memory in node 0 with access-latency 5 nanoseconds,
-- 
2.26.2

[PATCH v2 1/2] qemu-options.hx: Mark all hmat-cache attributes required

2020-06-10 Thread Michal Privoznik

The documentation to `-numa hmat-cache` says that @node-id, @size
and @level are the only required attributes. The rest
(@associativity, @policy and @line) is optional. Well, not quite
- if I try to start QEMU with only the three required attributes
defined the QAPI code is complaining about associativity missing.

According to QAPI all attributes are required. Make the docs
reflect that.

Signed-off-by: Michal Privoznik 
---
 qemu-options.hx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 93bde2bbc8..b1a399079a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -188,7 +188,7 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
 "-numa dist,src=source,dst=destination,val=distance\n"
 "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
 "-numa 
hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
-"-numa 
hmat-cache,node-id=node,size=size,level=level[,associativity=none|direct|complex][,policy=none|write-back|write-through][,line=size]\n",
+"-numa 
hmat-cache,node-id=node,size=size,level=level,associativity=none|direct|complex,policy=none|write-back|write-through,line=size\n",
 QEMU_ARCH_ALL)
 SRST
 ``-numa 
node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=initiator]``
@@ -201,7 +201,7 @@ SRST
   \ 
 ``-numa 
hmat-lb,initiator=node,target=node,hierarchy=hierarchy,data-type=tpye[,latency=lat][,bandwidth=bw]``
   \ 
-``-numa 
hmat-cache,node-id=node,size=size,level=level[,associativity=str][,policy=str][,line=size]``
+``-numa 
hmat-cache,node-id=node,size=size,level=level,associativity=str,policy=str,line=size``
 Define a NUMA node and assign RAM and VCPUs to it. Set the NUMA
 distance from a source node to a destination node. Set the ACPI
 Heterogeneous Memory Attributes for the given nodes.
-- 
2.26.2

[PATCH v2 0/2] A pair of HMAT docs fixes

2020-06-10 Thread Michal Privoznik

Technically, this is a v2 of:

https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg08316.html

But as it turned out during the review of v1, we don't need to change
the code rather than documentation.

Michal Privoznik (2):
  qemu-options.hx: Mark all hmat-cache attributes required
  qemu-options.hx: Document hmat-lb and hmat-cache order

 qemu-options.hx | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.26.2

Re: [PATCH] numa: forbid '-numa node, mem' for 5.1 and newer machine types

2020-06-04 Thread Michal Privoznik


On 6/2/20 10:41 AM, Igor Mammedov wrote:

Deprecation period is run out and it's a time to flip the switch
introduced by cd5ff8333a.  Disable legacy option for new machine
types (since 5.1) and amend documentation.

'-numa node,memdev' shall be used instead of disabled option
with new machine types.

Signed-off-by: Igor Mammedov 
---
  - rebased on top of current master
  - move compat mode from 4.2 to 5.0

CC: peter.mayd...@linaro.org
CC: ehabk...@redhat.com
CC: marcel.apfelb...@gmail.com
CC: m...@redhat.com
CC: pbonz...@redhat.com
CC: r...@twiddle.net
CC: da...@gibson.dropbear.id.au
CC: libvir-l...@redhat.com
CC: qemu-...@nongnu.org
CC: qemu-...@nongnu.org
---
  docs/system/deprecated.rst | 17 -
  hw/arm/virt.c  |  2 +-
  hw/core/numa.c |  6 ++
  hw/i386/pc.c   |  1 -
  hw/i386/pc_piix.c  |  1 +
  hw/i386/pc_q35.c   |  1 +
  hw/ppc/spapr.c |  2 +-
  qemu-options.hx|  9 +
  8 files changed, 15 insertions(+), 24 deletions(-)


This works with libvirt perfectly.

Reviewed-by: Michal Privoznik 

Michal

Re: [PATCH 3/3] numa: Initialize node initiator with respect to .has_cpu

2020-06-03 Thread Michal Privoznik


On 6/2/20 10:00 AM, Tao Xu wrote:


On 6/1/2020 4:10 PM, Michal Privoznik wrote:

On 5/29/20 5:09 PM, Igor Mammedov wrote:

On Fri, 29 May 2020 15:33:48 +0200
Michal Privoznik  wrote:


The initiator attribute of a NUMA node is documented as the 'NUMA
node that has best performance to given NUMA node'. If a NUMA
node has at least one CPU there can hardly be a different node
with better performace and thus all NUMA nodes which have a CPU
are initiators to themselves. Reflect this fact when initializing
the attribute.


It is not true in case of the node is memory-less


Are you saying that if there's a memory-less NUMA node, then it needs to
have initiator set too? Asking mostly out of curiosity because we don't
allow memory-less NUMA nodes in Libvirt just yet. Nor cpu-less, but my
patches that I'm referring to in cover letter will allow at least
cpu-less nodes. Should I allow both?
QEMU now is not support memory-less NUMA node, but in hardware may be 
supported. So we reserve this type of NUMA node for future usage. And 
QEMU now can support cpu-less NUMA node, for emulating some "slow" 
memory(like some NVDIMM).


Oh yeah, I understand that. But it doesn't explain why initiator needs 
to be specified for NUMA nodes with cpus and memory, or does it? Maybe 
I'm still misunderstanding what the initiator is.






Also, can you shed more light into why machine_set_cpu_numa_node() did
not override the .initiator?


And this one is still unanswered too. Because from user's perspective, 
initiator has to be set on all NUMA nodes (if HMAT is enabled) and it 
seems like this auto assignment code is not run/not working.


Michal

Re: [PATCH 0/3] Couple of HMAT fixes

2020-06-01 Thread Michal Privoznik


On 5/29/20 3:33 PM, Michal Privoznik wrote:

I've started working on libvirt side of this feature. WIP patches can be
found here:

https://github.com/zippy2/libvirt/commits/hmat

I've gotten to a point where libvirt generates cmd line but QEMU refuses
it. Problem is that I was looking into qemu-options.hx instead of
qapi/machine.json and thus found some irregularities between these two.

I'm not necessarily stating that all these patches are correct (I have
some doubts about 3/3 because nearly identical code can be found in
machine_set_cpu_numa_node(), but I have no idea if it's a coincidence).

Michal Privoznik (3):
   qapi: Make @associativity, @policy and @line of NumaHmatCacheOptions
 optional
   numa: Allow HMAT cache to be defined before HMAT latency/bandwidth
   numa: Initialize node initiator with respect to .has_cpu

  hw/core/numa.c| 22 +-
  qapi/machine.json |  6 +++---
  2 files changed, 12 insertions(+), 16 deletions(-)



Hey, so as I'm experimenting with this, I have couple of questions. 
Hopefully, you have answers.


1) How can I read HMAT from inside the guest? More specifically, I can 
see cache exposed under sysfs, but not latency/bandwidth. I mean, there is:


/sys/devices/system/node/node0/memory_side_cache/

which appears to contain interesting bits. But there seem to be nothing 
like that for latency/bandwidth. There is:


/sys/devices/system/node/node0/access0/initiators

containing:
read_bandwidth  read_latency  write_bandwidth  write_latency

but they all contain "0".


2) I still don't quite understand what initiator is. The way I read the 
documentation is that if a NUMA node has CPUs, it is initiator to 
itself. But, when I try to start the following command line, I get an error:


-machine 
pc-q35-2.12,accel=kvm,usb=off,vmport=off,dump-guest-core=off,hmat=on \

-cpu host,vmx=off \
-m 4096 \
-overcommit mem-lock=off \
-smp 4,sockets=4,cores=1,threads=1 \
-object memory-backend-ram,id=ram-node0,size=2147483648 \
-numa node,nodeid=0,cpus=0-1,initiator=0,memdev=ram-node0 \
-object memory-backend-ram,id=ram-node1,size=2147483648 \
-numa node,nodeid=1,cpus=2-3,initiator=1,memdev=ram-node1 \
-numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=1 
\
-numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=read-latency,latency=2 
\
-numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=write-latency,latency=4 
\
-numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=1048576K 
\
-numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=read-bandwidth,bandwidth=2097152K 
\
-numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=write-bandwidth,bandwidth=4194304K 
\
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10 
\
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=read-latency,latency=20 
\
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=write-latency,latency=40 
\
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=1024K 
\
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=read-bandwidth,bandwidth=2048K 
\
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=write-bandwidth,bandwidth=4096K 
\
-numa 
hmat-cache,node-id=0,size=256K,level=1,associativity=direct,policy=write-back,line=8 
\
-numa 
hmat-cache,node-id=0,size=128K,level=2,associativity=complex,policy=write-through,line=16 
\



qemu-system-x86_64: -numa 
node,nodeid=1,cpus=2-3,initiator=1,memdev=ram-node1: The initiator of 
CPU NUMA node 1 should be itself


Firstly, why does "CPU" even appear in the message? Initiator is an 
attribute of a NUMA node not CPU, right? Secondly, as specified on the 
command line, the initiator of the node 1 *is* node 1.



Michal

Re: [PATCH 3/3] numa: Initialize node initiator with respect to .has_cpu

2020-06-01 Thread Michal Privoznik


On 5/29/20 5:09 PM, Igor Mammedov wrote:

On Fri, 29 May 2020 15:33:48 +0200
Michal Privoznik  wrote:


The initiator attribute of a NUMA node is documented as the 'NUMA
node that has best performance to given NUMA node'. If a NUMA
node has at least one CPU there can hardly be a different node
with better performace and thus all NUMA nodes which have a CPU
are initiators to themselves. Reflect this fact when initializing
the attribute.


It is not true in case of the node is memory-less


Are you saying that if there's a memory-less NUMA node, then it needs to 
have initiator set too? Asking mostly out of curiosity because we don't 
allow memory-less NUMA nodes in Libvirt just yet. Nor cpu-less, but my 
patches that I'm referring to in cover letter will allow at least 
cpu-less nodes. Should I allow both?


Also, can you shed more light into why machine_set_cpu_numa_node() did 
not override the .initiator?


Thanks,
Michal

Re: [PATCH 3/3] numa: Initialize node initiator with respect to .has_cpu

2020-05-29 Thread Michal Privoznik


On 5/29/20 5:09 PM, Igor Mammedov wrote:

On Fri, 29 May 2020 15:33:48 +0200
Michal Privoznik  wrote:


The initiator attribute of a NUMA node is documented as the 'NUMA
node that has best performance to given NUMA node'. If a NUMA
node has at least one CPU there can hardly be a different node
with better performace and thus all NUMA nodes which have a CPU
are initiators to themselves. Reflect this fact when initializing
the attribute.


It is not true in case of the node is memory-less


Ah, so the node has CPUs only then? Okay, right now my libvirt patches 
don't allow that, but formatting initator for all NUMA nodes should be 
trivial.


Michal

Re: [PATCH 2/3] numa: Allow HMAT cache to be defined before HMAT latency/bandwidth

2020-05-29 Thread Michal Privoznik


On 5/29/20 4:59 PM, Igor Mammedov wrote:

On Fri, 29 May 2020 15:33:47 +0200
Michal Privoznik  wrote:


Currently, when defining a HMAT cache for a NUMA node (in
parse_numa_hmat_cache()) there is this check that forces users to
define HMAT latency/bandwidth first. There is no real need for
this, because nothing in the parse function relies on that and
the HMAT table is constructed way later - when ACPI table is
constructed.


see comment in
   https://lists.gnu.org/archive/html/qemu-devel/2019-11/msg01206.html

in short doing check at this time allow us not to have more complex
check later on.

perhaps it needs a comment so that later it won't be dropped by accident


Fair enough. Discard this one then and I will post a patch that document 
this.


Michal

[PATCH 1/3] qapi: Make @associativity, @policy and @line of NumaHmatCacheOptions optional

2020-05-29 Thread Michal Privoznik

The documentation to `-numa hmat-cache` says that @node-id, @size
and @level are the only required attributes. The rest
(@associativity, @policy and @line) is optional. Well, not quite
- if I try to start QEMU with only the three required attributes
defined the QAPI code is complaining about associativity missing.

Signed-off-by: Michal Privoznik 
---
 qapi/machine.json | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index ff7b5032e3..952784f8ba 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -723,9 +723,9 @@
'node-id': 'uint32',
'size': 'size',
'level': 'uint8',
-   'associativity': 'HmatCacheAssociativity',
-   'policy': 'HmatCacheWritePolicy',
-   'line': 'uint16' }}
+   '*associativity': 'HmatCacheAssociativity',
+   '*policy': 'HmatCacheWritePolicy',
+   '*line': 'uint16' }}
 
 ##
 # @HostMemPolicy:
-- 
2.26.2

[PATCH 2/3] numa: Allow HMAT cache to be defined before HMAT latency/bandwidth

2020-05-29 Thread Michal Privoznik

Currently, when defining a HMAT cache for a NUMA node (in
parse_numa_hmat_cache()) there is this check that forces users to
define HMAT latency/bandwidth first. There is no real need for
this, because nothing in the parse function relies on that and
the HMAT table is constructed way later - when ACPI table is
constructed.

Signed-off-by: Michal Privoznik 
---
 hw/core/numa.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 316bc50d75..338453461c 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -384,7 +384,6 @@ void parse_numa_hmat_cache(MachineState *ms, 
NumaHmatCacheOptions *node,
Error **errp)
 {
 int nb_numa_nodes = ms->numa_state->num_nodes;
-NodeInfo *numa_info = ms->numa_state->nodes;
 NumaHmatCacheOptions *hmat_cache = NULL;
 
 if (node->node_id >= nb_numa_nodes) {
@@ -393,13 +392,6 @@ void parse_numa_hmat_cache(MachineState *ms, 
NumaHmatCacheOptions *node,
 return;
 }
 
-if (numa_info[node->node_id].lb_info_provided != (BIT(0) | BIT(1))) {
-error_setg(errp, "The latency and bandwidth information of "
-   "node-id=%" PRIu32 " should be provided before memory side "
-   "cache attributes", node->node_id);
-return;
-}
-
 if (node->level < 1 || node->level >= HMAT_LB_LEVELS) {
 error_setg(errp, "Invalid level=%" PRIu8 ", it should be larger than 0 
"
"and less than or equal to %d", node->level,
-- 
2.26.2

[PATCH 3/3] numa: Initialize node initiator with respect to .has_cpu

2020-05-29 Thread Michal Privoznik

The initiator attribute of a NUMA node is documented as the 'NUMA
node that has best performance to given NUMA node'. If a NUMA
node has at least one CPU there can hardly be a different node
with better performace and thus all NUMA nodes which have a CPU
are initiators to themselves. Reflect this fact when initializing
the attribute.

Signed-off-by: Michal Privoznik 
---
 hw/core/numa.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 338453461c..1c9bc761cc 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -136,11 +136,15 @@ static void parse_numa_node(MachineState *ms, 
NumaNodeOptions *node,
 numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
 }
 
-/*
- * If not set the initiator, set it to MAX_NODES. And if
- * HMAT is enabled and this node has no cpus, QEMU will raise error.
- */
-numa_info[nodenr].initiator = MAX_NODES;
+/* Initialize initiator to either the current NUMA node (if
+ * it has at least one CPU), or to MAX_NODES. If HMAT is
+ * enabled an error will be raised later in
+ * numa_validate_initiator(). */
+if (numa_info[nodenr].has_cpu)
+numa_info[nodenr].initiator = nodenr;
+else
+numa_info[nodenr].initiator = MAX_NODES;
+
 if (node->has_initiator) {
 if (!ms->numa_state->hmat_enabled) {
 error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
-- 
2.26.2

[PATCH 0/3] Couple of HMAT fixes

2020-05-29 Thread Michal Privoznik

I've started working on libvirt side of this feature. WIP patches can be
found here:

https://github.com/zippy2/libvirt/commits/hmat

I've gotten to a point where libvirt generates cmd line but QEMU refuses
it. Problem is that I was looking into qemu-options.hx instead of
qapi/machine.json and thus found some irregularities between these two.

I'm not necessarily stating that all these patches are correct (I have
some doubts about 3/3 because nearly identical code can be found in
machine_set_cpu_numa_node(), but I have no idea if it's a coincidence).

Michal Privoznik (3):
  qapi: Make @associativity, @policy and @line of NumaHmatCacheOptions
optional
  numa: Allow HMAT cache to be defined before HMAT latency/bandwidth
  numa: Initialize node initiator with respect to .has_cpu

 hw/core/numa.c| 22 +-
 qapi/machine.json |  6 +++---
 2 files changed, 12 insertions(+), 16 deletions(-)

-- 
2.26.2

[PATCH v2 2/2] qmp: Expose MachineClass::default_ram_id

2020-05-26 Thread Michal Privoznik

If a management application (like Libvirt) want's to preserve
migration ability and switch to '-machine memory-backend' it
needs to set exactly the same RAM id as QEMU would. Since the id
is machine type dependant, expose it under 'query-machines'
result. Some machine types don't have the attribute set (riscv
family for example), therefore the QMP attribute must be
optional.

Signed-off-by: Michal Privoznik 
---
 hw/core/machine-qmp-cmds.c | 4 
 qapi/machine.json  | 5 -
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 2c5da8413d..3e11a740c9 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -238,6 +238,10 @@ MachineInfoList *qmp_query_machines(Error **errp)
 info->default_cpu_type = g_strdup(mc->default_cpu_type);
 info->has_default_cpu_type = true;
 }
+if (mc->default_ram_id) {
+info->default_ram_id = g_strdup(mc->default_ram_id);
+info->has_default_ram_id = true;
+}
 
 entry = g_malloc0(sizeof(*entry));
 entry->value = info;
diff --git a/qapi/machine.json b/qapi/machine.json
index 39caa1d914..76c1606390 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -355,13 +355,16 @@
 # @default-cpu-type: default CPU model typename if none is requested via
 #the -cpu argument. (since 4.2)
 #
+# @default-ram-id: the default ID of initial RAM memory backend (since 5.1)
+#
 # Since: 1.2.0
 ##
 { 'struct': 'MachineInfo',
   'data': { 'name': 'str', '*alias': 'str',
 '*is-default': 'bool', 'cpu-max': 'int',
 'hotpluggable-cpus': 'bool',  'numa-mem-supported': 'bool',
-'deprecated': 'bool', '*default-cpu-type': 'str' } }
+'deprecated': 'bool', '*default-cpu-type': 'str',
+'*default-ram-id': 'str' } }
 
 ##
 # @query-machines:
-- 
2.26.2

[PATCH v2 0/2] qmp: Expose MachineClass::default_ram_id

2020-05-26 Thread Michal Privoznik

v2 of:

https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg07103.html

diff to v1:
- in 2/2 I made the default-ram-id optional, because as it turns out,
not every machine type has it set.

Michal Privoznik (2):
  qapi: Fix comment format for @CpuInstanceProperties
  qmp: Expose MachineClass::default_ram_id

 hw/core/machine-qmp-cmds.c | 4 
 qapi/machine.json  | 8 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)

-- 
2.26.2

[PATCH v2 1/2] qapi: Fix comment format for @CpuInstanceProperties

2020-05-26 Thread Michal Privoznik

In 176d2cda0de, the @die-id attribute was introduced to
CpuInstanceProperties type. However, it mangled the comment.

Signed-off-by: Michal Privoznik 
---
 qapi/machine.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index ff7b5032e3..39caa1d914 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -824,7 +824,8 @@
 # @node-id: NUMA node ID the CPU belongs to
 # @socket-id: socket number within node/board the CPU belongs to
 # @die-id: die number within node/board the CPU belongs to (Since 4.1)
-# @core-id: core number within die the CPU belongs to# @thread-id: thread 
number within core the CPU belongs to
+# @core-id: core number within die the CPU belongs to
+# @thread-id: thread number within core the CPU belongs to
 #
 # Note: currently there are 5 properties that could be present
 #   but management should be prepared to pass through other
-- 
2.26.2

Re: [PATCH 2/2] qmp: Expose MachineClass::default_ram_id

2020-05-26 Thread Michal Privoznik


On 5/25/20 8:06 PM, Eduardo Habkost wrote:

On Mon, May 25, 2020 at 07:03:28PM +0200, Michal Privoznik wrote:

If a management application (like Libvirt) want's to preserve
migration ability and switch to '-machine memory-backend' it
needs to set exactly the same RAM id as QEMU would. Since the id
is machine type dependant, expose it under 'query-machines'
result.

Signed-off-by: Michal Privoznik 


The code looks good, but documentation was a bit confusing:


---

[...]

+# @default-ram-id: the default name of initial RAM memory region (since 5.1)
+#


Everywhere else in the commit message you call it "id", but here
you say "name".  Also, I don't think we have any references to a
"memory region" abstraction in the docs for the QAPI schema,
-machine options, or memory backend objects.

I had to look it up in the code, to finally understand you were
talking about the memory backend object ID.

To make it consistent with terminology used for -machine and
QAPI, I suggest:

   @default-ram-id: the default ID of initial RAM memory backend (since 5.1)

I can change it before committing, if you agree.


Thanks for the offer, but I will post a v2, because as I was developing 
patches for libvirt to consume this I found out that some machine types 
don't have the attribute set (riscv is one of them). Therefore I will 
have to make this optional.


Michal

[PATCH 1/2] qapi: Fix comment format for @CpuInstanceProperties

2020-05-25 Thread Michal Privoznik

In 176d2cda0de, the @die-id attribute was introduced to
CpuInstanceProperties type. However, it mangled the comment.

Signed-off-by: Michal Privoznik 
---
 qapi/machine.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index ff7b5032e3..39caa1d914 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -824,7 +824,8 @@
 # @node-id: NUMA node ID the CPU belongs to
 # @socket-id: socket number within node/board the CPU belongs to
 # @die-id: die number within node/board the CPU belongs to (Since 4.1)
-# @core-id: core number within die the CPU belongs to# @thread-id: thread 
number within core the CPU belongs to
+# @core-id: core number within die the CPU belongs to
+# @thread-id: thread number within core the CPU belongs to
 #
 # Note: currently there are 5 properties that could be present
 #   but management should be prepared to pass through other
-- 
2.26.2

[PATCH 2/2] qmp: Expose MachineClass::default_ram_id

2020-05-25 Thread Michal Privoznik

If a management application (like Libvirt) want's to preserve
migration ability and switch to '-machine memory-backend' it
needs to set exactly the same RAM id as QEMU would. Since the id
is machine type dependant, expose it under 'query-machines'
result.

Signed-off-by: Michal Privoznik 
---
 hw/core/machine-qmp-cmds.c | 1 +
 qapi/machine.json  | 5 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 2c5da8413d..8333e674ce 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -234,6 +234,7 @@ MachineInfoList *qmp_query_machines(Error **errp)
 info->hotpluggable_cpus = mc->has_hotpluggable_cpus;
 info->numa_mem_supported = mc->numa_mem_supported;
 info->deprecated = !!mc->deprecation_reason;
+info->default_ram_id = g_strdup(mc->default_ram_id);
 if (mc->default_cpu_type) {
 info->default_cpu_type = g_strdup(mc->default_cpu_type);
 info->has_default_cpu_type = true;
diff --git a/qapi/machine.json b/qapi/machine.json
index 39caa1d914..e5647c4031 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -352,6 +352,8 @@
 #  in future versions of QEMU according to the QEMU deprecation
 #  policy (since 4.1.0)
 #
+# @default-ram-id: the default name of initial RAM memory region (since 5.1)
+#
 # @default-cpu-type: default CPU model typename if none is requested via
 #the -cpu argument. (since 4.2)
 #
@@ -361,7 +363,8 @@
   'data': { 'name': 'str', '*alias': 'str',
 '*is-default': 'bool', 'cpu-max': 'int',
 'hotpluggable-cpus': 'bool',  'numa-mem-supported': 'bool',
-'deprecated': 'bool', '*default-cpu-type': 'str' } }
+'deprecated': 'bool', 'default-ram-id': 'str',
+'*default-cpu-type': 'str' } }
 
 ##
 # @query-machines:
-- 
2.26.2

[PATCH 0/2] qmp: Expose MachineClass::default_ram_id

2020-05-25 Thread Michal Privoznik

The important patch is 2/2.

Michal Privoznik (2):
  qapi: Fix comment format for @CpuInstanceProperties
  qmp: Expose MachineClass::default_ram_id

 hw/core/machine-qmp-cmds.c | 1 +
 qapi/machine.json  | 8 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

-- 
2.26.2

Re: device hotplug & file handles

2020-05-11 Thread Michal Privoznik


On 5/7/20 4:49 PM, Gerd Hoffmann wrote:

   Hi,

For usb device pass-through (aka -device usb-host) it would be very
useful to pass file handles from libvirt to qemu.  The workflow would
change from ...

   (1) libvirt enables access to /dev/usb/$bus/$dev
   (2) libvirt passes $bus + $dev (using hostbus + hostaddr properties)
   to qemu.
   (3) qemu opens /dev/usb/$bus/$dev

... to ...

   (1) libvirt opens /dev/usb/$bus/$dev
   (2) libvirt passes filehandle to qemu.

Question is how can we pass the file descriptor best?  My idea would be
to simply add an fd property to usb-host:

  * Coldplug would be "-device usb-host,fd=" (cmd line).
  * Hotplug would be "device_add usb-host,fd=" (monitor).

Will that work from libvirt point of view?
Or does anyone have an better idea?

thanks,
   Gerd

PS: background: https://bugzilla.redhat.com/show_bug.cgi?id=1595525



I don't have a better idea, but a little background on why libvirt even 
invented private /dev in the first place. The reason was that 
occasionally, when udev ran its rules it would overwrite the security 
labels on /dev nodes set by libvirt and thus denying access to QEMU. See:


https://bugzilla.redhat.com/show_bug.cgi?id=1354251

Now, I think there is the same risk with what you are proposing. This 
isn't problem for DAC where permissions are checked during open(), but 
it may be a problem for SELinux where each individual operation with the 
FD is inspected.


Having said that, I am not against this approach, in fact I'm in favour 
of it. Let's hope that people learned that having udev overwriting 
seclabels is a bad idea and the bug won't appear again.


Michal

[PATCH] vfio-helpers: Free QEMUVFIOState in qemu_vfio_close()

2020-04-22 Thread Michal Privoznik

The qemu_vfio_open_pci() allocates this QEMUVFIOState structure
but free counterpart is missing. Since we already have
qemu_vfio_close() which does cleanup of the state, it looks like
a perfect place to free the structure too. However, to avoid
confusing rename the function to make it explicit that the passed
structure is also freed.

==167296== 528 (360 direct, 168 indirect) bytes in 1 blocks are definitely lost 
in loss record 8,504 of 8,908
==167296==at 0x4837B86: calloc (vg_replace_malloc.c:762)
==167296==by 0x4B8F6A0: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.7)
==167296==by 0xA7F532: qemu_vfio_open_pci (vfio-helpers.c:428)
==167296==by 0x989595: nvme_init (nvme.c:606)
==167296==by 0x989EB0: nvme_file_open (nvme.c:795)
==167296==by 0x8F9D04: bdrv_open_driver (block.c:1466)
==167296==by 0x8FA6E1: bdrv_open_common (block.c:1744)
==167296==by 0x8FDC73: bdrv_open_inherit (block.c:3291)
==167296==by 0x8FE1B5: bdrv_open (block.c:3384)
==167296==by 0x5EE828: bds_tree_init (blockdev.c:663)
==167296==by 0x5F57F8: qmp_blockdev_add (blockdev.c:3746)
==167296==by 0x5666DC: configure_blockdev (vl.c:1047)

Signed-off-by: Michal Privoznik 
---
 block/nvme.c| 2 +-
 include/qemu/vfio-helpers.h | 2 +-
 util/vfio-helpers.c | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 7b7c0cc5d6..7e00c4f1a7 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -766,7 +766,7 @@ static void nvme_close(BlockDriverState *bs)
false, NULL, NULL);
 event_notifier_cleanup(>irq_notifier);
 qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0, NVME_BAR_SIZE);
-qemu_vfio_close(s->vfio);
+qemu_vfio_close_and_free(s->vfio);
 
 g_free(s->device);
 }
diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index 1f057c2b9e..c96a0b1963 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -16,7 +16,7 @@
 typedef struct QEMUVFIOState QEMUVFIOState;
 
 QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp);
-void qemu_vfio_close(QEMUVFIOState *s);
+void qemu_vfio_close_and_free(QEMUVFIOState *s);
 int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
   bool temporary, uint64_t *iova_list);
 int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s);
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index ddd9a96e76..4c525d245b 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -706,7 +706,7 @@ static void qemu_vfio_reset(QEMUVFIOState *s)
 }
 
 /* Close and free the VFIO resources. */
-void qemu_vfio_close(QEMUVFIOState *s)
+void qemu_vfio_close_and_free(QEMUVFIOState *s)
 {
 int i;
 
@@ -721,4 +721,5 @@ void qemu_vfio_close(QEMUVFIOState *s)
 close(s->device);
 close(s->group);
 close(s->container);
+g_free(s);
 }
-- 
2.25.3

Re: [PATCH] Report stringified errno in VFIO related errors

2020-02-14 Thread Michal Privoznik


On 2/14/20 4:49 PM, Alex Williamson wrote:

On Fri, 14 Feb 2020 10:19:50 +0100
Michal Privoznik  wrote:



Do you guys want me to file a bug?


Probably a good idea.  Thanks,


Since I'm reproducing this against upstream, I've reported it here:

https://bugs.launchpad.net/qemu/+bug/186

Michal

[PATCH v2] Report stringified errno in VFIO related errors

2020-02-14 Thread Michal Privoznik

In a few places we report errno formatted as a negative integer.
This is not as user friendly as it can be. Use strerror() and/or
error_setg_errno() instead.

Signed-off-by: Michal Privoznik 
---

v1 posted here:

https://lists.nongnu.org/archive/html/qemu-devel/2020-02/msg03623.html

diff to v1:
 - Change error reporting in vfio_dma_unmap() too as I missed it in v1.

 hw/vfio/common.c| 4 ++--
 util/vfio-helpers.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5ca11488d6..0b3593b3c0 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -319,7 +319,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
 unmap.size -= 1ULL << ctz64(container->pgsizes);
 continue;
 }
-error_report("VFIO_UNMAP_DMA: %d", -errno);
+error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
 return -errno;
 }
 
@@ -352,7 +352,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
 return 0;
 }
 
-error_report("VFIO_MAP_DMA: %d", -errno);
+error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
 return -errno;
 }
 
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 813f7ec564..ddd9a96e76 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -545,7 +545,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void 
*host, size_t size,
 trace_qemu_vfio_do_mapping(s, host, size, iova);
 
 if (ioctl(s->container, VFIO_IOMMU_MAP_DMA, _map)) {
-error_report("VFIO_MAP_DMA: %d", -errno);
+error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
 return -errno;
 }
 return 0;
@@ -570,7 +570,7 @@ static void qemu_vfio_undo_mapping(QEMUVFIOState *s, 
IOVAMapping *mapping,
 assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
 assert(index >= 0 && index < s->nr_mappings);
 if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, )) {
-error_setg(errp, "VFIO_UNMAP_DMA failed: %d", -errno);
+error_setg_errno(errp, errno, "VFIO_UNMAP_DMA failed");
 }
 memmove(mapping, >mappings[index + 1],
 sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
@@ -669,7 +669,7 @@ int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s)
 trace_qemu_vfio_dma_reset_temporary(s);
 qemu_mutex_lock(>lock);
 if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, )) {
-error_report("VFIO_UNMAP_DMA: %d", -errno);
+error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
 qemu_mutex_unlock(>lock);
 return -errno;
 }
-- 
2.24.1

Re: [PATCH] Report stringified errno in VFIO related errors

2020-02-14 Thread Michal Privoznik


On 2/14/20 10:47 AM, Cornelia Huck wrote:

On Fri, 14 Feb 2020 09:47:39 +0100
Michal Privoznik  wrote:


In a few places we report errno formatted as a negative integer.
This is not as user friendly as it can be. Use strerror() and/or
error_setg_errno() instead.

Signed-off-by: Michal Privoznik 
---
  hw/vfio/common.c| 2 +-
  util/vfio-helpers.c | 6 +++---
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5ca11488d6..a3a2a82d99 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -352,7 +352,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
  return 0;
  }
  
-error_report("VFIO_MAP_DMA: %d", -errno);

+error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
  return -errno;
  }


I think you missed the one in vfio_dma_unmap().



Indeed, will send v2.

Michal

Re: [PATCH] Report stringified errno in VFIO related errors

2020-02-14 Thread Michal Privoznik


On 2/14/20 9:47 AM, Michal Privoznik wrote:

In a few places we report errno formatted as a negative integer.
This is not as user friendly as it can be. Use strerror() and/or
error_setg_errno() instead.

Signed-off-by: Michal Privoznik 
---
  hw/vfio/common.c| 2 +-
  util/vfio-helpers.c | 6 +++---
  2 files changed, 4 insertions(+), 4 deletions(-)



BTW the reason I've noticed these is because I'm seeing some errors when 
assigning my NVMe disk to qemu. This is the full command line:



/home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 \
-name guest=fedora,debug-threads=on \
-S \
-object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-fedora/master-key.aes 
\

-machine pc-i440fx-4.1,accel=kvm,usb=off,dump-guest-core=off \
-cpu host \
-m size=4194304k,slots=16,maxmem=1099511627776k \
-overcommit mem-lock=off \
-smp 4,sockets=1,dies=1,cores=2,threads=2 \
-object iothread,id=iothread1 \
-object iothread,id=iothread2 \
-object iothread,id=iothread3 \
-object iothread,id=iothread4 \
-mem-prealloc \
-mem-path /hugepages2M/libvirt/qemu/2-fedora \
-numa node,nodeid=0,cpus=0,mem=4096 \
-uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=31,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \
-boot menu=on,strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-blockdev 
'{"driver":"file","filename":"/var/lib/libvirt/images/fedora.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' 
\
-blockdev 
'{"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-2-storage","backing":null}' 
\
-device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-2-format,id=scsi0-0-0-0,bootindex=1 
\
-blockdev 
'{"driver":"nvme","device":":02:00.0","namespace":1,"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' 
\
-blockdev 
'{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' 
\
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-disk0 
\

-netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 \
-device 
virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:a4:6f:91,bus=pci.0,addr=0x3 
\

-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=35,server,nowait \
-device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 
\

-spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on \
-device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
-sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \

-msg timestamp=on

And these are the errors I see:

2020-02-14T09:06:18.183167Z qemu-system-x86_64: VFIO_MAP_DMA failed: 
Invalid argument
2020-02-14T09:10:49.753767Z qemu-system-x86_64: VFIO_MAP_DMA failed: 
Cannot allocate memory
2020-02-14T09:11:04.530344Z qemu-system-x86_64: VFIO_MAP_DMA failed: No 
space left on device
2020-02-14T09:11:04.531087Z qemu-system-x86_64: VFIO_MAP_DMA failed: No 
space left on device
2020-02-14T09:11:04.531230Z qemu-system-x86_64: VFIO_MAP_DMA failed: No 
space left on device



I'm doing nothing with the disk inside the guest, but:

  # dd if=/dev/vda of=/dev/null status=progress

(the disk appears as /dev/vda in the guest). Surprisingly, I do not see 
these errors when I use the traditional PCI assignment (-device 
vfio-pci). My versions of kernel and qemu:


moe ~ # uname -r
5.4.15-gentoo
moe ~ # /home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 
--version

QEMU emulator version 4.2.50 (v4.2.0-1439-g5d6542bea7-dirty)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

Do you guys want me to file a bug?

Michal

[PATCH] Report stringified errno in VFIO related errors

2020-02-14 Thread Michal Privoznik

In a few places we report errno formatted as a negative integer.
This is not as user friendly as it can be. Use strerror() and/or
error_setg_errno() instead.

Signed-off-by: Michal Privoznik 
---
 hw/vfio/common.c| 2 +-
 util/vfio-helpers.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5ca11488d6..a3a2a82d99 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -352,7 +352,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
 return 0;
 }
 
-error_report("VFIO_MAP_DMA: %d", -errno);
+error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
 return -errno;
 }
 
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 813f7ec564..ddd9a96e76 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -545,7 +545,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void 
*host, size_t size,
 trace_qemu_vfio_do_mapping(s, host, size, iova);
 
 if (ioctl(s->container, VFIO_IOMMU_MAP_DMA, _map)) {
-error_report("VFIO_MAP_DMA: %d", -errno);
+error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
 return -errno;
 }
 return 0;
@@ -570,7 +570,7 @@ static void qemu_vfio_undo_mapping(QEMUVFIOState *s, 
IOVAMapping *mapping,
 assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
 assert(index >= 0 && index < s->nr_mappings);
 if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, )) {
-error_setg(errp, "VFIO_UNMAP_DMA failed: %d", -errno);
+error_setg_errno(errp, errno, "VFIO_UNMAP_DMA failed");
 }
 memmove(mapping, >mappings[index + 1],
 sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
@@ -669,7 +669,7 @@ int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s)
 trace_qemu_vfio_dma_reset_temporary(s);
 qemu_mutex_lock(>lock);
 if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, )) {
-error_report("VFIO_UNMAP_DMA: %d", -errno);
+error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
 qemu_mutex_unlock(>lock);
 return -errno;
 }
-- 
2.24.1

Re: [libvirt] [PATCH v2 82/86] numa: forbid '-numa node, mem' for 5.0 and newer machine types

2020-01-16 Thread Michal Privoznik


On 1/16/20 1:37 PM, Igor Mammedov wrote:

On Thu, 16 Jan 2020 11:42:09 +0100
Michal Privoznik  wrote:


On 1/15/20 5:52 PM, Igor Mammedov wrote:

On Wed, 15 Jan 2020 16:34:53 +0100
Peter Krempa  wrote:
   

On Wed, Jan 15, 2020 at 16:07:37 +0100, Igor Mammedov wrote:

Deprecation period is ran out and it's a time to flip the switch
introduced by cd5ff8333a.
Disable legacy option for new machine types and amend documentation.

Signed-off-by: Igor Mammedov 
---
CC: peter.mayd...@linaro.org
CC: ehabk...@redhat.com
CC: marcel.apfelb...@gmail.com
CC: m...@redhat.com
CC: pbonz...@redhat.com
CC: r...@twiddle.net
CC: da...@gibson.dropbear.id.au
CC: libvir-l...@redhat.com
CC: qemu-...@nongnu.org
CC: qemu-...@nongnu.org
---
   hw/arm/virt.c|  2 +-
   hw/core/numa.c   |  6 ++
   hw/i386/pc.c |  1 -
   hw/i386/pc_piix.c|  1 +
   hw/i386/pc_q35.c |  1 +
   hw/ppc/spapr.c   |  2 +-
   qemu-deprecated.texi | 16 
   qemu-options.hx  |  8 
   8 files changed, 14 insertions(+), 23 deletions(-)


I'm afraid nobody bothered to fix it yet:

https://bugzilla.redhat.com/show_bug.cgi?id=1783355


It's time to start working on it :)
(looks like just deprecating stuff isn't sufficient motivation,
maybe actual switch flipping would work out better)
   


So how was the upgrade from older to newer version resolved? I mean, if
the old qemu used -numa node,mem=XXX and it is migrated to a host with
newer qemu, the cmd line can't be switched to -numa node,memdev=node0,
can it? I'm asking because I've just started working on this.


see commit cd5ff8333a3c87 for detailed info.
Short answer is it's not really resolved [*],
-numa node,mem will keep working on newer QEMU but only for old machine types
new machine types will accept only -numa node,memdev.

One can check if "mem=' is supported by using QAPI query-machines
and checking numa-mem-supported field. That field is flipped to false
for 5.0 and later machine types in this patch.


Alright, so what we can do is the following:

1) For new machine types (pc-5.0/q35-5.0 and newer) use memdev= always.

2) For older machine types, we are stuck with mem= until qemu is capable 
of migrating from mem= to memdev=


I think this is a safe thing to do since migrating from one version of a 
machine type to another is not supported (since it can change guest 
ABI). And we will see how much 2) bothers us. Does this sound reasonable?


Michal

Re: [libvirt] [PATCH v2 82/86] numa: forbid '-numa node, mem' for 5.0 and newer machine types

2020-01-16 Thread Michal Privoznik


On 1/15/20 5:52 PM, Igor Mammedov wrote:

On Wed, 15 Jan 2020 16:34:53 +0100
Peter Krempa  wrote:


On Wed, Jan 15, 2020 at 16:07:37 +0100, Igor Mammedov wrote:

Deprecation period is ran out and it's a time to flip the switch
introduced by cd5ff8333a.
Disable legacy option for new machine types and amend documentation.

Signed-off-by: Igor Mammedov 
---
CC: peter.mayd...@linaro.org
CC: ehabk...@redhat.com
CC: marcel.apfelb...@gmail.com
CC: m...@redhat.com
CC: pbonz...@redhat.com
CC: r...@twiddle.net
CC: da...@gibson.dropbear.id.au
CC: libvir-l...@redhat.com
CC: qemu-...@nongnu.org
CC: qemu-...@nongnu.org
---
  hw/arm/virt.c|  2 +-
  hw/core/numa.c   |  6 ++
  hw/i386/pc.c |  1 -
  hw/i386/pc_piix.c|  1 +
  hw/i386/pc_q35.c |  1 +
  hw/ppc/spapr.c   |  2 +-
  qemu-deprecated.texi | 16 
  qemu-options.hx  |  8 
  8 files changed, 14 insertions(+), 23 deletions(-)


I'm afraid nobody bothered to fix it yet:

https://bugzilla.redhat.com/show_bug.cgi?id=1783355


It's time to start working on it :)
(looks like just deprecating stuff isn't sufficient motivation,
maybe actual switch flipping would work out better)



So how was the upgrade from older to newer version resolved? I mean, if 
the old qemu used -numa node,mem=XXX and it is migrated to a host with 
newer qemu, the cmd line can't be switched to -numa node,memdev=node0, 
can it? I'm asking because I've just started working on this.


Michal

Re: discuss about pvpanic

2020-01-08 Thread Michal Privoznik


On 1/8/20 10:36 AM, Paolo Bonzini wrote:

On 08/01/20 09:25, zhenwei pi wrote:

Hey, Paolo

Currently, pvpapic only supports bit 0(PVPANIC_PANICKED).
We usually expect that guest writes ioport (typical 0x505) in 
panic_notifier_list callback
during handling panic, then we can handle pvpapic event PVPANIC_PANICKED in 
QEMU.

On the other hand, guest wants to handle the crash by kdump-tools, and reboots 
without any
panic_notifier_list callback. So QEMU only knows that guest has rebooted 
(because guest
write 0xcf9 ioport for RCR request), but QEMU can't identify why guest resets.

In production environment, we hit about 100+ guest reboot event everyday, sadly 
we
can't separate the abnormal reboot from normal operation.

We want to add a new bit for pvpanic event(maybe PVPANIC_CRASHLOADED) to 
represent the guest has crashed,
and the panic is handled by the guest kernel. (here is the previous patch 
https://lkml.org/lkml/2019/12/14/265)

What do you think about this solution? Or do you have any other suggestions?


Hi Zhenwei,

the kernel-side patch certainly makes sense.  I assume that you want the
event to propagate up from QEMU to Libvirt and so on?  The QEMU patch
would need to declare a new event (qapi/misc.json) and send it in
handle_event (hw/misc/pvpanic.c).  For Libvirt I'm not familiar, so I'm
adding the respective list.


Adding an event is fairly easy, if everything you want libvirt to do is 
report the event to upper layers. I volunteer to do it. Question is, how 
qemu is going to report this, whether some attributes to GUEST_PANICKED 
event or some new event. But more important is to merge the change into 
kernel.


Michal

Re: Making QEMU easier for management tools and applications

2020-01-07 Thread Michal Privoznik


On 1/7/20 10:36 AM, Kevin Wolf wrote:

The easy way out would be tying libvirt to a specific QEMU version. And
I'm only half joking.

If libvirt didn't exist yet and we needed a management library for QEMU,
what we would build now would probably not look much like libvirt looks
today. We wouldn't try to have basic support for every hypervisor out
there, but integrate it much closer with QEMU and avoid much of the
backwards compatibility requirements that the interface between QEMU and
libvirt has (which requires us to deal with compatibility twice for
everything).



By doing this, you would force your consumers to implement compatibility 
layer each on their own. Freshly finished blockdev is a beautiful 
example - OpenStack, oVirt and whatnot - they all are/can use blockdev 
without even noticing, because the API provided by libvirt is stable and 
provides abstraction, i.e. you don't need to change anything in your 
domain XML to use blockdev.


Of course, you can apply the argument one more time and have mgmt 
application tied to a specific version of qemu. But even that is not 
good enough, because with backports version is just meaningless number.



Maybe it would even be part of the QEMU repository, allowing a single
patch series to implement a new feature in the system emulator and
expose it in the API immediately instead of waiting for the next QEMU
release before libvirt can even think about implementing support for it.


Thing is, it's not just qmp that a mgmt application has to master, it's 
also process managing (and with growing number of helper binaries this 
is not as trivial as fork() + exec()). This would need to be the bare 
minimum your API layer has to provide to be consumable by anybody.
But then you have some advanced subsystems to take care of (CGroups, 
SELinux, etc.) which are used heavily by OpenStack. oVirt and friends.




So should libvirt move in that direction? Do people actually still make
much use of its multi-hypervisor nature, or would it make sense to split
it into separate libraries for each hypervisor that can be much tighter
integrated with (a specific version of) the respective hypervisor?


Truth to be told, I don't think libvirt is held back by its attempts to 
provide hypervisor agnostic APIs. Sure, it leads to some weirdness (e.g. 
different naming in libvirt and qemu), but as a libvirt developer I 
don't remember feeling blocked by this multi-hypervisor nature (not to 
mention that this has saved us couple of times).


Also, it would be not correct to think that a feature is implemented for 
all hypervisors in libvirt. I mean, when implementing a feature I 
usually implement it only for qemu driver and don't even look at other 
drivers (unless I'm doing a change in a core that causes build 
failures). On the other hand, I do sometimes review patches posted by 
developers from other companies which have interest in different 
hypervisors (e.g. there is a SUSE guy working on LXC driver, and another 
SUSE guy working on libxenlight (basically Xen)), so I do spend some 
time not working on qemu driver, but I'd say it's negligible.


Michal

[PATCH] x86: Check for machine state object class before typecasting it

2019-12-30 Thread Michal Privoznik

In v4.2.0-246-ged9e923c3c the SMM property was moved from PC
machine class to x86 machine class. Makes sense, but the change
was too aggressive: in target/i386/kvm.c:kvm_arch_init() it
altered check which sets SMRAM if given machine has SMM enabled.
The line that detects whether given machine object is class of
PC_MACHINE was removed from the check. This makes qemu try to
enable SMRAM for all machine types, which is not what we want.

Signed-off-by: Michal Privoznik 
---
 target/i386/kvm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 0b511906e3..7ee3202634 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -2173,6 +2173,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }
 
 if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
+object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE) &&
 x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
 smram_machine_done.notify = register_smram_listener;
 qemu_add_machine_init_done_notifier(_machine_done);
-- 
2.24.1

Re: [PATCH v3] qga: fence guest-set-time if hwclock not available

2019-12-05 Thread Michal Privoznik


On 12/5/19 2:12 PM, Cornelia Huck wrote:

On Thu, 5 Dec 2019 14:05:19 +0100
Philippe Mathieu-Daudé  wrote:


Hi Cornelia,

On 12/5/19 12:53 PM, Cornelia Huck wrote:

The Posix implementation of guest-set-time invokes hwclock to
set/retrieve the time to/from the hardware clock. If hwclock
is not available, the user is currently informed that "hwclock
failed to set hardware clock to system time", which is quite
misleading. This may happen e.g. on s390x, which has a different
timekeeping concept anyway.

Let's check for the availability of the hwclock command and
return QERR_UNSUPPORTED for guest-set-time if it is not available.

Reviewed-by: Laszlo Ersek 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Michael Roth 
Signed-off-by: Cornelia Huck 
---

v2->v3:
- added 'static' keyword to hwclock_path

Not sure what tree this is going through; if there's no better place,
I can also take this through the s390 tree.


s390 or trivial trees seems appropriate.



---
   qga/commands-posix.c | 13 -
   1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 1c1a165daed8..0be301a4ea77 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -156,6 +156,17 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, 
Error **errp)
   pid_t pid;
   Error *local_err = NULL;
   struct timeval tv;
+static const char hwclock_path[] = "/sbin/hwclock";
+static int hwclock_available = -1;
+
+if (hwclock_available < 0) {
+hwclock_available = (access(hwclock_path, X_OK) == 0);
+}
+
+if (!hwclock_available) {
+error_setg(errp, QERR_UNSUPPORTED);


In include/qapi/qmp/qerror.h we have:

/*
   * These macros will go away, please don't use in new code, and do not
   * add new ones!
   */


Sigh, it is really hard to keep track here :( I just copied from other
callers in this file...



Maybe we can replace it by "this feature is not supported on this
architecture"? (or without 'on this architecture').


This is not really architecture specific, you'd get this on any setup
that does not have /sbin/hwclock.

Q: Is libvirt doing anything with such an error message from QEMU? Do
we have the freedom to say e.g "guest-set-time is not supported" or so?
Or is it beneficial to print the same error message for any unsupported
feature?


No. Libvirt threats error messages as an opaque data. In a few cases we 
check for the class of the error and for instance issue a different 
command if class was CommandNotFound. You are free to change error 
messages as you wish.


Note that this is not true for HMP - there libvirt does some parsing to 
figure out the source of error. For instance:


https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_monitor_text.c;h=9054682d608b13347880f36cacd0e023151322e6;hb=HEAD#l36

Michal

Re: [PATCH 2/2] vfio-helpers: Free QEMUVFIOState in qemu_vfio_close()

2019-11-12 Thread Michal Privoznik


On 11/11/19 12:15 PM, Cornelia Huck wrote:

On Mon, 11 Nov 2019 11:37:42 +0100
Michal Privoznik  wrote:


The qemu_vfio_open_pci() allocates this QEMUVFIOState structure
but free counterpart is missing. Since we already have
qemu_vfio_close() which does cleanup of the state, it looks like
a perfect place to free the structure too.

==178278== 528 (360 direct, 168 indirect) bytes in 1 blocks are definitely lost 
in loss record 6,605 of 6,985
==178278==at 0x4A35476: calloc (vg_replace_malloc.c:752)
==178278==by 0x51B1158: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.6)
==178278==by 0xA68613: qemu_vfio_open_pci (vfio-helpers.c:428)
==178278==by 0x9779EA: nvme_init (nvme.c:606)
==178278==by 0x97830F: nvme_file_open (nvme.c:795)
==178278==by 0x8E9439: bdrv_open_driver (block.c:1293)
==178278==by 0x8E9E1C: bdrv_open_common (block.c:1553)
==178278==by 0x8ED264: bdrv_open_inherit (block.c:3083)
==178278==by 0x8ED79D: bdrv_open (block.c:3176)
==178278==by 0x5DA5C1: bds_tree_init (blockdev.c:670)
==178278==by 0x5E2B64: qmp_blockdev_add (blockdev.c:4354)
==178278==by 0x5ECB1D: configure_blockdev (vl.c:1202)

Signed-off-by: Michal Privoznik 
---
  util/vfio-helpers.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 813f7ec564..5ff91c1e5c 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -721,4 +721,5 @@ void qemu_vfio_close(QEMUVFIOState *s)
  close(s->device);
  close(s->group);
  close(s->container);
+g_free(s);


Not sure if freeing the parameter passed in via a function called
'close' isn't too surprising... it's not that obvious that the caller
is also relinquishing its reference to the QEMUVFIOState; maybe rename
the function to qemu_vfio_close_and_free() or so?


Alright, I'll rename the function. I worry that if free is left as an 
exercise to caller then it'll be always forgotten about. That's why I 
put the call into close function.


Michal

[PATCH 0/2] A pair of memory access problems

2019-11-11 Thread Michal Privoznik

The first patch fixes a crasher, the second fixes a memleak.

Michal Privoznik (2):
  hw/vfio/pci: Fix double free of migration_blocker
  vfio-helpers: Free QEMUVFIOState in qemu_vfio_close()

 hw/vfio/pci.c   | 2 ++
 util/vfio-helpers.c | 1 +
 2 files changed, 3 insertions(+)

-- 
2.23.0

[PATCH 2/2] vfio-helpers: Free QEMUVFIOState in qemu_vfio_close()

2019-11-11 Thread Michal Privoznik

The qemu_vfio_open_pci() allocates this QEMUVFIOState structure
but free counterpart is missing. Since we already have
qemu_vfio_close() which does cleanup of the state, it looks like
a perfect place to free the structure too.

==178278== 528 (360 direct, 168 indirect) bytes in 1 blocks are definitely lost 
in loss record 6,605 of 6,985
==178278==at 0x4A35476: calloc (vg_replace_malloc.c:752)
==178278==by 0x51B1158: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.6)
==178278==by 0xA68613: qemu_vfio_open_pci (vfio-helpers.c:428)
==178278==by 0x9779EA: nvme_init (nvme.c:606)
==178278==by 0x97830F: nvme_file_open (nvme.c:795)
==178278==by 0x8E9439: bdrv_open_driver (block.c:1293)
==178278==by 0x8E9E1C: bdrv_open_common (block.c:1553)
==178278==by 0x8ED264: bdrv_open_inherit (block.c:3083)
==178278==by 0x8ED79D: bdrv_open (block.c:3176)
==178278==by 0x5DA5C1: bds_tree_init (blockdev.c:670)
==178278==by 0x5E2B64: qmp_blockdev_add (blockdev.c:4354)
==178278==by 0x5ECB1D: configure_blockdev (vl.c:1202)

Signed-off-by: Michal Privoznik 
---
 util/vfio-helpers.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 813f7ec564..5ff91c1e5c 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -721,4 +721,5 @@ void qemu_vfio_close(QEMUVFIOState *s)
 close(s->device);
 close(s->group);
 close(s->container);
+g_free(s);
 }
-- 
2.23.0

[PATCH 1/2] hw/vfio/pci: Fix double free of migration_blocker

2019-11-11 Thread Michal Privoznik

When user tries to hotplug a VFIO device, but the operation fails
somewhere in the middle (in my testing it failed because of
RLIMIT_MEMLOCK forbidding more memory allocation), then a double
free occurs. In vfio_realize() the vdev->migration_blocker is
allocated, then something goes wrong which causes control to jump
onto 'error' label where the error is freed. But the pointer is
left pointing to invalid memory. Later, when
vfio_instance_finalize() is called, the memory is freed again.

In my testing the second hunk was sufficient to fix the bug, but
I figured the first hunk doesn't hurt either.

==169952== Invalid read of size 8
==169952==at 0xA47DCD: error_free (error.c:266)
==169952==by 0x4E0A18: vfio_instance_finalize (pci.c:3040)
==169952==by 0x8DF74C: object_deinit (object.c:606)
==169952==by 0x8DF7BE: object_finalize (object.c:620)
==169952==by 0x8E0757: object_unref (object.c:1074)
==169952==by 0x45079C: memory_region_unref (memory.c:1779)
==169952==by 0x45376B: do_address_space_destroy (memory.c:2793)
==169952==by 0xA5C600: call_rcu_thread (rcu.c:283)
==169952==by 0xA427CB: qemu_thread_start (qemu-thread-posix.c:519)
==169952==by 0x80A8457: start_thread (in /lib64/libpthread-2.29.so)
==169952==by 0x81C96EE: clone (in /lib64/libc-2.29.so)
==169952==  Address 0x143137e0 is 0 bytes inside a block of size 48 free'd
==169952==at 0x4A342BB: free (vg_replace_malloc.c:530)
==169952==by 0xA47E05: error_free (error.c:270)
==169952==by 0x4E0945: vfio_realize (pci.c:3025)
==169952==by 0x76A4FF: pci_qdev_realize (pci.c:2099)
==169952==by 0x689B9A: device_set_realized (qdev.c:876)
==169952==by 0x8E2C80: property_set_bool (object.c:2080)
==169952==by 0x8E0EF6: object_property_set (object.c:1272)
==169952==by 0x8E3FC8: object_property_set_qobject (qom-qobject.c:26)
==169952==by 0x8E11DB: object_property_set_bool (object.c:1338)
==169952==by 0x5E7BDD: qdev_device_add (qdev-monitor.c:673)
==169952==by 0x5E81E5: qmp_device_add (qdev-monitor.c:798)
==169952==by 0x9E18A8: do_qmp_dispatch (qmp-dispatch.c:132)
==169952==  Block was alloc'd at
==169952==at 0x4A35476: calloc (vg_replace_malloc.c:752)
==169952==by 0x51B1158: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.6)
==169952==by 0xA47357: error_setv (error.c:61)
==169952==by 0xA475D9: error_setg_internal (error.c:97)
==169952==by 0x4DF8C2: vfio_realize (pci.c:2737)
==169952==by 0x76A4FF: pci_qdev_realize (pci.c:2099)
==169952==by 0x689B9A: device_set_realized (qdev.c:876)
==169952==by 0x8E2C80: property_set_bool (object.c:2080)
==169952==by 0x8E0EF6: object_property_set (object.c:1272)
==169952==by 0x8E3FC8: object_property_set_qobject (qom-qobject.c:26)
==169952==by 0x8E11DB: object_property_set_bool (object.c:1338)
==169952==by 0x5E7BDD: qdev_device_add (qdev-monitor.c:673)

Signed-off-by: Michal Privoznik 
---
 hw/vfio/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e6569a7968..9c165995df 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2740,6 +2740,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 if (err) {
 error_propagate(errp, err);
 error_free(vdev->migration_blocker);
+vdev->migration_blocker = NULL;
 return;
 }
 }
@@ -3023,6 +3024,7 @@ error:
 if (vdev->migration_blocker) {
 migrate_del_blocker(vdev->migration_blocker);
 error_free(vdev->migration_blocker);
+vdev->migration_blocker = NULL;
 }
 }
 
-- 
2.23.0

Re: [Qemu-devel] [PATCH v2] qapi/qmp-dispatch: Fix error class for reporting disabled commands

2019-09-13 Thread Michal Privoznik


On 9/13/19 2:52 PM, Markus Armbruster wrote:

Michal Privoznik  writes:


If a command is disabled an error is reported. But due to usage
of error_setg() the class of the error is GenericError which does
not help callers in distinguishing this case from a case where a
qmp command fails regularly due to other reasons. Use
CommandNotFound error class which is much closer to the actual
root cause.

Signed-off-by: Michal Privoznik 
---


I'd like to tweak the commit message a bit:

   qmp-dispatch: Use CommandNotFound error for disabled commands

   If a command is disabled an error is reported.  But due to usage of
   error_setg() the class of the error is GenericError which does not
   help callers in distinguishing this case from a case where a qmp
   command fails regularly due to other reasons.

   We used to use class CommandDisabled until the great error
   simplification (commit de253f1491 for QMP and commit 93b91c59db for
   qemu-ga, both v1.2.0).

   Use CommandNotFound error class, which is close enough.

Objections?



None, thanks for taking care of this.

Michal

Re: [Qemu-devel] [PATCH] qapi: Reintroduce CommandDisabled error class

2019-08-30 Thread Michal Privoznik


On 8/30/19 1:52 PM, Markus Armbruster wrote:

Michal Privoznik  writes:


On 8/29/19 3:12 PM, Eric Blake wrote:

On 8/29/19 8:04 AM, Michal Privoznik wrote:


A bit of background: up until very recently libvirt used qemu-ga
in all or nothing way. It didn't care why a qemu-ga command
failed. But very recently a new API was introduced which
implements 'best effort' approach (in some cases) and thus
libvirt must differentiate between: {CommandNotFound,
CommandDisabled} and some generic error. While the former classes
mean the API can issue some other commands the latter raises a
red flag causing the API to fail.


Why do you need to distinguish CommandNotFound from CommandDisabled?


I don't. That's why I've put them both in curly braces. Perhaps this
says its better:

switch (klass) {
case CommandNotFound:
case CommandDisabled:
  /* okay */
  break;



So the obvious counter-question - why not use class CommandNotFound for
a command that was disabled, rather than readding another class that has
no distinctive purpose?




Because disabling a command is not the same as nonexistent
command. While a command can be disabled by user/sysadmin, they are
disabled at runtime by qemu-ga itself for a short period of time
(e.g. on FS freeze some commands are disabled - typically those which
require write disk access). And I guess reporting CommandNotFound for
a command that does exist only is disabled temporarily doesn't reflect
the reality, does it?

On the other hand, CommandNotFound would fix the issue for libvirt, so
if you don't want to invent a new error class, then that's the way to
go.


I'm fine with changing the error to CommandNotFound.

I'm reluctant to add back CommandDisabled.  I doubt it's necessary.

To arrive at an informed opinion, I had to figure out how this command
disablement stuff works.  I can just as well send it out, so here goes.

Let's review our command disable feature.

Commands are enabled on registration, see qmp_register_command().

To disable, call qmp_disable_command().  Only qga/main.c does, in two
places:

* ga_disable_non_whitelisted(): disable all commands except for
   ga_freeze_whitelist[], which is documented as /* commands that are
   safe to issue while filesystems are frozen */

* initialize_agent(): disable blacklisted commands.  I figure these are
   the ones blacklisted with -b, plus commands blacklisted due to build
   configuration.  The latter feels inappropriate; we should use QAPI
   schema conditionals to compile them out instead (QAPI conditionals
   didn't exist when the blacklisting code was written).

Disabled commands can be re-enabled with qmp_enable_command().  Only
qga/main.c does, in ga_enable_non_blacklisted().  I figure it re-enables
the commands ga_disable_non_whitelisted() disables.  Gets called when
guest-fsfreeze-freeze freezes nothing[1], and when guest-fsfreeze-thaw
succeeds[2].

Command dispatch fails when the command is disabled, in
do_qmp_dispatch().  The proposed patch changes the error reply.

QGA's guest-info shows whether a command is disabled
(GuestAgentCommandInfo member @enabled, set in qmp_command_info()).

QMP's query-commands skips disabled commands, in query_commands_cb().
Dead, as nothing ever disables QMP commands.  Skipping feels like a bad
idea anyway.

Analysis:

There are three kinds of disabled commands: compile-time (should be
compiled out instead), permanently blacklisted with -b, temporarily
disabled while filesystems are frozen.

There are two states: thawed (first two kinds disabled) and frozen (all
three kinds disabled).

Command guest-fsfreeze-freeze[3] goes to state frozen or else fails.

Command guest-fsfreeze-thaw goes to state thawed or else fails.

guest-fsfreeze-status reports the state.

Note that the transition to frozen (and thus the temporary command
disablement) is under the control of the QGA client.  There is no
TOCTTOU between guest-info telling you which commands are disabled and
executing the next command.  My point is: the client can figure out
whether a command is disabled before executing it.


Alright then, I'll respin with CommandNotFound. Both work for libvirt.

Michal

[Qemu-devel] [PATCH v2] qapi/qmp-dispatch: Fix error class for reporting disabled commands

2019-08-30 Thread Michal Privoznik

If a command is disabled an error is reported. But due to usage
of error_setg() the class of the error is GenericError which does
not help callers in distinguishing this case from a case where a
qmp command fails regularly due to other reasons. Use
CommandNotFound error class which is much closer to the actual
root cause.

Signed-off-by: Michal Privoznik 
---

This is a v2 of:

https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg06327.html

diff to v1:
- Don't introduce new error class (CommandDisabled)
- Use CommandNotFound error class

 qapi/qmp-dispatch.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 3037d353a4..bc264b3c9b 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -104,8 +104,9 @@ static QObject *do_qmp_dispatch(QmpCommandList *cmds, 
QObject *request,
 return NULL;
 }
 if (!cmd->enabled) {
-error_setg(errp, "The command %s has been disabled for this instance",
-   command);
+error_set(errp, ERROR_CLASS_COMMAND_NOT_FOUND,
+  "The command %s has been disabled for this instance",
+  command);
 return NULL;
 }
 if (oob && !(cmd->options & QCO_ALLOW_OOB)) {
-- 
2.21.0

Re: [Qemu-devel] [PATCH] qapi: Reintroduce CommandDisabled error class

2019-08-29 Thread Michal Privoznik


On 8/29/19 3:12 PM, Eric Blake wrote:

On 8/29/19 8:04 AM, Michal Privoznik wrote:


A bit of background: up until very recently libvirt used qemu-ga
in all or nothing way. It didn't care why a qemu-ga command
failed. But very recently a new API was introduced which
implements 'best effort' approach (in some cases) and thus
libvirt must differentiate between: {CommandNotFound,
CommandDisabled} and some generic error. While the former classes
mean the API can issue some other commands the latter raises a
red flag causing the API to fail.


Why do you need to distinguish CommandNotFound from CommandDisabled?


I don't. That's why I've put them both in curly braces. Perhaps this
says its better:

switch (klass) {
   case CommandNotFound:
   case CommandDisabled:
 /* okay */
 break;



So the obvious counter-question - why not use class CommandNotFound for
a command that was disabled, rather than readding another class that has
no distinctive purpose?




Because disabling a command is not the same as nonexistent command. 
While a command can be disabled by user/sysadmin, they are disabled at 
runtime by qemu-ga itself for a short period of time (e.g. on FS freeze 
some commands are disabled - typically those which require write disk 
access). And I guess reporting CommandNotFound for a command that does 
exist only is disabled temporarily doesn't reflect the reality, does it?


On the other hand, CommandNotFound would fix the issue for libvirt, so 
if you don't want to invent a new error class, then that's the way to go.


Michal

Re: [Qemu-devel] [PATCH] qapi: Reintroduce CommandDisabled error class

2019-08-29 Thread Michal Privoznik

On 8/29/19 2:10 PM, Markus Armbruster wrote:
> Michal Privoznik  writes:
> 
>> If there was a disabled command, then qemu-ga used to report
>> CommandDisabled error class (among with human readable
>> description). This changed in v1.2.0-rc0~28^2~16 in favor of
>> GenericError class.
> 
> Really?  I believe it was slightly earlier in the same series:
> 
> 93b91c59db qemu-ga: switch to the new error format on the wire
> de253f1491 qmp: switch to the new error format on the wire

Ah, you're right. It's the first commit that you reference.

> 
> The commit you mention (df1e608a01e) is merely follow-up simplification.
> 
>>  While the change might work for other
>> classes, this one should not have been dropped because it helps
>> callers distinguish the root cause of the error.
>>
>> A bit of background: up until very recently libvirt used qemu-ga
>> in all or nothing way. It didn't care why a qemu-ga command
>> failed. But very recently a new API was introduced which
>> implements 'best effort' approach (in some cases) and thus
>> libvirt must differentiate between: {CommandNotFound,
>> CommandDisabled} and some generic error. While the former classes
>> mean the API can issue some other commands the latter raises a
>> red flag causing the API to fail.
> 
> Why do you need to distinguish CommandNotFound from CommandDisabled?

I don't. That's why I've put them both in curly braces. Perhaps this 
says its better:

switch (klass) {
  case CommandNotFound:
  case CommandDisabled:
/* okay */
break;

  default:
/* bad, error out */
break;
}

> 
>> This reverts df1e608a01 partially.
>>
>> Signed-off-by: Michal Privoznik 

Michal

[Qemu-devel] [PATCH] qapi: Reintroduce CommandDisabled error class

2019-08-29 Thread Michal Privoznik

If there was a disabled command, then qemu-ga used to report
CommandDisabled error class (among with human readable
description). This changed in v1.2.0-rc0~28^2~16 in favor of
GenericError class. While the change might work for other
classes, this one should not have been dropped because it helps
callers distinguish the root cause of the error.

A bit of background: up until very recently libvirt used qemu-ga
in all or nothing way. It didn't care why a qemu-ga command
failed. But very recently a new API was introduced which
implements 'best effort' approach (in some cases) and thus
libvirt must differentiate between: {CommandNotFound,
CommandDisabled} and some generic error. While the former classes
mean the API can issue some other commands the latter raises a
red flag causing the API to fail.

This reverts df1e608a01 partially.

Signed-off-by: Michal Privoznik 
---
 include/qapi/error.h | 1 +
 qapi/error.json  | 4 +++-
 qapi/qmp-dispatch.c  | 5 +++--
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/qapi/error.h b/include/qapi/error.h
index 3f95141a01..7116b86a92 100644
--- a/include/qapi/error.h
+++ b/include/qapi/error.h
@@ -129,6 +129,7 @@
 typedef enum ErrorClass {
 ERROR_CLASS_GENERIC_ERROR = QAPI_ERROR_CLASS_GENERICERROR,
 ERROR_CLASS_COMMAND_NOT_FOUND = QAPI_ERROR_CLASS_COMMANDNOTFOUND,
+ERROR_CLASS_COMMAND_DISABLED = QAPI_ERROR_CLASS_COMMANDDISABLED,
 ERROR_CLASS_DEVICE_NOT_ACTIVE = QAPI_ERROR_CLASS_DEVICENOTACTIVE,
 ERROR_CLASS_DEVICE_NOT_FOUND = QAPI_ERROR_CLASS_DEVICENOTFOUND,
 ERROR_CLASS_KVM_MISSING_CAP = QAPI_ERROR_CLASS_KVMMISSINGCAP,
diff --git a/qapi/error.json b/qapi/error.json
index 3fad08f506..334d481399 100644
--- a/qapi/error.json
+++ b/qapi/error.json
@@ -14,6 +14,8 @@
 #
 # @CommandNotFound: the requested command has not been found
 #
+# @CommandDisabled: the requested command has been disabled
+#
 # @DeviceNotActive: a device has failed to be become active
 #
 # @DeviceNotFound: the requested device has not been found
@@ -25,5 +27,5 @@
 ##
 { 'enum': 'QapiErrorClass',
   # Keep this in sync with ErrorClass in error.h
-  'data': [ 'GenericError', 'CommandNotFound',
+  'data': [ 'GenericError', 'CommandNotFound', 'CommandDisabled',
 'DeviceNotActive', 'DeviceNotFound', 'KVMMissingCap' ] }
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 3037d353a4..913b3363cb 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -104,8 +104,9 @@ static QObject *do_qmp_dispatch(QmpCommandList *cmds, 
QObject *request,
 return NULL;
 }
 if (!cmd->enabled) {
-error_setg(errp, "The command %s has been disabled for this instance",
-   command);
+error_set(errp, ERROR_CLASS_COMMAND_DISABLED,
+  "The command %s has been disabled for this instance",
+  command);
 return NULL;
 }
 if (oob && !(cmd->options & QCO_ALLOW_OOB)) {
-- 
2.21.0

[Qemu-devel] [PATCH] Makefile: Drop $(DESTDIR) from generated FW paths

2019-08-05 Thread Michal Privoznik

The way that generating firmware descriptor files work is that
for every input file, every occurrence of @DATADIR@ within the
file is replaced with $(DESTDIR)$(qemu_datadir). This works as
long as DESTDIR is empty. But in some cases (e.g. on my Gentoo
box), compilation is done in one dir, then the installation is
done to another dir and then package manager copies over the
installed files. It's obvious that $(DESTDIR) must be ignored
otherwise the generated FW descriptor files will refer to old
installation directory and ignore --prefix given to ./configure.

Steps to reproduce:
1) qemu.git $ mkdir _build _install; cd _build && \
   ../configure --prefix=/usr && make && \
   make DESTDIR=../_install install

2) Observe wrong path:
   qemu.git/_build $ grep filename ../_install/usr/share/qemu/firmware/*

Signed-off-by: Michal Privoznik 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index cfab1561b9..85862fb81a 100644
--- a/Makefile
+++ b/Makefile
@@ -881,7 +881,7 @@ ifneq ($(DESCS),)
$(INSTALL_DIR) "$(DESTDIR)$(qemu_datadir)/firmware"
set -e; tmpf=$$(mktemp); trap 'rm -f -- "$$tmpf"' EXIT; \
for x in $(DESCS); do \
-   sed -e 's,@DATADIR@,$(DESTDIR)$(qemu_datadir),' \
+   sed -e 's,@DATADIR@,$(qemu_datadir),' \
"$(SRC_PATH)/pc-bios/descriptors/$$x" > "$$tmpf"; \
$(INSTALL_DATA) "$$tmpf" \
"$(DESTDIR)$(qemu_datadir)/firmware/$$x"; \
-- 
2.21.0

[Qemu-devel] [PATCH] .gitignore: ignore some vhost-user* related files

2019-07-12 Thread Michal Privoznik

Commit d52c454aadc creates
'/contrib/vhost-user-gpu/50-qemu-gpu.json' and '/vhost-user-gpu'
and commit 06914c97d3a creates '/vhost-user-input' neither of
which is ignored by git.

Signed-off-by: Michal Privoznik 
---
 .gitignore | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.gitignore b/.gitignore
index fd6e6c38c7..e9bbc006d3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -65,6 +65,8 @@
 /scsi/qemu-pr-helper
 /vhost-user-scsi
 /vhost-user-blk
+/vhost-user-gpu
+/vhost-user-input
 /fsdev/virtfs-proxy-helper
 *.tmp
 *.[1-9]
@@ -131,6 +133,7 @@
 /docs/interop/qemu-qmp-ref.info*
 /docs/interop/qemu-qmp-ref.txt
 /docs/version.texi
+/contrib/vhost-user-gpu/50-qemu-gpu.json
 *.tps
 .stgit-*
 .git-submodule-status
-- 
2.21.0

[Qemu-devel] [PATCH] nvme: Set number of queues later in nvme_init()

2019-07-10 Thread Michal Privoznik

When creating the admin queue in nvme_init() the variable that
holds the number of queues created is modified before actual
queue creation. This is a problem because if creating the queue
fails then the variable is left in inconsistent state. This was
actually observed when I tried to hotplug a nvme disk. The
control got to nvme_file_open() which called nvme_init() which
failed and thus nvme_close() was called which in turn called
nvme_free_queue_pair() with queue being NULL. This lead to an
instant crash:

  #0  0x55d9507ec211 in nvme_free_queue_pair (bs=0x55d952ddb880, q=0x0) at 
block/nvme.c:164
  #1  0x55d9507ee180 in nvme_close (bs=0x55d952ddb880) at block/nvme.c:729
  #2  0x55d9507ee3d5 in nvme_file_open (bs=0x55d952ddb880, 
options=0x55d952bb1410, flags=147456, errp=0x7ffd8e19e200) at block/nvme.c:781
  #3  0x55d9507629f3 in bdrv_open_driver (bs=0x55d952ddb880, 
drv=0x55d95109c1e0 , node_name=0x0, options=0x55d952bb1410, 
open_flags=147456, errp=0x7ffd8e19e310) at block.c:1291
  #4  0x55d9507633d6 in bdrv_open_common (bs=0x55d952ddb880, file=0x0, 
options=0x55d952bb1410, errp=0x7ffd8e19e310) at block.c:1551
  #5  0x55d950766881 in bdrv_open_inherit (filename=0x0, reference=0x0, 
options=0x55d952bb1410, flags=32768, parent=0x55d9538ce420, 
child_role=0x55d950eaade0 , errp=0x7ffd8e19e510) at block.c:3063
  #6  0x55d950765ae4 in bdrv_open_child_bs (filename=0x0, 
options=0x55d9541cdff0, bdref_key=0x55d950af33aa "file", parent=0x55d9538ce420, 
child_role=0x55d950eaade0 , allow_none=true, errp=0x7ffd8e19e510) 
at block.c:2712
  #7  0x55d950766633 in bdrv_open_inherit (filename=0x0, reference=0x0, 
options=0x55d9541cdff0, flags=0, parent=0x0, child_role=0x0, 
errp=0x7ffd8e19e908) at block.c:3011
  #8  0x55d950766dba in bdrv_open (filename=0x0, reference=0x0, 
options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block.c:3156
  #9  0x55d9507cb635 in blk_new_open (filename=0x0, reference=0x0, 
options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at 
block/block-backend.c:389
  #10 0x55d950465ec5 in blockdev_init (file=0x0, bs_opts=0x55d953d00390, 
errp=0x7ffd8e19e908) at blockdev.c:602

Signed-off-by: Michal Privoznik 
---
 block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index 73ed5fa75f..9896b7f7c6 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -613,12 +613,12 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 
 /* Set up admin queue. */
 s->queues = g_new(NVMeQueuePair *, 1);
-s->nr_queues = 1;
 s->queues[0] = nvme_create_queue_pair(bs, 0, NVME_QUEUE_SIZE, errp);
 if (!s->queues[0]) {
 ret = -EINVAL;
 goto out;
 }
+s->nr_queues = 1;
 QEMU_BUILD_BUG_ON(NVME_QUEUE_SIZE & 0xF000);
 s->regs->aqa = cpu_to_le32((NVME_QUEUE_SIZE << 16) | NVME_QUEUE_SIZE);
 s->regs->asq = cpu_to_le64(s->queues[0]->sq.iova);
-- 
2.21.0

Re: [Qemu-devel] [PATCH 0/2] vl: Fix -drive / -blockdev persistent reservation management

2019-06-04 Thread Michal Privoznik


On 6/4/19 5:12 PM, Markus Armbruster wrote:

This is a minimal regression fix.  There's more work to do nearby.
Left for another day.

Markus Armbruster (2):
   vl: Fix -drive / -blockdev persistent reservation management
   vl: Document why objects are delayed

  vl.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)



Reviewed-by: Michal Privoznik 

Thanks for fixing this.

Michal

Re: [Qemu-devel] [Qemu-block] [PATCH] pr-manager-helper: fix pr process been killed when reconectting

2019-05-30 Thread Michal Privoznik


On 5/29/19 11:34 AM, Paolo Bonzini wrote:

On 29/05/19 09:33, Michal Privoznik wrote:

On 5/28/19 7:45 PM, Paolo Bonzini wrote:

On 28/05/19 15:06, Jie Wang wrote:

if pr-helper been killed and qemu send disconnect event to libvirt
and libvirt started a new pr-helper process, the new pr-heleper
been killed again when qemu is connectting to the new pr-helper,
qemu will never send disconnect to libvirt, and libvirt will never
receive connected event.


I think this would let a guest "spam" events just by sending a lot PR
commands.  Also, as you said, in this case QEMU has never sent a
"connected" event, so I'm not sure why it should send a disconnection
event.


So pr manager is initialized on the first PR command and not when qemu
is starting?


It is initialized when QEMU is started, but if it dies, that's not
detected until the first PR command.  The command is retried for 5
seconds, which should give libvirt ample time to restart the PR manager
(and it's an exceptional situation anyway).


Ah yes, I recall vaguelly testing this whilst writing patches for libvirt.




If a user inside the guest could somehow kill pr-helper process in the
host then yes, they could spam libvirt/qemu. But if a user from inside a
guest can kill a process in the host that is much bigger problem than
spaming libvirt.


This is true.


Does libvirt monitor at all the pr-helper to check if it dies?  Or does
it rely exclusively on QEMU's events?


Libvirt relies solely on QEMU's events. Just like with qemu process
itself, libvirt can't rely on SIGCHILD because the daemon might be
restarted which would reparent all qemu and pr-helper processes
rendering libvirt wait for SIGCHILD useless.

But there is an exception to this: when libvirt is spawning pr-helper it
does so by following these steps:

1) Try to acquire (lock) pidfile
2) unlink(socket)
3) spawn pr-helper process (this yields child's PID)
4) wait some time until socket is created
5) some follow up work (move child's PID into same cgroup as qemu's main
thread, relabel the socket so that qemu can access it)

If any of these steps fails then child is killed. However, the PID is
not recorded anywhere and thus is forgotten once control jumps out of
the function.


Note that qemu-pr-helper supports the systemd socket activation
protocol.  Would it help if libvirt used it?


Thing is, libvirt creates a mount namespace for domains (one namespace 
for one domain). In this namespace a dummy /dev is mounted and only 
nodes that qemu is configured to have are created. For instance, you 
won't see /dev/sda there unless your domain has it as a disk. Then, 
libvirt moves pr-helper process into the same cgroups as the qemu's main 
thread. This is all done so that pr-helper has the same view of the 
system as qemu. I don't think that he same result can be achieved using 
socket activation.
Also, libvirt spawns one pr-helper per domain (so that the socket can be 
private and share seclabel with qemu process it's attached to).


Michal

Re: [Qemu-devel] [Qemu-block] [PATCH] pr-manager-helper: fix pr process been killed when reconectting

2019-05-29 Thread Michal Privoznik


On 5/28/19 7:45 PM, Paolo Bonzini wrote:

On 28/05/19 15:06, Jie Wang wrote:

if pr-helper been killed and qemu send disconnect event to libvirt
and libvirt started a new pr-helper process, the new pr-heleper
been killed again when qemu is connectting to the new pr-helper,
qemu will never send disconnect to libvirt, and libvirt will never
receive connected event.


I think this would let a guest "spam" events just by sending a lot PR
commands.  Also, as you said, in this case QEMU has never sent a
"connected" event, so I'm not sure why it should send a disconnection event.


So pr manager is initialized on the first PR command and not when qemu 
is starting?


If a user inside the guest could somehow kill pr-helper process in the 
host then yes, they could spam libvirt/qemu. But if a user from inside a 
guest can kill a process in the host that is much bigger problem than 
spaming libvirt.




Does libvirt monitor at all the pr-helper to check if it dies?  Or does
it rely exclusively on QEMU's events?


Libvirt relies solely on QEMU's events. Just like with qemu process 
itself, libvirt can't rely on SIGCHILD because the daemon might be 
restarted which would reparent all qemu and pr-helper processes 
rendering libvirt wait for SIGCHILD useless.


But there is an exception to this: when libvirt is spawning pr-helper it 
does so by following these steps:


1) Try to acquire (lock) pidfile
2) unlink(socket)
3) spawn pr-helper process (this yields child's PID)
4) wait some time until socket is created
5) some follow up work (move child's PID into same cgroup as qemu's main 
thread, relabel the socket so that qemu can access it)


If any of these steps fails then child is killed. However, the PID is 
not recorded anywhere and thus is forgotten once control jumps out of 
the function.


Michal

Re: [Qemu-devel] [PULL 22/27] vl: Create block backends before setting machine properties

2019-05-16 Thread Michal Privoznik


On 5/16/19 1:43 PM, Markus Armbruster wrote:



Actually, there is more problems with this. Trying to run a guest with
persistent reservations fails after this patch is applied (git bisect
points me to this commit). My command line is:

qemu.git $ ./x86_64-softmmu/qemu-system-x86_64 \
-monitor stdio \
-object pr-manager-helper,id=pr-helper0,path=/tmp/pr-helper0.sock
-drive
file=/dev/mapper/crypt,file.pr-manager=pr-helper0,format=raw,if=none,id=drive-scsi0-0-0-2
\
-device
scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-0-2,id=scsi0-0-0-2

Honestly, I have no idea how to fix it, so I'm just raising this issue
here. Do you want me to open a bug or something?


Let's skip the bug filing bureaucracy and go straight to debugging.


Agreed.



Actual and expected behavior of your reproducer, please :)



Actual is that qemu fails to parse cmd line:


qemu-system-x86_64: -drive 
file=/dev/mapper/crypt,file.pr-manager=pr-helper0,format=raw,if=none,id=drive-scsi0-0-0-2: 
No persistent reservation manager with id 'pr-helper0'



Which obviously is not correct, because pr-helper0 is specified.
Expected result is that qemu suceeds in parsing the cmd line and starts 
the guest. To test it you don't need /dev/mapper/* really, I mean, if 
qemu fails with a different error message (e.g. it can't open the disk 
or EPERM or whatever), it's a sign it got past the cmd line parsing 
successfuly.


Michal

1 2 3 4 >

1 - 100 of 334 matches

Mail list logo