Re: hppa-firmware.img missing build-id

2024-04-23 Thread Richard W.M. Jones
On Tue, Apr 23, 2024 at 10:11:50AM -0400, Cole Robinson wrote:
> Hi,
> 
> hppa-firmware.img and hppa-firmware64.img in qemu.git are missing ELF
> build-id annotations. rpm builds on Fedora will error if an ELF binary
> doesn't have build-id:
> 
> RPM build errors:
> Missing build-id in
> /tmp/rpmbuild/BUILDROOT/qemu-9.0.0-1.rc2.fc41.x86_64/usr/share/qemu/hppa-firmware.img
> Missing build-id in
> /tmp/rpmbuild/BUILDROOT/qemu-9.0.0-1.rc2.fc41.x86_64/usr/share/qemu/hppa-firmware64.img
> Generating build-id links failed
> 
> I didn't hit this with qemu 8.2.* builds FWIW

Discussed in chat, and I think the consensus is to rebuild these
downstream, since we have the cross-compilers available.  It requires
adding -Wl,--build-id=sha1 at link time.

FWIW this worked for me on Fedora:

# dnf install /usr/bin/hppa-linux-gnu-gcc
$ pushd roms/seabios-hppa
$ make parisc
$ popd

That didn't actually add the .note.gnu.build-id section though; you
have to apply this patch:

diff --git a/Makefile.parisc b/Makefile.parisc
index 36edc0c2..3e0c1812 100644
--- a/Makefile.parisc
+++ b/Makefile.parisc
@@ -168,7 +168,7 @@ $(OUT)hppa-firmware$(BIT_SUFFIX).img: $(OUT)autoconf.h $(OUT)head.o $(OUT)ccode3
	@echo "  Linking $@"
	$(Q)$(CPP) $(CPPFLAGS) -Isrc -D__ASSEMBLY__ -DBITS=$(BITS) src/parisc/pafirmware.lds.S -o $(OUT)pafirmware.lds
	$(Q)$(CC) $(CFLAGS32FLAT) -c src/version.c -o $(OUT)version.o
-	$(Q)$(LD) -N -T $(OUT)pafirmware.lds $(OUT)head.o $(OUT)version.o -X -o $@ -e startup --as-needed $(OUT)ccode32flat.o $(LIBGCC)
+	$(Q)$(LD) --build-id=sha1 -N -T $(OUT)pafirmware.lds $(OUT)head.o $(OUT)version.o -X -o $@ -e startup --as-needed $(OUT)ccode32flat.o $(LIBGCC)
 
  Kconfig rules
 
... and then:

$ objdump -sj .note.gnu.build-id ./out/hppa-firmware.img 

./out/hppa-firmware.img: file format elf32-big

Contents of section .note.gnu.build-id:
 f000 00000004 00000014 00000003 474e5500  ............GNU.
 f010 daabe2dc 4e95a4c2 bad0cc57 e7f63152  ....N......W..1R
 f020 46274585                             F'E.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit




Re: [PATCH for-9.1 6/9] block/nbd: Use URI parsing code from glib

2024-04-04 Thread Richard W.M. Jones
On Thu, Mar 28, 2024 at 04:40:10PM +, Richard W.M. Jones wrote:
> libnbd absolutely does *not* get this right, eg:
> 
>   $ nbdinfo NBD://localhost
>   nbdinfo: nbd_connect_uri: unknown NBD URI scheme: NBD: Invalid argument
> 
> so that's a bug too.

Proposed fix:
https://gitlab.com/nbdkit/libnbd/-/merge_requests/6

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




Re: [PATCH for-9.1 6/9] block/nbd: Use URI parsing code from glib

2024-03-28 Thread Richard W.M. Jones
On Thu, Mar 28, 2024 at 10:06:01AM -0500, Eric Blake wrote:
> Adjusting cc list to add upstream NBD and drop developers unrelated to
> this part of the qemu series...
> 
> On Thu, Mar 28, 2024 at 02:13:42PM +0000, Richard W.M. Jones wrote:
> > On Thu, Mar 28, 2024 at 03:06:03PM +0100, Thomas Huth wrote:
> > > Since version 2.66, glib has useful URI parsing functions, too.
> > > Use those instead of the QEMU-internal ones to be finally able
> > > to get rid of the latter. The g_uri_get_host() also takes care
> > > of removing the square brackets from IPv6 addresses, so we can
> > > drop that part of the QEMU code now, too.
> > > 
> 
> > >  
> > >  if (is_unix) {
> > >  /* nbd+unix:///export?socket=path */
> > > -if (uri->server || uri->port || strcmp(qp->p[0].name, "socket")) 
> > > {
> > > +const char *uri_socket = g_hash_table_lookup(qp, "socket");
> > > +if (uri_server || uri_port != -1 || !uri_socket) {
> > >  ret = -EINVAL;
> > >  goto out;
> > >  }
> 
> The spec for NBD URIs is at:
> 
> https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md
> 
> Should any of this spec mention case-insensitive concerns, such as
> whether 'NBD://' may be equivalent to 'nbd://', and whether
> 'nbd+unix:///?socket=x' is equivalent to 'nbd+unix:///?Socket=x'?
> Right now, I think that most implementations of NBD servers and
> clients happen to use case-sensitive parsing; but glib provides the
> option to do case-insensitive query parsing.

I haven't thought about this before, but do note that the NBD URI spec
defers to "IETF standards describing URIs" for all unanswered
questions.  RFC3986 does talk about this incidentally.  About the
scheme field it says (section 3.1):

   Although schemes are case-
   insensitive, the canonical form is lowercase and documents that
   specify schemes must do so with lowercase letters.  An implementation
   should accept uppercase letters as equivalent to lowercase in scheme
   names (e.g., allow "HTTP" as well as "http") for the sake of
   robustness but should only produce lowercase scheme names for
   consistency.

Hostname is also (obviously) case insensitive.  There's also a section
(6.2.3) which talks about normalization of URIs.

Overall it seems the intention of the RFC writer is that parsers
should handle any case; but when generating URIs (and for examples,
documentation etc) we should only generate lowercase.

libnbd absolutely does *not* get this right, eg:

  $ nbdinfo NBD://localhost
  nbdinfo: nbd_connect_uri: unknown NBD URI scheme: NBD: Invalid argument

so that's a bug too.

> If I read https://docs.gtk.org/glib/type_func.Uri.parse_params.html
> correctly, passing G_URI_PARAMS_CASE_INSENSITIVE (which you did not
> do) would mean that 'nbd+unix:///?socket=ignore=/for/real'
> would result in this g_hash_table_lookup finding only "Socket", not
> "socket".  Maybe it is worth an explicit addition to the NBD URI spec
> to mention that we intend to be case-sensitive (in the parts where it
> can be; I'm not sure if the schema part must be handled
> case-insensitively without re-reading the RFCs), and therefore that
> 'Socket=' does NOT have the same meaning as 'socket='.

We could mention this in the spec for clarity, but the current meaning
(as above) would be that case-insensitive parsing is recommended.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




Re: [PATCH for-9.1 6/9] block/nbd: Use URI parsing code from glib

2024-03-28 Thread Richard W.M. Jones
On Thu, Mar 28, 2024 at 03:06:03PM +0100, Thomas Huth wrote:
> Since version 2.66, glib has useful URI parsing functions, too.
> Use those instead of the QEMU-internal ones to be finally able
> to get rid of the latter. The g_uri_get_host() also takes care
> of removing the square brackets from IPv6 addresses, so we can
> drop that part of the QEMU code now, too.
> 
> Signed-off-by: Thomas Huth 
> ---
>  block/nbd.c | 66 ++---
>  1 file changed, 38 insertions(+), 28 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index ef05f7cdfd..95b507f872 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -31,7 +31,6 @@
>  #include "qemu/osdep.h"
>  
>  #include "trace.h"
> -#include "qemu/uri.h"
>  #include "qemu/option.h"
>  #include "qemu/cutils.h"
>  #include "qemu/main-loop.h"
> @@ -1514,30 +1513,34 @@ static void nbd_client_close(BlockDriverState *bs)
>  
>  static int nbd_parse_uri(const char *filename, QDict *options)
>  {
> -URI *uri;
> +GUri *uri;
>  const char *p;
> -QueryParams *qp = NULL;
> +GHashTable *qp = NULL;
> +int qp_n;
>  int ret = 0;
>  bool is_unix;
> +const char *uri_scheme, *uri_query, *uri_server;
> +int uri_port;
>  
> -uri = uri_parse(filename);
> +uri = g_uri_parse(filename, G_URI_FLAGS_NONE, NULL);
>  if (!uri) {
>  return -EINVAL;
>  }
>  
>  /* transport */
> -if (!g_strcmp0(uri->scheme, "nbd")) {
> +uri_scheme = g_uri_get_scheme(uri);
> +if (!g_strcmp0(uri_scheme, "nbd")) {
>  is_unix = false;
> -} else if (!g_strcmp0(uri->scheme, "nbd+tcp")) {
> +} else if (!g_strcmp0(uri_scheme, "nbd+tcp")) {
>  is_unix = false;
> -} else if (!g_strcmp0(uri->scheme, "nbd+unix")) {
> +} else if (!g_strcmp0(uri_scheme, "nbd+unix")) {
>  is_unix = true;
>  } else {
>  ret = -EINVAL;
>  goto out;
>  }
>  
> -p = uri->path ? uri->path : "";
> +p = g_uri_get_path(uri) ?: "";
>  if (p[0] == '/') {
>  p++;
>  }
> @@ -1545,51 +1548,58 @@ static int nbd_parse_uri(const char *filename, QDict 
> *options)
>  qdict_put_str(options, "export", p);
>  }
>  
> -qp = query_params_parse(uri->query);
> -if (qp->n > 1 || (is_unix && !qp->n) || (!is_unix && qp->n)) {
> -ret = -EINVAL;
> -goto out;
> +uri_query = g_uri_get_query(uri);
> +if (uri_query) {
> +qp = g_uri_parse_params(uri_query, -1, "&", G_URI_PARAMS_NONE, NULL);
> +if (!qp) {
> +ret = -EINVAL;
> +goto out;
> +}
> +qp_n = g_hash_table_size(qp);
> +if (qp_n > 1 || (is_unix && !qp_n) || (!is_unix && qp_n)) {
> +ret = -EINVAL;
> +goto out;
> +}
> + }
> +
> +uri_server = g_uri_get_host(uri);
> +if (uri_server && !uri_server[0]) {
> +uri_server = NULL;
>  }
> +uri_port = g_uri_get_port(uri);
>  
>  if (is_unix) {
>  /* nbd+unix:///export?socket=path */
> -if (uri->server || uri->port || strcmp(qp->p[0].name, "socket")) {
> +const char *uri_socket = g_hash_table_lookup(qp, "socket");
> +if (uri_server || uri_port != -1 || !uri_socket) {
>  ret = -EINVAL;
>  goto out;
>  }
>  qdict_put_str(options, "server.type", "unix");
> -qdict_put_str(options, "server.path", qp->p[0].value);
> +qdict_put_str(options, "server.path", uri_socket);
>  } else {
> -QString *host;
>  char *port_str;
>  
>  /* nbd[+tcp]://host[:port]/export */
> -if (!uri->server) {
> +if (!uri_server) {
>  ret = -EINVAL;
>  goto out;
>  }
>  
> -/* strip braces from literal IPv6 address */
> -if (uri->server[0] == '[') {
> -host = qstring_from_substr(uri->server, 1,
> -   strlen(uri->server) - 1);
> -} else {
> -host = qstring_from_str(uri->server);
> -}
> -
>  qdict_put_str(options, "server.type", "inet");
> -qdict_put(options, "server.host", host);
> +qdict_put_str(options, "server.host", uri_server);
>

Re: [PATCH for-9.1 8/9] block/ssh: Use URI parsing code from glib

2024-03-28 Thread Richard W.M. Jones
On Thu, Mar 28, 2024 at 03:06:05PM +0100, Thomas Huth wrote:
> Since version 2.66, glib has useful URI parsing functions, too.
> Use those instead of the QEMU-internal ones to be finally able
> to get rid of the latter.
> 
> Signed-off-by: Thomas Huth 
> ---
>  block/ssh.c | 69 +++--
>  1 file changed, 40 insertions(+), 29 deletions(-)
> 
> diff --git a/block/ssh.c b/block/ssh.c
> index 2748253d4a..c0caf59793 100644
> --- a/block/ssh.c
> +++ b/block/ssh.c
> @@ -37,7 +37,6 @@
>  #include "qemu/ctype.h"
>  #include "qemu/cutils.h"
>  #include "qemu/sockets.h"
> -#include "qemu/uri.h"
>  #include "qapi/qapi-visit-sockets.h"
>  #include "qapi/qapi-visit-block-core.h"
>  #include "qapi/qmp/qdict.h"
> @@ -181,64 +180,76 @@ static void sftp_error_trace(BDRVSSHState *s, const 
> char *op)
>  
>  static int parse_uri(const char *filename, QDict *options, Error **errp)
>  {
> -URI *uri = NULL;
> -QueryParams *qp;
> +GUri *uri;
> +const char *uri_host, *uri_path, *uri_user, *uri_query;
>  char *port_str;
> -int i;
> +int port;
> +g_autoptr(GError) gerror = NULL;
> +char *qp_name, *qp_value;
> +GUriParamsIter qp;
>  
> -uri = uri_parse(filename);
> +uri = g_uri_parse(filename, G_URI_FLAGS_NONE, NULL);
>  if (!uri) {
>  return -EINVAL;
>  }
>  
> -if (g_strcmp0(uri->scheme, "ssh") != 0) {
> +if (g_strcmp0(g_uri_get_scheme(uri), "ssh") != 0) {
>  error_setg(errp, "URI scheme must be 'ssh'");
>  goto err;
>  }
>  
> -if (!uri->server || strcmp(uri->server, "") == 0) {
> +uri_host = g_uri_get_host(uri);
> +if (!uri_host || g_str_equal(uri_host, "")) {
>  error_setg(errp, "missing hostname in URI");
>  goto err;
>  }
>  
> -if (!uri->path || strcmp(uri->path, "") == 0) {
> +uri_path = g_uri_get_path(uri);
> +if (!uri_path || g_str_equal(uri_path, "")) {
>  error_setg(errp, "missing remote path in URI");
>  goto err;
>  }
>  
> -qp = query_params_parse(uri->query);
> -if (!qp) {
> -error_setg(errp, "could not parse query parameters");
> -goto err;
> -}
> -
> -if(uri->user && strcmp(uri->user, "") != 0) {
> -qdict_put_str(options, "user", uri->user);
> +uri_user = g_uri_get_user(uri);
> +if (uri_user && !g_str_equal(uri_user, "")) {
> +qdict_put_str(options, "user", uri_user);
>  }
>  
> -qdict_put_str(options, "server.host", uri->server);
> +qdict_put_str(options, "server.host", uri_host);
>  
> -port_str = g_strdup_printf("%d", uri->port ?: 22);
> +port = g_uri_get_port(uri);
> +port_str = g_strdup_printf("%d", port != -1 ? port : 22);
>  qdict_put_str(options, "server.port", port_str);
>  g_free(port_str);
>  
> -qdict_put_str(options, "path", uri->path);
> -
> -/* Pick out any query parameters that we understand, and ignore
> - * the rest.
> - */
> -for (i = 0; i < qp->n; ++i) {
> -if (strcmp(qp->p[i].name, "host_key_check") == 0) {
> -qdict_put_str(options, "host_key_check", qp->p[i].value);
> +qdict_put_str(options, "path", uri_path);
> +
> +uri_query = g_uri_get_query(uri);
> +if (uri_query) {
> +g_uri_params_iter_init(&qp, uri_query, -1, "&", G_URI_PARAMS_NONE);
> +while (g_uri_params_iter_next(&qp, &qp_name, &qp_value, &gerror)) {
> +if (!qp_name || !qp_value || gerror) {
> +warn_report("Failed to parse SSH URI parameters '%s'.",
> +uri_query);
> +        break;
> +}
> +/*
> + * Pick out the query parameters that we understand, and ignore
> + * (or rather warn about) the rest.
> + */
> +if (g_str_equal(qp_name, "host_key_check")) {
> +qdict_put_str(options, "host_key_check", qp_value);
> +} else {
> +warn_report("Unsupported parameter '%s' in URI.", qp_name);
> +}
>  }
>  }
>  
> -query_params_free(qp);
> -uri_free(uri);
> +g_uri_unref(uri);
>  return 0;
>  
>   err:
> -uri_free(uri);
> +g_uri_unref(uri);
>  return -EINVAL;
>  }

Seems reasonable too,

Reviewed-by: Richard W.M. Jones 


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




[PATCH v2] block/blkio: Make s->mem_region_alignment be 64 bits

2024-01-30 Thread Richard W.M. Jones
With GCC 14 the code failed to compile on i686 (and was wrong for any
version of GCC):

../block/blkio.c: In function ‘blkio_file_open’:
../block/blkio.c:857:28: error: passing argument 3 of ‘blkio_get_uint64’ from incompatible pointer type [-Wincompatible-pointer-types]
  857 |                        &s->mem_region_alignment);
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~
      |                        |
      |                        size_t * {aka unsigned int *}
In file included from ../block/blkio.c:12:
/usr/include/blkio.h:49:67: note: expected ‘uint64_t *’ {aka ‘long long unsigned int *’} but argument is of type ‘size_t *’ {aka ‘unsigned int *’}
   49 | int blkio_get_uint64(struct blkio *b, const char *name, uint64_t *value);
      |                                                         ~~~~~~~~~^~~~~~

Signed-off-by: Richard W.M. Jones 
---
 block/blkio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blkio.c b/block/blkio.c
index 0a0a6c0f5fd..bc2f21784c7 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -68,7 +68,7 @@ typedef struct {
 CoQueue bounce_available;
 
 /* The value of the "mem-region-alignment" property */
-size_t mem_region_alignment;
+uint64_t mem_region_alignment;
 
 /* Can we skip adding/deleting blkio_mem_regions? */
 bool needs_mem_regions;
-- 
2.43.0




Re: [PATCH [repost]] block/blkio: Don't assume size_t is 64 bit

2024-01-30 Thread Richard W.M. Jones
On Tue, Jan 30, 2024 at 01:04:46PM +0100, Kevin Wolf wrote:
> Am 30.01.2024 um 11:30 hat Richard W.M. Jones geschrieben:
> > On Tue, Jan 30, 2024 at 09:51:59AM +0100, Kevin Wolf wrote:
> > > Am 29.01.2024 um 19:53 hat Richard W.M. Jones geschrieben:
> > > > With GCC 14 the code failed to compile on i686 (and was wrong for any
> > > > version of GCC):
> > > > 
> > > > ../block/blkio.c: In function ‘blkio_file_open’:
> > > > ../block/blkio.c:857:28: error: passing argument 3 of ‘blkio_get_uint64’ from incompatible pointer type [-Wincompatible-pointer-types]
> > > >   857 |                        &s->mem_region_alignment);
> > > >       |                        ^~~~~~~~~~~~~~~~~~~~~~~~
> > > >       |                        |
> > > >       |                        size_t * {aka unsigned int *}
> > > > In file included from ../block/blkio.c:12:
> > > > /usr/include/blkio.h:49:67: note: expected ‘uint64_t *’ {aka ‘long long 
> > > > unsigned int *’} but argument is of type ‘size_t *’ {aka ‘unsigned int 
> > > > *’}
> > > >49 | int blkio_get_uint64(struct blkio *b, const char *name, 
> > > > uint64_t *value);
> > > >   | 
> > > > ~~^
> > > > 
> > > > Signed-off-by: Richard W.M. Jones 
> > > 
> > > Why not simply make BDRVBlkioState.mem_region_alignment a uint64_t
> > > instead of keeping it size_t and doing an additional conversion with
> > > a check that requires an #if (probably to avoid a warning on 64 bit
> > > hosts because the condition is never true)?
> > 
> > The smaller change (attached) does work on i686, but this worries me a
> > little (although it doesn't give any error or warning):
> > 
> > if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > error_setg(errp, "unaligned buf %p with size %zu", host, size);
> > return BMRR_FAIL;
> > }
> 
> I don't see the problem? The calculation will now be done in 64 bits
> even on a 32 bit host, but that seems fine to me. Is there a trap I'm
> missing?

I guess not.  Stefan, any comments on whether we need to worry about
huge mem-region-alignment?  I'll post the updated patch as a new
message in a second.

Rich.

> Kevin
> 
> > From 500f3a81652dcefa79a4864c1f3fa6747c16952e Mon Sep 17 00:00:00 2001
> > From: "Richard W.M. Jones" 
> > Date: Mon, 29 Jan 2024 18:20:46 +
> > Subject: [PATCH] block/blkio: Make s->mem_region_alignment be 64 bits
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> > 
> > With GCC 14 the code failed to compile on i686 (and was wrong for any
> > version of GCC):
> > 
> > ../block/blkio.c: In function ‘blkio_file_open’:
> > ../block/blkio.c:857:28: error: passing argument 3 of ‘blkio_get_uint64’ from incompatible pointer type [-Wincompatible-pointer-types]
> >   857 |                        &s->mem_region_alignment);
> >       |                        ^~~~~~~~~~~~~~~~~~~~~~~~
> >       |                        |
> >       |                        size_t * {aka unsigned int *}
> > In file included from ../block/blkio.c:12:
> > /usr/include/blkio.h:49:67: note: expected ‘uint64_t *’ {aka ‘long long 
> > unsigned int *’} but argument is of type ‘size_t *’ {aka ‘unsigned int *’}
> >49 | int blkio_get_uint64(struct blkio *b, const char *name, uint64_t 
> > *value);
> >   | 
> > ~~^
> > 
> > Signed-off-by: Richard W.M. Jones 
> > ---
> >  block/blkio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/block/blkio.c b/block/blkio.c
> > index 0a0a6c0f5fd..bc2f21784c7 100644
> > --- a/block/blkio.c
> > +++ b/block/blkio.c
> > @@ -68,7 +68,7 @@ typedef struct {
> >  CoQueue bounce_available;
> >  
> >  /* The value of the "mem-region-alignment" property */
> > -size_t mem_region_alignment;
> > +uint64_t mem_region_alignment;
> >  
> >  /* Can we skip adding/deleting blkio_mem_regions? */
> >  bool needs_mem_regions;
> > -- 
> > 2.43.0
> > 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH [repost]] block/blkio: Don't assume size_t is 64 bit

2024-01-30 Thread Richard W.M. Jones
On Tue, Jan 30, 2024 at 09:51:59AM +0100, Kevin Wolf wrote:
> Am 29.01.2024 um 19:53 hat Richard W.M. Jones geschrieben:
> > With GCC 14 the code failed to compile on i686 (and was wrong for any
> > version of GCC):
> > 
> > ../block/blkio.c: In function ‘blkio_file_open’:
> > ../block/blkio.c:857:28: error: passing argument 3 of ‘blkio_get_uint64’ from incompatible pointer type [-Wincompatible-pointer-types]
> >   857 |                        &s->mem_region_alignment);
> >       |                        ^~~~~~~~~~~~~~~~~~~~~~~~
> >       |                        |
> >       |                        size_t * {aka unsigned int *}
> > In file included from ../block/blkio.c:12:
> > /usr/include/blkio.h:49:67: note: expected ‘uint64_t *’ {aka ‘long long 
> > unsigned int *’} but argument is of type ‘size_t *’ {aka ‘unsigned int *’}
> >49 | int blkio_get_uint64(struct blkio *b, const char *name, uint64_t 
> > *value);
> >   |             
> > ~~^
> > 
> > Signed-off-by: Richard W.M. Jones 
> 
> Why not simply make BDRVBlkioState.mem_region_alignment a uint64_t
> instead of keeping it size_t and doing an additional conversion with
> a check that requires an #if (probably to avoid a warning on 64 bit
> hosts because the condition is never true)?

The smaller change (attached) does work on i686, but this worries me a
little (although it doesn't give any error or warning):

if (((uintptr_t)host | size) % s->mem_region_alignment) {
error_setg(errp, "unaligned buf %p with size %zu", host, size);
return BMRR_FAIL;
}

Rich.

> Kevin
> 
> >  block/blkio.c | 12 +++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/blkio.c b/block/blkio.c
> > index 0a0a6c0f5fd..52d78935147 100644
> > --- a/block/blkio.c
> > +++ b/block/blkio.c
> > @@ -794,6 +794,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict 
> > *options, int flags,
> >  const char *blkio_driver = bs->drv->protocol_name;
> >  BDRVBlkioState *s = bs->opaque;
> >  int ret;
> > +uint64_t val;
> >  
> >  ret = blkio_create(blkio_driver, &s->blkio);
> >  if (ret < 0) {
> > @@ -854,7 +855,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict 
> > *options, int flags,
> >  
> >  ret = blkio_get_uint64(s->blkio,
> > "mem-region-alignment",
> > -   &s->mem_region_alignment);
> > +   &val);
> >  if (ret < 0) {
> >  error_setg_errno(errp, -ret,
> >   "failed to get mem-region-alignment: %s",
> > @@ -862,6 +863,15 @@ static int blkio_file_open(BlockDriverState *bs, QDict 
> > *options, int flags,
> >  blkio_destroy(&s->blkio);
> >  return ret;
> >  }
> > +#if HOST_LONG_BITS == 32
> > +if (val > SIZE_MAX) {
> > +error_setg_errno(errp, ERANGE,
> > + "mem-region-alignment too large for size_t");
> > +blkio_destroy(&s->blkio);
> > +return -ERANGE;
> > +}
> > +#endif
> > +s->mem_region_alignment = (size_t)val;
> >  
> >  ret = blkio_get_bool(s->blkio,
> >   "may-pin-mem-regions",
> > -- 
> > 2.43.0
> > 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org
From 500f3a81652dcefa79a4864c1f3fa6747c16952e Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" 
Date: Mon, 29 Jan 2024 18:20:46 +
Subject: [PATCH] block/blkio: Make s->mem_region_alignment be 64 bits
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

With GCC 14 the code failed to compile on i686 (and was wrong for any
version of GCC):

../block/blkio.c: In function ‘blkio_file_open’:
../block/blkio.c:857:28: error: passing argument 3 of ‘blkio_get_uint64’ from incompatible pointer type [-Wincompatible-pointer-types]
  857 |                        &s->mem_region_alignment);
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~
      |                        |
      |                        size_t * {aka unsigned int *}
In file included from ../block/blkio.c:12:
/usr/include/blkio.h:49:67: note: expected ‘uint64_t *’ {aka ‘long long 
unsig

[PATCH [repost]] block/blkio: Don't assume size_t is 64 bit

2024-01-29 Thread Richard W.M. Jones
With GCC 14 the code failed to compile on i686 (and was wrong for any
version of GCC):

../block/blkio.c: In function ‘blkio_file_open’:
../block/blkio.c:857:28: error: passing argument 3 of ‘blkio_get_uint64’ from incompatible pointer type [-Wincompatible-pointer-types]
  857 |                        &s->mem_region_alignment);
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~
      |                        |
      |                        size_t * {aka unsigned int *}
In file included from ../block/blkio.c:12:
/usr/include/blkio.h:49:67: note: expected ‘uint64_t *’ {aka ‘long long unsigned int *’} but argument is of type ‘size_t *’ {aka ‘unsigned int *’}
   49 | int blkio_get_uint64(struct blkio *b, const char *name, uint64_t *value);
      |                                                         ~~~~~~~~~^~~~~~

Signed-off-by: Richard W.M. Jones 
---
 block/blkio.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/block/blkio.c b/block/blkio.c
index 0a0a6c0f5fd..52d78935147 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -794,6 +794,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 const char *blkio_driver = bs->drv->protocol_name;
 BDRVBlkioState *s = bs->opaque;
 int ret;
+uint64_t val;
 
 ret = blkio_create(blkio_driver, &s->blkio);
 if (ret < 0) {
@@ -854,7 +855,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 ret = blkio_get_uint64(s->blkio,
"mem-region-alignment",
-   &s->mem_region_alignment);
+   &val);
 if (ret < 0) {
 error_setg_errno(errp, -ret,
  "failed to get mem-region-alignment: %s",
@@ -862,6 +863,15 @@ static int blkio_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 blkio_destroy(&s->blkio);
 return ret;
 }
+#if HOST_LONG_BITS == 32
+if (val > SIZE_MAX) {
+error_setg_errno(errp, ERANGE,
+ "mem-region-alignment too large for size_t");
+blkio_destroy(&s->blkio);
+return -ERANGE;
+}
+#endif
+s->mem_region_alignment = (size_t)val;
 
 ret = blkio_get_bool(s->blkio,
  "may-pin-mem-regions",
-- 
2.43.0




[PATCH [repost]] block/blkio: Don't assume size_t is 64 bit

2024-01-29 Thread Richard W.M. Jones
Repost of the same patch as a minute ago because I messed up a couple
of email addresses in the CC.

Rich.




Re: [PATCH] xen: fix condition for enabling the Xen accelerator

2023-12-09 Thread Richard W.M. Jones
On Sat, Dec 09, 2023 at 03:32:22PM +0100, Paolo Bonzini wrote:
> A misspelled condition in xen_native.h is hiding a bug in the enablement of
> Xen for qemu-system-aarch64.  The bug becomes apparent when building for
> Xen 4.18.
>
> While the i386 emulator provides the xenpv machine type for multiple
> architectures, and therefore can be compiled with Xen enabled even
> when the host is Arm, the opposite is not true: qemu-system-aarch64
> can only be compiled with Xen support enabled when the host is Arm.
>
> Expand the computation of accelerator_targets['CONFIG_XEN'] similar
> to what is already there for KVM, and fix xen_native.h.

Here's a Fedora scratch build with Xen 4.18.0 which includes this patch:

https://koji.fedoraproject.org/koji/taskinfo?taskID=110105806

Rich.

> Cc: Stefano Stabellini 
> Cc: Richard W.M. Jones 
> Cc: Daniel P. Berrangé 
> Reported-by: Michael Young 
> Supersedes: 
> <277e21fc78b75ec459efc7f5fde628a0222c63b0.1701989261.git.m.a.yo...@durham.ac.uk>
> Signed-off-by: Paolo Bonzini 
> ---
>  include/hw/xen/xen_native.h |  2 +-
>  meson.build | 17 ++---
>  2 files changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/include/hw/xen/xen_native.h b/include/hw/xen/xen_native.h
> index 6f09c48823b..1a5ad693a4d 100644
> --- a/include/hw/xen/xen_native.h
> +++ b/include/hw/xen/xen_native.h
> @@ -532,7 +532,7 @@ static inline int 
> xendevicemodel_set_irq_level(xendevicemodel_handle *dmod,
>  }
>  #endif
>  
> -#if CONFIG_XEN_CTRL_INTERFACE_VERSION <= 41700
> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 41700
>  #define GUEST_VIRTIO_MMIO_BASE   xen_mk_ullong(0x0200)
>  #define GUEST_VIRTIO_MMIO_SIZE   xen_mk_ullong(0x0010)
>  #define GUEST_VIRTIO_MMIO_SPI_FIRST   33
> diff --git a/meson.build b/meson.build
> index ec01f8b138a..67f4ede8aea 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -123,21 +123,24 @@ if get_option('kvm').allowed() and targetos == 'linux'
>kvm_targets_c = '"' + '" ,"'.join(kvm_targets) + '"'
>  endif
>  config_host_data.set('CONFIG_KVM_TARGETS', kvm_targets_c)
> -
>  accelerator_targets = { 'CONFIG_KVM': kvm_targets }
>  
> +if cpu in ['x86', 'x86_64']
> +  xen_targets = ['i386-softmmu', 'x86_64-softmmu']
> +elif cpu in ['arm', 'aarch64']
> +  # i386 emulator provides xenpv machine type for multiple architectures
> +  xen_targets = ['i386-softmmu', 'x86_64-softmmu', 'aarch64-softmmu']
> +else
> +  xen_targets = []
> +endif
> +accelerator_targets += { 'CONFIG_XEN': xen_targets }
> +
>  if cpu in ['aarch64']
>accelerator_targets += {
>  'CONFIG_HVF': ['aarch64-softmmu']
>}
>  endif
>  
> -if cpu in ['x86', 'x86_64', 'arm', 'aarch64']
> -  # i386 emulator provides xenpv machine type for multiple architectures
> -  accelerator_targets += {
> -'CONFIG_XEN': ['i386-softmmu', 'x86_64-softmmu', 'aarch64-softmmu'],
> -  }
> -endif
>  if cpu in ['x86', 'x86_64']
>accelerator_targets += {
>  'CONFIG_HVF': ['x86_64-softmmu'],
> -- 
> 2.43.0

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit




Re: [PATCH] fix qemu build with xen-4.18.0

2023-12-08 Thread Richard W.M. Jones
On Fri, Dec 08, 2023 at 08:47:07AM +, Richard W.M. Jones wrote:
> (Adding Xen maintainers)
> 
> On Thu, Dec 07, 2023 at 11:12:48PM +, Michael Young wrote:
> > Builds of qemu-8.2.0rc2 with xen-4.18.0 are currently failing
> > with errors like
> > ../hw/arm/xen_arm.c:74:5: error: ‘GUEST_VIRTIO_MMIO_SPI_LAST’ undeclared 
> > (first use in this function)
> >74 |(GUEST_VIRTIO_MMIO_SPI_LAST - GUEST_VIRTIO_MMIO_SPI_FIRST)
> >   | ^~
> > 
> > as there is an incorrect comparision in include/hw/xen/xen_native.h
> > which means that settings like GUEST_VIRTIO_MMIO_SPI_LAST
> > aren't being defined for xen-4.18.0
> > 
> > Signed-off-by: Michael Young 
> 
> Reviewed-by: Richard W.M. Jones 

Actually, see Dan Berrange's answer in this thread.

Rich.

> > ---
> >  include/hw/xen/xen_native.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/include/hw/xen/xen_native.h b/include/hw/xen/xen_native.h
> > index 6f09c48823..04b1ef4d34 100644
> > --- a/include/hw/xen/xen_native.h
> > +++ b/include/hw/xen/xen_native.h
> > @@ -532,7 +532,7 @@ static inline int 
> > xendevicemodel_set_irq_level(xendevicemodel_handle *dmod,
> >  }
> >  #endif
> >  
> > -#if CONFIG_XEN_CTRL_INTERFACE_VERSION <= 41700
> > +#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 41700
> >  #define GUEST_VIRTIO_MMIO_BASE   xen_mk_ullong(0x0200)
> >  #define GUEST_VIRTIO_MMIO_SIZE   xen_mk_ullong(0x0010)
> >  #define GUEST_VIRTIO_MMIO_SPI_FIRST   33
> > -- 
> > 2.43.0
> > 
> 
> -- 
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> libguestfs lets you edit virtual machines.  Supports shell scripting,
> bindings from many languages.  http://libguestfs.org
> 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




Re: [PATCH] fix qemu build with xen-4.18.0

2023-12-08 Thread Richard W.M. Jones
On Fri, Dec 08, 2023 at 08:47:07AM +, Richard W.M. Jones wrote:
> (Adding Xen maintainers)
> 
> On Thu, Dec 07, 2023 at 11:12:48PM +, Michael Young wrote:
> > Builds of qemu-8.2.0rc2 with xen-4.18.0 are currently failing
> > with errors like
> > ../hw/arm/xen_arm.c:74:5: error: ‘GUEST_VIRTIO_MMIO_SPI_LAST’ undeclared 
> > (first use in this function)
> >74 |(GUEST_VIRTIO_MMIO_SPI_LAST - GUEST_VIRTIO_MMIO_SPI_FIRST)
> >   | ^~
> > 
> > as there is an incorrect comparision in include/hw/xen/xen_native.h
> > which means that settings like GUEST_VIRTIO_MMIO_SPI_LAST
> > aren't being defined for xen-4.18.0
> > 
> > Signed-off-by: Michael Young 
> 
> Reviewed-by: Richard W.M. Jones 

I added this patch to Fedora, which has Xen 4.18 and where
builds were previously failing, and now it's working:

https://koji.fedoraproject.org/koji/taskinfo?taskID=110043878

So also adding:

Tested-by: Richard W.M. Jones 

Rich.

> > ---
> >  include/hw/xen/xen_native.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/include/hw/xen/xen_native.h b/include/hw/xen/xen_native.h
> > index 6f09c48823..04b1ef4d34 100644
> > --- a/include/hw/xen/xen_native.h
> > +++ b/include/hw/xen/xen_native.h
> > @@ -532,7 +532,7 @@ static inline int 
> > xendevicemodel_set_irq_level(xendevicemodel_handle *dmod,
> >  }
> >  #endif
> >  
> > -#if CONFIG_XEN_CTRL_INTERFACE_VERSION <= 41700
> > +#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 41700
> >  #define GUEST_VIRTIO_MMIO_BASE   xen_mk_ullong(0x0200)
> >  #define GUEST_VIRTIO_MMIO_SIZE   xen_mk_ullong(0x0010)
> >  #define GUEST_VIRTIO_MMIO_SPI_FIRST   33
> > -- 
> > 2.43.0
> > 
> 
> -- 
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> libguestfs lets you edit virtual machines.  Supports shell scripting,
> bindings from many languages.  http://libguestfs.org
> 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




Re: [PATCH] fix qemu build with xen-4.18.0

2023-12-08 Thread Richard W.M. Jones
(Adding Xen maintainers)

On Thu, Dec 07, 2023 at 11:12:48PM +, Michael Young wrote:
> Builds of qemu-8.2.0rc2 with xen-4.18.0 are currently failing
> with errors like
> ../hw/arm/xen_arm.c:74:5: error: ‘GUEST_VIRTIO_MMIO_SPI_LAST’ undeclared 
> (first use in this function)
>74 |(GUEST_VIRTIO_MMIO_SPI_LAST - GUEST_VIRTIO_MMIO_SPI_FIRST)
>   | ^~
> 
> as there is an incorrect comparison in include/hw/xen/xen_native.h
> which means that settings like GUEST_VIRTIO_MMIO_SPI_LAST
> aren't being defined for xen-4.18.0
> 
> Signed-off-by: Michael Young 

Reviewed-by: Richard W.M. Jones 

> ---
>  include/hw/xen/xen_native.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/hw/xen/xen_native.h b/include/hw/xen/xen_native.h
> index 6f09c48823..04b1ef4d34 100644
> --- a/include/hw/xen/xen_native.h
> +++ b/include/hw/xen/xen_native.h
> @@ -532,7 +532,7 @@ static inline int 
> xendevicemodel_set_irq_level(xendevicemodel_handle *dmod,
>  }
>  #endif
>  
> -#if CONFIG_XEN_CTRL_INTERFACE_VERSION <= 41700
> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 41700
>  #define GUEST_VIRTIO_MMIO_BASE   xen_mk_ullong(0x0200)
>  #define GUEST_VIRTIO_MMIO_SIZE   xen_mk_ullong(0x0010)
>  #define GUEST_VIRTIO_MMIO_SPI_FIRST   33
> -- 
> 2.43.0
> 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




Re: [PATCH 4/7] hw/scsi/virtio-scsi: Use VIRTIO_SCSI_COMMON() macro

2023-10-31 Thread Richard W.M. Jones
On Tue, Oct 31, 2023 at 05:42:37PM +0100, Kevin Wolf wrote:
> Am 31.10.2023 um 14:48 hat Richard W.M. Jones geschrieben:
> > On Tue, Oct 31, 2023 at 02:17:56PM +0100, Kevin Wolf wrote:
> > > Am 17.10.2023 um 16:01 hat Philippe Mathieu-Daudé geschrieben:
> > > > Access QOM parent with the proper QOM VIRTIO_SCSI_COMMON() macro.
> > > > 
> > > > Signed-off-by: Philippe Mathieu-Daudé 
> > > > ---
> > > >  hw/scsi/virtio-scsi.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> > > > index 45b95ea070..fa53f0902c 100644
> > > > --- a/hw/scsi/virtio-scsi.c
> > > > +++ b/hw/scsi/virtio-scsi.c
> > > > @@ -761,7 +761,7 @@ static void virtio_scsi_fail_cmd_req(VirtIOSCSIReq 
> > > > *req)
> > > >  
> > > >  static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, 
> > > > VirtIOSCSIReq *req)
> > > >  {
> > > > -VirtIOSCSICommon *vs = &s->parent_obj;
> > > > +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
> > > >  SCSIDevice *d;
> > > >  int rc;
> > > 
> > > Why is a dynamic cast more "proper" than a static type-safe access, even
> > > more so in a hot I/O path?
> > > 
> > > Rich Jones posted a flamegraph the other day that surprised me because
> > > object_class_dynamic_class_assert() and object_dynamic_cast_assert()
> > > were shown to be a big part of scsi_req_new(). In the overall
> > > performance, it's probably dwarfed by other issues, but unnecessary
> > > little things can add up, too.
> > 
> > I think Kevin is referring to one of these flamegraphs:
> > 
> >   http://oirase.annexia.org/tmp/2023-kvm-build-on-device.svg
> >   http://oirase.annexia.org/tmp/2023-kvm-build.svg
> > 
> > Here's a zoom showing scsi_req_new (hopefully this URL is stable ...):
> > 
> >   
> > http://oirase.annexia.org/tmp/2023-kvm-build-on-device.svg?s=scsi_req_new=512.9=501
> 
> Oh, this one (kvm-build-on-device) doesn't even show the object cast.
> I was looking at kvm-build, it seems, where both the class and the
> object cast are visible:
> 
> http://oirase.annexia.org/tmp/2023-kvm-build.svg?s=scsi_req_new=455.4=533

Not sure if this is the reason why, but the difference between these
two runs is that kvm-build is backed by a qcow2 file and
kvm-build-on-device is backed by a host block device.  I believe the
two qemu instances were otherwise identically configured.

More background in this Fedora thread:

https://lists.fedoraproject.org/archives/list/de...@lists.fedoraproject.org/thread/MSJAL7OE2TBO6U4ZWXKTKQLDSGRFK6YR/

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: [PATCH 4/7] hw/scsi/virtio-scsi: Use VIRTIO_SCSI_COMMON() macro

2023-10-31 Thread Richard W.M. Jones
On Tue, Oct 31, 2023 at 02:17:56PM +0100, Kevin Wolf wrote:
> Am 17.10.2023 um 16:01 hat Philippe Mathieu-Daudé geschrieben:
> > Access QOM parent with the proper QOM VIRTIO_SCSI_COMMON() macro.
> > 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> >  hw/scsi/virtio-scsi.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> > index 45b95ea070..fa53f0902c 100644
> > --- a/hw/scsi/virtio-scsi.c
> > +++ b/hw/scsi/virtio-scsi.c
> > @@ -761,7 +761,7 @@ static void virtio_scsi_fail_cmd_req(VirtIOSCSIReq *req)
> >  
> >  static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq 
> > *req)
> >  {
> > -VirtIOSCSICommon *vs = &s->parent_obj;
> > +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
> >  SCSIDevice *d;
> >  int rc;
> 
> Why is a dynamic cast more "proper" than a static type-safe access, even
> more so in a hot I/O path?
> 
> Rich Jones posted a flamegraph the other day that surprised me because
> object_class_dynamic_class_assert() and object_dynamic_cast_assert()
> were shown to be a big part of scsi_req_new(). In the overall
> performance, it's probably dwarfed by other issues, but unnecessary
> little things can add up, too.

I think Kevin is referring to one of these flamegraphs:

  http://oirase.annexia.org/tmp/2023-kvm-build-on-device.svg
  http://oirase.annexia.org/tmp/2023-kvm-build.svg

Here's a zoom showing scsi_req_new (hopefully this URL is stable ...):

  
http://oirase.annexia.org/tmp/2023-kvm-build-on-device.svg?s=scsi_req_new=512.9=501

Note that qemu has been compiled with QOM cast debug.  This is the
default for Fedora (not RHEL) because we'd like to get early detection
of bugs from Fedora users.

There was another patch recently where a simple change saved about 5%
of total runtime in RISC-V TCG guests (admittedly a much more hot path
than this one).

  https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg02388.html

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: tcg_flush_jmp_cache replacing qatomic_set loop with memset

2023-10-16 Thread Richard W.M. Jones
On Mon, Oct 16, 2023 at 04:43:36PM +0100, Richard W.M. Jones wrote:
> Hey Paolo,
> 
> Quick question.  I'm sure the transformation below is *not* correct,
> because it doesn't preserve the invariant of the lockless structure.
> Is there a way to do this while maintaining correctness?  For example
> putting barrier() after memset?  (Note I'm also zeroing .pc which may
> be a problem.)

Alright so ignore this question :-(

After inspecting the assembly on x86-64, I can see the qatomic_set
simply expands to a regular store (actually looks like it is
unrolled by 2):

  716340:   48 c7 00 00 00 00 00movq   $0x0,(%rax)
  716347:   48 c7 40 10 00 00 00movq   $0x0,0x10(%rax)
  71634e:   00 
  71634f:   48 83 c0 20 add$0x20,%rax
  716353:   48 39 d0cmp%rdx,%rax
  716356:   75 e8   jne716340 

My memset version was twice as fast because it used some AVX
instructions.

I guess this would do something more fancy on aarch64 host ...

Rich.

> The background to this is that I've been playing around with the very
> hot tb_lookup function.  Increasing the size of the jump cache (which
> hasn't changed since, erm, 2005!), looks like it could improve
> performance, plus a few other changes which I'm playing with.  However
> increasing the size causes profiles to be dominated by the loop in
> tcg_flush_jmp_cache, presumably because of all those serialized atomic ops.
> 
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 8cb6ad3511..6a21b3dba8 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -796,9 +796,7 @@ void tcg_flush_jmp_cache(CPUState *cpu)
>  return;
>  }
>  
> -for (int i = 0; i < TB_JMP_CACHE_SIZE; i++) {
> -qatomic_set(&jc->array[i].tb, NULL);
> -}
> +memset(jc->array, 0, TB_JMP_CACHE_SIZE * sizeof jc->array[0]);
>  }
>  
>  /* This is a wrapper for common code that can not use CONFIG_SOFTMMU */
> 
> Rich.
> 
> -- 
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> nbdkit - Flexible, fast NBD server with plugins
> https://gitlab.com/nbdkit/nbdkit

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit




tcg_flush_jmp_cache replacing qatomic_set loop with memset

2023-10-16 Thread Richard W.M. Jones
Hey Paolo,

Quick question.  I'm sure the transformation below is *not* correct,
because it doesn't preserve the invariant of the lockless structure.
Is there a way to do this while maintaining correctness?  For example
putting barrier() after memset?  (Note I'm also zeroing .pc which may
be a problem.)

The background to this is that I've been playing around with the very
hot tb_lookup function.  Increasing the size of the jump cache (which
hasn't changed since, erm, 2005!), looks like it could improve
performance, plus a few other changes which I'm playing with.  However
increasing the size causes profiles to be dominated by the loop in
tcg_flush_jmp_cache, presumably because of all those serialized atomic ops.

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 8cb6ad3511..6a21b3dba8 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -796,9 +796,7 @@ void tcg_flush_jmp_cache(CPUState *cpu)
 return;
 }
 
-for (int i = 0; i < TB_JMP_CACHE_SIZE; i++) {
-qatomic_set(&jc->array[i].tb, NULL);
-}
+memset(jc->array, 0, TB_JMP_CACHE_SIZE * sizeof jc->array[0]);
 }
 
 /* This is a wrapper for common code that can not use CONFIG_SOFTMMU */

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit




Re: [PATCH 2/6] target/riscv: Use env_archcpu() in [check_]nanbox()

2023-10-09 Thread Richard W.M. Jones
On Mon, Oct 09, 2023 at 01:02:35PM +0200, Philippe Mathieu-Daudé wrote:
> When CPUArchState* is available (here CPURISCVState*), we
> can use the fast env_archcpu() macro to get ArchCPU* (here
> RISCVCPU*). The QOM cast RISCV_CPU() macro will be slower
> when building with --enable-qom-cast-debug.
> 
> Inspired-by: Richard W.M. Jones 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/riscv/internals.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
> index b5f823c7ec..8239ae83cc 100644
> --- a/target/riscv/internals.h
> +++ b/target/riscv/internals.h
> @@ -87,7 +87,7 @@ enum {
>  static inline uint64_t nanbox_s(CPURISCVState *env, float32 f)
>  {
>  /* the value is sign-extended instead of NaN-boxing for zfinx */
> -if (RISCV_CPU(env_cpu(env))->cfg.ext_zfinx) {
> +if (env_archcpu(env)->cfg.ext_zfinx) {
>  return (int32_t)f;
>  } else {
>  return f | MAKE_64BIT_MASK(32, 32);
> @@ -97,7 +97,7 @@ static inline uint64_t nanbox_s(CPURISCVState *env, float32 
> f)
>  static inline float32 check_nanbox_s(CPURISCVState *env, uint64_t f)
>  {
>  /* Disable NaN-boxing check when enable zfinx */
> -if (RISCV_CPU(env_cpu(env))->cfg.ext_zfinx) {
> +if (env_archcpu(env)->cfg.ext_zfinx) {
>  return (uint32_t)f;
>  }
>  
> @@ -113,7 +113,7 @@ static inline float32 check_nanbox_s(CPURISCVState *env, 
> uint64_t f)
>  static inline uint64_t nanbox_h(CPURISCVState *env, float16 f)
>  {
>  /* the value is sign-extended instead of NaN-boxing for zfinx */
> -if (RISCV_CPU(env_cpu(env))->cfg.ext_zfinx) {
> +if (env_archcpu(env)->cfg.ext_zfinx) {
>  return (int16_t)f;
>  } else {
>  return f | MAKE_64BIT_MASK(16, 48);
> @@ -123,7 +123,7 @@ static inline uint64_t nanbox_h(CPURISCVState *env, 
> float16 f)
>  static inline float16 check_nanbox_h(CPURISCVState *env, uint64_t f)
>  {
>  /* Disable nanbox check when enable zfinx */
> -if (RISCV_CPU(env_cpu(env))->cfg.ext_zfinx) {
> +if (env_archcpu(env)->cfg.ext_zfinx) {
>  return (uint16_t)f;
>  }

Reviewed-by: Richard W.M. Jones 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH v2] target/riscv: Use a direct cast for better performance

2023-10-09 Thread Richard W.M. Jones
On Mon, Oct 09, 2023 at 08:36:28PM +0800, LIU Zhiwei wrote:
> 
> On 2023/10/9 5:50, Richard W.M. Jones wrote:
> >RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
> >this adds about 5% total overhead when emulating RV64 on x86-64 host.
> >
> >Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
> >disk.  The guest has a copy of the qemu source tree.  The test
> >involves compiling the qemu source tree with 'make clean; time make -j16'.
> >
> >Before making this change the compile step took 449 & 447 seconds over
> >two consecutive runs.
> >
> >After making this change, 428 & 422 seconds.
> >
> >The saving is about 5%.
> >
> >Thanks: Paolo Bonzini
> >Signed-off-by: Richard W.M. Jones 
> >Reviewed-by: Daniel Henrique Barboza 
> >---
> >  target/riscv/cpu_helper.c | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> >diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> >index 3a02079290..479d9863ae 100644
> >--- a/target/riscv/cpu_helper.c
> >+++ b/target/riscv/cpu_helper.c
> >@@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
> >uint64_t *cs_base, uint32_t *pflags)
> >  {
> >  CPUState *cs = env_cpu(env);
> >-RISCVCPU *cpu = RISCV_CPU(cs);
> >+/*
> >+ * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
> >+ * QOM cast debugging is enabled, so use a direct cast instead.
> >+ */
> >+RISCVCPU *cpu = (RISCVCPU *)cs;
> 
> This function is very hot. Maybe we should cache the tbflags instead
> of calculate it here. Otherwise,

This function is indeed very hot, taking over 20% of total host time
in my guest stress test.

How would we cache the flags?  AIUI they simply depend on machine
state and we must recalculate them either when the machine state
changes (sprinkle "update_tbflags" everywhere) or here.  If you have
any suggestions I can try things.

> Reviewed-by: LIU Zhiwei 

I posted a v3 based on Philippe's feedback.

Rich.

> Zhiwei
> 
> >  RISCVExtStatus fs, vs;
> >  uint32_t flags = 0;

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




[PATCH v3] target/riscv: Use env_archcpu for better performance

2023-10-09 Thread Richard W.M. Jones
RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
this adds about 5% total overhead when emulating RV64 on x86-64 host.

Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
disk.  The guest has a copy of the qemu source tree.  The test
involves compiling the qemu source tree with 'make clean; time make -j16'.

Before making this change the compile step took 449 & 447 seconds over
two consecutive runs.

After making this change: 428 & 421 seconds.

The saving is over 5%.

Thanks: Paolo Bonzini
Thanks: Philippe Mathieu-Daudé
Signed-off-by: Richard W.M. Jones 
---
 target/riscv/cpu_helper.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 3a02079290..8c28241c18 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -65,8 +65,7 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
 void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
   uint64_t *cs_base, uint32_t *pflags)
 {
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
 RISCVExtStatus fs, vs;
 uint32_t flags = 0;
 
-- 
2.41.0




[PATCH v3] target/riscv: Use env_archcpu for better performance

2023-10-09 Thread Richard W.M. Jones
In v3:

 - Use env_archcpu

 - Rerun the benchmark to get new "after" figures

Rich.





[PATCH v2] target/riscv: Use a direct cast for better performance

2023-10-08 Thread Richard W.M. Jones
RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
this adds about 5% total overhead when emulating RV64 on x86-64 host.

Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
disk.  The guest has a copy of the qemu source tree.  The test
involves compiling the qemu source tree with 'make clean; time make -j16'.

Before making this change the compile step took 449 & 447 seconds over
two consecutive runs.

After making this change, 428 & 422 seconds.

The saving is about 5%.

Thanks: Paolo Bonzini
Signed-off-by: Richard W.M. Jones 
Reviewed-by: Daniel Henrique Barboza 
---
 target/riscv/cpu_helper.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 3a02079290..479d9863ae 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
   uint64_t *cs_base, uint32_t *pflags)
 {
 CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+/*
+ * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
+ * QOM cast debugging is enabled, so use a direct cast instead.
+ */
+RISCVCPU *cpu = (RISCVCPU *)cs;
 RISCVExtStatus fs, vs;
 uint32_t flags = 0;
 
-- 
2.41.0




[PATCH v2] target/riscv: Use a direct cast for better performance

2023-10-08 Thread Richard W.M. Jones
v1 was here:
https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg02021.html

v2 is functionally exactly the same, except I changed
s/qemu cast/QOM cast/ in the comment, and added the Reviewed-by tag
received for the first version.

Rich.





Re: [PATCH] target/riscv: Use a direct cast for better performance

2023-10-07 Thread Richard W.M. Jones
If you're interested in how I found this problem, it was done using
'perf report -a -g' & flamegraphs.  This is the flamegraph of qemu (on
the host) when the guest is running the parallel compile:

  http://oirase.annexia.org/tmp/qemu-riscv.svg

If you click into 'CPU_0/TCG' at the bottom left (all the vCPUs
basically act alike), and then go to 'cpu_get_tb_cpu_state' you can
see the call to 'object_dynamic_cast_assert' taking considerable time.

If you zoom out, hit Ctrl F and type 'object_dynamic_cast_assert' into
the search box then the flamegraph will tell you this call takes about
6.6% of total time (not all, but most, attributable to the call from
'cpu_get_tb_cpu_state' -> 'object_dynamic_cast_assert').

There are several other issues in the flamegraph which I'm trying to
address, but this was the simplest one.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




[PATCH] target/riscv: Use a direct cast for better performance

2023-10-07 Thread Richard W.M. Jones
RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
this adds about 5% total overhead when emulating RV64 on x86-64 host.

Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
disk.  The guest has a copy of the qemu source tree.  The test
involves compiling the qemu source tree with 'make clean; time make -j16'.

Before making this change the compile step took 449 & 447 seconds over
two consecutive runs.

After making this change, 428 & 422 seconds.

The saving is about 5%.

Thanks: Paolo Bonzini
Signed-off-by: Richard W.M. Jones 
---
 target/riscv/cpu_helper.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 3a02079290..6174d99fb2 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
   uint64_t *cs_base, uint32_t *pflags)
 {
 CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+/*
+ * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
+ * qemu cast debugging is enabled, so use a direct cast instead.
+ */
+RISCVCPU *cpu = (RISCVCPU *)cs;
 RISCVExtStatus fs, vs;
 uint32_t flags = 0;
 
-- 
2.41.0




Re: [PATCH] qemu-img: Update documentation for compressed images

2023-09-01 Thread Richard W.M. Jones
On Fri, Sep 01, 2023 at 12:24:30PM +0200, Kevin Wolf wrote:
> Document the 'compression_type' option for qcow2, and mention that
> streamOptimized vmdk supports compression, too.
> 
> Reported-by: Richard W.M. Jones 
> Signed-off-by: Kevin Wolf 

Looks good, so:

Reviewed-by: Richard W.M. Jones 

> ---
>  docs/tools/qemu-img.rst | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> index 15aeddc6d8..ca5a2773cf 100644
> --- a/docs/tools/qemu-img.rst
> +++ b/docs/tools/qemu-img.rst
> @@ -106,7 +106,11 @@ by the used format or see the format descriptions below 
> for details.
>  
>  .. option:: -c
>  
> -  Indicates that target image must be compressed (qcow format only).
> +  Indicates that target image must be compressed (qcow/qcow2 and vmdk with
> +  streamOptimized subformat only).
> +
> +  For qcow2, the compression algorithm can be specified with the ``-o
> +  compression_type=...`` option (see below).
>  
>  .. option:: -h
>  
> @@ -776,7 +780,7 @@ Supported image file formats:
>  
>QEMU image format, the most versatile format. Use it to have smaller
>images (useful if your filesystem does not supports holes, for example
> -  on Windows), optional AES encryption, zlib based compression and
> +  on Windows), optional AES encryption, zlib or zstd based compression and
>support of multiple VM snapshots.
>  
>Supported options:
> @@ -794,6 +798,17 @@ Supported image file formats:
>``backing_fmt``
>  Image format of the base image
>  
> +  ``compression_type``
> +This option configures which compression algorithm will be used for
> +compressed clusters on the image. Note that setting this option doesn't 
> yet
> +cause the image to actually receive compressed writes. It is most 
> commonly
> +used with the ``-c`` option of ``qemu-img convert``, but can also be used
> +with the ``compress`` filter driver or backup block jobs with compression
> +enabled.
> +
> +Valid values are ``zlib`` and ``zstd``. For images that use
> +``compat=0.10``, only ``zlib`` compression is available.
> +
>``encryption``
>  If this option is set to ``on``, the image is encrypted with
>  128-bit AES-CBC.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




Re: qemu-img cache modes with Linux cgroup v1

2023-07-31 Thread Richard W.M. Jones
On Mon, Jul 31, 2023 at 11:40:36AM -0400, Stefan Hajnoczi wrote:
> 3. Using buffered I/O because O_DIRECT is not universally supported?
> 
> If you can't use O_DIRECT, then qemu-img could be extended to manage its
> dirty page cache set carefully. This consists of picking a budget and
> writing back to disk when the budget is exhausted. Richard Jones has
> shared links covering posix_fadvise(2) and sync_file_range(2):
> https://lkml.iu.edu/hypermail/linux/kernel/1005.2/01845.html
> https://lkml.iu.edu/hypermail/linux/kernel/1005.2/01953.html
> 
> We can discuss qemu-img code changes and performance analysis more if
> you decide to take that direction.

There's a bit more detail in these two commits:

  
https://gitlab.com/nbdkit/libnbd/-/commit/64d50d994dd7062d5cce21f26f0e8eba0e88c87e
  
https://gitlab.com/nbdkit/nbdkit/-/commit/a956e2e75d6c88eeefecd967505667c9f176e3af

In my experience this method is much better than using O_DIRECT,
it has much fewer sharp edges.

By the way, this is a super-useful tool for measuring how much of the
page cache is being used to cache a file:

  https://github.com/Feh/nocache

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: [PATCH v2 2/2] accel/tcg: Always lock pages before translation

2023-07-07 Thread Richard W.M. Jones


I'm not sure if you meant v3 there, or if this is v2 rebased on top of
the main branch, but I tested it again and it passed 5,000 boots.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




Re: [PATCH v2 2/2] accel/tcg: Always lock pages before translation

2023-07-07 Thread Richard W.M. Jones
On Fri, Jul 07, 2023 at 11:36:11AM +0100, Richard Henderson wrote:
> We had done this for user-mode by invoking page_protect
> within the translator loop.  Extend this to handle system
> mode as well.  Move page locking out of tb_link_page.
> 
> Reported-by: Liren Wei 
> Reported-by: Richard W.M. Jones 
> Signed-off-by: Richard Henderson 
> Tested-by: Richard W.M. Jones 

I tested another 5,000 iterations successfully, so this one looks good
as well.  I don't have an easy way to test qemu-user, so I only tested
qemu-system-x86_64.

Rich.

> ---
>  accel/tcg/internal.h  |  30 -
>  accel/tcg/cpu-exec.c  |  20 
>  accel/tcg/tb-maint.c  | 242 --
>  accel/tcg/translate-all.c |  43 ++-
>  accel/tcg/translator.c|  34 --
>  5 files changed, 236 insertions(+), 133 deletions(-)
> 
> diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
> index 650c3ac53f..e8cbbde581 100644
> --- a/accel/tcg/internal.h
> +++ b/accel/tcg/internal.h
> @@ -10,6 +10,7 @@
>  #define ACCEL_TCG_INTERNAL_H
>  
>  #include "exec/exec-all.h"
> +#include "exec/translate-all.h"
>  
>  /*
>   * Access to the various translations structures need to be serialised
> @@ -35,6 +36,32 @@ static inline void page_table_config_init(void) { }
>  void page_table_config_init(void);
>  #endif
>  
> +#ifdef CONFIG_USER_ONLY
> +/*
> + * For user-only, page_protect sets the page read-only.
> + * Since most execution is already on read-only pages, and we'd need to
> + * account for other TBs on the same page, defer undoing any page protection
> + * until we receive the write fault.
> + */
> +static inline void tb_lock_page0(tb_page_addr_t p0)
> +{
> +page_protect(p0);
> +}
> +
> +static inline void tb_lock_page1(tb_page_addr_t p0, tb_page_addr_t p1)
> +{
> +page_protect(p1);
> +}
> +
> +static inline void tb_unlock_page1(tb_page_addr_t p0, tb_page_addr_t p1) { }
> +static inline void tb_unlock_pages(TranslationBlock *tb) { }
> +#else
> +void tb_lock_page0(tb_page_addr_t);
> +void tb_lock_page1(tb_page_addr_t, tb_page_addr_t);
> +void tb_unlock_page1(tb_page_addr_t, tb_page_addr_t);
> +void tb_unlock_pages(TranslationBlock *);
> +#endif
> +
>  #ifdef CONFIG_SOFTMMU
>  void tb_invalidate_phys_range_fast(ram_addr_t ram_addr,
> unsigned size,
> @@ -48,8 +75,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu, vaddr pc,
>  void page_init(void);
>  void tb_htable_init(void);
>  void tb_reset_jump(TranslationBlock *tb, int n);
> -TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
> -   tb_page_addr_t phys_page2);
> +TranslationBlock *tb_link_page(TranslationBlock *tb);
>  bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
>  void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
> uintptr_t host_pc);
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 31aa320513..fdd6d3e0e4 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -536,6 +536,26 @@ static void cpu_exec_longjmp_cleanup(CPUState *cpu)
>  if (have_mmap_lock()) {
>  mmap_unlock();
>  }
> +#else
> +/*
> + * For softmmu, a tlb_fill fault during translation will land here,
> + * and we need to release any page locks held.  In system mode we
> + * have one tcg_ctx per thread, so we know it was this cpu doing
> + * the translation.
> + *
> + * Alternative 1: Install a cleanup to be called via an exception
> + * handling safe longjmp.  It seems plausible that all our hosts
> + * support such a thing.  We'd have to properly register unwind info
> + * for the JIT for EH, rather that just for GDB.
> + *
> + * Alternative 2: Set and restore cpu->jmp_env in tb_gen_code to
> + * capture the cpu_loop_exit longjmp, perform the cleanup, and
> + * jump again to arrive here.
> + */
> +if (tcg_ctx->gen_tb) {
> +tb_unlock_pages(tcg_ctx->gen_tb);
> +tcg_ctx->gen_tb = NULL;
> +}
>  #endif
>  if (qemu_mutex_iothread_locked()) {
>  qemu_mutex_unlock_iothread();
> diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
> index 9566224d18..c406b2f7b7 100644
> --- a/accel/tcg/tb-maint.c
> +++ b/accel/tcg/tb-maint.c
> @@ -70,17 +70,7 @@ typedef struct PageDesc PageDesc;
>   */
>  #define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock())
>  
> -static inline void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
> -  PageDesc **ret_p2, tb_page_addr_

Re: [PATCH 2/2] accel/tcg: Always lock pages before translation

2023-07-06 Thread Richard W.M. Jones
On Thu, Jul 06, 2023 at 06:05:37PM +0100, Richard Henderson wrote:
> We had done this for user-mode by invoking page_protect
> within the translator loop.  Extend this to handle system
> mode as well.  Move page locking out of tb_link_page.
> 
> Reported-by: Liren Wei 
> Reported-by: Richard W.M. Jones 
> Signed-off-by: Richard Henderson 

Tested-by: Richard W.M. Jones 

I tested it across two machines, total of 10,000 iterations successfully.
Great fix, thanks.

Rich.

>  accel/tcg/internal.h  |  30 -
>  accel/tcg/cpu-exec.c  |   4 +
>  accel/tcg/tb-maint.c  | 242 --
>  accel/tcg/translate-all.c |  43 ++-
>  accel/tcg/translator.c|  34 --
>  5 files changed, 220 insertions(+), 133 deletions(-)
> 
> diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
> index 650c3ac53f..e8cbbde581 100644
> --- a/accel/tcg/internal.h
> +++ b/accel/tcg/internal.h
> @@ -10,6 +10,7 @@
>  #define ACCEL_TCG_INTERNAL_H
>  
>  #include "exec/exec-all.h"
> +#include "exec/translate-all.h"
>  
>  /*
>   * Access to the various translations structures need to be serialised
> @@ -35,6 +36,32 @@ static inline void page_table_config_init(void) { }
>  void page_table_config_init(void);
>  #endif
>  
> +#ifdef CONFIG_USER_ONLY
> +/*
> + * For user-only, page_protect sets the page read-only.
> + * Since most execution is already on read-only pages, and we'd need to
> + * account for other TBs on the same page, defer undoing any page protection
> + * until we receive the write fault.
> + */
> +static inline void tb_lock_page0(tb_page_addr_t p0)
> +{
> +page_protect(p0);
> +}
> +
> +static inline void tb_lock_page1(tb_page_addr_t p0, tb_page_addr_t p1)
> +{
> +page_protect(p1);
> +}
> +
> +static inline void tb_unlock_page1(tb_page_addr_t p0, tb_page_addr_t p1) { }
> +static inline void tb_unlock_pages(TranslationBlock *tb) { }
> +#else
> +void tb_lock_page0(tb_page_addr_t);
> +void tb_lock_page1(tb_page_addr_t, tb_page_addr_t);
> +void tb_unlock_page1(tb_page_addr_t, tb_page_addr_t);
> +void tb_unlock_pages(TranslationBlock *);
> +#endif
> +
>  #ifdef CONFIG_SOFTMMU
>  void tb_invalidate_phys_range_fast(ram_addr_t ram_addr,
> unsigned size,
> @@ -48,8 +75,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu, vaddr pc,
>  void page_init(void);
>  void tb_htable_init(void);
>  void tb_reset_jump(TranslationBlock *tb, int n);
> -TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
> -   tb_page_addr_t phys_page2);
> +TranslationBlock *tb_link_page(TranslationBlock *tb);
>  bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
>  void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
> uintptr_t host_pc);
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 31aa320513..4e9dc0b3ba 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -531,6 +531,10 @@ static void cpu_exec_longjmp_cleanup(CPUState *cpu)
>  /* Non-buggy compilers preserve this; assert the correct value. */
>  g_assert(cpu == current_cpu);
>  
> +if (tcg_ctx->gen_tb) {
> +tb_unlock_pages(tcg_ctx->gen_tb);
> +tcg_ctx->gen_tb = NULL;
> +}
>  #ifdef CONFIG_USER_ONLY
>  clear_helper_retaddr();
>  if (have_mmap_lock()) {
> diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
> index 9566224d18..c406b2f7b7 100644
> --- a/accel/tcg/tb-maint.c
> +++ b/accel/tcg/tb-maint.c
> @@ -70,17 +70,7 @@ typedef struct PageDesc PageDesc;
>   */
>  #define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock())
>  
> -static inline void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
> -  PageDesc **ret_p2, tb_page_addr_t phys2,
> -  bool alloc)
> -{
> -*ret_p1 = NULL;
> -*ret_p2 = NULL;
> -}
> -
> -static inline void page_unlock(PageDesc *pd) { }
> -static inline void page_lock_tb(const TranslationBlock *tb) { }
> -static inline void page_unlock_tb(const TranslationBlock *tb) { }
> +static inline void tb_lock_pages(const TranslationBlock *tb) { }
>  
>  /*
>   * For user-only, since we are protecting all of memory with a single lock,
> @@ -96,7 +86,7 @@ static void tb_remove_all(void)
>  }
>  
>  /* Call with mmap_lock held. */
> -static void tb_record(TranslationBlock *tb, PageDesc *p1, PageDesc *p2)
> +static void tb_record(TranslationBlock *tb)
>  {
>  vaddr addr;
>  int flags;
> @@ -391,12 +381,108 @@ static void page_lock(PageDesc *pd)
>  

Re: [PATCH 1/2] accel/tcg: Split out cpu_exec_longjmp_cleanup

2023-07-06 Thread Richard W.M. Jones
On Thu, Jul 06, 2023 at 06:05:36PM +0100, Richard Henderson wrote:
> Share the setjmp cleanup between cpu_exec_step_atomic
> and cpu_exec_setjmp.
> 
> Signed-off-by: Richard Henderson 

Reviewed-by: Richard W.M. Jones 

(I'm still testing the other one, but already up to 600 iterations)

Rich.

> ---
>  accel/tcg/cpu-exec.c | 43 +++
>  1 file changed, 19 insertions(+), 24 deletions(-)
> 
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index ba1890a373..31aa320513 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -526,6 +526,23 @@ static void cpu_exec_exit(CPUState *cpu)
>  }
>  }
>  
> +static void cpu_exec_longjmp_cleanup(CPUState *cpu)
> +{
> +/* Non-buggy compilers preserve this; assert the correct value. */
> +g_assert(cpu == current_cpu);
> +
> +#ifdef CONFIG_USER_ONLY
> +clear_helper_retaddr();
> +if (have_mmap_lock()) {
> +mmap_unlock();
> +}
> +#endif
> +if (qemu_mutex_iothread_locked()) {
> +qemu_mutex_unlock_iothread();
> +}
> +assert_no_pages_locked();
> +}
> +
>  void cpu_exec_step_atomic(CPUState *cpu)
>  {
>  CPUArchState *env = cpu->env_ptr;
> @@ -568,16 +585,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
>  cpu_tb_exec(cpu, tb, &tb_exit);
>  cpu_exec_exit(cpu);
>  } else {
> -#ifdef CONFIG_USER_ONLY
> -clear_helper_retaddr();
> -if (have_mmap_lock()) {
> -mmap_unlock();
> -}
> -#endif
> -if (qemu_mutex_iothread_locked()) {
> -qemu_mutex_unlock_iothread();
> -}
> -assert_no_pages_locked();
> +cpu_exec_longjmp_cleanup(cpu);
>  }
>  
>  /*
> @@ -1023,20 +1031,7 @@ static int cpu_exec_setjmp(CPUState *cpu, SyncClocks *sc)
>  {
>  /* Prepare setjmp context for exception handling. */
>  if (unlikely(sigsetjmp(cpu->jmp_env, 0) != 0)) {
> -/* Non-buggy compilers preserve this; assert the correct value. */
> -g_assert(cpu == current_cpu);
> -
> -#ifdef CONFIG_USER_ONLY
> -clear_helper_retaddr();
> -if (have_mmap_lock()) {
> -mmap_unlock();
> -}
> -#endif
> -if (qemu_mutex_iothread_locked()) {
> -qemu_mutex_unlock_iothread();
> -}
> -
> -assert_no_pages_locked();
> +cpu_exec_longjmp_cleanup(cpu);
>  }
>  
>  return cpu_exec_loop(cpu, sc);
> -- 
> 2.34.1

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




[PATCH] tb-maint: Document #ifdef..else..endif correctly

2023-07-06 Thread Richard W.M. Jones
It was hard to tell from the comments whether the code applied to user
mode (CONFIG_USER_ONLY) or system mode.  Fix the comments on the #else
and #endif directives to be clearer.

Signed-off-by: Richard W.M. Jones 
---
 accel/tcg/tb-maint.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 9566224d18..bf99a46872 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -156,7 +156,7 @@ static PageForEachNext foreach_tb_next(PageForEachNext tb,
 return NULL;
 }
 
-#else
+#else /* !CONFIG_USER_ONLY */
 /*
  * In system mode we want L1_MAP to be based on ram offsets.
  */
@@ -722,7 +722,7 @@ static void page_unlock_tb(const TranslationBlock *tb)
 }
 }
 }
-#endif /* CONFIG_USER_ONLY */
+#endif /* !CONFIG_USER_ONLY */
 
 /* flush all the translation blocks */
 static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
-- 
2.41.0




Collecting information from a hung qemu process

2023-06-08 Thread Richard W.M. Jones
I filed this bug about recent Linux hanging very rarely when booting
on recent qemu:

https://gitlab.com/qemu-project/qemu/-/issues/1696

As I'm able to reproduce this bug at will (albeit I have to wait for
100s or 1000s of iterations of the test), I am able to observe the
qemu process after it hangs.  Is there any information which is useful
to collect from the hung qemu, such as stack traces, etc?

I know how to collect a stack trace, but if there's other information
please give me a guide about how to collect that.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH nbd 1/4] nbd: Add multi-conn option

2023-03-10 Thread Richard W.M. Jones
On Fri, Mar 10, 2023 at 04:17:17PM -0600, Eric Blake wrote:
> On Thu, Mar 09, 2023 at 11:39:43AM +0000, Richard W.M. Jones wrote:
> > + * safe for multi-conn, force it to 1.
> > + */
> > +if (!(s->info.flags & NBD_FLAG_CAN_MULTI_CONN)) {
> > +s->multi_conn = 1;
> > +}
> > +
> >  return 0;
> 
> Is there an intended QAPI counterpart for this command?  We'll need
> that if it is to be set during the command line of
> qemu-storage-daemon.

Does it just need to be added to qapi/block-core.json?

It's a shame we can't add the API in one place and have everything
generated from there.  Like some kind of 'generator' ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: [PATCH nbd 0/4] Enable multi-conn NBD [for discussion only]

2023-03-10 Thread Richard W.M. Jones
On Fri, Mar 10, 2023 at 01:04:12PM -0600, Eric Blake wrote:
> How many of these timing numbers can be repeated with TLS in the mix?

While I have been playing with TLS and kTLS recently, it's not
something that is especially important to v2v since all NBD traffic
goes over Unix domain sockets only (ie. it's used as kind of
interprocess communication).

I could certainly provide benchmarks, although as I'm going on holiday
shortly it may be a little while.

> > Curl local server test (./multi-conn.pl curlhttp)
> > =================================================
> > 
> > Localhost Apache serving a file over http
> >   |
> >   | http
> >   v
> > nbdkit-curl-plugin   or   qemu-nbd
> >   |
> >   | nbd+unix
> >   v
> > qemu-img convert   or   nbdcopy
> > 
> > We download an image from a local web server through
> > nbdkit-curl-plugin or qemu-nbd using the curl block driver, over NBD.
> > The image is copied to /dev/null.
> > 
> >   server    client    multi-conn
> >   ---------------------------------------
> >   qemu-nbd  nbdcopy   1       8.88s
> >   qemu-nbd  nbdcopy   2       8.64s
> >   qemu-nbd  nbdcopy   4       8.37s
> >   qemu-nbd  qemu-img  [u/s]   6.47s
> 
> Do we have any good feel for why qemu-img is faster than nbdcopy in
> the baseline?  But improving that is orthogonal to this series.

I do not, but we have in the past found that results can be very
sensitive to request size.  By default (and also in all of these
tests) nbdcopy is using a request size of 256K, and qemu-img is using
a request size of 2M.

> >   qemu-nbd  qemu-img  1       6.56s
> >   qemu-nbd  qemu-img  2       6.63s
> >   qemu-nbd  qemu-img  4       6.50s
> > nbdkit nbdcopy  1   12.15s  
> 
> I'm assuming this is nbdkit with your recent in-progress patches to
> have the curl plugin serve parallel requests.  But another place where
> we can investigate why nbdkit is not as performant as qemu-nbd at
> utilizing curl.
> 
> > nbdkit nbdcopy  2   7.05s   (72.36% better)
> > nbdkit nbdcopy  4   3.54s   (242.90% better)
> 
> That one is impressive!
> 
> > nbdkit qemu-img  [u/s]   6.90s
> > nbdkit qemu-img  1   7.00s
> 
> Minimal penalty for adding the code but not utilizing it...

[u/s] and qemu-img with multi-conn:1 ought to be identical actually.
After all, the only difference should be the restructuring of the code
to add the intermediate NBDConnState struct.  In this case it's probably
just measurement error.

> > nbdkit qemu-img  2   3.85s   (79.15% better)
> > nbdkit qemu-img  4   3.85s   (79.15% better)
> 
> ...and definitely shows its worth.
> 
> > 
> > 
> > Curl local file test (./multi-conn.pl curlfile)
> > ===============================================
> > 
> > nbdkit-curl-plugin   using file:/// URI
> >   |
> >   | nbd+unix
> >   v
> > qemu-img convert   or   nbdcopy
> > 
> > We download from a file:/// URI.  This test is designed to exercise
> > NBD and some curl internal paths without the overhead from an external
> > server.  qemu-nbd doesn't support file:/// URIs so we cannot duplicate
> > the test for qemu as server.
> > 
> >   server  client  multi-conn
> >   -----------------------------------
> > nbdkit nbdcopy  1   31.32s  
> > nbdkit nbdcopy  2   20.29s  (54.38% better)
> > nbdkit nbdcopy  4   13.22s  (136.91% better)
> > nbdkit qemu-img  [u/s]   31.55s
> 
> Here, the baseline is already comparable; both nbdcopy and qemu-img
> are parsing the image off nbdkit in about the same amount of time.
> 
> > nbdkit qemu-img  1   31.70s
> 
> And again, minimal penalty for having the new code in place but not
> exploiting it.
> 
> > nbdkit qemu-img  2   21.60s  (46.07% better)
> > nbdkit qemu-img  4   13.88s  (127.25% better)
> 
> Plus an obvious benefit when the parallel sockets matter.
> 
> > 
> > 
> > Curl remote server test (./multi-conn.pl curlremote)
> > ====================================================
> > 
> > nbdkit-curl-plugin   using http://remote/*.qcow2 URI
> >  |
> >  | nbd+unix
> >  v
> > qemu-img convert
> > 
> > We download from a remote qcow2 file to a local raw file, converting
> > between formats during copying.
> > 
> > qemu-nbd   using http://remote/*.qcow2 URI
> > |
> > | nbd+unix
> > v
> > qemu-img convert
> > 
> > Similarly, replacing nbdkit with qemu-nbd (treating the remote file as
> > if it is raw, so the conversion is still done by qemu-img).
> > 
> > Additionally we compare downloading the file with wget (note this
> > doesn't include the time for conversion, but that 

[PATCH nbd 1/4] nbd: Add multi-conn option

2023-03-09 Thread Richard W.M. Jones
Add multi-conn option to the NBD client.  This commit just adds the
option, it is not functional.

Setting this to a value > 1 permits multiple connections to the NBD
server; a typical value might be 4.  The default is 1, meaning only a
single connection is made.  If the NBD server does not advertise that
it is safe for multi-conn then this setting is forced to 1.

Signed-off-by: Richard W.M. Jones 
---
 block/nbd.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index bf2894ad5c..5ffae0b798 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -49,6 +49,7 @@
 
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS    16
+#define MAX_MULTI_CONN  16
 
 #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
 #define INDEX_TO_HANDLE(bs, index)  ((index)  ^ (uint64_t)(intptr_t)(bs))
@@ -98,6 +99,7 @@ typedef struct BDRVNBDState {
 /* Connection parameters */
 uint32_t reconnect_delay;
 uint32_t open_timeout;
+uint32_t multi_conn;
 SocketAddress *saddr;
 char *export;
 char *tlscredsid;
@@ -1803,6 +1805,15 @@ static QemuOptsList nbd_runtime_opts = {
 "attempts until successful or until @open-timeout seconds "
 "have elapsed. Default 0",
 },
+{
+.name = "multi-conn",
+.type = QEMU_OPT_NUMBER,
+.help = "If > 1 permit up to this number of connections to the "
+"server. The server must also advertise multi-conn "
+"support.  If <= 1, only a single connection is made "
+"to the server even if the server advertises multi-conn. "
+"Default 1",
+},
 { /* end of list */ }
 },
 };
@@ -1858,6 +1869,10 @@ static int nbd_process_options(BlockDriverState *bs, QDict *options,
 
 s->reconnect_delay = qemu_opt_get_number(opts, "reconnect-delay", 0);
 s->open_timeout = qemu_opt_get_number(opts, "open-timeout", 0);
+s->multi_conn = qemu_opt_get_number(opts, "multi-conn", 1);
+if (s->multi_conn > MAX_MULTI_CONN) {
+s->multi_conn = MAX_MULTI_CONN;
+}
 
 ret = 0;
 
@@ -1912,6 +1927,15 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
 
 nbd_client_connection_enable_retry(s->conn);
 
+/*
+ * We set s->multi_conn in nbd_process_options above, but now that
+ * we have connected if the server doesn't advertise that it is
+ * safe for multi-conn, force it to 1.
+ */
+if (!(s->info.flags & NBD_FLAG_CAN_MULTI_CONN)) {
+s->multi_conn = 1;
+}
+
 return 0;
 
 fail:
-- 
2.39.2




[PATCH nbd 3/4] nbd: Open multiple NBD connections if multi-conn is set

2023-03-09 Thread Richard W.M. Jones
If the user opts in to multi-conn and the server advertises it, open
multiple NBD connections.  They are opened with the same parameters as
the initial connection.  Similarly on the close path the connections
are closed.

However only the zeroth connection is used, so this commit does not
actually enable multi-conn capabilities.

(XXX) The strategy here is very naive.  Firstly if you were going to
open them, you'd probably want to open them in parallel.  Secondly it
would make sense to delay opening until multiple parallel requests are
seen (perhaps above some threshold), so that simple or shortlived NBD
operations do not require multiple connections to be made.

Signed-off-by: Richard W.M. Jones 
---
 block/nbd.c | 128 
 1 file changed, 90 insertions(+), 38 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 84e8a1add0..4c99c3f865 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -124,18 +124,23 @@ static void nbd_yank(void *opaque);
 static void nbd_clear_bdrvstate(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+size_t i;
 
-nbd_client_connection_release(s->conns[0]->conn);
-s->conns[0]->conn = NULL;
+for (i = 0; i < MAX_MULTI_CONN; ++i) {
+if (s->conns[i]) {
+nbd_client_connection_release(s->conns[i]->conn);
+s->conns[i]->conn = NULL;
+
+/* Must not leave timers behind that would access freed data */
+assert(!s->conns[i]->reconnect_delay_timer);
+assert(!s->conns[i]->open_timer);
+
+g_free(s->conns[i]);
+}
+}
 
 yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
 
-/* Must not leave timers behind that would access freed data */
-assert(!s->conns[0]->reconnect_delay_timer);
-assert(!s->conns[0]->open_timer);
-
-g_free(s->conns[0]);
-
 object_unref(OBJECT(s->tlscreds));
 qapi_free_SocketAddress(s->saddr);
 s->saddr = NULL;
@@ -1905,43 +1910,39 @@ static int nbd_process_options(BlockDriverState *bs, QDict *options,
 return ret;
 }
 
-static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
-Error **errp)
+static NBDConnState *init_conn_state(BDRVNBDState *s)
 {
+NBDConnState *cs;
+
+cs = g_new0(NBDConnState, 1);
+cs->s = s;
+qemu_mutex_init(&cs->requests_lock);
+qemu_co_queue_init(&cs->free_sema);
+qemu_co_mutex_init(&cs->send_mutex);
+qemu_co_mutex_init(&cs->receive_mutex);
+
+return cs;
+}
+
+static int conn_state_connect(BlockDriverState *bs, NBDConnState *cs,
+  Error **errp)
+{
+BDRVNBDState *s = cs->s;
 int ret;
-BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
-s->bs = bs;
-
-s->conns[0] = g_new0(NBDConnState, 1);
-s->conns[0]->s = s;
-qemu_mutex_init(&s->conns[0]->requests_lock);
-qemu_co_queue_init(&s->conns[0]->free_sema);
-qemu_co_mutex_init(&s->conns[0]->send_mutex);
-qemu_co_mutex_init(&s->conns[0]->receive_mutex);
-
-if (!yank_register_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name), errp)) {
-return -EEXIST;
-}
-
-ret = nbd_process_options(bs, options, errp);
-if (ret < 0) {
-goto fail;
-}
-
-s->conns[0]->conn =
+cs->conn =
 nbd_client_connection_new(s->saddr, true, s->export,
   s->x_dirty_bitmap, s->tlscreds,
   s->tlshostname);
 
 if (s->open_timeout) {
-nbd_client_connection_enable_retry(s->conns[0]->conn);
-open_timer_init(s->conns[0], qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
+nbd_client_connection_enable_retry(cs->conn);
+open_timer_init(cs, qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
 s->open_timeout * NANOSECONDS_PER_SECOND);
 }
 
-s->conns[0]->state = NBD_CLIENT_CONNECTING_WAIT;
-ret = nbd_do_establish_connection(bs, s->conns[0], true, errp);
+cs->state = NBD_CLIENT_CONNECTING_WAIT;
+ret = nbd_do_establish_connection(bs, cs, true, errp);
 if (ret < 0) {
 goto fail;
 }
@@ -1951,9 +1952,44 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
  * Delete it, because we do not want it to be around when this node
  * is drained or closed.
  */
-open_timer_del(s->conns[0]);
+open_timer_del(cs);
 
-nbd_client_connection_enable_retry(s->conns[0]->conn);
+nbd_client_connection_enable_retry(cs->conn);
+
+return 0;
+
+fail:
+open_timer_del(cs);
+return ret;
+}
+
+static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
+Error **errp)
+{
+int ret;
+BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+size_t i;
+
+s->bs

[PATCH nbd 4/4] nbd: Enable multi-conn using round-robin

2023-03-09 Thread Richard W.M. Jones
Enable NBD multi-conn by spreading operations across multiple
connections.

(XXX) This uses a naive round-robin approach which could be improved.
For example we could look at how many requests are in flight and
assign operations to the connections with fewest.  Or we could try to
estimate (based on size of requests outstanding) the load on each
connection.  But this implementation doesn't do any of that.

Signed-off-by: Richard W.M. Jones 
---
 block/nbd.c | 67 +++--
 1 file changed, 49 insertions(+), 18 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 4c99c3f865..df32ba67ed 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1232,6 +1232,26 @@ static int coroutine_fn nbd_co_request(NBDConnState *cs, NBDRequest *request,
 return ret ? ret : request_ret;
 }
 
+/*
+ * If multi-conn, choose a connection for this operation.
+ */
+static NBDConnState *choose_connection(BDRVNBDState *s)
+{
+static size_t next;
+size_t i;
+
+if (s->multi_conn <= 1) {
+return s->conns[0];
+}
+
+/* XXX Stupid simple round robin. */
+i = qatomic_fetch_inc(&next);
+i %= s->multi_conn;
+
+assert(s->conns[i] != NULL);
+return s->conns[i];
+}
+
 static int coroutine_fn nbd_client_co_preadv(BlockDriverState *bs, int64_t offset,
  int64_t bytes, QEMUIOVector *qiov,
  BdrvRequestFlags flags)
@@ -1244,7 +1264,7 @@ static int coroutine_fn nbd_client_co_preadv(BlockDriverState *bs, int64_t offse
 .from = offset,
 .len = bytes,
 };
-NBDConnState * const cs = s->conns[0];
+NBDConnState * const cs = choose_connection(s);
 
 assert(bytes <= NBD_MAX_BUFFER_SIZE);
 
@@ -1301,7 +1321,7 @@ static int coroutine_fn nbd_client_co_pwritev(BlockDriverState *bs, int64_t offs
 .from = offset,
 .len = bytes,
 };
-NBDConnState * const cs = s->conns[0];
+NBDConnState * const cs = choose_connection(s);
 
 assert(!(cs->info.flags & NBD_FLAG_READ_ONLY));
 if (flags & BDRV_REQ_FUA) {
@@ -1326,7 +1346,7 @@ static int coroutine_fn nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_
 .from = offset,
 .len = bytes,  /* .len is uint32_t actually */
 };
-NBDConnState * const cs = s->conns[0];
+NBDConnState * const cs = choose_connection(s);
 
 assert(bytes <= UINT32_MAX); /* rely on max_pwrite_zeroes */
 
@@ -1357,7 +1377,13 @@ static int coroutine_fn nbd_client_co_flush(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 NBDRequest request = { .type = NBD_CMD_FLUSH };
-NBDConnState * const cs = s->conns[0];
+
+/*
+ * Multi-conn (if used) guarantees that flushing on any connection
+ * flushes caches on all connections, so we can perform this
+ * operation on any.
+ */
+NBDConnState * const cs = choose_connection(s);
 
 if (!(cs->info.flags & NBD_FLAG_SEND_FLUSH)) {
 return 0;
@@ -1378,7 +1404,7 @@ static int coroutine_fn nbd_client_co_pdiscard(BlockDriverState *bs, int64_t off
 .from = offset,
 .len = bytes, /* len is uint32_t */
 };
-NBDConnState * const cs = s->conns[0];
+NBDConnState * const cs = choose_connection(s);
 
 assert(bytes <= UINT32_MAX); /* rely on max_pdiscard */
 
@@ -1398,7 +1424,7 @@ static int coroutine_fn nbd_client_co_block_status(
 NBDExtent extent = { 0 };
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 Error *local_err = NULL;
-NBDConnState * const cs = s->conns[0];
+NBDConnState * const cs = choose_connection(s);
 
 NBDRequest request = {
 .type = NBD_CMD_BLOCK_STATUS,
@@ -2027,7 +2053,7 @@ static int coroutine_fn nbd_co_flush(BlockDriverState *bs)
 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-NBDConnState * const cs = s->conns[0];
+NBDConnState * const cs = choose_connection(s);
 uint32_t min = cs->info.min_block;
 uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, cs->info.max_block);
 
@@ -2085,7 +2111,7 @@ static int coroutine_fn nbd_co_truncate(BlockDriverState *bs, int64_t offset,
 BdrvRequestFlags flags, Error **errp)
 {
 BDRVNBDState *s = bs->opaque;
-NBDConnState * const cs = s->conns[0];
+NBDConnState * const cs = choose_connection(s);
 
 if (offset != cs->info.size && exact) {
 error_setg(errp, "Cannot resize NBD nodes");
@@ -2168,24 +2194,29 @@ static const char *const nbd_strong_runtime_opts[] = {
 static void nbd_cancel_in_flight(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-NBDConnState * const cs = s->conns[0];
+size_t i;
+NBDConnState *cs;
 
-reconnect_delay_timer_del(cs);
+for (i = 0

[PATCH nbd 2/4] nbd: Split out block device state from underlying NBD connections

2023-03-09 Thread Richard W.M. Jones
To implement multi-conn, we will put multiple underlying NBD
connections (ie. NBDClientConnection) inside the NBD block device
handle (BDRVNBDState).  This requires first breaking the one-to-one
relationship between NBDClientConnection and BDRVNBDState.

To do this a new structure (NBDConnState) is implemented.
NBDConnState takes all the per-connection state out of the block
driver struct.  BDRVNBDState now contains a conns[] array of pointers
to NBDConnState, one for each underlying multi-conn connection.

After this change there is still a one-to-one relationship because we
only ever use the zeroth slot in the conns[] array.  Thus this does
not implement multi-conn yet.

Signed-off-by: Richard W.M. Jones 
---
 block/coroutines.h |   5 +-
 block/nbd.c        | 674 -
 2 files changed, 358 insertions(+), 321 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index dd9f3d449b..14b01d8591 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -62,7 +62,7 @@ int coroutine_fn GRAPH_RDLOCK
 bdrv_co_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 
 int coroutine_fn
-nbd_co_do_establish_connection(BlockDriverState *bs, bool blocking,
+nbd_co_do_establish_connection(BlockDriverState *bs, void *cs, bool blocking,
Error **errp);
 
 
@@ -86,6 +86,7 @@ bdrv_common_block_status_above(BlockDriverState *bs,
BlockDriverState **file,
int *depth);
 int co_wrapper_mixed
-nbd_do_establish_connection(BlockDriverState *bs, bool blocking, Error **errp);
+nbd_do_establish_connection(BlockDriverState *bs, void *cs, bool blocking,
+Error **errp);
 
 #endif /* BLOCK_COROUTINES_H */
diff --git a/block/nbd.c b/block/nbd.c
index 5ffae0b798..84e8a1add0 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -51,8 +51,8 @@
 #define MAX_NBD_REQUESTS    16
 #define MAX_MULTI_CONN  16
 
-#define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
-#define INDEX_TO_HANDLE(bs, index)  ((index)  ^ (uint64_t)(intptr_t)(bs))
+#define HANDLE_TO_INDEX(cs, handle) ((handle) ^ (uint64_t)(intptr_t)(cs))
+#define INDEX_TO_HANDLE(cs, index)  ((index)  ^ (uint64_t)(intptr_t)(cs))
 
 typedef struct {
 Coroutine *coroutine;
@@ -67,7 +67,10 @@ typedef enum NBDClientState {
 NBD_CLIENT_QUIT
 } NBDClientState;
 
-typedef struct BDRVNBDState {
+/* A single client connection. */
+typedef struct NBDConnState {
+struct BDRVNBDState *s; /* Points to enclosing BDRVNBDState */
+
 QIOChannel *ioc; /* The current I/O channel */
 NBDExportInfo info;
 
@@ -94,7 +97,12 @@ typedef struct BDRVNBDState {
 
 QEMUTimer *open_timer;
 
-BlockDriverState *bs;
+NBDClientConnection *conn;
+} NBDConnState;
+
+typedef struct BDRVNBDState {
+/* The underlying NBD connections */
+NBDConnState *conns[MAX_MULTI_CONN];
 
 /* Connection parameters */
 uint32_t reconnect_delay;
@@ -108,7 +116,7 @@ typedef struct BDRVNBDState {
 char *x_dirty_bitmap;
 bool alloc_depth;
 
-NBDClientConnection *conn;
+BlockDriverState *bs;
 } BDRVNBDState;
 
 static void nbd_yank(void *opaque);
@@ -117,14 +125,16 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
-nbd_client_connection_release(s->conn);
-s->conn = NULL;
+nbd_client_connection_release(s->conns[0]->conn);
+s->conns[0]->conn = NULL;
 
 yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
 
 /* Must not leave timers behind that would access freed data */
-assert(!s->reconnect_delay_timer);
-assert(!s->open_timer);
+assert(!s->conns[0]->reconnect_delay_timer);
+assert(!s->conns[0]->open_timer);
+
+g_free(s->conns[0]);
 
 object_unref(OBJECT(s->tlscreds));
 qapi_free_SocketAddress(s->saddr);
@@ -151,139 +161,143 @@ static bool coroutine_fn nbd_recv_coroutine_wake_one(NBDClientRequest *req)
 return false;
 }
 
-static void coroutine_fn nbd_recv_coroutines_wake(BDRVNBDState *s)
+static void coroutine_fn nbd_recv_coroutines_wake(NBDConnState *cs)
 {
 int i;
 
-QEMU_LOCK_GUARD(&s->receive_mutex);
+QEMU_LOCK_GUARD(&cs->receive_mutex);
 for (i = 0; i < MAX_NBD_REQUESTS; i++) {
-if (nbd_recv_coroutine_wake_one(&s->requests[i])) {
+if (nbd_recv_coroutine_wake_one(&cs->requests[i])) {
 return;
 }
 }
 }
 
 /* Called with s->requests_lock held.  */
-static void coroutine_fn nbd_channel_error_locked(BDRVNBDState *s, int ret)
+static void coroutine_fn nbd_channel_error_locked(NBDConnState *cs, int ret)
 {
-if (s->state == NBD_CLIENT_CONNECTED) {
-qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+if (cs->state == NBD_CLIENT_CONNECTED) {
+qio_channel_shutdown(cs->ioc, QIO_CHA

[PATCH nbd 0/4] Enable multi-conn NBD [for discussion only]

2023-03-09 Thread Richard W.M. Jones
[ Patch series also available here, along with this cover letter and the
  script used to generate test results:
  https://gitlab.com/rwmjones/qemu/-/commits/2023-nbd-multi-conn-v1 ]

This patch series adds multi-conn support to the NBD block driver in
qemu.  It is only meant for discussion and testing because it has a
number of obvious shortcomings (see "XXX" in commit messages and
code).  If we decided this was a good idea, we can work on a better
patch.

The Network Block Driver (NBD) protocol allows servers to advertise
that they are capable of multi-conn.  This means they obey certain
requirements around how data is cached, allowing multiple connections
to be safely opened to the NBD server.  For example, a flush or FUA
operation on one connection is guaranteed to flush the cache on all
connections.

Clients that use multi-conn can achieve better performance.  This
seems to be down to at least two factors:

 - Avoids "head of line blocking" of large requests.

 - With NBD over Unix domain sockets, more cores can be used.

qemu-nbd, nbdkit and libnbd have all supported multi-conn for ages,
but qemu's built in NBD client does not, which is what this patch
fixes.

Below I've produced a few benchmarks.

Note these are mostly concocted to try to test NBD performance and may
not make sense in their own terms (eg. qemu's disk image layer has a
curl client so you wouldn't need to run one separately).  In the real
world we use long pipelines of NBD operations where different tools
are mixed together to achieve efficient downloads, conversions, disk
modifications and sparsification, and they would exercise different
aspects of this.

I've also included nbdcopy as a client for comparison in some tests.

All tests were run 4 times, the first result discarded, and the last 3
averaged.  If any of the last 3 were > 10% away from the average then
the test was stopped.
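The averaging protocol above can be sketched as a small helper; this
is an illustrative reconstruction, not the actual script from the
linked repository, and `summarize` is a name I've made up:

```python
# Sketch of the benchmarking protocol described above: run a workload
# four times, discard the first (warm-up) result, average the last
# three, and stop if any of those three deviates from the average by
# more than 10%.

def summarize(timings):
    """timings: four wall-clock results, in seconds."""
    assert len(timings) == 4
    kept = timings[1:]                  # discard the warm-up run
    avg = sum(kept) / len(kept)
    for t in kept:
        if abs(t - avg) > 0.10 * avg:   # > 10% away from the average
            raise RuntimeError("unstable result, stopping the test")
    return avg

if __name__ == "__main__":
    print(summarize([9.90, 9.07, 9.05, 9.02]))
```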

My summary:

 - It works effectively for qemu client & nbdkit server, especially in
   cases where the server does large, heavyweight requests.  This is
   important for us because virt-v2v uses an nbdkit Python plugin and
   various other heavyweight plugins (eg. plugins that access remote
   servers for each request).

 - It seems to make little or no difference with qemu + qemu-nbd
   server.  I speculate that's because qemu-nbd doesn't support system
   threads, so networking is bottlenecked through a single core.  Even
   though there are coroutines handling different sockets, they must
all wait in turn to issue send(2) or recv(2) calls on the same
   core.

 - qemu-img unfortunately uses a single thread for all coroutines so
   it suffers from a similar problem to qemu-nbd.  This change would
   be much more effective if we could distribute coroutines across
   threads.

 - For tests which are highly bottlenecked on disk I/O (eg. the large
   local file test and null test) multi-conn doesn't make much
   difference.

 - Multi-conn even with only 2 connections can make up for the
   overhead of range requests, exceeding the performance of wget.

 - In the curlremote test, qemu-nbd is especially slow, for unknown
   reasons.


Integrity test (./multi-conn.pl integrity)
==========================================

nbdkit-sparse-random-plugin
      |            ^
      | nbd+unix   | nbd+unix
      v            |
       qemu-img convert

Reading from and writing the same data back to nbdkit sparse-random
plugin checks that the data written is the same as the data read.
This uses two Unix domain sockets, with or without multi-conn.  This
test is mainly here to check we don't crash or corrupt data with this
patch.

  server    client      multi-conn
  ----------------------------------------
  nbdkit    qemu-img    [u/s]    9.07s
  nbdkit    qemu-img    1        9.05s
  nbdkit    qemu-img    2        9.02s
  nbdkit    qemu-img    4        8.98s

[u/s] = upstream qemu 7.2.0


Curl local server test (./multi-conn.pl curlhttp)
=================================================

Localhost Apache serving a file over http
  |
  | http
  v
nbdkit-curl-plugin   or   qemu-nbd
  |
  | nbd+unix
  v
qemu-img convert   or   nbdcopy

We download an image from a local web server through
nbdkit-curl-plugin or qemu-nbd using the curl block driver, over NBD.
The image is copied to /dev/null.

  server      client      multi-conn
  ------------------------------------------
  qemu-nbd    nbdcopy     1        8.88s
  qemu-nbd    nbdcopy     2        8.64s
  qemu-nbd    nbdcopy     4        8.37s
  qemu-nbd    qemu-img    [u/s]    6.47s
  qemu-nbd    qemu-img    1        6.56s
  qemu-nbd    qemu-img    2        6.63s
  qemu-nbd    qemu-img    4        6.50s
  nbdkit      nbdcopy     1        12.15s
nbdkit 

[PATCH v2] tcg: Include "qemu/timer.h" for profile_getclock

2023-03-03 Thread Richard W.M. Jones
When CONFIG_PROFILER is set there are various undefined references to
profile_getclock.  Include the header which defines this function.

For example:

../tcg/tcg.c: In function ‘tcg_gen_code’:
../tcg/tcg.c:4905:51: warning: implicit declaration of function 
‘profile_getclock’ [-Wimplicit-function-declaration]
 4905 | qatomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
  |   ^~~~

Thanks: Philippe Mathieu-Daudé
Signed-off-by: Richard W.M. Jones 
---
 accel/tcg/tcg-accel-ops.c | 1 +
 accel/tcg/translate-all.c | 1 +
 softmmu/runstate.c        | 1 +
 tcg/tcg.c                 | 1 +
 4 files changed, 4 insertions(+)

diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index aeb1cbaf65..af35e0d092 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -31,6 +31,7 @@
 #include "sysemu/cpu-timers.h"
 #include "qemu/main-loop.h"
 #include "qemu/guest-random.h"
+#include "qemu/timer.h"
 #include "exec/exec-all.h"
 #include "exec/hwaddr.h"
 #include "exec/gdbstub.h"
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 4b5abc0f44..a5bea8f99c 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -51,6 +51,7 @@
 #include "qemu/qemu-print.h"
 #include "qemu/main-loop.h"
 #include "qemu/cacheinfo.h"
+#include "qemu/timer.h"
 #include "exec/log.h"
 #include "sysemu/cpus.h"
 #include "sysemu/cpu-timers.h"
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index f9ad88e6a7..9b3611d56d 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -46,6 +46,7 @@
 #include "qemu/module.h"
 #include "qemu/plugin.h"
 #include "qemu/sockets.h"
+#include "qemu/timer.h"
 #include "qemu/thread.h"
 #include "qom/object.h"
 #include "qom/object_interfaces.h"
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 506ae3..6b830ade4c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -36,6 +36,7 @@
 #include "qemu/qemu-print.h"
 #include "qemu/cacheflush.h"
 #include "qemu/cacheinfo.h"
+#include "qemu/timer.h"
 
 /* Note: the long term plan is to reduce the dependencies on the QEMU
CPU definitions. Currently they are used for qemu_ld/st
-- 
2.39.2




[PATCH] tcg: Include "qemu/timer.h" for profile_getclock

2023-03-02 Thread Richard W.M. Jones
When CONFIG_PROFILER is set there are various undefined references to
profile_getclock.  Include the header which defines this function.

For example:

../tcg/tcg.c: In function ‘tcg_gen_code’:
../tcg/tcg.c:4905:51: warning: implicit declaration of function 
‘profile_getclock’ [-Wimplicit-function-declaration]
 4905 | qatomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
  |   ^~~~

Signed-off-by: Richard W.M. Jones 
---
 tcg/tcg.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 506ae3..6b830ade4c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -36,6 +36,7 @@
 #include "qemu/qemu-print.h"
 #include "qemu/cacheflush.h"
 #include "qemu/cacheinfo.h"
+#include "qemu/timer.h"
 
 /* Note: the long term plan is to reduce the dependencies on the QEMU
CPU definitions. Currently they are used for qemu_ld/st
-- 
2.39.2




Re: [GSoC 2023] Introducing Myself

2023-03-02 Thread Richard W.M. Jones
On Thu, Mar 02, 2023 at 07:17:46PM +0530, Ayush Singh wrote:
> Hello Everyone,
> 
> I am Ayush Singh, a 3rd-year university student from the Indian Institute of
> Technology (Indian School of Mines), Dhanbad, India. This email is just to
> 
> I participated and successfully completed my GSoC 2022 Project under Tianocore
> Organization in Rust, so I am pretty experienced in Rust Language. I am also
> fairly proficient in reading and working with C, although not to the same
> degree as Rust.
> 
> I use Qemu often for testing and thus would like to contribute to Qemu as a
> part of GSoC 2023. I have narrowed down the projects to:
> 
> 1.  Rust bindings for libnbd: 
> https://wiki.qemu.org/Google_Summer_of_Code_2023#
> Rust_bindings_for_libnbd
> 2.  RDP server: https://wiki.qemu.org/Google_Summer_of_Code_2023#RDP_server
> 
> I would just like to confirm if both of the above projects are up for grabs in
> the upcoming GSoC 2023, and if there are any specific requirements/tasks to
> complete to apply for either of the projects.

We do have another candidate.  I'm not sure what happens in these
situations .. Erik?

Rich.

> Finally, what is the preferred way of discussions in Qemu community? I did see
> an IRC channel as well as qemu-discuss mailing list.
> 
> Yours sincerely
> Ayush Singh
> 
> Github: https://github.com/Ayush1325
> GSoC 2022 Project: https://summerofcode.withgoogle.com/archive/2022/projects/
> PwQlcngc

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: [PATCH] tests: Ensure TAP version is printed before other messages

2023-03-01 Thread Richard W.M. Jones
On Tue, Feb 28, 2023 at 09:30:56PM +0100, Thomas Huth wrote:
> On 27/02/2023 18.40, Richard W.M. Jones wrote:
> >These two tests were failing with this error:
> >
> >   stderr:
> >   TAP parsing error: version number must be on the first line
> >   [...]
> >   Unknown TAP version. The first line MUST be `TAP version <number>`. Assuming 
> > version 12.
> >
> >This can be fixed by ensuring we always call g_test_init first in the
> >body of main.
> >
> >Thanks: Daniel Berrange, for diagnosing the problem
> >Signed-off-by: Richard W.M. Jones 
> >---
> >  tests/qtest/fuzz-lsi53c895a-test.c | 4 ++--
> >  tests/qtest/rtl8139-test.c | 5 +++--
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> >
> >diff --git a/tests/qtest/fuzz-lsi53c895a-test.c 
> >b/tests/qtest/fuzz-lsi53c895a-test.c
> >index a9254b455d..2012bd54b7 100644
> >--- a/tests/qtest/fuzz-lsi53c895a-test.c
> >+++ b/tests/qtest/fuzz-lsi53c895a-test.c
> >@@ -112,12 +112,12 @@ static void test_lsi_do_dma_empty_queue(void)
> >  int main(int argc, char **argv)
> >  {
> >+g_test_init(&argc, &argv, NULL);
> >+
> >  if (!qtest_has_device("lsi53c895a")) {
> >  return 0;
> 
> Could you please double-check that the !lsi53c895a case works fine,
> too? (just temporarily change it into a "if (1) { ..." statement)
> ... I'm a little bit afraid that the TAP protocol might be
> incomplete without the g_test_run() at the end otherwise. If so, you
> might now need a "goto out" instead of the "return 0" here...

Applying ...

diff --git a/tests/qtest/fuzz-lsi53c895a-test.c 
b/tests/qtest/fuzz-lsi53c895a-test.c
index 2012bd54b7..e0c902aac4 100644
--- a/tests/qtest/fuzz-lsi53c895a-test.c
+++ b/tests/qtest/fuzz-lsi53c895a-test.c
@@ -114,7 +114,7 @@ int main(int argc, char **argv)
 {
 g_test_init(&argc, &argv, NULL);
 
-if (!qtest_has_device("lsi53c895a")) {
+if (1) {
 return 0;
 }
 
... and rerunning the tests, everything still passes.

The stdout of the test after this change is:

TAP version 13
# random seed: R02S1c1f371a09fbfdf0dd747f898d55fe97

but apparently this version of TAP doesn't care (perhaps because the
number of tests "1..2" is never printed?)

Anyway it doesn't appear to be a problem.

glib2-2.75.3-4.fc39.x86_64

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




Re: [PATCH v2] test-vmstate: fix bad GTree usage, use-after-free

2023-02-27 Thread Richard W.M. Jones
On Mon, Feb 27, 2023 at 07:35:05PM +0100, Eric Auger wrote:
> According to g_tree_foreach() documentation:
> "The tree may not be modified while iterating over it (you can't
> add/remove items)."

It might be worth noting that this bug only happens now because glib2
removed their custom slice allocator and switched to using system
malloc.  With glibc + MALLOC_PERTURB_, malloc will find these kinds of
bugs.  The relevant glib2 change that causes the problem is:

https://gitlab.gnome.org/GNOME/glib/-/commit/45b5a6c1e56d5b73cc5ed798ef59a5601e56c170
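The g_tree_foreach() rule ("the tree may not be modified while
iterating over it") has a direct analogue in most container APIs.
Here is a hedged Python illustration of the same class of bug — not
the actual test-vmstate code, just the pattern: Python's dict detects
the mutation instead of corrupting memory the way the GTree case did.

```python
# Removing entries from a container while iterating over it, versus
# snapshotting the keys first and removing afterwards (roughly what
# the fix does by deferring destruction until after the traversal).

def remove_while_iterating(d):
    for key in d:          # iterating ...
        del d[key]         # ... while removing: not allowed

def remove_after_iterating(d):
    for key in list(d):    # snapshot the keys first
        del d[key]

try:
    remove_while_iterating({"a": 1, "b": 2})
except RuntimeError as e:
    print("caught:", e)    # dictionary changed size during iteration

d = {"a": 1, "b": 2}
remove_after_iterating(d)
print(d)                   # {}
```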

> A SIGSEV can be observed while running test-vmstate.

SIGSEGV

> Get rid of the node removal within the tree traversal. Also
> check the trees have the same number of nodes before the actual
> diff.
> 
> Fixes: 9a85e4b8f6 ("migration: Support gtree migration")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1518
> Signed-off-by: Marc-André Lureau 
> Signed-off-by: Eric Auger 
> Reported-by: Richard W.M. Jones 

You can add:

Tested-by: Richard W.M. Jones 
Reviewed-by: Richard W.M. Jones 

> ---
> 
> This is a respin of Marc-André's patch from Aug 2020, which can be
> found at
> https://lore.kernel.org/qemu-devel/20200827161826.1165971-1-marcandre.lur...@redhat.com/
> This fell through the cracks and now we hit a SIGSEV

SIGSEGV

> ---
>  tests/unit/test-vmstate.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/unit/test-vmstate.c b/tests/unit/test-vmstate.c
> index 79357b29ca..0b7d5ecd68 100644
> --- a/tests/unit/test-vmstate.c
> +++ b/tests/unit/test-vmstate.c
> @@ -1073,7 +1073,6 @@ static gboolean diff_tree(gpointer key, gpointer value, 
> gpointer data)
>  struct match_node_data d = {tp->tree2, key, value};
>  
> >  g_tree_foreach(tp->tree2, tp->match_node, &d);
> -g_tree_remove(tp->tree1, key);
>  return false;
>  }
>  
> @@ -1082,9 +1081,9 @@ static void compare_trees(GTree *tree1, GTree *tree2,
>  {
>  struct tree_cmp_data tp = {tree1, tree2, function};
>  
> +assert(g_tree_nnodes(tree1) == g_tree_nnodes(tree2));
> >  g_tree_foreach(tree1, diff_tree, &tp);
> -assert(g_tree_nnodes(tree1) == 0);
> -assert(g_tree_nnodes(tree2) == 0);
> +g_tree_destroy(g_tree_ref(tree1));
>  }
>  
>  static void diff_domain(TestGTreeDomain *d1, TestGTreeDomain *d2)
> -- 
> 2.38.1

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




[PATCH] tests: Ensure TAP version is printed before other messages

2023-02-27 Thread Richard W.M. Jones
These two tests were failing with this error:

  stderr:
  TAP parsing error: version number must be on the first line
  [...]
  Unknown TAP version. The first line MUST be `TAP version <number>`. Assuming 
version 12.

This can be fixed by ensuring we always call g_test_init first in the
body of main.

Thanks: Daniel Berrange, for diagnosing the problem
Signed-off-by: Richard W.M. Jones 
---
 tests/qtest/fuzz-lsi53c895a-test.c | 4 ++--
 tests/qtest/rtl8139-test.c | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/fuzz-lsi53c895a-test.c 
b/tests/qtest/fuzz-lsi53c895a-test.c
index a9254b455d..2012bd54b7 100644
--- a/tests/qtest/fuzz-lsi53c895a-test.c
+++ b/tests/qtest/fuzz-lsi53c895a-test.c
@@ -112,12 +112,12 @@ static void test_lsi_do_dma_empty_queue(void)
 
 int main(int argc, char **argv)
 {
+g_test_init(&argc, &argv, NULL);
+
 if (!qtest_has_device("lsi53c895a")) {
 return 0;
 }
 
-g_test_init(&argc, &argv, NULL);
-
 qtest_add_func("fuzz/lsi53c895a/lsi_do_dma_empty_queue",
test_lsi_do_dma_empty_queue);
 
diff --git a/tests/qtest/rtl8139-test.c b/tests/qtest/rtl8139-test.c
index 1beb83805c..4bd240e9ee 100644
--- a/tests/qtest/rtl8139-test.c
+++ b/tests/qtest/rtl8139-test.c
@@ -207,9 +207,10 @@ int main(int argc, char **argv)
 verbosity_level = atoi(v_env);
 }
 
-qtest_start("-device rtl8139");
-
 g_test_init(&argc, &argv, NULL);
+
+qtest_start("-device rtl8139");
+
 qtest_add_func("/rtl8139/nop", nop);
 qtest_add_func("/rtl8139/timer", test_init);
 
-- 
2.39.2




Re: RFC: towards systemd socket activation in q-s-d

2023-01-30 Thread Richard W.M. Jones
On Mon, Jan 30, 2023 at 04:45:08PM +, Daniel P. Berrangé wrote:
> which is LISTEN_FDS=2, LISTEN_FDNAMES=control,vnc

Colon for separating the elements, not comma.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit




Re: RFC: towards systemd socket activation in q-s-d

2023-01-30 Thread Richard W.M. Jones
On Mon, Jan 30, 2023 at 02:58:01PM +, Daniel P. Berrangé wrote:
> Obviously at startup QEMU can trivially inherit the FDs from whatever
> spawned it. The only task is to identify the FDs that are passed into,
> and systemd defined a mechanism for this using LISTEN_FDNAMES. IOW the
> socket activation can fully replace 'getfd' for purpose of initial
> startup. This will get rid of the annoying difference that SocketAddress
> only allows numeric FDs at startup and named FDs at runtime, by making
> named FDs the consistent standard. We could thus deprecate the use of
> non-named numeric FDs in SocketAddress to improve our sanity.
> 
> The question is how to define semantics for the LISTEN_FDNAMES while
> also still remaining back compat with the existing QEMU utilities
> that allow socket activation. Some kind of naming scheme would need
> to be decided upon, as well as handling the use of activation without
> LISTEN_FDNAMES being set. 

If I understand LISTEN_FDNAMES correctly, it's the names of the
protocols to be used (rather clumsily expressed through IANA
registered names from /etc/services).  It would be valid to use
LISTEN_FDNAMES=http:http for example, for a service that must use HTTP
on both sockets.

In other words it's not just names of file descriptors that you can
make up.

However as there is zero documentation for this environment variable,
who knows what it's really supposed to be ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit




Re: RFC: towards systemd socket activation in q-s-d

2023-01-27 Thread Richard W.M. Jones
On Fri, Jan 27, 2023 at 03:26:15PM -0600, Eric Blake wrote:
> In https://bugzilla.redhat.com/show_bug.cgi?id=2055229, the question
> was raised on how to make qemu-storage-daemon sufficiently powerful to
> be a full-blown replacement to qemu-nbd.  One of the features still
> lacking is the ability to do systemd socket activation (qemu-nbd does
> this, qemu-storage-daemon needs a way to do it).
> 
> But that bug further noted that systemd supports LISTEN_FDNAMES to
> supply names to a passed-in fd (right now, qemu-nbd does not use
> names, but merely expects one fd in LISTEN_FDS).  Dan had the idea
> that it would be nice to write a systemd file that passes in a socket
> name for a QMP socket, as in:
> 
>  [Socket]
>  ListenStream=/var/run/myapp/qsd.qmp
>  FileDescriptorName=qmp
>  Service=myapp-qsd.service
> 
> and further notes that QAPI SocketAddressType supports @fd which is a
> name in QMP (a previously-added fd passed through the older 'getfd'
> command, rather than the newer 'add-fd' command), but an integer on
> the command line.  With LISTEN_FDNAMES, we could mix systemd socket
> activation with named fds for any command line usage that already
> supports SocketAddressType (not limited to just q-s-d usage).

I couldn't find a good description of LISTEN_FDNAMES anywhere, and we
don't use it in libnbd / nbdkit / qemu-nbd right now.  So here's my
interpretation of what this environment variable means ...

Currently systemd socket activation in qemu-nbd or nbdkit uses only
LISTEN_PID and LISTEN_FDS, usually with a single fd being passed.
(I'll ignore LISTEN_PID.)  So:

  LISTEN_FDS=1
fd 3 --> NBD socket

This works because there's only one NBD protocol, it doesn't need to
be named.

However if we imagine that a superserver wants to run a webserver, it
might need to pass two sockets carrying distinct protocols (HTTP and
HTTPS).  In this case it would use:

  LISTEN_FDS=2
fd 3 --> HTTP socket
fd 4 --> HTTPS socket
  LISTEN_FDNAMES=http:https

LISTEN_FDNAMES is telling the webserver that the first socket is HTTP
and the second is HTTPS, so it knows which protocol to negotiate on
each socket.

I believe what you're saying above is that you'd like to have the
ability to pass multiple sockets to q-s-d carrying distinct protocols,
with the obvious ones being NBD + QMP (although other combinations
could be possible, eg. NBD + vhost + QMP?):

  LISTEN_FDS=2
fd 3 --> NBD socket
fd 4 --> QMP socket
  LISTEN_FDNAMES=nbd:qmp

If my understanding is correct, then this makes sense.  We might also
modify libnbd to pass LISTEN_FDNAMES=nbd in:

  
https://gitlab.com/nbdkit/libnbd/-/blob/master/generator/states-connect-socket-activation.c

(Current servers would ignore the new environment variable.)
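Under this interpretation, a server's fd-to-protocol mapping can be
sketched as follows.  This is an illustrative reconstruction, not code
from any of the projects discussed; it assumes the systemd convention
that inherited fds start at 3 (SD_LISTEN_FDS_START) and that
LISTEN_FDNAMES, when present, is a colon-separated list with one name
per fd.

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited fd, per the systemd convention

def socket_activation_fds(environ=os.environ):
    """Return {fd: name}; name is None if LISTEN_FDNAMES is unset."""
    nfds = int(environ.get("LISTEN_FDS", "0"))
    fds = list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + nfds))
    names = environ.get("LISTEN_FDNAMES")
    if names is None:
        return {fd: None for fd in fds}          # old-style, unnamed fds
    parts = names.split(":")
    if len(parts) != nfds:
        raise ValueError("LISTEN_FDNAMES count does not match LISTEN_FDS")
    return dict(zip(fds, parts))

if __name__ == "__main__":
    env = {"LISTEN_FDS": "2", "LISTEN_FDNAMES": "nbd:qmp"}
    print(socket_activation_fds(env))   # {3: 'nbd', 4: 'qmp'}
```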

> I'm at a point where I can take a shot at implementing this, but want
> some feedback on whether it is better to try to shoehorn a generic
> solution into the existing @fd member of the SocketAddressType union,
> or whether it would be better to add yet another union member
> @systemd-fd or some similar name to make it explicit when a command
> line parameter wants to refer to an fd being passed through systemd
> socket activation LISTEN_FDS and friends.

It sounds like the q-s-d command lines will be even more convoluted
and inelegant than before.

That's fine, but please keep qemu-nbd around as a sane alternative.
It might in the end just be a wrapper that translates the command line
to q-s-d.  I don't believe it's ever going to be possible or desirable
to deprecate or remove qemu-nbd.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: [PATCH 4/4] hw/isa: enable TCO watchdog reboot pin strap by default

2022-10-31 Thread Richard W.M. Jones
On Mon, Oct 31, 2022 at 01:19:34PM +, Daniel P. Berrangé wrote:
> The TCO watchdog implementation default behaviour from POV of the
> guest OS relies on the initial values for two I/O ports:
> 
>   * TCO1_CNT == 0x0
> 
> Since bit 11 (TCO Timer Halt) is clear, the watchdog state
> is considered to be initially running
> 
>   * GCS == 0x20
> 
> Since bit 5 (No Reboot) is set, the watchdog will not trigger
> when the timer expires
> 
> This is a safe default, because the No Reboot bit will prevent the
> watchdog from triggering if the guest OS is unaware of its existance,
> or is slow in configuring it. When a Linux guest initializes the TCO
> watchdog, it will attempt to clear the "No Reboot" flag, and read the
> value back. If the clear was honoured, the driver will treat this as
> an indicator that the watchdog is functional and create the geust

Typo: "guest"

> watchdog device.
> 
> QEMU implements a second "no reboot" flag, however, via pin straps
> which overrides the behaviour of the guest controlled "no reboot"
> flag:
> 
>   commit 5add35bec1e249bb5345a47008c8f298d4760be4
>   Author: Paulo Alcantara 
>   Date:   Sun Jun 28 14:58:58 2015 -0300
> 
> ich9: implement strap SPKR pin logic
> 
> This second 'noreboot' pin was defaulted to high, which also inhibits
> triggering of the requested watchdog actions, unless QEMU is launched
> with the magic flag "-global ICH9-LPC.noreboot=false".
> 
> This is a bad default as we are exposing a watchdog to every guest OS
> using the q35 machine type, but preventing it from actually doing what
> it is designed to do. What is worse is that the guest OS and its apps
> have no way to know that the watchdog is never going to fire, due to
> this second 'noreboot' pin.
> 
> If a guest OS had no watchdog device at all, then apps whose operation
> and/or data integrity relies on a watchdog can refuse to launch, and
> alert the administrator of the problematic deployment. With Q35 machines
> unconditionally exposing a watchdog though, apps will think their
> deployment is correct but in fact have no protection at all.
> 
> This patch flips the default of the second 'no reboot' flag, so that
> configured watchdog actions will be honoured out of the box for the
> 7.2 Q35 machine type onwards, if the guest enables use of the watchdog.
> 
> Signed-off-by: Daniel P. Berrangé 

Add Fixes: or some other reference to the BZs?  We have a few!

https://bugzilla.redhat.com/show_bug.cgi?id=2136889
https://bugzilla.redhat.com/show_bug.cgi?id=2080207
https://bugzilla.redhat.com/show_bug.cgi?id=2137346 (libvirt)

>  hw/i386/pc.c   | 4 +++-
>  hw/isa/lpc_ich9.c  | 2 +-
>  tests/qtest/tco-test.c | 2 +-
>  3 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 3e86083db3..e814f62fc6 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -108,7 +108,9 @@
>  { "qemu64-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },\
>  { "athlon-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },
>  
> -GlobalProperty pc_compat_7_1[] = {};
> +GlobalProperty pc_compat_7_1[] = {
> +{ "ICH9-LPC", "noreboot", "true" },
> +};
>  const size_t pc_compat_7_1_len = G_N_ELEMENTS(pc_compat_7_1);
>  
>  GlobalProperty pc_compat_7_0[] = {};
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index 66062a344c..f9ce2ee1dc 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -789,7 +789,7 @@ static const VMStateDescription vmstate_ich9_lpc = {
>  };
>  
>  static Property ich9_lpc_properties[] = {
> -DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, true),
> +DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, false),
>  DEFINE_PROP_BOOL("smm-compat", ICH9LPCState, pm.smm_compat, false),
>  DEFINE_PROP_BIT64("x-smi-broadcast", ICH9LPCState, smi_host_features,
>ICH9_LPC_SMI_F_BROADCAST_BIT, true),
> diff --git a/tests/qtest/tco-test.c b/tests/qtest/tco-test.c
> index 254f735370..caabcac6e5 100644
> --- a/tests/qtest/tco-test.c
> +++ b/tests/qtest/tco-test.c
> @@ -60,7 +60,7 @@ static void test_init(TestData *d)
>  QTestState *qs;
>  
>  qs = qtest_initf("-machine q35 %s %s",
> - d->noreboot ? "" : "-global ICH9-LPC.noreboot=false",
> + d->noreboot ? "-global ICH9-LPC.noreboot=true" : "",
>   !d->args ? "" : d->args);
>  qtest_irq_intercept_in(qs, "ioapic");

Acked-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH 2/4] hw/isa: add trace events for ICH9 LPC chip config access

2022-10-31 Thread Richard W.M. Jones
On Mon, Oct 31, 2022 at 01:19:32PM +, Daniel P. Berrangé wrote:
> These tracepoints aid in understanding and debugging the guest drivers
> for the TCO watchdog.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  hw/isa/lpc_ich9.c   | 3 +++
>  hw/isa/trace-events | 4 
>  2 files changed, 7 insertions(+)
> 
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index 4553b5925b..66062a344c 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -51,6 +51,7 @@
>  #include "hw/nvram/fw_cfg.h"
>  #include "qemu/cutils.h"
>  #include "hw/acpi/acpi_aml_interface.h"
> +#include "trace.h"
>  
>  
> /*/
>  /* ICH9 LPC PCI to ISA bridge */
> @@ -161,6 +162,7 @@ static void ich9_cc_write(void *opaque, hwaddr addr,
>  {
>  ICH9LPCState *lpc = (ICH9LPCState *)opaque;
>  
> +trace_ich9_cc_write(addr, val, len);
>  ich9_cc_addr_len(, );
>  memcpy(lpc->chip_config + addr, , len);
>  pci_bus_fire_intx_routing_notifier(pci_get_bus(>d));
> @@ -176,6 +178,7 @@ static uint64_t ich9_cc_read(void *opaque, hwaddr addr,
>  uint32_t val = 0;
>  ich9_cc_addr_len(, );
>  memcpy(, lpc->chip_config + addr, len);
> +trace_ich9_cc_read(addr, val, len);
>  return val;
>  }
>  
> diff --git a/hw/isa/trace-events b/hw/isa/trace-events
> index b8f877e1ed..c4567a9b47 100644
> --- a/hw/isa/trace-events
> +++ b/hw/isa/trace-events
> @@ -21,3 +21,7 @@ via_pm_io_read(uint32_t addr, uint32_t val, int len) "addr 
> 0x%x val 0x%x len 0x%
>  via_pm_io_write(uint32_t addr, uint32_t val, int len) "addr 0x%x val 0x%x 
> len 0x%x"
>  via_superio_read(uint8_t addr, uint8_t val) "addr 0x%x val 0x%x"
>  via_superio_write(uint8_t addr, uint32_t val) "addr 0x%x val 0x%x"
> +
> +# lpc_ich9.c
> +ich9_cc_write(uint64_t addr, uint64_t val, unsigned len) "addr=0x%"PRIx64 " 
> val=0x%"PRIx64 " len=%u"
> +ich9_cc_read(uint64_t addr, uint64_t val, unsigned len) "addr=0x%"PRIx64 " 
> val=0x%"PRIx64 " len=%u"

Reviewed-by: Richard W.M. Jones 

I can't help thinking that the trace-events file ought to be generated
from the source code ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




Re: [PATCH 3/4] hw/watchdog: add trace events for watchdog action handling

2022-10-31 Thread Richard W.M. Jones
On Mon, Oct 31, 2022 at 01:19:33PM +, Daniel P. Berrangé wrote:
> The tracepoints aid in debugging the triggering of watchdog devices.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  hw/watchdog/trace-events | 4 
>  hw/watchdog/watchdog.c   | 4 
>  2 files changed, 8 insertions(+)
> 
> diff --git a/hw/watchdog/trace-events b/hw/watchdog/trace-events
> index 89ccbcfdfd..fc1d048702 100644
> --- a/hw/watchdog/trace-events
> +++ b/hw/watchdog/trace-events
> @@ -16,3 +16,7 @@ spapr_watchdog_stop(uint64_t num, uint64_t ret) "num=%" 
> PRIu64 " ret=%" PRId64
>  spapr_watchdog_query(uint64_t caps) "caps=0x%" PRIx64
>  spapr_watchdog_query_lpm(uint64_t caps) "caps=0x%" PRIx64
>  spapr_watchdog_expired(uint64_t num, unsigned action) "num=%" PRIu64 " 
> action=%u"
> +
> +# watchdog.c
> +watchdog_perform_action(unsigned int action) "action=%d"
> +watchdog_set_action(unsigned int action) "action=%d"
> diff --git a/hw/watchdog/watchdog.c b/hw/watchdog/watchdog.c
> index 6c082a3263..955046161b 100644
> --- a/hw/watchdog/watchdog.c
> +++ b/hw/watchdog/watchdog.c
> @@ -30,6 +30,7 @@
>  #include "sysemu/watchdog.h"
>  #include "hw/nmi.h"
>  #include "qemu/help_option.h"
> +#include "trace.h"
>  
>  static WatchdogAction watchdog_action = WATCHDOG_ACTION_RESET;
>  
> @@ -43,6 +44,8 @@ WatchdogAction get_watchdog_action(void)
>   */
>  void watchdog_perform_action(void)
>  {
> +trace_watchdog_perform_action(watchdog_action);
> +
>  switch (watchdog_action) {
>  case WATCHDOG_ACTION_RESET: /* same as 'system_reset' in monitor */
>  qapi_event_send_watchdog(WATCHDOG_ACTION_RESET);
> @@ -89,4 +92,5 @@ void watchdog_perform_action(void)
>  void qmp_watchdog_set_action(WatchdogAction action, Error **errp)
>  {
>  watchdog_action = action;
> +trace_watchdog_set_action(watchdog_action);
>  }

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: [PATCH 1/4] hw/acpi: add trace events for TCO watchdog register access

2022-10-31 Thread Richard W.M. Jones
On Mon, Oct 31, 2022 at 01:19:31PM +, Daniel P. Berrangé wrote:
> These tracepoints aid in understanding and debugging the guest drivers
> for the TCO watchdog.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  hw/acpi/tco.c| 41 -
>  hw/acpi/trace-events |  2 ++
>  2 files changed, 30 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/acpi/tco.c b/hw/acpi/tco.c
> index 4783721e4e..9ebd3e5e64 100644
> --- a/hw/acpi/tco.c
> +++ b/hw/acpi/tco.c
> @@ -86,6 +86,7 @@ static inline int can_start_tco_timer(TCOIORegs *tr)
>  static uint32_t tco_ioport_readw(TCOIORegs *tr, uint32_t addr)
>  {
>  uint16_t rld;
> +uint32_t ret = 0;
>  
>  switch (addr) {
>  case TCO_RLD:
> @@ -96,35 +97,49 @@ static uint32_t tco_ioport_readw(TCOIORegs *tr, uint32_t 
> addr)
>  } else {
>  rld = tr->tco.rld;
>  }
> -return rld;
> +ret = rld;
> +break;
>  case TCO_DAT_IN:
> -return tr->tco.din;
> +ret = tr->tco.din;
> +break;
>  case TCO_DAT_OUT:
> -return tr->tco.dout;
> +ret = tr->tco.dout;
> +break;
>  case TCO1_STS:
> -return tr->tco.sts1;
> +ret = tr->tco.sts1;
> +break;
>  case TCO2_STS:
> -return tr->tco.sts2;
> +ret = tr->tco.sts2;
> +break;
>  case TCO1_CNT:
> -return tr->tco.cnt1;
> +ret = tr->tco.cnt1;
> +break;
>  case TCO2_CNT:
> -return tr->tco.cnt2;
> +ret = tr->tco.cnt2;
> +break;
>  case TCO_MESSAGE1:
> -return tr->tco.msg1;
> +ret = tr->tco.msg1;
> +break;
>  case TCO_MESSAGE2:
> -return tr->tco.msg2;
> +ret = tr->tco.msg2;
> +break;
>  case TCO_WDCNT:
> -return tr->tco.wdcnt;
> +ret = tr->tco.wdcnt;
> +break;
>  case TCO_TMR:
> -return tr->tco.tmr;
> +ret = tr->tco.tmr;
> +break;
>  case SW_IRQ_GEN:
> -return tr->sw_irq_gen;
> +ret = tr->sw_irq_gen;
> +break;
>  }
> -return 0;
> +trace_tco_io_read(addr, ret);
> +return ret;
>  }
>  
>  static void tco_ioport_writew(TCOIORegs *tr, uint32_t addr, uint32_t val)
>  {
> +trace_tco_io_write(addr, val);
>  switch (addr) {
>  case TCO_RLD:
>  tr->timeouts_no = 0;
> diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
> index eb60b04f9b..78e0a8670e 100644
> --- a/hw/acpi/trace-events
> +++ b/hw/acpi/trace-events
> @@ -55,6 +55,8 @@ piix4_gpe_writeb(uint64_t addr, unsigned width, uint64_t 
> val) "addr: 0x%" PRIx64
>  # tco.c
>  tco_timer_reload(int ticks, int msec) "ticks=%d (%d ms)"
>  tco_timer_expired(int timeouts_no, bool strap, bool no_reboot) 
> "timeouts_no=%d no_reboot=%d/%d"
> +tco_io_write(uint64_t addr, uint32_t val) "addr=0x%" PRIx64 " val=0x%" PRIx32
> +tco_io_read(uint64_t addr, uint32_t val) "addr=0x%" PRIx64 " val=0x%" PRIx32
>  
>  # erst.c
>  acpi_erst_reg_write(uint64_t addr, uint64_t val, unsigned size) "addr: 
> 0x%04" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: ublk-qcow2: ublk-qcow2 is available

2022-10-06 Thread Richard W.M. Jones
On Tue, Oct 04, 2022 at 09:53:32AM -0400, Stefan Hajnoczi wrote:
> qemu-nbd doesn't use io_uring to handle the backend IO,

Would this be fixed by your (not yet upstream) libblkio driver for
qemu?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: [PATCH 01/11] crypto: sanity check that LUKS header strings are NUL-terminated

2022-09-06 Thread Richard W.M. Jones
On Tue, Sep 06, 2022 at 09:41:37AM +0100, Daniel P. Berrangé wrote:
> The LUKS spec requires that header strings are NUL-terminated, and our
> code relies on that. Protect against maliciously crafted headers by
> adding validation.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  crypto/block-luks.c | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/crypto/block-luks.c b/crypto/block-luks.c
> index f62be6836b..27d1b34c1d 100644
> --- a/crypto/block-luks.c
> +++ b/crypto/block-luks.c
> @@ -554,6 +554,24 @@ qcrypto_block_luks_check_header(const QCryptoBlockLUKS 
> *luks, Error **errp)
>  return -1;
>  }
>  
> +if (!memchr(luks->header.cipher_name, '\0',
> +sizeof(luks->header.cipher_name))) {
> +error_setg(errp, "LUKS header cipher name is not NUL terminated");
> +return -1;
> +}
> +
> +if (!memchr(luks->header.cipher_mode, '\0',
> +sizeof(luks->header.cipher_mode))) {
> +error_setg(errp, "LUKS header cipher mode is not NUL terminated");
> +return -1;
> +}
> +
> +if (!memchr(luks->header.hash_spec, '\0',
> +sizeof(luks->header.hash_spec))) {
> +error_setg(errp, "LUKS header hash spec is not NUL terminated");
> +return -1;
> +}
> +
>  /* Check all keyslots for corruption  */
>  for (i = 0 ; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS ; i++) {

I think this was the error I originally wrote to you about, and I
think it's the most important fix because non-terminated strings seem
(possibly) exploitable.

FWIW nbdkit does this which is slightly different:

  char cipher_name[33], cipher_mode[33], hash_spec[33];

  /* Copy the header fields locally and ensure they are \0 terminated. */
  memcpy (cipher_name, h->phdr.cipher_name, 32);
  cipher_name[32] = 0;
  memcpy (cipher_mode, h->phdr.cipher_mode, 32);
  cipher_mode[32] = 0;
  memcpy (hash_spec, h->phdr.hash_spec, 32);
  hash_spec[32] = 0;
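The two termination strategies discussed here can be contrasted in a small standalone sketch. This is an illustration only: the field size 32 mirrors the snippets above, but these are not the real LUKS header definitions or the actual qemu/nbdkit function names.

```c
/* Sketch contrasting the two NUL-termination strategies discussed
 * above. FIELD_LEN mirrors the 32-byte fields in the snippets, but
 * this is an illustration, not the real LUKS header layout. */
#include <stdbool.h>
#include <string.h>

#define FIELD_LEN 32

/* qemu-style: validate in place, reject headers with no NUL present. */
static bool field_is_terminated(const char *field)
{
    return memchr(field, '\0', FIELD_LEN) != NULL;
}

/* nbdkit-style: copy locally and force termination, accepting any input. */
static void field_copy_terminated(char dst[FIELD_LEN + 1], const char *src)
{
    memcpy(dst, src, FIELD_LEN);
    dst[FIELD_LEN] = '\0';
}
```

The trade-off: the qemu approach rejects malformed headers outright, while the nbdkit approach tolerates them at the cost of silently truncating a 32-byte unterminated field.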

Anyway the change above looks good so:

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH 04/11] crypto: validate that LUKS payload doesn't overlap with header

2022-09-06 Thread Richard W.M. Jones
On Tue, Sep 06, 2022 at 09:41:40AM +0100, Daniel P. Berrangé wrote:
> We already validate that LUKS keyslots don't overlap with the
> header, or with each other. This closes the remain hole in

remain -> remaining

> validation of LUKS file regions.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  crypto/block-luks.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/crypto/block-luks.c b/crypto/block-luks.c
> index 6ef9a89ffa..f22bc63e54 100644
> --- a/crypto/block-luks.c
> +++ b/crypto/block-luks.c
> @@ -572,6 +572,13 @@ qcrypto_block_luks_check_header(const QCryptoBlockLUKS 
> *luks, Error **errp)
>  return -1;
>  }
>  
> +if (luks->header.payload_offset_sector <
> +DIV_ROUND_UP(QCRYPTO_BLOCK_LUKS_KEY_SLOT_OFFSET,
> + QCRYPTO_BLOCK_LUKS_SECTOR_SIZE)) {
> +error_setg(errp, "LUKS payload is overlapping with the header");
> +return -1;
> +}
> +
>  /* Check all keyslots for corruption  */
>  for (i = 0 ; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS ; i++) {
>  
> -- 
> 2.37.2

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH 00/11] crypto: improve robustness of LUKS metadata validation

2022-09-06 Thread Richard W.M. Jones
On Tue, Sep 06, 2022 at 09:41:36AM +0100, Daniel P. Berrangé wrote:
> Richard pointed out that we didn't do all that much validation against
> bad parameters in the LUKS header metadata. This series adds a bunch
> more validation checks along with unit tests to demonstrate they are
> having effect against maliciously crafted headers.
> 
> Daniel P. Berrangé (11):
>   crypto: sanity check that LUKS header strings are NUL-terminated
>   crypto: enforce that LUKS stripes is always a fixed value
>   crypto: enforce that key material doesn't overlap with LUKS header
>   crypto: validate that LUKS payload doesn't overlap with header
>   crypto: strengthen the check for key slots overlapping with LUKS
> header
>   crypto: check that LUKS PBKDF2 iterations count is non-zero
>   crypto: split LUKS header definitions off into file
>   crypto: split off helpers for converting LUKS header endianess
>   crypto: quote algorithm names in error messages
>   crypto: ensure LUKS tests run with GNUTLS crypto provider
>   crypto: add test cases for many malformed LUKS header scenarios
> 
>  crypto/block-luks-priv.h   | 143 
>  crypto/block-luks.c| 228 +++--
>  tests/unit/test-crypto-block.c | 302 -
>  3 files changed, 542 insertions(+), 131 deletions(-)
>  create mode 100644 crypto/block-luks-priv.h

I think there is one typo in a commit message, but for the series:

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH 06/11] crypto: check that LUKS PBKDF2 iterations count is non-zero

2022-09-06 Thread Richard W.M. Jones
On Tue, Sep 06, 2022 at 09:41:42AM +0100, Daniel P. Berrangé wrote:
> Both the master key and key slot passphrases are run through the PBKDF2
> algorithm. The iterations count is expected to be generally very large
> (many 10's or 100's of 1000s). It is hard to define a low level cutoff,
> but we can certainly say that iterations count should be non-zero. A
> zero count likely indicates an initialization mistake so reject it.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  crypto/block-luks.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/crypto/block-luks.c b/crypto/block-luks.c
> index e6ee8506b2..254490c256 100644
> --- a/crypto/block-luks.c
> +++ b/crypto/block-luks.c
> @@ -579,6 +579,11 @@ qcrypto_block_luks_check_header(const QCryptoBlockLUKS 
> *luks, Error **errp)
>  return -1;
>  }
>  
> +if (luks->header.master_key_iterations == 0) {
> +error_setg(errp, "LUKS key iteration count is zero");
> +return -1;
> +}
> +
>  /* Check all keyslots for corruption  */
>  for (i = 0 ; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS ; i++) {
>  
> @@ -602,6 +607,12 @@ qcrypto_block_luks_check_header(const QCryptoBlockLUKS 
> *luks, Error **errp)
>  return -1;
>  }
>  
> +if (slot1->active == QCRYPTO_BLOCK_LUKS_KEY_SLOT_ENABLED &&
> +slot1->iterations == 0) {
> +error_setg(errp, "Keyslot %zu iteration count is zero", i);
> +return -1;
> +}
> +
>  if (start1 < DIV_ROUND_UP(QCRYPTO_BLOCK_LUKS_KEY_SLOT_OFFSET,
>QCRYPTO_BLOCK_LUKS_SECTOR_SIZE)) {
>  error_setg(errp,

Equivalent checks were missing in nbdkit - I've added them.

I wonder if there's a problem that a very large number here would
cause long delays opening the device.  In general it's not very clear
to me if the aim is to prevent malicious LUKS input, or if we're just
trying to sanity check the device hasn't been corrupted or improperly
prepared.  The test above is the latter, I think.
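The concern about very large iteration counts causing long open delays could be addressed with a bounded sanity check. A hypothetical sketch, not the actual QEMU code — the upper cutoff below is an arbitrary illustration, since the spec defines no such limit:

```c
/* Hypothetical sketch (not the actual QEMU check): reject absurd
 * PBKDF2 iteration counts to bound time spent opening a device.
 * ITER_MAX is an arbitrary illustrative cutoff, not a spec value. */
#include <stdint.h>
#include <stdbool.h>

#define ITER_MAX (1U << 28)   /* arbitrary upper bound for illustration */

static bool iterations_sane(uint32_t iterations)
{
    return iterations > 0 && iterations < ITER_MAX;
}
```

Choosing any fixed cutoff is awkward precisely because a legitimately paranoid header may use a very high count, which is why the patch only rejects zero.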

Nevertheless as this is an improvement over the current situation:

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: [PATCH 02/11] crypto: enforce that LUKS stripes is always a fixed value

2022-09-06 Thread Richard W.M. Jones
On Tue, Sep 06, 2022 at 09:41:38AM +0100, Daniel P. Berrangé wrote:
> Although the LUKS stripes are encoded in the keyslot header and so
> potentially configurable, in practice the cryptsetup impl mandates
> this has the fixed value 4000. To avoid incompatibility apply the
> same enforcement in QEMU too. This also caps the memory usage for
> key material when QEMU tries to open a LUKS volume.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  crypto/block-luks.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/crypto/block-luks.c b/crypto/block-luks.c
> index 27d1b34c1d..81744e2a8e 100644
> --- a/crypto/block-luks.c
> +++ b/crypto/block-luks.c
> @@ -582,8 +582,9 @@ qcrypto_block_luks_check_header(const QCryptoBlockLUKS 
> *luks, Error **errp)
> header_sectors,
> slot1->stripes);
>  
> -if (slot1->stripes == 0) {
> -error_setg(errp, "Keyslot %zu is corrupted (stripes == 0)", i);
> +if (slot1->stripes != QCRYPTO_BLOCK_LUKS_STRIPES) {
> +error_setg(errp, "Keyslot %zu is corrupted (stripes %d != %d)",
> +   i, slot1->stripes, QCRYPTO_BLOCK_LUKS_STRIPES);
>  return -1;
>  }

In nbdkit I decided to just check that this number is not < 1, but I
agree that the only important implementation (the kernel) always fixes
this at 4000 (cryptsetup.git/lib/keymanage.c), so:

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: [PATCH] linux-user: use 'max' instead of 'qemu32' / 'qemu64' by default

2022-08-26 Thread Richard W.M. Jones
On Fri, Aug 26, 2022 at 12:39:00PM +0100, Daniel P. Berrangé wrote:
> The 'qemu64' CPU model implements the least featureful x86_64 CPU that's
> possible. Historically this hasn't been an issue since it was rare for
> OS distros to build with a higher mandatory CPU baseline.
> 
> With RHEL-9, however, the entire distro is built for the x86_64-v2 ABI
> baseline:
> 
>   
> https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level
> 
> It is likely that other distros may take similar steps in the not too
> distant future. For example, it has been suggested for Fedora on a
> number of occasions.
> 
> This new baseline is not compatible with the qemu64 CPU model though.
> While it is possible to pass a '-cpu xxx' flag to qemu-x86_64, the
> usage of QEMU doesn't always allow for this. For example, the args
> are typically controlled via binfmt rules that the user has no ability
> to change. This impacts users who are trying to use podman on aarch64
> platforms, to run containers with x86_64 content. There's no arg to
> podman that can be used to change the qemu-x86_64 args, and a non-root
> user of podman can not change binfmt rules without elevating privileges:
> 
>   https://github.com/containers/podman/issues/15456#issuecomment-1228210973
> 
> Changing to the 'max' CPU model gives 'qemu-x86_64' maximum
> compatibility with binaries it is likely to encounter in the wild,
> and not likely to have a significant downside for existing usage.
> 
> Most other architectures already use an 'any' CPU model, which is
> often mapped to 'max' (or similar) already, rather than the oldest
> possible CPU model.
> 
> For the sake of consistency the 'i386' architecture is also changed
> from using 'qemu32' to 'max'.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  linux-user/i386/target_elf.h   | 2 +-
>  linux-user/x86_64/target_elf.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/linux-user/i386/target_elf.h b/linux-user/i386/target_elf.h
> index 1c6142e7da..238a9aba73 100644
> --- a/linux-user/i386/target_elf.h
> +++ b/linux-user/i386/target_elf.h
> @@ -9,6 +9,6 @@
>  #define I386_TARGET_ELF_H
>  static inline const char *cpu_get_model(uint32_t eflags)
>  {
> -return "qemu32";
> +return "max";
>  }
>  #endif
> diff --git a/linux-user/x86_64/target_elf.h b/linux-user/x86_64/target_elf.h
> index 7b76a90de8..3f628f8d66 100644
> --- a/linux-user/x86_64/target_elf.h
> +++ b/linux-user/x86_64/target_elf.h
> @@ -9,6 +9,6 @@
>  #define X86_64_TARGET_ELF_H
>  static inline const char *cpu_get_model(uint32_t eflags)
>  {
> -return "qemu64";
> +return "max";
>  }
>  #endif

Can we be assured we won't ever hit this TCG bug that currently
affects -cpu max ?

https://gitlab.com/qemu-project/qemu/-/issues/1023

I'm going to guess we will be OK because qemu-user doesn't run a
kernel and therefore wouldn't normally touch %cr3.  Is there any other
situation?  (Of course it would be better all round if that glaring
bug could be fixed.)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: [RFC v2 02/10] Drop unused static function return values

2022-08-03 Thread Richard W.M. Jones
On Wed, Aug 03, 2022 at 01:25:34PM +0100, Peter Maydell wrote:
> On Wed, 3 Aug 2022 at 12:44, Daniel P. Berrangé  wrote:
> > Inconsistent return value checking is designed-in behaviour for
> > QEMU's current Error handling coding pattern with error_abort/fatal.
> 
> Yes; I habitually mark as false-positive Coverity reports about
> missing error checks where it has not noticed that the error
> handling is done via the errp pointer.

Presumably the advantage of having a qemu-specific static analyser is
it'll be able to ignore certain cases, eg. spotting if error_abort is
a parameter and allowing (requiring even?) the return value to be
ignored.

Coverity allows custom models too, but obviously that's all
proprietary software.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [RFC v2 02/10] Drop unused static function return values

2022-08-03 Thread Richard W.M. Jones
On Wed, Aug 03, 2022 at 12:07:19PM +0100, Alberto Faria wrote:
> On Wed, Aug 3, 2022 at 11:46 AM Dr. David Alan Gilbert
>  wrote:
> >
> > * Alberto Faria (afa...@redhat.com) wrote:
> > > Make non-void static functions whose return values are ignored by
> > > all callers return void instead.
> > >
> > > These functions were found by static-analyzer.py.
> > >
> > > Not all occurrences of this problem were fixed.
> > >
> > > Signed-off-by: Alberto Faria 
> >
> > 
> >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index e03f698a3c..4698080f96 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -175,7 +175,7 @@ static MigrationIncomingState *current_incoming;
> > >
> > >  static GSList *migration_blockers;
> > >
> > > -static bool migration_object_check(MigrationState *ms, Error **errp);
> > > +static void migration_object_check(MigrationState *ms, Error **errp);
> > >  static int migration_maybe_pause(MigrationState *s,
> > >   int *current_active_state,
> > >   int new_state);
> > > @@ -4485,15 +4485,15 @@ static void migration_instance_init(Object *obj)
> > >   * Return true if check pass, false otherwise. Error will be put
> > >   * inside errp if provided.
> > >   */
> > > -static bool migration_object_check(MigrationState *ms, Error **errp)
> > > +static void migration_object_check(MigrationState *ms, Error **errp)
> > >  {
> >
> > I'm not sure if this is a good change.
> > Where we have a function that returns an error via an Error ** it's
> > normal practice for us to return a bool to say whether it generated an
> > error.
> >
> > Now, in our case we only call it with error_fatal:
> >
> > migration_object_check(current_migration, &error_fatal);
> >
> > so the bool isn't used/checked.
> >
> > So I'm a bit conflicted:
> >
> >   a) Using error_fatal is the easiest way to handle this function
> >   b) Things taking Error ** normally do return a flag value
> >   c) But it's not used in this case.
> >
> > Hmm.
> 
> I guess this generalizes to the bigger question of whether a global
> "return-value-never-used" check makes sense and brings value. Maybe
> there are too many cases where it would be preferable to keep the
> return value for consistency? Maybe they're not that many and could be
> tagged with __attribute__((unused))?
> 
> But in this particular case, perhaps we could drop the Error **errp
> parameter and directly pass &error_fatal to migrate_params_check() and
> migrate_caps_check().

If it helps to think about this, Coverity checks for consistency.
Across the whole code base, is the return value of a function used or
ignored consistently.  You will see Coverity errors like:

  Error: CHECKED_RETURN (CWE-252): [#def37]
  libnbd-1.12.5/fuse/operations.c:180: check_return: Calling "nbd_poll" 
without checking return value (as is done elsewhere 5 out of 6 times).
  libnbd-1.12.5/examples/aio-connect-read.c:96: example_checked: Example 1: 
"nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == -1".
  libnbd-1.12.5/examples/aio-connect-read.c:128: example_checked: Example 
2: "nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == -1".
  libnbd-1.12.5/examples/strict-structured-reads.c:246: example_checked: 
Example 3: "nbd_poll(nbd, -1)" has its value checked in "nbd_poll(nbd, -1) == 
-1".
  libnbd-1.12.5/ocaml/nbd-c.c:2599: example_assign: Example 4: Assigning: 
"r" = return value from "nbd_poll(h, timeout)".
  libnbd-1.12.5/ocaml/nbd-c.c:2602: example_checked: Example 4 (cont.): "r" 
has its value checked in "r == -1".
  libnbd-1.12.5/python/methods.c:2806: example_assign: Example 5: 
Assigning: "ret" = return value from "nbd_poll(h, timeout)".
  libnbd-1.12.5/python/methods.c:2808: example_checked: Example 5 (cont.): 
"ret" has its value checked in "ret == -1".
  #  178|   /* Dispatch work while there are commands in flight. */
  #  179|   while (thread->in_flight > 0)
  #  180|->   nbd_poll (h, -1);
  #  181| }
  #  182|

What it's saying is that in this code base, nbd_poll's return value
was checked by the caller 5 out of 6 times, but ignored here.  (This
turned out to be a real bug which we fixed).

It seems like the check implemented in your patch is: If the return
value is used 0 times anywhere in the code base, change the return
value to 'void'.  Coverity would not flag this.

Maybe a consistent use check is better?
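Such a consistency heuristic can be sketched in a few lines of Python. This is a toy stand-in, not output from static-analyzer.py or Coverity: flag a call site that ignores a return value when most other call sites in the code base check it.

```python
# Hedged sketch of a Coverity-style CHECKED_RETURN heuristic.
# 'sites' maps a function name to a list of (location, checked) pairs,
# here assumed to come from some earlier call-site analysis pass.

def inconsistent_call_sites(sites, threshold=0.75):
    """Return locations where a return value is ignored even though
    the majority of call sites (>= threshold) check it."""
    flagged = []
    for func, uses in sites.items():
        checked = sum(1 for _, is_checked in uses if is_checked)
        # Need at least two call sites to speak of a majority pattern.
        if len(uses) >= 2 and checked / len(uses) >= threshold:
            flagged.extend(loc for loc, is_checked in uses if not is_checked)
    return flagged
```

On the nbd_poll example above (5 checked sites out of 6), this would flag only the single unchecked call, whereas a never-used check would stay silent.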

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




Re: [PATCH for 7.1] linux-user: fix compat with glibc >= 2.36 sys/mount.h

2022-08-02 Thread Richard W.M. Jones
On Tue, Aug 02, 2022 at 07:29:29PM +0100, Richard W.M. Jones wrote:
> Dan, which Fedora glibc package shows this problem?  I have
> glibc-2.35.9000-31.fc37.x86_64 and qemu compiled fine.  (Also nbdkit
> which includes linux/fs.h)

It would help if I enabled a *-linux-user target ...
Yes, I can reproduce this now and the patch fixes it, so:

Tested-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH for 7.1] linux-user: fix compat with glibc >= 2.36 sys/mount.h

2022-08-02 Thread Richard W.M. Jones
On Tue, Aug 02, 2022 at 12:41:34PM -0400, Daniel P. Berrangé wrote:
> The latest glibc 2.36 has extended sys/mount.h so that it
> defines the FSCONFIG_* enum constants. These are historically
> defined in linux/mount.h, and thus if you include both headers
> the compiler complains:
> 
> In file included from /usr/include/linux/fs.h:19,
>  from ../linux-user/syscall.c:98:
> /usr/include/linux/mount.h:95:6: error: redeclaration of 'enum 
> fsconfig_command'
>95 | enum fsconfig_command {
>   |  ^~~~
> In file included from ../linux-user/syscall.c:31:
> /usr/include/sys/mount.h:189:6: note: originally defined here
>   189 | enum fsconfig_command
>   |  ^~~~
> /usr/include/linux/mount.h:96:9: error: redeclaration of enumerator 
> 'FSCONFIG_SET_FLAG'
>96 | FSCONFIG_SET_FLAG   = 0,/* Set parameter, supplying 
> no value */
>   | ^
> /usr/include/sys/mount.h:191:3: note: previous definition of 
> 'FSCONFIG_SET_FLAG' with type 'enum fsconfig_command'
>   191 |   FSCONFIG_SET_FLAG   = 0,/* Set parameter, supplying no 
> value */
>   |   ^
> ...snip...
> 
> QEMU doesn't include linux/mount.h, but it does use
> linux/fs.h and thus gets linux/mount.h indirectly.
> 
> glibc acknowledges this problem but does not appear to
> be intending to fix it in the forseeable future, simply
> documenting it as a known incompatibility with no
> workaround:
> 
>   
> https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E
>   https://sourceware.org/glibc/wiki/Synchronizing_Headers
> 
> To address this requires either removing use of sys/mount.h
> or linux/fs.h, despite QEMU needing declarations from
> both.
> 
> This patch removes linux/fs.h, meaning we have to define
> various FS_IOC constants that are now unavailable.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  linux-user/syscall.c | 18 ++
>  meson.build  |  2 ++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index b27a6552aa..52d178afe7 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -95,7 +95,25 @@
>  #include 
>  #include 
>  #include 
> +
> +#ifdef HAVE_SYS_MOUNT_FSCONFIG
> +/*
> + * glibc >= 2.36 linux/mount.h conflicts with sys/mount.h,
> + * which in turn prevents use of linux/fs.h. So we have to
> + * define the constants ourselves for now.
> + */
> +#define FS_IOC_GETFLAGS_IOR('f', 1, long)
> +#define FS_IOC_SETFLAGS_IOW('f', 2, long)
> +#define FS_IOC_GETVERSION  _IOR('v', 1, long)
> +#define FS_IOC_SETVERSION  _IOW('v', 2, long)
> +#define FS_IOC_FIEMAP  _IOWR('f', 11, struct fiemap)
> +#define FS_IOC32_GETFLAGS  _IOR('f', 1, int)
> +#define FS_IOC32_SETFLAGS  _IOW('f', 2, int)
> +#define FS_IOC32_GETVERSION_IOR('v', 1, int)
> +#define FS_IOC32_SETVERSION_IOW('v', 2, int)
> +#else
>  #include 
> +#endif
>  #include 
>  #if defined(CONFIG_FIEMAP)
>  #include 
> diff --git a/meson.build b/meson.build
> index 294e9a8f32..30a380752c 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1963,6 +1963,8 @@ config_host_data.set('HAVE_OPTRESET',
>   cc.has_header_symbol('getopt.h', 'optreset'))
>  config_host_data.set('HAVE_IPPROTO_MPTCP',
>   cc.has_header_symbol('netinet/in.h', 'IPPROTO_MPTCP'))
> +config_host_data.set('HAVE_SYS_MOUNT_FSCONFIG',
> + cc.has_header_symbol('sys/mount.h', 
> 'FSCONFIG_SET_FLAG'))
>  
>  # has_member
>  config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',

Dan, which Fedora glibc package shows this problem?  I have
glibc-2.35.9000-31.fc37.x86_64 and qemu compiled fine.  (Also nbdkit
which includes linux/fs.h)

I see various other build failures in Koji, but not this one as far as
I can tell:

https://koji.fedoraproject.org/koji/packageinfo?packageID=3685

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit




Re: [PATCH v3 3/3] nbd/server: Allow MULTI_CONN for shared writable exports

2022-03-16 Thread Richard W.M. Jones
On Wed, Mar 16, 2022 at 04:15:53PM -0500, Eric Blake wrote:
> On Wed, Mar 16, 2022 at 04:07:21PM -0500, Eric Blake wrote:
> > On Tue, Mar 15, 2022 at 01:14:41PM +, Richard W.M. Jones wrote:
> > > The patches seem OK to me, but I don't really know enough about the
> > > internals of qemu-nbd to give a line-by-line review.  I did however
> > > build and test qemu-nbd with the patches:
> > > 
> > >   $ ./build/qemu-nbd /var/tmp/test.qcow2 
> > >   $ nbdinfo nbd://localhost
> > >   ...
> > >   can_multi_conn: false
> > > 
> > > 
> > >   $ ./build/qemu-nbd -e 2 /var/tmp/test.qcow2 
> > >   $ nbdinfo nbd://localhost
> > >   ...
> > >   can_multi_conn: false
> > > 
> > > ^^^ Is this expected?  It also happens with -e 0.
> > 
> > Yes, because qemu-nbd defaults to read-write connections, but to be
> > conservative, this patch defaults '-m auto' to NOT advertise
> > multi-conn for read-write; you need to be explicit:
> > 
> > > 
> > > 
> > >   $ ./build/qemu-nbd -e 2 -m on /var/tmp/test.qcow2 
> > >   $ nbdinfo nbd://localhost
> > >   ...
> > >   can_multi_conn: true
> > 
> > either with '-m on' as you did here, or with
> > 
> > build/qemu-nbd -r -e 2 /var/tmp/test.qcow2
> > 
> > where the '-m auto' default exposes multi-conn for a readonly client.
> 
> I hit send prematurely - one more thought I wanted to make clear.  For
> _this_ series, the behavior of '-m auto' depends solely on readonly
> vs. read-write; that being the most conservative choice of preserving
> pre-existing qemu-nbd behavior (that is, if you don't use -m, the only
> change in behavior should be the fact that --help output now lists
> -m).
> 
> But in future versions of qemu, we might be able to improve '-m auto'
> to also take into consideration whether the backing image of a
> read-write device is known-cache-consistent (such as a local file
> system), where we can then manage to default to advertising multi-conn
> for those writable images, while still avoiding the advertisement when
> exposing other devices such as network-mounted devices where we are
> less confident on whether there are sufficient cache consistency
> guarantees.  Use of explicit '-m off' or '-m on' gives guaranteed
> results no matter the qemu version, so it is only explicit (or
> implied) '-m auto' where we might get smarter defaults over time.
> 
> Thus testing whether advertising multi-conn makes a difference in
> other tools like nbdcopy thus requires checking if qemu-nbd is new
> enough to support -m, and then being explicit that we are opting in to
> using it.

I see.  The commit message of patch 3 confused me because
superficially it seems to say that multi-conn is safe (and therefore I
assumed safe == enabled) when backed by a local filesystem.  Could the
commit message be improved to list the default for common combinations
of flags?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




Re: [PATCH 07/27] Replace GCC_FMT_ATTR with G_GNUC_PRINTF

2022-03-16 Thread Richard W.M. Jones
On Wed, Mar 16, 2022 at 01:52:48PM +0400, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> One less qemu-specific macro. It also helps to make some headers/units
> only depend on glib, and thus moved in standalone projects eventually.
> 
> Signed-off-by: Marc-André Lureau 

I checked the replacements and couldn't spot any differences (I assume
you used a 'perl -pi.bak -e s///' or similar rather than doing it by
hand?).  Also I checked the macro definitions in
include/qemu/compiler.h vs /usr/include/glib-2.0/glib/gmacros.h and
they're pretty much identical.  I even learned about gnu_printf.  So:

Reviewed-by: Richard W.M. Jones 

Shouldn't there be a hunk which removes the definition of GCC_FMT_ATTR
from include/qemu/compiler.h?  Maybe that's in another place in the
patch series.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




Re: [PATCH v3 3/3] nbd/server: Allow MULTI_CONN for shared writable exports

2022-03-15 Thread Richard W.M. Jones
The patches seem OK to me, but I don't really know enough about the
internals of qemu-nbd to give a line-by-line review.  I did however
build and test qemu-nbd with the patches:

  $ ./build/qemu-nbd /var/tmp/test.qcow2 
  $ nbdinfo nbd://localhost
  ...
can_multi_conn: false


  $ ./build/qemu-nbd -e 2 /var/tmp/test.qcow2 
  $ nbdinfo nbd://localhost
  ...
can_multi_conn: false

^^^ Is this expected?  It also happens with -e 0.


  $ ./build/qemu-nbd -e 2 -m on /var/tmp/test.qcow2 
  $ nbdinfo nbd://localhost
  ...
can_multi_conn: true


  $ ./build/qemu-nbd -e 2 -m off /var/tmp/test.qcow2 
  $ nbdinfo nbd://localhost
  ...
can_multi_conn: false


Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH v2] nbd/server: Allow MULTI_CONN for shared writable exports

2022-02-16 Thread Richard W.M. Jones
On Tue, Feb 15, 2022 at 05:24:14PM -0600, Eric Blake wrote:
> Oh. The QMP command (which is immediately visible through
> nbd-server-add/block-storage-add to qemu and qemu-storage-daemon)
> gains "multi-conn":"on", but you may be right that qemu-nbd would want
> a command line option (either that, or we accellerate our plans that
> qsd should replace qemu-nbd).

I really hope there will always be something called "qemu-nbd"
that acts like qemu-nbd.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




Re: [PATCH v2] Deprecate C virtiofsd

2022-02-10 Thread Richard W.M. Jones
On Thu, Feb 10, 2022 at 05:47:14PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> There's a nice new Rust implementation out there; recommend people
> do new work on that.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  docs/about/deprecated.rst | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 47a594a3b6..3c73d22729 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -454,3 +454,20 @@ nanoMIPS ISA
>  
>  The ``nanoMIPS`` ISA has never been upstreamed to any compiler toolchain.
>  As it is hard to generate binaries for it, declare it deprecated.
> +
> +Tools
> +-----
> +
> +virtiofsd
> +'''''''''
> +
> +There is a new Rust implementation of ``virtiofsd`` at
> +``https://gitlab.com/virtio-fs/virtiofsd``;
> +since this is now marked stable, new development should be done on that
> +rather than the existing C version in the QEMU tree.
> +The C version will still accept fixes and patches that
> +are already in development for the moment, but will eventually
> +be deleted from this tree.
> +New deployments should use the Rust version, and existing systems
> +should consider moving to it.  The command line and feature set
> +is very close and moving should be simple.

I'm not qualified to say if the Rust impl is complete enough
to replace the C version, so I won't add a reviewed tag.

However I want to say that from the point of view of downstream
packagers of qemu -- especially Fedora -- it would be helpful if we
could direct both upstream development effort and downstream packaging
into just the one virtiofsd.  So I agree in principle with this.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: Block alignment of qcow2 compress driver

2022-01-28 Thread Richard W.M. Jones


On Fri, Jan 28, 2022 at 01:30:43PM +0100, Hanna Reitz wrote:
> On 28.01.22 13:18, Richard W.M. Jones wrote:
> >On Fri, Jan 28, 2022 at 12:57:47PM +0100, Hanna Reitz wrote:
> >>On 28.01.22 12:48, Richard W.M. Jones wrote:
> >>>On Fri, Jan 28, 2022 at 12:39:11PM +0100, Hanna Reitz wrote:
> >>>>So I actually don’t know why it works for you.  OTOH, I don’t
> >>>>understand why the block size affects you over NBD, because I would
> >>>>have expected qemu to internally auto-align requests when they are
> >>>>not aligned (in bdrv_co_pwritev_part()).
> >>>I checked it again and my hack definitely fixes nbdcopy.  But maybe
> >>>that's expected if qemu-nbd is auto-aligning requests?  (I'm only
> >>>accessing the block layer through qemu-nbd, not with qemu-io)
> >>It’s not just qemu-io, with your diff[3] I get the same EINVAL over
> >>NBD, too:
> >>
> >>$ ./qemu-img create -f qcow2 test.qcow2 64M
> >>Formatting 'test.qcow2', fmt=qcow2 cluster_size=65536
> >>extended_l2=off compression_type=zlib size=67108864
> >>lazy_refcounts=off refcount_bits=16
> >>
> >>$ ./qemu-nbd --fork --image-opts \
> >>driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=test.qcow2
> >>
> >>$ ./qemu-io -c 'write 0 32k' -f raw nbd://localhost
> >>write failed: Invalid argument
> >Strange - is that error being generated by qemu's nbd client code?
> 
> It’s generated by qcow2, namely the exact place I pointed out (as
> [1]).  I can see that when I put an fprintf there.

I can't reproduce this behaviour (with qemu @ cfe63e46be0a, the head
of git at time of writing).  I wonder if I'm doing something wrong?

  ++ /home/rjones/d/qemu/build/qemu-img create -f qcow2 output.qcow2 64k
  Formatting 'output.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=65536 lazy_refcounts=off refcount_bits=16
  ++ sleep 1
  ++ /home/rjones/d/qemu/build/qemu-nbd -t --image-opts 
driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2
  ++ /home/rjones/d/qemu/build/qemu-io -c 'write 0 32k' -f raw nbd://localhost
  wrote 32768/32768 bytes at offset 0
  32 KiB, 1 ops; 00.02 sec (1.547 MiB/sec and 49.5067 ops/sec)

> >I know I said I didn't care about performance (in this case), but is
> >there in fact a penalty to sending unaligned requests to the qcow2
> >layer?  Or perhaps it cannot compress them?
> 
> In qcow2, only the whole cluster can be compressed, so writing
> compressed data means having to write the whole cluster.  qcow2
> could implement the padding by itself, but we decided to just leave
> the burden of only writing full clusters (with the COMPRESSED write
> flag) on the callers.
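
The padding the quoted paragraph describes — the caller (or the block
layer's auto-align RMW) widening a sub-cluster write to whole clusters —
can be sketched as follows.  This is an illustration only, not qemu
code; `pad_to_cluster` and `read_existing` are invented names:

```python
CLUSTER = 65536  # qcow2 default cluster size

def pad_to_cluster(offset, data, read_existing, cluster=CLUSTER):
    """Widen a write to cluster boundaries, filling the head and
    tail from the existing contents (read-modify-write)."""
    start = (offset // cluster) * cluster
    end = ((offset + len(data) + cluster - 1) // cluster) * cluster
    buf = bytearray(read_existing(start, end - start))
    buf[offset - start : offset - start + len(data)] = data
    return start, bytes(buf)

# Toy backing store of zeroes standing in for the existing image:
backing = bytes(2 * CLUSTER)
start, buf = pad_to_cluster(100, b"hello", lambda o, n: backing[o:o+n])
assert start == 0 and len(buf) == CLUSTER   # whole-cluster write
assert buf[100:105] == b"hello"
```

Every padded write covers whole clusters, which is exactly the shape the
COMPRESSED write flag demands.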

I feel like this may be a bug in what qemu-nbd advertises.  Currently
it is:

$ qemu-nbd -t --image-opts 
driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2
 &
[2] 2068900
$ nbdinfo nbd://localhost
protocol: newstyle-fixed without TLS
export="":
export-size: 65536 (64K)
uri: nbd://localhost:10809/
contexts:
base:allocation
is_rotational: false
is_read_only: false
can_cache: true
can_df: true
can_fast_zero: true
can_flush: true
can_fua: true
can_multi_conn: false
can_trim: true
can_zero: true
block_size_minimum: 65536   <---
block_size_preferred: 65536
block_size_maximum: 33554432

block_size_preferred is (rightly) set to 64K, as that's what the
compress + qcow2 combination prefers.

But block_size_minimum sounds as if it should be 512 or 1, if qemu-nbd
is able to reassemble smaller than preferred requests, even if they
are suboptimal.
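
For illustration, the alignment check the advertised minimum implies is
just modular arithmetic (this is not libnbd or qemu code):

```python
def request_aligned(offset, length, min_block):
    """Would a request satisfy the advertised block_size_minimum?"""
    return offset % min_block == 0 and length % min_block == 0

# The 32k write sent above, against the advertised minimum of 64K:
assert not request_aligned(0, 32768, 65536)
# The same request would pass if the minimum really were 512:
assert request_aligned(0, 32768, 512)
```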

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: Block alignment of qcow2 compress driver

2022-01-28 Thread Richard W.M. Jones
On Fri, Jan 28, 2022 at 01:30:53PM +, Richard W.M. Jones wrote:
> I feel like this may be a bug in what qemu-nbd advertises.  Currently
> it is:

Ignore this email, see other reply.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




Re: Block alignment of qcow2 compress driver

2022-01-28 Thread Richard W.M. Jones
On Fri, Jan 28, 2022 at 02:19:44PM +0100, Kevin Wolf wrote:
> Am 28.01.2022 um 13:30 hat Hanna Reitz geschrieben:
> > > > I just changed that line of code [2], as shown in [4].  I suppose
> > > > the better thing to do would be to have an option for the NBD server
> > > > to force-change the announced request alignment, because it can
> > > > expect the qemu block layer code to auto-align requests through
> > > > RMW.  Doing it in the client is wrong, because the NBD server might
> > > > want to detect that the client sends unaligned requests and reject
> > > > them (though ours doesn’t, it just traces such events[5] – note that
> > > > it’s explicitly noted there that qemu will auto-align requests).
> > > I know I said I didn't care about performance (in this case), but is
> > > there in fact a penalty to sending unaligned requests to the qcow2
> > > layer?  Or perhaps it cannot compress them?
> > 
> > In qcow2, only the whole cluster can be compressed, so writing compressed
> > data means having to write the whole cluster.  qcow2 could implement the
> > padding by itself, but we decided to just leave the burden of only writing
> > full clusters (with the COMPRESSED write flag) on the callers.
> > 
> > Things like qemu-img convert and blockdev-backup just adhere to that by
> > design; and the compress driver makes sure to set its request alignment
> > accordingly so that requests to it will always be aligned to the cluster
> > size (either by its user, or by the qemu block layer which performs the
> > padding automatically).
> 
> I thought the more limiting factor would be that after auto-aligning the
> first request by padding with zeros, the second request to the same
> cluster would fail because compression doesn't allow using an already
> allocated cluster:
> 
> /* Compression can't overwrite anything. Fail if the cluster was already
>  * allocated. */
> cluster_offset = get_l2_entry(s, l2_slice, l2_index);
> if (cluster_offset & L2E_OFFSET_MASK) {
> qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
> return -EIO;
> }
> 
> Did you always just test a single request or why don't you run into
> this?

I didn't test that one specifically and yes it does fail:

$ qemu-img create -f qcow2 output.qcow2 1M
Formatting 'output.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
$ qemu-nbd -t --image-opts 
driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2
 &
[1] 2069037

$ nbdsh -u nbd://localhost
nbd> h.set_strict_mode(h.get_strict_mode() & ~nbd.STRICT_ALIGN)
nbd> buf = b'1' * 1024
nbd> h.pwrite(buf, 0)
nbd> h.pwrite(buf, 1024)
Traceback (most recent call last):
  File "/usr/lib64/python3.10/code.py", line 90, in runcode
exec(code, self.locals)
  File "<string>", line 1, in <module>
  File "/usr/lib64/python3.10/site-packages/nbd.py", line 1631, in pwrite
return libnbdmod.pwrite(self._o, buf, offset, flags)
nbd.Error: nbd_pwrite: write: command failed: Input/output error (EIO)

So what I said in the previous email about minimum vs preferred
is wrong :-(

What's more interesting is that nbdcopy still appeared to work.
Simulating what it was doing gives something like the following,
which also fails when I do it directly:

nbd> h.pwrite(buf, 0)
nbd> h.zero(1024, 1024)
Traceback (most recent call last):
  File "/usr/lib64/python3.10/code.py", line 90, in runcode
exec(code, self.locals)
  File "<string>", line 1, in <module>
  File "/usr/lib64/python3.10/site-packages/nbd.py", line 1782, in zero
return libnbdmod.zero(self._o, count, offset, flags)
nbd.Error: nbd_zero: write-zeroes: command failed: Input/output error (EIO)

Anyway back to poking at nbdcopy to make it support block sizes ...

> I guess checking L2E_OFFSET_MASK is strictly speaking wrong because it's
> invalid for compressed clusters (qcow2_get_cluster_type() feels more
> appropriate), but in practice, you will always have non-zero data there,
> so it should error out here.
> 
> Kevin

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




Re: Block alignment of qcow2 compress driver

2022-01-28 Thread Richard W.M. Jones
On Fri, Jan 28, 2022 at 12:57:47PM +0100, Hanna Reitz wrote:
> On 28.01.22 12:48, Richard W.M. Jones wrote:
> >On Fri, Jan 28, 2022 at 12:39:11PM +0100, Hanna Reitz wrote:
> >>So I actually don’t know why it works for you.  OTOH, I don’t
> >>understand why the block size affects you over NBD, because I would
> >>have expected qemu to internally auto-align requests when they are
> >>not aligned (in bdrv_co_pwritev_part()).
> >I checked it again and my hack definitely fixes nbdcopy.  But maybe
> >that's expected if qemu-nbd is auto-aligning requests?  (I'm only
> >accessing the block layer through qemu-nbd, not with qemu-io)
> 
> It’s not just qemu-io, with your diff[3] I get the same EINVAL over
> NBD, too:
> 
> $ ./qemu-img create -f qcow2 test.qcow2 64M
> Formatting 'test.qcow2', fmt=qcow2 cluster_size=65536
> extended_l2=off compression_type=zlib size=67108864
> lazy_refcounts=off refcount_bits=16
> 
> $ ./qemu-nbd --fork --image-opts \
> driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=test.qcow2
> 
> $ ./qemu-io -c 'write 0 32k' -f raw nbd://localhost
> write failed: Invalid argument

Strange - is that error being generated by qemu's nbd client code?

Here's my test not involving qemu's client code:

$ qemu-nbd --version
qemu-nbd 6.2.0 (qemu-6.2.0-2.fc36)

$ qemu-img create -f qcow2 output.qcow2 1M
Formatting 'output.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16

$ qemu-nbd --fork --image-opts 
driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2

$ nbdsh -u nbd://localhost
nbd> h.get_strict_mode()
31
nbd> h.set_strict_mode(31 & ~nbd.STRICT_ALIGN)
nbd> h.get_strict_mode()
15
nbd> h.pwrite(b'1'*1024, 0)
nbd> exit

So an unaligned 1K write works (after disabling libnbd's client-side
alignment checks).
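
As a side note on the session above: get_strict_mode() returned 31 and
clearing the align bit left 15, so the align bit must be 16 (0x10).
This just re-derives the mask arithmetic from those numbers; it is an
illustration, not the authoritative libnbd flag definitions:

```python
# Values observed in the nbdsh session: 31 before, 15 after.
STRICT_ALIGN = 31 ^ 15          # the bit that was cleared
assert STRICT_ALIGN == 16       # i.e. 0x10
assert 31 & ~STRICT_ALIGN == 15
```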

> I just changed that line of code [2], as shown in [4].  I suppose
> the better thing to do would be to have an option for the NBD server
> to force-change the announced request alignment, because it can
> expect the qemu block layer code to auto-align requests through
> RMW.  Doing it in the client is wrong, because the NBD server might
> want to detect that the client sends unaligned requests and reject
> them (though ours doesn’t, it just traces such events[5] – note that
> it’s explicitly noted there that qemu will auto-align requests).

I know I said I didn't care about performance (in this case), but is
there in fact a penalty to sending unaligned requests to the qcow2
layer?  Or perhaps it cannot compress them?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




Re: Block alignment of qcow2 compress driver

2022-01-28 Thread Richard W.M. Jones


I hacked nbdcopy to ignore block alignment (the error actually comes
from libnbd refusing to send the unaligned request, not from
qemu-nbd), and indeed qemu-nbd accepts the unaligned request without
complaint.

Eric - maybe having some flag for nbdcopy to ignore unaligned requests
when we know the server doesn't care (qemu-nbd) would work?

Rich.

--- a/copy/nbd-ops.c
+++ b/copy/nbd-ops.c
@@ -59,6 +59,10 @@ open_one_nbd_handle (struct rw_nbd *rwn)
 exit (EXIT_FAILURE);
   }
 
+  uint32_t sm = nbd_get_strict_mode (nbd);
+  sm &= ~LIBNBD_STRICT_ALIGN;
+  nbd_set_strict_mode (nbd, sm);
+
   nbd_set_debug (nbd, verbose);
 
   if (extents && rwn->d == READING &&


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: Block alignment of qcow2 compress driver

2022-01-28 Thread Richard W.M. Jones
On Fri, Jan 28, 2022 at 12:39:11PM +0100, Hanna Reitz wrote:
> So I actually don’t know why it works for you.  OTOH, I don’t
> understand why the block size affects you over NBD, because I would
> have expected qemu to internally auto-align requests when they are
> not aligned (in bdrv_co_pwritev_part()).

I checked it again and my hack definitely fixes nbdcopy.  But maybe
that's expected if qemu-nbd is auto-aligning requests?  (I'm only
accessing the block layer through qemu-nbd, not with qemu-io)

> Like, when I set the NBD
> block driver’s alignment to 512[2], the following still succeeds:

Did you just patch that line in the code or is there a qemu-nbd
option/image-opts to do this?

Rich.

> [1] https://gitlab.com/qemu-project/qemu/-/blob/master/block/qcow2.c#L4662
> [2] https://gitlab.com/qemu-project/qemu/-/blob/master/block/nbd.c#L1918

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Block alignment of qcow2 compress driver

2022-01-28 Thread Richard W.M. Jones
The commands below set up a sparse RAM disk, with an allocated block
at offset 32K and another one at offset 1M-32K.  Then it tries to copy
this to a compressed qcow2 file using qemu-nbd + the qemu compress
filter:

  $ qemu-img create -f qcow2 output.qcow2 1M
  $ qemu-nbd -t --image-opts 
driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=output.qcow2
 & sleep 1
  $ nbdkit -U - \
   data '@32768 1*32768 @1015808 1*32768' \
   --run 'nbdcopy $uri nbd://localhost -p'

The nbdcopy command fails when zeroing the first 32K with:

  nbd://localhost: nbd_aio_zero: request is unaligned: Invalid argument

This is a bug in nbdcopy because it ignores the minimum block size
being correctly declared by the compress filter:

  $ nbdinfo nbd://localhost
  protocol: newstyle-fixed without TLS
  export="":
export-size: 1048576 (1M)
uri: nbd://localhost:10809/
contexts:
  ...
block_size_minimum: 65536   <---
block_size_preferred: 65536
block_size_maximum: 33554432

The compress filter sets the minimum block size to the same as the
qcow2 cluster size here:

  
https://gitlab.com/qemu-project/qemu/-/blob/cfe63e46be0a1f8a7fd2fd5547222f8344a43279/block/filter-compress.c#L117

I patched qemu to force this to 4K:

-bs->bl.request_alignment = bdi.cluster_size;
+//bs->bl.request_alignment = bdi.cluster_size;
+bs->bl.request_alignment = 4096;

and the copy above works, and the output file is compressed!

So my question is, does the compress filter in qemu really need to
declare the large minimum block size?  I'm not especially concerned
about efficiency, I'd prefer it just worked, and changing nbdcopy to
understand block sizes is painful.
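
To give a flavour of that pain: a block-size-aware copy would have to
split each request into an unaligned head, an aligned body, and an
unaligned tail, and handle the head/tail by read-modify-write.  A
sketch (invented names, not nbdcopy code):

```python
def split_request(offset, length, min_block):
    """Split [offset, offset+length) into an unaligned head, an
    aligned body, and an unaligned tail, as (tag, offset, length)."""
    end = offset + length
    # First block boundary at or after offset (ceiling division):
    head_end = min(end, -(-offset // min_block) * min_block)
    # Last block boundary at or before end:
    tail_start = max(head_end, (end // min_block) * min_block)
    parts = []
    if head_end > offset:
        parts.append(("head", offset, head_end - offset))
    if tail_start > head_end:
        parts.append(("body", head_end, tail_start - head_end))
    if end > tail_start:
        parts.append(("tail", tail_start, end - tail_start))
    return parts

# An unaligned 200000-byte request against a 64K minimum:
parts = split_request(100, 200000, 65536)
assert parts == [("head", 100, 65436),
                 ("body", 65536, 131072),
                 ("tail", 196608, 3492)]
```

Only the "body" part can be sent as-is; the head and tail each need the
padding treatment, which is the complexity nbdcopy currently avoids.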

Is it already adjustable at run time?  (I tried using --image-opts
like compress.request_alignment=4096 but it seems like the filter
doesn't support anything I could think of, and I don't know how to
list the supported options.)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: [PATCH v1 20/34] tests/docker: add libfuse3 development headers

2022-01-05 Thread Richard W.M. Jones
On Wed, Jan 05, 2022 at 01:49:55PM +, Alex Bennée wrote:
> From: Stefan Hajnoczi 
> 
> The FUSE exports feature is not built because most container images do
> not have libfuse3 development headers installed. Add the necessary
> packages to the Dockerfiles.
> 
> Cc: Hanna Reitz 
> Cc: Richard W.M. Jones 
> Signed-off-by: Stefan Hajnoczi 
> Acked-by: Richard W.M. Jones 
> Reviewed-by: Beraldo Leal 
> Tested-by: Beraldo Leal 
> Message-Id: <20211207160025.52466-1-stefa...@redhat.com>
> [AJB: migrate to lcitool qemu.yml and regenerate]
> Signed-off-by: Alex Bennée 


I checked all the package names and they look good, so:

Reviewed-by: Richard W.M. Jones 

Rich.

>  tests/docker/dockerfiles/alpine.docker| 1 +
>  tests/docker/dockerfiles/centos8.docker   | 1 +
>  tests/docker/dockerfiles/fedora.docker| 1 +
>  tests/docker/dockerfiles/opensuse-leap.docker | 1 +
>  tests/docker/dockerfiles/ubuntu2004.docker| 1 +
>  tests/lcitool/projects/qemu.yml   | 1 +
>  6 files changed, 6 insertions(+)
> 
> diff --git a/tests/docker/dockerfiles/alpine.docker 
> b/tests/docker/dockerfiles/alpine.docker
> index 97c7a88d1f..eb2251c81c 100644
> --- a/tests/docker/dockerfiles/alpine.docker
> +++ b/tests/docker/dockerfiles/alpine.docker
> @@ -29,6 +29,7 @@ RUN apk update && \
>  dtc-dev \
>  eudev-dev \
>  findutils \
> +fuse3-dev \
>  g++ \
>  gcc \
>  gcovr \
> diff --git a/tests/docker/dockerfiles/centos8.docker 
> b/tests/docker/dockerfiles/centos8.docker
> index 3c62b62a99..cbb909d02b 100644
> --- a/tests/docker/dockerfiles/centos8.docker
> +++ b/tests/docker/dockerfiles/centos8.docker
> @@ -30,6 +30,7 @@ RUN dnf update -y && \
>  device-mapper-multipath-devel \
>  diffutils \
>  findutils \
> +fuse3-devel \
>  gcc \
>  gcc-c++ \
>  genisoimage \
> diff --git a/tests/docker/dockerfiles/fedora.docker 
> b/tests/docker/dockerfiles/fedora.docker
> index 6784878b56..60207f3da3 100644
> --- a/tests/docker/dockerfiles/fedora.docker
> +++ b/tests/docker/dockerfiles/fedora.docker
> @@ -37,6 +37,7 @@ exec "$@"' > /usr/bin/nosync && \
>  device-mapper-multipath-devel \
>  diffutils \
>  findutils \
> +fuse3-devel \
>  gcc \
>  gcc-c++ \
>  gcovr \
> diff --git a/tests/docker/dockerfiles/opensuse-leap.docker 
> b/tests/docker/dockerfiles/opensuse-leap.docker
> index 5510bdf19c..f57d8cfb29 100644
> --- a/tests/docker/dockerfiles/opensuse-leap.docker
> +++ b/tests/docker/dockerfiles/opensuse-leap.docker
> @@ -22,6 +22,7 @@ RUN zypper update -y && \
> dbus-1 \
> diffutils \
> findutils \
> +   fuse3-devel \
> gcc \
> gcc-c++ \
> gcovr \
> diff --git a/tests/docker/dockerfiles/ubuntu2004.docker 
> b/tests/docker/dockerfiles/ubuntu2004.docker
> index 40402b91fe..4e562dfdcd 100644
> --- a/tests/docker/dockerfiles/ubuntu2004.docker
> +++ b/tests/docker/dockerfiles/ubuntu2004.docker
> @@ -46,6 +46,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
>  libepoxy-dev \
>  libfdt-dev \
>  libffi-dev \
> +libfuse3-dev \
>  libgbm-dev \
>  libgcrypt20-dev \
>  libglib2.0-dev \
> diff --git a/tests/lcitool/projects/qemu.yml b/tests/lcitool/projects/qemu.yml
> index 2e2271510e..ed5ab1407a 100644
> --- a/tests/lcitool/projects/qemu.yml
> +++ b/tests/lcitool/projects/qemu.yml
> @@ -18,6 +18,7 @@ packages:
>   - diffutils
>   - dtrace
>   - findutils
> + - fuse3
>   - g++
>   - gcc
>   - gcovr
> -- 
> 2.30.2

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: Redesign of QEMU startup & initial configuration

2022-01-04 Thread Richard W.M. Jones
Sorry for very delayed reply ...

On Thu, Dec 02, 2021 at 07:57:38AM +0100, Markus Armbruster wrote:
> 1. QMP only
> 
>Management applications need to use QMP for monitoring anyway.  They
>may want to use it for initial configuration, too.  Libvirt does.
> 
>They still need to bootstrap a QMP monitor, and for that, CLI is fine
>as long as it's simple and stable.

libguestfs actually does not use the QMP monitor but manages to
configure everything from the command line just fine.  I've attached
below a typical example.  (Of course we can use libvirt too, but for
many configurations libvirt unfortunately still causes problems.)

> Human users struggle with inconsistent syntax, insufficiently expressive
> configuration files, and huge command lines.

One advantage of "huge command lines" is that they document the
configuration of qemu quite well.  I know it's only an approximation,
but in many cases it's exactly what we want.  It is frequently the
case when troubleshooting that we ask the user "what is the qemu
command line", and they can get that from the libvirt log file or
through "ps".

So we need to consider this and make sure that everything is changed
so we don't regress.  libguestfs will need substantial changes and
libvirt must dump the full configuration (qinfo or whatever) to the
log file.

I don't really disagree with anything else you wrote, however.

Rich.


libguestfs example:

/usr/bin/qemu-kvm \
-global virtio-blk-pci.scsi=off \
-no-user-config \
-nodefaults \
-display none \
-machine accel=kvm:tcg,graphics=off \
-cpu max \
-m 1280 \
-no-reboot \
-rtc driftfix=slew \
-no-hpet \
-global kvm-pit.lost_tick_policy=discard \
-kernel /var/tmp/.guestfs-1000/appliance.d/kernel \
-initrd /var/tmp/.guestfs-1000/appliance.d/initrd \
-object rng-random,filename=/dev/urandom,id=rng0 \
-device virtio-rng-pci,rng=rng0 \
-device virtio-scsi-pci,id=scsi \
-drive 
file=/tmp/libguestfs9bBO1w/scratch1.img,cache=unsafe,format=raw,id=hd0,if=none \
-device scsi-hd,drive=hd0 \
-drive 
file=/var/tmp/.guestfs-1000/appliance.d/root,snapshot=on,id=appliance,cache=unsafe,if=none
 \
-device scsi-hd,drive=appliance \
-device virtio-serial-pci \
-serial stdio \
-chardev 
socket,path=/run/user/1000/libguestfsGIlAlu/guestfsd.sock,id=channel0 \
-device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
-append "panic=1 console=ttyS0 edd=off udevtimeout=6000 
udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory 
usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 
root=UUID=9e6e8889-f991-45a3-bb41-67acebe7b062 selinux=0 guestfs_verbose=1 
TERM=xterm-256color"


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




Re: [PATCH v2] Move the libssh setup from configure to meson.build

2021-12-09 Thread Richard W.M. Jones
On Thu, Dec 09, 2021 at 04:08:24PM +0100, Thomas Huth wrote:
> On 09/12/2021 15.55, Richard W.M. Jones wrote:
> >On Thu, Dec 09, 2021 at 03:48:01PM +0100, Thomas Huth wrote:
> >>It's easier to do this in meson.build now.
> >>
> >>Signed-off-by: Thomas Huth 
> >>---
> >>  v2: Added the missing "config_host_data.set('CONFIG_LIBSSH', 
> >> libssh.found())"
> >>
> >>  configure | 27 ---
> >>  meson.build   | 13 +
> >>  meson_options.txt |  2 ++
> >>  scripts/meson-buildoptions.sh |  3 +++
> >>  4 files changed, 14 insertions(+), 31 deletions(-)
> >>
> >>diff --git a/configure b/configure
> >>index 48c21775f3..bb99a40ed0 100755
> >>--- a/configure
> >>+++ b/configure
> >>@@ -344,7 +344,6 @@ debug_stack_usage="no"
> >>  crypto_afalg="no"
> >>  tls_priority="NORMAL"
> >>  tpm="$default_feature"
> >>-libssh="$default_feature"
> >>  live_block_migration=${default_feature:-yes}
> >>  numa="$default_feature"
> >>  replication=${default_feature:-yes}
> >>@@ -1078,10 +1077,6 @@ for opt do
> >>;;
> >>--enable-tpm) tpm="yes"
> >>;;
> >>-  --disable-libssh) libssh="no"
> >>-  ;;
> >>-  --enable-libssh) libssh="yes"
> >>-  ;;
> >>--disable-live-block-migration) live_block_migration="no"
> >>;;
> >>--enable-live-block-migration) live_block_migration="yes"
> >>@@ -1448,7 +1443,6 @@ cat << EOF
> >>live-block-migration   Block migration in the main migration stream
> >>coroutine-pool  coroutine freelist (better performance)
> >>tpm TPM support
> >>-  libssh  ssh block device support
> >>numalibnuma support
> >>avx2AVX2 optimization support
> >>avx512f AVX512F optimization support
> >>@@ -2561,21 +2555,6 @@ if test "$modules" = yes; then
> >>  fi
> >>  fi
> >>-##
> >>-# libssh probe
> >>-if test "$libssh" != "no" ; then
> >>-  if $pkg_config --exists "libssh >= 0.8.7"; then
> >>-libssh_cflags=$($pkg_config libssh --cflags)
> >>-libssh_libs=$($pkg_config libssh --libs)
> >>-libssh=yes
> >>-  else
> >>-if test "$libssh" = "yes" ; then
> >>-  error_exit "libssh required for --enable-libssh"
> >>-fi
> >>-libssh=no
> >>-  fi
> >>-fi
> >>-
> >>  ##
> >>  # TPM emulation is only on POSIX
> >>@@ -3636,12 +3615,6 @@ if test "$cmpxchg128" = "yes" ; then
> >>echo "CONFIG_CMPXCHG128=y" >> $config_host_mak
> >>  fi
> >>-if test "$libssh" = "yes" ; then
> >>-  echo "CONFIG_LIBSSH=y" >> $config_host_mak
> >>-  echo "LIBSSH_CFLAGS=$libssh_cflags" >> $config_host_mak
> >>-  echo "LIBSSH_LIBS=$libssh_libs" >> $config_host_mak
> >>-fi
> >>-
> >>  if test "$live_block_migration" = "yes" ; then
> >>echo "CONFIG_LIVE_BLOCK_MIGRATION=y" >> $config_host_mak
> >>  fi
> >>diff --git a/meson.build b/meson.build
> >>index 96de1a6ef9..ae67ca28ab 100644
> >>--- a/meson.build
> >>+++ b/meson.build
> >>@@ -874,11 +874,15 @@ if not get_option('glusterfs').auto() or have_block
> >>  ''', dependencies: glusterfs)
> >>endif
> >>  endif
> >>+
> >>  libssh = not_found
> >>-if 'CONFIG_LIBSSH' in config_host
> >>-  libssh = declare_dependency(compile_args: 
> >>config_host['LIBSSH_CFLAGS'].split(),
> >>-  link_args: 
> >>config_host['LIBSSH_LIBS'].split())
> >>+if not get_option('libssh').auto() or have_block
> >>+  libssh = dependency('libssh', version: '>=0.8.7',
> >>+method: 'pkg-config',
> >>+required: get_option('libssh'),
> >>+kwargs: static_kwargs)
> >>  endif
> >>+
> >>  libbzip2 = not_found
> >>  if not get_option('bzip2').auto

Re: [PATCH v2] Move the libssh setup from configure to meson.build

2021-12-09 Thread Richard W.M. Jones
On Thu, Dec 09, 2021 at 03:48:01PM +0100, Thomas Huth wrote:
> It's easier to do this in meson.build now.
> 
> Signed-off-by: Thomas Huth 
> ---
>  v2: Added the missing "config_host_data.set('CONFIG_LIBSSH', libssh.found())"
> 
>  configure | 27 ---
>  meson.build   | 13 +
>  meson_options.txt |  2 ++
>  scripts/meson-buildoptions.sh |  3 +++
>  4 files changed, 14 insertions(+), 31 deletions(-)
> 
> diff --git a/configure b/configure
> index 48c21775f3..bb99a40ed0 100755
> --- a/configure
> +++ b/configure
> @@ -344,7 +344,6 @@ debug_stack_usage="no"
>  crypto_afalg="no"
>  tls_priority="NORMAL"
>  tpm="$default_feature"
> -libssh="$default_feature"
>  live_block_migration=${default_feature:-yes}
>  numa="$default_feature"
>  replication=${default_feature:-yes}
> @@ -1078,10 +1077,6 @@ for opt do
>;;
>--enable-tpm) tpm="yes"
>;;
> -  --disable-libssh) libssh="no"
> -  ;;
> -  --enable-libssh) libssh="yes"
> -  ;;
>--disable-live-block-migration) live_block_migration="no"
>;;
>--enable-live-block-migration) live_block_migration="yes"
> @@ -1448,7 +1443,6 @@ cat << EOF
>live-block-migration   Block migration in the main migration stream
>coroutine-pool  coroutine freelist (better performance)
>tpm TPM support
> -  libssh  ssh block device support
>numalibnuma support
>avx2AVX2 optimization support
>avx512f AVX512F optimization support
> @@ -2561,21 +2555,6 @@ if test "$modules" = yes; then
>  fi
>  fi
>  
> -##
> -# libssh probe
> -if test "$libssh" != "no" ; then
> -  if $pkg_config --exists "libssh >= 0.8.7"; then
> -libssh_cflags=$($pkg_config libssh --cflags)
> -libssh_libs=$($pkg_config libssh --libs)
> -libssh=yes
> -  else
> -if test "$libssh" = "yes" ; then
> -  error_exit "libssh required for --enable-libssh"
> -fi
> -libssh=no
> -  fi
> -fi
> -
>  ##
>  # TPM emulation is only on POSIX
>  
> @@ -3636,12 +3615,6 @@ if test "$cmpxchg128" = "yes" ; then
>echo "CONFIG_CMPXCHG128=y" >> $config_host_mak
>  fi
>  
> -if test "$libssh" = "yes" ; then
> -  echo "CONFIG_LIBSSH=y" >> $config_host_mak
> -  echo "LIBSSH_CFLAGS=$libssh_cflags" >> $config_host_mak
> -  echo "LIBSSH_LIBS=$libssh_libs" >> $config_host_mak
> -fi
> -
>  if test "$live_block_migration" = "yes" ; then
>echo "CONFIG_LIVE_BLOCK_MIGRATION=y" >> $config_host_mak
>  fi
> diff --git a/meson.build b/meson.build
> index 96de1a6ef9..ae67ca28ab 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -874,11 +874,15 @@ if not get_option('glusterfs').auto() or have_block
>  ''', dependencies: glusterfs)
>endif
>  endif
> +
>  libssh = not_found
> -if 'CONFIG_LIBSSH' in config_host
> -  libssh = declare_dependency(compile_args: 
> config_host['LIBSSH_CFLAGS'].split(),
> -  link_args: config_host['LIBSSH_LIBS'].split())
> +if not get_option('libssh').auto() or have_block
> +  libssh = dependency('libssh', version: '>=0.8.7',
> +method: 'pkg-config',
> +required: get_option('libssh'),
> +kwargs: static_kwargs)
>  endif
> +
>  libbzip2 = not_found
>  if not get_option('bzip2').auto() or have_block
>libbzip2 = cc.find_library('bz2', has_headers: ['bzlib.h'],
> @@ -1451,6 +1455,7 @@ config_host_data.set('CONFIG_EBPF', libbpf.found())
>  config_host_data.set('CONFIG_LIBDAXCTL', libdaxctl.found())
>  config_host_data.set('CONFIG_LIBISCSI', libiscsi.found())
>  config_host_data.set('CONFIG_LIBNFS', libnfs.found())
> +config_host_data.set('CONFIG_LIBSSH', libssh.found())
>  config_host_data.set('CONFIG_LINUX_AIO', libaio.found())
>  config_host_data.set('CONFIG_LINUX_IO_URING', linux_io_uring.found())
>  config_host_data.set('CONFIG_LIBPMEM', libpmem.found())
> @@ -3430,7 +3435,7 @@ endif
>  summary_info += {'seccomp support':   seccomp}
>  summary_info += {'GlusterFS support': glusterfs}
>  summary_info += {'TPM support':   config_host.has_key('CONFIG_TPM')}
> -summary_info += {'libssh support':config_host.has_key('CONFIG_LIBSSH')}
> +summary_info += {'libssh support':libssh}
>  summary_info += {'lzo support':   lzo}
>  summary_info += {'snappy support':snappy}
>  summary_info += {'bzip2 support': libbzip2}
> diff --git a/meson_options.txt b/meson_options.txt
> index e392323732..4114bfcaa4 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -105,6 +105,8 @@ option('libdaxctl', type : 'feature', value : 'auto',
> description: 'libdaxctl support')
>  option('libpmem', type : 'feature', value : 'auto',
> description: 'libpmem support')
> +option('libssh', type : 'feature', value : 'auto',
> +   description: 'ssh block device support')
>  option('libudev', type : 'feature', value : 'auto',
> 

Re: [PATCH] tests/docker: add libfuse3 development headers

2021-12-07 Thread Richard W.M. Jones
On Tue, Dec 07, 2021 at 04:00:25PM +, Stefan Hajnoczi wrote:
...
> diff --git a/tests/docker/dockerfiles/centos8.docker 
> b/tests/docker/dockerfiles/centos8.docker
> index 7f135f8e8c..a2dae4be29 100644
> --- a/tests/docker/dockerfiles/centos8.docker
> +++ b/tests/docker/dockerfiles/centos8.docker
> @@ -19,6 +19,7 @@ ENV PACKAGES \
>  device-mapper-multipath-devel \
>  diffutils \
>  findutils \
> +fuse3-devel \
>  gcc \
>  gcc-c++ \
>  genisoimage \

Just for my own notes, it took me a while to work out that CentOS 8
does have fuse3.  It didn't appear in EPEL 8 etc:

https://src.fedoraproject.org/rpms/fuse3
https://ci.centos.org/search/?q=fuse3

However it turns out it is built from a source package called "fuse"
(version 2.9.7!).  Also I am able to install fuse3 on RHEL 8.  So I
guess that's OK in the end.

The rest of the changes look good too, so:

Acked-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: QEMU 6.2.0 and rhbz#1999878

2021-12-03 Thread Richard W.M. Jones
On Fri, Dec 03, 2021 at 05:35:41PM -0300, Eduardo Lima wrote:
> 
> 
> On Fri, Dec 3, 2021 at 4:37 PM Richard W.M. Jones  wrote:
> 
> On Fri, Dec 03, 2021 at 04:20:23PM -0300, Eduardo Lima wrote:
> > Hi Rich,
> >
> > Can you confirm if the patch you added for qemu in Fedora has still not
> been
> > merged upstream? I could not find it on the git source tree.
> >
> > +Patch2: 0001-tcg-arm-Reduce-vector-alignment-requirement-for-NEON.patch
> > +From 1331e4eec016a295949009b4360c592401b089f7 Mon Sep 17 00:00:00 2001
> > +From: Richard Henderson 
> > +Date: Sun, 12 Sep 2021 10:49:25 -0700
> > +Subject: [PATCH] tcg/arm: Reduce vector alignment requirement for NEON
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1999878
> https://lists.nongnu.org/archive/html/qemu-devel/2021-09/msg01028.html
> 
> The patch I posted wasn't correct (or meant to be); it was just a
> workaround.  However I think you're right - I don't believe the
> original problem was ever fixed.
>
> Yes, I saw that your original patch had been replaced by this new
> one I mentioned, so I thought it was the correct solution, but I
> could not find this new one on the repository as well.

Oh I see, it was indeed replaced by Richard Henderson's patch:

https://src.fedoraproject.org/rpms/qemu/blob/rawhide/f/0001-tcg-arm-Reduce-vector-alignment-requirement-for-NEON.patch

> At the moment I kept it as part of 6.2.0 build, which I am just about to push
> to rawhide. It builds locally, and I am only waiting for the scratch-build to
> finish.

Yes looks like we need to keep it, and get it upstream too.

Thanks,

Rich.

> https://koji.fedoraproject.org/koji/taskinfo?taskID=79556515
> 
> Thanks, Eduardo.
> 
>  
> 
> 
> Let's see what upstream says ...
> 
> Rich.
> 
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/
> ~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-p2v converts physical machines to virtual machines.  Boot with a
> live CD or over the network (PXE) and turn machines into KVM guests.
> http://libguestfs.org/virt-v2v
> 
> 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




Re: QEMU 6.2.0 and rhbz#1999878

2021-12-03 Thread Richard W.M. Jones
On Fri, Dec 03, 2021 at 04:20:23PM -0300, Eduardo Lima wrote:
> Hi Rich,
> 
> Can you confirm if the patch you added for qemu in Fedora has still not been
> merged upstream? I could not find it on the git source tree.
> 
> +Patch2: 0001-tcg-arm-Reduce-vector-alignment-requirement-for-NEON.patch
> +From 1331e4eec016a295949009b4360c592401b089f7 Mon Sep 17 00:00:00 2001
> +From: Richard Henderson 
> +Date: Sun, 12 Sep 2021 10:49:25 -0700
> +Subject: [PATCH] tcg/arm: Reduce vector alignment requirement for NEON

https://bugzilla.redhat.com/show_bug.cgi?id=1999878
https://lists.nongnu.org/archive/html/qemu-devel/2021-09/msg01028.html

The patch I posted wasn't correct (or meant to be); it was just a
workaround.  However I think you're right - I don't believe the
original problem was ever fixed.

Let's see what upstream says ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: [PATCH v2 0/6] aio-posix: split poll check from ready handler

2021-12-02 Thread Richard W.M. Jones


Not sure if this is related, but builds are failing with:

FAILED: libblockdev.fa.p/block_export_fuse.c.o 
cc -m64 -mcx16 -Ilibblockdev.fa.p -I. -I.. -Iqapi -Itrace -Iui -Iui/shader 
-I/usr/include/fuse3 -I/usr/include/p11-kit-1 -I/usr/include/glib-2.0 
-I/usr/lib64/glib-2.0/include -I/usr/include/sysprof-4 -fdiagnostics-color=auto 
-Wall -Winvalid-pch -std=gnu11 -O2 -g -isystem 
/home/rjones/d/qemu/linux-headers -isystem linux-headers -iquote . -iquote 
/home/rjones/d/qemu -iquote /home/rjones/d/qemu/include -iquote 
/home/rjones/d/qemu/disas/libvixl -iquote /home/rjones/d/qemu/tcg/i386 -pthread 
-DSTAP_SDT_V2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels 
-Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wno-missing-include-dirs 
-Wno-shift-negative-value -Wno-psabi -fstack-protector-strong -fPIE -MD -MQ 
libblockdev.fa.p/block_export_fuse.c.o -MF 
libblockdev.fa.p/block_export_fuse.c.o.d -o 
libblockdev.fa.p/block_export_fuse.c.o -c ../block/export/fuse.c
../block/export/fuse.c: In function ‘setup_fuse_export’:
../block/export/fuse.c:226:59: warning: passing argument 7 of 
‘aio_set_fd_handler’ from incompatible pointer type 
[-Wincompatible-pointer-types]
  226 |read_from_fuse_export, NULL, NULL, exp);
  |   ^~~
  |   |
  |   FuseExport *
In file included from ../block/export/fuse.c:22:
/home/rjones/d/qemu/include/block/aio.h:472:36: note: expected ‘void (*)(void 
*)’ but argument is of type ‘FuseExport *’
  472 | IOHandler *io_poll_ready,
  | ~~~^
../block/export/fuse.c:224:5: error: too few arguments to function 
‘aio_set_fd_handler’
  224 | aio_set_fd_handler(exp->common.ctx,
  | ^~
In file included from ../block/export/fuse.c:22:
/home/rjones/d/qemu/include/block/aio.h:466:6: note: declared here
  466 | void aio_set_fd_handler(AioContext *ctx,
  |  ^~
../block/export/fuse.c: In function ‘fuse_export_shutdown’:
../block/export/fuse.c:268:13: error: too few arguments to function 
‘aio_set_fd_handler’
  268 | aio_set_fd_handler(exp->common.ctx,
  | ^~
In file included from ../block/export/fuse.c:22:
/home/rjones/d/qemu/include/block/aio.h:466:6: note: declared here
  466 | void aio_set_fd_handler(AioContext *ctx,
  |  ^~

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




Re: [PATCH 0/1] vmx: Fix mapping

2021-10-04 Thread Richard W.M. Jones
On Mon, Oct 04, 2021 at 04:50:51PM +0200, Laszlo Ersek wrote:
> On 10/04/21 11:59, Richard W.M. Jones wrote:
> > It turns out that changing the qemu implementation is painful,
> > particularly if we wish to maintain backwards compatibility of the
> > command line and live migration.
> >
> > Instead I opted to document comprehensively what all the
> > different hypervisors do:
> >
> >   
> > https://github.com/libguestfs/virt-v2v/blob/master/docs/vm-generation-id-across-hypervisors.txt
> 
> > Unfortunately QEMU made a significant mistake when implementing this
> > feature.  Because the string is 128 bits wrong, they decided it must
>   ^^
> 
> Haha, that's a great typo :)
> 
> > be a UUID (as you can see above there is no evidence that Microsoft
> > who wrote the original spec thought it was).  Following from this
> > incorrect assumption, they stated that the "UUID" must be supplied to
> > qemu in big endian format and must be byteswapped when writing it to
> > guest memory.  Their documentation says that they only do this for
> > little endian guests, but this is not true of their implementation
> > which byte swaps it for all guests.
> 
> I don't think this is what section "Endian-ness Considerations" in
> "docs/specs/vmgenid.txt" says. That text says that the *device* uses
> little-endian format. That's independent of the endianness of *CPU* of
> the particular guest architecture.
> 
> So the byte-swapping in the QEMU code occurs unconditionally because
> QEMU's UUID type is inherently big endian, and the device model in
> question is fixed little endian. The guest CPU's endianness is
> irrelevant as far as the device is concerned.
> 
> If a BE guest CPU intends to read the generation ID and to interpret it
> as a set of integers, then the guest CPU is supposed to byte-swap the
> appropriate fields itself.
> 
> > References
> 
> I suggest adding two links in this section, namely to:
> - docs/specs/vmgenid.txt
> - hw/acpi/vmgenid.c

Fair enough - I've made the changes you suggest (same URL as above).

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




Re: [PATCH 0/1] vmx: Fix mapping

2021-10-04 Thread Richard W.M. Jones
It turns out that changing the qemu implementation is painful,
particularly if we wish to maintain backwards compatibility of the
command line and live migration.

Instead I opted to document comprehensively what all the
different hypervisors do:

  
https://github.com/libguestfs/virt-v2v/blob/master/docs/vm-generation-id-across-hypervisors.txt

On Thu, Sep 30, 2021 at 10:16:20AM +0100, Richard W.M. Jones wrote:
> I was going to suggest something like:
> 
>   aa-bb-cc..
> or
>   aabbcc..

After thinking about this some more, the real implementation on
Windows guest and host is two 64 bit little-endian integers[1].  How
about implementing this exactly the same way as Hyper-V (and VMware):

  
0x8877665544332211
0x00ffeeddccbbaa99
  

This would have to be mapped to the following qemu command line:

  qemu -device vmgenid,guid=44332211-6655-8877-99aa-bbccddeeff00

which would unmangle to the following in guest physical memory:

  11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff 00

The equivalent back-compat option would be:

  44332211-6655-8877-99aa-bbccddeeff00
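
For what it's worth, the mangling above can be checked with Python's
uuid module: its bytes_le property applies the same partial field swap
(time_low, time_mid, time_hi reversed; the rest untouched) that qemu
performs.  An illustrative sketch, not part of the proposal:

```python
import uuid

# The back-compat guid= value from above, in its big-endian text form.
guid = uuid.UUID("44332211-6655-8877-99aa-bbccddeeff00")

# bytes_le byte-swaps the first three UUID fields -- the same partial
# swap qemu applies -- yielding the bytes that land in guest memory.
print(guid.bytes_le.hex())  # -> 112233445566778899aabbccddeeff00
```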

Rich.

[1] No one has ever thought what to do about big-endian guests.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




Re: [PATCH v3] nbd/server: Add --selinux-label option

2021-09-30 Thread Richard W.M. Jones
On Thu, Sep 30, 2021 at 02:00:11PM -0300, Willian Rampazzo wrote:
> On Thu, Sep 30, 2021 at 5:55 AM Vladimir Sementsov-Ogievskiy
>  wrote:
> >
> > 9/30/21 11:47, Richard W.M. Jones wrote:
> > > Under SELinux, Unix domain sockets have two labels.  One is on the
> > > disk and can be set with commands such as chcon(1).  There is a
> > > different label stored in memory (called the process label).  This can
> > > only be set by the process creating the socket.  When using SELinux +
> > > SVirt and wanting qemu to be able to connect to a qemu-nbd instance,
> > > you must set both labels correctly first.
> > >
> > > For qemu-nbd the options to set the second label are awkward.  You can
> > > create the socket in a wrapper program and then exec into qemu-nbd.
> > > Or you could try something with LD_PRELOAD.
> > >
> > > This commit adds the ability to set the label straightforwardly on the
> > > command line, via the new --selinux-label flag.  (The name of the flag
> > > is the same as the equivalent nbdkit option.)
> > >
> > > A worked example showing how to use the new option can be found in
> > > this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1984938
> > >
> > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1984938
> > > Signed-off-by: Richard W.M. Jones 
> > > Reviewed-by: Daniel P. Berrangé 
> > > Signed-off-by: Eric Blake 
> >
> > this should be Reviewed-by?
> 
> Maybe, because of this:
> https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg07081.html
> 
> I got confused with this v3.

Yes, I'd somehow lost the original patch and picked it up from Eric's
queue to make v3.

Having said that I'm not sure what the objection above means.  Do you
mean Eric's tag should be Reviewed-by instead of S-o-b?  (And why?)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




Re: [PATCH 2/2] tests/docker: Fix fedora-i386-cross

2021-09-30 Thread Richard W.M. Jones
On Thu, Sep 30, 2021 at 12:36:36PM -0400, Richard Henderson wrote:
> By using PKG_CONFIG_PATH instead of PKG_CONFIG_LIBDIR,
> we were still including the 64-bit packages.  Install
> pcre-devel.i686 to fill a missing glib2 dependency.
> 
> By using --extra-cflags instead of --cpu, we incorrectly
> use the wrong probing during meson.
> 
> Cc: Alex Bennée 
> Cc: Paolo Bonzini 
> Cc: Daniel P. Berrangé 
> Cc: Richard W.M. Jones 
> Signed-off-by: Richard Henderson 
> ---
>  tests/docker/dockerfiles/fedora-i386-cross.docker | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/docker/dockerfiles/fedora-i386-cross.docker 
> b/tests/docker/dockerfiles/fedora-i386-cross.docker
> index dbb8195eb1..820740d5be 100644
> --- a/tests/docker/dockerfiles/fedora-i386-cross.docker
> +++ b/tests/docker/dockerfiles/fedora-i386-cross.docker
> @@ -17,12 +17,13 @@ ENV PACKAGES \
>  glibc-static.i686 \
>  gnutls-devel.i686 \
>  nettle-devel.i686 \
> +pcre-devel.i686 \
>  perl-Test-Harness \
>  pixman-devel.i686 \
>  zlib-devel.i686
>  
> -ENV QEMU_CONFIGURE_OPTS --extra-cflags=-m32 --disable-vhost-user
> -ENV PKG_CONFIG_PATH /usr/lib/pkgconfig
> +ENV QEMU_CONFIGURE_OPTS --cpu=i386 --disable-vhost-user
> +ENV PKG_CONFIG_LIBDIR /usr/lib/pkgconfig
>  
>  RUN dnf install -y $PACKAGES
>  RUN rpm -q $PACKAGES | sort > /packages.txt

While I'm not able to directly test this docker file, I did run the
equivalent commands (PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig ../configure
--cpu=i386 [etc]) and successfully built a 32-bit qemu binary on a
Fedora 64-bit host with the multilib libraries installed.  Therefore
I'm pretty confident it should work:

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: [PULL 00/20] NBD patches through 2021-09-27

2021-09-30 Thread Richard W.M. Jones
On Thu, Sep 30, 2021 at 10:27:45AM -0400, Richard Henderson wrote:
> On 9/30/21 4:45 AM, Richard W.M. Jones wrote:
> >   PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig ../configure --extra-cflags=-m32 
> > --disable-vhost-user
> 
> Not --extra-cflags, use --cpu=i386.

That also builds the 32-bit binary successfully.  config-meson.cross
doesn't contain -m32, if that matters.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH 0/1] vmx: Fix mapping

2021-09-30 Thread Richard W.M. Jones
On Thu, Sep 30, 2021 at 09:47:01AM +0100, Daniel P. Berrangé wrote:
> On Thu, Sep 30, 2021 at 08:33:48AM +0100, Richard W.M. Jones wrote:
> > I propose we deprecate the guid parameter in:
> > 
> >   -device vmgenid,guid=8987940a-0951-2cc5-e815-10634ff550b9,id=vmgenid0
> > 
> > Instead it will be replaced by bytes= which will simply write
> > the bytes, in the order they appear, into guest memory with no
> > attempt to interpret or byte-swap.  Something like:
> > 
> >   -device vmgenid,bytes=112233445566778899aabbccddeeff00,id=vmgenid0
> > 
> > (guid although deprecated will need to be kept around for a while,
> > along with its weird byte-swapping behaviour).
> > 
> > We will then have a plain and simple method to emulate the behaviour
> > of other hypervisors.  We will look at exactly what bytes they write
> > to guest memory and copy that behaviour when v2v converting from those
> > hypervisors.
> 
> From the libvirt POV, I'm not expecting anything in QEMU to change
> in this respect. If guid is replaced by a new attribute taking data
> in a different way, then libvirt will have to remap itself, so that
> existing usage in libvirt keeps working the same way as it did with
> guid.  Essentially from libvirt's POV, it is simply a documentation
> issue to specify how the libvirt XML representation translates to
> the guest visible representation, and ensure that all libvirt drivers
> implement it the same way. The QEMU genid support arrived first so
> that set the standard for how libvirt will represent it, that all
> further libvirt hypervisor drivers need to match.

I was going to suggest something like:

  aa-bb-cc..
or
  aabbcc..

with the type defaulting to guid for backwards compatibility.

Does libvirt XML have any other fields were you're passing
essentially small snippets of binary data?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




[PATCH v3] nbd/server: Add --selinux-label option

2021-09-30 Thread Richard W.M. Jones
Under SELinux, Unix domain sockets have two labels.  One is on the
disk and can be set with commands such as chcon(1).  There is a
different label stored in memory (called the process label).  This can
only be set by the process creating the socket.  When using SELinux +
SVirt and wanting qemu to be able to connect to a qemu-nbd instance,
you must set both labels correctly first.

For qemu-nbd the options to set the second label are awkward.  You can
create the socket in a wrapper program and then exec into qemu-nbd.
Or you could try something with LD_PRELOAD.

This commit adds the ability to set the label straightforwardly on the
command line, via the new --selinux-label flag.  (The name of the flag
is the same as the equivalent nbdkit option.)

A worked example showing how to use the new option can be found in
this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1984938

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1984938
Signed-off-by: Richard W.M. Jones 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Eric Blake 
---
 configure |  8 +++-
 meson.build   | 10 -
 meson_options.txt |  3 ++
 qemu-nbd.c| 39 +++
 tests/docker/dockerfiles/centos8.docker   |  1 +
 .../dockerfiles/fedora-i386-cross.docker  |  1 +
 tests/docker/dockerfiles/fedora.docker|  1 +
 tests/docker/dockerfiles/opensuse-leap.docker |  1 +
 tests/docker/dockerfiles/ubuntu1804.docker|  1 +
 tests/docker/dockerfiles/ubuntu2004.docker|  1 +
 10 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 1043ccce4f..b3211a66ee 100755
--- a/configure
+++ b/configure
@@ -445,6 +445,7 @@ fuse="auto"
 fuse_lseek="auto"
 multiprocess="auto"
 slirp_smbd="$default_feature"
+selinux="auto"
 
 malloc_trim="auto"
 gio="$default_feature"
@@ -1576,6 +1577,10 @@ for opt do
   ;;
   --disable-slirp-smbd) slirp_smbd=no
   ;;
+  --enable-selinux) selinux="enabled"
+  ;;
+  --disable-selinux) selinux="disabled"
+  ;;
   *)
   echo "ERROR: unknown option $opt"
   echo "Try '$0 --help' for more information"
@@ -1963,6 +1968,7 @@ disabled with --disable-FEATURE, default is enabled if 
available
   multiprocessOut of process device emulation support
   gio libgio support
   slirp-smbd  use smbd (at path --smbd=*) in slirp networking
+  selinux SELinux support in qemu-nbd
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -5207,7 +5213,7 @@ if test "$skip_meson" = no; then
 -Dattr=$attr -Ddefault_devices=$default_devices 
-Dvirglrenderer=$virglrenderer \
 -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \
 -Dvhost_user_blk_server=$vhost_user_blk_server 
-Dmultiprocess=$multiprocess \
--Dfuse=$fuse -Dfuse_lseek=$fuse_lseek 
-Dguest_agent_msi=$guest_agent_msi -Dbpf=$bpf\
+-Dselinux=$selinux \
 $(if test "$default_feature" = no; then echo 
"-Dauto_features=disabled"; fi) \
-Dtcg_interpreter=$tcg_interpreter \
 $cross_arg \
diff --git a/meson.build b/meson.build
index 15ef4d3c41..0ded2ac5eb 100644
--- a/meson.build
+++ b/meson.build
@@ -1072,6 +1072,11 @@ keyutils = dependency('libkeyutils', required: false,
 
 has_gettid = cc.has_function('gettid')
 
+# libselinux
+selinux = dependency('libselinux',
+ required: get_option('selinux'),
+ method: 'pkg-config', kwargs: static_kwargs)
+
 # Malloc tests
 
 malloc = []
@@ -1300,6 +1305,7 @@ config_host_data.set('CONFIG_FUSE', fuse.found())
 config_host_data.set('CONFIG_FUSE_LSEEK', fuse_lseek.found())
 config_host_data.set('CONFIG_X11', x11.found())
 config_host_data.set('CONFIG_CFI', get_option('cfi'))
+config_host_data.set('CONFIG_SELINUX', selinux.found())
 config_host_data.set('QEMU_VERSION', '"@0@"'.format(meson.project_version()))
 config_host_data.set('QEMU_VERSION_MAJOR', 
meson.project_version().split('.')[0])
 config_host_data.set('QEMU_VERSION_MINOR', 
meson.project_version().split('.')[1])
@@ -2759,7 +2765,8 @@ if have_tools
   qemu_io = executable('qemu-io', files('qemu-io.c'),
  dependencies: [block, qemuutil], install: true)
   qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
-   dependencies: [blockdev, qemuutil, gnutls], install: true)
+   dependencies: [blockdev, qemuutil, gnutls, selinux],
+   install: true)
 
   subdir('storage-daemon')
   subdir('contrib/rdmacm-mux')
@@ -3124,6 +3131,7 @@ summary_info += {'libpmem support':   libpmem.found()}
 summary_info += {'libdaxctl support': libdaxctl.found()}
 summary_info += {'libudev':   libudev.found()}
 summary_info += {'FUSE l

Re: [PULL 00/20] NBD patches through 2021-09-27

2021-09-30 Thread Richard W.M. Jones
On Wed, Sep 29, 2021 at 01:29:21PM -0500, Eric Blake wrote:
> On Wed, Sep 29, 2021 at 05:03:08PM +0200, Paolo Bonzini wrote:
> > On 29/09/21 15:58, Richard Henderson wrote:
> > > 
> > >  > /usr/bin/ld: /usr/lib64/libselinux.so: error adding symbols: file 
> > > in
> > >  > wrong format
> > >  > collect2: error: ld returned 1 exit status
> > > 
> > > Missing libselinux-devel.i686 in
> > > tests/docker/dockerfiles/fedora-i386-cross.docker, I think?

This part is easy to fix.

> > > But additionally, incorrect package probing, I think.
> > 
> > Probably Meson deciding to look at --print-search-dirs and crossing fingers.
> > But -m32 and other multilib flags should be added to config-meson.cross
> > rather than QEMU_CFLAGS.

However this part I've no idea.  The docker file uses:

  ENV QEMU_CONFIGURE_OPTS --extra-cflags=-m32 --disable-vhost-user

I emulated this locally using:

  PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig ../configure --extra-cflags=-m32 
--disable-vhost-user

but config-meson.cross does not have -m32 anywhere.  Nevertheless it
seemed to build a 32-bit qemu with selinux fine:

$ file ./qemu-system-x86_64
./qemu-system-x86_64: ELF 32-bit LSB pie executable, Intel 80386, version 1 
(SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, 
BuildID[sha1]=1d070416c7d211f8bfa018557265c25e79a913bb, for GNU/Linux 3.2.0, 
with debug_info, not stripped
$ ldd ./qemu-system-x86_64 | grep selin
  libselinux.so.1 => /lib/libselinux.so.1 (0xf52b)

I will post v3 soon, but how can I test patches under your CI system?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH 0/1] vmx: Fix mapping

2021-09-30 Thread Richard W.M. Jones


More data: I found a colleague who has a Hyper-V instance with a
Windows guest and he helped me to understand how Hyper-V represents
generation IDs.  Hyper-V has had support for generation IDs since long
before Microsoft proposed the feature for standardization.  Originally
(I think pre-2013) Hyper-V used an XML description which included:

4a07df4361fdf4c883f97bc30e524b9d

Note this is a pure hex string, not a GUID.

In current Hyper-V, the same representation is used but it's embedded
in a binary file.

My colleague ran two Windows VMs on Hyper-V and examined the
generation_id on the hypervisor and compared it to the output of
VMGENID.EXE inside the guest.

The first guest had this in the binary hypervisor metadata:

56b0  00 00 0e 67 65 6e 65 72  61 74 69 6f 6e 5f 69 64  |...generation_id|
56c0  00 40 00 00 00 38 00 30  00 35 00 61 00 32 00 38  |.@...8.0.5.a.2.8|
56d0  00 37 00 61 00 32 00 35  00 30 00 39 00 38 00 39  |.7.a.2.5.0.9.8.9|
56e0  00 65 00 34 00 66 00 65  00 36 00 66 00 36 00 39  |.e.4.f.e.6.f.6.9|
56f0  00 39 00 32 00 62 00 65  00 33 00 33 00 34 00 61  |.9.2.b.e.3.3.4.a|
5700  00 34 00 33 00 00 00 00  00 00 00 00 00 00 00 00  |.4.3|

and reported the output of VMGENID in this guest as:

VmCounterValue: fe6f6992be334a43:805a287a250989e4

The second guest had:

5360  c5 06 00 00 00 0e 67 65  6e 65 72 61 74 69 6f 6e  |..generation|
5370  5f 69 64 00 40 00 00 00  65 00 62 00 66 00 62 00  |_id.@...e.b.f.b.|
5380  62 00 37 00 39 00 37 00  33 00 36 00 35 00 37 00  |b.7.9.7.3.6.5.7.|
5390  32 00 38 00 65 00 37 00  30 00 62 00 33 00 66 00  |2.8.e.7.0.b.3.f.|
53a0  64 00 33 00 63 00 37 00  31 00 33 00 65 00 63 00  |d.3.c.7.1.3.e.c.|
53b0  65 00 38 00 34 00 32 00  00 00 00 00 00 00 00 00  |e.8.4.2.|

and VMGENID was:

VmCounterValue: b3fd3c713ece842:ebfbb797365728e7

Note:

 - In both cases, the generation ID is clearly not a GUID.

 - The two 64 bit words are swapped over somewhere, but individual
   bytes are not byte-swapped, and there is again no evidence of
   UUID-aware byte swapping.
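
The word swap described in the notes can be reproduced in a few lines
of Python (a sketch inferred only from the two dumps above; the helper
name is made up):

```python
def vmgenid_counter(generation_id: str) -> str:
    """Format a Hyper-V generation_id hex string the way VMGENID.EXE
    prints it: split into two 64-bit halves, swap them, and print
    each half as an integer (which drops any leading zero)."""
    hi, lo = generation_id[:16], generation_id[16:]
    return f"{int(lo, 16):x}:{int(hi, 16):x}"

# First guest:
print(vmgenid_counter("805a287a250989e4fe6f6992be334a43"))
# -> fe6f6992be334a43:805a287a250989e4

# Second guest (note the dropped leading zero in the first word):
print(vmgenid_counter("ebfbb797365728e70b3fd3c713ece842"))
# -> b3fd3c713ece842:ebfbb797365728e7
```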

--

So how do we get from where we are to a more sane vmgenid in qemu?

I propose we deprecate the guid parameter in:

  -device vmgenid,guid=8987940a-0951-2cc5-e815-10634ff550b9,id=vmgenid0

Instead it will be replaced by bytes= which will simply write
the bytes, in the order they appear, into guest memory with no
attempt to interpret or byte-swap.  Something like:

  -device vmgenid,bytes=112233445566778899aabbccddeeff00,id=vmgenid0

(guid although deprecated will need to be kept around for a while,
along with its weird byte-swapping behaviour).

We will then have a plain and simple method to emulate the behaviour
of other hypervisors.  We will look at exactly what bytes they write
to guest memory and copy that behaviour when v2v converting from those
hypervisors.
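
Meanwhile, for as long as only guid= exists, the command-line value
needed to get a desired byte sequence into guest memory can be
computed mechanically.  A sketch using Python's uuid module, which
happens to implement the same partial field swap:

```python
import uuid

# Bytes we want to appear verbatim in guest memory, e.g. copied from
# another hypervisor.  With the proposed bytes= parameter this hex
# string would be passed through unchanged.
raw = bytes.fromhex("112233445566778899aabbccddeeff00")

# With today's guid= parameter, qemu swaps the first three UUID
# fields before writing, so the value to put on the command line is
# the swapped text form:
print(uuid.UUID(bytes_le=raw))  # -> 44332211-6655-8877-99aa-bbccddeeff00
```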

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




Re: [PULL 00/20] NBD patches through 2021-09-27

2021-09-29 Thread Richard W.M. Jones
On Wed, Sep 29, 2021 at 01:29:21PM -0500, Eric Blake wrote:
> On Wed, Sep 29, 2021 at 05:03:08PM +0200, Paolo Bonzini wrote:
> > On 29/09/21 15:58, Richard Henderson wrote:
> > > 
> > >  > /usr/bin/ld: /usr/lib64/libselinux.so: error adding symbols: file 
> > > in
> > >  > wrong format
> > >  > collect2: error: ld returned 1 exit status
> > > 
> > > Missing libselinux-devel.i686 in
> > > tests/docker/dockerfiles/fedora-i386-cross.docker, I think?
> > > 
> > > But additionally, incorrect package probing, I think.
> > 
> > Probably Meson deciding to look at --print-search-dirs and crossing fingers.
> > But -m32 and other multilib flags should be added to config-meson.cross
> > rather than QEMU_CFLAGS.
> 
> Rich, Dan, this is caused by 'nbd/server: Add --selinux-label option'
> (20/20 in this pull request); can you investigate?
> 
> In the short term, I'm leaning towards withdrawing that patch from the
> pull request, and getting everything else upstream; we can revisit a
> fixed version of that patch for my next pull request.

Yes I'll have to look at this later, so withdraw it from the PR for now.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/




Re: [PATCH 0/1] vmx: Fix mapping

2021-09-29 Thread Richard W.M. Jones
On Wed, Sep 29, 2021 at 11:10:35AM +0100, Daniel P. Berrangé wrote:
> On Wed, Sep 29, 2021 at 10:57:19AM +0100, Richard W.M. Jones wrote:
> > Looking at the qemu code the problem IMHO is:
> > 
> > https://gitlab.com/qemu-project/qemu/-/blob/6b54a31bf7b403672a798b6443b1930ae6c74dea/docs/specs/vmgenid.txt#L189
> > https://gitlab.com/qemu-project/qemu/-/blob/6b54a31bf7b403672a798b6443b1930ae6c74dea/hw/acpi/vmgenid.c#L37
> > 
> > This byte swapping makes no sense to me.  How do we know that the
> > guest is little endian?  What will this code do for BE guests?  I
> > think qemu would be better off treating the "GUID" as a list of bytes
> > and writing that exactly into the guest memory.
> 
> This is an artifact of the HyperV only caring about x86 and thus leaving
> endianness unspecified in the spec for GenID. QEMU docs say
> 
> 
> > Endian-ness Considerations:
> > ---------------------------
> 
> Although not specified in Microsoft's document, it is assumed that the
> device is expected to use little-endian format.
> 
> All GUID passed in via command line or monitor are treated as big-endian.
> GUID values displayed via monitor are shown in big-endian format.
> 
> 
> So by extension if libvirt is passing the value from its XML straight
> to QEMU, then libvirt has effectively defined that the XML is storing
> it big-endian too.
> 
> This could be where the confusion with VMX config is coming into play,
> though the byte re-ordering in v2v seems more complex than just an
> endianness-fiddle?

qemu's qemu_uuid_bswap function only swaps some of the fields.

I think, even more so, that applying qemu_uuid_bswap to these (not
really) "UUIDs" is nonsense.
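[For illustration, the partial swap that qemu_uuid_bswap performs can be modelled in a few lines of Python (a sketch, not QEMU's actual C code): it byte-reverses the three leading RFC 4122 fields and leaves the final eight bytes untouched, so applying it twice is the identity — which is exactly the property the virt-v2v pre-mangling trick relies on.]

```python
def uuid_bswap(b: bytes) -> bytes:
    """Model of qemu_uuid_bswap: byte-swap time_low (4 bytes),
    time_mid (2 bytes) and time_hi_and_version (2 bytes); leave
    clock_seq and node (the last 8 bytes) untouched."""
    assert len(b) == 16
    return b[3::-1] + b[5:3:-1] + b[7:5:-1] + b[8:]

guid = bytes.fromhex("112233445566778899aabbccddeeff00")
swapped = uuid_bswap(guid)
print(swapped.hex())                  # 443322116655887799aabbccddeeff00
assert uuid_bswap(swapped) == guid    # swapping twice is a no-op
```

[The swapped output matches the byte order observed in the guest's fw_cfg hexdump elsewhere in this thread.]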

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: [PATCH 0/1] vmx: Fix mapping

2021-09-29 Thread Richard W.M. Jones
On Wed, Sep 29, 2021 at 10:46:38AM +0100, Richard W.M. Jones wrote:
> I don't know why we decided to use a GUID for this.  The feature
> itself (https://go.microsoft.com/fwlink/?LinkId=260709) defines it as
> a 128 bit / 8 byte number.  The only connection to GUIDs is the size.

*cough* .. 16 bytes :-)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: [PATCH 0/1] vmx: Fix mapping

2021-09-29 Thread Richard W.M. Jones
On Wed, Sep 29, 2021 at 11:07:30AM +0100, Daniel P. Berrangé wrote:
> I'm not sure if we actually need the full driver or not for testing
> purposes. The GenID is just in memory somewhere, and the somewhere
> is reported via an ACPI table entry. For QEMU it's easy as the data is
> exposed via fw_cfg which can be read from sysfs directly without
> even needing to look at ACPI entries to find it. Not sure how we
> find it with VMWare/HyperV though.

This still has the problem that qemu is mangling the vmgenid.
Nevertheless, on qemu-6.1.0-5.fc36.x86_64 I added this to a Linux
guest:

  <genid>11223344-5566-7788-99aa-bbccddeeff00</genid>

which turned into:

  -device vmgenid,guid=11223344-5566-7788-99aa-bbccddeeff00,id=vmgenid0

Inside the guest:

# ls /sys/firmware/qemu_fw_cfg/by_name/etc/vmgenid_guid/ -l
total 0
-r--------. 1 root root 4096 Sep 29 11:16 key
-r--------. 1 root root 4096 Sep 29 11:16 name
-r--------. 1 root root    0 Sep 29 11:16 raw
-r--------. 1 root root 4096 Sep 29 11:16 size
# hexdump -C /sys/firmware/qemu_fw_cfg/by_name/etc/vmgenid_guid/raw
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000020  00 00 00 00 00 00 00 00  44 33 22 11 66 55 88 77  |........D3".fU.w|
00000030  99 aa bb cc dd ee ff 00  00 00 00 00 00 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000


But I think what I really need to do is look at the raw physical
address:

# hexdump -C /sys/firmware/qemu_fw_cfg/by_name/etc/vmgenid_addr/raw
00000000  28 f0 ff 7f 00 00 00 00                           |(.......|
00000008

# dd if=/dev/mem bs=1 skip=$(( 0x7ffff028 )) count=16 | hexdump -C
16+0 records in
16+0 records out
16 bytes copied, 6.0392e-05 s, 265 kB/s
00000000  44 33 22 11 66 55 88 77  99 aa bb cc dd ee ff 00  |D3".fU.w........|
00000010
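[A minimal sketch of the address decoding shown above, assuming the 8-byte little-endian layout seen in the vmgenid_addr hexdump; actually reading /dev/mem as in the dd command requires root and a kernel that permits it.]

```python
import struct

def parse_vmgenid_addr(raw: bytes) -> int:
    """Decode etc/vmgenid_addr/raw: a little-endian 64-bit
    guest-physical address of the 16-byte generation ID."""
    (addr,) = struct.unpack("<Q", raw[:8])
    return addr

# The bytes from the hexdump above: 28 f0 ff 7f 00 00 00 00
addr = parse_vmgenid_addr(bytes.fromhex("28f0ff7f00000000"))
print(hex(addr))  # 0x7ffff028
```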


I think for VMware I'm really going to need the kernel driver, unless
there's some way that iasl can be used to extract the information?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




Re: [PATCH 0/1] vmx: Fix mapping

2021-09-29 Thread Richard W.M. Jones
Looking at the qemu code the problem IMHO is:

https://gitlab.com/qemu-project/qemu/-/blob/6b54a31bf7b403672a798b6443b1930ae6c74dea/docs/specs/vmgenid.txt#L189
https://gitlab.com/qemu-project/qemu/-/blob/6b54a31bf7b403672a798b6443b1930ae6c74dea/hw/acpi/vmgenid.c#L37

This byte swapping makes no sense to me.  How do we know that the
guest is little endian?  What will this code do for BE guests?  I
think qemu would be better off treating the "GUID" as a list of bytes
and writing that exactly into the guest memory.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: [PATCH 0/1] vmx: Fix mapping

2021-09-29 Thread Richard W.M. Jones
On Wed, Sep 29, 2021 at 10:33:43AM +0100, Daniel P. Berrangé wrote:
> On Wed, Sep 29, 2021 at 10:20:44AM +0100, Richard W.M. Jones wrote:
> > On Wed, Sep 29, 2021 at 10:01:55AM +0200, Michal Privoznik wrote:
> > > Apparently, parsing vmx.genid is not as easy as I thought. Anyway, it
> > > was brought up in a private thread that libvirt doesn't report correct
> > > UUIDs. For instance for the following input:
> > > 
> > >   vm.genid = "-8536691797830587195"
> > >   vm.genidX = "-1723453263670062919"
> > 
> > (The two lines above come from a VMware .vmx file)
> > 
> > The only thing that really matters is what the guest sees.  I ran
> > VMGENID.EXE (https://bugzilla.redhat.com/show_bug.cgi?id=1598350#c3
> > https://docs.microsoft.com/en-gb/windows/win32/hyperv_v2/virtual-machine-generation-identifier)
> > inside the guest and it showed:
> > 
> >   8987940a09512cc5:e81510634ff550b9
> > 
> > Note these numbers are the hex equivalents of the VMware .vmx numbers:
> > 
> > >>> print("%x" % (2**64-8536691797830587195))
> > 8987940a09512cc5
> > >>> print("%x" % (2**64-1723453263670062919))
> > e81510634ff550b9
> > 
> > > Libvirt would report:
> > > 
> > >   8987940a-0951-2cc5-e815-10634ff550b9
> > > 
> > > while virt-v2v would report:
> > > 
> > >   09512cc5-940a-8987-b950-f54f631015e8
> > 
> > After thinking about this a bit more, IMHO the real problem is
> > with qemu.  If you pass this to qemu:
> > 
> >   -device vmgenid,guid=8987940a-0951-2cc5-e815-10634ff550b9,id=vmgenid0
> > 
> > then inside the guest VMGENID shows 2cc509518987940a:b950f54f631015e8 
> > (wrong)
> > 
> > If you pass this to qemu:
> > 
> >   ...guid=09512cc5-940a-8987-b950-f54f631015e8...
> > 
> > then inside the guest it sees 8987940a09512cc5:e81510634ff550b9
> > (the correct values, matching VMware).
> > 
> > This is the reason why virt-v2v mangles the GUID, in order to trick
> > libvirt into passing a mangled GUID which gets mangled again by qemu
> > which is the same as unmangling it.
> > 
> > So another way to fix this could be for us to fix qemu.  We could just
> > pass the two numbers to qemu instead of using GUIDs, eg:
> > 
> >   -device vmgenid,low=0x8987940a09512cc5,high=0xe81510634ff550b9,id=vmgenid0
> > 
> > and then we'd fix the implementation in qemu so that the low/high
> > values match what is seen by VMGENID.EXE in the guest.
> > 
> > Or we could have libvirt use the current GUID in <genid> but to do the
> > mangling when it gets passed through to qemu (instead of virt-v2v
> > doing the mangling).
> 
> On the libvirt side, the #1 most important thing is that a given
> XML string
> 
>   <genid>8987940a-0951-2cc5-e815-10634ff550b9</genid>
> 
> results in the same value being exposed to the guest OS, for both
> the QEMU and VMWare drivers.  Whether the guest sees the bytes in
> the same order is not essential, but it would be less of a surprise
> if it did match.

I don't know why we decided to use a GUID for this.  The feature
itself (https://go.microsoft.com/fwlink/?LinkId=260709) defines it as
a 128 bit / 8 byte number.  The only connection to GUIDs is the size.

> Ultimately as long as the mapping from libvirt XML to guest visible
> string is consistent across drivers, that's sufficient.
> 
> > Adding qemu-devel because I'm interesting to see if the qemu
> > developers would prefer to fix this properly in qemu.
> 
> No matter what order QEMU accepts the data in, it can be said to be
> functionally correct. If the order is "unusual" though, it ought to
> at least be documented how the QEMU order corresponds to guest visible
> order.
> 
> Incidentally I wonder how much to trust VMGENID.EXE and whether that
> actually reports what it gets out of guest memory "as is", or whether
> it has done any byte re-ordering.
> 
> I'm curious what the Linux kernel sees for the genid on VMware vs KVM
> hosts, as I'd probably trust that data more?

As far as I can tell no driver has successfully made it upstream,
although there have been a few attempts:

  https://lkml.org/lkml/2018/3/1/498

Here's a more recent one from 10 months ago:

  https://lore.kernel.org/linux-doc/3e05451b-a9cd-4719-99d0-72750a304...@amazon.com/

If I have time I'll patch a Linux kernel with the second patch and see
how it relates to the VMware and KVM parameters, but it probably won't
be today.
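[For reference, the conversion quoted earlier in this thread (from the signed decimal halves stored in the .vmx file to the unsigned hex halves VMGENID.EXE reports) is just a reduction modulo 2**64; a minimal sketch:]

```python
def vmx_genid_half_to_hex(value: str) -> str:
    """Convert one signed-decimal .vmx genid half (vm.genid or
    vm.genidX) to the unsigned 64-bit hex form seen in the guest."""
    # Python's % always yields a non-negative result, so this is
    # exactly the two's-complement reinterpretation.
    return "%016x" % (int(value) % 2**64)

print(vmx_genid_half_to_hex("-8536691797830587195"))  # 8987940a09512cc5
print(vmx_genid_half_to_hex("-1723453263670062919"))  # e81510634ff550b9
```

[The expected outputs are the values shown in the quoted REPL session above.]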

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW



