Re: [Libguestfs] [nbdkit PATCH] RFC: noextents: Add extentmode=data config parameter

2022-02-08 Thread Laszlo Ersek
On 02/08/22 21:29, Eric Blake wrote:
> On Tue, Feb 08, 2022 at 12:39:02PM +0100, Laszlo Ersek wrote:
>> On 02/04/22 17:44, Eric Blake wrote:
>>> While writing tests for an nbdcopy bug, I found myself wanting a way
>>> to easily view an entire image as data, but without disabling extents
>>> support altogether.  The existing extentlist filter can do this, but
>>> requires a secondary file.
>>>
>>> Still an RFC because I need testsuite coverage similar to
>>> test-nozero.sh (eek - we don't have any direct test of the noextent
>>> filter, but only indirect coverage through other tests).  Also, are
>>> there any other extentmode=MODE values that might make sense?
>>> ---
> 
>>> +=item B<mask>
>>> +
>>> +Extent support is not advertised to the client; clients should not
>>> +query for extent information, and must assume the entire disk is
>>> +allocated.
>>> +
>>> +This is the default if the C<extentmode> parameter is not specified.
>>> +
>>> +=item B<data>
>>> +
>>> +(nbdkit E<ge> 1.30)
>>> +
>>> +Extent support is advertised, but extent requests from the client will
>>> +be answered with a claim that the entire disk forms a single allocated
>>> +data extent.
>>> +
>>> +=back
>>> +
>>> +All other parameters are passed through to and processed by the
>>> +underlying plugin in the normal way.
>>
>> Is this necessary to spell out? (Looked at a random other filter's
>> documentation, and didn't see such a statement, so I think it's the
>> default.)
>>
>> (The same question applies to "plugin-args" in the synopsis, more or
>> less...)
> 
> Hmm, we aren't always consistent, but I agree that it can be pruned
> without loss.
> 
>>> +/* Produce single data extent. */
>>> +static int
>>> +noextents_extents (nbdkit_next *next,
>>> +   void *handle, uint32_t count, uint64_t offset,
>>> +   uint32_t flags,
>>> +   struct nbdkit_extents *ret_extents,
>>> +   int *err)
>>> +{
>>> +  assert (extentmode == DATA);
>>> +  return nbdkit_add_extent (ret_extents, offset, count, 0);
>>>  }
>>
>> I don't have the context in which this function may be called, in my
>> head, but this implementation seems to reflect precisely the requested
>> offset range back at the client, so a question arises:
>>
>> - What if the client requests an offset range that (at least partially)
>> exceeds the size of the file? In that case, I think we should not report
>> that range as existent. For example, the chunked block status query in
>> virt-v2v asks for 2GB offset ranges, and it's expected that the returned
>> extent list will not exceed the file size.
> 
> The code in server/ guarantees that we cannot call into a filter or
> plugin with an extents request that would read out of bounds; ie. on
> input, offset+count will never exceed what next->get_size() would tell
> us anyways.

Great!

(This is what I sort of expected, but wasn't sure about, hence "I don't
have the context in which this function may be called".)

> Conversely, nbdkit_add_extent() already permits us to
> pass in redundant information prior to offset (as long as we make
> forward progress by eventually using nbdkit_add_extent for at least
> one byte at offset before returning), as well as to provide more
> information than needed (the upper layer can set a clamp, such as when
> FLAG_REQ_ONE is in use by the client, or at the 32-bit boundary
> condition, where our additions beyond that clamp are merely ignored).
> So we could just as easily write
> 
>   return nbdkit_add_extent (ret_extents, 0, INT64_MAX, 0);
> 
> which will be trimmed to size as needed before going over the wire
> back to the client.

Wow, that's very safe API then!

> But now that I've written that, I can see that
> there is indeed a useful benchmarking distinction between using
> offset/count vs. 0/INT_MAX - the former behaves as if the client had
> always passed in FLAG_REQ_ONE (we can never progress faster than the
> client wants us to go), while the latter gives as much information to
> the client as possible (subject to 32-bit limits, until my 64-bit
> block status work is revived).  So instead of just one new mode named
> 'data', maybe we want a larger family of possible modes:
> 
> - mask (default; existing behavior of masking extent support so client
>   can't query them at all); client must assume disk is data
> - data-match (reply that the extent is data, but match the size of the
>   client request; behaves as if the client unconditionally uses
>   FLAG_REQ_ONE); client sees disk as data
> - data-full (reply that the entire remainder of the extent was data,
>   although the client's use of FLAG_REQ_ONE clamps it back to the
>   query size); client sees disk as data
> - req-one (pass the query on through to the plugin, but then
>   purposefully trim it back before replying to the client so that it
>   behaves as if the client unconditionally uses FLAG_REQ_ONE); client
>   can see holes, but cannot read ahead
> - plugin (pass the query on through to the plugin, with no

Re: [Libguestfs] [libnbd PATCH] copy: Fail nbdcopy if NBD read or write fails

2022-02-08 Thread Laszlo Ersek
On 02/08/22 14:43, Richard W.M. Jones wrote:
> On Fri, Feb 04, 2022 at 09:18:07AM +0100, Laszlo Ersek wrote:
>> (And there was no easy way to skip connect-tcp6.)
> 
> We use this in libnbd.spec:
> 
> https://src.fedoraproject.org/rpms/libnbd/blob/rawhide/f/libnbd.spec#_214

Yes (I know what the link points to, without looking :)); that's why one
of my earlier patches (or patch sets) exposed the real issue in it only
in Brew! :)

> 
> In libguestfs we have a whole infrastructure for skipping tests by
> setting $SKIP_* variables in localenv.
> 
> https://libguestfs.org/guestfs-building.1.html#local-files

Ah, this is very nice.

For the various v2v projects, I've created a bunch of env setup scripts,
and those scripts consider the inter-project dependencies (based on the
graph you gave me earlier). So they more or less "chain" the various
"run" scripts in the project roots.

One problem with the ignored local* files in the git worktrees could be
that "git clean -ffdx" probably removes them. I have a $HOME/bin/scripts
dir on my $PATH, where I keep such scripts.

> https://github.com/libguestfs/libguestfs/blob/0553f90462f456b1fe7100346aa1a4f12fccb288/tests/test-functions.sh#L39
> 
> For libnbd/nbdkit it might be possible to automate it in the same way
> with tests/functions.sh.in + $0, but it's not implemented at the
> moment.  Of course it'd be better if the tests didn't fail :-)

Yes, absolutely.

Thanks!
Laszlo




Re: [Libguestfs] [nbdkit PATCH] RFC: noextents: Add extentmode=data config parameter

2022-02-08 Thread Laszlo Ersek
On 02/04/22 17:44, Eric Blake wrote:
> While writing tests for an nbdcopy bug, I found myself wanting a way
> to easily view an entire image as data, but without disabling extents
> support altogether.  The existing extentlist filter can do this, but
> requires a secondary file.
> 
> Still an RFC because I need testsuite coverage similar to
> test-nozero.sh (eek - we don't have any direct test of the noextent
> filter, but only indirect coverage through other tests).  Also, are
> there any other extentmode=MODE values that might make sense?
> ---
>  filters/noextents/nbdkit-noextents-filter.pod | 32 ++--
>  filters/noextents/noextents.c | 50 ++-
>  2 files changed, 75 insertions(+), 7 deletions(-)
> 
> diff --git a/filters/noextents/nbdkit-noextents-filter.pod 
> b/filters/noextents/nbdkit-noextents-filter.pod
> index 891b197d..aac2f097 100644
> --- a/filters/noextents/nbdkit-noextents-filter.pod
> +++ b/filters/noextents/nbdkit-noextents-filter.pod
> @@ -4,7 +4,7 @@ nbdkit-noextents-filter - disable extents in the underlying 
> plugin
> 
>  =head1 SYNOPSIS
> 
> - nbdkit --filter=noextents plugin
> + nbdkit --filter=noextents plugin [plugin-args...] [extentmode=MODE]
> 
>  =head1 DESCRIPTION
> 
> @@ -23,9 +23,31 @@ performance (C is known to be one such system).
> 
>  =head1 PARAMETERS
> 
> -There are no parameters specific to nbdkit-noextents-filter.  Any
> -parameters are passed through to and processed by the underlying
> -plugin in the normal way.
> +The parameter C<extentmode> is optional, and controls which mode the
> +filter will use.
> +
> +=over 4
> +
> +=item B<mask>
> +
> +Extent support is not advertised to the client; clients should not
> +query for extent information, and must assume the entire disk is
> +allocated.
> +
> +This is the default if the C<extentmode> parameter is not specified.
> +
> +=item B<data>
> +
> +(nbdkit E<ge> 1.30)
> +
> +Extent support is advertised, but extent requests from the client will
> +be answered with a claim that the entire disk forms a single allocated
> +data extent.
> +
> +=back
> +
> +All other parameters are passed through to and processed by the
> +underlying plugin in the normal way.

Is this necessary to spell out? (Looked at a random other filter's
documentation, and didn't see such a statement, so I think it's the
default.)

(The same question applies to "plugin-args" in the synopsis, more or
less...)

> 
>  =head1 FILES
> 
> @@ -61,4 +83,4 @@ Richard W.M. Jones
> 
>  =head1 COPYRIGHT
> 
> -Copyright (C) 2019 Red Hat Inc.
> +Copyright (C) 2019-2022 Red Hat Inc.
> diff --git a/filters/noextents/noextents.c b/filters/noextents/noextents.c
> index f3044809..36231a35 100644
> --- a/filters/noextents/noextents.c
> +++ b/filters/noextents/noextents.c
> @@ -1,5 +1,5 @@
>  /* nbdkit
> - * Copyright (C) 2019 Red Hat Inc.
> + * Copyright (C) 2019-2022 Red Hat Inc.
>   *
>   * Redistribution and use in source and binary forms, with or without
>   * modification, are permitted provided that the following conditions are
> @@ -32,19 +32,65 @@
> 
>  #include <config.h>
> 
> +#include <assert.h>
> +#include <string.h>
> +
>  #include <nbdkit-filter.h>
> 
> +static enum ExtentMode {
> +  MASK,
> +  DATA,
> +} extentmode;
> +
> +static int
> +noextents_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
> +  const char *key, const char *value)
> +{
> +  if (strcmp (key, "extentmode") == 0) {
> +if (strcmp (value, "mask") == 0)
> +  extentmode = MASK;
> +else if (strcmp (value, "data") == 0)
> +  extentmode = DATA;
> +else {
> +  nbdkit_error ("unknown extentmode '%s'", value);
> +  return -1;
> +}
> +return 0;
> +  }
> +
> +  return next (nxdata, key, value);
> +}
> +
> +#define noextents_config_help \
> +  "extentmode=One of 'mask' (default), 'data'.\n"
> +
> +/* Advertise desired extents support. */
>  static int
>  noextents_can_extents (nbdkit_next *next,
> void *handle)
>  {
> -  return 0;
> +  return extentmode == DATA;
> +}
> +
> +/* Produce single data extent. */
> +static int
> +noextents_extents (nbdkit_next *next,
> +   void *handle, uint32_t count, uint64_t offset,
> +   uint32_t flags,
> +   struct nbdkit_extents *ret_extents,
> +   int *err)
> +{
> +  assert (extentmode == DATA);
> +  return nbdkit_add_extent (ret_extents, offset, count, 0);
>  }

I don't have the context in which this function may be called, in my
head, but this implementation seems to reflect precisely the requested
offset range back at the client, so a question arises:

- What if the client requests an offset range that (at least partially)
exceeds the size of the file? In that case, I think we should not report
that range as existent. For example, the chunked block status query in
virt-v2v asks for 2GB offset ranges, and it's expected that the returned
extent list will not exceed the file size.

(I understand that a server is permitted to cover a larger offset range
than requested 

Re: [Libguestfs] [libnbd PATCH v2 1/5] python: tests: Fix error handling

2022-02-08 Thread Laszlo Ersek
On 02/07/22 02:03, Nir Soffer wrote:
> On Fri, Feb 4, 2022 at 10:37 AM Laszlo Ersek  wrote:
>>
>> On 02/03/22 23:49, Nir Soffer wrote:
>>> On Thu, Feb 3, 2022 at 10:26 PM Eric Blake  wrote:

 Like a lot of the C examples, the aio copy test ignores read and write
 errors in the completion callback, which can cause silent data
 corruption. The failure in the test is not critical, but this is a bad
 example that may be copied by developers to a real application.

 The test dies with an assertion failure now if completion callback
 fails.  Tested with the temporary patch of:

 | diff --git i/python/t/590-aio-copy.py w/python/t/590-aio-copy.py
 | index 861fa6c8..4cd64d83 100644
 | --- i/python/t/590-aio-copy.py
 | +++ w/python/t/590-aio-copy.py
 | @@ -117,7 +117,8 @@ src.set_handle_name("src")
 |  dst = nbd.NBD()
 |  dst.set_handle_name("dst")
 |  src.connect_command(["nbdkit", "-s", "--exit-with-parent", "-r",
 | - "pattern", "size=%d" % disk_size])
 | + "--filter=error", "pattern", "error-pread-rate=1",
 | + "size=%d" % disk_size])
 |  dst.connect_command(["nbdkit", "-s", "--exit-with-parent",
 |   "memory", "size=%d" % disk_size])
 |  asynch_copy(src, dst)
 ---
  python/t/590-aio-copy.py | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/python/t/590-aio-copy.py b/python/t/590-aio-copy.py
 index 6cde5934..861fa6c8 100644
 --- a/python/t/590-aio-copy.py
 +++ b/python/t/590-aio-copy.py
 @@ -1,5 +1,5 @@
  # libnbd Python bindings
 -# Copyright (C) 2010-2019 Red Hat Inc.
 +# Copyright (C) 2010-2022 Red Hat Inc.
  #
  # This program is free software; you can redistribute it and/or modify
  # it under the terms of the GNU General Public License as published by
 @@ -36,6 +36,7 @@ def asynch_copy(src, dst):
  # This callback is called when any pread from the source
  # has completed.
  def read_completed(buf, offset, error):
 +assert not error
>>>
>>> This works for the test, since the test is not compiled
>>> to pyc file, which removes the asserts (like C -DNDEBUG)
>>> by default when building rpms.
>>>
>>> If someone will copy this to a real application they will have no
>>> error checking.
>>
>> I consider this a catastrophic bug in python, to be honest. Eliminating
>> assertions should never be done without an explicit request from whoever
>> builds the package.
> 
> I checked this and asserts are not removed automatically.
> 
> They are removed only when using the -O or -OO options:
> 
> $ python -O -c 'assert 0; print("assert was removed")'
> assert was removed
> 
> Or:
> 
> $ PYTHONOPTIMIZE=1 python -c 'assert 0; print("assert was removed")'
> assert was removed
> 
> Or when compiling modules, if you use the -o1 argument:
> 
> $ cat test.py
> assert 0
> print("assert was removed")
> 
> $ python -m compileall -o1 test.py
> Compiling 'test.py'...
> 
> $ python __pycache__/test.cpython-310.opt-1.pyc
> assert was removed
> 
> So this is similar to -DNDEBUG, but unlike a compiled program, asserts
> can be removed at runtime without your control.

Thank you for explaining!
Laszlo




Re: [Libguestfs] [nbdkit PATCH] RFC: noextents: Add extentmode=data config parameter

2022-02-08 Thread Richard W.M. Jones
On Fri, Feb 04, 2022 at 10:44:38AM -0600, Eric Blake wrote:
> While writing tests for an nbdcopy bug, I found myself wanting a way
> to easily view an entire image as data, but without disabling extents
> support altogether.  The existing extentlist filter can do this, but
> requires a secondary file.
> 
> Still an RFC because I need testsuite coverage similar to
> test-nozero.sh (eek - we don't have any direct test of the noextent
> filter, but only indirect coverage through other tests).  Also, are
> there any other extentmode=MODE values that might make sense?

I probably would have added another filter, but this way makes sense too.

Rich.

> ---
>  filters/noextents/nbdkit-noextents-filter.pod | 32 ++--
>  filters/noextents/noextents.c | 50 ++-
>  2 files changed, 75 insertions(+), 7 deletions(-)
> 
> diff --git a/filters/noextents/nbdkit-noextents-filter.pod 
> b/filters/noextents/nbdkit-noextents-filter.pod
> index 891b197d..aac2f097 100644
> --- a/filters/noextents/nbdkit-noextents-filter.pod
> +++ b/filters/noextents/nbdkit-noextents-filter.pod
> @@ -4,7 +4,7 @@ nbdkit-noextents-filter - disable extents in the underlying 
> plugin
> 
>  =head1 SYNOPSIS
> 
> - nbdkit --filter=noextents plugin
> + nbdkit --filter=noextents plugin [plugin-args...] [extentmode=MODE]
> 
>  =head1 DESCRIPTION
> 
> @@ -23,9 +23,31 @@ performance (C is known to be one such system).
> 
>  =head1 PARAMETERS
> 
> -There are no parameters specific to nbdkit-noextents-filter.  Any
> -parameters are passed through to and processed by the underlying
> -plugin in the normal way.
> +The parameter C<extentmode> is optional, and controls which mode the
> +filter will use.
> +
> +=over 4
> +
> +=item B<mask>
> +
> +Extent support is not advertised to the client; clients should not
> +query for extent information, and must assume the entire disk is
> +allocated.
> +
> +This is the default if the C<extentmode> parameter is not specified.
> +
> +=item B<data>
> +
> +(nbdkit E<ge> 1.30)
> +
> +Extent support is advertised, but extent requests from the client will
> +be answered with a claim that the entire disk forms a single allocated
> +data extent.
> +
> +=back
> +
> +All other parameters are passed through to and processed by the
> +underlying plugin in the normal way.
> 
>  =head1 FILES
> 
> @@ -61,4 +83,4 @@ Richard W.M. Jones
> 
>  =head1 COPYRIGHT
> 
> -Copyright (C) 2019 Red Hat Inc.
> +Copyright (C) 2019-2022 Red Hat Inc.
> diff --git a/filters/noextents/noextents.c b/filters/noextents/noextents.c
> index f3044809..36231a35 100644
> --- a/filters/noextents/noextents.c
> +++ b/filters/noextents/noextents.c
> @@ -1,5 +1,5 @@
>  /* nbdkit
> - * Copyright (C) 2019 Red Hat Inc.
> + * Copyright (C) 2019-2022 Red Hat Inc.
>   *
>   * Redistribution and use in source and binary forms, with or without
>   * modification, are permitted provided that the following conditions are
> @@ -32,19 +32,65 @@
> 
>  #include <config.h>
> 
> +#include <assert.h>
> +#include <string.h>
> +
>  #include <nbdkit-filter.h>
> 
> +static enum ExtentMode {
> +  MASK,
> +  DATA,
> +} extentmode;
> +
> +static int
> +noextents_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
> +  const char *key, const char *value)
> +{
> +  if (strcmp (key, "extentmode") == 0) {
> +if (strcmp (value, "mask") == 0)
> +  extentmode = MASK;
> +else if (strcmp (value, "data") == 0)
> +  extentmode = DATA;
> +else {
> +  nbdkit_error ("unknown extentmode '%s'", value);
> +  return -1;
> +}
> +return 0;
> +  }
> +
> +  return next (nxdata, key, value);
> +}
> +
> +#define noextents_config_help \
> +  "extentmode=One of 'mask' (default), 'data'.\n"
> +
> +/* Advertise desired extents support. */
>  static int
>  noextents_can_extents (nbdkit_next *next,
> void *handle)
>  {
> -  return 0;
> +  return extentmode == DATA;
> +}
> +
> +/* Produce single data extent. */
> +static int
> +noextents_extents (nbdkit_next *next,
> +   void *handle, uint32_t count, uint64_t offset,
> +   uint32_t flags,
> +   struct nbdkit_extents *ret_extents,
> +   int *err)
> +{
> +  assert (extentmode == DATA);
> +  return nbdkit_add_extent (ret_extents, offset, count, 0);
>  }
> 
>  static struct nbdkit_filter filter = {
>.name  = "noextents",
>.longname  = "nbdkit noextents filter",
> +  .config= noextents_config,
> +  .config_help   = noextents_config_help,
>.can_extents   = noextents_can_extents,
> +  .extents   = noextents_extents,
>  };
> 
>  NBDKIT_REGISTER_FILTER(filter)
> -- 
> 2.34.1
> 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests 

Re: [Libguestfs] [PATCH libnbd] golang: make-dist.sh: Generate the list file

2022-02-08 Thread Richard W.M. Jones
On Mon, Feb 07, 2022 at 01:28:30AM +0200, Nir Soffer wrote:
> On Sun, Feb 6, 2022 at 9:13 PM Richard W.M. Jones  wrote:
> >
> > On Sun, Feb 06, 2022 at 07:21:02PM +0200, Nir Soffer wrote:
> > > Generate the list file when creating the distribution. Since the Go
> > > tool treats the list file on the proxy server as the source of truth, we
> > > do the same. The new list file is created by downloading the current
> > > list file, sorting it, and appending the current version.
> > >
> > > Creating a distribution tarball now requires access to
> > > download.libguestfs.org.
> > >
> > > With this change the distribution tarball can be extracted on the server
> > > without any additional manual process.
> > >
> > > Signed-off-by: Nir Soffer 
> > > ---
> > >  golang/make-dist.sh | 16 +---
> > >  1 file changed, 5 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/golang/make-dist.sh b/golang/make-dist.sh
> > > index a590e6c6..5fe006ff 100755
> > > --- a/golang/make-dist.sh
> > > +++ b/golang/make-dist.sh
> > > @@ -86,48 +86,42 @@ rm -rf libguestfs.org
> > >  #
> > >  # libguestfs.org
> > >  # └── libnbd
> > >  #    ├── @latest
> > >  #    └── @v
> > >  #        ├── list
> > >  #        ├── v1.11.4.info
> > >  #        ├── v1.11.4.mod
> > >  #        └── v1.11.4.zip
> > >  #
> > > -# We create @latest and @v/*{.info,mod,zip} here.
> > > -#
> > > -# The "@v/list" file must be created on the web server after uploading
> > > -# a new release:
> > > -#
> > > -# $ cd libguestfs.org/libnbd/@v
> > > -# $ ls -1 v*.info | awk -F.info '{print $1}' > list
> > > -# $ cat list
> > > -# v1.11.3
> > > -# v1.11.4
> > > -#
> > >  # See https://golang.org/ref/mod#serving-from-proxy
> > >
> > >  module_dir=libguestfs.org/libnbd
> > >  v_dir=$module_dir/@v
> > >
> > >  mkdir -p $v_dir
> > >
> > >  # Go wants a string in RFC 3339 format, git strict ISO 8601 format is
> > >  # compatible.
> > >  info="{
> > >\"Version\": \"$version\",
> > >\"Time\": \"$(git show -s --format=%cI)\"
> > >  }"
> > >  echo "$info" > $module_dir/@latest
> > >  echo "$info" > $v_dir/$version.info
> > >
> > >  cp go.mod $v_dir/$version.mod
> > >  mv $version.zip $v_dir
> > >
> > > +# Create the list file by amending the current file on the server.
> > > +list_url=https://download.libguestfs.org/libnbd/golang/libguestfs.org/libnbd/@v/list
> > > +curl --silent --show-error "$list_url" | sort > $v_dir/list
> > > +grep -q "$version" $v_dir/list || echo "$version" >> $v_dir/list
> > > +
> > >  # Create tarball to upload and extract on the webserver. It should be
> > >  # extracted in the directory pointed by the "go-import" meta tag.
> > >  output=$PWD/libnbd-golang-$version.tar.gz
> > >  tar czf $output libguestfs.org
> > >
> > >  rm -rf libguestfs.org
> > >
> > >  echo output written to $output
> >
> > Yes this seems a reasonable approach.
> >
> > ACK
> 
> Pushed as 3895a43f3e9f00f1d612d619caf62adb7ace2772
> 
> This should be tested with the actual server, so make sure it
> will work for the next release.
> 
> Can you do a build from current git and upload it to the server?

When I do the next 1.11 release I'll test it, won't be long.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org


[Libguestfs] LIBNBD SECURITY: Silent image corruption with nbdcopy

2022-02-08 Thread Eric Blake
We have discovered a security flaw with potential moderate impact in
libnbd.

Lifecycle
-

Reported: 2022-01-24  Fixed: 2022-02-04  Published: 2022-01-27

This has been assigned CVE-2022-0485.

Credit
--

Reported by Nir Soffer  and patched by Eric Blake


Description
---

libnbd is a Network Block Device (NBD) client library.  nbdcopy is an
application shipped as part of the libnbd project designed to copy
files (typically disk images) where at least one of the source or
destination locations is an NBD server.

A flaw in the nbdcopy program silently ignored any asynchronous read
or write errors reported by an NBD server.  With no error messages and
no change to the exit status, a client of the nbdcopy application has
no indication that the corruption occurred, where the loss of data
integrity may have further implications in a denial of service attack
against disk images.  What's more, in some NBD client/server
configurations, the data that nbdcopy writes to the destination image
could come from uninitialized heap data in the nbdcopy process, where
the leaked information may aid in mounting a more sophisticated attack
against a machine using a long-running nbdcopy.

If TLS is not in use, a meddler-in-the-middle attacker can easily
corrupt the destination image by faking error replies to NBD read and
write commands.  But even when TLS is in use, the nature of the flaw
means that nbdcopy cannot detect legitimate error reports from the
server.  Furthermore, even though nbdcopy attempts to detect abrupt
disconnection of a TLS connection, the nature of asynchronous commands
means that we were unable to rule out whether a well-timed MitM
attacker could cause undetected corruption near the end of a copy
process by aborting a TLS stream at the right moment, even without
having access to the content of the TLS stream.

Test if libnbd is vulnerable


With nbdkit 1.14 or later installed, the following serves as an easy
test of the vulnerability:

$ nbdkit -U - --filter=error pattern size=1M error-pread-rate=1 \
  --run 'nbdcopy $uri null:' && echo vulnerable

The error messages printed from nbdkit demonstrate the failed NBD
transactions.  Where a vulnerable version of nbdcopy ignores those
errors and results in 'vulnerable' being printed to stdout, a fixed
version will diagnose a read failure to stderr.

Workarounds
---

Use of 'nbdcopy --synchronous' will avoid undetected data corruption,
but comes at a potential performance cost by avoiding the speed
benefits of asynchronous operations.  If your version of nbdcopy lacks
the '--synchronous' command-line option (all 1.4.x releases), it is
not vulnerable.

Connecting a vulnerable version of nbdcopy to an NBD server that
supports structured replies by default (such as qemu-nbd 2.11 or
nbdkit 1.12 and newer) does not eliminate the risk of data corruption
on an error, but is sufficient to guarantee that any corruptions will
either read as all zeroes or else as prior contents of the destination
image.  This is safer than connecting nbdcopy to an NBD server that
lacks structured replies (such as nbd-server 3.23, current as of this
notice), where the data corruption could also leak heap contents from
the nbdcopy process.

Although the use of TLS does not avoid the bug, it is still
recommended to use TLS when utilizing nbdcopy to copy files across
machines, so that the NBD server can be trusted to not be malicious.

It is recommended to apply the fix or upgrade to a fixed version.

Fixes
-

The flaw was introduced in libnbd 1.5.6 (commit bc896eec4d), when
nbdcopy gained the ability to default to using asynchronous NBD reads
and writes.  A fix for the missing error detection is available for
all affected stable branches, and the current development branch.

However, note that the stable-1.6 branch of libnbd has other known
data corruption bugs when communicating with an NBD server that
supports trim operations, which were repaired for 1.8 but not
backported to any 1.6.x release.  While the trim bug did not amount to
a CVE, if your reason for upgrading libnbd is to get a version of
nbdcopy that avoids known data corruption issues, it is recommended
that you use a newer release.

* development branch (1.11)

https://gitlab.com/nbdkit/libnbd/-/commit/8d444b41d09a700c7ee6f9182a649f3f2d325abb
  or use libnbd >= 1.11.8 from
  http://download.libguestfs.org/libnbd/1.7-development/

* stable branch 1.10

https://gitlab.com/nbdkit/libnbd/-/commit/9219d2e70c770d8efb98d6e8eaf68e8e354631e3
  or use libnbd >= 1.10.4 from
  http://download.libguestfs.org/libnbd/1.10-stable/

* stable branch 1.8

https://gitlab.com/nbdkit/libnbd/-/commit/7ed8072a90922372dd1ba32bddd5526e2d0a42de
  or use libnbd >= 1.8.7 from
  http://download.libguestfs.org/libnbd/1.8-stable/

* stable branch 1.6

https://gitlab.com/nbdkit/libnbd/-/commit/6c8f2f859926b82094fb5e85c446ea099700fa10
  or use libnbd >= 1.6.6 from
  

Re: [Libguestfs] [PATCH libnbd] golang: make-dist.sh: Generate the list file

2022-02-08 Thread Eric Blake
On Sun, Feb 06, 2022 at 07:21:02PM +0200, Nir Soffer wrote:
> Generate the list file when creating the distribution. Since the Go
> tool treats the list file on the proxy server as the source of truth, we
> do the same. The new list file is created by downloading the current
> list file, sorting it, and appending the current version.
> 
> Creating a distribution tarball now requires access to
> download.libguestfs.org.
> 
> With this change the distribution tarball can be extracted on the server
> without any additional manual process.
> 
> Signed-off-by: Nir Soffer 
> ---

>  
> +# Create the list file by amending the current file on the server.
> +list_url=https://download.libguestfs.org/libnbd/golang/libguestfs.org/libnbd/@v/list
> +curl --silent --show-error "$list_url" | sort > $v_dir/list

Do we want to use 'LC_ALL=C sort' for deterministic sorting, rather
than facing differences when different maintainers use locales with
slightly different collation rules?

Because the curl command is piped to sort, we don't exit the script
with non-zero status if the curl step fails.  Is that problematic?

> +grep -q "$version" $v_dir/list || echo "$version" >> $v_dir/list
> +
>  # Create tarball to upload and extract on the webserver. It should be
>  # extracted in the directory pointed by the "go-import" meta tag.
>  output=$PWD/libnbd-golang-$version.tar.gz
>  tar czf $output libguestfs.org
>  
>  rm -rf libguestfs.org
>  
>  echo output written to $output
> -- 
> 2.34.1
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH libnbd] golang: make-dist.sh: Generate the list file

2022-02-08 Thread Nir Soffer
On Mon, Feb 7, 2022 at 3:13 PM Eric Blake  wrote:
>
> On Sun, Feb 06, 2022 at 07:21:02PM +0200, Nir Soffer wrote:
> > Generate the list file when creating the distribution. Since the Go
> > tool treats the list file on the proxy server as the source of truth, we
> > do the same. The new list file is created by downloading the current
> > list file, sorting it, and appending the current version.
> >
> > Creating a distribution tarball now requires access to
> > download.libguestfs.org.
> >
> > With this change the distribution tarball can be extracted on the server
> > without any additional manual process.
> >
> > Signed-off-by: Nir Soffer 
> > ---
>
> >
> > +# Create the list file by amending the current file on the server.
> > +list_url=https://download.libguestfs.org/libnbd/golang/libguestfs.org/libnbd/@v/list
> > +curl --silent --show-error "$list_url" | sort > $v_dir/list
>
> Do we want to use 'LC_ALL=C sort' for deterministic sorting, rather
> than facing differences when different maintainers use locales with
> slightly different collation rules?

Yes, sounds good.

> Because the curl command is piped to sort, we don't exit the script
> with non-zero status if the curl step fails.  Is that problematic?

This will create a corrupted list file - need to fix.

Thanks!

> > +grep -q "$version" $v_dir/list || echo "$version" >> $v_dir/list
> > +
> >  # Create tarball to upload and extract on the webserver. It should be
> >  # extracted in the directory pointed by the "go-import" meta tag.
> >  output=$PWD/libnbd-golang-$version.tar.gz
> >  tar czf $output libguestfs.org
> >
> >  rm -rf libguestfs.org
> >
> >  echo output written to $output
> > --
> > 2.34.1
> >
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>




Re: [Libguestfs] [PATCH libnbd 7/9] golang: aio_buffer.go: Speed up FromBytes()

2022-02-08 Thread Eric Blake
On Sun, Jan 30, 2022 at 01:33:35AM +0200, Nir Soffer wrote:
> Using Slice() we can use builtin copy() instead of a manual loop, which
> is 4.6 times faster with a typical 256k buffer:

Again, the benefits of the underlying libraries being able to
vectorize over larger chunks of memory ;)

> 
> Before:
> BenchmarkFromBytes-12       9806    111474 ns/op
> 
> After:
> BenchmarkFromBytes-12      48193     24106 ns/op
> 
> Signed-off-by: Nir Soffer 
> ---
>  golang/aio_buffer.go | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>

ACK.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH libnbd 6/9] golang: tests: Use AioBuffer.Slice()

2022-02-08 Thread Eric Blake
On Sun, Jan 30, 2022 at 01:33:34AM +0200, Nir Soffer wrote:
> Slice() is easier to use and faster than Get() or Bytes(). Lets use the

Let's

> new way.
> 
> Signed-off-by: Nir Soffer 
> ---
>  golang/libnbd_500_aio_pread_test.go  | 2 +-
>  golang/libnbd_510_aio_pwrite_test.go | 8 +---
>  golang/libnbd_620_aio_buffer_test.go | 8 +---
>  3 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/golang/libnbd_500_aio_pread_test.go 
> b/golang/libnbd_500_aio_pread_test.go
> index 0811378c..bd0208ef 100644
> --- a/golang/libnbd_500_aio_pread_test.go

ACK.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH libnbd 4/9] golang: aio_buffer.go: Add MakeAioBufferZero()

2022-02-08 Thread Eric Blake
On Sun, Jan 30, 2022 at 01:33:32AM +0200, Nir Soffer wrote:
> Make it easy to create a zeroed buffer via calloc(), preventing leaking
> sensitive info from the heap.
> 
> Benchmarking show that creating a zeroed buffer is much slower compared

shows

> with uninitialized buffer, but much faster compared with manually
> initializing the buffer with a loop.
> 
> BenchmarkMakeAioBuffer-12  7252674   148.1 ns/op
> BenchmarkMakeAioBufferZero-12   262107  4181 ns/op
> BenchmarkAioBufferZero-12    17581     68759 ns/op
> 
> It is interesting that creating a zeroed buffer is 3 times faster
> compared with making a new []byte slice:
> 
> BenchmarkMakeAioBufferZero-12   247710  4440 ns/op
> BenchmarkMakeByteSlice-12    84117     13733 ns/op

Some of this is due to how much vectorization the standard library
(whether libc or Go's core libraries) can do when bulk-zeroing
(zeroing 64 bits, or even a cache line at a time, in an unrolled loop,
is always going to be more performant than a naive loop of one byte at
a time).

> 
> Signed-off-by: Nir Soffer 
> ---
>  golang/aio_buffer.go |  6 ++
>  golang/libnbd_620_aio_buffer_test.go | 16 

Another file that may fit better in the 0xx naming, especially if we
decide to duplicate similar functionality into the python or OCaml
bindings of being able to pre-generate a known-zero buffer for use in
passing to nbd_pread.

As a helper API, this seems useful.  But do we need any man page
documentation of a language-specific helper function?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH libnbd 2/9] golang: aio_buffer.go: Make it safer to use

2022-02-08 Thread Eric Blake
On Sun, Jan 30, 2022 at 01:33:30AM +0200, Nir Soffer wrote:
> If a Go program tries to use AioBuffer after calling AioBuffer.Free(),
> the program may silently corrupt data, accessing memory that does not
> belong to the buffer any more, or segfault if the address is not mapped.
> In the worst case, it can corrupt memory silently. Calling Free() twice
> may silently free unrelated memory.
> 
> Make the buffer safer to use by Freeing only on the first call and
> setting the pointer to nil. This makes multiple calls to Free()
> harmless, just like the underlying C.free().
> 
> Trying to access Bytes() and Get() after calling Free() will always
> panic now, revealing the bug in the program.
> 
> Trying to use AioBuffer with libnbd API will likely segfault and panic.
> I did not try to test this.
> 
> Signed-off-by: Nir Soffer 
> ---
>  golang/aio_buffer.go |  5 +++-
>  golang/libnbd_620_aio_buffer_test.go | 41 
>  2 files changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/golang/aio_buffer.go b/golang/aio_buffer.go
> index 2bc69a01..2b77d6ee 100644
> --- a/golang/aio_buffer.go
> +++ b/golang/aio_buffer.go
> @@ -46,20 +46,23 @@ func MakeAioBuffer(size uint) AioBuffer {
>  func FromBytes(buf []byte) AioBuffer {
>   size := len(buf)
>   ret := MakeAioBuffer(uint(size))
>   for i := 0; i < len(buf); i++ {
>   *ret.Get(uint(i)) = buf[i]
>   }
>   return ret
>  }
>  
>  func (b *AioBuffer) Free() {
> - C.free(b.P)
> + if b.P != nil {
> + C.free(b.P)
> + b.P = nil
> + }

Good.

> +++ b/golang/libnbd_620_aio_buffer_test.go
> @@ -53,20 +53,61 @@ func TestAioBuffer(t *testing.T) {

See patch 1 comments about the file name.  Otherwise looks good.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [nbdkit PATCH] RFC: noextents: Add extentmode=data config parameter

2022-02-08 Thread Eric Blake
On Tue, Feb 08, 2022 at 12:39:02PM +0100, Laszlo Ersek wrote:
> On 02/04/22 17:44, Eric Blake wrote:
> > While writing tests for an nbdcopy bug, I found myself wanting a way
> > to easily view an entire image as data, but without disabling extents
> > support altogether.  The existing extentlist filter can do this, but
> > requires a secondary file.
> > 
> > Still an RFC because I need testsuite coverage similar to
> > test-nozero.sh (eek - we don't have any direct test of the noextent
> > filter, but only indirect coverage through other tests).  Also, are
> > there any other extentmode=MODE values that might make sense?
> > ---

> > +=item B<mask>
> > +
> > +Extent support is not advertised to the client; clients should not
> > +query for extent information, and must assume the entire disk is
> > +allocated.
> > +
> > +This is the default if the C<extentmode> parameter is not specified.
> > +
> > +=item B<data>
> > +
> > +(nbdkit E<ge> 1.30)
> > +
> > +Extent support is advertised, but extent requests from the client will
> > +be answered with a claim that the entire disk forms a single allocated
> > +data extent.
> > +
> > +=back
> > +
> > +All other parameters are passed through to and processed by the
> > +underlying plugin in the normal way.
> 
> Is this necessary to spell out? (Looked at a random other filter's
> documentation, and didn't see such a statement, so I think it's the
> default.)
> 
> (The same question applies to "plugin-args" in the synopsis, more or
> less...)

Hmm, we aren't always consistent, but I agree that it can be pruned
without loss.

> > +/* Produce single data extent. */
> > +static int
> > +noextents_extents (nbdkit_next *next,
> > +   void *handle, uint32_t count, uint64_t offset,
> > +   uint32_t flags,
> > +   struct nbdkit_extents *ret_extents,
> > +   int *err)
> > +{
> > +  assert (extentmode == DATA);
> > +  return nbdkit_add_extent (ret_extents, offset, count, 0);
> >  }
> 
> I don't have the context in which this function may be called, in my
> head, but this implementation seems to reflect precisely the requested
> offset range back at the client, so a question arises:
> 
> - What if the client requests an offset range that (at least partially)
> exceeds the size of the file? In that case, I think we should not report
> that range as existent. For example, the chunked block status query in
> virt-v2v asks for 2GB offset ranges, and it's expected that the returned
> extent list will not exceed the file size.

The code in server/ guarantees that we cannot call into a filter or
plugin with an extents request that would read out of bounds; ie. on
input, offset+count will never exceed what next->get_size() would tell
us anyways.  Conversely, nbdkit_add_extent() already permits us to
pass in redundant information prior to offset (as long as we make
forward progress by eventually using nbdkit_add_extent for at least
one byte at offset before returning), as well as to provide more
information than needed (the upper layer can set a clamp, such as when
FLAG_REQ_ONE is in use by the client, or at the 32-bit boundary
condition, where our additions beyond that clamp are merely ignored).
So we could just as easily write

  return nbdkit_add_extent (ret_extents, 0, INT64_MAX, 0);

which will be trimmed to size as needed before going over the wire
back to the client.  But now that I've written that, I can see that
there is indeed a useful benchmarking distinction between using
offset/count vs. 0/INT_MAX - the former behaves as if the client had
always passed in FLAG_REQ_ONE (we can never progress faster than the
client wants us to go), while the latter gives as much information to
the client as possible (subject to 32-bit limits, until my 64-bit
block status work is revived).  So instead of just one new mode named
'data', maybe we want a larger family of possible modes:

- mask (default; existing behavior of masking extent support so client
  can't query them at all); client must assume disk is data
- data-match (reply that the extent is data, but match the size of the
  client request; behaves as if the client unconditionally uses
  FLAG_REQ_ONE); client sees disk as data
- data-full (reply that the entire remainder of the extent was data,
  although the client's use of FLAG_REQ_ONE clamps it back to the
  query size); client sees disk as data
- req-one (pass the query on through to the plugin, but then
  purposefully trim it back before replying to the client so that it
  behaves as if the client unconditionally uses FLAG_REQ_ONE); client
  can see holes, but cannot read ahead
- plugin (pass the query on through to the plugin, with no
  alterations to the plugin's result - normalizes the impact of having
  the filter in the stack when comparing to other modes); client can
  see holes, and can benefit from reading ahead

We can also bike-shed on better names for those modes (maybe
'disabled' instead of 'mask'; or 'passthrough' instead of 

Re: [Libguestfs] [PATCH libnbd 1/9] golang: tests: Add test for AioBuffer

2022-02-08 Thread Eric Blake
On Sun, Jan 30, 2022 at 01:33:29AM +0200, Nir Soffer wrote:
> Add unit tests and benchmarks for AioBuffer. The tests are trivial but
> they server as running documentation, and they point out important
> details about the type.
> 
> The benchmarks how efficient is allocation a new buffer, zeroing it, and
> interfacing with Go code.

Wording suggestion:

The benchmarks show the efficiency of allocating a new buffer, zeroing
it, and interfacing with Go code.

> 
> This tests will also ensure that we don't break anything by the next

Either "These tests" or "This test"

> changes.
> 
> To run the benchmarks use:
> 
> $ go test -run=xxx -bench=.
> goos: linux
> goarch: amd64
> pkg: libguestfs.org/libnbd
> cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
> BenchmarkMakeAioBuffer-12  6871759   157.2 ns/op
> BenchmarkAioBufferZero-12    17551     69552 ns/op
> BenchmarkFromBytes-12       9632    139112 ns/op
> BenchmarkAioBufferBytes-12   69375 16410 ns/op
> PASS
> ok      libguestfs.org/libnbd   5.843s
> 
> Signed-off-by: Nir Soffer 
> ---
>  golang/Makefile.am   |   1 +
>  golang/libnbd_620_aio_buffer_test.go | 104 +++
>  2 files changed, 105 insertions(+)
>  create mode 100644 golang/libnbd_620_aio_buffer_test.go
> 
> diff --git a/golang/Makefile.am b/golang/Makefile.am
> index 10fb8934..ae0486dd 100644
> --- a/golang/Makefile.am
> +++ b/golang/Makefile.am
> @@ -37,20 +37,21 @@ source_files = \
>   libnbd_300_get_size_test.go \
>   libnbd_400_pread_test.go \
>   libnbd_405_pread_structured_test.go \
>   libnbd_410_pwrite_test.go \
>   libnbd_460_block_status_test.go \
>   libnbd_500_aio_pread_test.go \
>   libnbd_510_aio_pwrite_test.go \
>   libnbd_590_aio_copy_test.go \
>   libnbd_600_debug_callback_test.go \
>   libnbd_610_error_test.go \
> + libnbd_620_aio_buffer_test.go \

As discussed in a different thread, the numbering here groups
somewhat-related functionality, and helps us keep cross-language tests
correlated over testing the same features.  Since you aren't adding
counterpart tests to python or ocaml, I don't know what number would
be best.  But our existing numbering is more along the lines of 0xx
for language-level loading, 1xx for NBD handle tests, 2xx for
connection tests, 3xx for initial APIs after connecting, 4xx for
synchronous APIs, 5xx for asynch APIs, and 6xx for high-level usage
patterns.  This feels like it might fit better in the 0xx series,
since the benchmark does not use any NBD handle.

>   $(NULL)
>  
>  generator_built = \
>   bindings.go \
>   closures.go \
>   wrappers.go \
>   wrappers.h \
>   $(NULL)
>  
>  EXTRA_DIST = \
> diff --git a/golang/libnbd_620_aio_buffer_test.go 
> b/golang/libnbd_620_aio_buffer_test.go
> new file mode 100644
> index ..2632f87f
> --- /dev/null
> +++ b/golang/libnbd_620_aio_buffer_test.go
> @@ -0,0 +1,104 @@
> +/* libnbd golang tests
> + * Copyright (C) 2013-2021 Red Hat Inc.

You may want to add 2022.

Take the rest of my review with a grain of salt; I'm not (yet?) a Go expert.

> +
> +package libnbd
> +
> +import (
> + "bytes"
> + "testing"
> +)
> +
> +func TestAioBuffer(t *testing.T) {
> + /* Create a buffer with unitinialized backing array. */

uninitialized

> + buf := MakeAioBuffer(uint(32))
> + defer buf.Free()
> +
> + /* Initialize backing array contents. */
> + for i := uint(0); i < buf.Size; i++ {
> + *buf.Get(i) = 0
> + }
> +
> + /* Create a slice by copying the backing arrary contents into Go 
> memory. */
> + b := buf.Bytes()
> +
> + zeroes := make([]byte, 32)
> + if !bytes.Equal(b, zeroes) {
> + t.Fatalf("Expected %v, got %v", zeroes, buf.Bytes())
> + }
> +
> + /* Modifying returned slice does not modify the buffer. */
> + for i := 0; i < len(b); i++ {
> + b[i] = 42
> + }
> +
> + /* Bytes() still returns zeroes. */
> + if !bytes.Equal(buf.Bytes(), zeroes) {
> + t.Fatalf("Expected %v, got %v", zeroes, buf.Bytes())
> + }
> +
> + /* Create a nother buffer from Go slice. */

another

> + buf2 := FromBytes(zeroes)
> + defer buf2.Free()
> +
> + if !bytes.Equal(buf2.Bytes(), zeroes) {
> + t.Fatalf("Expected %v, got %v", zeroes, buf2.Bytes())
> + }
> +}
> +
> +// Typical buffer size.
> +const bufferSize uint = 256 * 1024
> +
> +// Benchmark creating uninitilized buffer.

an uninitialized

> +func BenchmarkMakeAioBuffer(b *testing.B) {
> + for i := 0; i < b.N; i++ {
> + buf := MakeAioBuffer(bufferSize)
> + buf.Free()
> + }
> +}
> +
> +// Benchmark zeroing a buffer.
> +func BenchmarkAioBufferZero(b *testing.B) {
> + for i := 0; i < b.N; i++ {
> + buf := MakeAioBuffer(bufferSize)
> + for i := uint(0); i < bufferSize; i++ {
> +

Re: [Libguestfs] [PATCH libnbd] generator/Go.ml: Simplify copy_uint32_array

2022-02-08 Thread Eric Blake
On Sun, Feb 06, 2022 at 07:45:20PM +0200, Nir Soffer wrote:
> Create a slice backed up by the entries pointer, and copy the data with
> builtin copy(). This can be 3x times faster but I did not measure it.
> 
> Eric posted a similar patch[1] last year, but the patch seems to be
> stuck with unrelated incomplete work.
> 
> [1] https://listman.redhat.com/archives/libguestfs/2021-December/msg00065.html

Your version looks slightly nicer than mine.  ACK.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH 1/5] output/rhv-upload-plugin: Fix flush and close

2022-02-08 Thread Richard W.M. Jones
On Sat, Dec 18, 2021 at 10:36:29PM +0200, Nir Soffer wrote:
> When locking the http pool, we wait until all connections are idle, and
> take them from the pool. But since we used pool.qsize(), which is the
> number of items currently in the queue, we did not wait for all
> connections.
> 
> This leads to following issues:
> 
> - We send flush request only for some connections, which does not ensure
>   that all uploaded data is flushed to storage.
> 
> - We close only some of the connections in cleanup(). This should not
>   matter since the connections are closed when the plugin process
>   terminates.
> 
> An example import showing sending only one FLUSH request instead of 4:
> https://bugzilla.redhat.com/2032324#c8
> 
> Fixed by creating a bounded queue and using pool.maxsize to get all the
> connections from the pool.
> ---
>  output/rhv-upload-plugin.py | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/output/rhv-upload-plugin.py b/output/rhv-upload-plugin.py
> index 1cb837dd..bad0e8a3 100644
> --- a/output/rhv-upload-plugin.py
> +++ b/output/rhv-upload-plugin.py
> @@ -307,30 +307,30 @@ class UnixHTTPConnection(HTTPConnection):
>  
>  def connect(self):
>  self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
>  if self.timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
>  self.sock.settimeout(timeout)
>  self.sock.connect(self.path)
>  
>  
>  # Connection pool.
>  def create_http_pool(url, options):
> -pool = queue.Queue()
> -
>  count = min(options["max_readers"],
>  options["max_writers"],
>  MAX_CONNECTIONS)
>  
>  nbdkit.debug("creating http pool connections=%d" % count)
>  
>  unix_socket = options["unix_socket"] if is_ovirt_host else None
>  
> +pool = queue.Queue(count)
> +
>  for i in range(count):
>  http = create_http(url, unix_socket=unix_socket)
>  pool.put(http)
>  
>  return pool
>  
>  
>  @contextmanager
>  def http_context(pool):
>  """
> @@ -347,22 +347,22 @@ def http_context(pool):
>  def iter_http_pool(pool):
>  """
>  Wait until all inflight requests are done, and iterate on imageio
>  connections.
>  
>  The pool is empty during iteration. New requests issued during iteration
>  will block until iteration is done.
>  """
>  locked = []
>  
> -# Lock the pool by taking the connection out.
> -while len(locked) < pool.qsize():
> +# Lock the pool by taking all connections out.
> +while len(locked) < pool.maxsize:
>  locked.append(pool.get())
>  
>  try:
>  for http in locked:
>  yield http
>  finally:
>  # Unlock the pool by puting the connection back.
>  for http in locked:
>  pool.put(http)
>  
> @@ -371,21 +371,21 @@ def close_http_pool(pool):
>  """
>  Wait until all inflight requests are done, close all connections and 
> remove
>  them from the pool.
>  
>  No request can be served by the pool after this call.
>  """
>  nbdkit.debug("closing http pool")
>  
>  locked = []
>  
> -while len(locked) < pool.qsize():
> +while len(locked) < pool.maxsize:
>  locked.append(pool.get())
>  
>  for http in locked:
>  http.close()
>  
>  
>  def create_http(url, unix_socket=None):
>  """
>  Create http connection for transfer url.

Obvious bug fix, ACK

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: [Libguestfs] [PATCH 5/5] output/rhv-upload-plugin: Keep connections alive

2022-02-08 Thread Richard W.M. Jones
On Sat, Dec 18, 2021 at 10:36:33PM +0200, Nir Soffer wrote:
> When importing from vddk, nbdcopy may be blocked for few minutes(!)
> trying to get extents. While nbdcopy is blocked, imageio server closes
> the idle connections. When we finally get a request from nbdcopy, we
> fail to detect that the connection was closed.
> 
> Detecting a closed connection is hard and racy. In the good case, we get
> a BrokenPipe error. In the bad case, imageio closed the socket right
> after we sent a request, and we get an invalid status line. When using
> imageio proxy, we may get http error (e.g. 500) if the proxy connection
> to imageio server on the host was closed.
> 
> Even worse, when we find that the connection was closed, it is not safe
> to reopen the connection, since qemu-nbd does not ensure yet that data
> written to the previous connection will be flushed when we flush the new
> connection.
> 
> Fix the issue by keeping the connections alive. A pool keeper thread
> sends a flush request on idle connection every ~30 seconds. This also
> improves data integrity and efficiency, using idle time to flush written
> data.
> 
> Fixes https://bugzilla.redhat.com/2032324
> ---
>  output/rhv-upload-plugin.py | 71 +
>  1 file changed, 71 insertions(+)
> 
> diff --git a/output/rhv-upload-plugin.py b/output/rhv-upload-plugin.py
> index 8d088c4e..172da358 100644
> --- a/output/rhv-upload-plugin.py
> +++ b/output/rhv-upload-plugin.py
> @@ -13,50 +13,60 @@
>  # GNU General Public License for more details.
>  #
>  # You should have received a copy of the GNU General Public License along
>  # with this program; if not, write to the Free Software Foundation, Inc.,
>  # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
>  
>  import json
>  import queue
>  import socket
>  import ssl
> +import threading
>  import time
>  
>  from contextlib import contextmanager
>  from http.client import HTTPSConnection, HTTPConnection
>  from urllib.parse import urlparse
>  
>  import nbdkit
>  
>  # Using version 2 supporting the buffer protocol for better performance.
>  API_VERSION = 2
>  
>  # Maximum number of connection to imageio server. Based on testing with 
> imageio
>  # client, this give best performance.
>  MAX_CONNECTIONS = 4
>  
> +# Maximum idle time allowed for imageio connections.
> +IDLE_TIMEOUT = 30
> +
>  # Required parameters.
>  size = None
>  url = None
>  
>  # Optional parameters.
>  cafile = None
>  insecure = False
>  is_ovirt_host = False
>  
>  # List of options read from imageio server.
>  options = None
>  
>  # Pool of HTTP connections.
>  pool = None
>  
> +# Set when plugin is cleaning up.
> +done = threading.Event()
> +
> +# Set when periodic flush request fails.
> +pool_error = None
> +
>  
>  # Parse parameters.
>  def config(key, value):
>  global cafile, url, is_ovirt_host, insecure, size
>  
>  if key == "cafile":
>  cafile = value
>  elif key == "insecure":
>  insecure = value.lower() in ['true', '1']
>  elif key == "is_ovirt_host":
> @@ -84,25 +94,31 @@ def after_fork():
>  options = get_options(http, url)
>  http.close()
>  
>  nbdkit.debug("imageio features: flush=%(can_flush)r "
>   "zero=%(can_zero)r unix_socket=%(unix_socket)r "
>   "max_readers=%(max_readers)r max_writers=%(max_writers)r"
>   % options)
>  
>  pool = create_http_pool(url, options)
>  
> +t = threading.Thread(target=pool_keeper, name="poolkeeper")
> +t.daemon = True
> +t.start()
> +
>  
>  # This function is not actually defined before nbdkit 1.28, but it
>  # doesn't particularly matter if we don't close the pool because
>  # clients should call flush().
>  def cleanup():
> +nbdkit.debug("cleaning up")
> +done.set()
>  close_http_pool(pool)
>  
>  
>  def thread_model():
>  """
>  Using parallel model to speed up transfer with multiple connections to
>  imageio server.
>  """
>  return nbdkit.THREAD_MODEL_PARALLEL
>  
> @@ -272,20 +288,23 @@ def emulate_zero(h, count, offset, flags):
>  r = http.getresponse()
>  if r.status != 200:
>  request_failed(r,
> "could not write zeroes offset %d size %d" %
> (offset, count))
>  
>  r.read()
>  
>  
>  def flush(h, flags):
> +if pool_error:
> +raise pool_error
> +
>  # Wait until all inflight requests are completed, and send a flush
>  # request for all imageio connections.
>  locked = []
>  
>  # Lock the pool by taking all connections out.
>  while len(locked) < pool.maxsize:
>  locked.append(pool.get())
>  
>  try:
>  for item in locked:
> @@ -348,26 +367,78 @@ def create_http_pool(url, options):
>  
>  pool = queue.Queue(count)
>  
>  for i in range(count):
>  http = create_http(url, unix_socket=unix_socket)
>  pool.put(PoolItem(http))

Re: [Libguestfs] [PATCH 3/5] output/rhv-upload-plugin: Extract send_flush() helper

2022-02-08 Thread Richard W.M. Jones
On Sat, Dec 18, 2021 at 10:36:31PM +0200, Nir Soffer wrote:
> Extract a helper for sending flush request for single connection, and
> inline the iter_http_pool() helper into flush(), its only user.
> ---
>  output/rhv-upload-plugin.py | 54 -
>  1 file changed, 23 insertions(+), 31 deletions(-)
> 
> diff --git a/output/rhv-upload-plugin.py b/output/rhv-upload-plugin.py
> index bad0e8a3..f7e5950f 100644
> --- a/output/rhv-upload-plugin.py
> +++ b/output/rhv-upload-plugin.py
> @@ -271,36 +271,51 @@ def emulate_zero(h, count, offset, flags):
>  r = http.getresponse()
>  if r.status != 200:
>  request_failed(r,
> "could not write zeroes offset %d size %d" %
> (offset, count))
>  
>  r.read()
>  
>  
>  def flush(h, flags):
> +# Wait until all inflight requests are completed, and send a flush
> +# request for all imageio connections.
> +locked = []
> +
> +# Lock the pool by taking all connections out.
> +while len(locked) < pool.maxsize:
> +locked.append(pool.get())
> +
> +try:
> +for http in locked:
> +send_flush(http)
> +finally:
> +# Unlock the pool by puting the connection back.
> +for http in locked:
> +pool.put(http)
> +
> +
> +def send_flush(http):
>  # Construct the JSON request for flushing.
>  buf = json.dumps({'op': "flush"}).encode()
>  
>  headers = {"Content-Type": "application/json",
> "Content-Length": str(len(buf))}
>  
> -# Wait until all inflight requests are completed, and send a flush
> -# request for all imageio connections.
> -for http in iter_http_pool(pool):
> -http.request("PATCH", url.path, body=buf, headers=headers)
> +http.request("PATCH", url.path, body=buf, headers=headers)
>  
> -r = http.getresponse()
> -if r.status != 200:
> -request_failed(r, "could not flush")
> +r = http.getresponse()
> +if r.status != 200:
> +request_failed(r, "could not flush")
>  
> -r.read()
> +r.read()
>  
>  
>  # Modify http.client.HTTPConnection to work over a Unix domain socket.
>  # Derived from uhttplib written by Erik van Zijst under an MIT license.
>  # (https://pypi.org/project/uhttplib/)
>  # Ported to Python 3 by Irit Goihman.
>  class UnixHTTPConnection(HTTPConnection):
>  def __init__(self, path, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
>  self.path = path
>  HTTPConnection.__init__(self, "localhost", timeout=timeout)
> @@ -337,43 +352,20 @@ def http_context(pool):
>  Context manager yielding an imageio http connection from the pool. Blocks
>  until a connection is available.
>  """
>  http = pool.get()
>  try:
>  yield http
>  finally:
>  pool.put(http)
>  
>  
> -def iter_http_pool(pool):
> -"""
> -Wait until all inflight requests are done, and iterate on imageio
> -connections.
> -
> -The pool is empty during iteration. New requests issued during iteration
> -will block until iteration is done.
> -"""
> -locked = []
> -
> -# Lock the pool by taking all connections out.
> -while len(locked) < pool.maxsize:
> -locked.append(pool.get())
> -
> -try:
> -for http in locked:
> -yield http
> -finally:
> -# Unlock the pool by puting the connection back.
> -for http in locked:
> -pool.put(http)
> -
> -
>  def close_http_pool(pool):
>  """
>  Wait until all inflight requests are done, close all connections and 
> remove
>  them from the pool.
>  
>  No request can be served by the pool after this call.
>  """
>  nbdkit.debug("closing http pool")
>  
>  locked = []
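
As an aside (not part of this patch): flush(), close_http_pool() and the
removed iter_http_pool() all share the same drain-the-queue idiom to lock the
pool.  If it ever grows another user, it could be factored into a context
manager; the locked_pool name below is hypothetical:

    from contextlib import contextmanager

    # Illustration only: take every connection out of the queue so no other
    # request can use the pool, yield them, and always put them back.
    @contextmanager
    def locked_pool(pool):
        locked = []
        while len(locked) < pool.maxsize:
            locked.append(pool.get())
        try:
            yield locked
        finally:
            # Unlock the pool by putting the connections back.
            for http in locked:
                pool.put(http)

With such a helper, flush() would reduce to iterating over the locked
connections and calling send_flush() on each.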

This one looks like a neutral refactoring, so ACK

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] Performance regression in -o rhv-upload questions

2022-02-08 Thread Richard W.M. Jones
On Tue, Feb 08, 2022 at 04:44:33PM +0200, Nir Soffer wrote:
> This happens because virt-v2v tries to finalize the transfer *before*
> closing the connections to the imageio server.
> 
> The current imageio release marks the ticket as canceled, but will not
> remove it if the ticket has open connections from clients. If the clients
> are idle, the connections close after 60 seconds, so when the engine tries
> again to remove the ticket, the operation succeeds.

I posted this patch:
https://listman.redhat.com/archives/libguestfs/2022-February/msg00113.html

It's possibly not quite right.  We now just kill nbdkit and
immediately go on to finalization, so there might still be a race
(since killing nbdkit only starts the process of nbdkit shutting down,
closing the Python plugin etc).  I don't know if this is a concern or
not, but the patch does avoid the 60s timeout in my limited testing.

> The upstream version improved this flow to wait only for ongoing
> requests. If there are no ongoing requests, the ticket is removed
> immediately, ignoring the open connections. The connections will be
> closed when the client closes the connection or when reading from
> the socket times out.

"Ongoing" here means that an HTTP request is being processed?

> You can test upstream version from here:
> https://github.com/oVirt/ovirt-imageio/actions/runs/1811058094
> 
> But if you want to be compatible with the released imageio version, you
> must close the connections to imageio *before* finalizing the transfers.
> 
> We already discussed this issue last year, when we handled the bug
> where vddk blocks for many minutes during block status and the imageio
> connection is dropped, and you opened a bug for this.
> 
> > It's not a completely like-for-like comparison because the rhv-upload
> > backend changed a lot between these versions.  In particular if I was
> > going to pick some suspicious change it would be my refactoring here
> > which was supposed to be neutral but maybe isn't:
> >
> >   
> > https://github.com/libguestfs/virt-v2v/commit/143a22860216b94d3a81706193088d50c03fc35c
> >
> > Unfortunately this commit is almost impossible to revert because of
> > deep restructuring of surrounding code.
> 
> The refactoring is not the issue; the issue is that we do not terminate
> the output nbdkit before finalizing.
>
> For some outputs we cannot do this, since we want to query nbdkit for
> more info (e.g. block status), but for rhv-upload we don't have anything
> to do with the nbdkit instance and we must close the connection before
> we finalize, so we should close the output right after we finish the copy.

It's not really desirable to kill nbdkit like I did in my patch.  We
do really want to open up virt-v2v to allow other processes to query
those sockets during the transfer.

An alternative to the patch I posted might be to have a way to request
that nbdkit disconnects its HTTP pool.

I will also note that HTTP is supposed to be stateless.  An open TCP
connection should not have meaning.

> > Another idea:
> >
> > Old virt-v2v uses qemu-img convert which does not flush by default.
> > New virt-v2v uses nbdcopy with the --flush option, so it will call
> > imageio PATCH /path ... "op":"flush" at the end.  However removing
> > nbdcopy --flush didn't help very much (only a couple of seconds off).
> 
> Calling flush is right, we cannot remove it, although there is a bug in
> nbdkit: it flushes during the close() flow even if you did not send a flush.
> 
> > Do you have any other ideas?
> >
> > What exactly does imageio do during the finalizing_success state?
> 
> The flow is this:
> - engine sends a request to vdsm to delete the ticket
> - vdsm connects to the imageio control socket and sends DELETE /tickets/{ticket-id}
> - imageio marks the ticket as canceled, so no new request can succeed
>   and no new connection can attach to the ticket
> - imageio marks all ongoing requests as canceled, so if a request is in
>   a read/write loop, it will abort on the next read/write completion
> - upstream: imageio waits until all ongoing operations complete
> - release: imageio waits until all connections are closed

This may answer my HTTP stateless point above.

> - if the ticket could not be removed in 60 seconds, imageio returns 409
>   Conflict
> - imageio returns 200 OK
> - engine retries the delete if it failed
> - when the engine succeeds and finishes the other finalization tasks, the
>   image transfer will be reported as finished.
> 
> I would expect the flow to time out on the virt-v2v side, since we use a
> 60 second timeout, and when removing a ticket times out, it takes 60
> seconds, so you should not succeed in finalizing in 61 seconds.  There is
> a bug in ovirt engine that causes this flow to succeed.
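
The whole flow above hinges on a single control-plane request from vdsm to
imageio's control socket.  A rough sketch of such a DELETE, reusing the
AF_UNIX HTTPConnection idea from rhv-upload-plugin.py; the socket path and
ticket UUID are placeholders, not values taken from these logs:

    import socket
    from http.client import HTTPConnection

    class UnixHTTPConnection(HTTPConnection):
        """HTTP over a Unix domain socket (same idea as in the plugin)."""
        def __init__(self, path, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
            self.path = path
            HTTPConnection.__init__(self, "localhost", timeout=timeout)

        def connect(self):
            self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            self.sock.connect(self.path)

    def delete_ticket(control_socket, ticket_id):
        # Ask imageio to cancel and remove the ticket, per the flow above.
        conn = UnixHTTPConnection(control_socket)
        try:
            conn.request("DELETE", "/tickets/%s" % ticket_id)
            r = conn.getresponse()
            r.read()
            # Released imageio answers 409 Conflict if the ticket still has
            # open client connections after 60 seconds; engine retries then.
            return r.status
        finally:
            conn.close()

    # e.g. delete_ticket("/run/ovirt-imageio/control.sock", ticket_uuid)  # placeholders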

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


[Libguestfs] [PATCH v2v] output: -o rhv-upload: Kill nbdkit instances before finalizing

2022-02-08 Thread Richard W.M. Jones
If nbdkit instance(s) are still running then they will hold open some
http connections to imageio.  In some versions of imageio, starting
finalization in this state causes a 60 second timeout.

See-also: 
https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html
Thanks: Nir Soffer
---
 output/output_rhv_upload.ml | 43 +++--
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/output/output_rhv_upload.ml b/output/output_rhv_upload.ml
index 4d8dc1c135..1fefed123c 100644
--- a/output/output_rhv_upload.ml
+++ b/output/output_rhv_upload.ml
@@ -280,12 +280,26 @@ e command line has to match the number of guest disk 
images (for this guest: %d)
 ignore (Python_script.run_command cancel_script json_params [])
   in
 
-  (* Set up an at-exit handler so we delete the orphan disks on failure. *)
+  (* Set up an at-exit handler to perform some cleanups.
+   * - Kill nbdkit PIDs (only before finalization).
+   * - Delete the orphan disks (only on conversion failure).
+   *)
+  let nbdkit_pids = ref [] in
   On_exit.f (
 fun () ->
+  (* Kill the nbdkit PIDs. *)
+  List.iter (
+fun pid ->
+  try
+kill pid Sys.sigterm
+  with exn -> debug "%s" (Printexc.to_string exn)
+  ) !nbdkit_pids;
+  nbdkit_pids := [];
+
   (* virt-v2v writes v2vdir/done on success only. *)
-  let success = Sys.file_exists (dir // "done") in
-  if not success then (
+  let on_failure = not (Sys.file_exists (dir // "done")) in
+
+  if on_failure then (
 if disk_uuids <> [] then
   cancel !transfer_ids disk_uuids
   )
@@ -351,11 +365,7 @@ e command line has to match the number of guest disk 
images (for this guest: %d)
   if is_ovirt_host then
 Nbdkit.add_arg cmd "is_ovirt_host" "true";
   let _, pid = Nbdkit.run_unix ~socket cmd in
-
-  (* --exit-with-parent should ensure nbdkit is cleaned
-   * up when we exit, but it's not supported everywhere.
-   *)
-  On_exit.kill pid;
+  List.push_front pid nbdkit_pids
   ) (List.combine disks disk_uuids);
 
   (* Stash some data we will need during finalization. *)
@@ -363,7 +373,7 @@ e command line has to match the number of guest disk images 
(for this guest: %d)
   let t = (disk_sizes : int64 list), disk_uuids, !transfer_ids,
   finalize_script, createvm_script, json_params,
   rhv_storagedomain_uuid, rhv_cluster_uuid,
-  rhv_cluster_cpu_architecture, rhv_cluster_name in
+  rhv_cluster_cpu_architecture, rhv_cluster_name, nbdkit_pids in
   t
 
 and rhv_upload_finalize dir source inspect target_meta
@@ -374,7 +384,8 @@ and rhv_upload_finalize dir source inspect target_meta
 (disk_sizes, disk_uuids, transfer_ids,
  finalize_script, createvm_script, json_params,
  rhv_storagedomain_uuid, rhv_cluster_uuid,
- rhv_cluster_cpu_architecture, rhv_cluster_name) =
+ rhv_cluster_cpu_architecture, rhv_cluster_name,
+ nbdkit_pids) =
   (* Check the cluster CPU arch matches what we derived about the
* guest during conversion.
*)
@@ -386,6 +397,16 @@ and rhv_upload_finalize dir source inspect target_meta
   rhv_cluster_name target_meta.guestcaps.gcaps_arch arch
   );
 
+  (* We must kill all our nbdkit instances before finalizing the
+   * transfer.  See:
+   * https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html
+   *
+   * We want to fail here if the kill fails because nbdkit
+   * died already, as that would be unexpected.
+   *)
+  List.iter (fun pid -> kill pid Sys.sigterm) !nbdkit_pids;
+  nbdkit_pids := []; (* Don't kill them again in the On_exit handler. *)
+
   (* Finalize all the transfers. *)
   let json_params =
 let ids = List.map (fun id -> JSON.String id) transfer_ids in
@@ -442,7 +463,7 @@ module RHVUpload = struct
   type t = int64 list * string list * string list *
Python_script.script * Python_script.script *
JSON.field list * string option * string option *
-   string option * string
+   string option * string * int list ref
 
   let to_string options =
 "-o rhv-upload" ^
-- 
2.32.0

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] Performance regression in -o rhv-upload questions

2022-02-08 Thread Richard W.M. Jones
On Tue, Feb 08, 2022 at 04:44:33PM +0200, Nir Soffer wrote:
> On Tue, Feb 8, 2022 at 3:27 PM Richard W.M. Jones  wrote:
> > virt-v2v 1.44.2:
> > Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.44.2-rhv-upload.log
> >
> >   [  63.1] Copying disk 1/1
> >   ...
> >   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finalizing_success
> >   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finished_success
> >   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a finalized in 6.105 seconds
> >   [  89.2] Creating output metadata
> >   [  89.8] Finishing off
> >
> > virt-v2v 1.45.97:
> > Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.45.97-rhv-upload.log
> >
> >   [  71.0] Copying disk 1/1
> >   [  82.7] Creating output metadata
> >   ...
> >   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finalizing_success
> >   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finished_success
> >   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc finalized in 61.552 seconds
> >   [ 144.9] Finishing off
>
> This happens because virt-v2v tries to finalize the transfer *before*
> closing the connections to the imageio server.
>
> The current imageio release marks the ticket as canceled, but will not
> remove it if the ticket has open connections from clients. If the
> clients are idle, the connections close after 60 seconds, so when the
> engine tries again to remove the ticket, the operation succeeds.

Ah ha!  I should be able to fix this easily enough ...  Let me try a
few things and then I'll come back to the rest of your message.

Rich.

> The upstream version improved this flow to wait only for ongoing requests.
> If there are no ongoing requests, the ticket is removed immediately,
> ignoring the open connections. The connections will be closed when the
> client closes the connection or when reading from the socket times out.
> 
> You can test upstream version from here:
> https://github.com/oVirt/ovirt-imageio/actions/runs/1811058094
> 
> But if you want to be compatible with the released imageio version, you
> must close the connections to imageio *before* finalizing the transfers.
> 
> We already discussed this issue last year, when we handled the bug
> where vddk blocks for many minutes during block status and the imageio
> connection is dropped, and you opened a bug for this.
> 
> > It's not a completely like-for-like comparison because the rhv-upload
> > backend changed a lot between these versions.  In particular if I was
> > going to pick some suspicious change it would be my refactoring here
> > which was supposed to be neutral but maybe isn't:
> >
> >   
> > https://github.com/libguestfs/virt-v2v/commit/143a22860216b94d3a81706193088d50c03fc35c
> >
> > Unfortunately this commit is almost impossible to revert because of
> > deep restructuring of surrounding code.
> 
> The refactoring is not the issue; the issue is that we do not terminate
> the output nbdkit before finalizing.
> 
> For some outputs we cannot do this, since we want to query nbdkit for
> more info (e.g. block status), but for rhv-upload we don't have anything
> to do with the nbdkit instance and we must close the connection before
> we finalize, so we should close the output right after we finish the copy.
> 
> > Another idea:
> >
> > Old virt-v2v uses qemu-img convert which does not flush by default.
> > New virt-v2v uses nbdcopy with the --flush option, so it will call
> > imageio PATCH /path ... "op":"flush" at the end.  However removing
> > nbdcopy --flush didn't help very much (only a couple of seconds off).
> 
> Calling flush is right, we cannot remove it, although there is a bug in
> nbdkit: it flushes during the close() flow even if you did not send a flush.
> 
> > Do you have any other ideas?
> >
> > What exactly does imageio do during the finalizing_success state?
> 
> The flow is this:
> - engine sends a request to vdsm to delete the ticket
> - vdsm connects to the imageio control socket and sends DELETE /tickets/{ticket-id}
> - imageio marks the ticket as canceled, so no new request can succeed
>   and no new connection can attach to the ticket
> - imageio marks all ongoing requests as canceled, so if a request is in
>   a read/write loop, it will abort on the next read/write completion
> - upstream: imageio waits until all ongoing operations complete
> - release: imageio waits until all connections are closed
> - if the ticket could not be removed in 60 seconds, imageio returns 409
>   Conflict
> - imageio returns 200 OK
> - engine retries the delete if it failed
> - when the engine succeeds and finishes the other finalization tasks, the
>   image transfer will be reported as finished.
> 
> I would expect the flow to time out on the virt-v2v side, since we use a
> 60 second timeout, and when removing a ticket times out, it takes 60
> seconds, so you should not succeed in finalizing in 61 seconds.  There is
> a bug in ovirt engine that causes this flow to succeed.
> 
> Nir

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and 

Re: [Libguestfs] Performance regression in -o rhv-upload questions

2022-02-08 Thread Nir Soffer
On Tue, Feb 8, 2022 at 3:27 PM Richard W.M. Jones  wrote:
>
> Hi Nir,
>
> https://bugzilla.redhat.com/show_bug.cgi?id=2039255#c4
>
> I'm looking for some advice/help with a performance regression in
> virt-v2v between 1.44.2 and the latest version.  It's very obvious and
> reproducible when I do a conversion from a local disk to local RHV
> using -o rhv-upload, specifically:
>
> $ time ./run virt-v2v -i disk /var/tmp/fedora-35.qcow2 -o rhv-upload -oc 
> https://ovirt4410.home.annexia.org/ovirt-engine/api -op /tmp/ovirt-passwd -oo 
> rhv-direct -os ovirt-data -on test5 -of raw
>
> The guest is an ordinary Fedora guest containing some junk data.
>
> Now that I've been able to reproduce the problem locally, it turns out
> to be not at all what I thought it was going to be.  The timings show
> that:
>
>  - new virt-v2v is much faster doing the copy, but
>
>  - new virt-v2v takes ages finalizing the transfer (about 10x longer
>than the old version)
>
> virt-v2v 1.44.2:
> Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.44.2-rhv-upload.log
>
>   [  63.1] Copying disk 1/1
>   ...
>   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finalizing_success
>   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finished_success
>   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a finalized in 6.105 seconds
>   [  89.2] Creating output metadata
>   [  89.8] Finishing off
>
> virt-v2v 1.45.97:
> Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.45.97-rhv-upload.log
>
>   [  71.0] Copying disk 1/1
>   [  82.7] Creating output metadata
>   ...
>   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finalizing_success
>   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finished_success
>   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc finalized in 61.552 seconds
>   [ 144.9] Finishing off

This happens because virt-v2v tries to finalize the transfer *before* closing
the connections to the imageio server.

The current imageio release marks the ticket as canceled, but will not remove
it if the ticket has open connections from clients. If the clients are idle,
the connections close after 60 seconds, so when the engine tries again to
remove the ticket, the operation succeeds.

The upstream version improved this flow to wait only for ongoing requests. If
there are no ongoing requests, the ticket is removed immediately, ignoring the
open connections. The connections will be closed when the client closes the
connection or when reading from the socket times out.

You can test upstream version from here:
https://github.com/oVirt/ovirt-imageio/actions/runs/1811058094

But if you want to be compatible with the released imageio version, you
must close the connections to imageio *before* finalizing the transfers.

We already discussed this issue last year, when we handled the bug
where vddk blocks for many minutes during block status and the imageio
connection is dropped, and you opened a bug for this.

> It's not a completely like-for-like comparison because the rhv-upload
> backend changed a lot between these versions.  In particular if I was
> going to pick some suspicious change it would be my refactoring here
> which was supposed to be neutral but maybe isn't:
>
>   
> https://github.com/libguestfs/virt-v2v/commit/143a22860216b94d3a81706193088d50c03fc35c
>
> Unfortunately this commit is almost impossible to revert because of
> deep restructuring of surrounding code.

The refactoring is not the issue; the issue is that we do not terminate
the output nbdkit before finalizing.

For some outputs we cannot do this, since we want to query nbdkit for
more info (e.g. block status), but for rhv-upload we don't have anything
to do with the nbdkit instance and we must close the connection before
we finalize, so we should close the output right after we finish the copy.

> Another idea:
>
> Old virt-v2v uses qemu-img convert which does not flush by default.
> New virt-v2v uses nbdcopy with the --flush option, so it will call
> imageio PATCH /path ... "op":"flush" at the end.  However removing
> nbdcopy --flush didn't help very much (only a couple of seconds off).

Calling flush is right, we cannot remove it, although there is a bug in nbdkit:
it flushes during the close() flow even if you did not send a flush.

> Do you have any other ideas?
>
> What exactly does imageio do during the finalizing_success state?

The flow is this:
- engine sends a request to vdsm to delete the ticket
- vdsm connects to the imageio control socket and sends DELETE /tickets/{ticket-id}
- imageio marks the ticket as canceled, so no new request can succeed
  and no new connection can attach to the ticket
- imageio marks all ongoing requests as canceled, so if a request is in
  a read/write loop, it will abort on the next read/write completion
- upstream: imageio waits until all ongoing operations complete
- release: imageio waits until all connections are closed
- if the ticket could not be removed in 60 seconds, imageio returns 409 Conflict
- imageio returns 200 OK
- engine retries the delete if it 

[Libguestfs] Performance regression in -o rhv-upload questions

2022-02-08 Thread Richard W.M. Jones
Hi Nir,

https://bugzilla.redhat.com/show_bug.cgi?id=2039255#c4

I'm looking for some advice/help with a performance regression in
virt-v2v between 1.44.2 and the latest version.  It's very obvious and
reproducible when I do a conversion from a local disk to local RHV
using -o rhv-upload, specifically:

$ time ./run virt-v2v -i disk /var/tmp/fedora-35.qcow2 -o rhv-upload -oc 
https://ovirt4410.home.annexia.org/ovirt-engine/api -op /tmp/ovirt-passwd -oo 
rhv-direct -os ovirt-data -on test5 -of raw

The guest is an ordinary Fedora guest containing some junk data.

Now that I've been able to reproduce the problem locally, it turns out
to be not at all what I thought it was going to be.  The timings show
that:

 - new virt-v2v is much faster doing the copy, but

 - new virt-v2v takes ages finalizing the transfer (about 10x longer
   than the old version)

virt-v2v 1.44.2:
Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.44.2-rhv-upload.log

  [  63.1] Copying disk 1/1
  ...
  transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finalizing_success
  transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finished_success
  transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a finalized in 6.105 seconds
  [  89.2] Creating output metadata
  [  89.8] Finishing off

virt-v2v 1.45.97:
Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.45.97-rhv-upload.log

  [  71.0] Copying disk 1/1
  [  82.7] Creating output metadata
  ...
  transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finalizing_success
  transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finished_success
  transfer 6ea3d724-16f9-4bda-a33e-69a783480abc finalized in 61.552 seconds
  [ 144.9] Finishing off

It's not a completely like-for-like comparison because the rhv-upload
backend changed a lot between these versions.  In particular if I was
going to pick some suspicious change it would be my refactoring here
which was supposed to be neutral but maybe isn't:

  
https://github.com/libguestfs/virt-v2v/commit/143a22860216b94d3a81706193088d50c03fc35c

Unfortunately this commit is almost impossible to revert because of
deep restructuring of surrounding code.

Both branches have your earlier fix to finalization, so it shouldn't
be this:

  
https://github.com/libguestfs/virt-v2v/commit/79702b28329d15a7485801ed7e915d486fcc0cf4

Another idea:

Old virt-v2v uses qemu-img convert which does not flush by default.
New virt-v2v uses nbdcopy with the --flush option, so it will call
imageio PATCH /path ... "op":"flush" at the end.  However removing
nbdcopy --flush didn't help very much (only a couple of seconds off).

Do you have any other ideas?

What exactly does imageio do during the finalizing_success state?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] [libnbd PATCH] copy: Fail nbdcopy if NBD read or write fails

2022-02-08 Thread Richard W.M. Jones
On Fri, Feb 04, 2022 at 09:18:07AM +0100, Laszlo Ersek wrote:
> (And there was no easy way to skip connect-tcp6.)

We use this in libnbd.spec:

https://src.fedoraproject.org/rpms/libnbd/blob/rawhide/f/libnbd.spec#_214

In libguestfs we have a whole infrastructure for skipping tests by
setting $SKIP_* variables in localenv.

https://libguestfs.org/guestfs-building.1.html#local-files
https://github.com/libguestfs/libguestfs/blob/0553f90462f456b1fe7100346aa1a4f12fccb288/tests/test-functions.sh#L39

For libnbd/nbdkit it might be possible to automate it in the same way
with tests/functions.sh.in + $0, but it's not implemented at the
moment.  Of course it'd be better if the tests didn't fail :-)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs