Re: [Libguestfs] [PATCH v2] v2v: rhv-upload-plugin: Defer imageio connection

2021-01-21, Eric Blake
On 1/21/21 6:16 AM, Nir Soffer wrote:
> On Thu, Jan 21, 2021 at 1:57 PM Nir Soffer  wrote:
> ...
> 
> How I tested:
> 
> I could not use the example python plugin with the added delay, since
> os.preadv() is not available in python 3.6 (added in 3.7).
> 
> I could not use the delay filter since it supports only integer delays.

You are correct that it currently doesn't take floating point delays,
but it does take an 'ms' suffix to sleep by milliseconds instead of seconds.

> With a Fedora 32 image with 231 extents, qemu-img calls block_status 960 times
> during the import, and this is too slow to test.
> 
> So I used this hack:
> 
> $ git diff
> diff --git a/plugins/file/file.c b/plugins/file/file.c
> index 1651079a..99a95af0 100644
> --- a/plugins/file/file.c
> +++ b/plugins/file/file.c
> @@ -785,6 +785,8 @@ do_extents (void *handle, uint32_t count, uint64_t offset,
>    const bool req_one = flags & NBDKIT_FLAG_REQ_ONE;
>    uint64_t end = offset + count;
> 
> +  usleep(300000);  /* 300 ms */

That looks like it could be achieved with '--filter=delay delay-extents=300ms'.


> I also tried to import using imageio client, using this pipeline:
> 
> imageio nbd client -> qemu-nbd -> overlay.qcow2 -> nbdkit
> 
> On the imageio side, we always try to get extents for 2g, and we don't use
> NBD_CMD_FLAG_REQ_ONE. But on the nbdkit side we see that all extents
> calls use req_one=1.
> 
> It looks like the issue is this code in the qemu NBD driver:
> 
> static int coroutine_fn nbd_client_co_block_status(
>         BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
>         int64_t *pnum, int64_t *map, BlockDriverState **file)
> {
> ...
>     NBDRequest request = {
>         .type = NBD_CMD_BLOCK_STATUS,
>         .from = offset,
>         .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
>                    MIN(bytes, s->info.size - offset)),
>         .flags = NBD_CMD_FLAG_REQ_ONE,
>     };

Yes, qemu is not yet optimized to remember more than just the current
extent.
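
(To illustrate the idea, not qemu's actual code: a client could cache every
extent it has already received and answer repeat lookups locally, skipping a
round trip. A rough Python sketch:)

class ExtentCache:
    """Toy cache of (offset, length, flags) extents already reported."""
    def __init__(self):
        self.extents = []          # sorted list of (offset, length, flags)

    def insert(self, offset, length, flags):
        self.extents.append((offset, length, flags))
        self.extents.sort()

    def lookup(self, offset):
        for start, length, flags in self.extents:
            if start <= offset < start + length:
                return (start + length - offset, flags)  # bytes left, flags
        return None                # cache miss: would need a server request

cache = ExtentCache()
cache.insert(0, 65536, 0)          # allocated data
cache.insert(65536, 1048576, 3)    # hole + zero
print(cache.lookup(70000))         # answered locally: (1044112, 3)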

> 
> Also qemu-img convert is getting extents twice:
> 
> 1. One call per extent before starting the copy:
> 
> nbdkit: file.0: debug: file: extents count=196608 offset=851968 req_one=1
> nbdkit: file.1: debug: file: extents count=196608 offset=1376256 req_one=1
> nbdkit: file.3: debug: file: extents count=393216 offset=1638400 req_one=1
> nbdkit: file.4: debug: file: extents count=458752 offset=2686976 req_one=1
> nbdkit: file.5: debug: file: extents count=983040 offset=3211264 req_one=1
> ...
> nbdkit: file.14: debug: file: extents count=524288 offset=6441402368 req_one=1
> 
> 2. One call per extent before each pread(), during the copy:

Yes, that's probably another thing that could be improved in qemu.

> In total, 960 calls per import.
> 
> A tiny delay in getting extents can slow down the entire import.

Known issue in qemu-img; that's partly why libnbd's nbdcopy is currently
faster.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH v2] v2v: rhv-upload-plugin: Defer imageio connection

2021-01-21, Nir Soffer
On Thu, Jan 21, 2021 at 2:08 PM Richard W.M. Jones  wrote:
>
> On Thu, Jan 21, 2021 at 01:56:55PM +0200, Nir Soffer wrote:
> > This happens only when not using a unix socket, for example when running
> > on a non-oVirt host or an oVirt host from another data center, or when
> > using -oo rhv_direct=false
> >
> > When using a unix socket, we close the initial HTTPSConnection and
> > create a new UnixHTTPConnection. This connection is not connected to
> > the server yet; when qemu-img tries to access the server, the connection
> > is established automatically.
> >
> > Fix the issue by closing the initial connection used to get server
> > options and initialize the handle, and storing a closed connection in
> > the handle.
>
> I think this description could be a bit clearer.  If I'm understanding
> it correctly, something like this would be better:
>
> "
>   This happens only when not using a unix socket, for example when running
>   on a non-oVirt host or an oVirt host from another data center, or when
>   using -oo rhv_direct=false
>
>   After this change, in the unix domain socket case:
>
>   - We close the socket.

The TCP socket.

>   - We call optimize_http which immediately reopens the unix socket.

No, we create a connection object, but it does not create the underlying
socket until send() is called:

HTTPConnection.__init__():
https://github.com/python/cpython/blob/34df10a9a16b38d54421eeeaf73ec89828563be7/Lib/http/client.py#L241

HTTPConnection.send():
https://github.com/python/cpython/blob/34df10a9a16b38d54421eeeaf73ec89828563be7/Lib/http/client.py#L972
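
A minimal standalone sketch of that behaviour (the throwaway local HTTP
server is only there to make the example runnable):

import http.client
import http.server
import threading

# Throwaway local server so the example is self-contained.
server = http.server.HTTPServer(("localhost", 0),
                                http.server.SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("localhost", server.server_port)
print(conn.sock)           # None - no socket has been created yet

conn.request("GET", "/")   # the first send() opens the socket
conn.getresponse().read()
print(conn.sock)           # now a connected socket object

conn.close()               # like http.close() in the plugin after get_options()
print(conn.sock)           # None again - nothing is left idle

conn.request("GET", "/")   # send() reconnects automatically (auto_open == 1)
conn.getresponse().read()
print(conn.sock)           # connected again, transparently

server.shutdown()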

>   In the non-unix domain socket case:
>
>   - We close the socket.
>
>   - We call optimize_http which does nothing.

Right

>   - Later, when qemu-img starts to read or write to nbdkit,
>     HTTPConnection transparently reconnects to the oVirt server.
>     Doing it this way avoids keeping the TCP connection idle after the
>     initial open(), preventing the bug described above.

Right. I'll try to make this more clear before pushing.

> "
>
> > --- a/v2v/rhv-upload-plugin.py
> > +++ b/v2v/rhv-upload-plugin.py
> > @@ -117,6 +117,15 @@ def open(readonly):
> >  destination_url = parse_transfer_url(transfer)
> >  http = create_http(destination_url)
> >  options = get_options(http, destination_url)
> > +
> > +# Close the initial connection to imageio server. When qemu-img will
> > +# try to access the server, HTTPConnection will reconnect
> > +# automatically. If we keep this connection idle and qemu-img is too
> > +# slow getting image extents, imageio server may close the connection,
> > +# and the import will fail on the first write.
> > +# See https://bugzilla.redhat.com/1916176.
> > +http.close()
> > +
> >  http = optimize_http(http, host, options)
> >  except:
> >  cancel_transfer(connection, transfer)
>
> Anyway, please feel free to push it now if you want.
>
> Thanks,
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> Fedora Windows cross-compiler. Compile Windows programs, test, and
> build Windows installers. Over 100 libraries supported.
> http://fedoraproject.org/wiki/MinGW
>




Re: [Libguestfs] [PATCH v2] v2v: rhv-upload-plugin: Defer imageio connection

2021-01-21, Nir Soffer
On Thu, Jan 21, 2021 at 1:57 PM Nir Soffer  wrote:
...

How I tested:

I could not use the example python plugin with the added delay, since
os.preadv() is not available in python 3.6 (added in 3.7).
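
(For reference, os.preadv() can be emulated on 3.6 on top of os.pread();
a rough sketch, which I did not actually use here:)

import os

def preadv_compat(fd, buffers, offset):
    """Rough stand-in for os.preadv() on Python 3.6, built on os.pread()."""
    total = 0
    for buf in buffers:            # buffers must be writable, e.g. bytearrays
        data = os.pread(fd, len(buf), offset + total)
        buf[:len(data)] = data
        total += len(data)
        if len(data) < len(buf):   # short read: stop filling further buffers
            break
    return total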

I could not use the delay filter since it supports only integer delays. With
a Fedora 32 image with 231 extents, qemu-img calls block_status 960 times
during the import, and this is too slow to test.

So I used this hack:

$ git diff
diff --git a/plugins/file/file.c b/plugins/file/file.c
index 1651079a..99a95af0 100644
--- a/plugins/file/file.c
+++ b/plugins/file/file.c
@@ -785,6 +785,8 @@ do_extents (void *handle, uint32_t count, uint64_t offset,
   const bool req_one = flags & NBDKIT_FLAG_REQ_ONE;
   uint64_t end = offset + count;

+  usleep(300000);  /* 300 ms */
+
   do {
 off_t pos;

Started nbdkit with the Fedora 32 image:

$ ./nbdkit -f -v ./plugins/file/.libs/nbdkit-file-plugin.so file=/var/tmp/fedora-32.raw

Added an overlay to make it work with virt-v2v -i disk:

$ qemu-img create -f qcow2 -b nbd://localhost -F raw /var/tmp/fedora-32.qcow2

With this, getting all extents takes more than 60 seconds, enough to get the
connection closed:

$ time qemu-img map --output json nbd://localhost >/dev/null

real 1m9.416s
user 0m0.007s
sys 0m0.017s

I imported the qcow2 image on an oVirt host, with -oo rhv_direct=true/false.

Looking at the imageio log, we see that the initial connection is closed,
and the second connection is made once qemu finishes getting extents for the
entire image.

I also tried to import using imageio client, using this pipeline:

imageio nbd client -> qemu-nbd -> overlay.qcow2 -> nbdkit

On the imageio side, we always try to get extents for 2g, and we don't use
NBD_CMD_FLAG_REQ_ONE. But on the nbdkit side we see that all extents
calls use req_one=1.

It looks like the issue is this code in the qemu NBD driver:

static int coroutine_fn nbd_client_co_block_status(
        BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
        int64_t *pnum, int64_t *map, BlockDriverState **file)
{
...
    NBDRequest request = {
        .type = NBD_CMD_BLOCK_STATUS,
        .from = offset,
        .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
                   MIN(bytes, s->info.size - offset)),
        .flags = NBD_CMD_FLAG_REQ_ONE,
    };

Also qemu-img convert is getting extents twice:

1. One call per extent before starting the copy:

nbdkit: file.0: debug: file: extents count=196608 offset=851968 req_one=1
nbdkit: file.1: debug: file: extents count=196608 offset=1376256 req_one=1
nbdkit: file.3: debug: file: extents count=393216 offset=1638400 req_one=1
nbdkit: file.4: debug: file: extents count=458752 offset=2686976 req_one=1
nbdkit: file.5: debug: file: extents count=983040 offset=3211264 req_one=1
...
nbdkit: file.14: debug: file: extents count=524288 offset=6441402368 req_one=1

2. One call per extent before each pread(), during the copy:

nbdkit: file.0: debug: file: extents count=196608 offset=851968 req_one=1
nbdkit: file.1: debug: file: extents count=196608 offset=1376256 req_one=1
nbdkit: file.3: debug: file: extents count=393216 offset=1638400 req_one=1
nbdkit: file.4: debug: file: extents count=458752 offset=2686976 req_one=1
nbdkit: file.5: debug: file: pread count=458752 offset=2686976
nbdkit: file.5: debug: file: extents count=983040 offset=3211264 req_one=1
nbdkit: file.6: debug: file: pread count=983040 offset=3211264
nbdkit: file.7: debug: file: extents count=2031616 offset=4259840 req_one=1
nbdkit: file.8: debug: file: pread count=2031616 offset=4259840
nbdkit: file.9: debug: file: extents count=11599872 offset=6356992 req_one=1
nbdkit: file.10: debug: file: pread count=2097152 offset=6356992
nbdkit: file.11: debug: file: pread count=2097152 offset=8454144
...
nbdkit: file.14: debug: file: extents count=393216 offset=5591269376 req_one=1
nbdkit: file.6: debug: file: pread count=393216 offset=5591269376
nbdkit: file.6: debug: file: extents count=313851904 offset=5591728128 req_one=1
nbdkit: file.4: debug: file: pread count=1048576 offset=5591728128
nbdkit: file.4: debug: file: extents count=312803328 offset=5592776704 req_one=1
nbdkit: file.8: debug: file: extents count=535232512 offset=5905580032 req_one=1
nbdkit: file.3: debug: file: extents count=524288 offset=6441402368 req_one=1
nbdkit: file.13: debug: client sent NBD_CMD_DISC, closing connection

In total, 960 calls per import.

A tiny delay in getting extents can slow down the entire import.
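
For scale, a quick back-of-the-envelope calculation (the smaller latencies
are only illustrative; 300 ms matches the hack above):

calls = 960                        # block_status calls observed per import
for latency_ms in (1, 10, 300):
    added = calls * latency_ms / 1000
    print("%3d ms per extents call -> %6.1f s added to the import"
          % (latency_ms, added))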

Nir




Re: [Libguestfs] [PATCH v2] v2v: rhv-upload-plugin: Defer imageio connection

2021-01-21, Richard W.M. Jones
On Thu, Jan 21, 2021 at 01:56:55PM +0200, Nir Soffer wrote:
> This happens only when not using a unix socket, for example when running
> on a non-oVirt host or an oVirt host from another data center, or when
> using -oo rhv_direct=false
> 
> When using a unix socket, we close the initial HTTPSConnection and
> create a new UnixHTTPConnection. This connection is not connected to
> the server yet; when qemu-img tries to access the server, the connection
> is established automatically.
> 
> Fix the issue by closing the initial connection used to get server
> options and initialize the handle, and storing a closed connection in
> the handle.

I think this description could be a bit clearer.  If I'm understanding
it correctly, something like this would be better:

"
  This happens only when not using a unix socket, for example when running
  on a non-oVirt host or an oVirt host from another data center, or when
  using -oo rhv_direct=false

  After this change, in the unix domain socket case:

  - We close the socket.

  - We call optimize_http which immediately reopens the unix socket.

  In the non-unix domain socket case:

  - We close the socket.

  - We call optimize_http which does nothing.

  - Later, when qemu-img starts to read or write to nbdkit,
    HTTPConnection transparently reconnects to the oVirt server.
    Doing it this way avoids keeping the TCP connection idle after the
    initial open(), preventing the bug described above.
"

> --- a/v2v/rhv-upload-plugin.py
> +++ b/v2v/rhv-upload-plugin.py
> @@ -117,6 +117,15 @@ def open(readonly):
>  destination_url = parse_transfer_url(transfer)
>  http = create_http(destination_url)
>  options = get_options(http, destination_url)
> +
> +# Close the initial connection to imageio server. When qemu-img will
> +# try to access the server, HTTPConnection will reconnect
> +# automatically. If we keep this connection idle and qemu-img is too
> +# slow getting image extents, imageio server may close the connection,
> +# and the import will fail on the first write.
> +# See https://bugzilla.redhat.com/1916176.
> +http.close()
> +
>  http = optimize_http(http, host, options)
>  except:
>  cancel_transfer(connection, transfer)

Anyway, please feel free to push it now if you want.

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
