Re: [Libguestfs] [libnbd PATCH] golang: Bump minimum Go version to 1.17

2023-08-17 Thread Nir Soffer
On Tue, Aug 15, 2023 at 9:53 PM Eric Blake  wrote:

> On Mon, Aug 14, 2023 at 01:43:37PM -0500, Eric Blake wrote:
> > > > +++ b/golang/configure/test.go
> > > > @@ -25,8 +25,19 @@
> > > >  import (
> > > > "fmt"
> > > > "runtime"
> > > > +   "unsafe"
> > > >  )
> > > >
> > > > +func check_slice(arr *uint32, cnt int) []uint32 {
> > > > +   /* We require unsafe.Slice(), introduced in 1.17 */
> > > > +   ret := make([]uint32, cnt)
> > > > +   s := unsafe.Slice(arr, cnt)
> > > > +   for i, item := range s {
> > > > +   ret[i] = uint32(item)
> > > > +   }
> > > > +   return ret
> > > > +}
> > > >
> > >
> > > I'm not sure what is the purpose of this test - requiring the Go version
> > > is good enough since the code will not compile with an older version.
> > > Even if it would, it will not compile without unsafe.Slice so no special
> > > check is needed.
>
> Turns out it does matter.  On our CI system, Ubuntu 20.04 has Go
> 1.13.8 installed, and without this feature test, it compiled just fine
> (it wasn't until later versions of Go that go.mod's version request
> began causing a compile failure if not satisfied).
>

How does it compile when unsafe.Slice is undefined?

Quick test with unrelated test app:

$ go build; echo $?
# cobra-test
./main.go:10:6: undefined: cmd.NoSuchMethod
1

Or do you mean the compile test for configure works and we want to make
the configure test fail to compile?

https://gitlab.com/nbdkit/libnbd/-/jobs/4870816575
>
> But while investigating that, I also noticed that libvirt-ci just
> recently dropped all Debian 10 support in favor of Debian 12, so I'm
> working on updating ci/manifest.yml to match.
>

Sounds like a good idea
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] [libnbd PATCH] golang: Bump minimum Go version to 1.17

2023-08-13 Thread Nir Soffer
On Sat, Aug 12, 2023 at 12:18 AM Eric Blake  wrote:

> Go 1.17 or newer is required to use unsafe.Slice(), which in turn
> allows us to write a simpler conversion from a C array to a Go object
> during callbacks.
>
> To check if this makes sense, look at
> https://repology.org/project/go/versions compared to our list in
> ci/manifest.yml, at the time I made this commit:
>
> Alpine 3.15: 1.17.10
> AlmaLinux 8: 1.19.10
> CentOS Stream 8: 1.20.4
> Debian 10: 1.11.6
> Debian 11: 1.15.15 (mainline), 1.19.8 (backports)
> Debian 12: 1.19.8
> Fedora 36: 1.19.8
> FreeBSD Ports: 1.20.7
> OpenSUSE Leap 15.3: 1.16.3
> OpenSUSE Leap 15.4: 1.18.1
> Ubuntu 18.04: 1.18.1
>
> We previously required a minimum of 1.13 for module support, which
> means Debian 10 was already not supporting Go bindings.  OpenSUSE Leap
> 15.3 loses support, but is relatively old these days.  All other
> systems appear unaffected by this bump in requirements, at least if
> they can be configured to use developer backports.
>
> Suggested-by: Nir Soffer 
> Signed-off-by: Eric Blake 
> ---
>
> This replaces
> https://listman.redhat.com/archives/libguestfs/2023-August/032227.html
>
>  generator/GoLang.ml  |  8 
>  README.md|  2 +-
>  golang/configure/go.mod  |  4 ++--
>  golang/configure/test.go | 11 +++
>  golang/go.mod|  4 ++--
>  5 files changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/generator/GoLang.ml b/generator/GoLang.ml
> index 73df5254..55ff1b8a 100644
> --- a/generator/GoLang.ml
> +++ b/generator/GoLang.ml
> @@ -517,10 +517,10 @@ let
>
>  func copy_uint32_array(entries *C.uint32_t, count C.size_t) []uint32 {
>  ret := make([]uint32, int(count))
> -// See https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices
> -// TODO: Use unsafe.Slice() when we require Go 1.17.
> -s := (*[1 << 30]uint32)(unsafe.Pointer(entries))[:count:count]
>

We have another instance of this pattern in AioBuffer.Slice


> -copy(ret, s)
> +s := unsafe.Slice(entries, count)
> +for i, item := range s {
> +ret[i] = uint32(item)
> +}
>  return ret
>  }
>  ";
> diff --git a/README.md b/README.md
> index c7166613..8524038e 100644
> --- a/README.md
> +++ b/README.md
> @@ -105,7 +105,7 @@ ## Building from source
>  * Python >= 3.3 to build the Python 3 bindings and NBD shell (nbdsh).
>  * FUSE 3 to build the nbdfuse program.
>  * Linux >= 6.0 and ublksrv library to build nbdublk program.
> -* go and cgo, for compiling the golang bindings and tests.
> +* go and cgo >= 1.17, for compiling the golang bindings and tests.
>  * bash-completion >= 1.99 for tab completion.
>
>  Optional, only needed to run the test suite:
> diff --git a/golang/configure/go.mod b/golang/configure/go.mod
> index ce3e4f39..fcdb28db 100644
> --- a/golang/configure/go.mod
> +++ b/golang/configure/go.mod
> @@ -1,4 +1,4 @@
>  module libguestfs.org/configure
>
> -// First version of golang with working module support.
> -go 1.13
> +// First version of golang with working module support and unsafe.Slice.
>

"First version of golang with working module support" is not relevant for
1.17, maybe "For unsafe.Slice"?


> +go 1.17
> diff --git a/golang/configure/test.go b/golang/configure/test.go
> index fe742f2b..a15c9ea3 100644
> --- a/golang/configure/test.go
> +++ b/golang/configure/test.go
> @@ -25,8 +25,19 @@
>  import (
> "fmt"
> "runtime"
> +   "unsafe"
>  )
>
> +func check_slice(arr *uint32, cnt int) []uint32 {
> +   /* We require unsafe.Slice(), introduced in 1.17 */
> +   ret := make([]uint32, cnt)
> +   s := unsafe.Slice(arr, cnt)
> +   for i, item := range s {
> +   ret[i] = uint32(item)
> +   }
> +   return ret
> +}
>

I'm not sure what is the purpose of this test - requiring the Go version is
good enough since the code will not compile with an older version. Even if
it would, it will not compile without unsafe.Slice so no special check is
needed.


> +
>  func main() {
> fmt.Println(runtime.Version())
>
> diff --git a/golang/go.mod b/golang/go.mod
> index fc772840..1b72e77d 100644
> --- a/golang/go.mod
> +++ b/golang/go.mod
> @@ -1,4 +1,4 @@
>  module libguestfs.org/libnbd
>
> -// First version of golang with working module support.
> -go 1.13
> +// First version of golang with working module support and unsafe.Slice.
> +go 1.17
> --
> 2.41.0
>
>


Re: [Libguestfs] [libnbd PATCH v4 05/25] golang: Change logic of copy_uint32_array

2023-08-08 Thread Nir Soffer
On Thu, Aug 3, 2023 at 4:57 AM Eric Blake  wrote:
>
> Commit 6725fa0e12 changed copy_uint32_array() to utilize a Go hack for
> accessing a C array as a Go slice in order to potentially benefit from
> any optimizations in Go's copy() for bulk transfer of memory over
> naive one-at-a-time iteration.  But that commit also acknowledged that
> no benchmark timings were performed, which would have been useful to
> demonstrate an actual benefit for using the hack in the first place.  And

Why do you call this a hack? This is the documented way to create a Go
slice from memory.

> since we are copying data anyways (rather than using the slice to
> avoid a copy), and network transmission costs have a higher impact to
> performance than in-memory copying speed, it's hard to justify keeping
> the hack without hard data.

Since this is not a hack we don't need to justify it :-)

> What's more, while using Go's copy() on an array of C uint32_t makes
> sense for 32-bit extents, our corresponding 64-bit code uses a struct
> which does not map as nicely to Go's copy().

If we return a slice of the C extent type, copy() can work, but it is probably
not what we want to return.

> Using a common style
> between both list copying helpers is beneficial to maintenance.
>
> Additionally, at face value, converting C.size_t to int may truncate;
> we could avoid that risk if we were to uniformly use uint64 instead of
> int.  But we can equally just panic if the count is oversized: our
> state machine guarantees that the server's response fits within 64M
> bytes (count will be smaller than that, since it is multiple bytes per
> extent entry).

Good to check this, but not related to changing the way we copy the array.

> Suggested-by: Laszlo Ersek 
> CC: Nir Soffer 
> Signed-off-by: Eric Blake 
> ---
>
> v4: new patch to the series, but previously posted as part of the
> golang cleanups.  Since then: rework the commit message as it is no
> longer a true revert, and add a panic() if count exceeds expected
> bounds.
> ---
>  generator/GoLang.ml | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/generator/GoLang.ml b/generator/GoLang.ml
> index 73df5254..cc7d78b6 100644
> --- a/generator/GoLang.ml
> +++ b/generator/GoLang.ml
> @@ -516,11 +516,16 @@ let
>  /* Closures. */
>
>  func copy_uint32_array(entries *C.uint32_t, count C.size_t) []uint32 {
> +if (uint64(count) > 64*1024*1024) {
> +panic(\"violation of state machine guarantee\")

This is unwanted in a library, it means the entire application will crash
because of a bug in the library. Can we convert this to an error in the caller?

> +}
>  ret := make([]uint32, int(count))
> -// See 
> https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices
> -// TODO: Use unsafe.Slice() when we require Go 1.17.
> -s := (*[1 << 30]uint32)(unsafe.Pointer(entries))[:count:count]

Can we require Go 1.17? (current version is 1.20)

In Go >= 1.17, we can use something like:

s := unsafe.Slice(entries, count)

> -copy(ret, s)
> +addr := uintptr(unsafe.Pointer(entries))
> +for i := 0; i < int(count); i++ {
> +ptr := (*C.uint32_t)(unsafe.Pointer(addr))
> +ret[i] = uint32(*ptr)
> +addr += unsafe.Sizeof(*ptr)
> +}

This loop is worse than the ugly line creating a slice.
With a slice we can do:

for i, item := range s {
ret[i] = uint32(item)
}

(I did not try to compile this)



Re: [Libguestfs] Libnbd asynchronous API with epoll

2023-07-09 Thread Nir Soffer
On Fri, Jul 7, 2023 at 11:59 AM Tage Johansson 
wrote:

> On 7/6/2023 7:06 PM, Nir Soffer wrote:
>
> - After calling for example aio_notify_read(3), can I know that the next
> reading from the file descriptor would block?
>
> No, you have to call aio_get_direction() again and poll again until the
> event happens.
>
> Well, what I mean is:
>
> After calling aio_notify_read, if aio_get_direction returns
> AIO_DIRECTION_READ or AIO_DIRECTION_BOTH, can I know that the reading on
> the file descriptor actually blocked?
>

Yes - it never blocks.


> Or might there be cases when aio_notify_read returns and the next
> direction includes a read and there is still more data to read on the file
> descriptor?
>

Sure, it is expected that the socket is readable but more data will be
available later...


> I guess this is the case, but I must know or the client may hang
> unexpectedly.
>

Libnbd uses a non-blocking socket so it will never hang.


Re: [Libguestfs] Libnbd asynchronous API with epoll

2023-07-06 Thread Nir Soffer
On Wed, Jul 5, 2023 at 3:38 PM Tage Johansson 
wrote:

> As part of the Rust bindings for Libnbd, I try to integrate the
> asynchronous (aio_*) functions with Tokio
> , the most used asynchronous runtime
> in Rust. However, in its eventloop, Tokio uses epoll(7) instead of poll(2)
> (which is used internally in Libnbd). The difference is that poll(2) uses
> level-triggered notifications as opposed to epoll(7) which uses
> edge-triggered notifications.
>

The epoll(7) section "Level-triggered and edge-triggered" says:

   By contrast, when used as a level-triggered interface (the default,
   when EPOLLET is not specified), epoll is simply a faster poll(2), and
   can be used wherever the latter is used since it shares the same
   semantics.

So you should not have any issue using epoll instead of poll.
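This is easy to check empirically. A minimal sketch in Python, whose
select.epoll is a thin wrapper over the same syscalls, shows that a
level-triggered epoll keeps reporting a readable fd until it is drained,
exactly like poll(2):

```python
import os
import select

# Level-triggered epoll (the default, without EPOLLET) behaves like
# poll(2): a readable fd is reported on every call until it is drained.
r, w = os.pipe()
ep = select.epoll()
ep.register(r, select.EPOLLIN)
os.write(w, b"x")

first = ep.poll(timeout=0)   # fd reported readable
second = ep.poll(timeout=0)  # reported again -- nothing was read yet
os.read(r, 1)                # drain the pending byte
third = ep.poll(timeout=0)   # nothing left to report

ep.close()
os.close(r)
os.close(w)
```

With EPOLLET set instead, the second call would return nothing until new
data arrived - the edge-triggered behavior Tokio's event loop relies on.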

> - After calling aio_get_direction(3), can I know that reading/writing
> would actually block?
>
No, these are just the events that libnbd wants to get.

- After calling for example aio_notify_read(3), can I know that the next
> reading from the file descriptor would block?
>
No, you have to call aio_get_direction() again and poll again until the
event happens.
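
The resulting loop is: ask the handle which direction it wants, poll for
exactly those events, notify, and re-query. A schematic Python sketch with
a stand-in handle object (the real libnbd binding exposes the same
aio_get_direction / aio_notify_read / aio_notify_write calls on an
nbd.NBD handle; the constants and fake handle here are illustrative):

```python
import select
import socket

# Direction bits, matching libnbd's AIO_DIRECTION_* values.
DIR_READ, DIR_WRITE = 1, 2

def run_aio_loop(handle, fd):
    """Poll only for the events the handle asks for, and re-query the
    direction after every notification, since it may change."""
    poller = select.poll()
    while True:
        direction = handle.aio_get_direction()
        if direction == 0:  # stand-in for "nothing left to do"
            return
        mask = 0
        if direction & DIR_READ:
            mask |= select.POLLIN
        if direction & DIR_WRITE:
            mask |= select.POLLOUT
        poller.register(fd, mask)
        for _, events in poller.poll():
            # The socket is non-blocking, so notify_* never blocks even
            # if less data than a full reply is currently available.
            if events & select.POLLIN:
                handle.aio_notify_read()
            if events & select.POLLOUT:
                handle.aio_notify_write()
        poller.unregister(fd)

class FakeHandle:
    """Stand-in handle: wants one read, then one write, then is done."""
    def __init__(self):
        self.directions = [DIR_READ, DIR_WRITE, 0]
        self.notified = []
    def aio_get_direction(self):
        return self.directions[0]
    def aio_notify_read(self):
        self.notified.append("read")
        self.directions.pop(0)
    def aio_notify_write(self):
        self.notified.append("write")
        self.directions.pop(0)

a, b = socket.socketpair()
a.send(b"x")  # make b readable; a connected socket is always writable
h = FakeHandle()
run_aio_loop(h, b.fileno())
a.close()
b.close()
```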

Nir


[Libguestfs] [PATCH libnbd v2] README: Document additional packages

2023-04-17 Thread Nir Soffer
When building from git we need autoconf, automake and libtool.

Signed-off-by: Nir Soffer 
---

Changes since v1:
- Remove `,` between package names (Laszlo)

 README.md | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index c7166613..7eed0e31 100644
--- a/README.md
+++ b/README.md
@@ -32,10 +32,17 @@ ## License
 very liberal license.
 
 
 ## Building from source
 
+Building from source requires additional packages. On an rpm-based system
+use:
+
+```
+dnf install autoconf automake libtool
+```
+
 To build from git:
 
 ```
 autoreconf -i
 ./configure
-- 
2.39.2




Re: [Libguestfs] [PATCH libnbd] README: Document additional packages

2023-04-17 Thread Nir Soffer
On Mon, Apr 17, 2023 at 7:38 PM Laszlo Ersek  wrote:
>
> On 4/17/23 18:36, Nir Soffer wrote:
> > When building from git we need autoconf, automake and libtool.
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  README.md | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/README.md b/README.md
> > index c7166613..42a187c0 100644
> > --- a/README.md
> > +++ b/README.md
> > @@ -32,10 +32,17 @@ ## License
> >  very liberal license.
> >
> >
> >  ## Building from source
> >
> > +Building from source requires additional packages. On an rpm-based system
> > +use:
> > +
> > +```
> > +dnf install autoconf, automake, libtool
> > +```
>
> Are the comma characters intentional?

Copied from the spec, fixing.



[Libguestfs] [PATCH libnbd] README: Document additional packages

2023-04-17 Thread Nir Soffer
When building from git we need autoconf, automake and libtool.

Signed-off-by: Nir Soffer 
---
 README.md | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index c7166613..42a187c0 100644
--- a/README.md
+++ b/README.md
@@ -32,10 +32,17 @@ ## License
 very liberal license.
 
 
 ## Building from source
 
+Building from source requires additional packages. On an rpm-based system
+use:
+
+```
+dnf install autoconf, automake, libtool
+```
+
 To build from git:
 
 ```
 autoreconf -i
 ./configure
-- 
2.39.2




Re: [Libguestfs] [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length

2023-03-06 Thread Nir Soffer
On Sun, Mar 5, 2023 at 10:42 AM Wouter Verhelst  wrote:
>
> On Fri, Mar 03, 2023 at 04:17:40PM -0600, Eric Blake wrote:
> > On Fri, Dec 16, 2022 at 10:32:01PM +0300, Vladimir Sementsov-Ogievskiy 
> > wrote:
> > > s-o-b line missed.
> >
> > I'm not sure if the NBD project has a strict policy on including one,
> > but I don't mind adding it.
>
> I've never required it, mostly because it's something that I myself
> always forget, too, so, *shrug*.
>
> (if there were a way in git to make it add that automatically, that
> would help; I've looked but haven't found it)

What I'm using in all projects that require signed-off-by is:

$ cat .git/hooks/commit-msg
#!/bin/sh

# Add Signed-off-by trailer.
sob=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*>\).*$/Signed-off-by: \1/p')
git interpret-trailers --in-place --trailer "$sob" "$1"

You can also use a pre-commit hook but the commit-msg hook is more
convenient.

And in github you can add the DCO application to the project:
https://github.com/apps/dco

Once installed it will check that all commits are signed off, and
provide helpful error messages to contributors.

Nir



Re: [Libguestfs] [PATCH] docs: Prefer 'cookie' over 'handle'

2023-03-04 Thread Nir Soffer
On Sat, Mar 4, 2023 at 12:15 AM Eric Blake  wrote:
>
> In libnbd, we quickly learned that distinguishing between 'handle'
> (verb for acting on an object) and 'handle' (noun describing which
> object to act on) could get confusing; we solved it by renaming the
> latter to 'cookie'.  Copy that approach into the NBD spec, and make it
> obvious that a cookie is opaque data from the point of view of the
> server.

Good change, will make it easier to search code.

But the actual text does not make it clear that a cookie is opaque data from
the point of view of the client. Maybe make this more clear?

> Makes no difference to implementations (other than older code
> still using 'handle' may be slightly harder to tie back to the spec).

To avoid confusion with older code that carefully used "handle" to match
the spec, maybe add a note that "cookie" was named "handle" before?

Nir



Re: [Libguestfs] Checksums and other verification

2023-03-02 Thread Nir Soffer
On Thu, Mar 2, 2023 at 10:46 AM Richard W.M. Jones  wrote:
>
> On Mon, Feb 27, 2023 at 07:09:33PM +0200, Nir Soffer wrote:
> > On Mon, Feb 27, 2023 at 6:41 PM Richard W.M. Jones  
> > wrote:
> > > I think it would be more useful if (or in addition) it could compute
> > > the checksum of a stream which is being converted with 'qemu-img
> > > convert'.  Extra points if it can compute the checksum over either the
> > > input or output stream.
> >
> > I thought about this, it could be a filter that you add in the graph
> > that gives you checksum as a side effect of copying. But this requires
> > disabling unordered writes, which is pretty bad for performance.
> >
> > But even if you compute the checksum during a transfer, you want to
> > verify it by reading the transferred data from storage. Once you computed
> > the checksum you can keep it for verifying the same image in the future.
>
> The use-case I have in mind is being able to verify a download when
> you already know the checksum and are copying / converting the image
> in flight.
>
> eg: You are asked to download https://example.com/distro-cloud.qcow2
> with some published checksum and you will on the fly download and
> convert this to raw, but want to verify the checksum (of the qcow2)
> during the conversion step.  (Or at some point, but during the convert
> avoids having to spool the image locally.)

I'm thinking about the same flow. I think the best way to verify is:

1. The remote server publishes a block-checksum of the image
2. The system gets the block-checksum from the server (from http header?)
3. The system pulls data from the server, pushes to the target disk in
the wanted format
4. The system computes a checksum of the target disk

This way you verify the entire pipeline including the storage. If we
compute a checksum during the conversion, we verify only that we got
the correct data from the server.

If we care only about verifying the transfer from the server, we can
compute the checksum during the download, which is likely to be
sequential (so easy to integrate with blkhash).
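
A sequential download checksum needs no buffering; an incremental hash is
enough. A minimal Python sketch (blkhash exposes a similar update-based
API, but plain sha256 illustrates the idea; the chunk contents are
invented):

```python
import hashlib

def hash_stream(chunks, algorithm="sha256"):
    """Incrementally hash a stream of chunks as they are downloaded,
    without ever holding the whole image in memory."""
    h = hashlib.new(algorithm)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

# Hashing chunk by chunk matches hashing the joined bytes.
digest = hash_stream([b"part1-", b"part2-", b"part3"])
```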

If we want to validate nbdcopy, it will be much harder to compute a checksum
inside nbdcopy because it does not stream the data in order.

Nir



Re: [Libguestfs] Checksums and other verification

2023-02-28 Thread Nir Soffer
On Tue, Feb 28, 2023 at 4:13 PM Laszlo Ersek  wrote:
>
> On 2/28/23 12:39, Richard W.M. Jones wrote:
> > On Tue, Feb 28, 2023 at 12:24:04PM +0100, Laszlo Ersek wrote:
> >> On 2/27/23 17:44, Richard W.M. Jones wrote:
> >>> On Mon, Feb 27, 2023 at 08:42:23AM -0600, Eric Blake wrote:
>  Or intentionally choose a hash that can be computed out-of-order,
>  such as a Merkle Tree.  But we'd need a standard setup for all
>  parties to agree on how the hash is to be computed and checked, if
>  it is going to be anything more than just a linear hash of the
>  entire guest-visible contents.
> >>>
> >>> Unfortunately I suspect that by far the easiest way for people who
> >>> host images to compute checksums is to run 'shaXXXsum' on them or
> >>> sign them with a GPG signature, rather than engaging in a novel hash
> >>> function.  Indeed that's what is happening now:
> >>>
> >>> https://alt.fedoraproject.org/en/verify.html
> >>
> >> If the output is produced with unordered writes, but the complete
> >> output needs to be verified with a hash *chain*, that still allows
> >> for some level of asynchrony. The start of the hashing need not be
> >> delayed until after the end of output, only after the start of
> >> output.
> >>
> >> For example, nbdcopy could maintain the highest offset up to which
> >> the output is contiguous, and on a separate thread, it could be
> >> hashing the output up to that offset.
> >>
> >> Considering a gigantic output, as yet unassembled blocks could likely
> >> not be buffered in memory (that's why the writes are unordered in the
> >> first place!), so the hashing thread would have to re-read the output
> >> via NBD. Whether that would cause performance to improve or to
> >> deteriorate is undecided IMO. If the far end of the output network
> >> block device can accommodate a reader that is independent of the
> >> writers, then this level of overlap is beneficial. Otherwise, this
> >> extra reader thread would just add more thrashing, and we'd be better
> >> off with a separate read-through once writing is complete.
> >
> > In my mind I'm wondering if there's any mathematical result that lets
> > you combine each hash(block_i) into the final hash(block[1..N])
> > without needing to compute the hash of each block in order.
>
> I've now checked:
>
> https://en.wikipedia.org/wiki/SHA-2
> https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction
> https://en.wikipedia.org/wiki/One-way_compression_function#Construction_from_block_ciphers
> https://en.wikipedia.org/wiki/One-way_compression_function#Davies%E2%80%93Meyer
>
> Consider the following order of steps:
>
> - precompute hash(block[n]), with some initial IV
> - throw away block[n]
> - wait until block[n-1] is processed, providing the actual IV for
>   hashing block[n]
> - mix the new IV into hash(block[n]) without having access to block[n]
>
> If such a method existed, it would break the security (i.e., the
> original design) of the hash, IMO, as it would separate the IV from
> block[n]. In a way, it would make the "mix" and "concat" operators (of
> the underlying block cipher's chaining method) distributive. I believe
> then you could generate a bunch of *valid* hash(block[n]) values as a
> mere function of the IV, without having access to block[n]. You could
> perhaps use that for probing against other hash(block[m]) values, and
> maybe determine repeating patterns in the plaintext. I'm not a
> cryptographer so I can't exactly show what security property is broken
> by separating the IV from block[n].
>
> > (This is what blkhash solves, but unfortunately the output isn't
> > compatible with standard hashes.)
>
> Assuming blkhash is a Merkle Tree implementation, blkhash solves a
> different problem IMO.

blkhash uses a flat Merkle tree, described here:
https://www.researchgate.net/publication/323243320_Foundations_of_Applied_Cryptography_and_Cybersecurity
section 3.9.4, 2lMT: the Flat Merkle Tree construction

To support parallel hashing it uses a more complex construction, but this
can be simplified to a single flat Merkle tree.
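
A minimal sketch of the flat construction (block size and algorithm here
are illustrative, not blkhash's actual parameters): hash each fixed-size
block independently, then hash the concatenation of the block digests.
Block digests are independent, so they can be computed in any order or in
parallel; only the short final combine is sequential.

```python
import hashlib

def flat_merkle(data, block_size=64 * 1024, algorithm="sha256"):
    """Flat (one-level) Merkle tree over fixed-size blocks: the root
    hashes the concatenated per-block digests, not the data itself."""
    root = hashlib.new(algorithm)
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        root.update(hashlib.new(algorithm, block).digest())
    return root.hexdigest()
```

This also shows why the result cannot match a plain shaXXXsum of the same
image: the root is computed over digests, not over the guest-visible data.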

Nir



Re: [Libguestfs] Checksums and other verification

2023-02-27 Thread Nir Soffer
On Mon, Feb 27, 2023 at 6:41 PM Richard W.M. Jones  wrote:
>
> On Mon, Feb 27, 2023 at 04:24:33PM +0200, Nir Soffer wrote:
> > On Mon, Feb 27, 2023 at 3:56 PM Richard W.M. Jones  
> > wrote:
> > >
> > >
> > > https://github.com/kubevirt/containerized-data-importer/issues/1520
> > >
> > > Hi Eric,
> > >
> > > We had a question from the Kubevirt team related to the above issue.
> > > The question is roughly if it's possible to calculate the checksum of
> > > an image as an nbdkit filter and/or in the qemu block layer.
> > >
> > > Supplemental #1: could qemu-img convert calculate a checksum as it goes
> > > along?
> > >
> > > Supplemental #2: could we detect various sorts of common errors, such
> > > a webserver that is incorrectly configured and serves up an error page
> > > containing ""; or something which is supposed to be a disk image
> > > but does not "look like" (in some ill-defined sense) a disk image,
> > > eg. it has no partition table.
> > >
> > > I'm not sure if qemu has any existing features covering the above (and
> > > I know for sure that nbdkit doesn't).
> > >
> > > One issue is that calculating a checksum involves a linear scan of the
> > > image, although we can at least skip holes.
> >
> > Kubevirt can use blksum
> > https://fosdem.org/2023/schedule/event/vai_blkhash_fast_disk/
> >
> > But we need to package it for Fedora/CentOS Stream.
> >
> > I also work on "qemu-img checksum", getting more reviews on this can help:
> > Latest version:
> > https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00971.html
> > The last reviews are here:
> > https://lists.nongnu.org/archive/html/qemu-block/2022-12/
> >
> > More work is needed on the testing framework changes.
>
> I think it would be more useful if (or in addition) it could compute
> the checksum of a stream which is being converted with 'qemu-img
> convert'.  Extra points if it can compute the checksum over either the
> input or output stream.

I thought about this, it could be a filter that you add in the graph
that gives you checksum as a side effect of copying. But this requires
disabling unordered writes, which is pretty bad for performance.

But even if you compute the checksum during a transfer, you want to
verify it by reading the transferred data from storage. Once you computed
the checksum you can keep it for verifying the same image in the future.

Nir



Re: [Libguestfs] Checksums and other verification

2023-02-27 Thread Nir Soffer
On Mon, Feb 27, 2023 at 3:56 PM Richard W.M. Jones  wrote:
>
>
> https://github.com/kubevirt/containerized-data-importer/issues/1520
>
> Hi Eric,
>
> We had a question from the Kubevirt team related to the above issue.
> The question is roughly if it's possible to calculate the checksum of
> an image as an nbdkit filter and/or in the qemu block layer.
>
> Supplemental #1: could qemu-img convert calculate a checksum as it goes
> along?
>
> Supplemental #2: could we detect various sorts of common errors, such
> a webserver that is incorrectly configured and serves up an error page
> containing ""; or something which is supposed to be a disk image
> but does not "look like" (in some ill-defined sense) a disk image,
> eg. it has no partition table.
>
> I'm not sure if qemu has any existing features covering the above (and
> I know for sure that nbdkit doesn't).
>
> One issue is that calculating a checksum involves a linear scan of the
> image, although we can at least skip holes.

Kubevirt can use blksum
https://fosdem.org/2023/schedule/event/vai_blkhash_fast_disk/

But we need to package it for Fedora/CentOS Stream.

I also work on "qemu-img checksum", getting more reviews on this can help:
Latest version:
https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00971.html
The last reviews are here:
https://lists.nongnu.org/archive/html/qemu-block/2022-12/

More work is needed on the testing framework changes.

Nir



Re: [Libguestfs] [PATCH 1/2] python: Avoid crash if callback parameters cannot be built

2023-02-20 Thread Nir Soffer
On Mon, Feb 20, 2023 at 10:45 AM Laszlo Ersek  wrote:
>
> On 2/17/23 17:52, Eric Blake wrote:
> > On Thu, Feb 16, 2023 at 03:09:02PM +0100, Laszlo Ersek wrote:
>
> >> - Py_BuildValue with the "O" format specifier transfers the new list's
> >> *sole* reference (= ownership) to the just-built higher-level object "args"
> >
> > Reference transfer is done with "N", not "O".  That would be an
> > alternative to decreasing the refcount of py_array on success, but not
> > eliminate the need to decrease the refcount on Py_BuildValue failure.
> >
> >>
> >> - when "args" is killed (decref'd), it takes care of "py_array".
> >>
> >> Consequently, if Py_BuildValue fails, "py_array" continues owning the
> >> new list -- and I believe that, if we take the new error branch, we leak
> >> the object pointed-to by "py_array". Is that the case?
> >
> > Not quite.  "O" is different than "N".
>
> I agree with you *now*, looking up the "O" specification at
> .
>
> However, when I was writing my email, I looked up Py_BuildValue at that
> time as well, just elsewhere. I don't know where. Really. And then that
> documentation said that the reference count would *not* be increased. I
> distinctly remember that, because it surprised me -- I actually recalled
> an *even earlier* experience reading the documentation, which had again
> stated that "O" would increase the reference count.

Maybe here:
https://docs.python.org/2/c-api/arg.html#building-values

Looks like another incompatibility between python 2 and 3.

Nir




Re: [Libguestfs] [libnbd PATCH v2 3/3] nbdsh: Improve --help and initial banner contents.

2023-01-31 Thread Nir Soffer
On Tue, Jan 31, 2023 at 12:34 AM Richard W.M. Jones  wrote:
>
> On Fri, Nov 04, 2022 at 04:18:31PM -0500, Eric Blake wrote:
> > Document all options in --help output.  If -n is not in use, then
> > enhance the banner to print the current state of h, and further tailor
> > the advice given on useful next steps to take to mention opt_go when
> > using --opt-mode.
>
> I had to partially revert this patch (reverting most of it) because it
> unfortunately breaks the implicit handle creation :-(
>
> https://gitlab.com/nbdkit/libnbd/-/commit/5a02c7d2cc6a201f9e5531c0c20c2f3c22b805a2
>
> I'm not actually sure how to do this correctly in Python.  I made
> several attempts, but I don't think Python is very good about having a
> variable which is only defined on some paths -- maybe it's not
> possible at all.

Can you share the error when it breaks?

I'm not sure what is the issue, but usually if you have a global
variable created only in some flows, adding:

thing = None

At the start of the module makes sure that the name exists later,
regardless of the flow taken. Code can take the right action based on:

if thing is None:
...

Nir




Re: [Libguestfs] [PATCH v2v v2] -o rhv-upload: Improve error message for invalid or missing -os parameter

2023-01-28 Thread Nir Soffer
On Fri, Jan 27, 2023 at 2:26 PM Richard W.M. Jones  wrote:
>
> For -o rhv-upload, the -os parameter specifies the storage domain.
> Because the RHV API allows globs when searching for a domain, if you
> used a parameter like -os 'data*' then this would confuse the Python
> code, since it can glob to the name of a storage domain, but then
> later fail when we try to exact match the storage domain we found.
> The result of this was a confusing error in the precheck script:
>
>   IndexError: list index out of range
>
> This fix validates the output storage parameter before trying to use
> it.  Since valid storage domain names cannot contain glob characters
> or spaces, it avoids the problems above and improves the error message
> that the user sees:
>
>   $ virt-v2v [...] -o rhv-upload -os ''
>   ...
>   RuntimeError: The storage domain (-os) parameter ‘’ is not valid
>   virt-v2v: error: failed server prechecks, see earlier errors
>
>   $ virt-v2v [...] -o rhv-upload -os 'data*'
>   ...
>   RuntimeError: The storage domain (-os) parameter ‘data*’ is not valid
>   virt-v2v: error: failed server prechecks, see earlier errors
>

Makes sense, the new errors are very helpful.

> Although the IndexError should no longer happen, I also added a
> try...catch around that code to improve the error in case it still
> happens.

Theoretically it can happen if the admin changes the storage domain
name or detaches the domain from the data center in the window
after the precheck completes and before the transfer starts.

>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1986386
> Reported-by: Junqin Zhou
> Thanks: Nir Soffer
> ---
>  output/rhv-upload-precheck.py | 27 +--
>  1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/output/rhv-upload-precheck.py b/output/rhv-upload-precheck.py
> index 1dc1b8498a..ba125611ba 100644
> --- a/output/rhv-upload-precheck.py
> +++ b/output/rhv-upload-precheck.py
> @@ -18,6 +18,7 @@
>
>  import json
>  import logging
> +import re
>  import sys
>
>  from urllib.parse import urlparse
> @@ -46,6 +47,15 @@ output_password = output_password.rstrip()
>  parsed = urlparse(params['output_conn'])
>  username = parsed.username or "admin@internal"
>
> +# Check the storage domain name is valid
> +# (https://bugzilla.redhat.com/show_bug.cgi?id=1986386#c1)
> +# Also this means it cannot contain spaces or glob symbols, so
> +# the search below is valid.
> +output_storage = params['output_storage']
> +if not re.match('^[-a-zA-Z0-9_]+$', output_storage):

The comment in the bug does not point to the docs or code enforcing
the domain name restrictions, but I validated with ovirt 4.5. Trying to
create a domain name with a space or using Hebrew characters is blocked
in the UI, displaying an error. See attached screenshots.

I think it is highly unlikely that this limit will change in the
future since nobody
is working on oVirt now, but if it does change this may prevent uploading to an
existing storage domain.
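For reference, the validation being reviewed can be exercised on its own; this is a sketch reusing the regex from the patch (the helper name is made up):

```python
import re

# Sketch of the precheck validation quoted above; the regex is the one
# from the patch ('^[-a-zA-Z0-9_]+$'), which rejects empty names,
# spaces, non-ASCII letters, and the glob characters ('*', '?') that
# would confuse the oVirt search API.
def validate_storage_domain(name):
    if not re.match(r'^[-a-zA-Z0-9_]+$', name):
        raise RuntimeError(
            "The storage domain (-os) parameter '%s' is not valid" % name)
    return name
```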

> +raise RuntimeError("The storage domain (-os) parameter ‘%s’ is not 
> valid" %
> +   output_storage)
> +
>  # Connect to the server.
>  connection = sdk.Connection(
>  url=params['output_conn'],
> @@ -60,28 +70,33 @@ system_service = connection.system_service()
>
>  # Check whether there is a datacenter for the specified storage.
>  data_centers = system_service.data_centers_service().list(
> -search='storage.name=%s' % params['output_storage'],
> +search='storage.name=%s' % output_storage,
>  case_sensitive=True,
>  )
>  if len(data_centers) == 0:
>  storage_domains = system_service.storage_domains_service().list(
> -search='name=%s' % params['output_storage'],
> +search='name=%s' % output_storage,
>  case_sensitive=True,
>  )
>  if len(storage_domains) == 0:
>  # The storage domain does not even exist.
>  raise RuntimeError("The storage domain ‘%s’ does not exist" %
> -   (params['output_storage']))
> +   output_storage)
>
>  # The storage domain is not attached to a datacenter
>  # (shouldn't happen, would fail on disk creation).
>  raise RuntimeError("The storage domain ‘%s’ is not attached to a DC" %
> -   (params['output_storage']))
> +   output_storage)
>  datacenter = data_centers[0]
>
>  # Get the storage domain.
>  storage_domains = connection.follow_link(datacenter.storage_domains)
> -storage_domain = [sd for sd in storage_domains if sd.name == 
> params['output_storage']][0]
> +try:
> +storage_domain = [sd for sd in s

Re: [Libguestfs] [PATCH v2v] -o rhv-upload: Give a nicer error if the storage domain does not exist

2023-01-27 Thread Nir Soffer
On Fri, Jan 27, 2023 at 1:18 PM Nir Soffer  wrote:
>
> On Thu, Jan 26, 2023 at 2:31 PM Richard W.M. Jones  wrote:
> >
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1986386
> > Reported-by: Junqin Zhou
> > ---
> >  output/rhv-upload-precheck.py | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/output/rhv-upload-precheck.py b/output/rhv-upload-precheck.py
> > index 1dc1b8498a..35ea021032 100644
> > --- a/output/rhv-upload-precheck.py
> > +++ b/output/rhv-upload-precheck.py
> > @@ -81,7 +81,12 @@ datacenter = data_centers[0]
> >
> >  # Get the storage domain.
> >  storage_domains = connection.follow_link(datacenter.storage_domains)
> > -storage_domain = [sd for sd in storage_domains if sd.name == 
> > params['output_storage']][0]
> > +try:
> > +storage_domain = [sd for sd in storage_domains \
> > +  if sd.name == params['output_storage']][0]
>
> Using `\` may work but it is not needed. You can do this:
>
> storage_domain = [sd for sd in storage_domains
>                   if sd.name == params['output_storage']][0]
>
> This is also the common way to indent a list comprehension, and it
> makes the expression clearer.
>
> > +except IndexError:
> > +raise RuntimeError("The storage domain ‘%s’ does not exist" %
> > +   params['output_storage'])
>
> The fix is safe and makes sense.
>
> Not sure why we list all storage domains when we already know the name,
> maybe Albert would like to clean up this mess later.

Like this:
https://github.com/oVirt/python-ovirt-engine-sdk4/blob/2aa50266056b7ee0b72597f346cbf0f006041566/examples/list_storage_domains.py#L93



Re: [Libguestfs] [PATCH v2v] -o rhv-upload: Give a nicer error if the storage domain does not exist

2023-01-27 Thread Nir Soffer
On Thu, Jan 26, 2023 at 2:31 PM Richard W.M. Jones  wrote:
>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1986386
> Reported-by: Junqin Zhou
> ---
>  output/rhv-upload-precheck.py | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/output/rhv-upload-precheck.py b/output/rhv-upload-precheck.py
> index 1dc1b8498a..35ea021032 100644
> --- a/output/rhv-upload-precheck.py
> +++ b/output/rhv-upload-precheck.py
> @@ -81,7 +81,12 @@ datacenter = data_centers[0]
>
>  # Get the storage domain.
>  storage_domains = connection.follow_link(datacenter.storage_domains)
> -storage_domain = [sd for sd in storage_domains if sd.name == 
> params['output_storage']][0]
> +try:
> +storage_domain = [sd for sd in storage_domains \
> +  if sd.name == params['output_storage']][0]

Using `\` may work but it is not needed. You can do this:

storage_domain = [sd for sd in storage_domains
                  if sd.name == params['output_storage']][0]

This is also the common way to indent a list comprehension, and it
makes the expression clearer.
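A bracketed expression continues across lines without a backslash, so the unescaped form is equally valid; a small self-contained illustration (the data here is made up):

```python
# Continuation inside brackets needs no backslash; this mirrors the
# reviewed expression with hypothetical stand-in data.
storage_domains = [{"name": "data1"}, {"name": "data2"}]
params = {"output_storage": "data2"}

storage_domain = [sd for sd in storage_domains
                  if sd["name"] == params["output_storage"]][0]
```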

> +except IndexError:
> +raise RuntimeError("The storage domain ‘%s’ does not exist" %
> +   params['output_storage'])

The fix is safe and makes sense.

Not sure why we list all storage domains when we already know the name,
maybe Albert would like to clean up this mess later.

Nir



Re: [Libguestfs] [PATCH v2v] -o rhv-upload: Give a nicer error if the storage domain

2023-01-27 Thread Nir Soffer
On Thu, Jan 26, 2023 at 2:31 PM Richard W.M. Jones  wrote:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1986386
>
> My RHV instance is dead at the moment so I didn't do much more than
> check this compiles and passes the one test we have.  Also I want to
> spend as little time as possible on RHV outputs for virt-v2v since the
> RHV product will be discontinued soon.
>
> I did want to point out some things:
>
>  - The preceeding code is probably wrong.
>
> https://github.com/libguestfs/virt-v2v/blob/master/output/rhv-upload-transfer.py#L70
>
>It attempts to search for the output storage using:
>
> storage_domains = system_service.storage_domains_service().list(
> search='name=%s' % params['output_storage'],
> case_sensitive=True,
> )

I think the search is correct. This is explained in
https://bugzilla.redhat.com/1986386#c1

>I couldn't find any documentation about what can go into that
>search string, but it's clearly a lot more complicated than just
>pasting in the literal name after "name=".  At the very least,
>spaces are not permitted, see:
>
> https://github.com/libguestfs/virt-v2v/blob/master/output/rhv-upload-transfer.py#L70

True, search can be an expression.

>  - The bug reporter used "data*" as the name and I suspect that is
>parsed in some way (wildcard? regexp? I've no idea).

It is treated as a glob pattern, also explained in comment 1.

>  - Probably for the same reason, the preceeding code ought to fail
>with an error if the output storage domain doesn't exist.  The fact
>we reach the code patched here at all also indicates some bug,
>maybe in the search string.
>
> As I say above, I don't especially care about any of this.

I have not been working on RHV since August 2022. Adding Albert, who is the
current RHV storage maintainer.

Nir




Re: [Libguestfs] [libnbd PATCH v2 3/3] nbdsh: Improve --help and initial banner contents.

2022-11-04 Thread Nir Soffer
On Fri, Nov 4, 2022 at 11:18 PM Eric Blake  wrote:
[...]

> @@ -127,7 +129,10 @@ def __call__(self, parser, namespace, values,
> option_string=None):
>  os.environ["LIBNBD_DEBUG"] = "1"
>
>  # Create the handle.
> -if not args.n:
> +if args.n:
> +pass
> +else:
>

Why add a useless branch?


> +global h
>  h = nbd.NBD()
>  h.set_handle_name("nbdsh")
>
> @@ -165,21 +170,35 @@ def line(x): lines.append(x)
>  def blank(): line("")
>  def example(ex, desc): line("%-34s # %s" % (ex, desc))
>
> +connect_hint = False
> +go_hint = False
>  blank()
>  line("Welcome to nbdsh, the shell for interacting with")
>  line("Network Block Device (NBD) servers.")
>  blank()
> -if not args.n:
> -line("The ‘nbd’ module has already been imported and there")
> -line("is an open NBD handle called ‘h’.")
> -blank()
> -else:
> +if args.n:
>  line("The ‘nbd’ module has already been imported.")
>  blank()
>  example("h = nbd.NBD()", "Create a new handle.")
> -if False:  # args.uri is None:
>

Good that this was removed, but it would be better to remove it in the
previous patch.

Nir


Re: [Libguestfs] [libnbd PATCH v2 2/3] nbdsh: Allow -u interleaved with -c

2022-11-04 Thread Nir Soffer
On Fri, Nov 4, 2022 at 11:18 PM Eric Blake  wrote
[...]

Sorry, I did not read the whole patch, but I noticed this:


> @@ -165,7 +177,7 @@ def example(ex, desc): line("%-34s # %s" % (ex, desc))
>  line("The ‘nbd’ module has already been imported.")
>  blank()
>  example("h = nbd.NBD()", "Create a new handle.")
> -if args.uri is None:
> +if False:  # args.uri is None:
>

`if False` will never run, so why not remove the entire branch?

Is this a leftover from debugging?

Nir


Re: [Libguestfs] libnbd golang failure on RISC-V

2022-06-09 Thread Nir Soffer
On Thu, Jun 9, 2022 at 6:24 PM Richard W.M. Jones  wrote:
>
> On Thu, Jun 09, 2022 at 03:20:02PM +0100, Daniel P. Berrangé wrote:
> > > + go test -count=1 -v
> > > === RUN   Test010Load
> > > --- PASS: Test010Load (0.00s)
> > > === RUN   TestAioBuffer
> > > --- PASS: TestAioBuffer (0.00s)
> > > === RUN   TestAioBufferFree
> > > --- PASS: TestAioBufferFree (0.00s)
> > > === RUN   TestAioBufferBytesAfterFree
> > > SIGABRT: abort
> > > PC=0x3fdf6f9bac m=0 sigcode=18446744073709551610
> >
> > So suggesting TestAioBufferBytesAfterFree is at fault, but quite
> > odd as that test case is trivial and while it allocates some
> > native memory it doesn't seem to write to it. Unless the problem
> > happened in an earlier test case and we have delayed detection ?
> >
> > I guess I'd try throwing darts at the wall by chopping out bits
> > of test code to see what makes it disappear.
> >
> > Perhaps also try swapping MakeAioBuffer with MakeAioBufferZero
> > in case pre-existing data into the C.malloc()d block is confusing
> > Go ?
>
> Interestingly if I remove libnbd_020_aio_buffer_test.go completely,
> and disable GODEBUG, then the tests pass.  (Reproducer commands at end
> of email).  So I guess at least one of the problems is confined to
> this test and/or functions it calls in the main library.
> Unfortunately this test is huge.
>
> At your suggestion, replacing every MakeAioBuffer with
> MakeAioBufferZero in that test, but it didn't help.  Also tried
> replacing malloc -> calloc in the aio_buffer.go implementation which
> didn't help.
>
> I'll try some more random things ...
>
> Rich.
>
>
> $ emacs -nw run.in# comment out GODEBUG line
> $ emacs -nw golang/Makefile.am   # remove libnbd_020_aio_buffer_test.go line
> $ mv golang/libnbd_020_aio_buffer_test.go 
> golang/libnbd_020_aio_buffer_test.nogo
> $ make run
> $ make
> $ make -C golang check
> ...
> PASS: run-tests.sh

So when skipping libnbd_020_aio_buffer_test.go we don't get the warning about
a Go pointer in C memory?

If true, can you find the test triggering this issue?

You can run only some tests using -run={regex}, for example:

$ go test -v -run=TestAioBuffer.+AfterFree
=== RUN   TestAioBufferBytesAfterFree
--- PASS: TestAioBufferBytesAfterFree (0.00s)
=== RUN   TestAioBufferSliceAfterFree
--- PASS: TestAioBufferSliceAfterFree (0.00s)
=== RUN   TestAioBufferGetAfterFree
--- PASS: TestAioBufferGetAfterFree (0.00s)
PASS



[Libguestfs] [PATCH libnbd] golang: aio_buffer.go: Explicit panic() on invalid usage

2022-06-09 Thread Nir Soffer
Previously we depended on the behavior on common platforms to panic when
trying to use a nil pointer, but Richard reported that it segfault on
RISC-V. Avoid the undocumented assumptions and panic explicitly with a
useful panic message.

Signed-off-by: Nir Soffer 
---
 golang/aio_buffer.go | 9 +
 1 file changed, 9 insertions(+)

diff --git a/golang/aio_buffer.go b/golang/aio_buffer.go
index 52ea54de..325dbc98 100644
--- a/golang/aio_buffer.go
+++ b/golang/aio_buffer.go
@@ -65,28 +65,37 @@ func (b *AioBuffer) Free() {
if b.P != nil {
C.free(b.P)
b.P = nil
}
 }
 
 // Bytes copies the underlying C array to Go allocated memory and return a
 // slice. Modifying the returned slice does not modify the underlying buffer
 // backing array.
 func (b *AioBuffer) Bytes() []byte {
+   if b.P == nil {
+   panic("Using AioBuffer after Free()")
+   }
return C.GoBytes(b.P, C.int(b.Size))
 }
 
 // Slice creates a slice backed by the underlying C array. The slice can be
 // used to access or modify the contents of the underlying array. The slice
 // must not be used after calling Free().
 func (b *AioBuffer) Slice() []byte {
+   if b.P == nil {
+   panic("Using AioBuffer after Free()")
+   }
	// See https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices
// TODO: Use unsafe.Slice() when we require Go 1.17.
return (*[1<<30]byte)(b.P)[:b.Size:b.Size]
 }
 
 // Get returns a pointer to a byte in the underlying C array. The pointer can
 // be used to modify the underlying array. The pointer must not be used after
 // calling Free().
 func (b *AioBuffer) Get(i uint) *byte {
+   if b.P == nil {
+   panic("Using AioBuffer after Free()")
+   }
return (*byte)(unsafe.Pointer(uintptr(b.P) + uintptr(i)))
 }
-- 
2.36.1




Re: [Libguestfs] libnbd golang failure on RISC-V

2022-06-09 Thread Nir Soffer
On Thu, Jun 9, 2022 at 6:48 PM Richard W.M. Jones  wrote:
>
> On Thu, Jun 09, 2022 at 04:24:12PM +0100, Richard W.M. Jones wrote:
> > On Thu, Jun 09, 2022 at 03:20:02PM +0100, Daniel P. Berrangé wrote:
> > > > + go test -count=1 -v
> > > > === RUN   Test010Load
> > > > --- PASS: Test010Load (0.00s)
> > > > === RUN   TestAioBuffer
> > > > --- PASS: TestAioBuffer (0.00s)
> > > > === RUN   TestAioBufferFree
> > > > --- PASS: TestAioBufferFree (0.00s)
> > > > === RUN   TestAioBufferBytesAfterFree
> > > > SIGABRT: abort
> > > > PC=0x3fdf6f9bac m=0 sigcode=18446744073709551610
> > >
> > > So suggesting TestAioBufferBytesAfterFree is at fault, but quite
> > > odd as that test case is trivial and while it allocates some
> > > native memory it doesn't seem to write to it. Unless the problem
> > > happened in an earlier test case and we have delayed detection ?
> > >
> > > I guess I'd try throwing darts at the wall by chopping out bits
> > > of test code to see what makes it disappear.
> > >
> > > Perhaps also try swapping MakeAioBuffer with MakeAioBufferZero
> > > in case pre-existing data into the C.malloc()d block is confusing
> > > Go ?
> >
> > Interestingly if I remove libnbd_020_aio_buffer_test.go completely,
> > and disable GODEBUG, then the tests pass.  (Reproducer commands at end
> > of email).  So I guess at least one of the problems is confined to
> > this test and/or functions it calls in the main library.
> > Unfortunately this test is huge.
> >
> > At your suggestion, replacing every MakeAioBuffer with
> > MakeAioBufferZero in that test, but it didn't help.  Also tried
> > replacing malloc -> calloc in the aio_buffer.go implementation which
> > didn't help.
> >
> > I'll try some more random things ...
>
> Adding a few printf's shows something interesting:
>
> === RUN   TestAioBufferBytesAfterFree
> calling Free on 0x3fbc1882b0
> calling C.GoBytes on 0x3fbc1882b0
> SIGABRT: abort
> PC=0x3fe6aaebac m=0 sigcode=18446744073709551610
>
> goroutine 21 [running]:
> gsignal
> :0
> abort
> :0
> runtime.throwException
> ../../../libgo/runtime/go-unwind.c:128
> runtime.unwindStack
> ../../../libgo/go/runtime/panic.go:535
> panic
> ../../../libgo/go/runtime/panic.go:750
> runtime.panicmem
> ../../../libgo/go/runtime/panic.go:210
> runtime.sigpanic
> ../../../libgo/go/runtime/signal_unix.go:634
> _wordcopy_fwd_aligned
> :0
> __GI_memmove
> :0
> runtime.stringtoslicebyte
> ../../../libgo/go/runtime/string.go:155
> __go_string_to_byte_array
> ../../../libgo/go/runtime/string.go:509
> _cgo_23192bdcbd72_Cfunc_GoBytes
> ./cgo-c-prolog-gccgo:46
>
> This is a simple use after free because the Free function in
> aio_buffer.go frees the array and then the Bytes function attempts to
> copy b.Size bytes from the NULL pointer.
>
> I didn't write this test so I'm not quite sure what it's trying to
> achieve.

The test verifies that using the buffer in the wrong way fails in a clean
way (panic) and not with a silent double free like it was before
https://gitlab.com/nbdkit/libnbd/-/commit/3394f47556cac009fa7d39c9e2f7e5f2468bd65d

> It seems to be deliberately trying to cause a panic, but
> causes a segfault instead?  (And why only on RISC-V?)
>
>   func TestAioBufferBytesAfterFree(t *testing.T) {
> buf := MakeAioBuffer(uint(32))
> buf.Free()
>
> defer func() {
> if r := recover(); r == nil {
> t.Fatal("Did not recover from panic calling Bytes() 
> after Free()")
> }
> }()
>
> buf.Bytes()
>   }
>
> Since this only happens on RISC-V I guess it might be something to do
> with the golang implementation on this architecture being unable to
> turn segfaults into panics.
>
> Removing all three *AfterFree tests fixes the tests.

But this hides the real issue - if users use Bytes() in the wrong way, we want
the panic, not the segfault - the tests are good!

> It seems a bit of an odd function however.  Wouldn't it be better to
> changes the Bytes function so that it tests if the pointer is NULL and
> panics?

I cannot find any docs for GoBytes now; maybe I tested that it panics in
this case, but this does not work on this arch (bug?). A panic with a good
error message about the wrong usage will be much better.

>
> NB: this _does not_ address the other problem where GODEBUG=cgocheck=2
> complains about "fatal error: Go pointer stored into non-Go memory".

Do we keep Go pointers in buffers allocated in C?

Nir



Re: [Libguestfs] [libnbd PATCH 3/2] python: Slice pread_structured buffer from original

2022-06-01 Thread Nir Soffer
On Wed, Jun 1, 2022 at 3:24 AM Eric Blake  wrote:
>
> On Wed, Jun 01, 2022 at 02:20:48AM +0300, Nir Soffer wrote:
> > On Tue, May 31, 2022 at 6:52 PM Eric Blake  wrote:
> > >
> > > On Tue, May 31, 2022 at 10:49:03AM -0500, Eric Blake wrote:
> > > > This patch fixes the corner-case regression introduced by the previous
> > > > patch where the pread_structured callback buffer lifetime ends as soon
> > > > as the callback (that is, where accessing a stashed callback parameter
> > > > can result in ValueError instead of modifying a harmless copy).  With
> > > > careful effort, we can create a memoryview of the Python object that
> > > > we read into, then slice that memoryview as part of the callback; now
> > > > the slice will be valid as long as the original object is also valid.
> > > >
> > >
> > > > | @@ -76,8 +77,24 @@ chunk_wrapper (void *user_data, const vo
> > > > |PyObject *py_subbuf = NULL;
> > > > |PyObject *py_error = NULL;
> > > > |
> > > > | -  /* Cast away const */
> > > > | -  py_subbuf = PyMemoryView_FromMemory ((char *) subbuf, count, 
> > > > PyBUF_READ);
> > > > | +  if (data->buf) {
> >
> > In which case do we not have data->buf?
>
> Right now, in nbd_internal_py_aio_read_structured.  Fixing that will
> eventually become patch 4/2 for this series (the idea is that instead
> of requiring the user to pass in an nbd.Buffer object, we should take
> any buffer-like object, populate data->buf with zero-copy semantics,
> and we're good to go.

Avoiding the nbd.Buffer class and using the buffer protocol sounds like
the way to go.

> But to avoid breaking back-compat, we either
> have to also special-case existing code using nbd.Buffer, or enhance
> the nbd.Buffer class to implement the buffer-like interface).

Backward compatibility is very important, but I'm not sure if we
have enough users of the python bindings to care about it now.

>
> >
> > > > | +PyObject *start, *end, *slice;
> > > > | +assert (PyMemoryView_Check (data->buf));
> >
> > Why do we need data->buf to be a memoryview?
>
> Maybe it doesn't.  It looks like (at least from python, rather than in
> the C coding side of things) that you can apply the slice operation to
> bytes and bytearray.  But there may be other buffer-like objects that
> don't directly support slice while memoryview always does; and one of
> the reasons memoryview is part of the standard python library is to
> make it easy to add operations on top of any buffer-like object.
> memoryview also takes care of doing a temporary copy to consolidate
> out a contiguous view if the original buffer is not contiguous; you
> don't need that with bytes or bytearray, but definitely need it with
> array.array and friends.

On the Python side, the reason you need memoryview is to avoid the copy
when creating a slice. I guess the C API works in the same way, but I did
not check.
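The zero-copy behaviour is easy to check from Python itself (a small sketch, independent of the libnbd bindings):

```python
# Slicing a memoryview shares storage with the original buffer,
# while slicing a bytearray produces an independent copy.
buf = bytearray(b"abcdefgh")
view = memoryview(buf)[2:6]   # no copy; backed by buf's storage
buf[2] = ord("X")             # mutation is visible through the view
copy = bytes(buf[2:6])        # independent copy of the current contents
buf[3] = ord("Y")             # the copy does not see this change
```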


> > We can create a new memoryview from data->buf - it only needs to be an
> > object
> > supporting the buffer protocol. Basically we need the C version of:
> >
> > memoryview(buf)[start:stop]
> >
> > buf can be bytes, bytearray, mmap, or another memoryview.
>
> Yes - and that's what this C code is.  memoryview(buf) was created
> when populating data->buf (whether the original was bytes, bytearray,
> mmap, or other buffer-like object), then this code follows up with
> creating the slice [start:stop] then pulling it altogether with
> view.__getitem__(slice).

Sounds good!

I think Richard's suggestion to extract a helper for slicing a memoryview
will make this code much easier to read and review.

>
> >
> > > > | +ptrdiff_t offs = subbuf - PyMemoryView_GET_BUFFER 
> > > > (data->buf)->buf;
> >
> > Because PyMemoryView_Check is inside the assert, a build with NDEBUG will
> > remove the check, and this call may crash if data->buf is not a memoryview.
>
> It's more a proof that the earlier code in
> nbd_internal_py_pread_structured correctly set data->buf.  I'm not
> worried about an NDEBUG build failing; this is one case where an
> assert() really is more for documentation.

This looks like a coding error as is. We can add a comment or change the
code to look more correct. If we really don't have a memoryview, the slice
will copy the data (unless C-level slices are unsafe, which is unlikely),
and we want to get a view without copying. So it would be better to fail
loudly.

> > It would be nicer if we could get the offset without looking into the 
> > internal
> > buffer of the memoryview.

Re: [Libguestfs] [libnbd PATCH 3/2] python: Slice pread_structured buffer from original

2022-05-31 Thread Nir Soffer
On Tue, May 31, 2022 at 6:52 PM Eric Blake  wrote:
>
> On Tue, May 31, 2022 at 10:49:03AM -0500, Eric Blake wrote:
> > This patch fixes the corner-case regression introduced by the previous
> > patch where the pread_structured callback buffer lifetime ends as soon
> > as the callback (that is, where accessing a stashed callback parameter
> > can result in ValueError instead of modifying a harmless copy).  With
> > careful effort, we can create a memoryview of the Python object that
> > we read into, then slice that memoryview as part of the callback; now
> > the slice will be valid as long as the original object is also valid.
> >
>
> > | @@ -76,8 +77,24 @@ chunk_wrapper (void *user_data, const vo
> > |PyObject *py_subbuf = NULL;
> > |PyObject *py_error = NULL;
> > |
> > | -  /* Cast away const */
> > | -  py_subbuf = PyMemoryView_FromMemory ((char *) subbuf, count, 
> > PyBUF_READ);
> > | +  if (data->buf) {

In which case do we not have data->buf?

> > | +PyObject *start, *end, *slice;
> > | +assert (PyMemoryView_Check (data->buf));

Why do we need data->buf to be a memoryview?

We can create a new memoryview from data->buf - it only needs to be an object
supporting the buffer protocol. Basically we need the C version of:

memoryview(buf)[start:stop]

buf can be bytes, bytearray, mmap, or another memoryview.

> > | +ptrdiff_t offs = subbuf - PyMemoryView_GET_BUFFER (data->buf)->buf;

Because PyMemoryView_Check is inside the assert, a build with NDEBUG will
remove the check, and this call may crash if data->buf is not a memoryview.

It would be nicer if we could get the offset without looking into the internal
buffer of the memoryview.

> > | +start = PyLong_FromLong (offs);
> > | +if (!start) { PyErr_PrintEx (0); goto out; }
> > | +end = PyLong_FromLong (offs + count);
> > | +if (!end) { Py_DECREF (start); PyErr_PrintEx (0); goto out; }
> > | +slice = PySlice_New (start, end, NULL);
> > | +Py_DECREF (start);
> > | +Py_DECREF (end);
> > | +if (!slice) { PyErr_PrintEx (0); goto out; }
> > | +py_subbuf = PyObject_GetItem (data->buf, slice);
>
> Missing a Py_DECREF (slice) here.  Easy enough to add...
>
> > +++ b/generator/Python.ml
> > @@ -187,8 +187,24 @@ let
> > pr "PyList_SET_ITEM (py_%s, i_%s, py_e_%s);\n" n n n;
> > pr "  }\n"
> >  | CBBytesIn (n, len) ->
> > -   pr "  /* Cast away const */\n";
> > -   pr "  py_%s = PyMemoryView_FromMemory ((char *) %s, %s, 
> > PyBUF_READ);\n" n n len;
> > +   pr "  if (data->buf) {\n";
> > +   pr "PyObject *start, *end, *slice;\n";
> > +   pr "assert (PyMemoryView_Check (data->buf));\n";
> > +   pr "ptrdiff_t offs = %s - PyMemoryView_GET_BUFFER 
> > (data->buf)->buf;\n" n;
> > +   pr "start = PyLong_FromLong (offs);\n";
> > +   pr "if (!start) { PyErr_PrintEx (0); goto out; }\n";
> > +   pr "end = PyLong_FromLong (offs + %s);\n" len;
> > +   pr "if (!end) { Py_DECREF (start); PyErr_PrintEx (0); goto out; 
> > }\n";
> > +   pr "slice = PySlice_New (start, end, NULL);\n";
> > +   pr "Py_DECREF (start);\n";
> > +   pr "Py_DECREF (end);\n";
> > +   pr "if (!slice) { PyErr_PrintEx (0); goto out; }\n";
> > +   pr "py_%s = PyObject_GetItem (data->buf, slice);\n" n;
>
> ...here
>
> And I really wish python didn't make it so hard to grab a slice of
> another object using C code.  Having to create 3 temporary PyObjects
> instead of having a utility C function that takes normal integers was
> annoying.

Does this work?

PySlice_New(NULL, NULL, NULL);
PySlice_AdjustIndices(length, start, stop, step);

Nir




Re: [Libguestfs] [libnbd PATCH 2/2] python: Optimize away copy during pread_structured

2022-05-31 Thread Nir Soffer
On Tue, May 31, 2022 at 5:15 PM Eric Blake  wrote:
>
> The Py_BuildValue "y#" format copies data; this is because Python
> wants to allow our C memory to go out of scope without worrying about
> whether the user's python callback has stashed off a longer-living
> reference to its incoming parameter.  But it is inefficient; we can do
> better by utilizing Python's memoryview for a zero-copy exposure to
> the callback's C buffer, as well as a .release method that we can
> utilize just before our C memory goes out of scope.  Now, if the user
> stashes away a reference, they will get a clean Python error if they
> try to access the memory after the fact. This IS an API change (code
> previously expecting a stashed copy to be long-lived will break), but
> we never promised Python API stability, and anyone writing a callback
> that saves off data was already risky (neither libnbd nor nbdkit's
> testsuite had such a case).  For a demonstration of the new error,
> where the old code succeeded:
>
> $ ./run nbdsh
> nbd> h.connect_command(["nbdkit", "-s", "memory", "10"])
> nbd> copy=None
> nbd> def f(b,o,s,e):
> ...   global copy
> ...   copy = b
> ...   print(b[0])
> ...
> nbd> print(copy)
> None
> nbd> h.pread_structured(10,0,f)
> 0
> bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
> nbd> copy
> 
> nbd> copy[0]
> Traceback (most recent call last):
>   File "/usr/lib64/python3.10/code.py", line 90, in runcode
> exec(code, self.locals)
>   File "", line 1, in 
> ValueError: operation forbidden on released memoryview object
>
> To demonstrate the speedup, I tested:
>
> $ export script='
> def f(b,o,s,e):
>  pass
> m=1024*1024
> size=h.get_size()
> for i in range(size // m):
>  buf = h.pread_structured(m, m*i, f)
> '
> $ time ./run nbdkit -U - memory 10G --run 'nbdsh -u "$uri" -c "$script"'
>
> On my machine, this took 9.1s pre-patch, and 3.0s post-patch, for an
> approximate 65% speedup.

Looks like a ~300% speedup to me (9.1s to 3.0s is about 3 times faster,
i.e. the run time dropped by about 67%). I guess "3 times faster" is the
clearest way to describe the change.

> The corresponding diff to the generated code is:
>
> | --- python/methods.c.bak2022-05-31 07:57:25.256293091 -0500
> | +++ python/methods.c2022-05-31 08:14:09.570567858 -0500
> | @@ -73,8 +73,12 @@ chunk_wrapper (void *user_data, const vo
> |
> |PyGILState_STATE py_save = PyGILState_UNLOCKED;
> |PyObject *py_args, *py_ret;
> | +  PyObject *py_subbuf = NULL;
> |PyObject *py_error = NULL;
> |
> | +  /* Cast away const */

I think it will be clearer and more helpful as:

/* Casting subbuf to char* is safe since we use PyBUF_READ. */

> | +  py_subbuf = PyMemoryView_FromMemory ((char *) subbuf, count, PyBUF_READ);

Maybe py_view?

And also change the docstring to mention "view" instead of "subbuf".

> | +  if (!py_subbuf) { PyErr_PrintEx (0); goto out; }
> |PyObject *py_error_modname = PyUnicode_FromString ("ctypes");
> |if (!py_error_modname) { PyErr_PrintEx (0); goto out; }
> |PyObject *py_error_mod = PyImport_Import (py_error_modname);
> | @@ -84,7 +88,7 @@
> |Py_DECREF (py_error_mod);
> |if (!py_error) { PyErr_PrintEx (0); goto out; }
> |
> | -  py_args = Py_BuildValue ("(" "y#" "K" "I" "O" ")", subbuf, (int) count, 
> offset, status, py_error);
> | +  py_args = Py_BuildValue ("(" "O" "K" "I" "O" ")", py_subbuf, offset, 
> status, py_error);
> |if (!py_args) { PyErr_PrintEx (0); goto out; }
> |
> |py_save = PyGILState_Ensure ();
> | @@ -111,6 +115,11 @@ chunk_wrapper (void *user_data, const vo
> |};
> |
> |   out:
> | +  if (py_subbuf) {
> | +PyObject *tmp = PyObject_CallMethod(py_subbuf, "release", NULL);

What if the user does:

def f(b,o,s,e):
use b...
b.release()

Or:

def f(b,o,s,e):
with b:
use b...

We would release a released memoryview here.

I don't think this is likely to happen though.
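For what it's worth, CPython documents memoryview.release() as safe to call
more than once, so the double release in the generated C should be a no-op; a
quick sketch in plain Python (not the generated code):

```python
m = memoryview(b"abc")

with m:                 # the "with b:" case from above releases on exit
    assert m[0] == ord("a")

m.release()             # second release: documented no-op, not an error

try:
    m[0]                # any other use of a released view fails
except ValueError as e:
    print(e)            # operation forbidden on released memoryview object
```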

> | +Py_XDECREF (tmp);
> | +Py_DECREF (py_subbuf);
> | +  }
> |if (py_error) {
> |  PyObject *py_error_ret = PyObject_GetAttrString (py_error, "value");
> |  *error = PyLong_AsLong (py_error_ret);
> ---
>  generator/Python.ml | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/generator/Python.ml b/generator/Python.ml
> index 1c4446e..0191f79 100644
> --- a/generator/Python.ml
> +++ b/generator/Python.ml
> @@ -169,8 +169,7 @@ let
>pr "  PyObject *py_args, *py_ret;\n";
>List.iter (
>  function
> -| CBArrayAndLen (UInt32 n, _) ->
> -   pr "  PyObject *py_%s = NULL;\n" n
> +| CBArrayAndLen (UInt32 n, _) | CBBytesIn (n, _)
>  | CBMutable (Int n) ->
> pr "  PyObject *py_%s = NULL;\n" n
>  | _ -> ()
> @@ -187,7 +186,10 @@ let
> pr "if (!py_e_%s) { PyErr_PrintEx (0); goto out; }\n" n;
> pr "PyList_SET_ITEM (py_%s, i_%s, py_e_%s);\n" n n n;
> pr "  }\n"
> -| CBBytesIn _
> +| CBBytesIn (n, len) ->
> +   pr "  /* Cast away const */\n";
> +   pr "  py_%s = PyMemoryView_FromMemory 

Re: [Libguestfs] [libnbd PATCH 1/2] api: Tighter checking of structured read replies

2022-05-31 Thread Nir Soffer
On Tue, May 31, 2022 at 10:23 PM Eric Blake  wrote:
>
> On Tue, May 31, 2022 at 08:40:57PM +0300, Nir Soffer wrote:
> > > > > @@ -364,6 +363,8 @@ STATE_MACHINE {
> > > > >SET_NEXT_STATE (%.DEAD);
> > > > >return 0;
> > > > >  }
> > > > > +if (cmd->data_seen <= cmd->count)
> > > > > +  cmd->data_seen += length;
> > > >
> > > > This does not feel right. if you received more data, it should be 
> > > > counted,
> > > > and if this causes data_seen to be bigger than cmd->count, isn't this a 
> > > > fatal
> > > > error?
> > >
> > > cmd->count is at most 64M; it represents how much we asked the server
> > > to provide.  length was just validated (in the elided statement
> > > between these two hunks) to be <= cmd->count (otherwise, the server is
> > > known-buggy for replying with more than we asked, and we've already
> > > moved to %.DEAD state).  cmd->data_seen is a cumulative count of all
> > > bytes seen in prior chunks, plus the current chunk.  If we have not
> > > yet passed cmd->count, then this chunk counts towards the cumulative
> > > limit (and there is no overflow, since 64M*2 < UINT32_MAX).  If we
> > > have already passed cmd->count (in a previous chunk), then we don't
> > > further increase cmd->count, but we already know that we will fail the
> > > overall read later.  In other words, we can stop incrementing
> > > cmd->data_seen as soon as we know it exceeds cmd->count, and by
> > > stopping early, we still detect server bugs without overflowing
> > > uint32_t.
> >
> > But we can have this case:
> >
> > 1. ask for 32m
> > 2. server sends 16m (data_seen increase to 16m)
> > 3. server sends 16m (data_seen increase to 32m)
> > 4. server sends 1m (data_seen does not increase)
>
> Yes it does. 32m <= cmd->count is true, so we bump data_seen to 33m.

Right! I missed this.

> Then, later on when retiring the command, we note that 33m != 32m and
> fail the read with EIO (if it has not already failed for other
> reasons).
>
> > 5. entire request succeeds
> >
> > Shouldn't we fail if server sends unexpected data?
> >
> > If we detected that all data was received, and we get
> > unexpected data, why not fail immediately?
> >
> > cmd->data_seen += length
> > if (cmd->data_seen > cmd->count)
> > switch to dead state?
>
> Switching immediately to a dead state is also possible, but it's nice
> to try and keep the connection alive as long as we can with a nice
> diagnosis of a failed CMD_READ but still allow further commands,
> rather than an abrupt disconnect that takes out all other use of the
> server.

I agree, this is better.
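To make the accounting concrete, here is a small Python model of the
cmd->data_seen bookkeeping discussed above (a sketch, not the libnbd code):

```python
def read_succeeds(count, chunks):
    """Model of cmd->data_seen accounting for NBD_CMD_READ.

    count:  bytes requested (at most 64M in libnbd)
    chunks: lengths of the data chunks the server sent
    """
    data_seen = 0
    for length in chunks:
        # Stop accumulating once data_seen exceeds count, so a malicious
        # server sending endless chunks cannot overflow a 32-bit counter.
        if data_seen <= count:
            data_seen += length
    # On retirement the read fails with EIO unless the totals match.
    return data_seen == count

# The scenario from the thread: 16m + 16m + 1m for a 32m request.
assert read_succeeds(32, [16, 16]) is True
assert read_succeeds(32, [16, 16, 1]) is False   # 33m != 32m -> EIO
```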

Reviewed-by: Nir Soffer 

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] [libnbd PATCH 1/2] api: Tighter checking of structured read replies

2022-05-31 Thread Nir Soffer
On Tue, May 31, 2022 at 7:13 PM Eric Blake  wrote:
>
> On Tue, May 31, 2022 at 06:59:25PM +0300, Nir Soffer wrote:
> > On Tue, May 31, 2022 at 5:15 PM Eric Blake  wrote:
> > >
> > > Now that we allow clients to bypass buffer pre-initialization, it
> > > becomes more important to detect when a buggy server using structured
> > > replies does not send us enough bytes to cover the requested read
> > > size.  Our check is not perfect (a server that duplicates reply chunks
> > > for byte 0 and omits byte 1 gets past our check), but this is a
> > > tighter sanity check so that we are less likely to report a successful
> > > read containing uninitialized memory on a buggy server.
> >
> > Nice!
> >
> > > Because we have a maximum read buffer size of 64M, and first check
> > > that the server's chunk fits bounds, we don't have to worry about
> > > overflowing a uint32_t, even if a server sends enough duplicate
> > > responses that an actual sum would overflow.
> > > ---
>
> > > +++ b/generator/states-reply-structured.c
> > > @@ -354,7 +354,6 @@ STATE_MACHINE {
> > >  assert (cmd); /* guaranteed by CHECK */
> > >
> > >  assert (cmd->data && cmd->type == NBD_CMD_READ);
> > > -cmd->data_seen = true;
> > >
> > >  /* Length of the data following. */
> > >  length -= 8;
> > > @@ -364,6 +363,8 @@ STATE_MACHINE {
> > >SET_NEXT_STATE (%.DEAD);
> > >return 0;
> > >  }
> > > +if (cmd->data_seen <= cmd->count)
> > > +  cmd->data_seen += length;
> >
> > This does not feel right. if you received more data, it should be counted,
> > and if this causes data_seen to be bigger than cmd->count, isn't this a 
> > fatal
> > error?
>
> cmd->count is at most 64M; it represents how much we asked the server
> to provide.  length was just validated (in the elided statement
> between these two hunks) to be <= cmd->count (otherwise, the server is
> known-buggy for replying with more than we asked, and we've already
> moved to %.DEAD state).  cmd->data_seen is a cumulative count of all
> bytes seen in prior chunks, plus the current chunk.  If we have not
> yet passed cmd->count, then this chunk counts towards the cumulative
> limit (and there is no overflow, since 64M*2 < UINT32_MAX).  If we
> have already passed cmd->count (in a previous chunk), then we don't
> further increase cmd->count, but we already know that we will fail the
> overall read later.  In other words, we can stop incrementing
> cmd->data_seen as soon as we know it exceeds cmd->count, and by
> stopping early, we still detect server bugs without overflowing
> uint32_t.

But we can have this case:

1. ask for 32m
2. server sends 16m (data_seen increase to 16m)
3. server sends 16m (data_seen increase to 32m)
4. server sends 1m (data_seen does not increase)
5. entire request succeeds

Shouldn't we fail if server sends unexpected data?

If we detected that all data was received, and we get
unexpected data, why not fail immediately?

cmd->data_seen += length
if (cmd->data_seen > cmd->count)
switch to dead state?

Nir




Re: [Libguestfs] [libnbd PATCH 1/2] api: Tighter checking of structured read replies

2022-05-31 Thread Nir Soffer
On Tue, May 31, 2022 at 5:15 PM Eric Blake  wrote:
>
> Now that we allow clients to bypass buffer pre-initialization, it
> becomes more important to detect when a buggy server using structured
> replies does not send us enough bytes to cover the requested read
> size.  Our check is not perfect (a server that duplicates reply chunks
> for byte 0 and omits byte 1 gets past our check), but this is a
> tighter sanity check so that we are less likely to report a successful
> read containing uninitialized memory on a buggy server.

Nice!

> Because we have a maximum read buffer size of 64M, and first check
> that the server's chunk fits bounds, we don't have to worry about
> overflowing a uint32_t, even if a server sends enough duplicate
> responses that an actual sum would overflow.
> ---
>  lib/internal.h  | 2 +-
>  generator/states-reply-simple.c | 4 ++--
>  generator/states-reply-structured.c | 6 --
>  lib/aio.c   | 7 +--
>  4 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/lib/internal.h b/lib/internal.h
> index 885cee1..4121a5c 100644
> --- a/lib/internal.h
> +++ b/lib/internal.h
> @@ -352,8 +352,8 @@ struct command {
>void *data; /* Buffer for read/write */
>struct command_cb cb;
>enum state state; /* State to resume with on next POLLIN */
> -  bool data_seen; /* For read, true if at least one data chunk seen */
>bool initialized; /* For read, true if getting a hole may skip memset */
> +  uint32_t data_seen; /* For read, cumulative size of data chunks seen */
>uint32_t error; /* Local errno value */
>  };
>
> diff --git a/generator/states-reply-simple.c b/generator/states-reply-simple.c
> index 7dc26fd..2a7b9a9 100644
> --- a/generator/states-reply-simple.c
> +++ b/generator/states-reply-simple.c
> @@ -1,5 +1,5 @@
>  /* nbd client library in userspace: state machine
> - * Copyright (C) 2013-2019 Red Hat Inc.
> + * Copyright (C) 2013-2022 Red Hat Inc.
>   *
>   * This library is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU Lesser General Public
> @@ -40,7 +40,7 @@ STATE_MACHINE {
>if (cmd->error == 0 && cmd->type == NBD_CMD_READ) {
>  h->rbuf = cmd->data;
>  h->rlen = cmd->count;
> -cmd->data_seen = true;
> +cmd->data_seen = cmd->count;
>  SET_NEXT_STATE (%RECV_READ_PAYLOAD);
>}
>else {
> diff --git a/generator/states-reply-structured.c 
> b/generator/states-reply-structured.c
> index 12c24f5..cabd543 100644
> --- a/generator/states-reply-structured.c
> +++ b/generator/states-reply-structured.c
> @@ -354,7 +354,6 @@ STATE_MACHINE {
>  assert (cmd); /* guaranteed by CHECK */
>
>  assert (cmd->data && cmd->type == NBD_CMD_READ);
> -cmd->data_seen = true;
>
>  /* Length of the data following. */
>  length -= 8;
> @@ -364,6 +363,8 @@ STATE_MACHINE {
>SET_NEXT_STATE (%.DEAD);
>return 0;
>  }
> +if (cmd->data_seen <= cmd->count)
> +  cmd->data_seen += length;

This does not feel right. if you received more data, it should be counted,
and if this causes data_seen to be bigger than cmd->count, isn't this a fatal
error?

>  /* Now this is the byte offset in the read buffer. */
>  offset -= cmd->offset;
>
> @@ -422,13 +423,14 @@ STATE_MACHINE {
>  assert (cmd); /* guaranteed by CHECK */
>
>  assert (cmd->data && cmd->type == NBD_CMD_READ);
> -cmd->data_seen = true;
>
>  /* Is the data within bounds? */
>  if (! structured_reply_in_bounds (offset, length, cmd)) {
>SET_NEXT_STATE (%.DEAD);
>return 0;
>  }
> +if (cmd->data_seen <= cmd->count)
> +  cmd->data_seen += length;

Same here

>  /* Now this is the byte offset in the read buffer. */
>  offset -= cmd->offset;
>
> diff --git a/lib/aio.c b/lib/aio.c
> index 9744840..dc01f90 100644
> --- a/lib/aio.c
> +++ b/lib/aio.c
> @@ -1,5 +1,5 @@
>  /* NBD client library in userspace
> - * Copyright (C) 2013-2019 Red Hat Inc.
> + * Copyright (C) 2013-2022 Red Hat Inc.
>   *
>   * This library is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU Lesser General Public
> @@ -91,8 +91,11 @@ nbd_unlocked_aio_command_completed (struct nbd_handle *h,
>assert (cmd->type != NBD_CMD_DISC);
>/* The spec states that a 0-length read request is unspecified; but
> * it is easy enough to treat it as successful as an extension.
> +   * Conversely, make sure a server sending structured replies sent
> +   * enough data chunks to cover the overall count (although we do not
> +   * detect if it duplicated some bytes while omitting others).
> */
> -  if (type == NBD_CMD_READ && !cmd->data_seen && cmd->count && !error)
> +  if (type == NBD_CMD_READ && cmd->data_seen != cmd->count && !error)
>  error = EIO;
>
>/* Retire it from the list and free it. */
> --
> 2.36.1
>

Re: [Libguestfs] [PATCH] -o rhv-upload: wait for VM creation task

2022-04-14 Thread Nir Soffer
On Thu, Apr 14, 2022 at 11:11 AM Richard W.M. Jones  wrote:
>
>
> Sorry, that patch was incomplete.  Here's a better patch.
>
> Rich.
>
> commit d2c018676111de0d5fb895301fb9035c8763f5bb (HEAD -> master)
> Author: Richard W.M. Jones 
> Date:   Thu Apr 14 09:09:15 2022 +0100
>
> -o rhv-upload: Use time.monotonic
>
> In Python >= 3.3 we can use a monotonic instead of system clock, which
> ensures the clock will never go backwards during these loops.
>
> Thanks: Nir Soffer
>
> diff --git a/output/rhv-upload-finalize.py b/output/rhv-upload-finalize.py
> index 4d1dcfb2f4..1221e766ac 100644
> --- a/output/rhv-upload-finalize.py
> +++ b/output/rhv-upload-finalize.py
> @@ -73,7 +73,7 @@ def finalize_transfer(connection, transfer_id, disk_id):
>  .image_transfers_service()
>  .image_transfer_service(transfer_id))
>
> -start = time.time()
> +start = time.monotonic()
>
>  transfer_service.finalize()
>
> @@ -125,14 +125,14 @@ def finalize_transfer(connection, transfer_id, disk_id):
>  raise RuntimeError(
>  "transfer %s was paused by system" % (transfer.id,))
>
> -if time.time() > start + timeout:
> +if time.monotonic() > start + timeout:
>  raise RuntimeError(
>  "timed out waiting for transfer %s to finalize, "
>  "transfer is %s"
>  % (transfer.id, transfer.phase))
>
>  debug("transfer %s finalized in %.3f seconds"
> -  % (transfer_id, time.time() - start))
> +  % (transfer_id, time.monotonic() - start))
>
>
>  # Parameters are passed in via a JSON doc from the OCaml code.
> diff --git a/output/rhv-upload-transfer.py b/output/rhv-upload-transfer.py
> index cf4f8807e6..62b842b67b 100644
> --- a/output/rhv-upload-transfer.py
> +++ b/output/rhv-upload-transfer.py
> @@ -128,13 +128,13 @@ def create_disk(connection):
>  # can't start if the disk is locked.
>
>  disk_service = disks_service.disk_service(disk.id)
> -endt = time.time() + timeout
> +endt = time.monotonic() + timeout
>  while True:
>  time.sleep(1)
>  disk = disk_service.get()
>  if disk.status == types.DiskStatus.OK:
>  break
> -if time.time() > endt:
> +if time.monotonic() > endt:
>  raise RuntimeError(
>  "timed out waiting for disk %s to become unlocked" % disk.id)
>
> @@ -176,7 +176,7 @@ def create_transfer(connection, disk, host):
>  # If the transfer was paused, we need to cancel it to remove the disk,
>  # otherwise the system will remove the disk and transfer shortly after.
>
> -endt = time.time() + timeout
> +endt = time.monotonic() + timeout
>  while True:
>  time.sleep(1)
>  try:
> @@ -204,7 +204,7 @@ def create_transfer(connection, disk, host):
>  "unexpected transfer %s phase %s"
>  % (transfer.id, transfer.phase))
>
> -if time.time() > endt:
> +if time.monotonic() > endt:
>  transfer_service.cancel()
>  raise RuntimeError(
>  "timed out waiting for transfer %s" % transfer.id)

Looks good.

Nir




Re: [Libguestfs] [PATCH] -o rhv-upload: wait for VM creation task

2022-04-13 Thread Nir Soffer
On Tue, Apr 12, 2022 at 9:35 PM Tomáš Golembiovský  wrote:
>
> oVirt API call for VM creation finishes before the VM is actually
> created. Entities may be still locked after virt-v2v terminates and if
> user tries to perform (scripted) actions after virt-v2v those operations
> may fail. To prevent this it is useful to monitor the task and wait for
> the completion. This will also help to prevent some corner case
> scenarios (that would be difficult to debug) when the VM creation job
> fails after virt-v2v already termintates with success.
>
> Thanks: Nir Soffer
> Signed-off-by: Tomáš Golembiovský 
> Reviewed-by: Arik Hadas 
> ---
>  output/rhv-upload-createvm.py | 57 ++-
>  1 file changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/output/rhv-upload-createvm.py b/output/rhv-upload-createvm.py
> index 50bb7e34..c6a6fbd6 100644
> --- a/output/rhv-upload-createvm.py
> +++ b/output/rhv-upload-createvm.py
> @@ -19,12 +19,54 @@
>  import json
>  import logging
>  import sys
> +import time
> +import uuid
>
>  from urllib.parse import urlparse
>
>  import ovirtsdk4 as sdk
>  import ovirtsdk4.types as types
>
> +
> +def debug(s):
> +if params['verbose']:
> +print(s, file=sys.stderr)
> +sys.stderr.flush()
> +
> +
> +def jobs_completed(system_service, correlation_id):
> +jobs_service = system_service.jobs_service()
> +
> +try:
> +jobs = jobs_service.list(
> +search="correlation_id=%s" % correlation_id)
> +except sdk.Error as e:
> +debug(
> +"Error searching for jobs with correlation id %s: %s" %
> +(correlation_id, e))
> +# We dont know, assume that jobs did not complete yet.

don't?

> +return False
> +
> +# STARTED is the only "in progress" status, other mean the job has

"other" ->  "anything else"?

> +# already terminated

Missing . at the end of the comment.

> +if all(job.status != types.JobStatus.STARTED for job in jobs):
> +failed_jobs = [(job.description, str(job.status))
> +   for job in jobs
> +   if job.status != types.JobStatus.FINISHED]
> +if failed_jobs:
> +raise RuntimeError(
> +"Failed to create a VM! Failed jobs: %r" % failed_jobs)
> +return True
> +else:
> +jobs_status = [(job.description, str(job.status)) for job in jobs]

jobs_status is a little confusing since this is a list of (description, status)
tuples. Maybe "running_jobs"?

It is also more consistent with "failed_jobs" above.

> +debug("Some jobs with correlation id %s are running: %s" %
> +  (correlation_id, jobs_status))
> +return False
> +
> +
> +# Seconds to wait for the VM import job to complete in oVirt.
> +timeout = 5 * 60
> +
>  # Parameters are passed in via a JSON doc from the OCaml code.
>  # Because this Python code ships embedded inside virt-v2v there
>  # is no formal API here.
> @@ -67,6 +109,7 @@ system_service = connection.system_service()
>  cluster = 
> system_service.clusters_service().cluster_service(params['rhv_cluster_uuid'])
>  cluster = cluster.get()
>
> +correlation_id = str(uuid.uuid4())
>  vms_service = system_service.vms_service()
>  vm = vms_service.add(
>  types.Vm(
> @@ -77,5 +120,17 @@ vm = vms_service.add(
>  data=ovf,
>  )
>  )
> -)
> +),
> +query={'correlation_id': correlation_id},
>  )
> +
> +# Wait for the import job to finish
> +endt = time.time() + timeout

Since we use Python 3, it is better to use time.monotonic(),
which is not affected by system time changes.
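The deadline pattern the patch uses, as a standalone sketch (the helper name
and timeout values are illustrative, not from virt-v2v):

```python
import time

def wait_until(predicate, timeout, interval=1.0):
    # time.monotonic() cannot jump when the system clock is changed
    # (NTP adjustment, admin action), unlike time.time(), so the
    # deadline comparison below is reliable.
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise RuntimeError("timed out after %s seconds" % timeout)
        time.sleep(interval)

# e.g. wait_until(lambda: disk_service.get().status == types.DiskStatus.OK,
#                 timeout=300)
```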

> +while True:
> +time.sleep(1)

Since we wait up to 300 seconds, maybe use a longer delay?
Or maybe we don't need to wait for 300 seconds?

> +if jobs_completed(system_service, correlation_id):
> +break
> +if time.time() > endt:
> +    raise RuntimeError(
> +"Timed out waiting for VM creation!"
> +" Jobs still running for correlation id %s" % correlation_id)
> --
> 2.35.1
>

With or without suggested minor improvements,

Reviewed-by: Nir Soffer 

Nir



Re: [Libguestfs] [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length

2022-04-08 Thread Nir Soffer
On Fri, Apr 8, 2022 at 6:47 PM Eric Blake  wrote:
>
> On Fri, Apr 08, 2022 at 04:48:59PM +0300, Nir Soffer wrote:
...
> > > BTW attached is an nbdkit plugin that creates an NBD server that
> > > responds with massive numbers of byte-granularity extents, in case
> > > anyone wants to test how nbdkit and/or clients respond:
> > >
> > > $ chmod +x /var/tmp/lots-of-extents.py
> > > $ /var/tmp/lots-of-extents.py -f
> > >
> > > $ nbdinfo --map nbd://localhost | head
> > >  0   13  hole,zero
> > >  1   10  data
> > >  2   13  hole,zero
> > >  3   10  data
> > >  4   13  hole,zero
> > >  5   10  data
> > >  6   13  hole,zero
> > >  7   10  data
> > >  8   13  hole,zero
> > >  9   10  data
> > > $ nbdinfo --map --totals nbd://localhost
> > > 524288  50.0%   0 data
> > > 524288  50.0%   3 hole,zero
> >
> > This is a malicious server. A good client will drop the connection when
> > receiving the first 1 byte chunk.
>
> Depends on the server.  Most servers don't serve 1-byte extents, and
> the NBD spec even recommends that extents be at least 512 bytes in
> size, and requires that extents be a multiple of any minimum block
> size if one was advertised by the server.
>
> But even though most servers don't have 1-byte extents does not mean
> that the NBD protocol must forbid them.

Forbidding this simplifies clients without limiting real world use cases.

What is a reason to allow this?

> > The real issue here is not enforcing or suggesting a limit on the number of
> > extent the server returns, but enforcing a limit on the minimum size of
> > a chunk.
> >
> > Since this is the network *block device* protocol it should not allow chunks
> > smaller than the device block size, so anything smaller than 512 bytes
> > should be invalid response from the server.
>
> No, not an invalid response, but merely a discouraged one - and that
> text is already present in the wording of NBD_CMD_BLOCK_STATUS.

My suggestion is to make it an invalid response, because there are no block
devices that can return such a response.

> > Even the last chunk should not be smaller than 512 bytes. The fact that you
> > can serve a file with size that is not aligned to 512 bytes does not mean 
> > that
> > the export size can be unaligned to the logical block size. There are no 
> > real
> > block devices that have such alignment so the protocol should not allow 
> > this.
> > A good server will round the file size down the logical block size to avoid 
> > this
> > issue.
> >
> > How about letting the client set a minimum size of a chunk? This way we
> > avoid the issue of limiting the number of chunks. Merging small chunks
> > is best done on the server side instead of wasting bandwidth and doing
> > this on the client side.
>
> The client can't set the minimum block size, but the server can
> certainly advertise one, and must obey that advertisement.  Or are you
> asking for a new extension where the client mandates what the minimum
> granularity must be from the server in responses to NBD_CMD_READ and
> NBD_CMD_BLOCK_STATUS, when the client wants a larger granularity than
> what the server advertises?  That's a different extension than this
> patch, but may be worth considering.

Yes, this should really be discussed in another thread.

Nir




Re: [Libguestfs] [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length

2022-04-08 Thread Nir Soffer
On Fri, Apr 8, 2022 at 2:52 PM Richard W.M. Jones  wrote:
>
> On Fri, Apr 08, 2022 at 09:25:01AM +0200, Wouter Verhelst wrote:
> > Hi Eric,
> >
> > On Thu, Apr 07, 2022 at 04:37:19PM -0500, Eric Blake wrote:
> > > The spec was silent on how many extents a server could reply with.
> > > However, both qemu and nbdkit (the two server implementations known to
> > > have implemented the NBD_CMD_BLOCK_STATUS extension) implement a hard
> > > cap, and will truncate the amount of extents in a reply to avoid
> > > sending a client a reply larger than the maximum NBD_CMD_READ response
> > > they are willing to tolerate:
> > >
> > > When qemu first implemented NBD_CMD_BLOCK_STATUS for the
> > > base:allocation context (qemu commit e7b1948d51, Mar 2018), it behaved
> > > as if NBD_CMD_FLAG_REQ_ONE were always passed by the client, and never
> > > responded with more than one extent.  Later, when adding its
> > > qemu:dirty-bitmap:XXX context extension (qemu commit 3d068aff16, Jun
> > > 2018), it added a cap to 128k extents (1M+4 bytes), and that cap was
> > > applied to base:allocation once qemu started sending multiple extents
> > > for that context as well (qemu commit fb7afc797e, Jul 2018).  Qemu
> > > extents are never smaller than 512 bytes (other than an exception at
> > > the end of a file whose size is not aligned to 512), but even so, a
> > > request for just under 4G of block status could produce 8M extents,
> > > resulting in a reply of 64M if it were not capped smaller.
> > >
> > > When nbdkit first implemented NBD_CMD_BLOCK_STATUS (nbdkit 4ca66f70a5,
> > > Mar 2019), it did not impose any restriction on the number of extents
> > > in the reply chunk.  But because it allows extents as small as one
> > > byte, it is easy to write a server that can amplify a client's request
> > > of status over 1M of the image into a reply over 8M in size, and it
> > > was very easy to demonstrate that a hard cap was needed to avoid
> > > crashing clients or otherwise killing the connection (a bad server
> > > impacting the client negatively); unique to nbdkit's situation is the
> > > fact that because it is designed for plugin server implementations,
> > > not capping the length of extent also posed a problem to nbdkit as the
> > > server (a client requesting large block status could cause the server
> > > to run out of memory depending on the plugin providing the server
> > > callbacks).  So nbdkit enforced a bound of 1M extents (8M+4 bytes,
> > > nbdkit commit 6e0dc839ea, Jun 2019).
> > >
> > > Since the limit chosen by these two implementations is different, and
> > > since nbdkit has versions that were not limited, add this as a SHOULD
> > > NOT instead of MUST NOT constraint on servers implementing block
> > > status.  It does not matter that qemu picked a smaller limit that it
> > > truncates to, since we have already documented that the server may
> > > truncate for other reasons (such as it being inefficient to collect
> > > that many extents in the first place).  But documenting the limit now
> > > becomes even more important in the face of a future addition of 64-bit
> > > requests, where a client's request is no longer bounded to 4G and
> > > could thereby produce even more than 8M extents for the corner case
> > > when every 512 bytes is a new extent, if it were not for this
> > > recommendation.
> >
> > It feels backwards to me to make this a restriction on the server side.
> > You're saying there are server implementations that will be inefficient
> > if there are more than 2^20 extents, and therefore no server should send
> > more than those, even if it can do so efficiently.
> >
> > Isn't it up to the server implementation to decide what can be done
> > efficiently?
> >
> > Perhaps we can make the language about possibly reducing length of
> > extens a bit stronger; but I don't think adding explicit limits for a
> > server's own protection is necessary.
>
> I agree, but for a different reason.
>
> I think Eric should add language that servers can consider limiting
> response sizes in order to prevent possible amplification issues
> and/or simply overwhelming the client with work (bad server DoS
> attacks against clients are a thing!), but I don't think it's
> necessarily a "SHOULD" issue.
>
> BTW attached is an nbdkit plugin that creates an NBD server that
> responds with massive numbers of byte-granularity extents, in case
> anyone wants to test how nbdkit and/or clients respond:
>
> $ chmod +x /var/tmp/lots-of-extents.py
> $ /var/tmp/lots-of-extents.py -f
>
> $ nbdinfo --map nbd://localhost | head
>  0   13  hole,zero
>  1   10  data
>  2   13  hole,zero
>  3   10  data
>  4   13  hole,zero
>  5   10  data
>  6   13  hole,zero
>  7   10  data
>  8   13  hole,zero
>  9   10  data
> $ nbdinfo --map --totals 

Re: [Libguestfs] [PATCH libnbd 3/3] copy: Do not initialize read buffer

2022-03-10 Thread Nir Soffer
On Thu, Mar 10, 2022 at 5:58 PM Eric Blake  wrote:
>
> On Sun, Mar 06, 2022 at 10:27:30PM +0200, Nir Soffer wrote:
> > nbdcopy checks pread error now, so we will never leak uninitialized data
> > from the heap to the destination server. Testing show 3-8% speedup when
> > copying a real image.
> >
>
> > +++ b/copy/nbd-ops.c
> > @@ -52,20 +52,21 @@ static void
> >  open_one_nbd_handle (struct rw_nbd *rwn)
> >  {
> >struct nbd_handle *nbd;
> >
> >nbd = nbd_create ();
> >if (nbd == NULL) {
> >  fprintf (stderr, "%s: %s\n", prog, nbd_get_error ());
> >  exit (EXIT_FAILURE);
> >}
> >
> > +  nbd_set_pread_initialize (nbd, false);
> >nbd_set_debug (nbd, verbose);
>
> Pre-existing that we did not check for failure from nbd_set_debug(),
> so it is not made worse by not checking for failure of
> nbd_set_pread_initialize().
>
> Then again, nbd_set_debug() is currently documented as being able to
> fail, but in practice cannot - we do not restrict it to a subset of
> states, and its implementation is dirt-simple in lib/debug.c.  We may
> want (as a separate patch) to tweak this function to be mared as
> may_set_error=false, the way nbd_get_debug() is (as long as such
> change does not impact the API).
>
> Similarly, nbd_set_pread_initialize() has no restrictions on which
> states it can be used in, so maybe we should also mark it as
> may_set_error=false.  Contrast that with things like
> nbd_set_request_block_size(), which really do make sense to limit to
> certain states (once negotiation is done, changing the flag has no
> effect).
>
> So we may have further cleanups to do, but once you add the comments
> requested by Rich throughout the series, and the error checking I
> suggested in 2/3, with the series.

I'm worried about one issue - if we use uninitialized memory, and a bad server
returns an invalid structured reply with missing data or zero chunk, we will
leak the uninitialized memory to the destination.

This can be mitigated in several ways:
- always initialize the buffers (current state, slower)
- use a memory pool with initialized memory
  (like https://apr.apache.org/docs/apr/trunk/group__apr__pools.html)
- detect bad structured reply (we discussed this previously)
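A sketch of the second option: a pool of buffers zeroed once at allocation and
re-zeroed only when a failed read may have left stale bytes (a hypothetical
design, not a libnbd API):

```python
class BufferPool:
    """Reuse pre-zeroed read buffers so stale heap data cannot leak."""

    def __init__(self, size, count):
        self.size = size
        self._free = [bytearray(size) for _ in range(count)]  # zero-filled

    def get(self):
        # Buffers handed out are always zeroed, so even a short or
        # bogus structured reply cannot expose old heap contents.
        return self._free.pop() if self._free else bytearray(self.size)

    def put(self, buf, tainted=False):
        if tainted:
            # Re-zero only when a read failed part-way; a successful
            # read fully overwrote the buffer, so no work is needed.
            buf[:] = bytes(self.size)
        self._free.append(buf)
```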

Nir




Re: [Libguestfs] libnbd | Failed pipeline for master | 0cd77478

2022-03-06 Thread Nir Soffer
On Sun, Mar 6, 2022 at 10:40 PM Richard W.M. Jones  wrote:
>
> On Sun, Mar 06, 2022 at 08:28:09PM +, GitLab wrote:
> > GitLab
> >✖ Pipeline #485634933 has failed!
> >
> > Project   nbdkit / libnbd
> > Branch● master
> > Commit● 0cd77478
> >   copy: Minor cleanups Minor fixes suggested by ...
> > Commit Author ● Nir Soffer
> >
> > Pipeline #485634933 triggered by ●   Nir Soffer
> >had 2 failed jobs.
> >   Failed jobs
> > ✖ builds x86_64-centos-8
> > ✖ builds  x86_64-ubuntu-2004
>
> I'll fix this.  In brief it happened because centos:8 is no longer a
> thing (sadly), we have to replace it with almalinux; and the Ubuntu
> problem looks like a temporary failure.

I retried the Ubuntu build and this time it was successful:
https://gitlab.com/nbdkit/libnbd/-/jobs/2168746668



Re: [Libguestfs] [PATCH libnbd 1/3] golang: examples: Do not initialize pread buffer

2022-03-06 Thread Nir Soffer
On Sun, Mar 6, 2022 at 10:35 PM Richard W.M. Jones  wrote:
>
> On Sun, Mar 06, 2022 at 10:27:28PM +0200, Nir Soffer wrote:
> > The aio_copy example checks errors properly, so it will not leak
> > uninitialized memory to the destination image. Testing shows 5% speedup
> > when copying a real image.
> >
> > $ qemu-nbd --read-only --persistent --shared 8 --cache none --aio native \
> > --socket /tmp/src.sock --format raw fedora-35-data.raw &
> >
> > $ hyperfine -p "sleep 5" "./aio_copy-init $SRC >/dev/null" "./aio_copy-no-init $SRC >/dev/null"
> >
> > Benchmark 1: ./aio_copy-init nbd+unix:///?socket=/tmp/src.sock >/dev/null
> >   Time (mean ± σ):      1.452 s ±  0.027 s    [User: 0.330 s, System: 0.489 s]
> >   Range (min … max):    1.426 s …  1.506 s    10 runs
> >
> > Benchmark 2: ./aio_copy-no-init nbd+unix:///?socket=/tmp/src.sock >/dev/null
> >   Time (mean ± σ):      1.378 s ±  0.009 s    [User: 0.202 s, System: 0.484 s]
> >   Range (min … max):    1.369 s …  1.399 s    10 runs
> >
> > Summary
> >   './aio_copy-no-init nbd+unix:///?socket=/tmp/src.sock >/dev/null' ran
> >     1.05 ± 0.02 times faster than './aio_copy-init nbd+unix:///?socket=/tmp/src.sock >/dev/null'
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  golang/examples/aio_copy/aio_copy.go   | 5 +
> >  golang/examples/simple_copy/simple_copy.go | 5 +
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/golang/examples/aio_copy/aio_copy.go b/golang/examples/aio_copy/aio_copy.go
> > index bb20b478..89eac4df 100644
> > --- a/golang/examples/aio_copy/aio_copy.go
> > +++ b/golang/examples/aio_copy/aio_copy.go
> > @@ -84,20 +84,25 @@ func main() {
> >   err = h.ConnectUri(flag.Arg(0))
> >   if err != nil {
> >   panic(err)
> >   }
> >
> >   size, err := h.GetSize()
> >   if err != nil {
> >   panic(err)
> >   }
> >
> > + err = h.SetPreadInitialize(false)
> > + if err != nil {
> > + panic(err)
> > + }
> > +
>
> In patch #2 you added a comment above the call.
>
> Because this is an example and so people may just copy the code
> blindly without understanding it, I think adding a comment here and
> below is worth doing too.
>
> > diff --git a/golang/examples/simple_copy/simple_copy.go b/golang/examples/simple_copy/simple_copy.go
> > index e8fa1f76..2a2ed0ff 100644
> > --- a/golang/examples/simple_copy/simple_copy.go
> > +++ b/golang/examples/simple_copy/simple_copy.go
> > @@ -63,20 +63,25 @@ func main() {
> >   err = h.ConnectUri(flag.Arg(0))
> >   if err != nil {
> >   panic(err)
> >   }
> >
> >   size, err := h.GetSize()
> >   if err != nil {
> >   panic(err)
> >   }
> >
> > + err = h.SetPreadInitialize(false)
> > + if err != nil {
> > + panic(err)
> > + }
> > +
>
> And above this one.

Yes, good idea to explain why it is needed and the risk when working
with a bad server.

Nir



[Libguestfs] [PATCH libnbd 3/3] copy: Do not initialize read buffer

2022-03-06 Thread Nir Soffer
nbdcopy checks the pread error now, so we will never leak uninitialized data
from the heap to the destination server. Testing shows a 3-8% speedup when
copying a real image.

On laptop with 12 cores and 2 consumer NVMe drives:

$ qemu-nbd --read-only --persistent --shared 8 --cache none --aio native \
--socket /tmp/src.sock --format raw fedora-35-data.raw &

$ qemu-nbd --persistent --shared 8 --cache none --aio native --discard unmap \
--socket /tmp/dst.sock --format raw dst.raw &

$ hyperfine -p "sleep 5" "nbdcopy $SRC $DST" ".libs/nbdcopy $SRC $DST"

Benchmark 1: nbdcopy nbd+unix:///?socket=/tmp/src.sock nbd+unix:///?socket=/tmp/dst.sock
  Time (mean ± σ):      2.065 s ±  0.057 s    [User: 0.296 s, System: 1.414 s]
  Range (min … max):    2.000 s …  2.163 s    10 runs

Benchmark 2: .libs/nbdcopy nbd+unix:///?socket=/tmp/src.sock nbd+unix:///?socket=/tmp/dst.sock
  Time (mean ± σ):      1.911 s ±  0.050 s    [User: 0.059 s, System: 1.544 s]
  Range (min … max):    1.827 s …  1.980 s    10 runs

Summary
  '.libs/nbdcopy nbd+unix:///?socket=/tmp/src.sock nbd+unix:///?socket=/tmp/dst.sock' ran
    1.08 ± 0.04 times faster than 'nbdcopy nbd+unix:///?socket=/tmp/src.sock nbd+unix:///?socket=/tmp/dst.sock'

On server with 80 cores and a fast NVMe drive:

$ hyperfine "./nbdcopy-init $SRC $DST" "./nbdcopy-no-init $SRC $DST"

Benchmark 1: ./nbdcopy-init nbd+unix:///?socket=/tmp/src.sock nbd+unix:///?socket=/tmp/dst.sock
  Time (mean ± σ):      2.652 s ±  0.033 s    [User: 0.345 s, System: 1.306 s]
  Range (min … max):    2.619 s …  2.709 s    10 runs

Benchmark 2: ./nbdcopy-no-init nbd+unix:///?socket=/tmp/src.sock nbd+unix:///?socket=/tmp/dst.sock
  Time (mean ± σ):      2.572 s ±  0.031 s    [User: 0.055 s, System: 1.409 s]
  Range (min … max):    2.537 s …  2.629 s    10 runs

Summary
  './nbdcopy-no-init nbd+unix:///?socket=/tmp/src.sock nbd+unix:///?socket=/tmp/dst.sock' ran
    1.03 ± 0.02 times faster than './nbdcopy-init nbd+unix:///?socket=/tmp/src.sock nbd+unix:///?socket=/tmp/dst.sock'

Signed-off-by: Nir Soffer 
---
 copy/nbd-ops.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/copy/nbd-ops.c b/copy/nbd-ops.c
index adfe4de5..1218d6d0 100644
--- a/copy/nbd-ops.c
+++ b/copy/nbd-ops.c
@@ -52,20 +52,21 @@ static void
 open_one_nbd_handle (struct rw_nbd *rwn)
 {
   struct nbd_handle *nbd;
 
   nbd = nbd_create ();
   if (nbd == NULL) {
 fprintf (stderr, "%s: %s\n", prog, nbd_get_error ());
 exit (EXIT_FAILURE);
   }
 
+  nbd_set_pread_initialize (nbd, false);
   nbd_set_debug (nbd, verbose);
 
   /* Set the handle name for debugging.  We could use rwn->rw.name
* here but it is usually set to the lengthy NBD URI
* (eg. "nbd://localhost:10809") which makes debug messages very
* long.
*/
   if (verbose) {
 char *name;
 const size_t index = rwn->handles.len;
-- 
2.35.1



[Libguestfs] [PATCH libnbd 0/3] Disable pread buffer initialization

2022-03-06 Thread Nir Soffer
Disabling pread initialization is actually measurable and gives a small
speedup in nbdcopy and some examples.

Some cases are safer; in the copy-libev example, we allocate all buffers with
calloc() and reuse them for the entire copy, so the initialization is
completely useless.

In nbdcopy and the Go aio_copy example, we allocate a new buffer using
malloc(), so when working with a bad server that does not return all data
chunks in a structured reply, we can leak uninitialized memory to the
destination server. I think this issue should be solved by libnbd; it should
verify that the server returned all the expected chunks and fail the request
if not.

Nir Soffer (3):
  golang: examples: Do not initialize pread buffer
  examples: copy-libev: Do not initialize pread buffer
  copy: Do not initialize read buffer

 copy/nbd-ops.c | 1 +
 examples/copy-libev.c  | 6 ++
 golang/examples/aio_copy/aio_copy.go   | 5 +
 golang/examples/simple_copy/simple_copy.go | 5 +
 4 files changed, 17 insertions(+)

-- 
2.35.1




[Libguestfs] [PATCH libnbd 1/3] golang: examples: Do not initialize pread buffer

2022-03-06 Thread Nir Soffer
The aio_copy example checks errors properly, so it will not leak
uninitialized memory to the destination image. Testing shows 5% speedup
when copying a real image.

$ qemu-nbd --read-only --persistent --shared 8 --cache none --aio native \
--socket /tmp/src.sock --format raw fedora-35-data.raw &

$ hyperfine -p "sleep 5" "./aio_copy-init $SRC >/dev/null" "./aio_copy-no-init $SRC >/dev/null"

Benchmark 1: ./aio_copy-init nbd+unix:///?socket=/tmp/src.sock >/dev/null
  Time (mean ± σ):      1.452 s ±  0.027 s    [User: 0.330 s, System: 0.489 s]
  Range (min … max):    1.426 s …  1.506 s    10 runs

Benchmark 2: ./aio_copy-no-init nbd+unix:///?socket=/tmp/src.sock >/dev/null
  Time (mean ± σ):      1.378 s ±  0.009 s    [User: 0.202 s, System: 0.484 s]
  Range (min … max):    1.369 s …  1.399 s    10 runs

Summary
  './aio_copy-no-init nbd+unix:///?socket=/tmp/src.sock >/dev/null' ran
    1.05 ± 0.02 times faster than './aio_copy-init nbd+unix:///?socket=/tmp/src.sock >/dev/null'

Signed-off-by: Nir Soffer 
---
 golang/examples/aio_copy/aio_copy.go   | 5 +
 golang/examples/simple_copy/simple_copy.go | 5 +
 2 files changed, 10 insertions(+)

diff --git a/golang/examples/aio_copy/aio_copy.go b/golang/examples/aio_copy/aio_copy.go
index bb20b478..89eac4df 100644
--- a/golang/examples/aio_copy/aio_copy.go
+++ b/golang/examples/aio_copy/aio_copy.go
@@ -84,20 +84,25 @@ func main() {
err = h.ConnectUri(flag.Arg(0))
if err != nil {
panic(err)
}
 
size, err := h.GetSize()
if err != nil {
panic(err)
}
 
+   err = h.SetPreadInitialize(false)
+   if err != nil {
+   panic(err)
+   }
+
var offset uint64
 
for offset < size || queue.Len() > 0 {
 
for offset < size && inflightRequests() < *requests {
length := *requestSize
if size-offset < uint64(length) {
length = uint(size - offset)
}
startRead(offset, length)
diff --git a/golang/examples/simple_copy/simple_copy.go b/golang/examples/simple_copy/simple_copy.go
index e8fa1f76..2a2ed0ff 100644
--- a/golang/examples/simple_copy/simple_copy.go
+++ b/golang/examples/simple_copy/simple_copy.go
@@ -63,20 +63,25 @@ func main() {
err = h.ConnectUri(flag.Arg(0))
if err != nil {
panic(err)
}
 
size, err := h.GetSize()
if err != nil {
panic(err)
}
 
+   err = h.SetPreadInitialize(false)
+   if err != nil {
+   panic(err)
+   }
+
buf := make([]byte, *requestSize)
var offset uint64
 
for offset < size {
if size-offset < uint64(len(buf)) {
buf = buf[:offset-size]
}
 
err = h.Pread(buf, offset, nil)
if err != nil {
-- 
2.35.1



Re: [Libguestfs] [PATCH libnbd 8/8] copy: Adaptive queue size

2022-03-06 Thread Nir Soffer
On Wed, Feb 23, 2022 at 3:56 PM Nir Soffer  wrote:
>
> On Wed, Feb 23, 2022 at 3:29 PM Richard W.M. Jones  wrote:
> >
> > On Wed, Feb 23, 2022 at 02:53:47PM +0200, Nir Soffer wrote:
> > > I'll send more patches for the suggested improvements next week.
> >
> > I'd like to an upstream stable release early next week, ideally Monday
> > if possible.
>
> The additional changes are internal details that do not need to be in
> the release. For example make free_command() accept NULL.

Both fixes suggested by Eric were pushed in 0cd77478a7ac863ef092b9e4295b6f1de6d687ac.




Re: [Libguestfs] [PATCH libnbd 8/8] copy: Adaptive queue size

2022-02-23 Thread Nir Soffer
On Wed, Feb 23, 2022 at 3:29 PM Richard W.M. Jones  wrote:
>
> On Wed, Feb 23, 2022 at 02:53:47PM +0200, Nir Soffer wrote:
> > I'll send more patches for the suggested improvements next week.
>
> I'd like to an upstream stable release early next week, ideally Monday
> if possible.

The additional changes are internal details that do not need to be in
the release. For example make free_command() accept NULL.

Nir




Re: [Libguestfs] [PATCH libnbd 8/8] copy: Adaptive queue size

2022-02-23 Thread Nir Soffer
On Tue, Feb 22, 2022 at 1:48 PM Nir Soffer  wrote:
>
> On Mon, Feb 21, 2022 at 5:41 PM Eric Blake  wrote:
> >
> > On Sun, Feb 20, 2022 at 02:14:03PM +0200, Nir Soffer wrote:
> > > Limit the size of the copy queue also by the number of queued bytes.
> > > This allows using many concurrent small requests, required to get good
> > > performance, but limiting the number of large requests that are actually
> > > faster with lower concurrency.
> > >
> > > New --queue-size option added to control the maximum queue size. With 2
> > > MiB we can have 8 256 KiB requests per connection. The default queue
> > > size is 16 MiB, to match the default --requests value (64) with the
> > > default --request-size (256 KiB). Testing show that using more than 16
> > > 256 KiB requests with one connection do not improve anything.
> >
> > s/do/does/
> >
> > >
> > > The new option will simplify limiting memory usage when using large
> > > requests, like this change in virt-v2v:
> > > https://github.com/libguestfs/virt-v2v/commit/c943420219fa0ee971fc228aa4d9127c5ce973f7
> > >
> > > I tested this change with 3 images:
> > >
> > > - Fedora 35 + 3g of random data - hopefully simulates a real image
> > > - Fully allocated image - the special case when every read command is
> > >   converted to a write command.
> > > - Zero image - the special case when every read command is converted to
> > >   a zero command.
> > >
> > > On 2 machines:
> > >
> > > - laptop: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz, 12 cpus,
> > >   1.5 MiB L2 cache per 2 cpus, 12 MiB L3 cache.
> > > - server: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz, 80 cpus,
> > >   1 MiB L2 cache per cpu, 27.5 MiB L3 cache.
> > >
> > > In all cases, both source and destination are served by qemu-nbd, using
> > > --cache=none --aio=native. Because qemu-nbd does not support MULTI_CON
> >
> > MULTI_CONN
> >
> > > for writing, we are testing a single connection when copying an to
> >
> > Did you mean 'copying an image to'?
>
> Yes
>
> >
> > > qemu-nbd. I tested also copying to null: since in this case we use 4
> > > connections (these tests are marked with /ro).
> > >
> > > Results for copying all images on all machines with nbdcopy v1.11.0 and
> > > this change. "before" and "after" are average time of 10 runs.
> > >
> > > image       machine   before   after   queue size   improvement
> > > ==============================================================
> > > fedora      laptop    3.044    2.129   2m           +43%
> > > full        laptop    4.900    3.136   2m           +56%
> > > zero        laptop    3.147    2.624   2m           +20%
> > > --------------------------------------------------------------
> > > fedora      server    2.324    2.189   16m          +6%
> > > full        server    3.521    3.380   8m           +4%
> > > zero        server    2.297    2.338   16m          -2%
> > > --------------------------------------------------------------
> > > fedora/ro   laptop    2.040    1.663   1m           +23%
> > > fedora/ro   server    1.585    1.393   2m           +14%
> > >
> > > Signed-off-by: Nir Soffer 
> > > ---
> > >  copy/main.c | 52 -
> > >  copy/multi-thread-copying.c | 18 +++--
> > >  copy/nbdcopy.h  |  1 +
> > >  copy/nbdcopy.pod| 12 +++--
> > >  4 files changed, 55 insertions(+), 28 deletions(-)
> > >
> >
> > >  static void __attribute__((noreturn))
> > >  usage (FILE *fp, int exitcode)
> > >  {
> > >fprintf (fp,
> > >  "\n"
> > >  "Copy to and from an NBD server:\n"
> > >  "\n"
> > >  "nbdcopy [--allocated] [-C N|--connections=N]\n"
> > >  "[--destination-is-zero|--target-is-zero] [--flush]\n"
> > >  "[--no-extents] [-p|--progress|--progress=FD]\n"
> > > -"[--request-size=N] [-R N|--requests=N] [-S N|--sparse=N]\n"
> > > -"[--synchronous] [-T N|--threads=N] [-v|--verbose]\n"
> > > +"[--request-size=N] [--queue-size=N] [-R N|--requests=N]\n"
> >
> > 

Re: [Libguestfs] [PATCH libnbd 8/8] copy: Adaptive queue size

2022-02-22 Thread Nir Soffer
On Mon, Feb 21, 2022 at 5:41 PM Eric Blake  wrote:
>
> On Sun, Feb 20, 2022 at 02:14:03PM +0200, Nir Soffer wrote:
> > Limit the size of the copy queue also by the number of queued bytes.
> > This allows using many concurrent small requests, required to get good
> > performance, but limiting the number of large requests that are actually
> > faster with lower concurrency.
> >
> > New --queue-size option added to control the maximum queue size. With 2
> > MiB we can have 8 256 KiB requests per connection. The default queue
> > size is 16 MiB, to match the default --requests value (64) with the
> > default --request-size (256 KiB). Testing show that using more than 16
> > 256 KiB requests with one connection do not improve anything.
>
> s/do/does/
>
> >
> > The new option will simplify limiting memory usage when using large
> > requests, like this change in virt-v2v:
> > https://github.com/libguestfs/virt-v2v/commit/c943420219fa0ee971fc228aa4d9127c5ce973f7
> >
> > I tested this change with 3 images:
> >
> > - Fedora 35 + 3g of random data - hopefully simulates a real image
> > - Fully allocated image - the special case when every read command is
> >   converted to a write command.
> > - Zero image - the special case when every read command is converted to
> >   a zero command.
> >
> > On 2 machines:
> >
> > - laptop: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz, 12 cpus,
> >   1.5 MiB L2 cache per 2 cpus, 12 MiB L3 cache.
> > - server: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz, 80 cpus,
> >   1 MiB L2 cache per cpu, 27.5 MiB L3 cache.
> >
> > In all cases, both source and destination are served by qemu-nbd, using
> > --cache=none --aio=native. Because qemu-nbd does not support MULTI_CON
>
> MULTI_CONN
>
> > for writing, we are testing a single connection when copying an to
>
> Did you mean 'copying an image to'?

Yes

>
> > qemu-nbd. I tested also copying to null: since in this case we use 4
> > connections (these tests are marked with /ro).
> >
> > Results for copying all images on all machines with nbdcopy v1.11.0 and
> > this change. "before" and "after" are average time of 10 runs.
> >
> > image       machine   before   after   queue size   improvement
> > ==============================================================
> > fedora      laptop    3.044    2.129   2m           +43%
> > full        laptop    4.900    3.136   2m           +56%
> > zero        laptop    3.147    2.624   2m           +20%
> > --------------------------------------------------------------
> > fedora      server    2.324    2.189   16m          +6%
> > full        server    3.521    3.380   8m           +4%
> > zero        server    2.297    2.338   16m          -2%
> > --------------------------------------------------------------
> > fedora/ro   laptop    2.040    1.663   1m           +23%
> > fedora/ro   server    1.585    1.393   2m           +14%
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  copy/main.c | 52 -
> >  copy/multi-thread-copying.c | 18 +++--
> >  copy/nbdcopy.h  |  1 +
> >  copy/nbdcopy.pod| 12 +++--
> >  4 files changed, 55 insertions(+), 28 deletions(-)
> >
>
> >  static void __attribute__((noreturn))
> >  usage (FILE *fp, int exitcode)
> >  {
> >fprintf (fp,
> >  "\n"
> >  "Copy to and from an NBD server:\n"
> >  "\n"
> >  "nbdcopy [--allocated] [-C N|--connections=N]\n"
> >  "[--destination-is-zero|--target-is-zero] [--flush]\n"
> >  "[--no-extents] [-p|--progress|--progress=FD]\n"
> > -"[--request-size=N] [-R N|--requests=N] [-S N|--sparse=N]\n"
> > -"[--synchronous] [-T N|--threads=N] [-v|--verbose]\n"
> > +"[--request-size=N] [--queue-size=N] [-R N|--requests=N]\n"
>
> The options are listed in mostly alphabetic order already, so
> --queue-size before --request-size makes more sense to me.
>
> > @@ -104,33 +106,35 @@ main (int argc, char *argv[])
> >  {
> >enum {
> >  HELP_OPTION = CHAR_MAX + 1,
> >  LONG_OPTIONS,
> >  SHORT_OPTIONS,
> >  ALLOCATED_OPTION,
> >  DESTINATION_IS_ZERO_OPTION,
> >  FLUSH_OPTION,
> >  NO_EXTENTS_OPTION,
> >  REQUEST_SIZE_OPTION,

Re: [Libguestfs] [PATCH libnbd 4/8] copy: Separate finishing a command from freeing it

2022-02-22 Thread Nir Soffer
On Mon, Feb 21, 2022 at 5:08 PM Eric Blake  wrote:
>
> On Sun, Feb 20, 2022 at 02:13:59PM +0200, Nir Soffer wrote:
> > free_command() was abused as a completion callback. Introduce
> > finish_command() completion callback, so code that want to free a
> > command does not have to add dummy errors.
> >
> > This will make it easier to manage worker state when a command
> > completes.
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  copy/multi-thread-copying.c | 34 --
> >  1 file changed, 20 insertions(+), 14 deletions(-)
>
> In addition to Rich's review,
>
> >
> >  static int
> > -free_command (void *vp, int *error)
> > +finished_command (void *vp, int *error)
> >  {
> >struct command *command = vp;
> > -  struct buffer *buffer = command->slice.buffer;
> >
> >if (*error) {
> >  fprintf (stderr, "write at offset %" PRId64 " failed: %s\n",
> >   command->offset, strerror (*error));
> >  exit (EXIT_FAILURE);
> >}
> >
> > +  free_command (command);
> > +
> > +  return 1; /* auto-retires the command */
> > +}
> > +
> > +static void
> > +free_command (struct command *command)
> > +{
> > +  struct buffer *buffer = command->slice.buffer;
>
> Do we want to allow 'free_command (NULL)', in which case this should
> check if command is non-NULL before initializing buffer?  Doing so may
> make some other cleanup paths easier to write.

I agree it is better to behave like free(). Will improve later.

>
> But for now, all callers pass in non-NULL, so ACK either way.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>




Re: [Libguestfs] [PATCH libnbd 3/8] copy: Extract create_command and create_buffer helpers

2022-02-22 Thread Nir Soffer
On Mon, Feb 21, 2022 at 5:03 PM Eric Blake  wrote:
>
> On Sun, Feb 20, 2022 at 02:13:58PM +0200, Nir Soffer wrote:
> > Creating a new command requires lot of boilerplate that makes it harder
> > to focus on the interesting code. Extract a helpers to create a command,
> > and the command slice buffer.
> >
> > create_buffer is called only once, but the compiler is smart enough to
> > inline it, and adding it makes the code much simpler.
> >
> > This change is a refactoring except fixing perror() message for calloc()
> > failure.
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  copy/multi-thread-copying.c | 87 +++--
> >  1 file changed, 54 insertions(+), 33 deletions(-)
>
> >if (exts.ptr[i].zero) {
> >  /* The source is zero so we can proceed directly to skipping,
> >   * fast zeroing, or writing zeroes at the destination.
> >   */
> > -command = calloc (1, sizeof *command);
> > -if (command == NULL) {
> > -  perror ("malloc");
> > -  exit (EXIT_FAILURE);
> > -}
> > -command->offset = exts.ptr[i].offset;
> > -command->slice.len = exts.ptr[i].length;
> > -command->slice.base = 0;
>
> This assignment is dead code after calloc,...
>
> > -command->index = index;
> > +command = create_command (exts.ptr[i].offset, exts.ptr[i].length,
> > +  true, index);
> >  fill_dst_range_with_zeroes (command);
> >}
> >
> >else /* data */ {
> >  /* As the extent might be larger than permitted for a single
> >   * command, we may have to split this into multiple read
> >   * requests.
> >   */
> >  while (exts.ptr[i].length > 0) {
> >len = exts.ptr[i].length;
> >if (len > request_size)
> >  len = request_size;
> > -  data = malloc (len);
> > -  if (data == NULL) {
> > -perror ("malloc");
> > -exit (EXIT_FAILURE);
> > -  }
> > -  buffer = calloc (1, sizeof *buffer);
> > -  if (buffer == NULL) {
> > -perror ("malloc");
> > -exit (EXIT_FAILURE);
> > -  }
> > -  buffer->data = data;
> > -  buffer->refs = 1;
> > -  command = calloc (1, sizeof *command);
> > -  if (command == NULL) {
> > -perror ("malloc");
> > -exit (EXIT_FAILURE);
> > -  }
> > -  command->offset = exts.ptr[i].offset;
> > -  command->slice.len = len;
> > -  command->slice.base = 0;
>
> ...as was this,...
>
> > -  command->slice.buffer = buffer;
> > -  command->index = index;
> > +
> > +  command = create_command (exts.ptr[i].offset, len,
> > +false, index);
> >
> >wait_for_request_slots (index);
> >
> >/* Begin the asynch read operation. */
> >src->ops->asynch_read (src, command,
> >   (nbd_completion_callback) {
> > .callback = finished_read,
> > .user_data = command,
> >   });
> >
> > @@ -331,20 +305,67 @@ poll_both_ends (uintptr_t index)
> >  else if ((fds[1].revents & POLLOUT) != 0)
> >dst->ops->asynch_notify_write (dst, index);
> >  else if ((fds[1].revents & (POLLERR | POLLNVAL)) != 0) {
> >errno = ENOTCONN;
> >perror (dst->name);
> >exit (EXIT_FAILURE);
> >  }
> >}
> >  }
> >
> > +/* Create a new buffer. */
> > +static struct buffer*
> > +create_buffer (size_t len)
> > +{
> > +  struct buffer *buffer;
> > +
> > +  buffer = calloc (1, sizeof *buffer);
> > +  if (buffer == NULL) {
> > +perror ("calloc");
> > +exit (EXIT_FAILURE);
> > +  }
> > +
> > +  buffer->data = malloc (len);
> > +  if (buffer->data == NULL) {
> > +perror ("malloc");
> > +exit (EXIT_FAILURE);
> > +  }
> > +
> > +  buffer->refs = 1;
> > +
> > +  return buffer;
> > +}
> > +
> > +/* Create a 

Re: [Libguestfs] [PATCH libnbd 2/8] copy: Rename copy_subcommand to create_subcommand

2022-02-22 Thread Nir Soffer
On Mon, Feb 21, 2022 at 4:52 PM Eric Blake  wrote:
>
> On Sun, Feb 20, 2022 at 02:13:57PM +0200, Nir Soffer wrote:
> > copy_subcommand creates a new command without copying the original
> > command. Rename the function to make this more clear.
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  copy/multi-thread-copying.c | 29 ++---
> >  1 file changed, 14 insertions(+), 15 deletions(-)
> >
> >  if (!last_is_zero) {
> >/* Write the last data (if any). */
> >if (i - last_offset > 0) {
> > -newcommand = copy_subcommand (command,
> > +newcommand = create_subcommand (command,
> >last_offset, i - last_offset,
> >false);
>
> Indentation needs updates here.

Will fix before pushing.

>
> >  dst->ops->asynch_write (dst, newcommand,
> >  (nbd_completion_callback) {
> >.callback = free_command,
> >.user_data = newcommand,
> >  });
> >}
> >/* Start the new zero range. */
> >last_offset = i;
> > @@ -431,55 +430,55 @@ finished_read (void *vp, int *error)
> >  }
> >}
> >else {
> >  /* It's data.  If the last was data too, do nothing =>
> >   * coalesce.  Otherwise write the last zero range and start a
> >   * new data.
> >   */
> >  if (last_is_zero) {
> >/* Write the last zero range (if any). */
> >if (i - last_offset > 0) {
> > -newcommand = copy_subcommand (command,
> > -  last_offset, i - last_offset,
> > -  true);
> > +newcommand = create_subcommand (command,
> > +last_offset, i - last_offset,
> > +true);
>
> But you got it right elsewhere.
>
> ACK.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>




Re: [Libguestfs] [PATCH libnbd 1/8] copy: Remove wrong references to holes

2022-02-22 Thread Nir Soffer
On Mon, Feb 21, 2022 at 4:51 PM Eric Blake  wrote:
>
> On Sun, Feb 20, 2022 at 02:13:56PM +0200, Nir Soffer wrote:
> > In the past nbdcopy was looking for hole extents instead of zero
> > extents. When we fixed this, we forgot to update some comments and
> > variable names referencing hole instead of zero.
>
> Might be nice to add:
>
> Fixes: d5f65e36 ("copy: Do not use trim for zeroing", v1.7.3)
>
> or whatever commit you think would be better.

I will add the relevant commit before pushing.

>
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  copy/multi-thread-copying.c | 34 +-
> >  1 file changed, 17 insertions(+), 17 deletions(-)
> >
>
> ACK.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>




Re: [Libguestfs] [PATCH libnbd 7/8] copy: Track worker queue size

2022-02-21 Thread Nir Soffer
On Mon, Feb 21, 2022 at 12:17 PM Richard W.M. Jones  wrote:
>
> On Mon, Feb 21, 2022 at 08:28:54AM +0200, Nir Soffer wrote:
> > On Sun, Feb 20, 2022 at 8:53 PM Richard W.M. Jones  
> > wrote:
> > >
> > > On Sun, Feb 20, 2022 at 02:14:02PM +0200, Nir Soffer wrote:
> > > > +static inline void
> > > > +increase_queue_size(struct worker *worker, size_t len)
> > >
> > >   ^ space
> > >
> > > and the same in the next function:
> >
> > Sure will fix before pushing.
> >
> > Do we have a way to format the source automatically with spaces
> > before ()?
>
> I don't think anyone was written GNU indent rules yet ..

Seems that it is supported:

    -pcs, --space-after-procedure-calls
        Insert a space between the name of the procedure being called
        and the ‘(’.  See STATEMENTS.
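If the project ever adopted automatic formatting, that option could live in a checked-in GNU indent profile. A hypothetical sketch (there is no .indent.pro in the tree today, and a real profile would need many more options to match the existing style):

```
--space-after-procedure-calls
```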

>
> > > > +{
> > > > +  worker->queue_size += len;
> > > > +}
> > > > +
> > > > +static inline void
> > > > +decrease_queue_size(struct worker *worker, size_t len)
> > > > +{
> > > > +  assert (worker->queue_size >= len);
> > > > +  worker->queue_size -= len;
> > > > +}
> > >
> > > Do we not need any locking here?
> >
> > Since every worker thread accesses only its data, no locking is needed.
>
> OK
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-df lists disk usage of guests without needing to install any
> software inside the virtual machine.  Supports Linux and Windows.
> http://people.redhat.com/~rjones/virt-df/
>



Re: [Libguestfs] [PATCH libnbd 7/8] copy: Track worker queue size

2022-02-20 Thread Nir Soffer
On Sun, Feb 20, 2022 at 8:53 PM Richard W.M. Jones  wrote:
>
> On Sun, Feb 20, 2022 at 02:14:02PM +0200, Nir Soffer wrote:
> > +static inline void
> > +increase_queue_size(struct worker *worker, size_t len)
>
>   ^ space
>
> and the same in the next function:

Sure will fix before pushing.

Do we have a way to format the source automatically with spaces
before ()?

>
> > +{
> > +  worker->queue_size += len;
> > +}
> > +
> > +static inline void
> > +decrease_queue_size(struct worker *worker, size_t len)
> > +{
> > +  assert (worker->queue_size >= len);
> > +  worker->queue_size -= len;
> > +}
>
> Do we not need any locking here?

Since every worker thread accesses only its data, no locking is needed.




[Libguestfs] [PATCH libnbd 8/8] copy: Adaptive queue size

2022-02-20 Thread Nir Soffer
Limit the size of the copy queue also by the number of queued bytes.
This allows using many concurrent small requests, which are required to
get good performance, while limiting the number of large requests, which
are actually faster with lower concurrency.

New --queue-size option added to control the maximum queue size. With 2
MiB we can have 8 256 KiB requests per connection. The default queue
size is 16 MiB, to match the default --requests value (64) with the
default --request-size (256 KiB). Testing show that using more than 16
256 KiB requests with one connection do not improve anything.

The new option will simplify limiting memory usage when using large
requests, like this change in virt-v2v:
https://github.com/libguestfs/virt-v2v/commit/c943420219fa0ee971fc228aa4d9127c5ce973f7

I tested this change with 3 images:

- Fedora 35 + 3g of random data - hopefully simulates a real image
- Fully allocated image - the special case when every read command is
  converted to a write command.
- Zero image - the special case when every read command is converted to
  a zero command.

On 2 machines:

- laptop: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz, 12 cpus,
  1.5 MiB L2 cache per 2 cpus, 12 MiB L3 cache.
- server: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz, 80 cpus,
  1 MiB L2 cache per cpu, 27.5 MiB L3 cache.

In all cases, both source and destination are served by qemu-nbd, using
--cache=none --aio=native. Because qemu-nbd does not support MULTI_CON
for writing, we are testing a single connection when copying an to
qemu-nbd. I tested also copying to null: since in this case we use 4
connections (these tests are marked with /ro).

Results for copying all images on all machines with nbdcopy v1.11.0 and
this change. "before" and "after" are average time of 10 runs.

image       machine   before   after   queue size   improvement
===============================================================
fedora      laptop    3.044    2.129   2m           +43%
full        laptop    4.900    3.136   2m           +56%
zero        laptop    3.147    2.624   2m           +20%
---------------------------------------------------------------
fedora      server    2.324    2.189   16m          +6%
full        server    3.521    3.380   8m           +4%
zero        server    2.297    2.338   16m          -2%
---------------------------------------------------------------
fedora/ro   laptop    2.040    1.663   1m           +23%
fedora/ro   server    1.585    1.393   2m           +14%

Signed-off-by: Nir Soffer 
---
 copy/main.c | 52 -
 copy/multi-thread-copying.c | 18 +++--
 copy/nbdcopy.h  |  1 +
 copy/nbdcopy.pod| 12 +++--
 4 files changed, 55 insertions(+), 28 deletions(-)

diff --git a/copy/main.c b/copy/main.c
index 390de1eb..832f99da 100644
--- a/copy/main.c
+++ b/copy/main.c
@@ -36,53 +36,55 @@
 
 #include 
 
 #include 
 
 #include "ispowerof2.h"
 #include "human-size.h"
 #include "version.h"
 #include "nbdcopy.h"
 
-bool allocated; /* --allocated flag */
-unsigned connections = 4;   /* --connections */
-bool destination_is_zero;   /* --destination-is-zero flag */
-bool extents = true;/* ! --no-extents flag */
-bool flush; /* --flush flag */
-unsigned max_requests = 64; /* --requests */
-bool progress;  /* -p flag */
-int progress_fd = -1;   /* --progress=FD */
-unsigned request_size = 1<<18;  /* --request-size */
-unsigned sparse_size = 4096;/* --sparse */
-bool synchronous;   /* --synchronous flag */
-unsigned threads;   /* --threads */
-struct rw *src, *dst;   /* The source and destination. */
-bool verbose;   /* --verbose flag */
-
-const char *prog;   /* program name (== basename argv[0]) */
+bool allocated; /* --allocated flag */
+unsigned connections = 4;   /* --connections */
+bool destination_is_zero;   /* --destination-is-zero flag */
+bool extents = true;/* ! --no-extents flag */
+bool flush; /* --flush flag */
+unsigned max_requests = 64; /* --requests */
+bool progress;  /* -p flag */
+int progress_fd = -1;   /* --progress=FD */
+unsigned request_size = 1<<18;  /* --request-size */
+unsigned queue_size = 16<<20;   /* --queue-size */
+unsigned sparse_size = 4096;/* --sparse */
+bool synchronous;   /* --synchronous flag */
+unsigned threads;   /* --threads */
+struct rw *src, *dst;   /* The source and destination. */
+bool verbose;   /* --verbose flag */
+
+const char *prog;   /* program name (== basename argv[0]) */

[Libguestfs] [PATCH libnbd 7/8] copy: Track worker queue size

2022-02-20 Thread Nir Soffer
Tracking the number of queued bytes per worker will allow optimizing the
number of in-flight requests based on the actual request size.

The goal is to allow a large number of small requests, required to get
good performance, while at the same time limiting the number of large
requests, which are faster with a lower number of requests.

Signed-off-by: Nir Soffer 
---
 copy/multi-thread-copying.c | 33 +
 copy/nbdcopy.h  |  6 ++
 2 files changed, 39 insertions(+)

diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c
index 8ba721fe..620dc571 100644
--- a/copy/multi-thread-copying.c
+++ b/copy/multi-thread-copying.c
@@ -133,20 +133,45 @@ multi_thread_copying (void)
 static void wait_for_request_slots (size_t index);
 static unsigned in_flight (size_t index);
 static void poll_both_ends (size_t index);
 static int finished_read (void *vp, int *error);
 static int finished_command (void *vp, int *error);
 static void free_command (struct command *command);
 static void fill_dst_range_with_zeroes (struct command *command);
 static struct command *create_command (uint64_t offset, size_t len, bool zero,
struct worker *worker);
 
+/* Tracking worker queue size.
+ *
+ * The queue size is increased when starting a read command.
+ *
+ * The queue size is decreased when a read command is converted to zero
+ * subcommand in finished_read(), or when a write command completes in
+ * finished_command().
+ *
+ * Zero commands are not considered in the queue size since they have no
+ * payload.
+ */
+
+static inline void
+increase_queue_size(struct worker *worker, size_t len)
+{
+  worker->queue_size += len;
+}
+
+static inline void
+decrease_queue_size(struct worker *worker, size_t len)
+{
+  assert (worker->queue_size >= len);
+  worker->queue_size -= len;
+}
+
 /* There are 'threads' worker threads, each copying work ranges from
  * src to dst until there are no more work ranges.
  */
 static void *
 worker_thread (void *wp)
 {
   struct worker *w = wp;
   uint64_t offset, count;
   extent_list exts = empty_vector;
 
@@ -180,20 +205,23 @@ worker_thread (void *wp)
 while (exts.ptr[i].length > 0) {
   len = exts.ptr[i].length;
   if (len > request_size)
 len = request_size;
 
   command = create_command (exts.ptr[i].offset, len,
 false, w);
 
   wait_for_request_slots (w->index);
 
+  /* NOTE: Must increase the queue size after waiting. */
+  increase_queue_size (w, len);
+
   /* Begin the asynch read operation. */
   src->ops->asynch_read (src, command,
  (nbd_completion_callback) {
.callback = finished_read,
.user_data = command,
  });
 
   exts.ptr[i].offset += len;
   exts.ptr[i].length -= len;
 }
@@ -455,20 +483,21 @@ finished_read (void *vp, int *error)
 /* It's data.  If the last was data too, do nothing =>
  * coalesce.  Otherwise write the last zero range and start a
  * new data.
  */
 if (last_is_zero) {
   /* Write the last zero range (if any). */
   if (i - last_offset > 0) {
 newcommand = create_subcommand (command,
 last_offset, i - last_offset,
 true);
+decrease_queue_size (command->worker, newcommand->slice.len);
 fill_dst_range_with_zeroes (newcommand);
   }
   /* Start the new data. */
   last_offset = i;
   last_is_zero = false;
 }
   }
 } /* for i */
 
 /* Write the last_offset up to i. */
@@ -480,20 +509,21 @@ finished_read (void *vp, int *error)
 dst->ops->asynch_write (dst, newcommand,
 (nbd_completion_callback) {
   .callback = finished_command,
   .user_data = newcommand,
 });
   }
   else {
 newcommand = create_subcommand (command,
 last_offset, i - last_offset,
 true);
+decrease_queue_size (command->worker, newcommand->slice.len);
 fill_dst_range_with_zeroes (newcommand);
   }
 }
 
 /* There may be an unaligned tail, so write that. */
 if (end - i > 0) {
   newcommand = create_subcommand (command, i, end - i, false);
   dst->ops->asynch_write (dst, newcommand,
   (nbd_completion_callback) {
 .callback = finished_command,
@@ -573,20 +603,23 @@ static int
 finished_command (void *vp, int *error)
 {
   struct co

[Libguestfs] [PATCH libnbd 6/8] copy: Keep worker pointer in command

2022-02-20 Thread Nir Soffer
Replace the command index with a worker pointer. The nbd-ops access the
index via the worker pointer. This allows commands to modify worker
state during processing.

Signed-off-by: Nir Soffer 
---
 copy/multi-thread-copying.c | 12 ++--
 copy/nbd-ops.c  |  6 +++---
 copy/nbdcopy.h  |  2 +-
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c
index a1a8d09c..8ba721fe 100644
--- a/copy/multi-thread-copying.c
+++ b/copy/multi-thread-copying.c
@@ -131,21 +131,21 @@ multi_thread_copying (void)
 }
 
 static void wait_for_request_slots (size_t index);
 static unsigned in_flight (size_t index);
 static void poll_both_ends (size_t index);
 static int finished_read (void *vp, int *error);
 static int finished_command (void *vp, int *error);
 static void free_command (struct command *command);
 static void fill_dst_range_with_zeroes (struct command *command);
 static struct command *create_command (uint64_t offset, size_t len, bool zero,
-   size_t index);
+   struct worker *worker);
 
 /* There are 'threads' worker threads, each copying work ranges from
  * src to dst until there are no more work ranges.
  */
 static void *
 worker_thread (void *wp)
 {
   struct worker *w = wp;
   uint64_t offset, count;
   extent_list exts = empty_vector;
@@ -161,36 +161,36 @@ worker_thread (void *wp)
 
 for (i = 0; i < exts.len; ++i) {
   struct command *command;
   size_t len;
 
   if (exts.ptr[i].zero) {
 /* The source is zero so we can proceed directly to skipping,
  * fast zeroing, or writing zeroes at the destination.
  */
 command = create_command (exts.ptr[i].offset, exts.ptr[i].length,
-  true, w->index);
+  true, w);
 fill_dst_range_with_zeroes (command);
   }
 
   else /* data */ {
 /* As the extent might be larger than permitted for a single
  * command, we may have to split this into multiple read
  * requests.
  */
 while (exts.ptr[i].length > 0) {
   len = exts.ptr[i].length;
   if (len > request_size)
 len = request_size;
 
   command = create_command (exts.ptr[i].offset, len,
-false, w->index);
+false, w);
 
   wait_for_request_slots (w->index);
 
   /* Begin the asynch read operation. */
   src->ops->asynch_read (src, command,
  (nbd_completion_callback) {
.callback = finished_read,
.user_data = command,
  });
 
@@ -332,38 +332,38 @@ create_buffer (size_t len)
 exit (EXIT_FAILURE);
   }
 
   buffer->refs = 1;
 
   return buffer;
 }
 
 /* Create a new command for read or zero. */
 static struct command *
-create_command (uint64_t offset, size_t len, bool zero, size_t index)
+create_command (uint64_t offset, size_t len, bool zero, struct worker *worker)
 {
   struct command *command;
 
   command = calloc (1, sizeof *command);
   if (command == NULL) {
 perror ("calloc");
 exit (EXIT_FAILURE);
   }
 
   command->offset = offset;
   command->slice.len = len;
   command->slice.base = 0;
 
   if (!zero)
 command->slice.buffer = create_buffer (len);
 
-  command->index = index;
+  command->worker = worker;
 
   return command;
 }
 
 /* Create a sub-command of an existing command.  This creates a slice
  * referencing the buffer of the existing command without copying.
  */
 static struct command *
 create_subcommand (struct command *command, uint64_t offset, size_t len,
bool zero)
@@ -379,21 +379,21 @@ create_subcommand (struct command *command, uint64_t 
offset, size_t len,
 perror ("calloc");
 exit (EXIT_FAILURE);
   }
   newcommand->offset = offset;
   newcommand->slice.len = len;
   if (!zero) {
 newcommand->slice.buffer = command->slice.buffer;
 newcommand->slice.buffer->refs++;
 newcommand->slice.base = offset - command->offset;
   }
-  newcommand->index = command->index;
+  newcommand->worker = command->worker;
 
   return newcommand;
 }
 
 /* Callback called when src has finished one read command.  This
  * initiates a write.
  */
 static int
 finished_read (void *vp, int *error)
 {
diff --git a/copy/nbd-ops.c b/copy/nbd-ops.c
index dca86e88..adfe4de5 100644
--- a/copy/nbd-ops.c
+++ b/copy/nbd-ops.c
@@ -296,57 +296,57 @@ nbd_ops_synch_zero (struct rw *rw, uint64_t offset, 
uint64_t count,
   return true;
 }
 
 static void
 nbd_ops_asynch_read (struct rw *rw,
  struct command *command,
  nbd_completion_callback cb)
 {

[Libguestfs] [PATCH libnbd 2/8] copy: Rename copy_subcommand to create_subcommand

2022-02-20 Thread Nir Soffer
copy_subcommand creates a new command without copying the original
command. Rename the function to make this more clear.

Signed-off-by: Nir Soffer 
---
 copy/multi-thread-copying.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c
index 632d7006..2d16d2df 100644
--- a/copy/multi-thread-copying.c
+++ b/copy/multi-thread-copying.c
@@ -332,26 +332,25 @@ poll_both_ends (uintptr_t index)
   dst->ops->asynch_notify_write (dst, index);
 else if ((fds[1].revents & (POLLERR | POLLNVAL)) != 0) {
   errno = ENOTCONN;
   perror (dst->name);
   exit (EXIT_FAILURE);
 }
   }
 }
 
 /* Create a sub-command of an existing command.  This creates a slice
- * referencing the buffer of the existing command in order to avoid
- * copying.
+ * referencing the buffer of the existing command without copying.
  */
 static struct command *
-copy_subcommand (struct command *command, uint64_t offset, size_t len,
- bool zero)
+create_subcommand (struct command *command, uint64_t offset, size_t len,
+   bool zero)
 {
   const uint64_t end = command->offset + command->slice.len;
   struct command *newcommand;
 
   assert (command->offset <= offset && offset < end);
   assert (offset + len <= end);
 
   newcommand = calloc (1, sizeof *newcommand);
   if (newcommand == NULL) {
 perror ("calloc");
@@ -409,21 +408,21 @@ finished_read (void *vp, int *error)
  i + sparse_size <= end;
  i += sparse_size) {
   if (is_zero (slice_ptr (command->slice) + i-start, sparse_size)) {
 /* It's a zero range.  If the last was a zero too then we do
  * nothing here which coalesces.  Otherwise write the last data
  * and start a new zero range.
  */
 if (!last_is_zero) {
   /* Write the last data (if any). */
   if (i - last_offset > 0) {
-newcommand = copy_subcommand (command,
+newcommand = create_subcommand (command,
   last_offset, i - last_offset,
   false);
 dst->ops->asynch_write (dst, newcommand,
 (nbd_completion_callback) {
   .callback = free_command,
   .user_data = newcommand,
 });
   }
   /* Start the new zero range. */
   last_offset = i;
@@ -431,55 +430,55 @@ finished_read (void *vp, int *error)
 }
   }
   else {
 /* It's data.  If the last was data too, do nothing =>
  * coalesce.  Otherwise write the last zero range and start a
  * new data.
  */
 if (last_is_zero) {
   /* Write the last zero range (if any). */
   if (i - last_offset > 0) {
-newcommand = copy_subcommand (command,
-  last_offset, i - last_offset,
-  true);
+newcommand = create_subcommand (command,
+last_offset, i - last_offset,
+true);
 fill_dst_range_with_zeroes (newcommand);
   }
   /* Start the new data. */
   last_offset = i;
   last_is_zero = false;
 }
   }
 } /* for i */
 
 /* Write the last_offset up to i. */
 if (i - last_offset > 0) {
   if (!last_is_zero) {
-newcommand = copy_subcommand (command,
-  last_offset, i - last_offset,
-  false);
+newcommand = create_subcommand (command,
+last_offset, i - last_offset,
+false);
 dst->ops->asynch_write (dst, newcommand,
 (nbd_completion_callback) {
   .callback = free_command,
   .user_data = newcommand,
 });
   }
   else {
-newcommand = copy_subcommand (command,
-  last_offset, i - last_offset,
-  true);
+newcommand = create_subcommand (command,
+last_offset, i - last_offset,
+true);
 fill_dst_range_with_zeroes (newcommand);
   }
 }
 
 /* There may be an unaligned tail, so write that. */
 if (end - i > 0) {
-  newcommand = copy_subcommand (command, i, end - i, false);
+  newcommand = create_subcommand (command, i, end - i, false);
   dst->ops-&g

[Libguestfs] [PATCH libnbd 4/8] copy: Separate finishing a command from freeing it

2022-02-20 Thread Nir Soffer
free_command() was abused as a completion callback. Introduce a
finished_command() completion callback, so code that wants to free a
command does not have to add dummy errors.

This will make it easier to manage worker state when a command
completes.

Signed-off-by: Nir Soffer 
---
 copy/multi-thread-copying.c | 34 --
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c
index 855d1ba4..aa6a9f41 100644
--- a/copy/multi-thread-copying.c
+++ b/copy/multi-thread-copying.c
@@ -126,21 +126,22 @@ multi_thread_copying (void)
 }
   }
 
   free (workers);
 }
 
 static void wait_for_request_slots (uintptr_t index);
 static unsigned in_flight (uintptr_t index);
 static void poll_both_ends (uintptr_t index);
 static int finished_read (void *vp, int *error);
-static int free_command (void *vp, int *error);
+static int finished_command (void *vp, int *error);
+static void free_command (struct command *command);
 static void fill_dst_range_with_zeroes (struct command *command);
 static struct command *create_command (uint64_t offset, size_t len, bool zero,
uintptr_t index);
 
 /* There are 'threads' worker threads, each copying work ranges from
  * src to dst until there are no more work ranges.
  */
 static void *
 worker_thread (void *indexp)
 {
@@ -402,53 +403,52 @@ finished_read (void *vp, int *error)
  command->offset, strerror (*error));
 exit (EXIT_FAILURE);
   }
 
   if (allocated || sparse_size == 0) {
 /* If sparseness detection (see below) is turned off then we write
  * the whole command.
  */
 dst->ops->asynch_write (dst, command,
 (nbd_completion_callback) {
-  .callback = free_command,
+  .callback = finished_command,
   .user_data = command,
 });
   }
   else {   /* Sparseness detection. */
 const uint64_t start = command->offset;
 const uint64_t end = start + command->slice.len;
 uint64_t last_offset = start;
 bool last_is_zero = false;
 uint64_t i;
 struct command *newcommand;
-int dummy = 0;
 
 /* Iterate over whole blocks in the command, starting on a block
  * boundary.
  */
 for (i = MIN (ROUND_UP (start, sparse_size), end);
  i + sparse_size <= end;
  i += sparse_size) {
   if (is_zero (slice_ptr (command->slice) + i-start, sparse_size)) {
 /* It's a zero range.  If the last was a zero too then we do
  * nothing here which coalesces.  Otherwise write the last data
  * and start a new zero range.
  */
 if (!last_is_zero) {
   /* Write the last data (if any). */
   if (i - last_offset > 0) {
 newcommand = create_subcommand (command,
   last_offset, i - last_offset,
   false);
 dst->ops->asynch_write (dst, newcommand,
 (nbd_completion_callback) {
-  .callback = free_command,
+  .callback = finished_command,
   .user_data = newcommand,
 });
   }
   /* Start the new zero range. */
   last_offset = i;
   last_is_zero = true;
 }
   }
   else {
 /* It's data.  If the last was data too, do nothing =>
@@ -471,46 +471,46 @@ finished_read (void *vp, int *error)
 } /* for i */
 
 /* Write the last_offset up to i. */
 if (i - last_offset > 0) {
   if (!last_is_zero) {
 newcommand = create_subcommand (command,
 last_offset, i - last_offset,
 false);
 dst->ops->asynch_write (dst, newcommand,
 (nbd_completion_callback) {
-  .callback = free_command,
+  .callback = finished_command,
   .user_data = newcommand,
 });
   }
   else {
 newcommand = create_subcommand (command,
 last_offset, i - last_offset,
 true);
 fill_dst_range_with_zeroes (newcommand);
   }
 }
 
 /* There may be an unaligned tail, so write that. */
 if (end - i > 0) {
   newcommand = create_subcommand (command, i, end - i, false);
   dst->ops->asynch_write (dst, newcommand,
   (nbd_completion_callback) {
-.callback = free_command,
+ 

[Libguestfs] [PATCH libnbd 3/8] copy: Extract create_command and create_buffer helpers

2022-02-20 Thread Nir Soffer
Creating a new command requires a lot of boilerplate that makes it
harder to focus on the interesting code. Extract helpers to create a
command and the command slice buffer.

create_buffer is called only once, but the compiler is smart enough to
inline it, and adding it makes the code much simpler.

This change is a refactoring, except for fixing the perror() message for
the calloc() failure.

Signed-off-by: Nir Soffer 
---
 copy/multi-thread-copying.c | 87 +++--
 1 file changed, 54 insertions(+), 33 deletions(-)

diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c
index 2d16d2df..855d1ba4 100644
--- a/copy/multi-thread-copying.c
+++ b/copy/multi-thread-copying.c
@@ -128,20 +128,22 @@ multi_thread_copying (void)
 
   free (workers);
 }
 
 static void wait_for_request_slots (uintptr_t index);
 static unsigned in_flight (uintptr_t index);
 static void poll_both_ends (uintptr_t index);
 static int finished_read (void *vp, int *error);
 static int free_command (void *vp, int *error);
 static void fill_dst_range_with_zeroes (struct command *command);
+static struct command *create_command (uint64_t offset, size_t len, bool zero,
+   uintptr_t index);
 
 /* There are 'threads' worker threads, each copying work ranges from
  * src to dst until there are no more work ranges.
  */
 static void *
 worker_thread (void *indexp)
 {
   uintptr_t index = (uintptr_t) indexp;
   uint64_t offset, count;
   extent_list exts = empty_vector;
@@ -150,71 +152,43 @@ worker_thread (void *indexp)
 size_t i;
 
 assert (0 < count && count <= THREAD_WORK_SIZE);
 if (extents)
src->ops->get_extents (src, index, offset, count, &exts);
 else
default_get_extents (src, index, offset, count, &exts);
 
 for (i = 0; i < exts.len; ++i) {
   struct command *command;
-  struct buffer *buffer;
-  char *data;
   size_t len;
 
   if (exts.ptr[i].zero) {
 /* The source is zero so we can proceed directly to skipping,
  * fast zeroing, or writing zeroes at the destination.
  */
-command = calloc (1, sizeof *command);
-if (command == NULL) {
-  perror ("malloc");
-  exit (EXIT_FAILURE);
-}
-command->offset = exts.ptr[i].offset;
-command->slice.len = exts.ptr[i].length;
-command->slice.base = 0;
-command->index = index;
+command = create_command (exts.ptr[i].offset, exts.ptr[i].length,
+  true, index);
 fill_dst_range_with_zeroes (command);
   }
 
   else /* data */ {
 /* As the extent might be larger than permitted for a single
  * command, we may have to split this into multiple read
  * requests.
  */
 while (exts.ptr[i].length > 0) {
   len = exts.ptr[i].length;
   if (len > request_size)
 len = request_size;
-  data = malloc (len);
-  if (data == NULL) {
-perror ("malloc");
-exit (EXIT_FAILURE);
-  }
-  buffer = calloc (1, sizeof *buffer);
-  if (buffer == NULL) {
-perror ("malloc");
-exit (EXIT_FAILURE);
-  }
-  buffer->data = data;
-  buffer->refs = 1;
-  command = calloc (1, sizeof *command);
-  if (command == NULL) {
-perror ("malloc");
-exit (EXIT_FAILURE);
-  }
-  command->offset = exts.ptr[i].offset;
-  command->slice.len = len;
-  command->slice.base = 0;
-  command->slice.buffer = buffer;
-  command->index = index;
+
+  command = create_command (exts.ptr[i].offset, len,
+false, index);
 
   wait_for_request_slots (index);
 
   /* Begin the asynch read operation. */
   src->ops->asynch_read (src, command,
  (nbd_completion_callback) {
.callback = finished_read,
.user_data = command,
  });
 
@@ -331,20 +305,67 @@ poll_both_ends (uintptr_t index)
 else if ((fds[1].revents & POLLOUT) != 0)
   dst->ops->asynch_notify_write (dst, index);
 else if ((fds[1].revents & (POLLERR | POLLNVAL)) != 0) {
   errno = ENOTCONN;
   perror (dst->name);
   exit (EXIT_FAILURE);
 }
   }
 }
 
+/* Create a new buffer. */
+static struct buffer*
+create_buffer (size_t len)
+{
+  struct buffer *buffer;
+
+  buffer = calloc (1, sizeof *buffer);
+  if (buffer == NULL) {
+perror ("calloc");
+exit (EXIT_FAILURE);
+  }
+
+  buffer->data = malloc (len);
+  if (buffer->data == NULL) {
+perror ("malloc");
+exit (EXIT_FAILURE);
+  }
+
+  buffer->refs = 1;

[Libguestfs] [PATCH libnbd 0/8] nbdcopy: Adaptive queue size

2022-02-20 Thread Nir Soffer
This series adds an adaptive queue size feature, which gives a great
performance improvement on my laptop, but less exciting results on a
real server. When qemu-nbd supports MULTI_CONN for writes, this
should become more interesting.

To implement this I added a worker struct for keeping worker state, and
cleaned up the completion flow and other things. I think these cleanups
are a good idea even if we do not add the adaptive queue size.

Nir Soffer (8):
  copy: Remove wrong references to holes
  copy: Rename copy_subcommand to create_subcommand
  copy: Extract create_command and create_buffer helpers
  copy: Separate finishing a command from freeing it
  copy: Introduce worker struct
  copy: Keep worker pointer in command
  copy: Track worker queue size
  copy: Adaptive queue size

 copy/file-ops.c |   4 +-
 copy/main.c |  58 +---
 copy/multi-thread-copying.c | 270 ++--
 copy/nbd-ops.c  |  16 +--
 copy/nbdcopy.h  |  31 +++--
 copy/nbdcopy.pod|  12 +-
 copy/null-ops.c |   4 +-
 copy/pipe-ops.c |   2 +-
 8 files changed, 248 insertions(+), 149 deletions(-)

-- 
2.35.1


___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



[Libguestfs] [PATCH libnbd 5/8] copy: Introduce worker struct

2022-02-20 Thread Nir Soffer
I want to keep more info per worker, and using a worker struct is the
natural way to do this. This also allows cleaning up the *-ops
interface, which accepted a uintptr_t index although the index is never
a pointer. I think the pointer type is a result of passing the index to
the thread via the void* argument.

The worker struct is used only by the multi-thread-copying module, but
in a future patch I want to keep the worker pointer in the command, to
allow commands to update worker state when they finish.

Signed-off-by: Nir Soffer 
---
 copy/file-ops.c |  4 +--
 copy/main.c |  6 ++---
 copy/multi-thread-copying.c | 49 +++--
 copy/nbd-ops.c  | 10 
 copy/nbdcopy.h  | 24 +++---
 copy/null-ops.c |  4 +--
 copy/pipe-ops.c |  2 +-
 7 files changed, 53 insertions(+), 46 deletions(-)

diff --git a/copy/file-ops.c b/copy/file-ops.c
index aaf04ade..ab378754 100644
--- a/copy/file-ops.c
+++ b/copy/file-ops.c
@@ -614,27 +614,27 @@ file_asynch_zero (struct rw *rw, struct command *command,
 {
   int dummy = 0;
 
   if (!file_synch_zero (rw, command->offset, command->slice.len, allocate))
 return false;
cb.callback (cb.user_data, &dummy);
   return true;
 }
 
 static unsigned
-file_in_flight (struct rw *rw, uintptr_t index)
+file_in_flight (struct rw *rw, size_t index)
 {
   return 0;
 }
 
 static void
-file_get_extents (struct rw *rw, uintptr_t index,
+file_get_extents (struct rw *rw, size_t index,
   uint64_t offset, uint64_t count,
   extent_list *ret)
 {
   ret->len = 0;
 
 #ifdef SEEK_HOLE
   struct rw_file *rwf = (struct rw_file *)rw;
   static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER;
 
   if (rwf->seek_hole_supported) {
diff --git a/copy/main.c b/copy/main.c
index 67788b5d..390de1eb 100644
--- a/copy/main.c
+++ b/copy/main.c
@@ -513,44 +513,44 @@ print_rw (struct rw *rw, const char *prefix, FILE *fp)
 
   fprintf (fp, "%s: %s \"%s\"\n", prefix, rw->ops->ops_name, rw->name);
   fprintf (fp, "%s: size=%" PRIi64 " (%s)\n",
prefix, rw->size, human_size (buf, rw->size, NULL));
 }
 
 /* Default implementation of rw->ops->get_extents for backends which
  * don't/can't support extents.  Also used for the --no-extents case.
  */
 void
-default_get_extents (struct rw *rw, uintptr_t index,
+default_get_extents (struct rw *rw, size_t index,
  uint64_t offset, uint64_t count,
  extent_list *ret)
 {
   struct extent e;
 
   ret->len = 0;
 
   e.offset = offset;
   e.length = count;
   e.zero = false;
   if (extent_list_append (ret, e) == -1) {
 perror ("realloc");
 exit (EXIT_FAILURE);
   }
 }
 
 /* Implementations of get_polling_fd and asynch_notify_* for backends
  * which don't support polling.
  */
 void
-get_polling_fd_not_supported (struct rw *rw, uintptr_t index,
+get_polling_fd_not_supported (struct rw *rw, size_t index,
   int *fd_rtn, int *direction_rtn)
 {
   /* Not an error, this causes poll to ignore the fd. */
   *fd_rtn = -1;
   *direction_rtn = LIBNBD_AIO_DIRECTION_READ;
 }
 
 void
-asynch_notify_read_write_not_supported (struct rw *rw, uintptr_t index)
+asynch_notify_read_write_not_supported (struct rw *rw, size_t index)
 {
   /* nothing */
 }
diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c
index aa6a9f41..a1a8d09c 100644
--- a/copy/multi-thread-copying.c
+++ b/copy/multi-thread-copying.c
@@ -70,184 +70,185 @@ get_next_offset (uint64_t *offset, uint64_t *count)
  * the commands.  We might move this into a callback, but those
  * are called from threads and not necessarily in monotonic order
  * so the progress bar would move erratically.
  */
 progress_bar (*offset, src->size);
   }
pthread_mutex_unlock (&lock);
   return r;
 }
 
-static void *worker_thread (void *ip);
+static void *worker_thread (void *wp);
 
 void
 multi_thread_copying (void)
 {
-  pthread_t *workers;
+  struct worker *workers;
   size_t i;
   int err;
 
   /* Some invariants that should be true if the main program called us
* correctly.
*/
   assert (threads > 0);
   assert (threads == connections);
 /*
  if (src.ops == &nbd_ops)
 assert (src.u.nbd.handles.size == connections);
  if (dst.ops == &nbd_ops)
 assert (dst.u.nbd.handles.size == connections);
 */
   assert (src->size != -1);
 
-  workers = malloc (sizeof (pthread_t) * threads);
+  workers = calloc (threads, sizeof *workers);
   if (workers == NULL) {
-perror ("malloc");
+perror ("calloc");
 exit (EXIT_FAILURE);
   }
 
   /* Start the worker threads. */
   for (i = 0; i < threads; ++i) {
-err = pthread_create (&workers[i], NULL, worker_thread,
-  (void *)(uintptr_t)i);
+workers[i].index = i;
+err = pthread_create ([i

[Libguestfs] [PATCH libnbd 1/8] copy: Remove wrong references to holes

2022-02-20 Thread Nir Soffer
In the past nbdcopy was looking for hole extents instead of zero
extents. When we fixed this, we forgot to update some comments and
variable names referencing hole instead of zero.

Signed-off-by: Nir Soffer 
---
 copy/multi-thread-copying.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c
index 7459b446..632d7006 100644
--- a/copy/multi-thread-copying.c
+++ b/copy/multi-thread-copying.c
@@ -337,36 +337,36 @@ poll_both_ends (uintptr_t index)
 }
   }
 }
 
 /* Create a sub-command of an existing command.  This creates a slice
  * referencing the buffer of the existing command in order to avoid
  * copying.
  */
 static struct command *
 copy_subcommand (struct command *command, uint64_t offset, size_t len,
- bool hole)
+ bool zero)
 {
   const uint64_t end = command->offset + command->slice.len;
   struct command *newcommand;
 
   assert (command->offset <= offset && offset < end);
   assert (offset + len <= end);
 
   newcommand = calloc (1, sizeof *newcommand);
   if (newcommand == NULL) {
 perror ("calloc");
 exit (EXIT_FAILURE);
   }
   newcommand->offset = offset;
   newcommand->slice.len = len;
-  if (!hole) {
+  if (!zero) {
 newcommand->slice.buffer = command->slice.buffer;
 newcommand->slice.buffer->refs++;
 newcommand->slice.base = offset - command->offset;
   }
   newcommand->index = command->index;
 
   return newcommand;
 }
 
 /* Callback called when src has finished one read command.  This
@@ -390,76 +390,76 @@ finished_read (void *vp, int *error)
 dst->ops->asynch_write (dst, command,
 (nbd_completion_callback) {
   .callback = free_command,
   .user_data = command,
 });
   }
   else {   /* Sparseness detection. */
 const uint64_t start = command->offset;
 const uint64_t end = start + command->slice.len;
 uint64_t last_offset = start;
-bool last_is_hole = false;
+bool last_is_zero = false;
 uint64_t i;
 struct command *newcommand;
 int dummy = 0;
 
 /* Iterate over whole blocks in the command, starting on a block
  * boundary.
  */
 for (i = MIN (ROUND_UP (start, sparse_size), end);
  i + sparse_size <= end;
  i += sparse_size) {
   if (is_zero (slice_ptr (command->slice) + i-start, sparse_size)) {
-/* It's a hole.  If the last was a hole too then we do nothing
- * here which coalesces.  Otherwise write the last data and
- * start a new hole.
+/* It's a zero range.  If the last was a zero too then we do
+ * nothing here which coalesces.  Otherwise write the last data
+ * and start a new zero range.
  */
-if (!last_is_hole) {
+if (!last_is_zero) {
   /* Write the last data (if any). */
   if (i - last_offset > 0) {
 newcommand = copy_subcommand (command,
   last_offset, i - last_offset,
   false);
 dst->ops->asynch_write (dst, newcommand,
 (nbd_completion_callback) {
   .callback = free_command,
   .user_data = newcommand,
 });
   }
-  /* Start the new hole. */
+  /* Start the new zero range. */
   last_offset = i;
-  last_is_hole = true;
+  last_is_zero = true;
 }
   }
   else {
 /* It's data.  If the last was data too, do nothing =>
- * coalesce.  Otherwise write the last hole and start a new
- * data.
+ * coalesce.  Otherwise write the last zero range and start a
+ * new data.
  */
-if (last_is_hole) {
-  /* Write the last hole (if any). */
+if (last_is_zero) {
+  /* Write the last zero range (if any). */
   if (i - last_offset > 0) {
 newcommand = copy_subcommand (command,
   last_offset, i - last_offset,
   true);
 fill_dst_range_with_zeroes (newcommand);
   }
   /* Start the new data. */
   last_offset = i;
-  last_is_hole = false;
+  last_is_zero = false;
 }
   }
 } /* for i */
 
 /* Write the last_offset up to i. */
 if (i - last_offset > 0) {
-  if (!last_is_hole) {
+  if (!last_is_zero) {
 newcommand = copy_subcommand (command,
   last_offset, i - last_offset,
   false);
 

Re: [Libguestfs] [PATCH libnbd v2 1/9] golang: tests: Add test for AioBuffer

2022-02-15 Thread Nir Soffer
On Tue, Feb 15, 2022 at 12:18 AM Nir Soffer  wrote:

> On Mon, Feb 14, 2022 at 3:22 PM Eric Blake  wrote:
>
>> On Fri, Feb 11, 2022 at 03:21:21AM +0200, Nir Soffer wrote:
>> > Add unit tests and benchmarks for AioBuffer. The tests are trivial but
>> > they server as running documentation, and they point out important
>>
>> serve
>>
>
> Fixed
>
>
>>
>> > details about the type.
>> >
>> > The benchmarks show the efficiency of allocating a new buffer, zeroing
>> > it, and interfacing with Go code.
>> >
>> > These tests will also ensure that we don't break anything by the next
>> > changes.
>> >
>> > To run the benchmarks use:
>> >
>> > $ go test -run=xxx -bench=.
>> > goos: linux
>> > goarch: amd64
>> > pkg: libguestfs.org/libnbd
>> > cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
>> > BenchmarkMakeAioBuffer-12  6871759   157.2 ns/op
>> > BenchmarkAioBufferZero-12   17551 69552 ns/op
>> > BenchmarkFromBytes-12 9632139112 ns/op
>> > BenchmarkAioBufferBytes-12   69375 16410 ns/op
>> > PASS
>> > ok  libguestfs.org/libnbd   5.843s
>> >
>> > To make sure the benchmarks will not break, we run them in "make check"
>> > with a very short timeout. For actual performance testing run "go test"
>> > directly.
>>
>> Sounds good to me.
>>
>
> Thanks, I'll push tomorrow if you don't have more comments.
>

Pushed
as 
28137eea9b78ca32fce97f8c68483f447b861bc9..9099657ef541a635eae0d6d79080ad5bb0bc3281
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs

Re: [Libguestfs] [PATCH v2v 2/2] -o rhv-upload: Change the default to direct connection

2022-02-15 Thread Nir Soffer
On Tue, Feb 15, 2022 at 6:49 PM Richard W.M. Jones 
wrote:

> To connect via a proxy you must now use “-oo rhv-proxy”.  This is
> usually slower and not needed.
>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2033096
> Thanks: Nir Soffer
> ---
>  output/output_rhv_upload.ml | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/output/output_rhv_upload.ml b/output/output_rhv_upload.ml
> index d3b5b412db..5826e1ada5 100644
> --- a/output/output_rhv_upload.ml
> +++ b/output/output_rhv_upload.ml
> @@ -52,7 +52,7 @@ module RHVUpload = struct
>
>-oo rhv-cafile=CA.PEM Set ‘ca.pem’ certificate bundle filename.
>-oo rhv-cluster=CLUSTERNAME   Set RHV cluster name.
> -  -oo rhv-proxy Connect via oVirt Engine proxy (default:
> true).
> +  -oo rhv-proxy Connect via oVirt Engine proxy (default:
> false).
>-oo rhv-verifypeer[=true|false] Verify server identity (default: false).
>
>  You can override the UUIDs of the disks, instead of using autogenerated
> UUIDs
> @@ -81,7 +81,7 @@ after their uploads (if you do, you must supply one for
> each disk):
>
>  let rhv_cafile = ref None in
>  let rhv_cluster = ref None in
> -let rhv_direct = ref false in
> +let rhv_direct = ref true in
>  let rhv_verifypeer = ref false in
>  let rhv_disk_uuids = ref None in
>
> --
> 2.35.1
>

Looks good

Nir

Re: [Libguestfs] [PATCH v2v 1/2] -o rhv-upload: Replace -oo rhv-direct with -oo rhv-proxy

2022-02-15 Thread Nir Soffer
On Tue, Feb 15, 2022 at 6:49 PM Richard W.M. Jones 
wrote:

> This simply replaces the existing -oo rhv-direct option with a new -oo
> rhv-proxy option.  Note that using this option "bare" (ie. just “-oo
> rhv-proxy”) does nothing in the current commit because the default is
> still to use the proxy.
>
> Related: https://bugzilla.redhat.com/show_bug.cgi?id=2033096
> Thanks: Nir Soffer
> ---
>  docs/virt-v2v-output-rhv.pod   | 12 +---
>  docs/virt-v2v.pod  | 15 ++-
>  output/output_rhv_upload.ml|  8 +---
>  tests/test-v2v-o-rhv-upload.sh |  1 -
>  4 files changed, 16 insertions(+), 20 deletions(-)
>
> diff --git a/docs/virt-v2v-output-rhv.pod b/docs/virt-v2v-output-rhv.pod
> index bd5e80c873..2ce697f4d7 100644
> --- a/docs/virt-v2v-output-rhv.pod
> +++ b/docs/virt-v2v-output-rhv.pod
> @@ -8,7 +8,7 @@ virt-v2v-output-rhv - Using virt-v2v to convert guests to
> oVirt or RHV
>  [-op PASSWORD] [-of raw]
>  [-oo rhv-cafile=FILE]
>  [-oo rhv-cluster=CLUSTER]
> -[-oo rhv-direct]
> +[-oo rhv-proxy]
>  [-oo rhv-disk-uuid=UUID ...]
>  [-oo rhv-verifypeer]
>
> @@ -129,13 +129,11 @@ the specified UUIDs must not conflict with the UUIDs
> of existing disks
>
>  =back
>
> -=item I<-oo rhv-direct>
> +=item I<-oo rhv-proxy>
>
> -If this option is given then virt-v2v will attempt to directly upload
> -the disk to the oVirt node, otherwise it will proxy the upload through
> -the oVirt engine.  Direct upload requires that you have network access
> -to the oVirt nodes.  Non-direct upload is slightly slower but should
> -work in all situations.
> +Proxy the upload through oVirt Engine.  This is slower than uploading
> +directly to the oVirt node but may be necessary if you do not have
> +direct network access to the nodes.
>
>  =item I<-oo rhv-verifypeer>
>
> diff --git a/docs/virt-v2v.pod b/docs/virt-v2v.pod
> index 143c50671c..f50d27a0f0 100644
> --- a/docs/virt-v2v.pod
> +++ b/docs/virt-v2v.pod
> @@ -86,8 +86,7 @@ interface(s) are connected to the target network called
> C.
>   virt-v2v -ic vpx://vcenter.example.com/Datacenter/esxi vmware_guest \
> -o rhv-upload -oc https://ovirt-engine.example.com/ovirt-engine/api \
> -os ovirt-data -op /tmp/ovirt-admin-password -of raw \
> -   -oo rhv-cafile=/tmp/ca.pem -oo rhv-direct \
> -   --bridge ovirtmgmt
> +   -oo rhv-cafile=/tmp/ca.pem --bridge ovirtmgmt
>
>  In this case the host running virt-v2v acts as a B.
>
> @@ -621,14 +620,12 @@ on the oVirt engine.
>  For I<-o rhv-upload> (L) only, set the RHV Cluster
>  Name.  If not given it uses C.
>
> -=item B<-oo rhv-direct>
> +=item B<-oo rhv-proxy>
>
> -For I<-o rhv-upload> (L) only, if this option is
> given
> -then virt-v2v will attempt to directly upload the disk to the oVirt
> -node, otherwise it will proxy the upload through the oVirt engine.
> -Direct upload requires that you have network access to the oVirt
> -nodes.  Non-direct upload is slightly slower but should work in all
> -situations.
> +For I<-o rhv-upload> (L) only, proxy the
> +upload through oVirt Engine.  This is slower than uploading directly
> +to the oVirt node but may be necessary if you do not have direct
> +network access to the nodes.
>
>  =item B<-oo rhv-verifypeer>
>
> diff --git a/output/output_rhv_upload.ml b/output/output_rhv_upload.ml
> index 7c2434bde4..d3b5b412db 100644
> --- a/output/output_rhv_upload.ml
> +++ b/output/output_rhv_upload.ml
> @@ -50,9 +50,9 @@ module RHVUpload = struct
>let query_output_options () =
>  printf (f_"Output options (-oo) which can be used with -o rhv-upload:
>
> -  -oo rhv-cafile=CA.PEM   Set ‘ca.pem’ certificate bundle
> filename.
> -  -oo rhv-cluster=CLUSTERNAME Set RHV cluster name.
> -  -oo rhv-direct[=true|false] Use direct transfer mode (default:
> false).
> +  -oo rhv-cafile=CA.PEM Set ‘ca.pem’ certificate bundle filename.
> +  -oo rhv-cluster=CLUSTERNAME   Set RHV cluster name.
> +  -oo rhv-proxy Connect via oVirt Engine proxy (default:
> true).
>-oo rhv-verifypeer[=true|false] Verify server identity (default: false).
>
>  You can override the UUIDs of the disks, instead of using autogenerated
> UUIDs
> @@ -97,6 +97,8 @@ after their uploads (if you do, you must supply one for
> each disk):
>   rhv_cluster := Some v
>| "rhv-direct", "" -> rhv_direct := true
>| "rhv-direct", v -> rhv_direct := bool_of_

Re: [Libguestfs] [PATCH v2v] v2v/v2v.ml: Choose nbdcopy max requests for implicit buffer of 64M

2022-02-15 Thread Nir Soffer
On Tue, Feb 15, 2022 at 7:01 PM Richard W.M. Jones 
wrote:

> On Tue, Feb 15, 2022 at 06:38:55PM +0200, Nir Soffer wrote:
> > On Tue, Feb 15, 2022 at 5:54 PM Richard W.M. Jones 
> wrote:
> >
> > Pick the nbdcopy --requests parameter to target an implicit buffer
> > size of 64M inside nbdcopy.  However don't set nbdcopy --request <
> 64.
> >
> > If request_size == 256K (the default) => requests = 256
> > If request_size == 8M => requests = 64 (buffer size 512M)
> >
> >
> > Considering the total bytes buffered makes sense. I did the same in
> another
> > application that only reads from NBD using libnbd async API. I'm using:
> >
> > max_requests = 16
> > max_bytes = 2m
> >
> > So if you have small requests (e.g. 4k), you get 16 inflight requests per
> > connection
> > and with 4 connections 64 inflight requests on the storage side.
> >
> > But if you have large requests (256k), you get only 8 requests per
> connection
> > and
> > 32 requests on the storage side.
> >
> > This was tested in a read-only case both on my laptop with fast NVMe
> > (Samsung 970 EVO Plus 1T) and with super fast NVMe on Dell server,
> > and with shared storage (NetApp iSCSI).
> >
> > With fast NVMe, limiting the maximum buffered bytes to 1M is actually
> > ~10% faster, but with shared storage using more requests is faster.
> >
> > What you suggest here will result in:
> > small requests: 256 requests per connection, 1024 requests on storage
> side
> large requests: 64 requests per connection, 256 requests on storage side.
>
> So a note here that we're not using multi-conn when converting from
> VDDK because VDDK doesn't behave well:
>
>
> https://github.com/libguestfs/virt-v2v/commit/bb0e698360470cb4ff5992e8e01a3165f56fe41e
>
> > I don't think any storage can handle such a large amount of connections
> better.
> >
> > I think we should test --requests 8 first, it may show a nice speedup
> > compared
> > to what we see in
> > https://bugzilla.redhat.com/show_bug.cgi?id=2039255#c33
> >
> > Looks like in
> > https://bugzilla.redhat.com/show_bug.cgi?id=2039255#c32
> >
> > We introduced 2 changes at the same time, which makes it impossible to
> tell
> > the effect of any single change.
>
> I couldn't measure any performance benefit from increasing the number
> of requests, but also it didn't have any down-side.  Ming Xie also did
> a test and she didn't see any benefit or loss either.
>
> The purpose of the patch (which I didn't explain well) was to ensure
> that if we make the request-size larger, we don't blow up nbdcopy
> memory usage too much.  So aim for a target amount of memory consumed
> in nbdcopy buffers (64M), but conservatively never reducing #buffers
> below the current setting (64).


The intent is good, but I think we need to refine the
actual sizes; this can be done later.

Also, this would be better done in nbdcopy instead of virt-v2v, since it
would improve all callers of nbdcopy.

Nir

Re: [Libguestfs] [PATCH v2v] v2v/v2v.ml: Choose nbdcopy max requests for implicit buffer of 64M

2022-02-15 Thread Nir Soffer
On Tue, Feb 15, 2022 at 5:54 PM Richard W.M. Jones 
wrote:

> Pick the nbdcopy --requests parameter to target an implicit buffer
> size of 64M inside nbdcopy.  However don't set nbdcopy --request < 64.
>
> If request_size == 256K (the default) => requests = 256
> If request_size == 8M => requests = 64 (buffer size 512M)
>

Considering the total bytes buffered makes sense. I did the same in another
application that only reads from NBD using libnbd async API. I'm using:

max_requests = 16
max_bytes = 2m

So if you have small requests (e.g. 4k), you get 16 inflight requests per
connection
and with 4 connections 64 inflight requests on the storage side.

But if you have large requests (256k), you get only 8 requests per
connection and
32 requests on the storage side.

This was tested in a read-only case both on my laptop with fast NVMe
(Samsung 970 EVO Plus 1T) and with super fast NVMe on Dell server,
and with shared storage (NetApp iSCSI).

With fast NVMe, limiting the maximum buffered bytes to 1M is actually
~10% faster, but with shared storage using more requests is faster.

What you suggest here will result in:
small requests: 256 requests per connection, 1024 requests on storage side
large requests: 64 requests per connection, 256 requests on storage side.

I don't think any storage can handle such a large amount of connections
better.
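The two policies discussed above can be compared with a small model (a sketch; the 4-connection count, the 16-requests/2M-bytes caps, and the 64M/64-requests proposal are taken from this thread, while the helper names are mine):

```python
KiB, MiB = 1024, 1024 * 1024

def inflight(request_size, max_requests, max_bytes, connections=4):
    """In-flight requests per connection and on the storage side,
    when capping both request count and total buffered bytes."""
    per_conn = min(max_requests, max(1, max_bytes // request_size))
    return per_conn, per_conn * connections

# max_requests=16, max_bytes=2M policy described above
print(inflight(4 * KiB, 16, 2 * MiB))    # small requests -> (16, 64)
print(inflight(256 * KiB, 16, 2 * MiB))  # large requests -> (8, 32)

def nbdcopy_requests(request_size, target_buffer_size=64 * MiB, floor=64):
    """Proposed nbdcopy policy: target a 64M buffer, never below 64."""
    return max(floor, target_buffer_size // request_size)

print(nbdcopy_requests(256 * KiB))  # default request size -> 256
print(nbdcopy_requests(8 * MiB))    # large request size -> 64
```

This makes it easy to see why the byte-capped policy keeps the storage-side queue bounded while the proposed nbdcopy policy lets it grow with small requests.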

I think we should test --requests 8 first, it may show a nice speedup compared
to what we see in
https://bugzilla.redhat.com/show_bug.cgi?id=2039255#c33

Looks like in
https://bugzilla.redhat.com/show_bug.cgi?id=2039255#c32

We introduced 2 changes at the same time, which makes it impossible to tell
the effect of any single change.

Nir


> ---
>  v2v/v2v.ml | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/v2v/v2v.ml b/v2v/v2v.ml
> index cadf864d5c..7bd47c1e7e 100644
> --- a/v2v/v2v.ml
> +++ b/v2v/v2v.ml
> @@ -641,14 +641,27 @@ and nbdcopy ?request_size output_alloc input_uri
> output_uri =
> *)
>let cmd = ref [] in
>List.push_back_list cmd [ "nbdcopy"; input_uri; output_uri ];
> +
>(match request_size with
>  | None -> ()
>  | Some size -> List.push_back cmd (sprintf "--request-size=%d" size)
>);
> +  (* Choose max requests to target an implicit buffer size of 64M. *)
> +  let requests =
> +let target_buffer_size = 64 * 1024 * 1024 in
> +let request_size =
> +  match request_size with
> +  | None -> 256 * 1024 (* default in nbdcopy 1.10+ *)
> +  | Some size -> size in
> +max 64 (target_buffer_size / request_size) in
> +  List.push_back cmd (sprintf "--requests=%d" requests);
> +
>List.push_back cmd "--flush";
>(*List.push_back cmd "--verbose";*)
> +
>if not (quiet ()) then List.push_back cmd "--progress";
>if output_alloc = Types.Preallocated then List.push_back cmd
> "--allocated";
> +
>let cmd = !cmd in
>
>if run_command cmd <> 0 then
> --
> 2.35.1
>
>

Re: [Libguestfs] [PATCH libnbd v2 1/9] golang: tests: Add test for AioBuffer

2022-02-14 Thread Nir Soffer
On Mon, Feb 14, 2022 at 3:22 PM Eric Blake  wrote:

> On Fri, Feb 11, 2022 at 03:21:21AM +0200, Nir Soffer wrote:
> > Add unit tests and benchmarks for AioBuffer. The tests are trivial but
> > they server as running documentation, and they point out important
>
> serve
>

Fixed


>
> > details about the type.
> >
> > The benchmarks show the efficiency of allocating a new buffer, zeroing
> > it, and interfacing with Go code.
> >
> > These tests will also ensure that we don't break anything by the next
> > changes.
> >
> > To run the benchmarks use:
> >
> > $ go test -run=xxx -bench=.
> > goos: linux
> > goarch: amd64
> > pkg: libguestfs.org/libnbd
> > cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
> > BenchmarkMakeAioBuffer-12    6871759     157.2 ns/op
> > BenchmarkAioBufferZero-12      17551     69552 ns/op
> > BenchmarkFromBytes-12           9632    139112 ns/op
> > BenchmarkAioBufferBytes-12     69375     16410 ns/op
> > PASS
> > ok   libguestfs.org/libnbd   5.843s
> >
> > To make sure the benchmarks will not break, we run them in "make check"
> > with a very short timeout. For actual performance testing run "go test"
> > directly.
>
> Sounds good to me.
>

Thanks, I'll push tomorrow if you don't have more comments.

Nir

Re: [Libguestfs] [PATCH v2] v2v/v2v.ml: Use larger request size for -o rhv-upload

2022-02-14 Thread Nir Soffer
On Mon, Feb 14, 2022 at 3:01 PM Richard W.M. Jones  wrote:
>
> On Mon, Feb 14, 2022 at 12:53:01PM +0100, Laszlo Ersek wrote:
> > On 02/14/22 10:56, Richard W.M. Jones wrote:
> > > This change slowed things down (slightly) for me, although the change
> > > is within the margin of error so it probably made no difference.
> > >
> > > Before:
> > >
> > > $ time ./run virt-v2v -i disk /var/tmp/fedora-35.qcow2 -o rhv-upload -oc 
> > > https://ovirt4410/ovirt-engine/api -op /tmp/ovirt-passwd -oo rhv-direct 
> > > -os ovirt-data -on test14 -of raw
> > > [   0.0] Setting up the source: -i disk /var/tmp/fedora-35.qcow2
> > > [   1.0] Opening the source
> > > [   6.5] Inspecting the source
> > > [  10.5] Checking for sufficient free disk space in the guest
> > > [  10.5] Converting Fedora Linux 35 (Thirty Five) to run on KVM
> > > virt-v2v: warning: /files/boot/grub2/device.map/hd0 references unknown
> > > device "vda".  You may have to fix this entry manually after conversion.
> > > virt-v2v: This guest has virtio drivers installed.
> > > [  57.0] Mapping filesystem data to avoid copying unused and blank areas
> > > [  59.0] Closing the overlay
> > > [  59.6] Assigning disks to buses
> > > [  59.6] Checking if the guest needs BIOS or UEFI to boot
> > > [  59.6] Setting up the destination: -o rhv-upload -oc 
> > > https://ovirt4410/ovirt-engine/api -os ovirt-data
> > > [  79.3] Copying disk 1/1
> > > █ 100% []
> > > [  89.9] Creating output metadata
> > > [  94.0] Finishing off
> > >
> > > real 1m34.213s
> > > user 0m6.585s
> > > sys  0m11.880s
> > >
> > >
> > > After:
> > >
> > > $ time ./run virt-v2v -i disk /var/tmp/fedora-35.qcow2 -o rhv-upload -oc 
> > > https://ovirt4410/ovirt-engine/api -op /tmp/ovirt-passwd -oo rhv-direct 
> > > -os ovirt-data -on test15 -of raw
> > > [   0.0] Setting up the source: -i disk /var/tmp/fedora-35.qcow2
> > > [   1.0] Opening the source
> > > [   7.4] Inspecting the source
> > > [  11.7] Checking for sufficient free disk space in the guest
> > > [  11.7] Converting Fedora Linux 35 (Thirty Five) to run on KVM
> > > virt-v2v: warning: /files/boot/grub2/device.map/hd0 references unknown
> > > device "vda".  You may have to fix this entry manually after conversion.
> > > virt-v2v: This guest has virtio drivers installed.
> > > [  59.6] Mapping filesystem data to avoid copying unused and blank areas
> > > [  61.5] Closing the overlay
> > > [  62.2] Assigning disks to buses
> > > [  62.2] Checking if the guest needs BIOS or UEFI to boot
> > > [  62.2] Setting up the destination: -o rhv-upload -oc 
> > > https://ovirt4410/ovirt-engine/api -os ovirt-data
> > > [  81.6] Copying disk 1/1
> > > █ 100% []
> > > [  91.3] Creating output metadata
> > > [  96.0] Finishing off
> > >
> > > real 1m36.275s
> > > user 0m4.700s
> > > sys  0m14.070s
> >
> > My ACK on Nir's v2 patch basically means that I defer to you on its
> > review -- I don't have anything against it, but I understand it's
> > (perhaps a temporary) workaround until we find a more sustainable (and
> > likely much more complex) solution.
>
> Sure, I don't mind taking this as a temporary solution.  The code
> itself is perfectly fine.  The request size here is essentially an
> optimization hint, it doesn't affect the architecture.
>
> An architectural problem that affects both nbdkit & nbdcopy is that
> NBD commands drive the nbdkit backend and the nbdcopy loop.  If we
> make the nbdcopy --request-size larger, NBD commands ask for more
> data, nbdkit-vddk-plugin makes larger VixDiskLib_ReadAsynch requests,
> which at some point breaks the VMware server.  (This is fairly easy to
> solve in nbdkit-vddk-plugin or with a filter.)
>
> But nbdcopy needs to be reworked to make the input and output requests
> separate, so that nbdcopy will coalesce and split blocks as it copies.
> This is difficult.
>
> Another problem I'm finding (eg
> https://bugzilla.redhat.com/show_bug.cgi?id=2039255#c9) is that
> performance of new virt-v2v is extremely specific to input and output
> mode, and hardware and network configurations.  For reasons that I
> don't fully understand.

It would be interesting to test this patch in the same environment and see
how it affects the results.

Do we see a slowdown only when using VDDK? Maybe it is related to the slow
extent calls?

Nir



Re: [Libguestfs] [PATCH v2] v2v/v2v.ml: Use larger request size for -o rhv-upload

2022-02-14 Thread Nir Soffer
On Mon, Feb 14, 2022 at 11:56 AM Richard W.M. Jones  wrote:
>
> This change slowed things down (slightly) for me, although the change
> is within the margin of error so it probably made no difference.
>
> Before:
...
> [  79.3] Copying disk 1/1
> █ 100% []
> [  89.9] Creating output metadata

10.6 seconds...

> After:
...
> [  81.6] Copying disk 1/1
> █ 100% []
> [  91.3] Creating output metadata

9.7 seconds - 9% speedup.

We cannot compare the total time since creating a disk can take 4-16
seconds.

What kind of storage is this? Is this the local storage hack exported as NFS?
You may get much better performance with a local disk, hiding the delays of
writing to shared storage. But most oVirt users use NFS or GlusterFS.

I tested on NFS, tuned to simulate a fast NFS server.

Testing such changes should be done on a real server with real storage.
I'll try to get a server in our scale lab to do more real testing.

Nir



[Libguestfs] [PATCH v2] v2v/v2v.ml: Use larger request size for -o rhv-upload

2022-02-13 Thread Nir Soffer
Output modules can now specify request_size to override the default
request size in nbdcopy.

The rhv-upload plugin translates every NBD command to an HTTP request,
which is translated back to an NBD command on the imageio server. The
HTTP client and server, and the NBD client on the imageio server side,
are synchronous and implemented in Python, so they have high overhead
per request. To get good performance we need to use a larger request size.
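A rough cost model shows why per-request overhead dominates with small requests (a sketch; the 2 ms per-request overhead and 500 MiB/s bandwidth are illustrative assumptions, not measured values):

```python
MiB = 1024 * 1024

def copy_time(total_bytes, request_size, per_request_overhead_s, bandwidth):
    """Model copy time as a fixed per-request cost plus raw transfer time."""
    n_requests = total_bytes // request_size
    return n_requests * per_request_overhead_s + total_bytes / bandwidth

total = 1024 * MiB   # ~1 GiB written per connection, as in the stats below

t_256k = copy_time(total, 256 * 1024, 0.002, 500 * MiB)
t_4m = copy_time(total, 4 * MiB, 0.002, 500 * MiB)
print(f"256 KiB requests: {t_256k:.2f}s, 4 MiB requests: {t_4m:.2f}s")
```

With a fixed per-request cost, growing the request size 16x cuts the overhead term 16x while the raw transfer term stays constant.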

Testing shows that request size of 4 MiB speeds up the copy disk phase
from 14.7 seconds to 7.9 seconds (1.8x times faster). Request size of 8
MiB is a little bit faster but is not compatible with VDDK input.

Here are stats extracted from imageio log when importing Fedora 35 image
with 3 GiB of random data. For each copy, we have 4 connection stats.

Before:

connection 1 ops, 14.767843 s
dispatch 4023 ops, 11.427662 s
zero 38 ops, 0.053840 s, 327.91 MiB, 5.95 GiB/s
write 3981 ops, 8.975877 s, 988.61 MiB, 110.14 MiB/s
flush 4 ops, 0.001023 s

connection 1 ops, 14.770026 s
dispatch 4006 ops, 11.408732 s
zero 37 ops, 0.057205 s, 633.21 MiB, 10.81 GiB/s
write 3965 ops, 8.907420 s, 986.65 MiB, 110.77 MiB/s
flush 4 ops, 0.000280 s

connection 1 ops, 14.768180 s
dispatch 4057 ops, 11.430712 s
zero 42 ops, 0.030011 s, 470.47 MiB, 15.31 GiB/s
write 4011 ops, 9.002055 s, 996.98 MiB, 110.75 MiB/s
flush 4 ops, 0.000261 s

connection 1 ops, 14.770744 s
dispatch 4037 ops, 11.462050 s
zero 45 ops, 0.026668 s, 750.82 MiB, 27.49 GiB/s
write 3988 ops, 9.002721 s, 989.36 MiB, 109.90 MiB/s
flush 4 ops, 0.000282 s

After:

connection 1 ops, 7.940377 s
dispatch 323 ops, 6.695582 s
zero 37 ops, 0.079958 s, 641.12 MiB, 7.83 GiB/s
write 282 ops, 6.300242 s, 1.01 GiB, 164.54 MiB/s
flush 4 ops, 0.000537 s

connection 1 ops, 7.908156 s
dispatch 305 ops, 6.643475 s
zero 36 ops, 0.144166 s, 509.43 MiB, 3.45 GiB/s
write 265 ops, 6.179187 s, 941.23 MiB, 152.32 MiB/s
flush 4 ops, 0.000324 s

connection 1 ops, 7.942349 s
dispatch 325 ops, 6.744800 s
zero 45 ops, 0.185335 s, 622.19 MiB, 3.28 GiB/s
write 276 ops, 6.236819 s, 995.45 MiB, 159.61 MiB/s
flush 4 ops, 0.000369 s

connection 1 ops, 7.955572 s
dispatch 317 ops, 6.721212 s
zero 43 ops, 0.135771 s, 409.68 MiB, 2.95 GiB/s
write 270 ops, 6.326366 s, 988.26 MiB, 156.21 MiB/s
flush 4 ops, 0.001439 s
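The write lines above reduce to a simple before/after comparison (a sketch; numbers copied from the first connection of each run):

```python
# (write ops, seconds, MiB written) for the first connection of each run
before = (3981, 8.975877, 988.61)
after = (282, 6.300242, 1.01 * 1024)  # 1.01 GiB

for label, (ops, secs, mib) in (("before", before), ("after", after)):
    print(f"{label}: {mib / secs:.2f} MiB/s, {secs / ops * 1000:.2f} ms/op")
```

The throughput gain comes almost entirely from issuing ~14x fewer, larger requests, not from the per-request latency, which actually grows with the request size.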
---

Changes since v1:

- Decrease request size to 4 MiB for compatibility with VDDK input.
  (Richard)
- Reimplement in a nicer way based on
  
https://github.com/libguestfs/virt-v2v/commit/08e764959ec9dadd71a95d22d3d88d647a18d165
  (Richard)

v1 was here:
https://listman.redhat.com/archives/libguestfs/2022-February/msg00183.html

 output/output.ml|  1 +
 output/output.mli   |  2 ++
 output/output_disk.ml   |  2 ++
 output/output_glance.ml |  2 ++
 output/output_json.ml   |  2 ++
 output/output_libvirt.ml|  2 ++
 output/output_null.ml   |  2 ++
 output/output_openstack.ml  |  2 +-
 output/output_qemu.ml   |  2 ++
 output/output_rhv.ml|  2 ++
 output/output_rhv_upload.ml |  7 +++
 output/output_vdsm.ml   |  2 ++
 v2v/v2v.ml  | 11 ---
 13 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/output/output.ml b/output/output.ml
index 786ee5d5..7256b547 100644
--- a/output/output.ml
+++ b/output/output.ml
@@ -39,20 +39,21 @@ type options = {
 module type OUTPUT = sig
   type poptions
   type t
   val to_string : options -> string
   val query_output_options : unit -> unit
   val parse_options : options -> Types.source -> poptions
   val setup : string -> poptions -> Types.source -> t
   val finalize : string -> poptions -> t ->
  Types.source -> Types.inspect -> Types.target_meta ->
  unit
+  val request_size : int option
 end
 
 let error_option_cannot_be_used_in_output_mode mode opt =
   error (f_"-o %s: %s option cannot be used in this output mode") mode opt
 
 let get_disks dir =
   let rec loop acc i =
 let socket = sprintf "%s/in%d" dir i in
 if Sys.file_exists socket then (
   let size = Utils.with_nbd_connect_unix ~socket NBD.get_size in
diff --git a/output/output.mli b/output/output.mli
index eed204ed..8e3efd8e 100644
--- a/output/output.mli
+++ b/output/output.mli
@@ -52,20 +52,22 @@ module type OUTPUT = sig
 
   Set up the output mode.  Sets up a disk pipeline
   [dir // "outX"] for each output disk. *)
 
   val finalize : string -> poptions -> t ->
  Types.source -> Types.inspect -> Types.target_meta ->
  unit
   (** [finalize dir poptions t inspect target_meta]
 
   Finalizes the conversion and writes metadata. *)
+
+  val request_size : int option
 end
 
 (** Helper functions for output modes. *)
 
 val error_option_cannot_be_used_in_output_mode : string -> string -> unit
 (** [error_option_cannot_be_used_in_output_mode mode option]
 prints error message that option cannot be used in this output mode. *)
 
 val get_disks : string -> (int * int64) list
 (** Examines the v2v 

Re: [Libguestfs] [PATCH] v2v/v2v.ml: Use larger request size for -o rhv-upload

2022-02-13 Thread Nir Soffer
On Sun, Feb 13, 2022 at 5:13 PM Nir Soffer  wrote:
>
> On Sun, Feb 13, 2022 at 11:41 AM Richard W.M. Jones  wrote:
> >
> > On Sat, Feb 12, 2022 at 10:49:42PM +0200, Nir Soffer wrote:
> > > rhv-upload plugin is translating every NBD command to HTTP request,
> > > translated back to NBD command on imageio server. The HTTP client and
> > > server, and the NBD client on the imageio server side are synchronous
> > > and implemented in python, so they have high overhead per request. To
> > > get good performance we need to use larger request size.
> > >
> > > Testing shows that request size of 8MiB is best, speeding up the copy
> > > disk phase from 14.7 seconds to 7.7 seconds (1.9x times faster).
> >
> > Unfortunately this will break VDDK since it cannot handle very large
> > requests (I think 4M is about the max without reconfiguring the
> > server).
>
> Are you sure it will break VDDK?
>
> Request size limit is in VDDK API, not in the nbdkit plugin. When you
> request an 8M read, the VDDK plugin should allocate an 8M buffer,
> and issue multiple calls to VDDK APIs, using VDDK maximum
> request size to fill the buffer.
>
> If the VDDK plugin does not do this, this is a bug in the plugin, since
> it must respect the underlying API.
>
> If 8M does break the vddk plugin, we can use 4M, it is only a little
> slower than 8M but still much faster than 256k.
>
> > Also larger requests have adverse performance effects in
> > other configurations, although I understand this patch tries to
> > retrict the change to when the output mode is rhv-upload.
>
> Yes, this affects only -o rhv-upload.
>
> > We need to think of some other approach, but I'm not sure what it is.
> > I'd really like to be able to talk to imageio's NBD server directly!
>
> We have a RFE to implement a local nbd socket, which should be easy,
> but the only attention we got so far was an attempt to close it ;-)
>
> Even if we have a local only nbd socket, lets's say in oVirt 4.6, it will not
> help existing users.
>
> > Other relevant commits:
> > https://github.com/libguestfs/virt-v2v/commit/7ebb2c8db9d4d297fbbef116a9828a9dde700de6
> > https://github.com/libguestfs/virt-v2v/commit/08e764959ec9dadd71a95d22d3d88d647a18d165
>
> This looks like a nicer way to implement this change.
>
> >
> > [...]
> > > This is an ugly hack; the preferred request size should be a function of
> > > the output module that only output_rhv_upload will override, but I don't
> > > know how to implement this with the current code.
> >
> > Just add a new value to output/output.ml{,i}.  There is no
> > superclassing (this is not OO) so you'll have to add the value to
> > every output module implementation, defaulting to None.
>
> Sounds reasonable, we have only a few outputs.
>
> > However I'd like to think of another approach first.
> >
> >  - Have nbdcopy split and combine requests so request size for input
> >and output can be different?  Sounds complicated but might be
> >necessary one day to support minimum block size.
>
> I think this is not needed for the case of large request size,
> since on the NBD client/server side it is easy to handle large requests
> regardless of the underlying API limits.
>
> >  - More efficient Python plugin that might combine requests?  Also
> >complicated ...
>
> This will be slower and complicated.
>
> First we need to solve the issue of getting the right connection to handle
> the nbd command - now every command is handled by a random connection
> from the pool.
>
> Assuming we solved connection pooling, we can keep a buffer for the next
> request, and on every pwrite() fill this buffer. When the buffer is full, or
> when receiving a zero()/flush()/close() request, we need to send the
> buffer. This introduces additional copy which will slow down the transfer.
>
> Finally there is the issue of reporting errors - since we buffer nbd commands
> data, we cannot report errors for every commands,  we need to report
> errors later, either in the middle of the command (buffer becomes full)
> or when the next command is received.
>
> So this will be slower, very hard to implement, and hard to debug.
>
> The way we solved this in imageio client - copying data between nbd
> and http backends, is to implement the copy loop in the http client.
>
> In imageio we have this (simplified) copy loop:
>
> for extent in extents:
>     if zero:
>         dst.zero(extent)
>     else:
>         copy(src, dst, extent)
>
> copy() is checking if the src or dst can do efficient copy,

Re: [Libguestfs] [PATCH] v2v/v2v.ml: Use larger request size for -o rhv-upload

2022-02-13 Thread Nir Soffer
On Sun, Feb 13, 2022 at 11:41 AM Richard W.M. Jones  wrote:
>
> On Sat, Feb 12, 2022 at 10:49:42PM +0200, Nir Soffer wrote:
> > rhv-upload plugin is translating every NBD command to HTTP request,
> > translated back to NBD command on imageio server. The HTTP client and
> > server, and the NBD client on the imageio server side are synchronous
> > and implemented in python, so they have high overhead per request. To
> > get good performance we need to use larger request size.
> >
> > Testing shows that request size of 8MiB is best, speeding up the copy
> > disk phase from 14.7 seconds to 7.7 seconds (1.9x times faster).
>
> Unfortunately this will break VDDK since it cannot handle very large
> requests (I think 4M is about the max without reconfiguring the
> server).

Are you sure it will break VDDK?

Request size limit is in VDDK API, not in the nbdkit plugin. When you
request an 8M read, the VDDK plugin should allocate an 8M buffer,
and issue multiple calls to VDDK APIs, using VDDK maximum
request size to fill the buffer.

If the VDDK plugin does not do this, this is a bug in the plugin, since
it must respect the underlying API.

If 8M does break the VDDK plugin, we can use 4M; it is only a little
slower than 8M but still much faster than 256k.

> Also larger requests have adverse performance effects in
> other configurations, although I understand this patch tries to
> retrict the change to when the output mode is rhv-upload.

Yes, this affects only -o rhv-upload.

> We need to think of some other approach, but I'm not sure what it is.
> I'd really like to be able to talk to imageio's NBD server directly!

We have an RFE to implement a local nbd socket, which should be easy,
but the only attention we got so far was an attempt to close it ;-)

Even if we have a local-only NBD socket, let's say in oVirt 4.6, it will not
help existing users.

> Other relevant commits:
> https://github.com/libguestfs/virt-v2v/commit/7ebb2c8db9d4d297fbbef116a9828a9dde700de6
> https://github.com/libguestfs/virt-v2v/commit/08e764959ec9dadd71a95d22d3d88d647a18d165

This looks like a nicer way to implement this change.

>
> [...]
> > This is an ugly hack; the preferred request size should be a function of
> > the output module that only output_rhv_upload will override, but I don't
> > know how to implement this with the current code.
>
> Just add a new value to output/output.ml{,i}.  There is no
> superclassing (this is not OO) so you'll have to add the value to
> every output module implementation, defaulting to None.

Sounds reasonable, we have only a few outputs.

> However I'd like to think of another approach first.
>
>  - Have nbdcopy split and combine requests so request size for input
>and output can be different?  Sounds complicated but might be
>necessary one day to support minimum block size.

I think this is not needed for the case of large request size,
since on the nbd client/server side it is easy to handle large requests
regardless of the underlying API limits.

>  - More efficient Python plugin that might combine requests?  Also
>complicated ...

This will be slower and complicated.

First we need to solve the issue of getting the right connection to handle
the nbd command - now every command is handled by a random connection
from the pool.

Assuming we solved connection pooling, we can keep a buffer for the next
request, and on every pwrite() fill this buffer. When the buffer is full, or
when receiving a zero()/flush()/close() request, we need to send the
buffer. This introduces an additional copy which will slow down the transfer.

Finally there is the issue of reporting errors: since we buffer nbd command
data, we cannot report errors for every command; we need to report
errors later, either in the middle of a command (when the buffer becomes full)
or when the next command is received.

So this will be slower, very hard to implement, and hard to debug.
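
A minimal sketch of the problem being described (illustrative only; `BufferedWriter` is a hypothetical class, not imageio or plugin code). Note that an exception from `dst.write()` would surface during a *later* `write()` or at `flush()`, not for the command that produced the buffered data:

```python
class BufferedWriter:
    """Coalesce small writes into larger ones before sending them on."""

    def __init__(self, dst, bufsize):
        self.dst = dst          # anything with a write(bytes) method
        self.bufsize = bufsize
        self.buf = bytearray()

    def write(self, data):
        # An error raised by flush() here belongs to earlier, already
        # acknowledged commands - the deferred error-reporting problem.
        self.buf += data
        if len(self.buf) >= self.bufsize:
            self.flush()

    def flush(self):
        if self.buf:
            self.dst.write(bytes(self.buf))
            del self.buf[:]
```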

The way we solved this in the imageio client (copying data between nbd
and http backends) is to implement the copy loop in the http client.

In imageio we have this (simplified) copy loop:

for extent in extents:
    if zero:
        dst.zero(extent)
    else:
        copy(src, dst, extent)

copy() checks if the src or dst can do an efficient copy,
and if not it falls back to a generic copy:

if hasattr(dst, "read_from"):
    dst.read_from(src, extent)
elif hasattr(src, "write_to"):
    src.write_to(dst, extent)
else:
    for chunk in extent:
        src.read(chunk)
        dst.write(chunk)

The http backend implements read_from like this:

send put request headers
for chunk in extent:
    src.read(chunk)
    socket.write(chunk)
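
A runnable sketch of this pattern, with the NBD source and the HTTP socket abstracted as callables (both names are illustrative; the PUT headers are assumed to have been sent once for the whole extent). The body of one large HTTP request is streamed from many small reads, so each side keeps its own preferred request size:

```python
def read_from(src_read, sock_write, offset, length, chunk_size=256 * 1024):
    """Stream one large HTTP request body from many small NBD reads."""
    todo = length
    while todo > 0:
        n = min(todo, chunk_size)
        sock_write(src_read(offset, n))  # nbd-sized read, http-sized body
        offset += n
        todo -= n
```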

In this way we can use the most efficient request size for the nbd
input (256k), for the http

[Libguestfs] [PATCH] v2v/v2v.ml: Use larger request size for -o rhv-upload

2022-02-12 Thread Nir Soffer
rhv-upload plugin is translating every NBD command to HTTP request,
translated back to NBD command on imageio server. The HTTP client and
server, and the NBD client on the imageio server side are synchronous
and implemented in python, so they have high overhead per request. To
get good performance we need to use larger request size.

Testing shows that request size of 8MiB is best, speeding up the copy
disk phase from 14.7 seconds to 7.7 seconds (1.9x times faster).

Here are stats extracted from imageio log when importing Fedora 35 image
with 3 GiB of random data. For each copy, we have 4 connection stats.

Before:

connection 1 ops, 14.767843 s
dispatch 4023 ops, 11.427662 s
zero 38 ops, 0.053840 s, 327.91 MiB, 5.95 GiB/s
write 3981 ops, 8.975877 s, 988.61 MiB, 110.14 MiB/s
flush 4 ops, 0.001023 s

connection 1 ops, 14.770026 s
dispatch 4006 ops, 11.408732 s
zero 37 ops, 0.057205 s, 633.21 MiB, 10.81 GiB/s
write 3965 ops, 8.907420 s, 986.65 MiB, 110.77 MiB/s
flush 4 ops, 0.000280 s

connection 1 ops, 14.768180 s
dispatch 4057 ops, 11.430712 s
zero 42 ops, 0.030011 s, 470.47 MiB, 15.31 GiB/s
write 4011 ops, 9.002055 s, 996.98 MiB, 110.75 MiB/s
flush 4 ops, 0.000261 s

connection 1 ops, 14.770744 s
dispatch 4037 ops, 11.462050 s
zero 45 ops, 0.026668 s, 750.82 MiB, 27.49 GiB/s
write 3988 ops, 9.002721 s, 989.36 MiB, 109.90 MiB/s
flush 4 ops, 0.000282 s

After:

connection 1 ops, 7.776159 s
dispatch 181 ops, 6.701100 s
zero 27 ops, 0.219959 s, 5.97 MiB, 27.15 MiB/s
write 150 ops, 6.266066 s, 983.13 MiB, 156.90 MiB/s
flush 4 ops, 0.000299 s

connection 1 ops, 7.805616 s
dispatch 187 ops, 6.643718 s
zero 30 ops, 0.227808 s, 809.01 MiB, 3.47 GiB/s
write 153 ops, 6.306260 s, 1.02 GiB, 165.81 MiB/s
flush 4 ops, 0.000306 s

connection 1 ops, 7.780301 s
dispatch 191 ops, 6.535249 s
zero 47 ops, 0.228495 s, 693.31 MiB, 2.96 GiB/s
write 140 ops, 6.033484 s, 958.23 MiB, 158.82 MiB/s
flush 4 ops, 0.001618 s

connection 1 ops, 7.829294 s
dispatch 213 ops, 6.594207 s
zero 56 ops, 0.297876 s, 674.12 MiB, 2.21 GiB/s
write 153 ops, 6.070786 s, 974.56 MiB, 160.53 MiB/s
flush 4 ops, 0.000318 s

This is an ugly hack; the preferred request size should be a function of
the output module that only output_rhv_upload will override, but I don't
know how to implement this with the current code.

Another way is to add this as an output option; this will make it easier
to test and find the best setting that works in a real environment, or
tweak the value in a specific environment if needed.
---
 v2v/v2v.ml | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/v2v/v2v.ml b/v2v/v2v.ml
index fddb0742..b21e2737 100644
--- a/v2v/v2v.ml
+++ b/v2v/v2v.ml
@@ -578,37 +578,45 @@ read the man page virt-v2v(1).
 let input_socket = sprintf "%s/in%d" tmpdir i
 and output_socket = sprintf "%s/out%d" tmpdir i in
 if Sys.file_exists input_socket && Sys.file_exists output_socket then
   loop ((i, input_socket, output_socket) :: acc) (i+1)
 else
   List.rev acc
   in
   let disks = loop [] 0 in
   let nr_disks = List.length disks in
 
+  (* XXX This is a hack for -o rhv-upload that works best with larger
+   * request size.
+   *)
+  let request_size =
+match output_mode with
+| `RHV_Upload -> 8*1024*1024
+| _ -> 0 in
+
   (* Copy the disks. *)
   List.iter (
 fun (i, input_socket, output_socket) ->
   message (f_"Copying disk %d/%d") (i+1) nr_disks;
 
   let input_uri = nbd_uri_of_socket input_socket
   and output_uri = nbd_uri_of_socket output_socket in
 
   (* In verbose mode print some information about each
* side of the pipeline.
*)
   if verbose () then (
 nbdinfo ~content:true input_uri;
 nbdinfo ~content:false output_uri
   );
 
-  nbdcopy output_alloc input_uri output_uri
+  nbdcopy output_alloc input_uri output_uri request_size
   ) disks;
 
   (* End of copying phase. *)
   unlink (tmpdir // "copy");
 
   (* Do the finalization step. *)
   message (f_"Creating output metadata");
   Output_module.finalize tmpdir output_poptions output_t
 source inspect target_meta;
 
@@ -627,26 +635,28 @@ read the man page virt-v2v(1).
  * appliance may be created there.  (RHBZ#1316479, RHBZ#2051394)
  *)
 and check_host_free_space () =
   let free_space = StatVFS.free_space (StatVFS.statvfs large_tmpdir) in
   debug "check_host_free_space: large_tmpdir=%s free_space=%Ld"
 large_tmpdir free_space;
   if free_space < 1_073_741_824L then
 error (f_"insufficient free space in the conversion server temporary 
directory %s (%s).\n\nEither free up space in that directory, or set the 
LIBGUESTFS_CACHEDIR environment variable to point to another directory with 
more than 1GB of free space.\n\nSee also the virt-v2v(1) manual, section 
\"Minimum free space check in the host\".")
   large_tmpdir (human_size free_space)
 
-and nbdcopy output_alloc input_uri output_uri =
+and nbdcopy output_alloc input_uri 

Re: [Libguestfs] [PATCH libnbd v2 1/9] golang: tests: Add test for AioBuffer

2022-02-11 Thread Nir Soffer
On Fri, Feb 11, 2022 at 1:22 PM Richard W.M. Jones  wrote:
>
> On Fri, Feb 11, 2022 at 03:21:21AM +0200, Nir Soffer wrote:
> > Add unit tests and benchmarks for AioBuffer. The tests are trivial but
> > they serve as running documentation, and they point out important
> > details about the type.
> >
> > The benchmarks show the efficiency of allocating a new buffer, zeroing
> > it, and interfacing with Go code.
> >
> > These tests will also ensure that we don't break anything by the next
> > changes.
> >
> > To run the benchmarks use:
> >
> > $ go test -run=xxx -bench=.
> [...]
> > +# Run the benchmarks with 10 milliseconds timeout to make sure they do
> > +# not break by mistake, without overloading the CI. For performance
> > +# testing run "go test" directly.
> > +$GOLANG test -run=XXX -bench=. -benchtime=10ms
>
> -run param is a regexp matching the names of the tests to run.  It
> might be best to use something like this instead:
>
>   go test -run= -bench=.
>
> because elsewhere we use "XXX" to mark code that needs to be fixed.

The intent of this command is to run only the benchmark, using -run=XXX
to match no test. I agree this is a poor choice for this project since we use
XXX for other purposes.

>
> Apart from this the whole series seems fine to me, ACK.

Thanks, I'll push this with a better regex.

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



[Libguestfs] [PATCH libnbd v2 6/9] golang: tests: Use AioBuffer.Slice()

2022-02-10 Thread Nir Soffer
Slice() is easier to use and faster than Get() or Bytes(). Let's use the
new way.

Signed-off-by: Nir Soffer 
---
 golang/libnbd_020_aio_buffer_test.go | 8 +---
 golang/libnbd_500_aio_pread_test.go  | 2 +-
 golang/libnbd_510_aio_pwrite_test.go | 8 +---
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/golang/libnbd_020_aio_buffer_test.go b/golang/libnbd_020_aio_buffer_test.go
index 4b1c5f93..e07f8973 100644
--- a/golang/libnbd_020_aio_buffer_test.go
+++ b/golang/libnbd_020_aio_buffer_test.go
@@ -22,22 +22,23 @@ import (
"bytes"
"testing"
 )
 
 func TestAioBuffer(t *testing.T) {
/* Create a buffer with uninitialized backing array. */
buf := MakeAioBuffer(uint(32))
defer buf.Free()
 
/* Initialize backing array contents. */
+   s := buf.Slice()
for i := uint(0); i < buf.Size; i++ {
-   *buf.Get(i) = 0
+   s[i] = 0
}
 
	/* Create a slice by copying the backing array contents into Go memory. */
b := buf.Bytes()
 
zeroes := make([]byte, 32)
if !bytes.Equal(b, zeroes) {
t.Fatalf("Expected %v, got %v", zeroes, buf.Bytes())
}
 
@@ -45,21 +46,21 @@ func TestAioBuffer(t *testing.T) {
for i := 0; i < len(b); i++ {
b[i] = 42
}
 
/* Bytes() still returns zeroes. */
if !bytes.Equal(buf.Bytes(), zeroes) {
t.Fatalf("Expected %v, got %v", zeroes, buf.Bytes())
}
 
/* Creating a slice without copying the underlying buffer. */
-   s := buf.Slice()
+   s = buf.Slice()
if !bytes.Equal(s, zeroes) {
t.Fatalf("Expected %v, got %v", zeroes, s)
}
 
/* Modifying the slice modifies the underlying buffer. */
for i := 0; i < len(s); i++ {
s[i] = 42
}
 
if !bytes.Equal(buf.Slice(), s) {
@@ -154,22 +155,23 @@ func BenchmarkMakeAioBufferZero(b *testing.B) {
for i := 0; i < b.N; i++ {
buf := MakeAioBufferZero(bufferSize)
buf.Free()
}
 }
 
 // Benchmark zeroing a buffer.
 func BenchmarkAioBufferZero(b *testing.B) {
for i := 0; i < b.N; i++ {
buf := MakeAioBuffer(bufferSize)
+   s := buf.Slice()
for i := uint(0); i < bufferSize; i++ {
-   *buf.Get(i) = 0
+   s[i] = 0
}
buf.Free()
}
 }
 
 // Benchmark creating a buffer by copying a Go slice.
 func BenchmarkFromBytes(b *testing.B) {
for i := 0; i < b.N; i++ {
zeroes := make([]byte, bufferSize)
buf := FromBytes(zeroes)
diff --git a/golang/libnbd_500_aio_pread_test.go b/golang/libnbd_500_aio_pread_test.go
index 0811378c..bd0208ef 100644
--- a/golang/libnbd_500_aio_pread_test.go
+++ b/golang/libnbd_500_aio_pread_test.go
@@ -55,14 +55,14 @@ func Test500AioPRead(t *testing.T) {
}
h.Poll(-1)
}
 
// Expected data.
expected := make([]byte, 512)
for i := 0; i < 512; i += 8 {
binary.BigEndian.PutUint64(expected[i:i+8], uint64(i))
}
 
-   if !bytes.Equal(buf.Bytes(), expected) {
+   if !bytes.Equal(buf.Slice(), expected) {
t.Fatalf("did not read expected data")
}
 }
diff --git a/golang/libnbd_510_aio_pwrite_test.go b/golang/libnbd_510_aio_pwrite_test.go
index 56cdcb05..493159f2 100644
--- a/golang/libnbd_510_aio_pwrite_test.go
+++ b/golang/libnbd_510_aio_pwrite_test.go
@@ -32,23 +32,25 @@ func Test510AioPWrite(t *testing.T) {
"nbdkit", "-s", "--exit-with-parent", "-v",
"memory", "size=512",
})
if err != nil {
t.Fatalf("could not connect: %s", err)
}
 
/* Write a pattern and read it back. */
buf := MakeAioBuffer(512)
defer buf.Free()
+
+   s := buf.Slice()
for i := 0; i < 512; i += 2 {
-   *buf.Get(uint(i)) = 0x55
-   *buf.Get(uint(i + 1)) = 0xAA
+   s[i] = 0x55
+   s[i+1] = 0xAA
}
 
var cookie uint64
cookie, err = h.AioPwrite(buf, 0, nil)
if err != nil {
t.Fatalf("%s", err)
}
for {
var b bool
b, err = h.AioCommandCompleted(cookie)
@@ -62,14 +64,14 @@ func Test510AioPWrite(t *testing.T) {
}
 
/* We already tested aio_pread, let's just read the data
back in the regular synchronous way. */
buf2 := make([]byte, 512)
err = h.Pread(buf2, 0, nil)
if err != nil {
t.Fatalf("%s", err)
}
 
-   if !bytes.Equal(buf.Bytes(), buf2) {
+   if !bytes.

[Libguestfs] [PATCH libnbd v2 5/9] golang: aio_buffer.go: Add Slice()

2022-02-10 Thread Nir Soffer
AioBuffer.Bytes() cannot be used for copying images from NBD to other
APIs because it copies the entire image. Add a new Slice() function,
creating a slice backed by the underlying buffer.

Using Slice() is efficient, but less safe, like Get(). The returned
slice must be used only before calling Free(). This should not be an
issue with typical code.

Testing shows that Slice() is much faster than Bytes() for typical 256k
buffer:

BenchmarkAioBufferBytes-12 86616 16529 ns/op
BenchmarkAioBufferSlice-12  1000000000   0.4630 ns/op

I modified the aio_copy example to use AioBuffer and compiled 2
versions, one using Bytes() and one using Slice(). When copying 6g fully
allocated image, the version using Slice() is 1.6 times faster.
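
The copy-vs-view distinction has a close analogy in plain Python, which may help readers who do not write Go: `bytes()` copies like `Bytes()`, while a `memoryview` is a zero-copy window like `Slice()`. This is only an analogy, not the libnbd binding:

```python
buf = bytearray(32)        # stands in for the C-allocated backing array

view = memoryview(buf)     # like Slice(): a view, no copy
view[0] = 0x55             # writes through to the underlying buffer

snapshot = bytes(buf)      # like Bytes(): an independent copy
buf[1] = 0xAA              # does not affect the earlier snapshot
```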

$ hyperfine -r3 "./aio_copy-bytes $SRC >/dev/null" "./aio_copy-slice $SRC >/dev/null"

Benchmark 1: ./aio_copy-bytes nbd+unix:///?socket=/tmp/src.sock >/dev/null
  Time (mean ± σ):  3.357 s ±  0.039 s[User: 2.656 s, System: 1.162 s]
  Range (min … max):3.313 s …  3.387 s3 runs

Benchmark 2: ./aio_copy-slice nbd+unix:///?socket=/tmp/src.sock >/dev/null
  Time (mean ± σ):  2.046 s ±  0.009 s[User: 0.423 s, System: 0.892 s]
  Range (min … max):2.037 s …  2.055 s3 runs

Summary
  './aio_copy-slice nbd+unix:///?socket=/tmp/src.sock >/dev/null' ran
1.64 ± 0.02 times faster than './aio_copy-bytes nbd+unix:///?socket=/tmp/src.sock >/dev/null'

When copying a 6g empty image (qemu-nbd sends one hole chunk for every
read), the version using Slice() is 2.6 times faster.

$ hyperfine -r3 "./aio_copy-bytes $SRC >/dev/null" "./aio_copy-slice $SRC >/dev/null"
Benchmark 1: ./aio_copy-bytes nbd+unix:///?socket=/tmp/src.sock >/dev/null
  Time (mean ± σ):  1.210 s ±  0.023 s[User: 1.428 s, System: 0.345 s]
  Range (min … max):1.191 s …  1.235 s3 runs

Benchmark 2: ./aio_copy-slice nbd+unix:///?socket=/tmp/src.sock >/dev/null
  Time (mean ± σ): 461.4 ms ±  13.1 ms[User: 394.2 ms, System: 76.6 ms]
  Range (min … max):   450.6 ms … 476.0 ms3 runs

Summary
  './aio_copy-slice nbd+unix:///?socket=/tmp/src.sock >/dev/null' ran
2.62 ± 0.09 times faster than './aio_copy-bytes nbd+unix:///?socket=/tmp/src.sock >/dev/null'

Signed-off-by: Nir Soffer 
---
 golang/aio_buffer.go |  9 ++
 golang/libnbd_020_aio_buffer_test.go | 41 
 2 files changed, 50 insertions(+)

diff --git a/golang/aio_buffer.go b/golang/aio_buffer.go
index d2e6e350..008d9ae0 100644
--- a/golang/aio_buffer.go
+++ b/golang/aio_buffer.go
@@ -71,16 +71,25 @@ func (b *AioBuffer) Free() {
}
 }
 
 // Bytes copies the underlying C array to Go allocated memory and return a
 // slice. Modifying the returned slice does not modify the underlying buffer
 // backing array.
 func (b *AioBuffer) Bytes() []byte {
return C.GoBytes(b.P, C.int(b.Size))
 }
 
+// Slice creates a slice backed by the underlying C array. The slice can be
+// used to access or modify the contents of the underlying array. The slice
+// must not be used after calling Free().
+func (b *AioBuffer) Slice() []byte {
+   // See https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices
+   // TODO: Use unsafe.Slice() when we require Go 1.17.
+   return (*[1<<30]byte)(b.P)[:b.Size:b.Size]
+}
+
 // Get returns a pointer to a byte in the underlying C array. The pointer can
 // be used to modify the underlying array. The pointer must not be used after
 // calling Free().
 func (b *AioBuffer) Get(i uint) *byte {
return (*byte)(unsafe.Pointer(uintptr(b.P) + uintptr(i)))
 }
diff --git a/golang/libnbd_020_aio_buffer_test.go b/golang/libnbd_020_aio_buffer_test.go
index b3a2a8d9..4b1c5f93 100644
--- a/golang/libnbd_020_aio_buffer_test.go
+++ b/golang/libnbd_020_aio_buffer_test.go
@@ -44,20 +44,35 @@ func TestAioBuffer(t *testing.T) {
/* Modifying returned slice does not modify the buffer. */
for i := 0; i < len(b); i++ {
b[i] = 42
}
 
/* Bytes() still returns zeroes. */
if !bytes.Equal(buf.Bytes(), zeroes) {
t.Fatalf("Expected %v, got %v", zeroes, buf.Bytes())
}
 
+   /* Creating a slice without copying the underlying buffer. */
+   s := buf.Slice()
+   if !bytes.Equal(s, zeroes) {
+   t.Fatalf("Expected %v, got %v", zeroes, s)
+   }
+
+   /* Modifying the slice modifies the underlying buffer. */
+   for i := 0; i < len(s); i++ {
+   s[i] = 42
+   }
+
+   if !bytes.Equal(buf.Slice(), s) {
+   t.Fatalf("Expected %v, got %v", s, buf.Slice())
+   }
+
/* Create another buffer from Go slice. */
buf2 := FromBytes(zeroes)
defer buf2.Free()
 
if !bytes.Equal(buf2.Bytes(), zeroes) {
  

[Libguestfs] [PATCH libnbd v2 9/9] golang: examples: aio_copy: Simplify using AioBuffer

2022-02-10 Thread Nir Soffer
Now that we have an efficient way to use AioBuffer, we don't need the
hacks to create AioBuffer from a Go slice.

Benchmarking AioBuffer shows that allocating a 256k buffer is
practically free, so there is no need for the buffer pool. Now we
allocate a new buffer per request, keep it in the command, and free it
when the request is finished.

Signed-off-by: Nir Soffer 
---
 golang/examples/aio_copy/aio_copy.go | 29 +---
 1 file changed, 5 insertions(+), 24 deletions(-)

diff --git a/golang/examples/aio_copy/aio_copy.go b/golang/examples/aio_copy/aio_copy.go
index b6f5def1..bb20b478 100644
--- a/golang/examples/aio_copy/aio_copy.go
+++ b/golang/examples/aio_copy/aio_copy.go
@@ -37,53 +37,43 @@
 // Example:
 //
 //   ./aio_copy nbd+unix:///?socket=/tmp.nbd >/dev/null
 //
 package main
 
 import (
"container/list"
"flag"
"os"
-   "sync"
"syscall"
-   "unsafe"
 
"libguestfs.org/libnbd"
 )
 
 var (
// These options give best performance with fast NVMe drive.
	requestSize = flag.Uint("request-size", 256*1024, "maximum request size in bytes")
	requests    = flag.Uint("requests", 4, "maximum number of requests in flight")
 
h *libnbd.Libnbd
 
// Keeping commands in a queue ensures commands are written in the right
// order, even if they complete out of order. This allows parallel reads
// with non-seekable output.
queue list.List
-
-   // Buffer pool allocating buffers as needed and reusing them.
-   bufPool = sync.Pool{
-   New: func() interface{} {
-   return make([]byte, *requestSize)
-   },
-   }
 )
 
 // command keeps state of single AioPread call while the read is handled by
 // libnbd, until the command reach the front of the queue and can be writen to
 // the output.
 type command struct {
-   buf[]byte
-   length uint
+   buflibnbd.AioBuffer
ready  bool
 }
 
 func main() {
flag.Parse()
 
var err error
 
h, err = libnbd.Create()
if err != nil {
@@ -139,60 +129,51 @@ func waitForCompletion() {
panic(err)
}
 
if inflightRequests() < start {
break // A read completed.
}
}
 }
 
 func startRead(offset uint64, length uint) {
-   buf := bufPool.Get().([]byte)
-
-   // Keep buffer in command so we can put it back into the pool when the
-   // command completes.
-   cmd := &command{buf: buf, length: length}
-
-   // Create aio buffer from pool buffer to avoid unneeded allocation for
-   // every read, and unneeded copy when completing the read.
-   abuf := libnbd.AioBuffer{P: unsafe.Pointer(&buf[0]), Size: length}
+   cmd := &command{buf: libnbd.MakeAioBuffer(length)}
 
args := libnbd.AioPreadOptargs{
CompletionCallbackSet: true,
CompletionCallback: func(error *int) int {
if *error != 0 {
			// This is not documented, but *error is errno value translated
			// from the the NBD server error.
err := syscall.Errno(*error).Error()
panic(err)
}
cmd.ready = true
return 1
},
}
 
-   _, err := h.AioPread(abuf, offset, &args)
+   _, err := h.AioPread(cmd.buf, offset, &args)
if err != nil {
panic(err)
}
 
queue.PushBack(cmd)
 }
 
 func readReady() bool {
return queue.Len() > 0 && queue.Front().Value.(*command).ready
 }
 
 func finishRead() {
e := queue.Front()
queue.Remove(e)
 
cmd := e.Value.(*command)
-   b := cmd.buf[:cmd.length]
 
-   _, err := os.Stdout.Write(b)
+   _, err := os.Stdout.Write(cmd.buf.Slice())
if err != nil {
panic(err)
}
 
-   bufPool.Put(cmd.buf)
+   cmd.buf.Free()
 }
-- 
2.34.1




[Libguestfs] [PATCH libnbd v2 7/9] golang: aio_buffer.go: Speed up FromBytes()

2022-02-10 Thread Nir Soffer
Using Slice() we can use builtin copy() instead of a manual loop, which
is 4.6 times faster with a typical 256k buffer:

Before:
BenchmarkFromBytes-12   9806   111474 ns/op

After:
BenchmarkFromBytes-12  48193 24106 ns/op
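
The speedup is the classic per-element loop versus bulk copy difference. A Python analogy (illustrative only, not the Go code) of the two FromBytes() strategies:

```python
src = bytes(range(256)) * 1024   # 256 KiB of test data

# What FromBytes() did before: copy one byte at a time.
dst_loop = bytearray(len(src))
for i in range(len(src)):
    dst_loop[i] = src[i]

# What Slice() enables: one bulk copy, like Go's builtin copy().
dst_bulk = bytearray(len(src))
dst_bulk[:] = src
```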

Signed-off-by: Nir Soffer 
---
 golang/aio_buffer.go | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/golang/aio_buffer.go b/golang/aio_buffer.go
index 008d9ae0..52ea54de 100644
--- a/golang/aio_buffer.go
+++ b/golang/aio_buffer.go
@@ -47,25 +47,22 @@ func MakeAioBuffer(size uint) AioBuffer {
 
 // MakeAioBuffer makes a new buffer backed by a C allocated array. The
 // underlying buffer is set to zero.
 func MakeAioBufferZero(size uint) AioBuffer {
return AioBuffer{C.calloc(C.ulong(1), C.ulong(size)), size}
 }
 
 // FromBytes makes a new buffer backed by a C allocated array, initialized by
 // copying the given Go slice.
 func FromBytes(buf []byte) AioBuffer {
-   size := len(buf)
-   ret := MakeAioBuffer(uint(size))
-   for i := 0; i < len(buf); i++ {
-   *ret.Get(uint(i)) = buf[i]
-   }
+   ret := MakeAioBuffer(uint(len(buf)))
+   copy(ret.Slice(), buf)
return ret
 }
 
 // Free deallocates the underlying C allocated array. Using the buffer after
 // Free() will panic.
 func (b *AioBuffer) Free() {
if b.P != nil {
C.free(b.P)
b.P = nil
}
-- 
2.34.1




[Libguestfs] [PATCH libnbd v2 8/9] golang: aio_buffer.go: Benchmark copy flows

2022-02-10 Thread Nir Soffer
Add benchmark for coping a buffer using 3 strategies - reusing same
buffer, making a new uninitialized buffer per copy, and using a zeroed
buffer per copy. This benchmark is the worst possible case, copying a
buffer to memory. Any real I/O will be much slower, hiding the overhead
of allocating or zeroing buffers.

$ go test -run=AioBuffer -bench=Copy -benchtime=5s
goos: linux
goarch: amd64
pkg: libguestfs.org/libnbd
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
BenchmarkAioBufferCopyBaseline-12  1142508  4523 ns/op
BenchmarkAioBufferCopyMake-12100  5320 ns/op
BenchmarkAioBufferCopyMakeZero-12 728940  8218 ns/op

Signed-off-by: Nir Soffer 
---
 golang/libnbd_020_aio_buffer_test.go | 32 
 1 file changed, 32 insertions(+)

diff --git a/golang/libnbd_020_aio_buffer_test.go b/golang/libnbd_020_aio_buffer_test.go
index e07f8973..f38866e7 100644
--- a/golang/libnbd_020_aio_buffer_test.go
+++ b/golang/libnbd_020_aio_buffer_test.go
@@ -195,10 +195,42 @@ func BenchmarkAioBufferBytes(b *testing.B) {
 func BenchmarkAioBufferSlice(b *testing.B) {
buf := MakeAioBuffer(bufferSize)
defer buf.Free()
var r int
 
b.ResetTimer()
for i := 0; i < b.N; i++ {
r += len(buf.Slice())
}
 }
+
+var data = make([]byte, bufferSize)
+
+// Benchmark copying into same buffer, used as baseline for CopyMake and
+// CopyMakeZero benchmarks.
+func BenchmarkAioBufferCopyBaseline(b *testing.B) {
+   buf := MakeAioBufferZero(bufferSize)
+   defer buf.Free()
+
+   b.ResetTimer()
+   for i := 0; i < b.N; i++ {
+   copy(buf.Slice(), data)
+   }
+}
+
+// Benchmark overhead of making a new buffer per read.
+func BenchmarkAioBufferCopyMake(b *testing.B) {
+   for i := 0; i < b.N; i++ {
+   buf := MakeAioBuffer(bufferSize)
+   copy(buf.Slice(), data)
+   buf.Free()
+   }
+}
+
+// Benchmark overhead of making a new zero buffer per read.
+func BenchmarkAioBufferCopyMakeZero(b *testing.B) {
+   for i := 0; i < b.N; i++ {
+   buf := MakeAioBufferZero(bufferSize)
+   copy(buf.Slice(), data)
+   buf.Free()
+   }
+}
-- 
2.34.1




[Libguestfs] [PATCH libnbd v2 4/9] golang: aio_buffer.go: Add MakeAioBufferZero()

2022-02-10 Thread Nir Soffer
Make it easy to create a zeroed buffer via calloc(), preventing leaking
sensitive info from the heap.

Benchmarking shows that creating a zeroed buffer is much slower compared
with uninitialized buffer, but much faster compared with manually
initializing the buffer with a loop.

BenchmarkMakeAioBuffer-12  7252674   148.1 ns/op
BenchmarkMakeAioBufferZero-12 262107  4181 ns/op
BenchmarkAioBufferZero-12  17581 68759 ns/op

It is interesting that creating a zeroed buffer is 3 times faster
compared with making a new []byte slice:

BenchmarkMakeAioBufferZero-12 247710  4440 ns/op
BenchmarkMakeByteSlice-12  84117 13733 ns/op
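
An analogy in Python terms (illustrative only; cgo's calloc/malloc are not involved here): an allocation that arrives zeroed, like calloc(), versus zeroing an existing buffer with a per-byte loop, which is what the slowest benchmark above measures:

```python
n = 256 * 1024

zeroed = bytearray(n)            # arrives zeroed, like MakeAioBufferZero()

dirty = bytearray(b"\xff") * n   # an existing, non-zero buffer
for i in range(n):               # manual per-byte zeroing, the slow path
    dirty[i] = 0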

Signed-off-by: Nir Soffer 
---
 golang/aio_buffer.go |  6 ++
 golang/libnbd_020_aio_buffer_test.go | 16 
 2 files changed, 22 insertions(+)

diff --git a/golang/aio_buffer.go b/golang/aio_buffer.go
index 2cd8ceb2..d2e6e350 100644
--- a/golang/aio_buffer.go
+++ b/golang/aio_buffer.go
@@ -38,20 +38,26 @@ type AioBuffer struct {
Punsafe.Pointer
Size uint
 }
 
 // MakeAioBuffer makes a new buffer backed by an uninitialized C allocated
 // array.
 func MakeAioBuffer(size uint) AioBuffer {
return AioBuffer{C.malloc(C.ulong(size)), size}
 }
 
+// MakeAioBuffer makes a new buffer backed by a C allocated array. The
+// underlying buffer is set to zero.
+func MakeAioBufferZero(size uint) AioBuffer {
+   return AioBuffer{C.calloc(C.ulong(1), C.ulong(size)), size}
+}
+
 // FromBytes makes a new buffer backed by a C allocated array, initialized by
 // copying the given Go slice.
 func FromBytes(buf []byte) AioBuffer {
size := len(buf)
ret := MakeAioBuffer(uint(size))
for i := 0; i < len(buf); i++ {
*ret.Get(uint(i)) = buf[i]
}
return ret
 }
diff --git a/golang/libnbd_020_aio_buffer_test.go b/golang/libnbd_020_aio_buffer_test.go
index cec74ddc..b3a2a8d9 100644
--- a/golang/libnbd_020_aio_buffer_test.go
+++ b/golang/libnbd_020_aio_buffer_test.go
@@ -51,20 +51,28 @@ func TestAioBuffer(t *testing.T) {
t.Fatalf("Expected %v, got %v", zeroes, buf.Bytes())
}
 
/* Create another buffer from Go slice. */
buf2 := FromBytes(zeroes)
defer buf2.Free()
 
if !bytes.Equal(buf2.Bytes(), zeroes) {
t.Fatalf("Expected %v, got %v", zeroes, buf2.Bytes())
}
+
+   /* Create a zeroed buffer. */
+   buf3 := MakeAioBufferZero(uint(32))
+   defer buf3.Free()
+
+   if !bytes.Equal(buf3.Bytes(), zeroes) {
+   t.Fatalf("Expected %v, got %v", zeroes, buf3.Bytes())
+   }
 }
 
 func TestAioBufferFree(t *testing.T) {
buf := MakeAioBuffer(uint(32))
 
/* Free the underlying C array. */
buf.Free()
 
/* And clear the pointer. */
if buf.P != nil {
@@ -105,20 +113,28 @@ func TestAioBufferGetAfterFree(t *testing.T) {
 const bufferSize uint = 256 * 1024
 
 // Benchmark creating an uninitialized buffer.
 func BenchmarkMakeAioBuffer(b *testing.B) {
for i := 0; i < b.N; i++ {
buf := MakeAioBuffer(bufferSize)
buf.Free()
}
 }
 
+// Benchmark creating zeroed buffer.
+func BenchmarkMakeAioBufferZero(b *testing.B) {
+   for i := 0; i < b.N; i++ {
+   buf := MakeAioBufferZero(bufferSize)
+   buf.Free()
+   }
+}
+
 // Benchmark zeroing a buffer.
 func BenchmarkAioBufferZero(b *testing.B) {
for i := 0; i < b.N; i++ {
buf := MakeAioBuffer(bufferSize)
for i := uint(0); i < bufferSize; i++ {
*buf.Get(i) = 0
}
buf.Free()
}
 }
-- 
2.34.1




[Libguestfs] [PATCH libnbd v2 3/9] golang: aio_buffer.go: Add missing documentation

2022-02-10 Thread Nir Soffer
Add standard function documentation comments.

The documentation should be available here:
https://pkg.go.dev/libguestfs.org/libnbd#AioBuffer

Signed-off-by: Nir Soffer 
---
 golang/aio_buffer.go | 12 
 1 file changed, 12 insertions(+)

diff --git a/golang/aio_buffer.go b/golang/aio_buffer.go
index 2b77d6ee..2cd8ceb2 100644
--- a/golang/aio_buffer.go
+++ b/golang/aio_buffer.go
@@ -32,37 +32,49 @@ package libnbd
 import "C"
 
 import "unsafe"
 
 /* Asynchronous I/O buffer. */
 type AioBuffer struct {
Punsafe.Pointer
Size uint
 }
 
+// MakeAioBuffer makes a new buffer backed by an uninitialized C allocated
+// array.
 func MakeAioBuffer(size uint) AioBuffer {
return AioBuffer{C.malloc(C.ulong(size)), size}
 }
 
+// FromBytes makes a new buffer backed by a C allocated array, initialized by
+// copying the given Go slice.
 func FromBytes(buf []byte) AioBuffer {
size := len(buf)
ret := MakeAioBuffer(uint(size))
for i := 0; i < len(buf); i++ {
*ret.Get(uint(i)) = buf[i]
}
return ret
 }
 
+// Free deallocates the underlying C allocated array. Using the buffer after
+// Free() will panic.
 func (b *AioBuffer) Free() {
if b.P != nil {
C.free(b.P)
b.P = nil
}
 }
 
+// Bytes copies the underlying C array to Go allocated memory and return a
+// slice. Modifying the returned slice does not modify the underlying buffer
+// backing array.
 func (b *AioBuffer) Bytes() []byte {
return C.GoBytes(b.P, C.int(b.Size))
 }
 
+// Get returns a pointer to a byte in the underlying C array. The pointer can
+// be used to modify the underlying array. The pointer must not be used after
+// calling Free().
 func (b *AioBuffer) Get(i uint) *byte {
return (*byte)(unsafe.Pointer(uintptr(b.P) + uintptr(i)))
 }
-- 
2.34.1




[Libguestfs] [PATCH libnbd v2 0/9] golang: Safer, easier to use, and faster AioBuffer

2022-02-10 Thread Nir Soffer
Improve AioBuffer to make it safer, easier to use, and faster when integrating
with other Go APIs.

New Go specific APIs:

- MakeAioBufferZero() - creates a new buffer using calloc(), to make it easy
  and efficient to use a zeroed buffer.

- AioBuffer.Slice() - create a slice backed by the underlying buffer without
  copying the contents of the buffer.

Performance improvements:

- FromBytes() is 3 times faster
- Code using Bytes() should use Slice() now. aio_copy example shows up to 260%
  speedup.

Improve testing:

- New AioBuffer tests
- New AioBuffer benchmarks

Documentation:

- AioBuffer is fully documented now.

Changes since v1:
- Rename the new test to libnbd_020_ to match the current test numbering
  semantics (Eric)
- We run the benchmarks in make check using very short timeout to keep them
  working without overloading the CI. (Eric)
- Update copyright year (Eric)
- Fix many typos in comments and commit messages (Eric)

v1 was here:
https://listman.redhat.com/archives/libguestfs/2022-January/msg00218.html

Nir Soffer (9):
  golang: tests: Add test for AioBuffer
  golang: aio_buffer.go: Make it safer to use
  golang: aio_buffer.go: Add missing documentation
  golang: aio_buffer.go: Add MakeAioBufferZero()
  golang: aio_buffer.go: Add Slice()
  golang: tests: Use AioBuffer.Slice()
  golang: aio_buffer.go: Speed up FromBytes()
  golang: aio_buffer.go: Benchmark copy flows
  golang: examples: aio_copy: Simplify using AioBuffer

 golang/Makefile.am   |   1 +
 golang/aio_buffer.go |  39 -
 golang/examples/aio_copy/aio_copy.go |  29 +---
 golang/libnbd_020_aio_buffer_test.go | 236 +++
 golang/libnbd_500_aio_pread_test.go  |   2 +-
 golang/libnbd_510_aio_pwrite_test.go |   8 +-
 golang/run-tests.sh  |   5 +
 7 files changed, 286 insertions(+), 34 deletions(-)
 create mode 100644 golang/libnbd_020_aio_buffer_test.go

-- 
2.34.1





[Libguestfs] [PATCH libnbd v2 2/9] golang: aio_buffer.go: Make it safer to use

2022-02-10 Thread Nir Soffer
If a Go program tries to use AioBuffer after calling AioBuffer.Free(),
the program may silently corrupt data, accessing memory that does not
belong to the buffer any more, or segfault if the address is not mapped.
In the worst case, it can corrupt memory silently. Calling Free() twice
may silently free unrelated memory.

Make the buffer safer to use by freeing the underlying array only on
the first call and setting the pointer to nil. This makes multiple
calls to Free() harmless, just like the underlying C.free().

Trying to access Bytes() and Get() after calling Free() will always
panic now, revealing the bug in the program.

Trying to use a freed AioBuffer with the libnbd API will likely
segfault and panic. I did not try to test this.

Signed-off-by: Nir Soffer 
---
 golang/aio_buffer.go |  5 +++-
 golang/libnbd_020_aio_buffer_test.go | 41 
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/golang/aio_buffer.go b/golang/aio_buffer.go
index 2bc69a01..2b77d6ee 100644
--- a/golang/aio_buffer.go
+++ b/golang/aio_buffer.go
@@ -46,20 +46,23 @@ func MakeAioBuffer(size uint) AioBuffer {
 func FromBytes(buf []byte) AioBuffer {
size := len(buf)
ret := MakeAioBuffer(uint(size))
for i := 0; i < len(buf); i++ {
*ret.Get(uint(i)) = buf[i]
}
return ret
 }
 
 func (b *AioBuffer) Free() {
-   C.free(b.P)
+   if b.P != nil {
+   C.free(b.P)
+   b.P = nil
+   }
 }
 
 func (b *AioBuffer) Bytes() []byte {
return C.GoBytes(b.P, C.int(b.Size))
 }
 
 func (b *AioBuffer) Get(i uint) *byte {
return (*byte)(unsafe.Pointer(uintptr(b.P) + uintptr(i)))
 }
diff --git a/golang/libnbd_020_aio_buffer_test.go b/golang/libnbd_020_aio_buffer_test.go
index 5898746b..cec74ddc 100644
--- a/golang/libnbd_020_aio_buffer_test.go
+++ b/golang/libnbd_020_aio_buffer_test.go
@@ -53,20 +53,61 @@ func TestAioBuffer(t *testing.T) {
 
/* Create another buffer from Go slice. */
buf2 := FromBytes(zeroes)
defer buf2.Free()
 
if !bytes.Equal(buf2.Bytes(), zeroes) {
t.Fatalf("Expected %v, got %v", zeroes, buf2.Bytes())
}
 }
 
+func TestAioBufferFree(t *testing.T) {
+   buf := MakeAioBuffer(uint(32))
+
+   /* Free the underlying C array. */
+   buf.Free()
+
+   /* And clear the pointer. */
+   if buf.P != nil {
+   t.Fatal("Dangling pointer after Free()")
+   }
+
+   /* Additional Free does nothing. */
+   buf.Free()
+}
+
+func TestAioBufferBytesAfterFree(t *testing.T) {
+   buf := MakeAioBuffer(uint(32))
+   buf.Free()
+
+   defer func() {
+   if r := recover(); r == nil {
+   t.Fatal("Did not recover from panic calling Bytes() after Free()")
+   }
+   }()
+
+   buf.Bytes()
+}
+
+func TestAioBufferGetAfterFree(t *testing.T) {
+   buf := MakeAioBuffer(uint(32))
+   buf.Free()
+
+   defer func() {
+   if r := recover(); r == nil {
+   t.Fatal("Did not recover from panic calling Get() after Free()")
+   }
+   }()
+
+   *buf.Get(0) = 42
+}
+
 // Typical buffer size.
 const bufferSize uint = 256 * 1024
 
 // Benchmark creating an uninitialized buffer.
 func BenchmarkMakeAioBuffer(b *testing.B) {
for i := 0; i < b.N; i++ {
buf := MakeAioBuffer(bufferSize)
buf.Free()
}
 }
-- 
2.34.1




[Libguestfs] [PATCH libnbd v2 1/9] golang: tests: Add test for AioBuffer

2022-02-10 Thread Nir Soffer
Add unit tests and benchmarks for AioBuffer. The tests are trivial but
they serve as running documentation, and they point out important
details about the type.

The benchmarks show the efficiency of allocating a new buffer, zeroing
it, and interfacing with Go code.

These tests will also ensure that we don't break anything with the
next changes.

To run the benchmarks use:

$ go test -run=xxx -bench=.
goos: linux
goarch: amd64
pkg: libguestfs.org/libnbd
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
BenchmarkMakeAioBuffer-12    6871759     157.2 ns/op
BenchmarkAioBufferZero-12      17551     69552 ns/op
BenchmarkFromBytes-12           9632    139112 ns/op
BenchmarkAioBufferBytes-12 69375 16410 ns/op
PASS
ok  libguestfs.org/libnbd   5.843s

To make sure the benchmarks will not break, we run them in "make check"
with a very short timeout. For actual performance testing run "go test"
directly.

Signed-off-by: Nir Soffer 
---
 golang/Makefile.am   |   1 +
 golang/libnbd_020_aio_buffer_test.go | 104 +++
 golang/run-tests.sh  |   5 ++
 3 files changed, 110 insertions(+)
 create mode 100644 golang/libnbd_020_aio_buffer_test.go

diff --git a/golang/Makefile.am b/golang/Makefile.am
index 10fb8934..f170cbc4 100644
--- a/golang/Makefile.am
+++ b/golang/Makefile.am
@@ -19,20 +19,21 @@ include $(top_srcdir)/subdir-rules.mk
 
 source_files = \
aio_buffer.go \
bindings.go \
callbacks.go \
closures.go \
handle.go \
wrappers.go \
wrappers.h \
libnbd_010_load_test.go \
+   libnbd_020_aio_buffer_test.go \
libnbd_100_handle_test.go \
libnbd_110_defaults_test.go \
libnbd_120_set_non_defaults_test.go \
libnbd_200_connect_command_test.go \
libnbd_210_opt_abort_test.go \
libnbd_220_opt_list_test.go \
libnbd_230_opt_info_test.go \
libnbd_240_opt_list_meta_test.go \
libnbd_300_get_size_test.go \
libnbd_400_pread_test.go \
diff --git a/golang/libnbd_020_aio_buffer_test.go b/golang/libnbd_020_aio_buffer_test.go
new file mode 100644
index ..5898746b
--- /dev/null
+++ b/golang/libnbd_020_aio_buffer_test.go
@@ -0,0 +1,104 @@
+/* libnbd golang tests
+ * Copyright (C) 2013-2022 Red Hat Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+package libnbd
+
+import (
+   "bytes"
+   "testing"
+)
+
+func TestAioBuffer(t *testing.T) {
+   /* Create a buffer with uninitialized backing array. */
+   buf := MakeAioBuffer(uint(32))
+   defer buf.Free()
+
+   /* Initialize backing array contents. */
+   for i := uint(0); i < buf.Size; i++ {
+   *buf.Get(i) = 0
+   }
+
+   /* Create a slice by copying the backing array contents into Go memory. */
+   b := buf.Bytes()
+
+   zeroes := make([]byte, 32)
+   if !bytes.Equal(b, zeroes) {
+   t.Fatalf("Expected %v, got %v", zeroes, buf.Bytes())
+   }
+
+   /* Modifying returned slice does not modify the buffer. */
+   for i := 0; i < len(b); i++ {
+   b[i] = 42
+   }
+
+   /* Bytes() still returns zeroes. */
+   if !bytes.Equal(buf.Bytes(), zeroes) {
+   t.Fatalf("Expected %v, got %v", zeroes, buf.Bytes())
+   }
+
+   /* Create another buffer from Go slice. */
+   buf2 := FromBytes(zeroes)
+   defer buf2.Free()
+
+   if !bytes.Equal(buf2.Bytes(), zeroes) {
+   t.Fatalf("Expected %v, got %v", zeroes, buf2.Bytes())
+   }
+}
+
+// Typical buffer size.
+const bufferSize uint = 256 * 1024
+
+// Benchmark creating an uninitialized buffer.
+func BenchmarkMakeAioBuffer(b *testing.B) {
+   for i := 0; i < b.N; i++ {
+   buf := MakeAioBuffer(bufferSize)
+   buf.Free()
+   }
+}
+
+// Benchmark zeroing a buffer.
+func BenchmarkAioBufferZero(b *testing.B) {
+   for i := 0; i < b.N; i++ {
+   buf := MakeAioBuffer(bufferSize)
+   for i := uint(0); i < bufferSize; i++ {
+   *buf.Get(i) = 0
+   }
+   b

Re: [Libguestfs] [PATCH libnbd 5/9] golang: aio_buffer.go: Add Slice()

2022-02-10 Thread Nir Soffer
On Tue, Feb 1, 2022 at 3:12 PM Eric Blake  wrote:
>
> On Sun, Jan 30, 2022 at 01:33:33AM +0200, Nir Soffer wrote:
> > AioBuffer.Bytes() cannot be used for coping images from NBD to other
>
> copying
>
> > APis because it copies the entire image. Add a new Slice() function,
> > creating a slice backed by the underling buffer.
>
> underlying
>
> >
> > Using Slice() is efficient, but less safe, like Get(). The returned
> > slice must be used only before calling Free(). This should not be an
> > issue with typical code.
> >
> > Testing show that Slice() is much faster than Bytes() for typical 256k
> > buffer:
> >
> > BenchmarkAioBufferBytes-12   86616 16529 ns/op
> > BenchmarkAioBufferSlice-12  10   0.4630 ns/op
> >
> > I modified the aio_copy example to use AioBuffer and complied 2
>
> compiled
>
> > +++ b/golang/libnbd_620_aio_buffer_test.go
> > @@ -44,20 +44,35 @@ func TestAioBuffer(t *testing.T) {
> >   /* Modifying returned slice does not modify the buffer. */
> >   for i := 0; i < len(b); i++ {
> >   b[i] = 42
> >   }
> >
> >   /* Bytes() still returns zeroes. */
> >   if !bytes.Equal(buf.Bytes(), zeroes) {
> >   t.Fatalf("Expected %v, got %v", zeroes, buf.Bytes())
> >   }
> >
> > + /* Creating a slice without copying the underlhing buffer. */
>
> underlying
>
> > + s := buf.Slice()
> > + if !bytes.Equal(s, zeroes) {
> > + t.Fatalf("Expected %v, got %v", zeroes, s)
> > + }
> > +
> > + /* Modifing the slice modifies the underlying buffer. */
>
> Modifying
>
> >  }
> > +
> > +// Benchmark creating a slice without copying the underling buffer.
>
> underlying
>
> > +func BenchmarkAioBufferSlice(b *testing.B) {
> > + buf := MakeAioBuffer(bufferSize)
> > + defer buf.Free()
> > + var r int
> > +
> > + b.ResetTimer()
> > + for i := 0; i < b.N; i++ {
> > + r += len(buf.Slice())
> > + }
> > +}
> > --
> > 2.34.1

Will fix in v2




Re: [Libguestfs] [PATCH libnbd 4/9] golang: aio_buffer.go: Add MakeAioBufferZero()

2022-02-10 Thread Nir Soffer
On Tue, Feb 8, 2022 at 10:45 PM Eric Blake  wrote:
>
> On Sun, Jan 30, 2022 at 01:33:32AM +0200, Nir Soffer wrote:
> > Make it easy to create a zeroed buffer via calloc(), preventing leaking
> > sensitive info from the heap.
> >
> > Benchmarking show that creating a zeroed buffer is much slower compared
>
> shows

Will fix

>
> > with uninitialized buffer, but much faster compared with manually
> > initializing the buffer with a loop.
> >
> > BenchmarkMakeAioBuffer-12  7252674   148.1 ns/op
> > BenchmarkMakeAioBufferZero-12   262107  4181 ns/op
> > BenchmarkAioBufferZero-12    17581 68759 ns/op
> >
> > It is interesting that creating a zeroed buffer is 3 times faster
> > compared with making a new []byte slice:
> >
> > BenchmarkMakeAioBufferZero-12   247710  4440 ns/op
> > BenchmarkMakeByteSlice-12    84117 13733 ns/op
>
> Some of this is due to how much vectorization the standard library
> (whether libc or Go's core libraries) can do when bulk-zeroing
> (zeroing 64 bits, or even a cache line at a time, in an unrolled loop,
> is always going to be more performant than a naive loop of one byte at
> a time).
>
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  golang/aio_buffer.go |  6 ++
> >  golang/libnbd_620_aio_buffer_test.go | 16 
>
> Another file that may fit better in the 0xx naming, especially if we
> decide to duplicate similar functionality into the python or OCaml
> bindings of being able to pre-generate a known-zero buffer for use in
> passing to nbd_pread.
>
> As a helper API, this seems useful.  But do we need any man page
> documentation of a language-specific helper function?

The AioBuffer type is documented here:
https://pkg.go.dev/libguestfs.org/libnbd#AioBuffer

Patch #3 golang: aio_buffer.go: Add missing documentation
adds the missing documentation for the functions,

We can add more documentation for the type. If we need
task-based documentation I think improving libnbd-golang
is the best way.




Re: [Libguestfs] [PATCH libnbd 3/9] golang: aio_buffer.go: Add missing documentation

2022-02-10 Thread Nir Soffer
On Tue, Feb 1, 2022 at 3:09 PM Eric Blake  wrote:
>
> On Sun, Jan 30, 2022 at 01:33:31AM +0200, Nir Soffer wrote:
> > Add standard function documentation comments.
> >
> > The documentation should be available here:
> > https://pkg.go.dev/libguestfs.org/libnbd#AioBuffer
> >
> > Signed-off-by: Nir Soffer 
> > ---
>
> >
> > +// MakeAioBuffer makes a new buffer backed by an unitilialized C allocated
>
> uninitialized
>
> > +// array.
> >  func MakeAioBuffer(size uint) AioBuffer {
> >   return AioBuffer{C.malloc(C.ulong(size)), size}
> >  }
> >
> > +// FromBytes makes a new buffer backed by a C allocated array, initialized by
> > +// copying the given Go slice.
> >  func FromBytes(buf []byte) AioBuffer {
> >   size := len(buf)
> >   ret := MakeAioBuffer(uint(size))
> >   for i := 0; i < len(buf); i++ {
> >   *ret.Get(uint(i)) = buf[i]
> >   }
> >   return ret
> >  }
> >
> > +// Free deallocates the underlying C allocated array. Using the buffer after
> > +// Free() will panic.
> >  func (b *AioBuffer) Free() {
> >   if b.P != nil {
> >   C.free(b.P)
> >   b.P = nil
> >   }
> >  }
> >
> > +// Bytes copies the underlying C array to Go allocated memory and return a
> > +// slice. Modifying the returned slice does not modify the unerlying buffer
>
> underlying
>
> > +// backking array.
>
> backing
>
> >  func (b *AioBuffer) Bytes() []byte {
> >   return C.GoBytes(b.P, C.int(b.Size))
> >  }
> >
> > +// Get returns a pointer to a byte in the underlying C array. The pointer can
> > +// be used to modify the underlying array. The pointer must not be used after
> > +// caling Free().
> >  func (b *AioBuffer) Get(i uint) *byte {
> >   return (*byte)(unsafe.Pointer(uintptr(b.P) + uintptr(i)))
> >  }
> > --
> > 2.34.1

Thanks, will update in v2




Re: [Libguestfs] [PATCH libnbd 1/9] golang: tests: Add test for AioBuffer

2022-02-10 Thread Nir Soffer
On Tue, Feb 8, 2022 at 9:33 PM Eric Blake  wrote:
>
> On Sun, Jan 30, 2022 at 01:33:29AM +0200, Nir Soffer wrote:
> > Add unit tests and benchmarks for AioBuffer. The tests are trivial but
> > they server as running documentation, and they point out important
> > details about the type.
> >
> > The benchmarks how efficient is allocation a new buffer, zeroing it, and
> > interfacing with Go code.
>
> Wording suggestion:
>
> The benchmarks show the efficiency of allocating a new buffer, zeroing
> it, and interfacing with Go code.

Thanks, I will use this.

> >
> > This tests will also ensure that we don't break anything by the next
>
> Either "These tests" or "This test"

Right

> > changes.
> >
> > To run the benchmarks use:
> >
> > $ go test -run=xxx -bench=.
> > goos: linux
> > goarch: amd64
> > pkg: libguestfs.org/libnbd
> > cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
> > BenchmarkMakeAioBuffer-12  6871759   157.2 ns/op
> > BenchmarkAioBufferZero-12    17551 69552 ns/op
> > BenchmarkFromBytes-12 9632        139112 ns/op
> > BenchmarkAioBufferBytes-12   69375 16410 ns/op
> > PASS
> > ok  libguestfs.org/libnbd   5.843s
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  golang/Makefile.am   |   1 +
> >  golang/libnbd_620_aio_buffer_test.go | 104 +++
> >  2 files changed, 105 insertions(+)
> >  create mode 100644 golang/libnbd_620_aio_buffer_test.go
> >
> > diff --git a/golang/Makefile.am b/golang/Makefile.am
> > index 10fb8934..ae0486dd 100644
> > --- a/golang/Makefile.am
> > +++ b/golang/Makefile.am
> > @@ -37,20 +37,21 @@ source_files = \
> >   libnbd_300_get_size_test.go \
> >   libnbd_400_pread_test.go \
> >   libnbd_405_pread_structured_test.go \
> >   libnbd_410_pwrite_test.go \
> >   libnbd_460_block_status_test.go \
> >   libnbd_500_aio_pread_test.go \
> >   libnbd_510_aio_pwrite_test.go \
> >   libnbd_590_aio_copy_test.go \
> >   libnbd_600_debug_callback_test.go \
> >   libnbd_610_error_test.go \
> > + libnbd_620_aio_buffer_test.go \
>
> As discussed in a different thread, the numbering here groups
> somewhat-related functionality, and helps us keep cross-language tests
> correlated over testing the same features.  Since you aren't adding
> counterpart tests to python or ocaml, I don't know what number would
> be best.  But our existing numbering is more along the lines of 0xx
> for language-level loading, 1xx for NBD handle tests, 2xx for
> connection tests, 3xx for initial APIs after connecting, 4xx for
> synchronous APIs, 5xx for asynch APIs, and 6xx for high-level usage
> patterns.  This feels like it might fit better in the 0xx series,
> since the benchmark does not use any NBD handle.

I agree. When I posted this I did not understand the semantics
and assumed the numbers reflect the order the tests were added.

I'll add this to the 0xx group.

>
> >   $(NULL)
> >
> >  generator_built = \
> >   bindings.go \
> >   closures.go \
> >   wrappers.go \
> >   wrappers.h \
> >   $(NULL)
> >
> >  EXTRA_DIST = \
> > diff --git a/golang/libnbd_620_aio_buffer_test.go b/golang/libnbd_620_aio_buffer_test.go
> > new file mode 100644
> > index ..2632f87f
> > --- /dev/null
> > +++ b/golang/libnbd_620_aio_buffer_test.go
> > @@ -0,0 +1,104 @@
> > +/* libnbd golang tests
> > + * Copyright (C) 2013-2021 Red Hat Inc.
>
> You may want to add 2022.
>
> Take the rest of my review with a grain of salt; I'm not (yet?) a Go expert.
>
> > +
> > +package libnbd
> > +
> > +import (
> > + "bytes"
> > + "testing"
> > +)
> > +
> > +func TestAioBuffer(t *testing.T) {
> > + /* Create a buffer with unitinialized backing array. */
>
> uninitialized
>
> > + buf := MakeAioBuffer(uint(32))
> > + defer buf.Free()
> > +
> > + /* Initialize backing array contents. */
> > + for i := uint(0); i < buf.Size; i++ {
> > + *buf.Get(i) = 0
> > + }
> > +
> > + /* Create a slice by copying the backing arrary contents into Go memory. */
> > + b := buf.Bytes()
> > +
> > + zeroes := make([]byte, 32)
> > + if !bytes.Equal(b, zeroes) {
> > + t.Fatalf("Expected %v, got %v", zero

Re: [Libguestfs] [PATCH libnbd] generator/Go.ml: Simplify copy_uint32_array

2022-02-10 Thread Nir Soffer
On Tue, Feb 8, 2022 at 9:23 PM Eric Blake  wrote:
>
> On Sun, Feb 06, 2022 at 07:45:20PM +0200, Nir Soffer wrote:
> > Create a slice backed up by the entries pointer, and copy the data with
> > builtin copy(). This can be 3x times faster but I did not measure it.
> >
> > Eric posted a similar patch[1] last year, but the patch seems to be
> > stuck with unrelated incomplete work.
> >
> > [1] https://listman.redhat.com/archives/libguestfs/2021-December/msg00065.html
>
> Your version looks slightly nicer than mine.  ACK.

Thanks, pushed as 6725fa0e129f9a60d7b89707ef8604e0aeeeaf43




Re: [Libguestfs] [PATCH 5/5] output/rhv-upload-plugin: Keep connections alive

2022-02-10 Thread Nir Soffer
On Tue, Feb 8, 2022 at 5:24 PM Richard W.M. Jones  wrote:
...
>
> ACK 4 & 5.

Pushed series without patch 2 as:

99b6e31b output/rhv-upload-plugin: Keep connections alive
02d2236b output/rhv-upload-plugin: Track http last request time
a436a0dc output/rhv-upload-plugin: Extract send_flush() helper
6e4f3270 output/rhv-upload-plugin: Fix flush and close




Re: [Libguestfs] [PATCH v2v v2] output: -o rhv-upload: Kill nbdkit instances before finalizing

2022-02-09 Thread Nir Soffer
On Wed, Feb 9, 2022 at 12:32 PM Richard W.M. Jones  wrote:
>
> If nbdkit instance(s) are still running then they will hold open some
> http connections to imageio.  In some versions of imageio, starting
> finalization in this state causes a 60 second timeout.
>
> See-also: 
> https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html
> Thanks: Nir Soffer
> ---
>  output/output_rhv_upload.ml | 39 -
>  1 file changed, 30 insertions(+), 9 deletions(-)
>
> diff --git a/output/output_rhv_upload.ml b/output/output_rhv_upload.ml
> index 4d8dc1c135..b500551c5f 100644
> --- a/output/output_rhv_upload.ml
> +++ b/output/output_rhv_upload.ml
> @@ -280,9 +280,22 @@ e command line has to match the number of guest disk 
> images (for this guest: %d)
>  ignore (Python_script.run_command cancel_script json_params [])
>in
>
> -  (* Set up an at-exit handler so we delete the orphan disks on failure. *)
> +  (* Set up an at-exit handler to perform some cleanups.
> +   * - Kill nbdkit PIDs (only before finalization).
> +   * - Delete the orphan disks (only on conversion failure).
> +   *)
> +  let nbdkit_pids = ref [] in
>On_exit.f (
>  fun () ->
> +  (* Kill the nbdkit PIDs. *)
> +  List.iter (
> +fun pid ->
> +  try
> +kill pid Sys.sigterm
> +  with exn -> debug "%s" (Printexc.to_string exn)
> +  ) !nbdkit_pids;
> +  nbdkit_pids := [];
> +
>(* virt-v2v writes v2vdir/done on success only. *)
>let success = Sys.file_exists (dir // "done") in
>if not success then (
> @@ -351,11 +364,7 @@ e command line has to match the number of guest disk 
> images (for this guest: %d)
>if is_ovirt_host then
>  Nbdkit.add_arg cmd "is_ovirt_host" "true";
>let _, pid = Nbdkit.run_unix ~socket cmd in
> -
> -  (* --exit-with-parent should ensure nbdkit is cleaned
> -   * up when we exit, but it's not supported everywhere.
> -   *)
> -  On_exit.kill pid;
> +  List.push_front pid nbdkit_pids
>) (List.combine disks disk_uuids);
>
>(* Stash some data we will need during finalization. *)
> @@ -363,7 +372,7 @@ e command line has to match the number of guest disk 
> images (for this guest: %d)
>let t = (disk_sizes : int64 list), disk_uuids, !transfer_ids,
>finalize_script, createvm_script, json_params,
>rhv_storagedomain_uuid, rhv_cluster_uuid,
> -  rhv_cluster_cpu_architecture, rhv_cluster_name in
> +  rhv_cluster_cpu_architecture, rhv_cluster_name, nbdkit_pids in
>t
>
>  and rhv_upload_finalize dir source inspect target_meta
> @@ -374,7 +383,8 @@ and rhv_upload_finalize dir source inspect target_meta
>  (disk_sizes, disk_uuids, transfer_ids,
>   finalize_script, createvm_script, json_params,
>   rhv_storagedomain_uuid, rhv_cluster_uuid,
> - rhv_cluster_cpu_architecture, rhv_cluster_name) =
> + rhv_cluster_cpu_architecture, rhv_cluster_name,
> + nbdkit_pids) =
>(* Check the cluster CPU arch matches what we derived about the
> * guest during conversion.
> *)
> @@ -386,6 +396,17 @@ and rhv_upload_finalize dir source inspect target_meta
>rhv_cluster_name target_meta.guestcaps.gcaps_arch arch
>);
>
> +  (* We must kill all our nbdkit instances before finalizing the
> +   * transfer.  See:
> +   * 
> https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html
> +   *
> +   * We want to fail here if the kill fails because nbdkit
> +   * died already, as that would be unexpected.
> +   *)
> +  List.iter (fun pid -> kill pid Sys.sigterm) !nbdkit_pids;
> +  List.iter (fun pid -> ignore (waitpid [] pid)) !nbdkit_pids;

Do we kill all nbdkit instances or only the rhv-upload-plugin ones? For
example, for vdsm output we try to access nbdkit during finalization.

What if the process handles the SIGTERM and does not exit?
One example is a deadlock during termination.

It will be more robust to wait for termination for a short time, and repeat
the process with SIGKILL. Or just use SIGKILL to avoid the extra step.
If we handle removal of pid files and sockets, there is no need for clean
shutdown.

> +  nbdkit_pids := []; (* Don't kill them again in the On_exit handler. *)
> +
>(* Finalize all the transfers. *)
>let json_params =
>  let ids = List.map (fun id -> JSON.String id) transfer_ids in
> @@ -442,7 +463,7 @@ module RHVUpload = struct
>type t = int64 list * string list * string list *
&

Re: [Libguestfs] [PATCH libnbd] golang: make-dist.sh: Generate the list file

2022-02-08 Thread Nir Soffer
On Mon, Feb 7, 2022 at 3:13 PM Eric Blake  wrote:
>
> On Sun, Feb 06, 2022 at 07:21:02PM +0200, Nir Soffer wrote:
> > Generate the list file when creating the distribution. Since the Go
> > tool treats the list file on the proxy server as the source of truth, we
> > do the same. The new list file is created by downloading the current
> > list file, sorting it, and appending the current version.
> >
> > Creating a distribution tarball now requires access to
> > download.libguestfs.org.
> >
> > With this change the distribution tarball can be extracted on the server
> > without any additional manual process.
> >
> > Signed-off-by: Nir Soffer 
> > ---
>
> >
> > +# Create the list file by amending the curent file on the server.
> > +list_url=https://download.libguestfs.org/libnbd/golang/libguestfs.org/libnbd/@v/list
> > +curl --silent --show-error "$list_url" | sort > $v_dir/list
>
> Do we want to use 'LC_ALL=C sort' for deterministic sorting, rather
> than facing differences when different maintainers use locales with
> slightly different collation rules?

Yes, sounds good.

> Because the curl command is piped to sort, we don't exit the script
> with non-zero status if the curl step fails.  Is that problematic?

This will create a corrupted list file - need to fix.

Thanks!

> > +grep -q "$version" $v_dir/list || echo "$version" >> $v_dir/list
> > +
> >  # Create tarball to upload and extract on the webserver. It should be
> >  # extracted in the directory pointed by the "go-import" meta tag.
> >  output=$PWD/libnbd-golang-$version.tar.gz
> >  tar czf $output libguestfs.org
> >
> >  rm -rf libguestfs.org
> >
> >  echo output written to $output
> > --
> > 2.34.1
> >
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>




Re: [Libguestfs] Performance regression in -o rhv-upload questions

2022-02-08 Thread Nir Soffer
On Tue, Feb 8, 2022 at 3:27 PM Richard W.M. Jones  wrote:
>
> Hi Nir,
>
> https://bugzilla.redhat.com/show_bug.cgi?id=2039255#c4
>
> I'm looking for some advice/help with a performance regression in
> virt-v2v between 1.44.2 and the latest version.  It's very obvious and
> reproducible when I do a conversion from a local disk to local RHV
> using -o rhv-upload, specifically:
>
> $ time ./run virt-v2v -i disk /var/tmp/fedora-35.qcow2 -o rhv-upload -oc 
> https://ovirt4410.home.annexia.org/ovirt-engine/api -op /tmp/ovirt-passwd -oo 
> rhv-direct -os ovirt-data -on test5 -of raw
>
> The guest is an ordinary Fedora guest containing some junk data.
>
> Now that I've been able to reproduce the problem locally, it turns out
> to be not at all what I thought it was going to be.  The timings show
> that:
>
>  - new virt-v2v is much faster doing the copy, but
>
>  - new virt-v2v takes ages finalizing the transfer (about 10x longer
>than the old version)
>
> virt-v2v 1.44.2:
> Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.44.2-rhv-upload.log
>
>   [  63.1] Copying disk 1/1
>   ...
>   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finalizing_success
>   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finished_success
>   transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a finalized in 6.105 seconds
>   [  89.2] Creating output metadata
>   [  89.8] Finishing off
>
> virt-v2v 1.45.97:
> Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.45.97-rhv-upload.log
>
>   [  71.0] Copying disk 1/1
>   [  82.7] Creating output metadata
>   ...
>   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finalizing_success
>   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finished_success
>   transfer 6ea3d724-16f9-4bda-a33e-69a783480abc finalized in 61.552 seconds
>   [ 144.9] Finishing off

This happens because virt-v2v tries to finalize the transfer *before*
closing the connections to the imageio server.

The current imageio release marks the ticket as canceled, but will not
remove it if the ticket has open connections from clients. If the
clients are idle, the connections close after 60 seconds, so when engine
tries again to remove the ticket, the operation succeeds.

The upstream version improves this flow to wait only for ongoing
requests. If there are no ongoing requests, the ticket is removed
immediately, ignoring the open connections. The connections are closed
when the client closes them or when reading from the socket times out.

You can test upstream version from here:
https://github.com/oVirt/ovirt-imageio/actions/runs/1811058094

But if you want to be compatible with released imageio version, you
must close the connections to imageio *before* finalizing the transfers.

We already discussed this issue last year, when we handled the bug
where vddk blocks for many minutes during block status and the imageio
connection is dropped, and you opened a bug for this.
> It's not a completely like-for-like comparison because the rhv-upload
> backend changed a lot between these versions.  In particular if I was
> going to pick some suspicious change it would be my refactoring here
> which was supposed to be neutral but maybe isn't:
>
>   
> https://github.com/libguestfs/virt-v2v/commit/143a22860216b94d3a81706193088d50c03fc35c
>
> Unfortunately this commit is almost impossible to revert because of
> deep restructuring of surrounding code.

The refactoring is not the issue; the issue is that we do not terminate
the output nbdkit before finalizing.

For some outputs we cannot do this, since we want to query nbdkit for
more info (e.g. block status), but for rhv upload we have nothing more
to do with the nbdkit instance and we must close the connection before
we finalize, so we should close the output right after we finish the copy.

> Another idea:
>
> Old virt-v2v uses qemu-img convert which does not flush by default.
> New virt-v2v uses nbdcopy with the --flush option, so it will call
> imageio PATCH /path ... "op":"flush" at the end.  However removing
> nbdcopy --flush didn't help very much (only a couple of seconds off).

Calling flush is right; we cannot remove it. There is, however, a bug
in nbdkit: it flushes during the close() flow even if you did not send
a flush.

> Do you have any other ideas?
>
> What exactly does imageio do during the finalizing_success state?

The flow is this:
- engine sends a request to vdsm to delete the ticket
- vdsm connects to the imageio control socket and sends DELETE /tickets/{ticket-id}
- imageio marks the ticket as canceled, so no new request can succeed
  and no new connection can attach to the ticket
- imageio marks all ongoing requests as canceled, so if a request is in
  a read/write loop, it will abort when the next read/write completes
- upstream: imageio waits until all ongoing operations complete
- release: imageio waits until all connections are closed
- if the ticket could not be removed in 60 seconds, imageio returns 409 Conflict
- imageio returns 200 OK
- engine retries the delete if it 

Re: [Libguestfs] [libnbd PATCH v2 1/5] python: tests: Fix error handling

2022-02-06 Thread Nir Soffer
On Fri, Feb 4, 2022 at 10:37 AM Laszlo Ersek  wrote:
>
> On 02/03/22 23:49, Nir Soffer wrote:
> > On Thu, Feb 3, 2022 at 10:26 PM Eric Blake  wrote:
> >>
> >> Like a lot of the C examples, the aio copy test ignores read and write
> >> errors in the completion callback, which can cause silent data
> >> corruption. The failure in the test is not critical, but this is a bad
> >> example that may be copied by developers to a real application.
> >>
> >> The test dies with an assertion failure now if completion callback
> >> fails.  Tested with the temporary patch of:
> >>
> >> | diff --git i/python/t/590-aio-copy.py w/python/t/590-aio-copy.py
> >> | index 861fa6c8..4cd64d83 100644
> >> | --- i/python/t/590-aio-copy.py
> >> | +++ w/python/t/590-aio-copy.py
> >> | @@ -117,7 +117,8 @@ src.set_handle_name("src")
> >> |  dst = nbd.NBD()
> >> |  dst.set_handle_name("dst")
> >> |  src.connect_command(["nbdkit", "-s", "--exit-with-parent", "-r",
> >> | - "pattern", "size=%d" % disk_size])
> >> | + "--filter=error", "pattern", "error-pread-rate=1",
> >> | + "size=%d" % disk_size])
> >> |  dst.connect_command(["nbdkit", "-s", "--exit-with-parent",
> >> |   "memory", "size=%d" % disk_size])
> >> |  asynch_copy(src, dst)
> >> ---
> >>  python/t/590-aio-copy.py | 4 +++-
> >>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/python/t/590-aio-copy.py b/python/t/590-aio-copy.py
> >> index 6cde5934..861fa6c8 100644
> >> --- a/python/t/590-aio-copy.py
> >> +++ b/python/t/590-aio-copy.py
> >> @@ -1,5 +1,5 @@
> >>  # libnbd Python bindings
> >> -# Copyright (C) 2010-2019 Red Hat Inc.
> >> +# Copyright (C) 2010-2022 Red Hat Inc.
> >>  #
> >>  # This program is free software; you can redistribute it and/or modify
> >>  # it under the terms of the GNU General Public License as published by
> >> @@ -36,6 +36,7 @@ def asynch_copy(src, dst):
> >>  # This callback is called when any pread from the source
> >>  # has completed.
> >>  def read_completed(buf, offset, error):
> >> +assert not error
> >
> > This works for the test, since the test is not compiled
> > to a .pyc file, which can remove the asserts (like C -DNDEBUG)
> > by default when building rpms.
> >
> > If someone will copy this to a real application they will have no
> > error checking.
>
> I consider this a catastrophic bug in python, to be honest. Eliminating
> assertions should never be done without an explicit request from whoever
> builds the package.

I checked this and asserts are not removed automatically.

They are removed only when using the -O or -OO options:

$ python -O -c 'assert 0; print("assert was removed")'
assert was removed

Or:

$ PYTHONOPTIMIZE=1 python -c 'assert 0; print("assert was removed")'
assert was removed

Or when compiling modules, if you use the -o1 argument:

$ cat test.py
assert 0
print("assert was removed")

$ python -m compileall -o1 test.py
Compiling 'test.py'...

$ python __pycache__/test.cpython-310.opt-1.pyc
assert was removed

So this is similar to -DNDEBUG, but unlike a compiled program, asserts
can be removed at run time, outside of your control.
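The stripping behavior described above can be verified from a standalone script; a small sketch (not part of the test suite) that runs the same one-liner with and without -O:

```python
import subprocess
import sys

code = 'assert 0, "boom"; print("assert was removed")'

# Default mode: the assert fires and the process exits non-zero.
normal = subprocess.run([sys.executable, "-c", code],
                        capture_output=True, text=True)

# With -O the assert is compiled out, so the print statement runs.
optimized = subprocess.run([sys.executable, "-O", "-c", code],
                           capture_output=True, text=True)

print(normal.returncode != 0)                     # True
print("assert was removed" in optimized.stdout)   # True
```

This is why a real application should check the error explicitly (and raise or exit) rather than rely on assert.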

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] [libguestfs/libguestfs] guestfs_copy_in fails with error: copy_in: tar subprocess failed: tar: .: file changed as we read it: errno 0 (Issue #75)

2022-02-06 Thread Nir Soffer
On Thu, Jan 27, 2022 at 10:58 AM Richard W.M. Jones  wrote:
>
> On Wed, Jan 26, 2022 at 09:31:14PM -0800, anemade wrote:
> > I am using libguestfs Golang binding APIs(version 1.44)
> > Followed this document https://libguestfs.org/guestfs-golang.3.html to 
> > create
> > the disk, add the disk, format the partition, launch the appliance and
> > performing some copy_in and copy_out operations.
> >
> > While doing copy_in, I am seeing this strange issue
> >
> > error: copy_in: tar subprocess failed: tar: .: file changed as we read it:
> > errno 0
> >
> > This issue is not all time reproducible. It comes like 1 out of 10
> > runs. In my case, data is stable while doing copy_in. There is
> > absolutely no change in the data while guestfs_copy_in operation
> > going on.  Any leads to issue or anything that I need to understand
> > for copy_in or copy_out?
>
> Can you share exactly how you are using copy_in?  A small
> reproducer would be good.

Do you copy from GlusterFS mount?

Gluster has this bug:
https://bugzilla.redhat.com/1104618

Which was fixed by adding a new configuration:

cluster.consistent-metadata on

but the old configuration

cluster.consistent-metadata no

is still the default in some cases.

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] [PATCH libnbd] golang: make-dist.sh: Generate the list file

2022-02-06 Thread Nir Soffer
On Sun, Feb 6, 2022 at 9:13 PM Richard W.M. Jones  wrote:
>
> On Sun, Feb 06, 2022 at 07:21:02PM +0200, Nir Soffer wrote:
> > Generate the list file when creating the distribution. Since the Go
> > tool treats the list file on the proxy server as the source of truth, we
> > do the same. The new list file is created by downloading the current
> > list file, sorting it, and appending the current version.
> >
> > Creating a distribution tarball now requires access to
> > download.libguestfs.org.
> >
> > With this change the distribution tarball can be extracted on the server
> > without any additional manual process.
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  golang/make-dist.sh | 16 +---
> >  1 file changed, 5 insertions(+), 11 deletions(-)
> >
> > diff --git a/golang/make-dist.sh b/golang/make-dist.sh
> > index a590e6c6..5fe006ff 100755
> > --- a/golang/make-dist.sh
> > +++ b/golang/make-dist.sh
> > @@ -86,48 +86,42 @@ rm -rf libguestfs.org
> >  #
> >  # libguestfs.org
> >  # └── libnbd
> >  #├── @latest
> >  #└── @v
> >  #├── list
> >  #├── v1.11.4.info
> >  #├── v1.11.4.mod
> >  #└── v1.11.4.zip
> >  #
> > -# We create @latest and @v/*{.info,mod,zip} here.
> > -#
> > -# The "@v/list" file must be created on the web server after uploading
> > -# a new release:
> > -#
> > -# $ cd libguestfs.org/libnbd/@v
> > -# $ ls -1 v*.info | awk -F.info '{print $1}' > list
> > -# $ cat list
> > -# v1.11.3
> > -# v1.11.4
> > -#
> >  # See https://golang.org/ref/mod#serving-from-proxy
> >
> >  module_dir=libguestfs.org/libnbd
> >  v_dir=$module_dir/@v
> >
> >  mkdir -p $v_dir
> >
> >  # Go wants a string in RFC 3339 format, git strict ISO 8601 format is
> >  # compatible.
> >  info="{
> >\"Version\": \"$version\",
> >\"Time\": \"$(git show -s --format=%cI)\"
> >  }"
> >  echo "$info" > $module_dir/@latest
> >  echo "$info" > $v_dir/$version.info
> >
> >  cp go.mod $v_dir/$version.mod
> >  mv $version.zip $v_dir
> >
> > +# Create the list file by amending the current file on the server.
> > +list_url=https://download.libguestfs.org/libnbd/golang/libguestfs.org/libnbd/@v/list
> > +curl --silent --show-error "$list_url" | sort > $v_dir/list
> > +grep -q "$version" $v_dir/list || echo "$version" >> $v_dir/list
> > +
> >  # Create tarball to upload and extract on the webserver. It should be
> >  # extracted in the directory pointed by the "go-import" meta tag.
> >  output=$PWD/libnbd-golang-$version.tar.gz
> >  tar czf $output libguestfs.org
> >
> >  rm -rf libguestfs.org
> >
> >  echo output written to $output
>
> Yes this seems a reasonable approach.
>
> ACK

Pushed as 3895a43f3e9f00f1d612d619caf62adb7ace2772

This should be tested with the actual server, to make sure it
will work for the next release.

Can you do a build from current git and upload it to the server?

Nir


___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs

[Libguestfs] [PATCH libnbd] generator/Go.ml: Simplify copy_uint32_array

2022-02-06 Thread Nir Soffer
Create a slice backed by the entries pointer, and copy the data with the
builtin copy(). This can be about 3x faster, but I did not measure it.

Eric posted a similar patch[1] last year, but the patch seems to be
stuck with unrelated incomplete work.

[1] https://listman.redhat.com/archives/libguestfs/2021-December/msg00065.html

Signed-off-by: Nir Soffer 
---
 generator/GoLang.ml | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/generator/GoLang.ml b/generator/GoLang.ml
index eb3aa263..73838199 100644
--- a/generator/GoLang.ml
+++ b/generator/GoLang.ml
@@ -1,13 +1,13 @@
 (* hey emacs, this is OCaml code: -*- tuareg -*- *)
 (* nbd client library in userspace: generator
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2022 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
  * License as published by the Free Software Foundation; either
  * version 2 of the License, or (at your option) any later version.
  *
  * This library is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  * Lesser General Public License for more details.
@@ -508,24 +508,24 @@ package libnbd
 #include \"wrappers.h\"
 */
 import \"C\"
 
 import \"unsafe\"
 
 /* Closures. */
 
 func copy_uint32_array (entries *C.uint32_t, count C.size_t) []uint32 {
 ret := make([]uint32, int (count))
-for i := 0; i < int (count); i++ {
-   entry := (*C.uint32_t) (unsafe.Pointer(uintptr(unsafe.Pointer(entries)) + (unsafe.Sizeof(*entries) * uintptr(i))))
-   ret[i] = uint32 (*entry)
-}
+// See https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices
+// TODO: Use unsafe.Slice() when we require Go 1.17.
+s := (*[1<<30]uint32)(unsafe.Pointer(entries))[:count:count]
+copy(ret, s)
 return ret
 }
 ";
 
   List.iter (
 fun { cbname; cbargs } ->
   let uname = camel_case cbname in
   pr "type %sCallback func (" uname;
   let comma = ref false in
   List.iter (
-- 
2.34.1

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



[Libguestfs] [PATCH libnbd] golang: make-dist.sh: Generate the list file

2022-02-06 Thread Nir Soffer
Generate the list file when creating the distribution. Since the Go
tool treats the list file on the proxy server as the source of truth, we
do the same. The new list file is created by downloading the current
list file, sorting it, and appending the current version.

Creating a distribution tarball now requires access to
download.libguestfs.org.

With this change the distribution tarball can be extracted on the server
without any additional manual process.

Signed-off-by: Nir Soffer 
---
 golang/make-dist.sh | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/golang/make-dist.sh b/golang/make-dist.sh
index a590e6c6..5fe006ff 100755
--- a/golang/make-dist.sh
+++ b/golang/make-dist.sh
@@ -86,48 +86,42 @@ rm -rf libguestfs.org
 #
 # libguestfs.org
 # └── libnbd
 #├── @latest
 #└── @v
 #├── list
 #├── v1.11.4.info
 #├── v1.11.4.mod
 #└── v1.11.4.zip
 #
-# We create @latest and @v/*{.info,mod,zip} here.
-#
-# The "@v/list" file must be created on the web server after uploading
-# a new release:
-#
-# $ cd libguestfs.org/libnbd/@v
-# $ ls -1 v*.info | awk -F.info '{print $1}' > list
-# $ cat list
-# v1.11.3
-# v1.11.4
-#
 # See https://golang.org/ref/mod#serving-from-proxy
 
 module_dir=libguestfs.org/libnbd
 v_dir=$module_dir/@v
 
 mkdir -p $v_dir
 
 # Go wants a string in RFC 3339 format, git strict ISO 8601 format is
 # compatible.
 info="{
   \"Version\": \"$version\",
   \"Time\": \"$(git show -s --format=%cI)\"
 }"
 echo "$info" > $module_dir/@latest
 echo "$info" > $v_dir/$version.info
 
 cp go.mod $v_dir/$version.mod
 mv $version.zip $v_dir
 
+# Create the list file by amending the current file on the server.
+list_url=https://download.libguestfs.org/libnbd/golang/libguestfs.org/libnbd/@v/list
+curl --silent --show-error "$list_url" | sort > $v_dir/list
+grep -q "$version" $v_dir/list || echo "$version" >> $v_dir/list
+
 # Create tarball to upload and extract on the webserver. It should be
 # extracted in the directory pointed by the "go-import" meta tag.
 output=$PWD/libnbd-golang-$version.tar.gz
 tar czf $output libguestfs.org
 
 rm -rf libguestfs.org
 
 echo output written to $output
-- 
2.34.1

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs

Re: [Libguestfs] [libnbd PATCH v2 5/5] copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails

2022-02-03 Thread Nir Soffer
On Thu, Feb 3, 2022 at 10:26 PM Eric Blake  wrote:
>
> nbdcopy has a nasty bug when performing multi-threaded copies using
> asynchronous nbd calls - it was blindly treating the completion of an
> asynchronous command as successful, rather than checking the *error
> parameter.  This can result in the silent creation of a corrupted
> image in two different ways: when a read fails, we blindly wrote
> garbage to the destination; when a write fails, we did not flag that
> the destination was not written.
>
> Since nbdcopy already calls exit() on a synchronous read or write
> failure to a file, doing the same for an asynchronous op to an NBD
> server is the simplest solution.  A nicer solution, but more invasive
> to code and thus not done here, might be to allow up to N retries of
> the transaction (in case the read or write failure was transient), or
> even having a mode where as much data is copied as possible (portions
> of the copy that failed would be logged on stderr, and nbdcopy would
> still fail with a non-zero exit status, but this would copy more than
> just stopping at the first error, as can be done with rsync or
> ddrescue).

Maybe this belongs to the nbdrescue tool :-)

> Note that since we rely on auto-retiring and do NOT call
> nbd_aio_command_completed, our completion callbacks must always return
> 1 (if they do not exit() first), even when acting on *error, so as not
> to leave the command allocated until nbd_close.  As such, there is no
> sane way to return an error to a manual caller of the callback, and
> therefore we can drop dead code that calls perror() and exit() if the
> callback "failed".  It is also worth documenting the contract on when
> we must manually call the callback during the asynch_zero callback, so
> that we do not leak or double-free the command; thankfully, all the
> existing code paths were correct.
>
> The added testsuite script demonstrates several scenarios, some of
> which fail without the rest of this patch in place, and others which
> showcase ways in which sparse images can bypass errors.
>
> Once backports are complete, a followup patch on the main branch will
> edit docs/libnbd-security.pod with the mailing list announcement of
> the stable branch commit ids and release versions that incorporate
> this fix.
>
> Reported-by: Nir Soffer 
> Fixes: bc896eec4d ("copy: Implement multi-conn, multiple threads, multiple 
> requests in flight.", v1.5.6)
> Fixes: https://bugzilla.redhat.com/2046194
> ---
>  TODO|  1 +
>  copy/Makefile.am|  4 +-
>  copy/copy-nbd-error.sh  | 78 +
>  copy/file-ops.c | 17 +++-
>  copy/multi-thread-copying.c | 15 +++
>  copy/nbdcopy.h  |  7 ++--
>  copy/null-ops.c | 10 +
>  7 files changed, 108 insertions(+), 24 deletions(-)
>  create mode 100755 copy/copy-nbd-error.sh
>
> diff --git a/TODO b/TODO
> index da157942..7c9c15e2 100644
> --- a/TODO
> +++ b/TODO
> @@ -33,6 +33,7 @@ nbdcopy:
>   - Better page cache usage, see nbdkit-file-plugin options
> fadvise=sequential cache=none.
>   - Consider io_uring if there are performance bottlenecks.
> + - Configurable retries in response to read or write failures.
>
>  nbdfuse:
>   - If you write beyond the end of the virtual file, it returns EIO.
> diff --git a/copy/Makefile.am b/copy/Makefile.am
> index f2100853..e729f86a 100644
> --- a/copy/Makefile.am
> +++ b/copy/Makefile.am
> @@ -1,5 +1,5 @@
>  # nbd client library in userspace
> -# Copyright (C) 2020 Red Hat Inc.
> +# Copyright (C) 2020-2022 Red Hat Inc.
>  #
>  # This library is free software; you can redistribute it and/or
>  # modify it under the terms of the GNU Lesser General Public
> @@ -33,6 +33,7 @@ EXTRA_DIST = \
> copy-nbd-to-small-nbd-error.sh \
> copy-nbd-to-sparse-file.sh \
> copy-nbd-to-stdout.sh \
> +   copy-nbd-error.sh \
> copy-progress-bar.sh \
> copy-sparse.sh \
> copy-sparse-allocated.sh \
> @@ -124,6 +125,7 @@ TESTS += \
> copy-stdin-to-nbd.sh \
> copy-stdin-to-null.sh \
> copy-nbd-to-stdout.sh \
> +   copy-nbd-error.sh \
> copy-progress-bar.sh \
> copy-sparse.sh \
> copy-sparse-allocated.sh \
> diff --git a/copy/copy-nbd-error.sh b/copy/copy-nbd-error.sh
> new file mode 100755
> index ..89f0a2f1
> --- /dev/null
> +++ b/copy/copy-nbd-error.sh
> @@ -0,0 +1,78 @@
> +#!/usr/bin/env bash
> +# nbd client library in userspace
> +# Copyright (C) 2022 Red Hat Inc.
> +#
> +# This library is free software; you can redistribute it and/or
> +# modify it under the te

Re: [Libguestfs] [libnbd PATCH v2 4/5] copy: Pass in dummy variable rather than to callback

2022-02-03 Thread Nir Soffer
On Thu, Feb 3, 2022 at 11:46 PM Richard W.M. Jones  wrote:
>
> On Thu, Feb 03, 2022 at 02:25:57PM -0600, Eric Blake wrote:
> > In several places where asynch handlers manually call the provided
> > nbd_completion_callback, the value of errno is indeterminate (for
> > example, in file-ops.c:file_asynch_read(), the previous call to
> > file_synch_read() already triggered exit() on error, but does not
> > guarantee what is left in errno on success).  As the callback should
> > be paying attention to the value of *error (to be fixed in the next
> > patch), we are better off ensuring that we pass in a pointer to a
> > known-zero value; at which point, it is easier to use a dummy variable
> > on the stack than to mess around with errno and it's magic macro
> > expansion into a thread-local storage location.
> >
> > Note that several callsites then check if the callback returned -1,
> > and if so assume that the callback has caused errno to now have a sane
> > value to pass on to perror.  In theory, the fact that we are no longer
> > passing in &errno means that if the callback assigns into *error but
> > did not otherwise affect errno, our perror call would no longer
> > reflect that value.  But in practice, since the callback never
> > actually returned -1, nor even assigned into *error, the call to
> > perror is dead code; although I have chosen to defer that additional
> > cleanup to the next patch.
>
> And I guess another reason not to use &errno is that it could be
> updated by random system calls in the same thread, perhaps even after
> we have set it.  It sounds like &errno would always be wrong.

I agree, using a global (even if thread-local) is risky.

>
> >  copy/file-ops.c | 17 ++---
> >  copy/multi-thread-copying.c |  8 +---
> >  copy/null-ops.c | 12 +++-
> >  3 files changed, 22 insertions(+), 15 deletions(-)
> >
> > diff --git a/copy/file-ops.c b/copy/file-ops.c
> > index 84704341..e37a5014 100644
> > --- a/copy/file-ops.c
> > +++ b/copy/file-ops.c
> > @@ -1,5 +1,5 @@
> >  /* NBD client library in userspace.
> > - * Copyright (C) 2020 Red Hat Inc.
> > + * Copyright (C) 2020-2022 Red Hat Inc.
> >   *
> >   * This library is free software; you can redistribute it and/or
> >   * modify it under the terms of the GNU Lesser General Public
> > @@ -587,10 +587,11 @@ file_asynch_read (struct rw *rw,
> >struct command *command,
> >nbd_completion_callback cb)
> >  {
> > +  int dummy = 0;
> > +
> >file_synch_read (rw, slice_ptr (command->slice),
> > command->slice.len, command->offset);
> > -  errno = 0;
> > -  if (cb.callback (cb.user_data, &errno) == -1) {
> > +  if (cb.callback (cb.user_data, &dummy) == -1) {
> >  perror (rw->name);
> >  exit (EXIT_FAILURE);
> >}
> > @@ -601,10 +602,11 @@ file_asynch_write (struct rw *rw,
> > struct command *command,
> > nbd_completion_callback cb)
> >  {
> > +  int dummy = 0;
> > +
> >file_synch_write (rw, slice_ptr (command->slice),
> >  command->slice.len, command->offset);
> > -  errno = 0;
> > -  if (cb.callback (cb.user_data, &errno) == -1) {
> > +  if (cb.callback (cb.user_data, &dummy) == -1) {
> >  perror (rw->name);
> >  exit (EXIT_FAILURE);
> >}
> > @@ -614,10 +616,11 @@ static bool
> >  file_asynch_zero (struct rw *rw, struct command *command,
> >nbd_completion_callback cb, bool allocate)
> >  {
> > +  int dummy = 0;
> > +
> >if (!file_synch_zero (rw, command->offset, command->slice.len, allocate))
> >  return false;
> > -  errno = 0;
> > -  if (cb.callback (cb.user_data, &errno) == -1) {
> > +  if (cb.callback (cb.user_data, &dummy) == -1) {
> >  perror (rw->name);
> >  exit (EXIT_FAILURE);
> >}
> > diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c
> > index b17ca598..815b8a02 100644
> > --- a/copy/multi-thread-copying.c
> > +++ b/copy/multi-thread-copying.c
> > @@ -1,5 +1,5 @@
> >  /* NBD client library in userspace.
> > - * Copyright (C) 2020 Red Hat Inc.
> > + * Copyright (C) 2020-2022 Red Hat Inc.
> >   *
> >   * This library is free software; you can redistribute it and/or
> >   * modify it under the terms of the GNU Lesser General Public
> > @@ -393,6 +393,7 @@ finished_read (void *vp, int *error)
> >  bool last_is_hole = false;
> >  uint64_t i;
> >  struct command *newcommand;
> > +int dummy = 0;
> >
> >  /* Iterate over whole blocks in the command, starting on a block
> >   * boundary.
> > @@ -475,7 +476,7 @@ finished_read (void *vp, int *error)
> >  /* Free the original command since it has been split into
> >   * subcommands and the original is no longer needed.
> >   */
> > -free_command (command, &errno);
> > +free_command (command, &dummy);
> >}
> >
> >return 1; /* auto-retires the command */
> > @@ -502,6 +503,7 @@ fill_dst_range_with_zeroes (struct command *command)
> >  {
> >char *data;
> >

Re: [Libguestfs] [libnbd PATCH v2 3/5] docs: Clarify how callbacks should handle errors

2022-02-03 Thread Nir Soffer
On Thu, Feb 3, 2022 at 10:28 PM Eric Blake  wrote:
>
> Recent patches have demonstrated confusion on the order in which
> callbacks are reached, when it is safe or dangerous to ignore *error,
> and what a completion callback should do when auto-retirement is in
> use.  Add wording to make it more obvious that:
>
> - callbacks are reached in the following order: mid-command callback
>   (0, 1, or many times, if supplied), completion callback (exactly
>   once, if supplied), mid-command free (exactly once, if supplied),
>   completion free (exactly once, if supplied)
> - returning -1 from a mid-command callback does not prevent
>   future callbacks
> - ignoring *error in a mid-command callback is safe
> - completion callbacks are reached unconditionally, and must NOT ignore
>   *error
> - if the user chooses to use auto-retirement instead of manual calls to
>   nbd_aio_command_completed, the completion callback should return 1 even
>   on error cases to avoid complicating command cleanup
> - the contents of buf after nbd_pread and friends are undefined on
>   error (at present, if the user did not pre-initialize the buffer,
>   there are some code paths in libnbd that leave it uninitialized)
> ---
>  docs/libnbd.pod  | 69 +---
>  generator/API.ml | 24 -
>  2 files changed, 71 insertions(+), 22 deletions(-)
>
> diff --git a/docs/libnbd.pod b/docs/libnbd.pod
> index eb8038b0..15bdf0a8 100644
> --- a/docs/libnbd.pod
> +++ b/docs/libnbd.pod
> @@ -829,8 +829,12 @@ This can be used to free associated C<user_data>.  For 
> example:
>  NBD_NULL_COMPLETION,
>  0);
>
> -will call L on C after the last time that the
> -S> function is called.
> +will call L once on C after the point where it is
> +known that the S> function can no longer be
> +called, regardless of how many times C was actually called.  If
> +both a mid-command and completion callback are supplied, the functions
> +will be reached in this order: mid-function callbacks, completion
> +callback, mid-function free, and finally completion free.
>
>  The free function is only accessible in the C API as it is not needed
>  in garbage collected programming languages.
> @@ -858,27 +862,60 @@ same nbd object, as it would cause deadlock.
>
>  =head2 Completion callbacks
>
> -All of the low-level commands have a completion callback variant that
> -registers a callback function used right before the command is marked
> -complete.
> +All of the asynchronous commands have an optional completion callback
> +function that is used right before the command is marked complete,
> +after any mid-command callbacks have finished, and before any free
> +functions.
>
>  When the completion callback returns C<1>, the command is
>  automatically retired (there is no need to call
> -L<nbd_aio_command_completed(3)>); for any other return value, the command
> -still needs to be retired.
> +L<nbd_aio_command_completed(3)>); for any other return value, the
> +command still needs to be manually retired (otherwise, the command
> +will tie up resources until L<nbd_close(3)> is eventually reached).
>
>  =head2 Callbacks with C<error> parameter
>
>  Some of the high-level commands (L,
> -L) involve the use of a callback function invoked by
> -the state machine at appropriate points in the server's reply before
> -the overall command is complete.  These callback functions, along with
> -all of the completion callbacks, include a parameter C<error>
> -containing the value of any error detected so far; if the callback
> -function fails, it should assign back into C<error> and return C<-1>
> -to change the resulting error of the overall command.  Assignments
> -into C<error> are ignored for any other return value; similarly,
> -assigning C<0> into C<error> does not have an effect.
> +L) involve the use of a callback function invoked
> +by the state machine at appropriate points in the server's reply
> +before the overall command is complete.  These callback functions,
> +along with all of the completion callbacks, include a parameter
> +C<error> which is a pointer containing the value of any error detected
> +so far.  If a callback function fails and wants to change the
> +resulting error of the overall command visible later in the API
> +sequence, it should assign back into C<error> and return C<-1> in the
> +C API.  Assignments into C<error> are ignored for any other return
> +value; similarly, assigning C<0> into C<error> does not have an
> +effect.  In other language bindings, reporting callback errors is
> +generally done by raising an exception rather than by return value.
> +
> +Note that a mid-command callback might never be reached, such as if
> +libnbd detects that the command was invalid to send (see
> +L<nbd_set_strict_mode(3)>) or if the server reports a failure that
> +concludes the command.  It is safe for a mid-command callback to
> +ignore non-zero C<error>: all the other parameters to the mid-command
> +callback will still be valid (corresponding to the current portion of
> +the server's reply), and the overall command will still fail (at the
> +completion 
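The ordering contract spelled out in this patch — mid-command callbacks first, then the completion callback exactly once, then the free functions — can be modelled with a toy driver (a pure simulation; no libnbd involved, all names invented for the sketch):

```python
events = []

def run_command(chunks, chunk_cb, completion_cb, chunk_free, completion_free):
    # Simulates the documented order for one asynchronous command:
    # mid-command callbacks (0 or more times), completion callback
    # (exactly once), mid-command free (once), completion free (once).
    error = 0
    for c in chunks:
        chunk_cb(c, error)
    completion_cb(error)
    chunk_free()
    completion_free()

run_command(
    ["chunk-1", "chunk-2"],
    chunk_cb=lambda c, err: events.append("mid:" + c),
    completion_cb=lambda err: events.append("completion"),
    chunk_free=lambda: events.append("mid-free"),
    completion_free=lambda: events.append("completion-free"),
)
print(events)
```

A driver like this makes it easy to see why the frees are safe places to release user_data: both callbacks are guaranteed to be finished by then.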
