Re: [PATCH v2 3/5] qemu-img: Add checksum command

2022-12-12 Hanna Reitz

On 28.11.22 15:15, Nir Soffer wrote:

[...]


diff --git a/qemu-img.c b/qemu-img.c
index c03d6b4b31..4b4ca7add3 100644
--- a/qemu-img.c
+++ b/qemu-img.c


[...]


@@ -1613,20 +1617,199 @@ out:
     qemu_vfree(buf1);
     qemu_vfree(buf2);
     blk_unref(blk2);
 out2:
     blk_unref(blk1);
 out3:
     qemu_progress_end();
     return ret;
 }
 
+#ifdef CONFIG_BLKHASH
+/*
+ * Compute image checksum.
+ */
+static int img_checksum(int argc, char **argv)
+{
+    const char *digest_name = "sha256";
+    const size_t block_size = 64 * KiB;
+
+    _Static_assert(QEMU_IS_ALIGNED(IO_BUF_SIZE, block_size),
+                   "IO_BUF_SIZE should be alligned to block_size");


(s/alligned/aligned/)

A suggestion: We have a `QEMU_BUILD_BUG_MSG()` macro in
include/qemu/compiler.h.  Nowadays it just unconditionally resolves to
_Static_assert; I think before C11 was adopted, it used a custom
implementation.  Still, it is what seems to be used throughout the
actual QEMU code (disregarding roms/ and pc-bios/), so I think it would
be more fitting to use.


But that’s just a suggestion.  It always resolves to _Static_assert 
anyway, so using _Static_assert seems by no means wrong.
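A minimal standalone sketch of both forms follows. The macros at the top are local stand-ins assumed to mirror QEMU's definitions (they are not the real headers), `IO_BUF_SIZE` is assumed to be 2 MiB as in qemu-img.c, and `CHECKSUM_BLOCK_SIZE` is a hypothetical macro standing in for the patch's `block_size`. One detail matters when switching: `QEMU_BUILD_BUG_MSG(x, msg)` is assumed to expand to `_Static_assert(!(x), msg)`, so it takes the condition under which the build should *fail*, and the alignment check has to be negated.

```c
#include <stdint.h>

/*
 * Local stand-ins, assumed to mirror QEMU's definitions in
 * include/qemu/units.h, include/qemu/osdep.h and include/qemu/compiler.h.
 */
#define KiB (INT64_C(1) << 10)
#define MiB (INT64_C(1) << 20)
#define QEMU_IS_ALIGNED(n, m) (((n) % (m)) == 0)
#define QEMU_BUILD_BUG_MSG(x, msg) _Static_assert(!(x), msg)

/*
 * IO_BUF_SIZE as assumed from qemu-img.c; CHECKSUM_BLOCK_SIZE is a
 * hypothetical macro for the patch's block_size. Using macros keeps the
 * condition an integer constant expression in standard C.
 */
#define IO_BUF_SIZE (2 * MiB)
#define CHECKSUM_BLOCK_SIZE (64 * KiB)

/* Plain C11 form, as in the patch: the build fails if the condition is false. */
_Static_assert(QEMU_IS_ALIGNED(IO_BUF_SIZE, CHECKSUM_BLOCK_SIZE),
               "IO_BUF_SIZE should be aligned to block_size");

/* QEMU macro form: the build fails if the condition is true, hence the '!'. */
QEMU_BUILD_BUG_MSG(!QEMU_IS_ALIGNED(IO_BUF_SIZE, CHECKSUM_BLOCK_SIZE),
                   "IO_BUF_SIZE should be aligned to block_size");
```

Both assertions pass with these values; changing `CHECKSUM_BLOCK_SIZE` to, say, `48 * KiB` makes both of them stop the build.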


So with the spelling fixed:

Reviewed-by: Hanna Reitz 




[PATCH v2 3/5] qemu-img: Add checksum command

2022-11-28 Nir Soffer
The checksum command computes a checksum for disk image content using the
blkhash library[1]. The blkhash library is not packaged yet, but it is
available via copr[2].

Example run:

$ ./qemu-img checksum -p fedora-35.qcow2
6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5  fedora-35.qcow2

The block checksum is constructed by splitting the image into fixed-size
blocks and computing a digest of every block. The image checksum is the
digest of all the block digests.

Internally the checksum uses the "sha256" algorithm, but the result cannot
be compared with checksums created by other tools such as `sha256sum`,
because it is a digest of block digests, not a digest of the raw image data.

The blkhash library supports sparse images and zero detection, and
optimizes hashing of zero blocks, making them practically free. The
library uses multiple threads to speed up the computation.
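To make the construction concrete, here is a minimal C sketch of a block checksum with zero-block detection. It is not blkhash's code or output format: 64-bit FNV-1a stands in for sha256 only to keep the sketch dependency-free, and `block_checksum()`, the handling of a short last block, and the little-endian serialization of block digests are assumptions for illustration. It shows the two-level structure (digest per block, then a digest of the block digests) and why a full zero block can reuse one precomputed digest without changing the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a parameters: a dependency-free stand-in for sha256,
 * not what blkhash uses. */
#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME 1099511628211ULL

/* Continue an FNV-1a hash state h over len bytes of buf. */
static uint64_t fnv1a_update(uint64_t h, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Feed one block digest into the outer digest as 8 little-endian bytes. */
static uint64_t add_block_digest(uint64_t outer, uint64_t d)
{
    unsigned char bytes[8];
    for (int i = 0; i < 8; i++) {
        bytes[i] = (unsigned char)(d >> (8 * i));
    }
    return fnv1a_update(outer, bytes, sizeof(bytes));
}

static bool is_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical block checksum: digest every block_size chunk of the image,
 * then digest the sequence of block digests. A short last block is hashed
 * as-is (an assumption). With skip_zero, full zero blocks reuse a digest
 * computed once up front instead of being hashed again; the result is the
 * same either way.
 */
uint64_t block_checksum(const unsigned char *image, size_t size,
                        size_t block_size, bool skip_zero)
{
    assert(block_size > 0);

    /* Hashing a zero byte with FNV-1a only multiplies the state by the
     * prime, so the digest of a full zero block needs no buffer. */
    uint64_t zero_digest = FNV_OFFSET;
    if (skip_zero) {
        for (size_t i = 0; i < block_size; i++) {
            zero_digest *= FNV_PRIME;
        }
    }

    uint64_t outer = FNV_OFFSET;
    for (size_t off = 0; off < size; off += block_size) {
        size_t len = size - off < block_size ? size - off : block_size;
        const unsigned char *block = image + off;
        uint64_t d;

        if (skip_zero && len == block_size && is_zero(block, len)) {
            d = zero_digest;
        } else {
            d = fnv1a_update(FNV_OFFSET, block, len);
        }
        outer = add_block_digest(outer, d);
    }
    return outer;
}
```

Calling `block_checksum()` with `skip_zero` set to `true` or `false` returns the same value; the `true` variant only avoids re-hashing full zero blocks. This toy still scans each block to detect zeros and runs on one thread, so it does not attempt the sparse-image handling or multithreading described above.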

Compared to `sha256sum`, `qemu-img checksum` is 3.5 to 4800[3] times
faster, depending on the amount of data in the image:

$ ./qemu-img info /scratch/50p.raw
file format: raw
virtual size: 6 GiB (6442450944 bytes)
disk size: 2.91 GiB

$ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
 "sha256sum /scratch/50p.raw"
Benchmark 1: ./qemu-img checksum /scratch/50p.raw
  Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s, System: 0.962 s]
  Range (min … max):    1.813 s …  1.908 s    5 runs

Benchmark 2: sha256sum /scratch/50p.raw
  Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s, System: 1.003 s]
  Range (min … max):   14.501 s … 14.697 s    5 runs

Summary
  './qemu-img checksum /scratch/50p.raw' ran
    7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'
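Both speedup figures follow directly from the reported times: the mean wall-clock times of the two benchmarks above, and the 8T empty-image numbers from footnote [3] (where the sha256sum figure is an estimate, not a measurement):

$$
\frac{14.585\,\mathrm{s}}{1.849\,\mathrm{s}} \approx 7.89,
\qquad
\frac{17749\,\mathrm{s}}{3.7\,\mathrm{s}} \approx 4797 \approx 4800
$$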

The new command is available only when `blkhash` is available during
build. To test the new command please install the `blkhash-devel`
package:

$ dnf copr enable nsoffer/blkhash
$ sudo dnf install blkhash-devel

[1] https://gitlab.com/nirs/blkhash
[2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
[3] Computing the checksum of an 8T empty image: qemu-img checksum: 3.7s,
    sha256sum (estimate): 17,749s

Signed-off-by: Nir Soffer 
---
 docs/tools/qemu-img.rst |  24 ++
 meson.build             |  10 ++-
 meson_options.txt       |   2 +
 qemu-img-cmds.hx        |   8 ++
 qemu-img.c              | 183 
 5 files changed, 226 insertions(+), 1 deletion(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 15aeddc6d8..d856785ecc 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -347,20 +347,44 @@ Command description:
     Check completed, image is corrupted
   3
     Check completed, image has leaked clusters, but is not corrupted
   63
     Checks are not supported by the image format
 
   If ``-r`` is specified, exit codes representing the image state refer to the
   state after (the attempt at) repairing it. That is, a successful ``-r all``
   will yield the exit code 0, independently of the image state before.
 
+.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
+
+  Print a checksum of the guest-visible content of image *FILENAME*. Images
+  with different formats or settings but the same guest-visible content will
+  have the same checksum.
+
+  The format is probed unless you specify it with ``-f``.
+
+  The checksum is computed for guest-visible content. Allocated areas full of
+  zeros, zero clusters, and unallocated areas are all read as zeros, so they
+  produce the same checksum. Images made of a single file, multiple files, or
+  a backing chain will have the same checksum if the guest sees the same
+  content when reading the image.
+
+  Image metadata that is not visible to the guest, such as dirty bitmaps, does
+  not affect the checksum.
+
+  Computing a checksum requires a read-only image. You cannot compute a
+  checksum of an active image used by a guest, but you can compute a checksum
+  of a guest's disk during pull-mode incremental backup using an NBD URL.
+
+  For optimization purposes (multithreading and optimized handling of zero
+  areas), the checksum is not compatible with other tools such as *sha256sum*.
+  For more info please see https://gitlab.com/nirs/blkhash.
+
 .. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
 
   Commit the changes recorded in *FILENAME* in its base image or backing file.
   If the backing file is smaller than the snapshot, then the backing file will be
   resized to be the same size as the snapshot.  If the snapshot is smaller than
   the backing file, the backing file will not be truncated.  If you want the
   backing file to match the size of the smaller snapshot, you can safely truncate
   it yourself once the commit operation successfully completes.
 
   The image *FILENAME* is emptied after the operation has succeeded. If you do
diff --git a/meson.build b/meson.build
index 5c6b5a1c75..18071dd653 100644
---