Re: [Qemu-devel] [PATCH v2 2/3] block/mirror: Fix target backing BDS

2016-06-08 Thread Nir Soffer
On Wed, Jun 8, 2016 at 12:32 PM, Kevin Wolf  wrote:
> Am 06.06.2016 um 16:42 hat Max Reitz geschrieben:
>> Currently, we are trying to move the backing BDS from the source to the
>> target in bdrv_replace_in_backing_chain() which is called from
>> mirror_exit(). However, mirror_complete() already tries to open the
>> target's backing chain with a call to bdrv_open_backing_file().
>>
>> First, we should only set the target's backing BDS once. Second, the
>> mirroring block job has a better idea of what to set it to than the
>> generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
>> conditions on when to move the backing BDS from source to target are not
>> really correct).
>>
>> Therefore, remove that code from bdrv_replace_in_backing_chain() and
>> leave it to mirror_complete().
>>
>> However, mirror_complete() in turn pursues a questionable strategy by
>> employing bdrv_open_backing_file(): On the one hand, because this may
>> open the wrong backing file with drive-mirror in "existing" mode, or
>> because it will not override a possibly wrong backing file in the
>> blockdev-mirror case.
>>
>> On the other hand, we want to reuse the existing backing chain of the
>> source instead of opening everything anew, because the latter results in
>> having multiple BDSs for a single physical file and thus potentially
>> concurrent access which we should try to avoid.
>
> Careful, this "wrong" backing file might actually be intended!
>
> Consider a case where you want to move an image with its whole backing
> chain to different storage. In that case, you would copy all of the
> backing files (cp is good enough, they are read-only), create the
> destination image which already points at the copied backing chain, and
> then mirror in "existing" mode.
>
> The intention is obviously that after the job completion the new backing
> chain is used and not the old one.
>
> I know that such cases were discussed when mirroring was introduced, I'm
> not sure whether it's actually used. We need some input there:
>
> Eric, can you tell us whether libvirt makes use of such a setup?
>
> Nir, I'm not sure who is the right person in oVirt these days, but do
> you either know yourself whether oVirt requires this to work, or do you
> know who else would know?

I'm the right person, thanks for keeping me in the loop.

What you describe is how we migrate a disk from one storage to another:

1. Create a vm snapshot
2. Create a volume on the destination storage for the snapshot
3. Start mirroring from the source snapshot to the destination snapshot
using libvirt virDomainBlockCopy:
https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockCopy
4. Copy the rest of the chain from source to destination using qemu-img convert
5. Pivot to the new chain using libvirt virDomainBlockJobAbort
https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockJobAbort
6. Remove the old chain
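Kevin's "existing" mode scenario and steps 3 and 5 above correspond roughly to the QMP commands below. This is a sketch under assumptions: the device name and target path are hypothetical, the script only builds and prints the JSON (it does not talk to a QEMU monitor), and libvirt sends its own equivalent commands rather than these exact ones.

```python
import json

# Hypothetical names, for illustration only.
DEVICE = "drive0"
TARGET = "/new-storage/top.qcow2"  # pre-created; its backing file points at the copied chain


def qmp(execute: str, **arguments) -> str:
    """Serialize one QMP command (nothing is sent anywhere)."""
    return json.dumps({"execute": execute, "arguments": arguments})


# Step 3: mirror only the top image (sync=top) into a target that already
# exists (mode=existing), so QEMU does not create the target image itself.
start = qmp("drive-mirror", device=DEVICE, target=TARGET,
            format="qcow2", sync="top", mode="existing")

# Step 5: after the job reports ready, switch the guest to the target.
pivot = qmp("block-job-complete", device=DEVICE)

if __name__ == "__main__":
    print(start)
    print(pivot)
```

After the pivot, which backing chain the target uses is exactly the question this patch has to get right.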

Source and target can be files or block devices, and we also plan to support
rbd and gluster volumes as the target, and maybe also as the source.

Nir

>
>> Thus, instead of invoking bdrv_open_backing_file(), just set the correct
>> backing BDS directly via bdrv_set_backing_hd(). Also, do so only when
>> mirror_complete() is certain to succeed.
>>
>> In contrast to what bdrv_replace_in_backing_chain() did so far, we do
>> not need to drop the source's backing file.
>>
>> Signed-off-by: Max Reitz 
>
> Leaving the actual code review for later when we have decided what
> semantics we even want.
>
> Kevin



Re: [Qemu-devel] [PATCH v2 2/3] block/mirror: Fix target backing BDS

2016-06-09 Thread Nir Soffer
On Thu, Jun 9, 2016 at 11:58 AM, Kevin Wolf  wrote:
> Am 08.06.2016 um 17:39 hat Nir Soffer geschrieben:
>> On Wed, Jun 8, 2016 at 12:32 PM, Kevin Wolf  wrote:
>> > [...]
>> >
>> > Nir, I'm not sure who is the right person in oVirt these days, but do
>> > you either know yourself whether oVirt requires this to work, or do you
>> > know who else would know?
>>
>> I'm the right person, thanks for keeping me in the loop.
>>
>> What you describe is how we migrate a disk from one storage to another:
>>
>> 1. Create a vm snapshot
>> 2. Create a volume on the destination storage for the snapshot
>> 3. Start mirroring from the source snapshot to the destination snapshot
>> using libvirt virDomainBlockCopy:
>> https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockCopy
>
> With VIR_DOMAIN_BLOCK_COPY_SHALLOW set, right? (That is, sync=top in QMP
> speech.)

Yes, actually we use:

VIR_DOMAIN_BLOCK_COPY_SHALLOW | VIR_DOMAIN_BLOCK_COPY_REUSE_EXT

>> 4. Copy the rest of the chain from source to destination using qemu-img convert
>> 5. Pivot to the new chain using libvirt virDomainBlockJobAbort
>> https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockJobAbort
>> 6. Remove the old chain
>>
>> Source and target can be files or block devices, and we also plan to support
>> rbd and gluster volumes as the target, and maybe also as the source.
>
> Thanks, Nir, we should then do our best not to break it.
>
> Max, maybe we can add a qemu-iotests case that does the exact same thing
> as oVirt does?
>
> Kevin



Re: [PATCH 1/3] qemu-img: Add checksum command

2022-11-28 Thread Nir Soffer
On Mon, Nov 7, 2022 at 12:20 PM Hanna Reitz  wrote:

> On 30.10.22 18:37, Nir Soffer wrote:
> > On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz  wrote:
> >
> >     On 01.09.22 16:32, Nir Soffer wrote:
> [...]
> > > ---
> > >   docs/tools/qemu-img.rst |  22 +
> > >   meson.build             |  10 ++-
> > >   meson_options.txt       |   2 +
> > >   qemu-img-cmds.hx        |   8 ++
> > >   qemu-img.c              | 191
> > >   5 files changed, 232 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> > > index 85a6e05b35..8be9c45cbf 100644
> > > --- a/docs/tools/qemu-img.rst
> > > +++ b/docs/tools/qemu-img.rst
> > > @@ -347,20 +347,42 @@ Command description:
> > >   Check completed, image is corrupted
> > > 3
> > >   Check completed, image has leaked clusters, but is not corrupted
> > > 63
> > >   Checks are not supported by the image format
> > >
> > > If ``-r`` is specified, exit codes representing the image state refer to the
> > > state after (the attempt at) repairing it. That is, a successful ``-r all``
> > > will yield the exit code 0, independently of the image state before.
> > >
> > > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
> > > +
> > > +  Print a checksum for image *FILENAME* guest visible content.
> >
> > Why not say which kind of checksum it is?
> >
> >
> > Do you mean the algorithm used? This may be confusing, for example we
> > write
> >
> >Print a sha256 checksum ...
> >
> > User will expect to get the same result from "sha256sum disk.img". How
> > about
> >
> >Print a blkhash checksum ...
> >
> > And add a link to the blkhash project?
>
> I did mean sha256, but if it isn’t pure sha256, then a link to any
> description how it is computed would be good, I think.
>

Ok, will link to https://gitlab.com/nirs/blkhash

[...]

>
> > > +  The checksum is not compatible with other tools such as *sha256sum*.
> >
> > Why not?  I can see it differs even for raw images, but why?  I would
> > have very much assumed that this gives me exactly what sha256sum in the
> > guest on the guest device would yield.
> >
> >
> > The blkhash is a construction based on other cryptographic hash
> > functions (e.g. sha256).
> > The way the hash is constructed is explained here:
> > https://gitlab.com/nirs/blkhash/-/blob/master/blkhash.py#L52
> >
> > We can provide a very slow version using a single thread and no zero
> > optimization that will create the same hash as sha256sum for raw image.
>
> Ah, right.  Yes, especially zero optimization is likely to make a huge
> difference.  Thanks for the explanation!
>
> Maybe that could be mentioned here as a side note, though?  E.g. “The
> checksum is not compatible with other tools such as *sha256sum* for
> optimization purposes (to allow multithreading and optimized handling of
> zero areas).”?
>

Ok, I will improve the text in the next version.

[...]

> > In blksum I do not allow changing the block size.
> >
> > I'll add an assert in the next version to keep this default optimal.
>
> Thanks!  (Static assert should work, right?)
>

I think it should

Nir


Re: [PATCH 2/3] iotests: Test qemu-img checksum

2022-11-28 Thread Nir Soffer
On Mon, Nov 7, 2022 at 1:41 PM Hanna Reitz  wrote:

> On 30.10.22 18:38, Nir Soffer wrote:
> > On Wed, Oct 26, 2022 at 4:31 PM Hanna Reitz  wrote:
> >
> >     On 01.09.22 16:32, Nir Soffer wrote:
> > > Add simple tests creating an image with all kinds of extents, different
> > > formats, different backing chain, different protocol, and different
> > > image options. Since all images have the same guest visible content they
> > > must have the same checksum.
> > >
> > > To help debugging in case of failures, the output includes a json map of
> > > every test image.
> > >
> > > Signed-off-by: Nir Soffer 
> > > ---
> > >   tests/qemu-iotests/tests/qemu-img-checksum    | 149 ++
> > >   .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +
> > >   2 files changed, 223 insertions(+)
> > >   create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
> > >   create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
> > >
> > > diff --git a/tests/qemu-iotests/tests/qemu-img-checksum b/tests/qemu-iotests/tests/qemu-img-checksum
> > > new file mode 100755
> > > index 00..3a85ba33f2
> > > --- /dev/null
> > > +++ b/tests/qemu-iotests/tests/qemu-img-checksum
> > > @@ -0,0 +1,149 @@
> > > +#!/usr/bin/env python3
> > > +# group: rw auto quick
> > > +#
> > > +# Test cases for qemu-img checksum.
> > > +#
> > > +# Copyright (C) 2022 Red Hat, Inc.
> > > +#
> > > +# This program is free software; you can redistribute it and/or modify
> > > +# it under the terms of the GNU General Public License as published by
> > > +# the Free Software Foundation; either version 2 of the License, or
> > > +# (at your option) any later version.
> > > +#
> > > +# This program is distributed in the hope that it will be useful,
> > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > +# GNU General Public License for more details.
> > > +#
> > > +# You should have received a copy of the GNU General Public License
> > > +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > > +
> > > +import re
> > > +
> > > +import iotests
> > > +
> > > +from iotests import (
> > > +    filter_testfiles,
> > > +    qemu_img,
> > > +    qemu_img_log,
> > > +    qemu_io,
> > > +    qemu_nbd_popen,
> > > +)
> > > +
> > > +
> > > +def checksum_available():
> > > +    out = qemu_img("--help").stdout
> > > +    return re.search(r"\bchecksum .+ filename\b", out) is not None
> > > +
> > > +
> > > +if not checksum_available():
> > > +    iotests.notrun("checksum command not available")
> > > +
> > > +iotests.script_initialize(
> > > +    supported_fmts=["raw", "qcow2"],
> > > +    supported_cache_modes=["none", "writeback"],
> >
> > It doesn’t work with writeback, though, because it uses -T none below.
> >
> >
> > Good point
> >
> >
> > Which by the way is a heavy cost, because I usually run tests in tmpfs,
> > where this won’t work.  Is there any way of not doing the -T none below?
> >
> >
> > Testing using tmpfs is problematic since you cannot test -T none. In oVirt
> > we always use /var/tmp which usually uses something that supports direct I/O.
> >
> > Do we have a way to specify cache mode in the tests, so we can use -T none
> > only when the option is set?
>
> `./check` has a `-c` option (e.g. `./check -c none`), which lands in
> `iotests.cachemode`.  That isn’t automatically passed to qemu-img calls,
> but you can do it manually (i.e. `qemu_img_log("checksum", "-T",
> iotests.cachemode, disk_top)` instead of `"-T", "none"`).
>

Ok, I will change to use the current cache setting.


> >
> > > +supported_p

[PATCH v2 1/5] qemu-img.c: Move IO_BUF_SIZE to the top of the file

2022-11-28 Thread Nir Soffer
This macro is used by various commands (compare, convert, rebase) but it
is defined somewhere in the middle of the file. I'm going to use it in
the new checksum command so let's clean up a bit before that.
---
 qemu-img.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index a9b3a8103c..c03d6b4b31 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -49,20 +49,21 @@
 #include "block/block_int.h"
 #include "block/blockjob.h"
 #include "block/qapi.h"
 #include "crypto/init.h"
 #include "trace/control.h"
 #include "qemu/throttle.h"
 #include "block/throttle-groups.h"
 
 #define QEMU_IMG_VERSION "qemu-img version " QEMU_FULL_VERSION \
   "\n" QEMU_COPYRIGHT "\n"
+#define IO_BUF_SIZE (2 * MiB)
 
 typedef struct img_cmd_t {
     const char *name;
     int (*handler)(int argc, char **argv);
 } img_cmd_t;
 
 enum {
     OPTION_OUTPUT = 256,
     OPTION_BACKING_CHAIN = 257,
     OPTION_OBJECT = 258,
@@ -1281,22 +1282,20 @@ static int compare_buffers(const uint8_t *buf1, const uint8_t *buf2,
         if (!!memcmp(buf1 + i, buf2 + i, len) != res) {
             break;
         }
         i += len;
     }
 
     *pnum = i;
     return res;
 }
 
-#define IO_BUF_SIZE (2 * MiB)
-
 /*
  * Check if passed sectors are empty (not allocated or contain only 0 bytes)
  *
  * Intended for use by 'qemu-img compare': Returns 0 in case sectors are
  * filled with 0, 1 if sectors contain non-zero data (this is a comparison
  * failure), and 4 on error (the exit status for read errors), after emitting
  * an error message.
  *
  * @param blk:  BlockBackend for the image
  * @param offset: Starting offset to check
-- 
2.38.1




[PATCH v2 2/5] Support format or cache specific out file

2022-11-28 Thread Nir Soffer
Extend the test finder to find tests with format-specific (*.out.qcow2) or
cache-specific (*.out.nocache) out files. Previously this worked only for
the numbered tests.
---
 tests/qemu-iotests/findtests.py | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/findtests.py b/tests/qemu-iotests/findtests.py
index dd77b453b8..f4344ce78c 100644
--- a/tests/qemu-iotests/findtests.py
+++ b/tests/qemu-iotests/findtests.py
@@ -38,31 +38,37 @@ def chdir(path: Optional[str] = None) -> Iterator[None]:
 os.chdir(saved_dir)
 
 
 class TestFinder:
     def __init__(self, test_dir: Optional[str] = None) -> None:
         self.groups = defaultdict(set)
 
         with chdir(test_dir):
             self.all_tests = glob.glob('[0-9][0-9][0-9]')
             self.all_tests += [f for f in glob.iglob('tests/*')
-                               if not f.endswith('.out') and
-                               os.path.isfile(f + '.out')]
+                               if self.is_test(f)]
 
             for t in self.all_tests:
                 with open(t, encoding="utf-8") as f:
                     for line in f:
                         if line.startswith('# group: '):
                             for g in line.split()[2:]:
                                 self.groups[g].add(t)
                             break
 
+    def is_test(self, fname: str) -> bool:
+        """
+        The tests directory contains tests (no extension) and out files
+        (*.out, *.out.{format}, *.out.{option}).
+        """
+        return re.search(r'.+\.out(\.\w+)?$', fname) is None
+
     def add_group_file(self, fname: str) -> None:
         with open(fname, encoding="utf-8") as f:
             for line in f:
                 line = line.strip()
 
                 if (not line) or line[0] == '#':
                     continue
 
                 words = line.split()
                 test_file = self.parse_test_name(words[0])
-- 
2.38.1
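The file classification the patch adds to `TestFinder.is_test()`, together with the existing `# group: ` parsing, can be tried in isolation. A minimal sketch: the two free functions below copy the patch's regex and the group loop, but their names are chosen for illustration and they are not the real `TestFinder` methods.

```python
import re


def is_test(fname: str) -> bool:
    # Same pattern as the patch: reject "*.out" and "*.out.<suffix>" files.
    return re.search(r'.+\.out(\.\w+)?$', fname) is None


def parse_groups(text: str) -> list:
    # Like TestFinder: only the first "# group: " line is used.
    for line in text.splitlines():
        if line.startswith('# group: '):
            return line.split()[2:]
    return []


if __name__ == "__main__":
    for name in ["tests/qemu-img-checksum",
                 "tests/qemu-img-checksum.out.raw",
                 "tests/qemu-img-checksum.out.qcow2",
                 "tests/some-test.out.nocache"]:
        print(name, is_test(name))
    print(parse_groups("#!/usr/bin/env python3\n# group: rw auto quick\n"))
```

Unlike the old filter, this no longer requires a plain `*.out` file to exist next to the test, which is what allows a test to ship only `*.out.raw` and `*.out.qcow2`.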




[PATCH v2 4/5] iotests: Test qemu-img checksum

2022-11-28 Thread Nir Soffer
Add simple tests computing a checksum for an image with all kinds of
extents in raw and qcow2 formats.

The test can be extended later for other formats, format options (e.g.
compressed qcow2), protocols (e.g. nbd), and images with a backing chain,
but I'm not sure this is really needed.

To help debugging in case of failures, the output includes a json map of
the test image.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/tests/qemu-img-checksum| 63 +++
 .../tests/qemu-img-checksum.out.qcow2 | 11 
 .../tests/qemu-img-checksum.out.raw   | 10 +++
 3 files changed, 84 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out.raw

diff --git a/tests/qemu-iotests/tests/qemu-img-checksum b/tests/qemu-iotests/tests/qemu-img-checksum
new file mode 100755
index 00..3577a0bc41
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum
@@ -0,0 +1,63 @@
+#!/usr/bin/env python3
+# group: rw auto quick
+#
+# Test cases for qemu-img checksum.
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import re
+
+import iotests
+
+from iotests import (
+    filter_testfiles,
+    qemu_img,
+    qemu_img_log,
+    qemu_io,
+)
+
+
+def checksum_available():
+    out = qemu_img("--help").stdout
+    return re.search(r"\bchecksum .+ filename\b", out) is not None
+
+
+if not checksum_available():
+    iotests.notrun("checksum command not available")
+
+iotests.script_initialize(
+    supported_fmts=["raw", "qcow2"],
+    supported_cache_modes=["none", "writeback"],
+    supported_protocols=["file"],
+)
+
+print("=== Create test image ===\n")
+
+disk = iotests.file_path('disk')
+qemu_img("create", "-f", iotests.imgfmt, disk, "10m")
+qemu_io("-f", iotests.imgfmt,
+        "-c", "write -P 0x1 0 2m",      # data
+        "-c", "write -P 0x0 2m 2m",     # data with zeroes
+        "-c", "write -z 4m 2m",         # zero allocated
+        "-c", "write -z -u 6m 2m",      # zero hole
+                                        # unallocated
+        disk)
+print(filter_testfiles(disk))
+qemu_img_log("map", "--output", "json", disk)
+
+print("=== Compute checksum ===\n")
+
+qemu_img_log("checksum", "-T", iotests.cachemode, disk)
diff --git a/tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2 b/tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2
new file mode 100644
index 00..02b9616e5b
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2
@@ -0,0 +1,11 @@
+=== Create test image ===
+
+TEST_DIR/PID-disk
+[{ "start": 0, "length": 4194304, "depth": 0, "present": true, "zero": false, "data": true, "offset": 327680},
+{ "start": 4194304, "length": 4194304, "depth": 0, "present": true, "zero": true, "data": false},
+{ "start": 8388608, "length": 2097152, "depth": 0, "present": false, "zero": true, "data": false}]
+
+=== Compute checksum ===
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  TEST_DIR/PID-disk
+
diff --git a/tests/qemu-iotests/tests/qemu-img-checksum.out.raw b/tests/qemu-iotests/tests/qemu-img-checksum.out.raw
new file mode 100644
index 00..6294e4dace
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum.out.raw
@@ -0,0 +1,10 @@
+=== Create test image ===
+
+TEST_DIR/PID-disk
+[{ "start": 0, "length": 4194304, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0},
+{ "start": 4194304, "length": 6291456, "depth": 0, "present": true, "zero": true, "data": false, "offset": 4194304}]
+
+=== Compute checksum ===
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  TEST_DIR/PID-disk
+
-- 
2.38.1




[PATCH v2 0/5] Add qemu-img checksum command using blkhash

2022-11-28 Thread Nir Soffer
Since blkhash is currently available only via copr, the new command is added
as an optional feature, built only if the blkhash-devel package is installed.

Changes since v1 (Hanna):
- Move IO_BUF_SIZE to top of the file
- Extend TestFinder to support format or cache specific out files
- Improve online help (note about optimization and link to blkhash project)
- Guard blkhash.h include with CONFIG_BLKHASH
- Use user_creatable_process_cmdline() instead of user_creatable_add_from_str()
- Rename ret to exit_code
- Add static assert to ensure that read buffer is aligned to block size
- Drop unneeded pnum variable
- Change test to work like other tests; use iotests.imgfmt and iotests.cachemode
- Simplify test to test only raw and qcow2 format using file protocol
- Fix code style issues (multi-line comments, missing braces)
- Make error checking more clear (checksum_block_status(s) < 0)

v1:
https://lists.nongnu.org/archive/html/qemu-block/2022-09/msg00021.html

v1 discussion:
- https://lists.nongnu.org/archive/html/qemu-block/2022-10/msg00602.html
- https://lists.nongnu.org/archive/html/qemu-block/2022-10/msg00603.html
- https://lists.nongnu.org/archive/html/qemu-block/2022-10/msg00604.html
- https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00171.html
- https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00173.html

Nir Soffer (5):
  qemu-img.c: Move IO_BUF_SIZE to the top of the file
  Support format or cache specific out file
  qemu-img: Add checksum command
  iotests: Test qemu-img checksum
  qemu-img: Speed up checksum

 docs/tools/qemu-img.rst   |  24 ++
 meson.build   |  10 +-
 meson_options.txt |   2 +
 qemu-img-cmds.hx  |   8 +
 qemu-img.c| 390 +-
 tests/qemu-iotests/findtests.py   |  10 +-
 tests/qemu-iotests/tests/qemu-img-checksum|  63 +++
 .../tests/qemu-img-checksum.out.qcow2 |  11 +
 .../tests/qemu-img-checksum.out.raw   |  10 +
 9 files changed, 523 insertions(+), 5 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out.raw

-- 
2.38.1




[PATCH v2 5/5] qemu-img: Speed up checksum

2022-11-28 Thread Nir Soffer
Add a coroutine-based loop inspired by the `qemu-img convert` design.

Changes compared to `qemu-img convert`:

- State for the entire image is kept in ImgChecksumState.

- State for a single worker coroutine is kept in ImgChecksumWorker.

- "Writes" are always in-order, ensured using a queue.

- Block status is called once per image extent, when the current extent
  is consumed by the workers.

- Use a 1m buffer size; testing shows that this gives the best read
  performance with both buffered and direct I/O.

- The number of coroutines is not configurable. Testing does not show
  improvement when using more than 8 coroutines.

- Progress includes the entire image, not only the allocated areas.
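The in-order update rule in the list above (reads may complete in any order, but the hash must be fed in offset order) can be illustrated with asyncio. A minimal sketch under stated assumptions: plain SHA-256 stands in for blkhash, the chunk size and simulated read delays are hypothetical, and there is no block status or zero-extent handling. It mirrors the idea of the update queue and the per-chunk "ready" state, not the patch's C code.

```python
import asyncio
import hashlib
from collections import deque

CHUNK = 4  # assumption: tiny chunk size so the reordering is easy to follow


async def ordered_checksum(data: bytes, workers: int = 3) -> str:
    h = hashlib.sha256()
    lock = asyncio.Lock()
    queue = deque()  # pending chunks in offset order: [offset, buf or None]
    next_offset = 0

    async def read(offset: int) -> bytes:
        # Simulated I/O: within each group of 3 chunks, later chunks finish first.
        await asyncio.sleep(0.001 * (2 - (offset // CHUNK) % 3))
        return data[offset:offset + CHUNK]

    async def worker() -> None:
        nonlocal next_offset
        while True:
            async with lock:
                if next_offset >= len(data):
                    return
                entry = [next_offset, None]
                next_offset += CHUNK
                queue.append(entry)  # schedule this update at the tail
            entry[1] = await read(entry[0])  # reads run concurrently
            async with lock:
                # Only ready entries at the head may update the hash,
                # so the hash always sees the data in offset order.
                while queue and queue[0][1] is not None:
                    h.update(queue.popleft()[1])

    await asyncio.gather(*(worker() for _ in range(workers)))
    return h.hexdigest()


if __name__ == "__main__":
    data = bytes(range(50))
    print(asyncio.run(ordered_checksum(data)) == hashlib.sha256(data).hexdigest())
```

Whichever worker completes the chunk at the head of the queue drains every ready chunk behind it, so no update is lost even when the last worker to finish is not the one that read the last chunk.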

Compared to the simple read loop, this version is up to 4.67 times faster
when computing a checksum for an image full of zeroes. For real images it
is 1.59 times faster with direct I/O, and with buffered I/O there is no
significant difference.

Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:

| image    | size | i/o      | before         | after          | change |
|----------|------|----------|----------------|----------------|--------|
| zero [1] |   6g | buffered | 1.600s ±0.014s | 0.342s ±0.016s |  x4.67 |
| zero     |   6g | direct   | 4.684s ±0.093s | 2.211s ±0.009s |  x2.12 |
| real [2] |   6g | buffered | 1.841s ±0.075s | 1.806s ±0.036s |  x1.02 |
| real     |   6g | direct   | 3.094s ±0.079s | 1.947s ±0.017s |  x1.59 |
| nbd  [3] |   6g | buffered | 2.455s ±0.183s | 1.808s ±0.016s |  x1.36 |
| nbd      |   6g | direct   | 3.540s ±0.020s | 1.749s ±0.018s |  x2.02 |

[1] raw image full of zeroes
[2] raw fedora 35 image with additional random data, 50% full
[3] image [2] exported by qemu-nbd via unix socket

Signed-off-by: Nir Soffer 
---
 qemu-img.c | 350 ++---
 1 file changed, 277 insertions(+), 73 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 4b4ca7add3..5f63a769a9 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1618,50 +1618,296 @@ out:
     qemu_vfree(buf2);
     blk_unref(blk2);
 out2:
     blk_unref(blk1);
 out3:
     qemu_progress_end();
     return ret;
 }
 
 #ifdef CONFIG_BLKHASH
+
+#define CHECKSUM_COROUTINES 8
+#define CHECKSUM_BUF_SIZE (1 * MiB)
+#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
+
+typedef struct ImgChecksumState ImgChecksumState;
+
+typedef struct ImgChecksumWorker {
+    QTAILQ_ENTRY(ImgChecksumWorker) entry;
+    ImgChecksumState *state;
+    Coroutine *co;
+    uint8_t *buf;
+
+    /* The current chunk. */
+    int64_t offset;
+    int64_t length;
+    bool zero;
+
+    /*
+     * Always true for zero extent, false for data extent. Set to true
+     * when reading the chunk completes.
+     */
+    bool ready;
+} ImgChecksumWorker;
+
+struct ImgChecksumState {
+    const char *filename;
+    BlockBackend *blk;
+    BlockDriverState *bs;
+    int64_t total_size;
+
+    /* Current extent, modified in checksum_co_next. */
+    int64_t offset;
+    int64_t length;
+    bool zero;
+
+    int running_coroutines;
+    CoMutex lock;
+    ImgChecksumWorker workers[CHECKSUM_COROUTINES];
+
+    /*
+     * Ensure in-order updates. Updates are scheduled at the tail of the
+     * queue and processed from the head of the queue when a worker is
+     * ready.
+     */
+    QTAILQ_HEAD(, ImgChecksumWorker) update_queue;
+
+    struct blkhash *hash;
+    int ret;
+};
+
+static int checksum_block_status(ImgChecksumState *s)
+{
+    int64_t length;
+    int status;
+
+    /* Must be called when current extent is consumed. */
+    assert(s->length == 0);
+
+    status = bdrv_block_status_above(s->bs, NULL, s->offset,
+                                     s->total_size - s->offset, &length, NULL,
+                                     NULL);
+    if (status < 0) {
+        error_report("Error checking status at offset %" PRId64 " for %s",
+                     s->offset, s->filename);
+        s->ret = status;
+        return -1;
+    }
+
+    assert(length > 0);
+
+    s->length = length;
+    s->zero = !!(status & BDRV_BLOCK_ZERO);
+
+    return 0;
+}
+
+/**
+ * Grab the next chunk from the current extent, getting the next extent if
+ * needed, and schedule the next update at the end of the update queue.
+ *
+ * Return true if the worker has work to do, false if the worker has
+ * finished or there was an error getting the next extent.
+ */
+static coroutine_fn bool checksum_co_next(ImgChecksumWorker *w)
+{
+    ImgChecksumState *s = w->state;
+
+    qemu_co_mutex_lock(&s->lock);
+
+    if (s->offset == s->total_size || s->ret != -EINPROGRESS) {
+        qemu_co_mutex_unlock(&s->lock);
+        return false;
+    }
+
+    if (s->length == 0 && checksum_block_status(s) < 0) {
+        qemu_co_mutex_unlock(&s->lock);
+        return false;
+    }
+
+    /* Grab one chunk from current extent. */
+    w->offset 

[PATCH v2 3/5] qemu-img: Add checksum command

2022-11-28 Thread Nir Soffer
The checksum command computes a checksum for disk image content using the
blkhash library[1]. The blkhash library is not packaged yet, but it is
available via copr[2].

Example run:

$ ./qemu-img checksum -p fedora-35.qcow2
6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5  fedora-35.qcow2

The block checksum is constructed by splitting the image into fixed-size
blocks and computing a digest of every block. The image checksum is the
digest of all the block digests.

The checksum internally uses the "sha256" algorithm, but it cannot be
compared with checksums created by other tools such as `sha256sum`.
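The two-level construction described above can be sketched in a few lines. This is only an illustration under assumptions: the block size is hypothetical, and the real blkhash construction (see blkhash.py in the blkhash project) may differ in details such as the block size and the handling of the last partial block, so its output will not match `qemu-img checksum`.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumption: illustrative only, not blkhash's block size


def block_checksum(data: bytes, block_size: int = BLOCK_SIZE) -> str:
    """Digest every fixed-size block, then digest the sequence of block digests."""
    outer = hashlib.sha256()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        outer.update(hashlib.sha256(block).digest())
    return outer.hexdigest()


if __name__ == "__main__":
    image = b"\x01" * (3 * BLOCK_SIZE)
    print(block_checksum(image))
    print(hashlib.sha256(image).hexdigest())  # a different value
```

Because each block digest depends only on its own block, block digests can be computed in parallel and combined in order, which a plain streaming `sha256sum` cannot do. That is also why the two results differ.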

The blkhash library supports sparse images, zero detection, and
optimizes zero block hashing (they are practically free). The library
uses multiple threads to speed up the computation.
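Why zero blocks are "practically free" can be shown with a sketch: the digest of an all-zero block is computed once and reused, so an extent known to be zero (for example from block status) needs neither reading nor hashing. The extent tuples, `read_block` callback, and block size below are hypothetical, loosely modeled on the start/length/zero fields of `qemu-img map`; this is not blkhash's API.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumption: illustrative only

# Digest of an all-zero block: computed once, reused for every zero block.
ZERO_DIGEST = hashlib.sha256(bytes(BLOCK_SIZE)).digest()


def checksum_extents(extents, read_block) -> str:
    """Hash an image described by (offset, length, zero) extents.

    Extents are assumed to be block aligned. read_block(offset) returns
    BLOCK_SIZE bytes and is only called for extents not known to be zero.
    """
    outer = hashlib.sha256()
    for offset, length, zero in extents:
        for off in range(offset, offset + length, BLOCK_SIZE):
            if zero:
                outer.update(ZERO_DIGEST)  # no read and no hashing needed
            else:
                outer.update(hashlib.sha256(read_block(off)).digest())
    return outer.hexdigest()


if __name__ == "__main__":
    B = BLOCK_SIZE
    image = b"\x01" * (2 * B) + bytes(3 * B)

    def read(off):
        return image[off:off + B]

    sparse = checksum_extents([(0, 2 * B, False), (2 * B, 3 * B, True)], read)
    dense = checksum_extents([(0, 5 * B, False)], read)
    print(sparse == dense)  # a zero extent and zero-filled data hash the same
```

This is also what makes allocated zeroes, zero clusters, and unallocated areas produce the same checksum: every path ends up feeding the same per-block digest.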

Compared to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
faster, depending on the amount of data in the image:

$ ./qemu-img info /scratch/50p.raw
file format: raw
virtual size: 6 GiB (6442450944 bytes)
disk size: 2.91 GiB

$ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
 "sha256sum /scratch/50p.raw"
Benchmark 1: ./qemu-img checksum /scratch/50p.raw
  Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s, System: 0.962 s]
  Range (min … max):    1.813 s …  1.908 s    5 runs

Benchmark 2: sha256sum /scratch/50p.raw
  Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s, System: 1.003 s]
  Range (min … max):   14.501 s … 14.697 s    5 runs

Summary
  './qemu-img checksum /scratch/50p.raw' ran
7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'

The new command is available only when `blkhash` is available during
build. To test the new command please install the `blkhash-devel`
package:

$ sudo dnf copr enable nsoffer/blkhash
$ sudo dnf install blkhash-devel

[1] https://gitlab.com/nirs/blkhash
[2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
[3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
sha256sum (estimate): 17,749s

Signed-off-by: Nir Soffer 
---
 docs/tools/qemu-img.rst |  24 ++
 meson.build |  10 ++-
 meson_options.txt   |   2 +
 qemu-img-cmds.hx|   8 ++
 qemu-img.c  | 183 
 5 files changed, 226 insertions(+), 1 deletion(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 15aeddc6d8..d856785ecc 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -347,20 +347,44 @@ Command description:
 Check completed, image is corrupted
   3
 Check completed, image has leaked clusters, but is not corrupted
   63
 Checks are not supported by the image format
 
   If ``-r`` is specified, exit codes representing the image state refer to the
   state after (the attempt at) repairing it. That is, a successful ``-r all``
   will yield the exit code 0, independently of the image state before.
 
+.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
+
+  Print a checksum of the guest-visible content of image *FILENAME*. Images
+  with different formats or settings will have the same checksum if their
+  guest-visible content is the same.
+
+  The format is probed unless you specify it by ``-f``.
+
+  The checksum is computed for guest visible content. Allocated areas full of
+  zeroes, zero clusters, and unallocated areas are all read as zeroes, so they
+  have the same checksum. Images with single or multiple files or backing files
+  will have the same checksum if the guest sees the same content when reading
+  the image.
+
+  Image metadata that is not visible to the guest such as dirty bitmaps does
+  not affect the checksum.
+
+  Computing a checksum requires a read-only image. You cannot compute a
+  checksum of an active image used by a guest, but you can compute a checksum
+  of a guest disk during pull mode incremental backup using an NBD URL.
+
+  The checksum is not compatible with other tools such as *sha256sum*, since it
+  is optimized using multithreading and efficient handling of zero areas. For
+  more info please see https://gitlab.com/nirs/blkhash.
+
 .. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
 
   Commit the changes recorded in *FILENAME* in its base image or backing file.
   If the backing file is smaller than the snapshot, then the backing file will be
   resized to be the same size as the snapshot.  If the snapshot is smaller than
   the backing file, the backing file will not be truncated.  If you want the
   backing file to match the size of the smaller snapshot, you can safely truncate
   it yourself once the commit operation successfully completes.
 
   The image *FILENAME* is emptied after the operation has succeeded. If you do
diff --git a/meson.b

Re: [Qemu-devel] Failing QEMU iotest 175

2019-05-03 Thread Nir Soffer
On Fri, May 3, 2019, 23:21 Eric Blake  wrote:

> On 5/2/19 11:37 PM, Thomas Huth wrote:
> > On 02/05/2019 23.56, Eric Blake wrote:
> >> On 4/28/19 10:18 AM, Thomas Huth wrote:
> >>> QEMU iotest 175 is failing for me when I run it with -raw:
> >>>
> >>
> >>>  == creating image with default preallocation ==
> >>>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
> >>> -size=1048576, blocks=0
> >>> +size=1048576, blocks=2
> >>
> >> What filesystem?
> >
> > ext4
> >
>
> Hmm, it's passing for me on ext4, but that probably means we have
> different configuration parameters. I'm not sure how to easily show what
> parameters a particular ext4 partition uses to compare the differences
> between your setup and mine (mine is tuned to whatever defaults Fedora's
> installer chose on my behalf), so maybe someone else can chime in.
>
> >> It should be fairly obvious that 'stat -c blocks=%b' is
> >> file-system dependent (some allocate slightly more or less space, based
> >> on granularities and on predictions of future use), so we may need to
> >> update the test to apply a filter or otherwise allow a bit of fuzz in
> >> the answer. But 0/2 is definitely different than...
> >>>
> >>>  == creating image with preallocation off ==
> >>>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=off
> >>> -size=1048576, blocks=0
> >>> +size=1048576, blocks=2
> >>>
> >>>  == creating image with preallocation full ==
> >>>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=full
> >>> -size=1048576, blocks=2048
> >>> +size=1048576, blocks=2050
> >>
> >> 2048/2050, so we DO have some indication of whether the file is sparse
> >> or fully allocated.
> >
> > Maybe we could check that the value after "blocks=" is a single digit in
> > the first case, and matches "blocks=20.." in the second case?
>
> I wonder if 'qemu-img map --output=json $TEST_IMG' might be any more
> reliable (at least for ignoring any extra block allocations associated
> with the file, if it is some journaling option or xattr or other reason
> why your files seem to occupy more disk sectors than just the size of
> the file would imply).
>

I think it should work better and is more correct, since it tests actual
sparseness instead of the underlying file system implementation.
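Eric's suggestion works because qemu-img map asks the filesystem where data is. It does not depend on how many blocks the filesystem charged to the file. As far as I know, for raw files file-posix answers this question with lseek() using SEEK_DATA/SEEK_HOLE. The Linux sketch below makes that query directly; its helper names are hypothetical. Unlike `stat -c %b`, it is not affected by extra blocks that a filesystem may allocate for an empty file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an unlinked temporary file of 'size' bytes without writing
 * any data. On a filesystem with hole support it is fully sparse. */
static int make_sparse_tmp(off_t size)
{
    char path[] = "/tmp/sparse-XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Ask the filesystem where the next data starts at or after 'start'.
 * Returns 1 and sets *data_start if there is data, 0 if the rest of the
 * file is a hole (ENXIO), -1 on other errors. Filesystems without hole
 * support report the whole file as data, which is a safe answer. */
static int next_data(int fd, off_t start, off_t *data_start)
{
    off_t off = lseek(fd, start, SEEK_DATA);

    if (off < 0)
        return errno == ENXIO ? 0 : -1;
    *data_start = off;
    return 1;
}
```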

I can send a fix next week.

Nir


> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
>


Re: [Qemu-devel] [Qemu-block] Failing QEMU iotest 175

2019-05-10 Thread Nir Soffer
On Sat, May 4, 2019 at 12:32 AM Nir Soffer  wrote:

>
>
> On Fri, May 3, 2019, 23:21 Eric Blake  wrote:
>
>> ...
>> >>>  == creating image with preallocation off ==
>> >>>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=off
>> >>> -size=1048576, blocks=0
>> >>> +size=1048576, blocks=2
>> >>>
>> >>>  == creating image with preallocation full ==
>> >>>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=full
>> >>> -size=1048576, blocks=2048
>> >>> +size=1048576, blocks=2050
>> >>
>> >> 2048/2050, so we DO have some indication of whether the file is sparse
>> >> or fully allocated.
>> >
>> > Maybe we could check that the value after "blocks=" is a single digit in
>> > the first case, and matches "blocks=20.." in the second case?
>>
>> I wonder if 'qemu-img map --output=json $TEST_IMG' might be any more
>> reliable (at least for ignoring any extra block allocations associated
>> with the file, if it is some journaling option or xattr or other reason
>> why your files seem to occupy more disk sectors than just the size of
>> the file would imply).
>>
>
> I think it should work better and is more correct, testing actual
> sparsness instead of underlying file system implementation.
>
> I can send a fix next week.
>

I tested this change:

$ git diff
diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
index d0ffc495c2..0e3faa50e4 100755
--- a/tests/qemu-iotests/175
+++ b/tests/qemu-iotests/175
@@ -43,17 +43,17 @@ _supported_os Linux
 size=1m

 echo
 echo "== creating image with default preallocation =="
 _make_test_img $size | _filter_imgfmt
-stat -c "size=%s, blocks=%b" $TEST_IMG
+$QEMU_IMG map -f $IMGFMT --output json "$TEST_IMG"

 for mode in off full falloc; do
 echo
 echo "== creating image with preallocation $mode =="
 IMGOPTS=preallocation=$mode _make_test_img $size | _filter_imgfmt
-stat -c "size=%s, blocks=%b" $TEST_IMG
+$QEMU_IMG map -f $IMGFMT --output json "$TEST_IMG"
 done

 # success, all done
 echo "*** done"
 rm -f $seq.full

It almost works:
$ ./check -raw 175
QEMU      -- "/home/nsoffer/src/qemu/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64" -nodefaults -machine accel=qtest
QEMU_IMG  -- "/home/nsoffer/src/qemu/build/tests/qemu-iotests/../../qemu-img"
QEMU_IO   -- "/home/nsoffer/src/qemu/build/tests/qemu-iotests/../../qemu-io"  --cache writeback -f raw
QEMU_NBD  -- "/home/nsoffer/src/qemu/build/tests/qemu-iotests/../../qemu-nbd"
IMGFMT    -- raw
IMGPROTO  -- file
PLATFORM  -- Linux/x86_64 lean 5.0.11-100.fc28.x86_64
TEST_DIR  -- /home/nsoffer/src/qemu/build/tests/qemu-iotests/scratch
SOCKET_SCM_HELPER -- /home/nsoffer/src/qemu/build/tests/qemu-iotests/socket_scm_helper

175 - output mismatch (see 175.out.bad)
--- /home/nsoffer/src/qemu/tests/qemu-iotests/175.out  2019-03-23 18:35:17.788177871 +0200
+++ /home/nsoffer/src/qemu/build/tests/qemu-iotests/175.out.bad  2019-05-11 00:06:09.515873624 +0300
@@ -2,17 +2,17 @@

 == creating image with default preallocation ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
-size=1048576, blocks=0
+[{ "start": 0, "length": 1048576, "depth": 0, "zero": true, "data": false, "offset": 0}]

 == creating image with preallocation off ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=off
-size=1048576, blocks=0
+[{ "start": 0, "length": 1048576, "depth": 0, "zero": true, "data": false, "offset": 0}]

 == creating image with preallocation full ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=full
-size=1048576, blocks=2048
+[{ "start": 0, "length": 1048576, "depth": 0, "zero": false, "data": true, "offset": 0}]

 == creating image with preallocation falloc ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=falloc
-size=1048576, blocks=2048
+[{ "start": 0, "length": 1048576, "depth": 0, "zero": true, "data": false, "offset": 0}]
The "falloc" test looks exactly like "off"; qemu-img map does not report
the allocation status.
Nir


Re: [Qemu-devel] [PATCH] iotests: Filter 175's allocation information

2019-05-10 Thread Nir Soffer
On Sat, May 11, 2019 at 12:19 AM Max Reitz  wrote:

> It is possible for an empty file to take up blocks on a filesystem.
> Make iotest 175 take this into account.
>
> Reported-by: Thomas Huth 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/175 | 15 +++
>  tests/qemu-iotests/175.out |  8 
>  2 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
> index d0ffc495c2..b5652a3889 100755
> --- a/tests/qemu-iotests/175
> +++ b/tests/qemu-iotests/175
> @@ -28,7 +28,8 @@ status=1  # failure is the default!
>
>  _cleanup()
>  {
> -   _cleanup_test_img
> +_cleanup_test_img
> +rm -f "$TEST_DIR/empty"
>  }
>  trap "_cleanup; exit \$status" 0 1 2 3 15
>
> @@ -40,18 +41,24 @@ _supported_fmt raw
>  _supported_proto file
>  _supported_os Linux
>
> -size=1m
> +size=$((1 * 1024 * 1024))
> +
> +touch "$TEST_DIR/empty"
> +empty_blocks=$(stat -c '%b' "$TEST_DIR/empty")
>

Maybe extra_blocks?

>  echo
>  echo "== creating image with default preallocation =="
>  _make_test_img $size | _filter_imgfmt
> -stat -c "size=%s, blocks=%b" $TEST_IMG
> +stat -c "size=%s, blocks=%b" $TEST_IMG \
> +| sed -e "s/blocks=$empty_blocks/nothing allocated/"
>
>  for mode in off full falloc; do
>  echo
>  echo "== creating image with preallocation $mode =="
>  IMGOPTS=preallocation=$mode _make_test_img $size | _filter_imgfmt
> -stat -c "size=%s, blocks=%b" $TEST_IMG
> +stat -c "size=%s, blocks=%b" $TEST_IMG \
> +| sed -e "s/blocks=$empty_blocks/nothing allocated/" \
> +| sed -e "s/blocks=$((empty_blocks + size / 512))/everything allocated/"
>

"fully allocated"?

Maybe add a helper like this:

_filter_blocks() {
    # Some file systems sometimes allocate extra blocks
    sed -e "s/blocks=$empty_blocks/nothing allocated/" \
        -e "s/blocks=$((empty_blocks + size / 512))/everything allocated/"
}

So we can do:

stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks

It also makes it clear why we need to run sed, without having to look up
the commit message.
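Such a filter rests on simple arithmetic. `stat -c %b` reports st_blocks in 512-byte units, so a fully allocated 1 MiB image accounts for 1048576 / 512 = 2048 blocks. Those blocks come on top of whatever an empty file already occupies. With 2 extra blocks, as in Thomas's ext4 output, the total is 2050. The C sketch below performs the same classification. The `classify_blocks()` helper is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the labels the proposed sed filter substitutes. */
enum alloc_state { NOTHING_ALLOCATED, EVERYTHING_ALLOCATED, OTHER };

/* st_blocks is counted in 512-byte units regardless of the filesystem
 * block size; extra_blocks is what an empty file occupies. */
static enum alloc_state classify_blocks(uint64_t st_blocks,
                                        uint64_t extra_blocks,
                                        uint64_t size)
{
    assert(size % 512 == 0);
    if (st_blocks == extra_blocks)
        return NOTHING_ALLOCATED;
    if (st_blocks == extra_blocks + size / 512)
        return EVERYTHING_ALLOCATED;
    return OTHER;
}
```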


>  done
>
>  # success, all done
> diff --git a/tests/qemu-iotests/175.out b/tests/qemu-iotests/175.out
> index 76c02c6a57..6d9a5ed84e 100644
> --- a/tests/qemu-iotests/175.out
> +++ b/tests/qemu-iotests/175.out
> @@ -2,17 +2,17 @@ QA output created by 175
>
>  == creating image with default preallocation ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
> -size=1048576, blocks=0
> +size=1048576, nothing allocated
>
>  == creating image with preallocation off ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=off
> -size=1048576, blocks=0
> +size=1048576, nothing allocated
>
>  == creating image with preallocation full ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=full
> -size=1048576, blocks=2048
> +size=1048576, everything allocated
>
>  == creating image with preallocation falloc ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=falloc
> -size=1048576, blocks=2048
> +size=1048576, everything allocated
>   *** done
> --
> 2.21.0
>

Otherwise looks good.

Nir


Re: [Qemu-devel] [PATCH v2] iotests: Filter 175's allocation information

2019-05-13 Thread Nir Soffer
On Mon, May 13, 2019, 18:52 Max Reitz  wrote:

> It is possible for an empty file to take up blocks on a filesystem.
> Make iotest 175 take this into account.
>
> Reported-by: Thomas Huth 
> Signed-off-by: Max Reitz 
> ---
> v2: [Nir]
> - Use a function for filtering
> - s/empty_blocks/extra_blocks/
> ---
>  tests/qemu-iotests/175 | 26 ++
>  tests/qemu-iotests/175.out |  8 
>  2 files changed, 26 insertions(+), 8 deletions(-)
>
> diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
> index d0ffc495c2..b5eb0aa856 100755
> --- a/tests/qemu-iotests/175
> +++ b/tests/qemu-iotests/175
> @@ -28,10 +28,25 @@ status=1# failure is the default!
>
>  _cleanup()
>  {
> -   _cleanup_test_img
> +_cleanup_test_img
> +rm -f "$TEST_DIR/empty"
>  }
>  trap "_cleanup; exit \$status" 0 1 2 3 15
>
> +# Some file systems sometimes allocate extra blocks independently of
> +# the file size.  This function hides the resulting difference in the
> +# stat -c '%b' output.
> +# Parameter 1: Number of blocks an empty file occupies
> +# Parameter 2: Image size in bytes
> +_filter_blocks()
> +{
> +extra_blocks=$1
> +img_size=$2
> +
> +sed -e "s/blocks=$extra_blocks/nothing allocated/" \
> +-e "s/blocks=$((extra_blocks + img_size / 512))/everything allocated/"
> +}
> +
>  # get standard environment, filters and checks
>  . ./common.rc
>  . ./common.filter
> @@ -40,18 +55,21 @@ _supported_fmt raw
>  _supported_proto file
>  _supported_os Linux
>
> -size=1m
> +size=$((1 * 1024 * 1024))
> +
> +touch "$TEST_DIR/empty"
> +extra_blocks=$(stat -c '%b' "$TEST_DIR/empty")
>
>  echo
>  echo "== creating image with default preallocation =="
>  _make_test_img $size | _filter_imgfmt
> -stat -c "size=%s, blocks=%b" $TEST_IMG
> +stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $size
>
>  for mode in off full falloc; do
>  echo
>  echo "== creating image with preallocation $mode =="
>  IMGOPTS=preallocation=$mode _make_test_img $size | _filter_imgfmt
> -stat -c "size=%s, blocks=%b" $TEST_IMG
> +stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $size
>  done
>
>  # success, all done
> diff --git a/tests/qemu-iotests/175.out b/tests/qemu-iotests/175.out
> index 76c02c6a57..6d9a5ed84e 100644
> --- a/tests/qemu-iotests/175.out
> +++ b/tests/qemu-iotests/175.out
> @@ -2,17 +2,17 @@ QA output created by 175
>
>  == creating image with default preallocation ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
> -size=1048576, blocks=0
> +size=1048576, nothing allocated
>
>  == creating image with preallocation off ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=off
> -size=1048576, blocks=0
> +size=1048576, nothing allocated
>
>  == creating image with preallocation full ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=full
> -size=1048576, blocks=2048
> +size=1048576, everything allocated
>
>  == creating image with preallocation falloc ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=falloc
> -size=1048576, blocks=2048
> +size=1048576, everything allocated
>   *** done
> --
> 2.21.0


Reviewed-by: Nir Soffer 

>


Re: [Qemu-devel] [Qemu-block] [PATCH] qemu-img convert: Deprecate using -n and -o together

2019-08-09 Thread Nir Soffer
On Fri, Aug 9, 2019 at 12:11 PM Kevin Wolf  wrote:

> bdrv_create options specified with -o have no effect when skipping image
> creation with -n, so this doesn't make sense. Warn against the misuse
> and deprecate the combination so we can make it a hard error later.
>
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-img.c   | 5 +
>  qemu-deprecated.texi | 7 +++
>  2 files changed, 12 insertions(+)
>
> diff --git a/qemu-img.c b/qemu-img.c
> index 79983772de..d9321f6418 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -2231,6 +2231,11 @@ static int img_convert(int argc, char **argv)
>  goto fail_getopt;
>  }
>
> +if (skip_create && options) {
> +warn_report("-o has no effect when skipping image creation");
> +warn_report("This will become an error in future QEMU versions.");
> +}
> +
>  s.src_num = argc - optind - 1;
>  out_filename = s.src_num >= 1 ? argv[argc - 1] : NULL;
>
> diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
> index fff07bb2a3..7673d079c5 100644
> --- a/qemu-deprecated.texi
> +++ b/qemu-deprecated.texi
> @@ -305,6 +305,13 @@ to just export the entire image and then mount only /dev/nbd0p1 than
>  it is to reinvoke @command{qemu-nbd -c /dev/nbd0} limited to just a
>  subset of the image.
>
> +@subsection qemu-img convert -n -o (since 4.2.0)
> +
> +All options specified in @option{-o} are image creation options, so they
> +have no effect when used with @option{-n} to skip image creation. This
> +combination never made sense and shows that the user misunderstood the
> +effect of the options, so this will be made an error in future versions.
>

The user misunderstood by not reading qemu code?

Neither the online help nor the manual page mentions anything about this,
so I think they should be fixed to explain the behavior, and this text
should mention that the behavior was never documented.

Nir


> +
>  @section Build system
>
>  @subsection Python 2 support (since 4.1.0)
> --
> 2.20.1
>
>
>


[Qemu-devel] [PATCH v2] block: posix: Handle undetectable alignment

2019-08-11 Thread Nir Soffer
In some cases buf_align or request_alignment cannot be detected:

- With Gluster, buf_align cannot be detected since the actual I/O is
  done on Gluster server, and qemu buffer alignment does not matter.

- With local XFS filesystem, buf_align cannot be detected if reading
  from unallocated area.

- With Gluster backed by XFS, request_alignment cannot be detected if
  reading from unallocated area.

- With NFS, the server does not use direct I/O, so neither buf_align nor
  request_alignment can be detected.

These cases seem to work when the storage sector size is 512 bytes,
because the current code starts probing with align=512. If the check
succeeds because the alignment cannot be detected, we use 512. But this
does not work for storage with a 4k sector size.

In practice, the alignment requirements are the same for buffer
alignment, buffer length, and offset in file. So if we cannot detect
buf_align, we can use the request alignment. If we cannot detect the
request alignment, we can fall back to a safe value.
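The fallback order described above can be condensed into a pure function. First the request alignment is probed and falls back to a safe value; then buf_align falls back to the request alignment. This is a simplified sketch, not the patch code. `resolve_alignment()` and `struct alignment` are hypothetical, and 0 stands for "could not be detected". `max_align` plays the role of the safe value; the patch uses `MAX(MAX_BLOCKSIZE, getpagesize())`.

```c
#include <assert.h>
#include <stddef.h>

/* Probe results; 0 means the value could not be detected. */
struct alignment {
    size_t request_alignment;
    size_t buf_align;
};

/* Fill in undetected values: request alignment falls back to the safe
 * value, and buffer alignment falls back to the request alignment,
 * since in practice both requirements are the same. */
static struct alignment resolve_alignment(struct alignment probed,
                                          size_t max_align)
{
    struct alignment a = probed;

    assert(max_align > 0);
    if (a.request_alignment == 0)
        a.request_alignment = max_align;
    if (a.buf_align == 0)
        a.buf_align = a.request_alignment;
    return a;
}
```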

With this change:

- Provisioning VM and copying disks on local XFS and Gluster with 4k
  sector size works, resolving bugs [1],[2].

- With NFS we fall back to buf_align and request_alignment of 4096
  instead of 512. This may cause unneeded data copying, but so far I see
  better performance with this change.

[1] https://bugzilla.redhat.com/1737256
[2] https://bugzilla.redhat.com/1738657

Signed-off-by: Nir Soffer 
---

v1 was a minimal hack; this version is a more generic fix that works for
any storage without requiring users to allocate the first block of an
image. Allocating the first block of an image is still a good idea since
it allows detecting the right alignment in some cases.

v1 could also affect cases where buf_align could be detected, using
request_alignment instead; v2 only affects cases where buf_align or
request_alignment cannot be detected.

v1 was here:
https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00133.html

 block/file-posix.c | 40 +---
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index f33b542b33..511468f166 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -323,6 +323,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
 BDRVRawState *s = bs->opaque;
 char *buf;
 size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
+size_t alignments[] = {1, 512, 1024, 2048, 4096};
 
 /* For SCSI generic devices the alignment is not really used.
With buffered I/O, we don't have any restrictions. */
@@ -349,25 +350,42 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
 }
 #endif
 
-/* If we could not get the sizes so far, we can only guess them */
-if (!s->buf_align) {
+/*
+ * If we could not get the sizes so far, we can only guess them. First try
+ * to detect request alignment, since it is more likely to succeed. Then
+ * try to detect buf_align, which cannot be detected in some cases (e.g.
+ * Gluster). If buf_align cannot be detected, we fallback to the value of
+ * request_alignment.
+ */
+
+if (!bs->bl.request_alignment) {
+int i;
 size_t align;
-buf = qemu_memalign(max_align, 2 * max_align);
-for (align = 512; align <= max_align; align <<= 1) {
-if (raw_is_io_aligned(fd, buf + align, max_align)) {
-s->buf_align = align;
+buf = qemu_memalign(max_align, max_align);
+for (i = 0; i < ARRAY_SIZE(alignments); i++) {
+align = alignments[i];
+if (raw_is_io_aligned(fd, buf, align)) {
+/* Fallback to safe value. */
+bs->bl.request_alignment = (align != 1) ? align : max_align;
 break;
 }
 }
 qemu_vfree(buf);
 }
 
-if (!bs->bl.request_alignment) {
+if (!s->buf_align) {
+int i;
 size_t align;
-buf = qemu_memalign(s->buf_align, max_align);
-for (align = 512; align <= max_align; align <<= 1) {
-if (raw_is_io_aligned(fd, buf, align)) {
-bs->bl.request_alignment = align;
+buf = qemu_memalign(max_align, 2 * max_align);
+for (i = 0; i < ARRAY_SIZE(alignments); i++) {
+align = alignments[i];
+if (raw_is_io_aligned(fd, buf + align, max_align)) {
+/* Fallback to request_alignment or safe value. */
+s->buf_align = (align != 1)
+? align
+: (bs->bl.request_alignment != 0)
+? bs->bl.request_alignment
+: max_align;
 break;
 }
 }
-- 
2.20.1




Re: [Qemu-devel] [PATCH v2] block: posix: Handle undetectable alignment

2019-08-13 Thread Nir Soffer
On Mon, Aug 12, 2019 at 5:23 PM Kevin Wolf  wrote:

> Am 11.08.2019 um 22:50 hat Nir Soffer geschrieben:
> > In some cases buf_align or request_alignment cannot be detected:
> >
> > - With Gluster, buf_align cannot be detected since the actual I/O is
> >   done on Gluster server, and qemu buffer alignment does not matter.
>
> If it doesn't matter, the best value would be buf_align = 1.
>

Right, if we know that this is gluster.

> > - With local XFS filesystem, buf_align cannot be detected if reading
> >   from unallocated area.
>
> Here, we actually do need alignment, but it's unknown whether it would
> be 512 or 4096 or something entirely. Failing to align requests
> correctly results in I/O errors.
>
> > - With Gluster backed by XFS, request_alignment cannot be detected if
> >   reading from unallocated area.
>
> This is like buf_align for XFS: We don't know the right value, but
> getting it wrong causes I/O errors.
>
> > - With NFS, the server does not use direct I/O, so both buf_align
> >   cannot be detected.
>
> This suggests that byte-aligned requests are fine for NFS, i.e.
> buf_align = request_alignment = 1 would be optimal in this case.
>

Right, but again we don't know this is NFS.

> > These cases seems to work when storage sector size is 512 bytes, because
> > the current code starts checking align=512. If the check succeeds
> > because alignment cannot be detected we use 512. But this does not work
> > for storage with 4k sector size.
> >
> > Practically the alignment requirements are the same for buffer
> > alignment, buffer length, and offset in file. So in case we cannot
> > detect buf_align, we can use request alignment. If we cannot detect
> > request alignment, we can fallback to a safe value.
>
> This makes sense in general.
>
> What the commit message doesn't explain, but probably should do is how
> we determine whether we could successfully detect request alignment. The
> approach taken here is that a detected alignment of 1 is understood as
> failure to detect the real alignment.
>

Failing with EINVAL when using 1, and succeeding with another value, is
considered a successful detection.

We have 3 issues preventing detection:
- filesystem not using direct I/O on the remote server (NFS, Gluster when
  network.remote-dio=on)
- area probed is unallocated with XFS or Gluster backed by XFS
- filesystem without buffer alignment requirement (e.g. Gluster)

For handling unallocated areas, we can:
- always allocate the first block when creating an image (qemu-img
  create/convert)
- use write() instead of read().

In oVirt we went with the second option: when we initialize a file storage
domain, we create a special file and do direct writes to this file with
lengths of 1, 512, and 4096 bytes. If we detect 512 or 4096, we use this
value for creating the domain (e.g. for sanlock). If we detect 1, we use
the user-provided value (default 512).

You can see the code here:
https://github.com/oVirt/vdsm/blob/4733018f9a719729242738b486906d3b9ed058cd/lib/vdsm/storage/fileSD.py#L838

One approach we could use in qemu is to create a temporary file:

/path/to/image.tmp9vo8US

Delete the file, keeping the fd open, and detect the alignment on this file
using write().
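Here is a rough C sketch of this proposal, assuming Linux. The helper names are hypothetical, and the code is taken from neither qemu nor vdsm. `open_scratch()` creates `<image>.tmpXXXXXX` next to the image with mkstemp(3) and unlinks it at once, so only the open fd keeps the file alive. `probe_write_alignment()` then tries writes of 1, 512 and 4096 bytes from a page-aligned buffer. A real probe also needs O_DIRECT on the scratch fd; on Linux, fcntl(F_SETFL) can set it. The sketch leaves this step to the caller because some filesystems reject O_DIRECT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create "<image>.tmpXXXXXX" in the image's directory and unlink it
 * right away; the returned fd keeps the file alive until closed. */
static int open_scratch(const char *image)
{
    char path[4096];
    int fd;

    if (snprintf(path, sizeof path, "%s.tmpXXXXXX", image) >= (int)sizeof path)
        return -1;
    fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}

/* Try writes of increasing size at offset 0 and return the first size
 * that succeeds, or 0 if none does; 1 means no detectable alignment
 * requirement. EINVAL is the expected error for a misaligned O_DIRECT
 * write, so any other error stops probing. */
static size_t probe_write_alignment(int fd)
{
    static const size_t sizes[] = {1, 512, 4096};
    void *buf;
    size_t found = 0;

    assert(fd >= 0);
    if (posix_memalign(&buf, 4096, 4096) != 0)
        return 0;
    memset(buf, 0, 4096);
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        ssize_t n = pwrite(fd, buf, sizes[i], 0);

        if (n == (ssize_t)sizes[i]) {
            found = sizes[i];
            break;
        }
        if (n < 0 && errno != EINVAL)
            break;
    }
    free(buf);
    return found;
}
```

Because the file is unlinked right after creation, a leftover can appear only if the process dies between mkstemp() and unlink().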

With this we fixed all the cases listed above, but creating new files
requires write permission in the directory containing the image, and will
not work for some unusual setups (e.g. bind-mounted images).

One issue with this is that there is no guarantee that the temporary file
will be deleted, so the user may have to deal with leftover files.

> With cases 2 and 3 this gives the desired result; however for cases 1 and
> 4, an alignment of 1 would be the actual correct value, and the new
> probing algorithm results in a worse result.
>
> However, since the negative effect of the old algorithm in cases 2 and 3
> is I/O errors whereas the effect of the new one in cases 1 and 4 is just
> degraded performance for I/O that isn't 4k aligned, the new approach is
> still preferable.
>
> I think we need to make this tradeoff clearer in the commit message and
> the comment in the code, but the approach is reasonable enough.
>

I'll try to make this more clear in v3.


>
> > With this change:
> >
> > - Provisioning VM and copying disks on local XFS and Gluster with 4k
> >   sector size works, resolving bugs [1],[2].
> >
> > - With NFS we fallback to buf_align and request_alignment of 4096
> >   instead of 512. This may cause unneeded data copying, but so far I see
> >   better performance with this change.
> >
> > [1] https://bugzilla.redhat.com/1737256
> > [2] https://bugzilla.redhat.com/1738657
> >
> > Signed-off-by: Nir Soffer 
> >

Re: [Qemu-devel] [PATCH v2] block: posix: Handle undetectable alignment

2019-08-13 Thread Nir Soffer
On Tue, Aug 13, 2019 at 2:21 PM Kevin Wolf  wrote:

> Am 13.08.2019 um 12:45 hat Nir Soffer geschrieben:
> > On Mon, Aug 12, 2019 at 5:23 PM Kevin Wolf  wrote:
> >
> > > Am 11.08.2019 um 22:50 hat Nir Soffer geschrieben:
> > > > In some cases buf_align or request_alignment cannot be detected:
> > > >
> > > > - With Gluster, buf_align cannot be detected since the actual I/O is
> > > >   done on Gluster server, and qemu buffer alignment does not matter.
> > >
> > > If it doesn't matter, the best value would be buf_align = 1.
> > >
> >
> > Right, if we know that this is gluster.
> >
> > > > - With local XFS filesystem, buf_align cannot be detected if reading
> > > >   from unallocated area.
> > >
> > > Here, we actually do need alignment, but it's unknown whether it would
> > > be 512 or 4096 or something entirely. Failing to align requests
> > > correctly results in I/O errors.
> > >
> > > > - With Gluster backed by XFS, request_alignment cannot be detected if
> > > >   reading from unallocated area.
> > >
> > > This is like buf_align for XFS: We don't know the right value, but
> > > getting it wrong causes I/O errors.
> > >
> > > > - With NFS, the server does not use direct I/O, so both buf_align
> > > >   cannot be detected.
> > >
> > > This suggests that byte-aligned requests are fine for NFS, i.e.
> > > buf_align = request_alignment = 1 would be optimal in this case.
> > >
> >
> > Right, but again we don't know this is NFS.
>
> Yes, I agree. I was just trying to list the optimal settings for each
> case so I could compare them against the actual results the path
> provides. I'm well aware that we don't know a way to get the optimal
> results for all four cases.
>
> > > > These cases seems to work when storage sector size is 512 bytes, because
> > > > the current code starts checking align=512. If the check succeeds
> > > > because alignment cannot be detected we use 512. But this does not work
> > > > for storage with 4k sector size.
> > > >
> > > > Practically the alignment requirements are the same for buffer
> > > > alignment, buffer length, and offset in file. So in case we cannot
> > > > detect buf_align, we can use request alignment. If we cannot detect
> > > > request alignment, we can fallback to a safe value.
> > >
> > > This makes sense in general.
> > >
> > > What the commit message doesn't explain, but probably should do is how
> > > we determine whether we could successfully detect request alignment. The
> > > approach taken here is that a detected alignment of 1 is understood as
> > > failure to detect the real alignment.
> >
> > Failing with EINVAL when using 1, and succeeding with another value is
> > considered a successful detection.
> >
> > We have 3 issues preventing detection:
> > - filesystem not using direct I/O on the remote server (NFS, Gluster
> > when network.remote-dio=on)
> > - area probed is unallocated with XFS or Gluster backed by XFS
> > - filesystem without buffer alignment requirement (e.g. Gluster)
>
> I would say case 1 is effectively a subset of case 3 (i.e. it's just one
> specific reason why we don't have a buffer alignment requirement).
>
> > For handling unallocated areas, we can:
> > - always allocate the first block when creating an image (qemu-img
> > create/convert)
> > - use write() instead of read().
> >
> > In oVirt we went with the second option - when we initialize a file
> > storage domain, we create a special file and do direct write to this
> > file with 1, 512, and 4096 bytes length. If we detect 512 or 4096, we
> > use this value for creating the domain (e.g. for sanlock).  If we
> > detect 1, we use the user provided value (default 512).
>
> Yes, but there's the important difference that oVirt controls the image
> files, whereas QEMU doesn't. Even if qemu-img create made sure that we
> allocate the first block, the user could still pass us an image that
> was created using a different way.
>
> Using write() is actually an interesting thought. Obviously, we can't
> just overwrite the user image. But maybe what we could do is read the
> first block and then try to rewrite it with different alignments.
>

Yes, this is what we do in ovirt-imageio for file based storage:
https://github.com/oVirt/ovirt-imageio/blob/ca70170886b0c
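Kevin's read-then-rewrite idea can be sketched as follows, assuming Linux and an image fd that the caller has opened with O_DIRECT. The helper name is hypothetical, and this is not the ovirt-imageio code referenced above. The helper reads the first 4096 bytes and writes the same bytes back with lengths of 1, 512 and 4096, so the image content is unchanged. It is safe only while nothing else writes to the image, and it requires the image to be at least 4096 bytes long.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#define PROBE_MAX 4096

/* Reads the first PROBE_MAX bytes and writes the same bytes back with
 * increasing lengths. Returns the first length that works (1 means no
 * detectable requirement), 0 if none works, or -1 if the first block
 * cannot be read in full. */
static long probe_by_rewrite(int fd)
{
    static const size_t sizes[] = {1, 512, PROBE_MAX};
    void *buf;
    long found = 0;

    assert(fd >= 0);
    if (posix_memalign(&buf, PROBE_MAX, PROBE_MAX) != 0)
        return -1;
    if (pread(fd, buf, PROBE_MAX, 0) != PROBE_MAX) {
        free(buf);
        return -1;  /* short image or read error */
    }
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        /* The bytes written are the bytes just read. */
        ssize_t n = pwrite(fd, buf, sizes[i], 0);

        if (n == (ssize_t)sizes[i]) {
            found = (long)sizes[i];
            break;
        }
        /* EINVAL means misaligned O_DIRECT I/O; anything else is fatal. */
        if (n < 0 && errno != EINVAL)
            break;
    }
    free(buf);
    return found;
}
```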

[Qemu-devel] [PATCH v3] block: posix: Handle undetectable alignment

2019-08-13 Thread Nir Soffer
In some cases buf_align or request_alignment cannot be detected:

1. With Gluster, buf_align cannot be detected since the actual I/O is
   done on Gluster server, and qemu buffer alignment does not matter.
   Since there is no alignment requirement, buf_align=1 is the best
   value.

2. With a local XFS filesystem, buf_align cannot be detected if reading
   from an unallocated area. In this case we must align the buffer, but
   we don't know the correct size. Using the wrong alignment results in
   I/O errors.

3. With Gluster backed by XFS, request_alignment cannot be detected if
   reading from unallocated area. In this case we need to use the
   correct alignment, and failing to do so results in I/O errors.

4. With NFS, the server does not use direct I/O, so neither buf_align nor
   request_alignment can be detected. In this case we don't need any
   alignment, so we can use buf_align=1 and request_alignment=1.

These cases seem to work when the storage sector size is 512 bytes,
because the current code starts probing with align=512. If the check
succeeds because the alignment cannot be detected, we use 512. But this
does not work for storage with a 4k sector size.

To determine whether we can detect the alignment, we first probe with
align=1. If probing succeeds, either there is no alignment requirement
(cases 1, 4) or we are probing an unallocated area (cases 2, 3). Since we
have no way to tell these apart, we treat this as undetectable alignment.
If probing with align=1 fails with EINVAL, but probing with one of the
expected alignments succeeds, we know that we found a working alignment.
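The detection rule above treats success with align=1 as "undetectable" rather than "no requirement". It can be written as a loop over a probe callback. In file-posix.c the probing is done by raw_is_io_aligned(). Here that step is abstracted behind a hypothetical `probe_fn`, which lets the decision logic be exercised against a fake device. The function returns 0 for "undetectable", and the caller would then resolve that result with a fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if I/O aligned to 'align' succeeds. */
typedef bool (*probe_fn)(size_t align, void *opaque);

/* Returns the detected alignment, or 0 if it is undetectable: either
 * align=1 already succeeds (no requirement, or an unallocated area),
 * or none of the candidates works. */
static size_t detect_alignment(probe_fn probe, void *opaque)
{
    static const size_t candidates[] = {512, 1024, 2048, 4096};

    assert(probe != NULL);
    if (probe(1, opaque))
        return 0;
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (probe(candidates[i], opaque))
            return candidates[i];
    }
    return 0;
}

/* Fake device: accepts I/O only if 'align' is a multiple of the
 * required alignment stored in *opaque (1 accepts everything). */
static bool fake_probe(size_t align, void *opaque)
{
    size_t required = *(const size_t *)opaque;

    return align % required == 0;
}
```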

In practice, the alignment requirements are the same for buffer
alignment, buffer length, and offset in file. So if we cannot detect
buf_align, we can use the request alignment. If we cannot detect the
request alignment, we can fall back to a safe value. To use this logic,
we probe the request alignment first, before buf_align.

Here is a table showing the behaviour with the current code (the value in
parentheses is the optimal value).

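Case  Sector  buf_align (opt)  request_alignment (opt)  result
==============================================================
1     512     512  (1)         512  (512)               OK
1     4096    512  (1)         4096 (4096)              FAIL
--------------------------------------------------------------
2     512     512  (512)       512  (512)               OK
2     4096    512  (4096)      4096 (4096)              FAIL
--------------------------------------------------------------
3     512     512  (1)         512  (512)               OK
3     4096    512  (1)         512  (4096)              FAIL
--------------------------------------------------------------
4     512     512  (1)         512  (1)                 OK
4     4096    512  (1)         512  (1)                 OK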

Same cases with this change:

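Case  Sector  buf_align (opt)  request_alignment (opt)  result
==============================================================
1     512     512  (1)         512  (512)               OK
1     4096    4096 (1)         4096 (4096)              OK
--------------------------------------------------------------
2     512     512  (512)       512  (512)               OK
2     4096    4096 (4096)      4096 (4096)              OK
--------------------------------------------------------------
3     512     4096 (1)         4096 (512)               OK
3     4096    4096 (1)         4096 (4096)              OK
--------------------------------------------------------------
4     512     4096 (1)         4096 (1)                 OK
4     4096    4096 (1)         4096 (1)                 OK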

I tested that provisioning VMs and copying disks on local XFS and
Gluster with 4k sector size now work, resolving bugs [1] and [2].
I also tested on XFS, NFS, and Gluster with 512 bytes sector size.

[1] https://bugzilla.redhat.com/1737256
[2] https://bugzilla.redhat.com/1738657

Signed-off-by: Nir Soffer 
---

Changes since v2
- Improve the commit message (Kevin)
- Remove unneeded 2-level ternary (Kevin)

v2 was here:
https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00426.html

 block/file-posix.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index f33b542b33..9baade65f4 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -323,6 +323,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
 BDRVRawState *s = bs->opaque;
 char *buf;
 size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
+size_t alignments[] = {1, 512, 1024, 2048, 4096};
 
 /* For SCSI generic devices the alignment is not really used.
With buffered I/O, we don't have any restric

[Qemu-devel] [PATCH] block: posix: Always allocate the first block

2019-08-16 Thread Nir Soffer
When creating an image with preallocation "off" or "falloc", the first
block of the image is typically not allocated. When using Gluster
storage backed by XFS filesystem, reading this block using direct I/O
succeeds regardless of request length, fooling alignment detection.

In this case we fall back to a safe value (4096) instead of the optimal
value (512), which may lead to unneeded data copying when aligning
requests. Allocating the first block avoids the fallback.

When using preallocation=off, we always allocate at least one filesystem
block:

$ ./qemu-img create -f raw test.raw 1g
Formatting 'test.raw', fmt=raw size=1073741824

$ ls -lhs test.raw
4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
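The same effect can be reproduced without qemu-img. This is a standalone sketch, not part of the patch: create_sparse_image() and write_first_byte() are hypothetical helpers that extend a temporary file to 1 GiB with ftruncate() and then write one zero byte at offset 0, the same write allocate_first_block() performs in this patch. On filesystems that support sparse files we would expect st_size to stay at 1 GiB while st_blocks becomes small but non-zero; the exact block count depends on the filesystem.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring allocate_first_block() from the patch:
 * write a single zero byte at offset 0, retrying on EINTR.
 */
static int write_first_byte(int fd)
{
    ssize_t n;

    do {
        n = pwrite(fd, "\0", 1, 0);
    } while (n == -1 && errno == EINTR);

    return n == -1 ? -errno : 0;
}

/*
 * Hypothetical helper: create a file from a mkstemp() template, extend it
 * to `size` bytes (sparse), then allocate its first block.
 * Returns the open fd and fills *st, or returns -1 on failure.
 */
static int create_sparse_image(char *path_template, off_t size, struct stat *st)
{
    int fd = mkstemp(path_template);

    if (fd == -1) {
        return -1;
    }
    if (ftruncate(fd, size) != 0 || write_first_byte(fd) != 0 ||
        fstat(fd, st) != 0) {
        close(fd);
        unlink(path_template);
        return -1;
    }
    return fd;
}
```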

I did quick performance tests for these flows:
- Provisioning a VM with a new raw image.
- Copying disks with qemu-img convert to a new raw target image.

I installed Fedora 29 server on a raw sparse image, measuring the time
from clicking "Begin installation" until the "Reboot" button appears:

Before(s)  After(s)  Diff(%)
----------------------------
      356       389     +8.4

I ran this only once, so we cannot tell much from these results.

The second test was cloning the installation image with qemu-img
convert, doing 10 runs:

for i in $(seq 10); do
    rm -f dst.raw
    sleep 10
    time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
done

Here is a table comparing the total time spent:

Type  Before(s)  After(s)  Diff(%)
----------------------------------
real    530.028   469.123    -11.4
user     17.204    10.768    -37.4
sys      17.881     7.011    -60.7
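The Diff(%) values in this table are consistent with (after - before) / before * 100, truncated to one decimal place. A small C helper that reproduces the three rows, assuming that is how the column was computed (diff_percent() is an illustrative name):

```c
#include <assert.h>

/*
 * Relative change in percent, truncated toward zero to one decimal place.
 * Assumption: this matches how the Diff(%) column was produced.
 */
static double diff_percent(double before, double after)
{
    assert(before > 0.0);
    /* Casting to long truncates toward zero, e.g. -114.9 -> -114. */
    return (double)(long)((after - before) / before * 1000.0) / 10.0;
}
```

For example, diff_percent(17.881, 7.011) gives about -60.7, matching the sys row.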

Here we see a very clear improvement in CPU usage.

Signed-off-by: Nir Soffer 
---
 block/file-posix.c | 25 +
 tests/qemu-iotests/150.out |  1 +
 tests/qemu-iotests/160 |  4 
 tests/qemu-iotests/175 | 19 +--
 tests/qemu-iotests/175.out |  8 
 tests/qemu-iotests/221.out | 12 
 tests/qemu-iotests/253.out | 12 
 7 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b9c33c8f6c..3964dd2021 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1755,6 +1755,27 @@ static int handle_aiocb_discard(void *opaque)
 return ret;
 }
 
+/*
+ * Help alignment detection by allocating the first block.
+ *
+ * When reading with direct I/O from an unallocated area on Gluster backed by
+ * XFS, reading succeeds regardless of request length. In this case we fall
+ * back to a safe alignment which is not optimal. Allocating the first block
+ * avoids this fallback.
+ *
+ * Returns: 0 on success, -errno on failure.
+ */
+static int allocate_first_block(int fd)
+{
+    ssize_t n;
+
+    do {
+        n = pwrite(fd, "\0", 1, 0);
+    } while (n == -1 && errno == EINTR);
+
+    return (n == -1) ? -errno : 0;
+}
+
 static int handle_aiocb_truncate(void *opaque)
 {
 RawPosixAIOData *aiocb = opaque;
@@ -1794,6 +1815,8 @@ static int handle_aiocb_truncate(void *opaque)
 /* posix_fallocate() doesn't set errno. */
 error_setg_errno(errp, -result,
  "Could not preallocate new data");
+} else if (current_length == 0) {
+allocate_first_block(fd);
 }
 } else {
 result = 0;
@@ -1855,6 +1878,8 @@ static int handle_aiocb_truncate(void *opaque)
 if (ftruncate(fd, offset) != 0) {
 result = -errno;
 error_setg_errno(errp, -result, "Could not resize file");
+} else if (current_length == 0 && offset > current_length) {
+allocate_first_block(fd);
 }
 return result;
 default:
diff --git a/tests/qemu-iotests/150.out b/tests/qemu-iotests/150.out
index 2a54e8dcfa..3cdc7727a5 100644
--- a/tests/qemu-iotests/150.out
+++ b/tests/qemu-iotests/150.out
@@ -3,6 +3,7 @@ QA output created by 150
 === Mapping sparse conversion ===
 
 Offset  Length  File
+0   0x1000  TEST_DIR/t.IMGFMT
 
 === Mapping non-sparse conversion ===
 
diff --git a/tests/qemu-iotests/160 b/tests/qemu-iotests/160
index df89d3864b..ad2d054a47 100755
--- a/tests/qemu-iotests/160
+++ b/tests/qemu-iotests/160
@@ -57,6 +57,10 @@ for skip in $TEST_SKIP_BLOCKS; do
 $QEMU_IMG dd if="$TEST_IMG" of="$TEST_IMG.out" skip="$skip" -O "$IMGFMT" \
 2> /dev/null
 TEST_IMG="$TEST_IMG.out" _check_test_img
+
+# We always write the first byte of an image.
+printf "\0" > "$TEST_IMG.out.dd"
+
 dd if="$TEST_IMG" of="$TEST_IMG.out.dd" skip="$skip" status=none
 
 echo
diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
index 51e62c8276

Re: [Qemu-devel] [Qemu-block] [PATCH] block: posix: Always allocate the first block

2019-08-16 Thread Nir Soffer
On Sat, Aug 17, 2019 at 12:57 AM John Snow  wrote:

> On 8/16/19 5:21 PM, Nir Soffer wrote:
> > When creating an image with preallocation "off" or "falloc", the first
> > block of the image is typically not allocated. When using Gluster
> > storage backed by XFS filesystem, reading this block using direct I/O
> > succeeds regardless of request length, fooling alignment detection.
> >
> > In this case we fallback to a safe value (4096) instead of the optimal
> > value (512), which may lead to unneeded data copying when aligning
> > requests.  Allocating the first block avoids the fallback.
> >
>
> Where does this detection/fallback happen? (Can it be improved?)
>

In raw_probe_alignment().

This patch explains the issues:
https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00568.html

Here Kevin and I discussed ways to improve it:
https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00426.html

> > When using preallocation=off, we always allocate at least one filesystem
> > block:
> >
> > $ ./qemu-img create -f raw test.raw 1g
> > Formatting 'test.raw', fmt=raw size=1073741824
> >
> > $ ls -lhs test.raw
> > 4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
> >
> > I did quick performance tests for these flows:
> > - Provisioning a VM with a new raw image.
> > - Copying disks with qemu-img convert to new raw target image
> >
> > I installed Fedora 29 server on raw sparse image, measuring the time
> > from clicking "Begin installation" until the "Reboot" button appears:
> >
> > Before(s)  After(s)  Diff(%)
> > ----------------------------
> >       356       389     +8.4
> >
> > I ran this only once, so we cannot tell much from these results.
> >
>
> That seems like a pretty big difference for just having pre-allocated a
> single block. What was the actual command line / block graph for that test?
>

Having the first block allocated changes the alignment.

Before this patch, we detect request_alignment=1, so we fall back to 4096.
Then we detect buf_align=1, so we fall back to the value of request_alignment.

The guest sees a disk with:
logical_block_size = 512
physical_block_size = 512

But qemu uses:
request_alignment = 4096
buf_align = 4096

storage uses:
logical_block_size = 512
physical_block_size = 512

If the guest does direct I/O using 512-byte alignment, qemu has to copy
the buffers to align them to 4096 bytes.

After this patch, qemu detects the alignment correctly, so we have:

guest
logical_block_size = 512
physical_block_size = 512

qemu
request_alignment = 512
buf_align = 512

storage:
logical_block_size = 512
physical_block_size = 512

We expect this to be more efficient because qemu does not have to emulate
anything.
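To make the cost concrete, here is a simplified, hypothetical check (needs_bounce() is not a QEMU function) of whether a request can be passed through as-is. It assumes a request must have its offset and length aligned to request_alignment and its buffer address aligned to buf_align; anything else has to go through an aligned bounce buffer, which is the copying described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not QEMU's code): a request can go straight to
 * storage only if offset, length and buffer address are all aligned.
 * Otherwise it needs an aligned bounce buffer, i.e. extra copying.
 */
static bool needs_bounce(uint64_t offset, uint64_t bytes, const void *buf,
                         uint32_t request_alignment, uint32_t buf_align)
{
    assert(request_alignment > 0 && buf_align > 0);
    return offset % request_alignment != 0 ||
           bytes % request_alignment != 0 ||
           (uintptr_t)buf % buf_align != 0;
}
```

A 512-byte guest write at offset 512 passes through when both alignments are 512, but needs copying when qemu has fallen back to 4096.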

> Was this over a network that could explain the variance?
>

Maybe. This is a complete install of Fedora 29 server; I'm not sure if
the installation accesses the network.

> > The second test was cloning the installation image with qemu-img
> > convert, doing 10 runs:
> >
> > for i in $(seq 10); do
> > rm -f dst.raw
> > sleep 10
> > time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
> > done
> >
> > Here is a table comparing the total time spent:
> >
> > Type  Before(s)  After(s)  Diff(%)
> > ----------------------------------
> > real    530.028   469.123    -11.4
> > user     17.204    10.768    -37.4
> > sys      17.881     7.011    -60.7
> >
> > Here we see very clear improvement in CPU usage.
> >
>
> Hard to argue much with that. I feel a little strange trying to force
> the allocation of the first block, but I suppose in practice "almost no
> preallocation" is indistinguishable from "exactly no preallocation" if
> you squint.
>

Right.

The real issue is that filesystems and block devices do not expose the
alignment requirement for direct I/O, so we need to use these hacks and
assumptions.

With local XFS we use xfsctl(XFS_IOC_DIOINFO) to get request_alignment,
but this does not help for the XFS filesystem used by Gluster on the
server side.

I hope that Niels is working on adding a similar ioctl for Gluster, so it
can expose the properties of the remote filesystem.

Nir


Re: [Qemu-devel] [PULL 5/7] file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes

2019-08-17 Thread Nir Soffer
On Thu, Aug 15, 2019 at 1:29 PM Kevin Wolf  wrote:

> Am 15.08.2019 um 04:44 hat Eric Blake geschrieben:
> > On 3/26/19 10:51 AM, Kevin Wolf wrote:
> > > We know that the kernel implements a slow fallback code path for
> > > BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it.
> > > The other operations we call in the context of .bdrv_co_pwrite_zeroes
> > > should usually be quick, so no modification should be needed for them.
> > > If we ever notice that there are additional problematic cases, we can
> > > still make these conditional as well.
> >
> > Are there cases where fallocate(FALLOC_FL_ZERO_RANGE) falls back to slow
> > writes?  It may be fast on some file systems, but when used on a block
> > device, that may equally trigger slow fallbacks.  The man page is not
> > clear on that fact; I suspect that there may be cases in there that need
> > to be made conditional (it would be awesome if the kernel folks would
> > give us another FALLOC_ flag when we want to guarantee no fallback).
>
> The NO_FALLBACK changes were based on the Linux code rather than
> documentation because no interface is explicitly documented to forbid
> fallbacks.
>
> I think for file systems, we can generally assume that we don't get
> fallbacks because for file systems, just deallocating blocks is the
> easiest way to implement the function anyway. (Hm, or is it when we
> don't punch holes...?)
>
> And for block devices, we don't try FALLOC_FL_ZERO_RANGE because it also
> involves the same slow fallback as BLKZEROOUT. In other words,
> bdrv_co_pwrite_zeroes() with NO_FALLBACK, but without MAY_UNMAP, always
> fails on Linux block devices, and we fall back to emulation in user
> space.
>
> We would need a kernel interface that calls blkdev_issue_zeroout() with
> BLKDEV_ZERO_NOUNMAP | BLKDEV_ZERO_NOFALLBACK, but no such interface
> exists.
>
> When I talked to some file system people, they insisted that "efficient"
> or "fast" wasn't well-defined enough for them or something, so if we
> want to get a kernel change, maybe a new block device ioctl would be the
> most realistic thing.
>
> We do use FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE for MAY_UNMAP,
> which works for both file systems (I assume - each file system has a
> separate implementation) and block devices without slow fallbacks.
>
> qemu-img create sets MAY_UNMAP, so the case we are most interested in is
> covered with a fast implementation.
>
> > By the way, is there an easy setup to prove (maybe some qemu-img convert
> > command on a specially-prepared source image) whether the no fallback
> > flag makes a difference?  I'm about to cross-post a series of patches to
> > nbd/qemu/nbdkit/libnbd that adds a new NBD_CMD_FLAG_FAST_ZERO which fits
> > the bill of BDRV_REQ_NO_FALLBACK, but would like to include some
> > benchmark numbers in my cover letter if I can reproduce a setup where it
> > matters.
>
> Hm, the original case came from Nir, maybe he can suggest something.
>

The original case came from RHEL 7.{5,6}. The flow was:

qemu-img convert -> nbdkit rhv plugin -> imageio -> storage

nbdkit got an NBD_CMD_WRITE_ZEROES request and converted it to an imageio
ZERO request.

For block devices, imageio was trying:
1. fallocate(ZERO_RANGE) - fails
2. ioctl(BLKZEROOUT) - succeeds

See
https://github.com/oVirt/ovirt-imageio/blob/ca70170886b0c1fbeca8640b12bcf54f01a3fea0/common/ovirt_imageio_common/backends/file.py#L247
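The two steps imageio tried can be approximated in C as follows. This is a sketch, not imageio's actual (Python) code: zero_range() is a hypothetical name, it tries fallocate(FALLOC_FL_ZERO_RANGE) first, uses ioctl(BLKZEROOUT) only for block devices, and deliberately has no slow fallback that writes zeroes from user space.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/falloc.h>
#include <linux/fs.h>

/*
 * Hypothetical helper: zero [offset, offset + length).
 * Returns 0 on success or -errno on failure.
 */
static int zero_range(int fd, uint64_t offset, uint64_t length)
{
    struct stat st;
    int err;

    /* Step 1: works on many file systems, fails on old block devices. */
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, (off_t)offset, (off_t)length) == 0) {
        return 0;
    }
    err = errno;

    /* Step 2: block devices only; may be fast or slow depending on storage. */
    if (fstat(fd, &st) == 0 && S_ISBLK(st.st_mode)) {
        uint64_t range[2] = {offset, length};

        if (ioctl(fd, BLKZEROOUT, range) == 0) {
            return 0;
        }
        return -errno;
    }
    return -err;
}
```

Nothing in this sketch tells the caller whether BLKZEROOUT took the fast or the slow path, which is exactly why a "no fallback" flag is hard to honour for block devices.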

BLKZEROOUT can be fast (100 GiB/s) or slow (100 MiB/s) depending on the
server and on the allocation status of that area.

On our current storage (3PAR), if the device is fully allocated, for
example:

   dd if=/dev/zero bs=8M of=/dev/vg/lv

then blkdiscard -z is slow (800 MiB/s).

But if you discard the device:

   blkdiscard /dev/vg/lv

blkdiscard -z becomes fast (100 GiB/s).

Previously we had XtremIO storage, which was able to zero 50 GiB/s
regardless of the allocation.

> You'll definitely need a block device that doesn't support
> FALLOC_FL_PUNCH_HOLE,


Old kernels (CentOS 7) did not support this.

# uname -r
3.10.0-957.21.3.el7.x86_64

# strace -e trace=fallocate fallocate -l 100m /dev/loop0
fallocate(3, 0, 0, 104857600)   = -1 ENODEV (No such device)
fallocate: fallocate failed: No such device
+++ exited with 1 +++

# strace -e trace=fallocate fallocate -p -l 100m /dev/loop0
fallocate(3, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 0, 104857600) = -1 ENODEV (No such device)
fallocate: fallocate failed: No such device
+++ exited with 1 +++

# strace -e trace=fallocate fallocate -z -l 100m /dev/loop0
fallocate(3, FALLOC_FL_ZERO_RANGE, 0, 104857600) = -1 ENODEV (No such device)
fallocate: fallocate failed: No such device
+++ exited with 1 +++

> otherwise you can't trigger the fallback. My
> first thought was a loop device, but this actually does support the
> operation and passes it through to the underlying file system. So maybe
> if you know a file system that doesn't support it. Or if you have an old
> hard disk handy.

...

Nir


[Qemu-devel] [PATCH] block: Use QEMU_IS_ALIGNED instead of reinventing it

2019-08-17 Thread Nir Soffer
Replace instances of:

(n & (BDRV_SECTOR_SIZE - 1)) == 0

With:

QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE)

This reveals the intent of the code better, and makes it easier to
locate code that checks alignment.

QEMU_IS_ALIGNED is implemented using %, which may be less efficient, but
it is used only in assert() except for one instance, so it should not
matter.
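A quick standalone check that the two spellings agree. As an assumption for illustration, the sketch defines its own copies of BDRV_SECTOR_SIZE (512) and of QEMU_IS_ALIGNED as the modulo test described above; in QEMU both come from its headers. The forms are only interchangeable because the sector size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins (assumption): QEMU defines these in its own headers. */
#define BDRV_SECTOR_SIZE 512ULL
#define QEMU_IS_ALIGNED(n, m) (((n) % (m)) == 0)

/* The open-coded test the patch replaces; valid only for power-of-two sizes. */
static bool mask_aligned(uint64_t n)
{
    return (n & (BDRV_SECTOR_SIZE - 1)) == 0;
}

/* The replacement spelling, which states the intent directly. */
static bool macro_aligned(uint64_t n)
{
    return QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE);
}
```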

Signed-off-by: Nir Soffer 
---
 block/bochs.c | 4 ++--
 block/cloop.c | 4 ++--
 block/dmg.c   | 4 ++--
 block/io.c| 8 
 block/qcow2.c | 4 ++--
 block/vvfat.c | 8 
 qemu-img.c| 2 +-
 7 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/block/bochs.c b/block/bochs.c
index 962f18592d..32bb83b268 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -248,8 +248,8 @@ bochs_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 QEMUIOVector local_qiov;
 int ret;
 
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
 
 qemu_iovec_init(&local_qiov, qiov->niov);
 qemu_co_mutex_lock(&s->lock);
diff --git a/block/cloop.c b/block/cloop.c
index 384c9735bb..4de94876d4 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -253,8 +253,8 @@ cloop_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 int nb_sectors = bytes >> BDRV_SECTOR_BITS;
 int ret, i;
 
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
 
 qemu_co_mutex_lock(&s->lock);
 
diff --git a/block/dmg.c b/block/dmg.c
index 45f6b28f17..4a045f2b3e 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -697,8 +697,8 @@ dmg_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 int nb_sectors = bytes >> BDRV_SECTOR_BITS;
 int ret, i;
 
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
 
 qemu_co_mutex_lock(&s->lock);
 
diff --git a/block/io.c b/block/io.c
index 56bbf195bb..7508703ecd 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1080,8 +1080,8 @@ static int coroutine_fn bdrv_driver_preadv(BlockDriverState *bs,
 sector_num = offset >> BDRV_SECTOR_BITS;
 nb_sectors = bytes >> BDRV_SECTOR_BITS;
 
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
 assert(bytes <= BDRV_REQUEST_MAX_BYTES);
 assert(drv->bdrv_co_readv);
 
@@ -1133,8 +1133,8 @@ static int coroutine_fn bdrv_driver_pwritev(BlockDriverState *bs,
 sector_num = offset >> BDRV_SECTOR_BITS;
 nb_sectors = bytes >> BDRV_SECTOR_BITS;
 
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
 assert(bytes <= BDRV_REQUEST_MAX_BYTES);
 
 assert(drv->bdrv_co_writev);
diff --git a/block/qcow2.c b/block/qcow2.c
index 59cff1d4cb..41cab70e1d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2072,8 +2072,8 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
 }
 if (bs->encrypted) {
 assert(s->crypto);
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((cur_bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+assert(QEMU_IS_ALIGNED(cur_bytes, BDRV_SECTOR_SIZE));
 if (qcow2_co_decrypt(bs, cluster_offset, offset,
  cluster_data, cur_bytes) < 0) {
 ret = -EIO;
diff --git a/block/vvfat.c b/block/vvfat.c
index f6c28805dd..019b8f1341 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1547,8 +1547,8 @@ vvfat_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 int nb_sectors = bytes >> BDRV_SECTOR_BITS;
 void *buf;
 
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
 
 buf = g_try_malloc(bytes);
 if (bytes && buf == NULL) {
@@ -3082,8 +3082,8 @@ vvfat_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 int nb_sectors = bytes >> BDRV_SECTOR_BITS;
 void *buf;
 
-assert((offset & (BDRV_SECTOR_SIZE - 1

[Qemu-devel] [PATCH] block: gluster: Probe alignment limits

2019-08-17 Thread Nir Soffer
Implement alignment probing similar to file-posix, by reading from the
first 4k of the image.

Before this change, provisioning a VM on storage with sector size of
4096 bytes would fail when the installer tries to create filesystems. Here
is an example command that reproduces this issue:

$ qemu-system-x86_64 -accel kvm -m 2048 -smp 2 \
-drive file=gluster://gluster1/gv0/fedora29.raw,format=raw,cache=none \
-cdrom Fedora-Server-dvd-x86_64-29-1.2.iso

The installer fails within a few seconds when trying to create a filesystem
on /dev/mapper/fedora-root. In the error report we can see that it failed
with EINVAL (I could not extract the error from the guest).

Copying the disk fails with EINVAL:

$ qemu-img convert -p -f raw -O raw -t none -T none \
gluster://gluster1/gv0/fedora29.raw \
gluster://gluster1/gv0/fedora29-clone.raw
qemu-img: error while writing sector 4190208: Invalid argument

This fixes, for gluster:// images, the same issue fixed in commit
a6b257a08e3d (file-posix: Handle undetectable alignment).

This fix has the same limitation: the first block of the image must be
allocated, otherwise we cannot detect the alignment and fall back to a
safe value (4096) even when using storage with sector size of 512 bytes.

Signed-off-by: Nir Soffer 
---
 block/gluster.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index f64dc5b01e..d936240b72 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -52,6 +52,9 @@
 
 #define GERR_INDEX_HINT "hint: check in 'server' array index '%d'\n"
 
+/* The value is known only on the server side. */
+#define MAX_ALIGN 4096
+
 typedef struct GlusterAIOCB {
 int64_t size;
 int ret;
@@ -902,8 +905,52 @@ out:
 return ret;
 }
 
+/*
+ * Check if read is allowed with given memory buffer and length.
+ *
+ * This function is used to check O_DIRECT request alignment.
+ */
+static bool gluster_is_io_aligned(struct glfs_fd *fd, void *buf, size_t len)
+{
+    ssize_t ret = glfs_pread(fd, buf, len, 0, 0, NULL);
+    return ret >= 0 || errno != EINVAL;
+}
+
+static void gluster_probe_alignment(BlockDriverState *bs, struct glfs_fd *fd,
+                                    Error **errp)
+{
+    char *buf;
+    size_t alignments[] = {1, 512, 1024, 2048, 4096};
+    size_t align;
+    int i;
+
+    buf = qemu_memalign(MAX_ALIGN, MAX_ALIGN);
+
+    for (i = 0; i < ARRAY_SIZE(alignments); i++) {
+        align = alignments[i];
+        if (gluster_is_io_aligned(fd, buf, align)) {
+            /* Fall back to a safe value if alignment is undetectable. */
+            bs->bl.request_alignment = (align != 1) ? align : MAX_ALIGN;
+            break;
+        }
+    }
+
+    qemu_vfree(buf);
+
+    if (!bs->bl.request_alignment) {
+        error_setg(errp, "Could not find working O_DIRECT alignment");
+        error_append_hint(errp, "Try cache.direct=off\n");
+    }
+}
+
 static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
 {
+    BDRVGlusterState *s = bs->opaque;
+
+    gluster_probe_alignment(bs, s->fd, errp);
+
+    bs->bl.min_mem_alignment = bs->bl.request_alignment;
+    bs->bl.opt_mem_alignment = MAX(bs->bl.request_alignment, MAX_ALIGN);
     bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;
 }
 
-- 
2.20.1




Re: [Qemu-devel] [PATCH] block: gluster: Probe alignment limits

2019-08-17 Thread Nir Soffer
On Sun, Aug 18, 2019 at 12:21 AM Nir Soffer  wrote:

> Implement alignment probing similar to file-posix, by reading from the
> first 4k of the image.
>
> Before this change, provisioning a VM on storage with sector size of
> 4096 bytes would fail when the installer try to create filesystems. Here
> is an example command that reproduces this issue:
>
> $ qemu-system-x86_64 -accel kvm -m 2048 -smp 2 \
> -drive file=gluster://gluster1/gv0/fedora29.raw,format=raw,cache=none \
> -cdrom Fedora-Server-dvd-x86_64-29-1.2.iso
>
> The installer fails in few seconds when trying to create filesystem on
> /dev/mapper/fedora-root. In error report we can see that it failed with
> EINVAL (I could not extract the error from guest).
>
> Copying disk fails with EINVAL:
>
> $ qemu-img convert -p -f raw -O raw -t none -T none \
> gluster://gluster1/gv0/fedora29.raw \
> gluster://gluster1/gv0/fedora29-clone.raw
> qemu-img: error while writing sector 4190208: Invalid argument
>
> This is a fix to same issue fixed in commit a6b257a08e3d (file-posix:
> Handle undetectable alignment) for gluster:// images.
>
> This fix has the same limit, that the first block of the image should be
> allocated, otherwise we cannot detect the alignment and fallback to a
> safe value (4096) even when using storage with sector size of 512 bytes.
>
> Signed-off-by: Nir Soffer 
> ---
>  block/gluster.c | 47 +++
>  1 file changed, 47 insertions(+)
>
> diff --git a/block/gluster.c b/block/gluster.c
> index f64dc5b01e..d936240b72 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -52,6 +52,9 @@
>
>  #define GERR_INDEX_HINT "hint: check in 'server' array index '%d'\n"
>
> +/* The value is known only on the server side. */
> +#define MAX_ALIGN 4096
> +
>  typedef struct GlusterAIOCB {
>  int64_t size;
>  int ret;
> @@ -902,8 +905,52 @@ out:
>  return ret;
>  }
>
> +/*
> + * Check if read is allowed with given memory buffer and length.
> + *
> + * This function is used to check O_DIRECT request alignment.
> + */
> +static bool gluster_is_io_aligned(struct glfs_fd *fd, void *buf, size_t len)
> +{
> +ssize_t ret = glfs_pread(fd, buf, len, 0, 0, NULL);
> +return ret >= 0 || errno != EINVAL;
> +}
> +
> +static void gluster_probe_alignment(BlockDriverState *bs, struct glfs_fd *fd,
> +Error **errp)
> +{
> +char *buf;
> +size_t alignments[] = {1, 512, 1024, 2048, 4096};
> +size_t align;
> +int i;
> +
> +buf = qemu_memalign(MAX_ALIGN, MAX_ALIGN);
> +
> +for (i = 0; i < ARRAY_SIZE(alignments); i++) {
> +align = alignments[i];
> +if (gluster_is_io_aligned(fd, buf, align)) {
> +/* Fallback to safe value. */
> +bs->bl.request_alignment = (align != 1) ? align : MAX_ALIGN;
> +break;
> +}
> +}
> +
> +qemu_vfree(buf);
> +
> +if (!bs->bl.request_alignment) {
> +error_setg(errp, "Could not find working O_DIRECT alignment");
> +error_append_hint(errp, "Try cache.direct=off\n");
> +}
> +}
> +
>  static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
> +BDRVGlusterState *s = bs->opaque;
> +
> +gluster_probe_alignment(bs, s->fd, errp);
> +
> +bs->bl.min_mem_alignment = bs->bl.request_alignment;
> +bs->bl.opt_mem_alignment = MAX(bs->bl.request_alignment, MAX_ALIGN);
>  bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;
>  }
>
> --
> 2.20.1
>
>
To debug this I added this temporary patch:

diff --git a/block/gluster.c b/block/gluster.c
index d2d187490b..790ef4251b 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -912,6 +912,7 @@ out:
 static bool gluster_is_io_aligned(struct glfs_fd *fd, void *buf, size_t len)
 {
     ssize_t ret = glfs_pread(fd, buf, len, 0, 0, NULL);
+    printf("gluster_is_io_aligned len=%zu ret=%zd errno=%d\n", len, ret, errno);
     return ret >= 0 || errno != EINVAL;
 }
 
@@ -940,6 +941,9 @@ static void gluster_probe_alignment(BlockDriverState *bs, struct glfs_fd *fd,
         error_setg(errp, "Could not find working O_DIRECT alignment");
         error_append_hint(errp, "Try cache.direct=off\n");
     }
+
+    printf("Probed alignment for %s request_alignment=%u\n",
+           bs->filename, bs->bl.request_alignment);
 }

 static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)

Here is an example run with a volume with sector size of 512 bytes:

$ sudo mount -t glusterfs gluste

Re: [Qemu-devel] [Qemu-block] [PATCH] nbd: Advertise multi-conn for shared read-only connections

2019-08-17 Thread Nir Soffer
On Sat, Aug 17, 2019 at 5:30 PM Eric Blake  wrote:

> On 8/16/19 5:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>
> >>> +++ b/blockdev-nbd.c
> >>> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool
> has_name, const char *name,
> >>>   }
> >>>
> >>>   exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
> >>> - writable ? 0 : NBD_FLAG_READ_ONLY,
> >>> + writable ? 0 : NBD_FLAG_READ_ONLY, true,
> >>
> >> s/true/!writable ?
> >
> > Oh, I see, John already noticed this, it's checked in nbd_export_new
> anyway..
>
> Still, since two reviewers have caught it, I'm fixing it :)
>
>
> >>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs,
> uint64_t dev_offset,
> >>>   perm = BLK_PERM_CONSISTENT_READ;
> >>>   if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
> >>>   perm |= BLK_PERM_WRITE;
> >>> +} else if (shared) {
> >>> +nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
> >
> > For me it looks a bit strange: we already have nbdflags parameter for
> nbd_export_new(), why
> > to add a separate boolean to pass one of nbdflags flags?
>
> Because I want to get rid of the nbdflags in my next patch.
>
> >
> > Also, for qemu-nbd, shouldn't we allow -e only together with -r ?
>
> I'm reluctant to; it might break whatever existing user is okay exposing
> it (although such users are questionable, so maybe we can argue they
> were already broken).  Maybe it's time to start a deprecation cycle?
>

man qemu-nbd (on CentOS 7.6) says:

   -e, --shared=num
   Allow up to num clients to share the device (default 1)

I see that in qemu-nbd 4.1 there is a note about consistency with writers:

   -e, --shared=num
   Allow up to num clients to share the device (default 1). Safe
   for readers, but for now, consistency is not guaranteed between
   multiple writers.

But it is not clear what the consistency guarantees are.

Supporting multiple writers is important. Since 4.3, oVirt gives the user
a URL, and the user can open multiple connections using the same URL, each
connected to the same qemu-nbd socket. I know that some backup vendors
tried to use multiple connections to speed up backups, and they may try to
do this for restore as well.

An interesting use case would be using multiple connections on the client
side to write in parallel to the same image, with every client writing
different ranges.

Do we have a real issue with qemu-nbd serving multiple clients writing to
different parts of the same image?

Nir


[Qemu-devel] [PATCH] block: file-posix: Fix alignment probing on gluster

2019-08-06 Thread Nir Soffer
On Gluster storage with sector size of 4096 bytes, buf_align may be
wrong; reading 4096 bytes into an unaligned buffer succeeds. This probably
happens because the actual read happens on the Gluster node with an
aligned buffer, and the Gluster client does not enforce any alignment on
the host.

However, request_alignment is always right, since the same size is used
on the Gluster node to perform the actual I/O. Use the maximum of
buf_align and request_alignment for min_mem_alignment.

With this change we can provision a virtual machine with Gluster storage
using a VDO device and a FUSE mount.

This is a partial fix for https://bugzilla.redhat.com/1737256. To make
this work, the management system must ensure that the first block of the
image is allocated, for example:

qemu-img create -f raw test.img 1g
dd if=/dev/zero bs=4096 count=1 of=test.img conv=notrunc

Signed-off-by: Nir Soffer 
---
 block/file-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 4479cc7ab4..d29b9e5229 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1122,7 +1122,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 }
 
 raw_probe_alignment(bs, s->fd, errp);
-bs->bl.min_mem_alignment = s->buf_align;
+bs->bl.min_mem_alignment = MAX(s->buf_align, bs->bl.request_alignment);
 bs->bl.opt_mem_alignment = MAX(s->buf_align, getpagesize());
 }
 
-- 
2.20.1




[PATCH] libvhost-user: Fix update of signalled_used

2023-05-09 Thread Nir Soffer
When we check if a driver needs a signal, we compare:

- used_event: written by the driver each time it consumes an item
- new: current idx written to the used ring, updated by us
- old: last idx we signaled about

We call vring_need_event() which does:

return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);

Previously we updated signalled_used on every check, so old was always
new - 1. Because used_event cannot be bigger than new_idx, this check
becomes (ignoring wrapping):

return new_idx == event_idx + 1;

Since the driver consumes items at the same time the device produces
items, it is very likely (and seen in logs) that the driver's used_event
is too far behind new_idx and we don't signal the driver.

With the libblkio virtio-blk-vhost-user driver, if the driver does not
get a signal, the libblkio client can hang polling the completion fd.
This is very easy to reproduce on some machines and impossible to
reproduce on others.

Fixed by updating signalled_used only when we signal the driver.
Tested using blkio-bench and a libblkio client application that used to
hang randomly without this change.
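The difference between the two update policies can be modeled with a minimal sketch. struct demo_vq, demo_notify_old() and demo_notify_new() are hypothetical simplifications of vring_notify() that keep only the VIRTIO_RING_F_EVENT_IDX path, with used_event and used_idx passed in explicitly:

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_vq {
    bool signalled_used_valid;
    uint16_t signalled_used;
};

static bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/* Old policy: signalled_used is updated on every check. */
static bool demo_notify_old(struct demo_vq *vq, uint16_t used_event,
                            uint16_t used_idx)
{
    bool v = vq->signalled_used_valid;
    uint16_t old = vq->signalled_used;

    vq->signalled_used_valid = true;
    vq->signalled_used = used_idx;
    return !v || need_event(used_event, used_idx, old);
}

/* Fixed policy: signalled_used is updated only when we signal. */
static bool demo_notify_new(struct demo_vq *vq, uint16_t used_event,
                            uint16_t used_idx)
{
    if (!vq->signalled_used_valid ||
        need_event(used_event, used_idx, vq->signalled_used)) {
        vq->signalled_used_valid = true;
        vq->signalled_used = used_idx;
        return true;
    }
    return false;
}
```

In a sequence of calls (used_event, used_idx) = (0, 1), (0, 2), (1, 3), both policies signal on the first call and stay silent on the second. On the third, the old policy has already moved signalled_used to 2, past the entry the driver asked about, so it never signals; the fixed policy still compares against 1 and signals.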

Buglink: https://gitlab.com/libblkio/libblkio/-/issues/68
Signed-off-by: Nir Soffer 
---
 subprojects/libvhost-user/libvhost-user.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 8fb61e2df2..5f26d2d378 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -2382,12 +2382,11 @@ vu_queue_empty(VuDev *dev, VuVirtq *vq)
 }
 
 static bool
 vring_notify(VuDev *dev, VuVirtq *vq)
 {
-uint16_t old, new;
-bool v;
+uint16_t old, new, used;
 
 /* We need to expose used array entries before checking used event. */
 smp_mb();
 
 /* Always notify when queue is empty (when feature acknowledge) */
@@ -2398,15 +2397,27 @@ vring_notify(VuDev *dev, VuVirtq *vq)
 
 if (!vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
 return !(vring_avail_flags(vq) & VRING_AVAIL_F_NO_INTERRUPT);
 }
 
-v = vq->signalled_used_valid;
-vq->signalled_used_valid = true;
+if (!vq->signalled_used_valid) {
+vq->signalled_used_valid = true;
+vq->signalled_used = vq->used_idx;
+return true;
+}
+
+used = vring_get_used_event(vq);
+new = vq->used_idx;
 old = vq->signalled_used;
-new = vq->signalled_used = vq->used_idx;
-return !v || vring_need_event(vring_get_used_event(vq), new, old);
+
+if (vring_need_event(used, new, old)) {
+vq->signalled_used_valid = true;
+vq->signalled_used = vq->used_idx;
+return true;
+}
+
+return false;
 }
 
 static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)
 {
 if (unlikely(dev->broken) ||
-- 
2.40.1




Re: [PATCH 3/4] qemu-img: add --shallow option for qemu-img compare --stat

2021-09-29 Thread Nir Soffer
On Wed, Sep 29, 2021 at 4:37 PM Vladimir Sementsov-Ogievskiy wrote:
>
> Allow compare only top images of backing chains. That's useful for
> comparing two increments from the same chain of incremental backups.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  docs/tools/qemu-img.rst |  8 +++-
>  qemu-img.c  | 14 --
>  qemu-img-cmds.hx|  4 ++--
>  3 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> index 4b382ca2b0..c8ae96be6a 100644
> --- a/docs/tools/qemu-img.rst
> +++ b/docs/tools/qemu-img.rst
> @@ -176,6 +176,12 @@ Parameters to compare subcommand:
>  - If both files don't specify cluster-size, use default of 64K
>  - If only one file specify cluster-size, just use it.
>
> +.. option:: --shallow

We use the same term in oVirt when we upload/download one layer from a chain.

> +  Only allowed with ``--stat``. This option prevents opening and comparing
> +  any backing files. This is useful to compare incremental images from
> +  the chain of incremental backups.

This is also useful without --stat. Our current workaround in oVirt is to
use unsafe rebase to disconnect the top image from the base image, so we
can compare the source and destination images after backup.

Here is an example of test code that could use --shallow (regardless of --stat):
https://github.com/oVirt/ovirt-imageio/blob/master/daemon/test/backup_test.py#L114

Do you have any reason to limit --shallow to --stat?

> +
>  Parameters to convert subcommand:
>
>  .. program:: qemu-img-convert
> @@ -395,7 +401,7 @@ Command description:
>
>The rate limit for the commit process is specified by ``-r``.
>
> -.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] FILENAME1 FILENAME2
> +.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE] [--shallow]] FILENAME1 FILENAME2
>
>Check if two images have the same content. You can compare images with
>different format or settings.
> diff --git a/qemu-img.c b/qemu-img.c
> index 61e7f470bb..e8ae412c38 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -85,6 +85,7 @@ enum {
>  OPTION_SKIP_BROKEN = 277,
>  OPTION_STAT = 277,
>  OPTION_BLOCK_SIZE = 278,
> +OPTION_SHALLOW = 279,
>  };
>
>  typedef enum OutputFormat {
> @@ -1482,7 +1483,7 @@ static int img_compare(int argc, char **argv)
>  int64_t block_end;
>  int ret = 0; /* return value - 0 Ident, 1 Different, >1 Error */
>  bool progress = false, quiet = false, strict = false;
> -int flags;
> +int flags = 0;
>  bool writethrough;
>  int64_t total_size;
>  int64_t offset = 0;
> @@ -1504,6 +1505,7 @@ static int img_compare(int argc, char **argv)
>  {"force-share", no_argument, 0, 'U'},
>  {"stat", no_argument, 0, OPTION_STAT},
>  {"block-size", required_argument, 0, OPTION_BLOCK_SIZE},
> +{"shallow", no_argument, 0, OPTION_SHALLOW},
>  {0, 0, 0, 0}
>  };
>  c = getopt_long(argc, argv, ":hf:F:T:pqsU",
> @@ -1569,6 +1571,9 @@ static int img_compare(int argc, char **argv)
>  exit(EXIT_SUCCESS);
>  }
>  break;
> +case OPTION_SHALLOW:
> +flags |= BDRV_O_NO_BACKING;
> +break;
>  }
>  }
>
> @@ -1590,10 +1595,15 @@ static int img_compare(int argc, char **argv)
>  goto out;
>  }
>
> +if (!do_stat && (flags & BDRV_O_NO_BACKING)) {
> +error_report("--shallow can be used only together with --stat");
> +ret = 1;
> +goto out;
> +}
> +
>  /* Initialize before goto out */
>  qemu_progress_init(progress, 2.0);
>
> -flags = 0;
>  ret = bdrv_parse_cache_mode(cache, &flags, &writethrough);
>  if (ret < 0) {
>  error_report("Invalid source cache option: %s", cache);
> diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
> index 96a193eea8..a295bc6860 100644
> --- a/qemu-img-cmds.hx
> +++ b/qemu-img-cmds.hx
> @@ -40,9 +40,9 @@ SRST
>  ERST
>
>  DEF("compare", img_compare,
> -"compare [--object objectdef] [--image-opts] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] filename1 filename2")
> +"compare [--object objectdef] [--image-opts] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE] [--shallow]] filename1 filename2")
>  SRST
> -.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] FILENAME1 FILENAME2
> +.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE] [--shallow]] FILENAME1 FILENAME2
>  ERST
>
>  DEF("convert",
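The move of `flags = 0` to the declaration matters because --shallow now sets BDRV_O_NO_BACKING inside the getopt loop, before the cache mode is parsed. A minimal sketch of the ordering issue, using hypothetical flag values and a hypothetical demo_parse_cache_mode() that, as assumed here for bdrv_parse_cache_mode(), only clears and sets cache-related bits:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag values for illustration, not QEMU's definitions. */
#define DEMO_O_NO_BACKING 0x1
#define DEMO_O_NOCACHE    0x2
#define DEMO_O_NO_FLUSH   0x4
#define DEMO_O_CACHE_MASK (DEMO_O_NOCACHE | DEMO_O_NO_FLUSH)

/* Hypothetical stand-in for cache-mode parsing: clears only the cache
 * bits, then sets them according to mode, leaving other bits alone. */
static void demo_parse_cache_mode(const char *mode, int *flags)
{
    *flags &= ~DEMO_O_CACHE_MASK;
    if (strcmp(mode, "none") == 0) {
        *flags |= DEMO_O_NOCACHE;
    } else if (strcmp(mode, "unsafe") == 0) {
        *flags |= DEMO_O_NO_FLUSH;
    }
}

/* What would happen if the old "flags = 0;" after option parsing were kept. */
static int demo_flags_reset_late(bool shallow, const char *cache)
{
    int flags = 0;

    if (shallow) {
        flags |= DEMO_O_NO_BACKING;  /* set in the getopt loop */
    }
    flags = 0;                       /* wipes DEMO_O_NO_BACKING */
    demo_parse_cache_mode(cache, &flags);
    return flags;
}

/* The patch's ordering: initialize once, at the declaration. */
static int demo_flags_init_early(bool shallow, const char *cache)
{
    int flags = 0;

    if (shallow) {
        flags |= DEMO_O_NO_BACKING;
    }
    demo_parse_cache_mode(cache, &flags);
    return flags;
}
```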

Re: [PATCH 3/4] qemu-img: add --shallow option for qemu-img compare --stat

2021-09-29 Thread Nir Soffer
On Wed, Sep 29, 2021 at 7:28 PM Vladimir Sementsov-Ogievskiy wrote:
>
> 29.09.2021 19:00, Nir Soffer wrote:
> > On Wed, Sep 29, 2021 at 4:37 PM Vladimir Sementsov-Ogievskiy
> >  wrote:
> >>
> >> Allow compare only top images of backing chains. That's useful for
> >> comparing two increments from the same chain of incremental backups.
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >> ---
> >>   docs/tools/qemu-img.rst |  8 +++-
> >>   qemu-img.c  | 14 --
> >>   qemu-img-cmds.hx|  4 ++--
> >>   3 files changed, 21 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> >> index 4b382ca2b0..c8ae96be6a 100644
> >> --- a/docs/tools/qemu-img.rst
> >> +++ b/docs/tools/qemu-img.rst
> >> @@ -176,6 +176,12 @@ Parameters to compare subcommand:
> >>   - If both files don't specify cluster-size, use default of 64K
> >>   - If only one file specify cluster-size, just use it.
> >>
> >> +.. option:: --shallow
> >
> > We use the same term in oVirt when we upload/download one layer from a chain.
> >
> >> +  Only allowed with ``--stat``. This option prevents opening and comparing
> >> +  any backing files. This is useful to compare incremental images from
> >> +  the chain of incremental backups.
> >
> > This is useful also without --stat. Our current workaround in oVirt is to use unsafe
> > rebase to disconnect the top image from the base image so we can compare
> > source and destination image after backup.
> >
> > Here is an example of test code that could use --shallow (regardless of --stat):
> > https://github.com/oVirt/ovirt-imageio/blob/master/daemon/test/backup_test.py#L114
> >
> > Do you have any reason to limit --shallow to --stats?
>
>
> Hmm. I wrongly thought that without --stat qemu-img compare will fail on 
> first mismatch, which will occur soon, as we don't have backing images and 
> it's just superfluous.
>
> But actually, qemu-img will not compare "unallocated" areas.
>
> Ok, I agree, in v2 I'll allow --shallow without --stat.
>
>
> Another question to discuss: we already have "-u" option in qemu-img create 
> and qemu-img rebase to not open backing files. And 'u' means 'unsafe'.
> I don't think that "unsafe" term is good for qemu-img compare --stat, that's 
> why I decided to call it differently: "shallow".
> Still for qemu-img compare (without --stat) "unsafe" term make sense.
>
>
> So, it probably better to follow common notation, and call the option "-u".

--shallow is better: comparing a single image from a chain is a safe
operation. Replacing a backing file, or creating an image on top of one
without checking the backing file, is not.

>
> >
> >> +
> >>   Parameters to convert subcommand:
> >>
> >>   .. program:: qemu-img-convert
> >> @@ -395,7 +401,7 @@ Command description:
> >>
> >> The rate limit for the commit process is specified by ``-r``.
> >>
> >> -.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE]] FILENAME1 FILENAME2
> >> +.. option:: compare [--object OBJECTDEF] [--image-opts] [-f FMT] [-F FMT] [-T SRC_CACHE] [-p] [-q] [-s] [-U] [--stat [--block-size BLOCK_SIZE] [--shallow]] FILENAME1 FILENAME2
> >>
> >> Check if two images have the same content. You can compare images with
> >> different format or settings.
> >> diff --git a/qemu-img.c b/qemu-img.c
> >> index 61e7f470bb..e8ae412c38 100644
> >> --- a/qemu-img.c
> >> +++ b/qemu-img.c
> >> @@ -85,6 +85,7 @@ enum {
> >>   OPTION_SKIP_BROKEN = 277,
> >>   OPTION_STAT = 277,
> >>   OPTION_BLOCK_SIZE = 278,
> >> +OPTION_SHALLOW = 279,
> >>   };
> >>
> >>   typedef enum OutputFormat {
> >> @@ -1482,7 +1483,7 @@ static int img_compare(int argc, char **argv)
> >>   int64_t block_end;
> >>   int ret = 0; /* return value - 0 Ident, 1 Different, >1 Error */
> >>   bool progress = false, quiet = false, strict = false;
> >> -int flags;
> >> +int flags = 0;
> >>   bool writethrough;
> >>   int64_t total_size;
> >>   int64_t offset = 

Re: [PATCH 2/2] iotests/block-status-cache: New test

2022-01-17 Thread Nir Soffer
> +# This will probably detect an allocated data sector first (qemu likes
> +# to allocate the first sector to facilitate alignment probing), and
> +# then the rest to be zero.  The BSC will thus contain (if anything)
> +# one range covering the first sector.
> +map_pre = qemu_img_pipe('map', '--output=json', '--image-opts',
> +nbd_img_opts)
> +
> +# qemu:allocation-depth maps for want_zero=false.
> +# want_zero=false should (with the file driver, which the server is
> +# using) report everything as data.  While this is sufficient for
> +# want_zero=false, this is nothing that should end up in the
> +# block-status cache.
> +# Due to a bug, this information did end up in the cache, though, and
> +# this would lead to wrong information being returned on subsequent
> +# want_zero=true calls.
> +#
> +# We need to run this map twice: On the first call, we probably still
> +# have the first sector in the cache, and so this will be served from
> +# the cache; and only the subsequent range will be queried from the
> +# block driver.  This subsequent range will then be entered into the
> +# cache.
> +# If we did a want_zero=true call at this point, we would thus get
> +# correct information: The first sector is not covered by the cache, so
> +# we would get fresh block-status information from the driver, which
> +# would return a data range, and this would then go into the cache,
> +# evicting the wrong range from the want_zero=false call before.
> +#
> +# Therefore, we need a second want_zero=false map to reproduce:
> +# Since the first sector is not in the cache, the query for its status
> +# will go to the driver, which will return a result that reports the
> +# whole image to be a single data area.  This result will then go into
> +# the cache, and so the cache will then report the whole image to
> +# contain data.

Interesting, but once we fix the bug, this complex flow is gone, so
we can eliminate this text, no?

> +#
> +# Note that once the cache reports the whole image to contain data, any
> +# subsequent map operation will be served from the cache, and so we can
> +# never loop too many times here.
> +for _ in range(2):
> +# (Ignore the result, this is just to contaminate the cache)
> +qemu_img_pipe('map', '--output=json', '--image-opts',
> +  nbd_img_opts_alloc_depth)
> +
> +# Now let's see whether the cache reports everything as data, or
> +# whether we get correct information (i.e. the same as we got on our
> +# first attempt).
> +map_post = qemu_img_pipe('map', '--output=json', '--image-opts',
> + nbd_img_opts)
> +
> +if map_pre != map_post:
> +print('ERROR: Map information differs before and after querying ' +
> +  'qemu:allocation-depth')
> +print('Before:')
> +print(map_pre)
> +print('After:')
> +print(map_post)
> +
> +self.fail("Map information differs")
> +
> +
> +if __name__ == '__main__':
> +# The block-status cache only works on the protocol layer, so to test it,
> +# we can only use the raw format
> +iotests.main(supported_fmts=['raw'],
> + supported_protocols=['file'])
> diff --git a/tests/qemu-iotests/tests/block-status-cache.out b/tests/qemu-iotests/tests/block-status-cache.out
> new file mode 100644
> index 00..ae1213e6f8
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/block-status-cache.out
> @@ -0,0 +1,5 @@
> +.
> +--
> +Ran 1 tests
> +
> +OK
> --
> 2.33.1
>

Reviewed-by: Nir Soffer 




Re: [PATCH 1/2] block/io: Update BSC only if want_zero is true

2022-01-17 Thread Nir Soffer
On Mon, Jan 17, 2022 at 6:26 PM Hanna Reitz  wrote:
>
> We update the block-status cache whenever we get new information from a
> bdrv_co_block_status() call to the block driver.  However, if we have
> passed want_zero=false to that call, it may flag areas containing zeroes
> as data, and so we would update the block-status cache with wrong
> information.
>
> Therefore, we should not update the cache with want_zero=false.
>
> Reported-by: Nir Soffer 
> Fixes: 0bc329fbb009f8601cec23bf2bc48ead0c5a5fa2
>("block: block-status cache for data regions")
> Signed-off-by: Hanna Reitz 
> ---
>  block/io.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/block/io.c b/block/io.c
> index bb0a254def..4e4cb556c5 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2497,8 +2497,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>   * non-protocol nodes, and then it is never used.  However, filling
>   * the cache requires an RCU update, so double check here to avoid
>   * such an update if possible.
> + *
> + * Check want_zero, because we only want to update the cache when we
> + * have accurate information about what is zero and what is data.
>   */
> -if (ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
> +if (want_zero &&
> +ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
>  QLIST_EMPTY(&bs->children))
>  {
>  /*
> --
> 2.33.1
>
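The condition added by this patch can be modeled with a minimal sketch of a single-range data cache. The flag values, struct demo_bsc, and the demo_* functions are hypothetical simplifications, not QEMU's block-status cache API; has_children stands in for the !QLIST_EMPTY(&bs->children) check:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define DEMO_BLOCK_DATA         0x1
#define DEMO_BLOCK_OFFSET_VALID 0x4

/* One cached range [start, end) known to contain data. */
struct demo_bsc {
    bool valid;
    int64_t start;
    int64_t end;
};

/* Cache a driver result only if it is accurate about zeroes (want_zero),
 * reports plain data, and comes from a node without children. */
static void demo_bsc_update(struct demo_bsc *bsc, bool want_zero,
                            bool has_children, int ret,
                            int64_t offset, int64_t bytes)
{
    if (want_zero &&
        ret == (DEMO_BLOCK_DATA | DEMO_BLOCK_OFFSET_VALID) &&
        !has_children) {
        bsc->valid = true;
        bsc->start = offset;
        bsc->end = offset + bytes;
    }
}

static bool demo_bsc_is_data(const struct demo_bsc *bsc, int64_t offset)
{
    return bsc->valid && offset >= bsc->start && offset < bsc->end;
}
```

Without the want_zero check, a want_zero=false query reporting the whole image as data would replace the accurate first-sector entry, which is the contamination the new iotest reproduces.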

ovirt-imageio tests pass with this change.
Thanks for the quick fix!

Reviewed-by: Nir Soffer 




Re: [PATCH v2 2/2] iotests/block-status-cache: New test

2022-01-18 Thread Nir Soffer
'map', '--output=json', '--image-opts',
> +nbd_img_opts)
> +
> +# qemu:allocation-depth maps for want_zero=false.
> +# want_zero=false should (with the file driver, which the server is
> +# using) report everything as data.  While this is sufficient for
> +# want_zero=false, this is nothing that should end up in the
> +# block-status cache.
> +# Due to a bug, this information did end up in the cache, though, and
> +# this would lead to wrong information being returned on subsequent
> +# want_zero=true calls.
> +#
> +# We need to run this map twice: On the first call, we probably still
> +# have the first sector in the cache, and so this will be served from
> +# the cache; and only the subsequent range will be queried from the
> +# block driver.  This subsequent range will then be entered into the
> +# cache.
> +# If we did a want_zero=true call at this point, we would thus get
> +# correct information: The first sector is not covered by the cache, so
> +# we would get fresh block-status information from the driver, which
> +# would return a data range, and this would then go into the cache,
> +# evicting the wrong range from the want_zero=false call before.
> +#
> +# Therefore, we need a second want_zero=false map to reproduce:
> +# Since the first sector is not in the cache, the query for its status
> +# will go to the driver, which will return a result that reports the
> +# whole image to be a single data area.  This result will then go into
> +# the cache, and so the cache will then report the whole image to
> +# contain data.
> +#
> +# Note that once the cache reports the whole image to contain data, any
> +# subsequent map operation will be served from the cache, and so we can
> +# never loop too many times here.
> +for _ in range(2):
> +# (Ignore the result, this is just to contaminate the cache)
> +qemu_img_pipe('map', '--output=json', '--image-opts',
> +  nbd_img_opts_alloc_depth)
> +
> +# Now let's see whether the cache reports everything as data, or
> +# whether we get correct information (i.e. the same as we got on our
> +# first attempt).
> +map_post = qemu_img_pipe('map', '--output=json', '--image-opts',
> + nbd_img_opts)
> +
> +if map_pre != map_post:
> +print('ERROR: Map information differs before and after querying ' +
> +  'qemu:allocation-depth')
> +print('Before:')
> +print(map_pre)
> +print('After:')
> +print(map_post)
> +
> +self.fail("Map information differs")
> +
> +
> +if __name__ == '__main__':
> +# The block-status cache only works on the protocol layer, so to test it,
> +# we can only use the raw format
> +iotests.main(supported_fmts=['raw'],
> + supported_protocols=['file'])
> diff --git a/tests/qemu-iotests/tests/block-status-cache.out b/tests/qemu-iotests/tests/block-status-cache.out
> new file mode 100644
> index 00..ae1213e6f8
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/block-status-cache.out
> @@ -0,0 +1,5 @@
> +.
> +--
> +Ran 1 tests
> +
> +OK
> --
> 2.33.1
>

The out file is not very useful, and is even fragile: if the test framework
changes its output format, the test will fail. Ideally we would depend only
on the relevant output of our tools, so that using a different version of
the test framework, or replacing it (e.g. with pytest), would not require
modifying the out files.

Regardless, I would like to see this fix merged; this issue already
exists in other tests. Some tests in tests/ do have useful output that can
make debugging failures easier.

Reviewed-by: Nir Soffer 




[PATCH] nbd/server.c: Remove unused field

2022-01-11 Thread Nir Soffer
NBDRequestData struct has an unused QSIMPLEQ_ENTRY field. It seems that
this field has existed since the first git commit and was never used.

Signed-off-by: Nir Soffer 
---
 nbd/server.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index 3927f7789d..ce5b2a1d02 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -70,21 +70,20 @@ static int system_errno_to_nbd_errno(int err)
 default:
 return NBD_EINVAL;
 }
 }
 
 /* Definitions for opaque data types */
 
 typedef struct NBDRequestData NBDRequestData;
 
 struct NBDRequestData {
-QSIMPLEQ_ENTRY(NBDRequestData) entry;
 NBDClient *client;
 uint8_t *data;
 bool complete;
 };
 
 struct NBDExport {
 BlockExport common;
 
 char *name;
 char *description;
-- 
2.34.1




Re: [PATCH v2] nbd/server: Allow MULTI_CONN for shared writable exports

2022-02-15 Thread Nir Soffer
On Tue, Feb 15, 2022 at 7:22 PM Eric Blake  wrote:

> According to the NBD spec, a server advertising
> NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> not see any cache inconsistencies: when properly separated by a single
> flush, actions performed by one client will be visible to another
> client, regardless of which client did the flush.  We satisfy these
> conditions in qemu when our block layer is backed by the local
> filesystem (by virtue of the semantics of fdatasync(), and the fact
> that qemu itself is not buffering writes beyond flushes).  It is
> harder to state whether we satisfy these conditions for network-based
> protocols, so the safest course of action is to allow users to opt-in
> to advertising multi-conn.  We may later tweak defaults to advertise
> by default when the block layer can confirm that the underlying
> protocol driver is cache consistent between multiple writers, but for
> now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
> explicitly start qemu-nbd or qemu-storage-daemon with multi-conn
> advertisement in a known-safe setup where the client end can then
> benefit from parallel clients.
>

It makes sense, and will be used by oVirt. Actually, we have been using
multiple connections for writing for about 2 years, based on your promise
that this is safe if every client writes to distinct areas.

> Note, however, that we don't want to advertise MULTI_CONN when we know
> that a second client cannot connect (for historical reasons, qemu-nbd
> defaults to a single connection while nbd-server-add and QMP commands
> default to unlimited connections; but we already have existing means
> to let either style of NBD server creation alter those defaults).  The
> harder part of this patch is setting up an iotest to demonstrate
> behavior of multiple NBD clients to a single server.  It might be
> possible with parallel qemu-io processes, but concisely managing that
> in shell is painful.  I found it easier to do by relying on the libnbd
> project's nbdsh, which means this test will be skipped on platforms
> where that is not available.
>
> Signed-off-by: Eric Blake 
> Fixes: https://bugzilla.redhat.com/1708300
> ---
>
> v1 was in Aug 2021 [1], with further replies in Sep [2] and Oct [3].
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg04900.html
> [2] https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00038.html
> [3] https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06744.html
>
> Since then, I've tweaked the QAPI to mention 7.0 (instead of 6.2), and
> reworked the logic so that default behavior is unchanged for now
> (advertising multi-conn on a writable export requires opt-in during
> the command line or QMP, but remains default for a readonly export).
> I've also expanded the amount of testing done in the new iotest.
>
>  docs/interop/nbd.txt   |   1 +
>  docs/tools/qemu-nbd.rst|   3 +-
>  qapi/block-export.json |  34 +++-
>  include/block/nbd.h|   3 +-
>  blockdev-nbd.c |   5 +
>  nbd/server.c   |  27 ++-
>  MAINTAINERS|   1 +
>  tests/qemu-iotests/tests/nbd-multiconn | 188 +
>  tests/qemu-iotests/tests/nbd-multiconn.out | 112 
>  9 files changed, 363 insertions(+), 11 deletions(-)
>  create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
>  create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out
>
> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
> index bdb0f2a41aca..6c99070b99c8 100644
> --- a/docs/interop/nbd.txt
> +++ b/docs/interop/nbd.txt
> @@ -68,3 +68,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
>  * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
>  NBD_CMD_FLAG_FAST_ZERO
>  * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
> +* 7.0: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
> diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
> index 6031f9689312..1de785524c36 100644
> --- a/docs/tools/qemu-nbd.rst
> +++ b/docs/tools/qemu-nbd.rst
> @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified.
>  .. option:: -e, --shared=NUM
>
>Allow up to *NUM* clients to share the device (default
> -  ``1``), 0 for unlimited. Safe for readers, but for now,
> -  consistency is not guaranteed between multiple writers.
> +  ``1``), 0 for unlimited.
>

Removing the note means that now consistency is guaranteed between
multiple writers, no?

Or maybe we want to mention here that consistency depends on the protocol
and users can opt in, or refer to the section where this is discussed?

>  .. option:: -t, --persistent
>
> diff --git a/qapi/block-export.json b/qapi/block-export.json
> index f183522d0d2c..0a27e8ee84f9 100644
> --- a/qapi/block-export.json
> +++ b/qapi/block-export.json
> @@ -21,7 +21,9 @@
>  # recreated

Re: [PATCH v2] nbd/server: Allow MULTI_CONN for shared writable exports

2022-02-16 Thread Nir Soffer
On Wed, Feb 16, 2022 at 12:13 PM Richard W.M. Jones wrote:

> On Tue, Feb 15, 2022 at 05:24:14PM -0600, Eric Blake wrote:
> > Oh. The QMP command (which is immediately visible through
> > nbd-server-add/block-storage-add to qemu and qemu-storage-daemon)
> > gains "multi-conn":"on", but you may be right that qemu-nbd would want
> > a command line option (either that, or we accellerate our plans that
> > qsd should replace qemu-nbd).
>
> I really hope there will always be something called "qemu-nbd"
> that acts like qemu-nbd.
>

I share this hope. Most projects I work on are based on qemu-nbd.

However, in the oVirt use case, we want to provide an NBD socket for
clients to allow direct access to disks. One of the issues we need to
solve for this is having a way to tell if qemu-nbd is active, so we can
terminate idle transfers.

The way we do this with the ovirt-imageio server is to query the status of
the transfer, and use the idle time (time since the last request) and the
active status (whether there are inflight requests) to detect a stale
transfer that should be terminated. An example use case is a process on a
remote host that started an image transfer, and was killed or crashed in
the middle of the transfer without cleaning up properly.

To be more specific, every request to the imageio server (read, write,
flush, zero, options) updates a timestamp in the transfer state. When we
get the status, we report the time since that timestamp was updated.

Additionally, we keep and report the number of inflight requests, so we
can tell when requests are blocked on inaccessible storage (e.g. a
non-responsive NFS server).
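A minimal sketch of that bookkeeping, assuming caller-supplied timestamps in seconds and a fixed timeout. struct demo_transfer and the demo_* functions are hypothetical and are not ovirt-imageio code; updating the timestamp both when a request starts and when it finishes is also an assumption:

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_transfer {
    int64_t last_request;  /* time of the most recent request activity */
    int inflight;          /* requests started but not yet finished */
};

/* Called when any request (read, write, flush, zero, options) starts. */
static void demo_request_start(struct demo_transfer *t, int64_t now)
{
    t->last_request = now;
    t->inflight++;
}

/* Called when a request finishes. */
static void demo_request_end(struct demo_transfer *t, int64_t now)
{
    t->last_request = now;
    t->inflight--;
}

static int64_t demo_idle_time(const struct demo_transfer *t, int64_t now)
{
    return now - t->last_request;
}

/* Idle for too long with nothing in flight: the client likely went away. */
static bool demo_is_stale(const struct demo_transfer *t, int64_t now,
                          int64_t timeout)
{
    return t->inflight == 0 && demo_idle_time(t, now) > timeout;
}

/* Requests in flight but no progress: storage may be inaccessible. */
static bool demo_is_blocked(const struct demo_transfer *t, int64_t now,
                            int64_t timeout)
{
    return t->inflight > 0 && demo_idle_time(t, now) > timeout;
}
```

Keeping the inflight counter separate from the idle time is what lets a monitor distinguish an abandoned transfer (safe to terminate) from one stuck on storage.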

We don't have a way to do this with qemu-nbd, but I guess that using
qemu-storage-daemon, where we have QMP access, will make such monitoring
possible.

Nir


Re: [PATCH v2] nbd/server: Allow MULTI_CONN for shared writable exports

2022-02-16 Thread Nir Soffer
On Wed, Feb 16, 2022 at 10:08 AM Vladimir Sementsov-Ogievskiy wrote:
>
> 16.02.2022 02:24, Eric Blake wrote:
> > On Tue, Feb 15, 2022 at 09:23:36PM +0200, Nir Soffer wrote:
> >> On Tue, Feb 15, 2022 at 7:22 PM Eric Blake  wrote:
> >>
> >>> According to the NBD spec, a server advertising
> >>> NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> >>> not see any cache inconsistencies: when properly separated by a single
> >>> flush, actions performed by one client will be visible to another
> >>> client, regardless of which client did the flush.  We satisfy these
> >>> conditions in qemu when our block layer is backed by the local
> >>> filesystem (by virtue of the semantics of fdatasync(), and the fact
> >>> that qemu itself is not buffering writes beyond flushes).  It is
> >>> harder to state whether we satisfy these conditions for network-based
> >>> protocols, so the safest course of action is to allow users to opt-in
> >>> to advertising multi-conn.  We may later tweak defaults to advertise
> >>> by default when the block layer can confirm that the underlying
> >>> protocol driver is cache consistent between multiple writers, but for
> >>> now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
> >>> explicitly start qemu-nbd or qemu-storage-daemon with multi-conn
> >>> advertisement in a known-safe setup where the client end can then
> >>> benefit from parallel clients.
> >>>
> >>
> >> It makes sense, and will be used by oVirt. Actually we are already using
> >> multiple connections for writing about 2 years, based on your promise
> >> that if every client writes to district  areas this is safe.
> >
> > I presume s/district/distinct/, but yes, I'm glad we're finally trying
> > to make the code match existing practice ;)
> >
> >>> +++ b/docs/tools/qemu-nbd.rst
> >>> @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified.
> >>>   .. option:: -e, --shared=NUM
> >>>
> >>> Allow up to *NUM* clients to share the device (default
> >>> -  ``1``), 0 for unlimited. Safe for readers, but for now,
> >>> -  consistency is not guaranteed between multiple writers.
> >>> +  ``1``), 0 for unlimited.
> >>>
> >>
> >> Removing the note means that now consistency is guaranteed between
> >> multiple writers, no?
> >>
> >> Or maybe we want to mention here that consistency depends on the protocol
> >> and users can opt in, or refer to the section where this is discussed?
> >
> > Yeah, a link to the QAPI docs where multi-conn is documented might be
> > nice, except I'm not sure the best way to do that in our sphinx
> > documentation setup.
> >
> >>> +##
> >>> +# @NbdExportMultiConn:
> >>> +#
> >>> +# Possible settings for advertising NBD multiple client support.
> >>> +#
> >>> +# @off: Do not advertise multiple clients.
> >>> +#
> >>> +# @on: Allow multiple clients (for writable clients, this is only safe
> >>> +#  if the underlying BDS is cache-consistent, such as when backed
> >>> +#  by the raw file driver); ignored if the NBD server was set up
> >>> +#  with max-connections of 1.
> >>> +#
> >>> +# @auto: Behaves like @off if the export is writable, and @on if the
> >>> +#export is read-only.
> >>> +#
> >>> +# Since: 7.0
> >>> +##
> >>> +{ 'enum': 'NbdExportMultiConn',
> >>> +  'data': ['off', 'on', 'auto'] }
> >>>
> >>
> >> Are we going to have --multi-con=(on|off|auto)?
> >
> > Oh. The QMP command (which is immediately visible through
> > nbd-server-add/block-storage-add to qemu and qemu-storage-daemon)
> > gains "multi-conn":"on", but you may be right that qemu-nbd would want
> > a command line option (either that, or we accellerate our plans that
> > qsd should replace qemu-nbd).
> >
> >>> +++ b/blockdev-nbd.c
> >>> @@ -44,6 +44,11 @@ bool nbd_server_is_running(void)
> >>>   return nbd_server || is_qemu_nbd;
> >>>   }
> >>>
> >>> +int nbd_server_max_connections(void)
> >>> +{
> >>> +return nbd_server ? nbd_server->max_connections : -1;
> >>> +}
> &
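Based on the @NbdExportMultiConn documentation quoted above, the advertising decision can be sketched as follows. The enum and function are hypothetical and are not the code from the patch. Treating max_connections == 1 as "never advertise" for read-only exports too follows the commit message's note that MULTI_CONN should not be advertised when a second client cannot connect, and is an assumption:

```c
#include <stdbool.h>

enum demo_multi_conn {
    DEMO_MULTI_CONN_OFF,
    DEMO_MULTI_CONN_ON,
    DEMO_MULTI_CONN_AUTO,
};

/* max_connections: 0 means unlimited, as with qemu-nbd --shared=0. */
static bool demo_advertise_multi_conn(enum demo_multi_conn mode,
                                      bool writable, int max_connections)
{
    if (max_connections == 1) {
        return false;  /* a second client cannot connect */
    }
    switch (mode) {
    case DEMO_MULTI_CONN_ON:
        return true;   /* user asserts the BDS is cache-consistent */
    case DEMO_MULTI_CONN_AUTO:
        return !writable;
    case DEMO_MULTI_CONN_OFF:
    default:
        return false;
    }
}
```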

Re: [PATCH 2/6] virtio-scsi: don't waste CPU polling the event virtqueue

2022-04-27 Thread Nir Soffer
On Wed, Apr 27, 2022 at 5:35 PM Stefan Hajnoczi  wrote:
>
> The virtio-scsi event virtqueue is not emptied by its handler function.
> This is typical for rx virtqueues where the device uses buffers when
> some event occurs (e.g. a packet is received, an error condition
> happens, etc).
>
> Polling non-empty virtqueues wastes CPU cycles. We are not waiting for
> new buffers to become available, we are waiting for an event to occur,
> so it's a misuse of CPU resources to poll for buffers.
>
> Introduce the new virtio_queue_aio_attach_host_notifier_no_poll() API,
> which is identical to virtio_queue_aio_attach_host_notifier() except
> that it does not poll the virtqueue.
>
> Before this patch the following command-line consumed 100% CPU in the
> IOThread polling and calling virtio_scsi_handle_event():
>
>   $ qemu-system-x86_64 -M accel=kvm -m 1G -cpu host \
>   --object iothread,id=iothread0 \
>   --device virtio-scsi-pci,iothread=iothread0 \
>   --blockdev file,filename=test.img,aio=native,cache.direct=on,node-name=drive0 \
>   --device scsi-hd,drive=drive0
>
> After this patch CPU is no longer wasted.
>
> Reported-by: Nir Soffer 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/hw/virtio/virtio.h  |  1 +
>  hw/scsi/virtio-scsi-dataplane.c |  2 +-
>  hw/virtio/virtio.c  | 13 +
>  3 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index b31c4507f5..b62a35fdca 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -317,6 +317,7 @@ EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
>  void virtio_queue_set_host_notifier_enabled(VirtQueue *vq, bool enabled);
>  void virtio_queue_host_notifier_read(EventNotifier *n);
>  void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx);
> +void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx);
>  void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx);
>  VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector);
>  VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
> diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
> index 29575cbaf6..8bb6e6acfc 100644
> --- a/hw/scsi/virtio-scsi-dataplane.c
> +++ b/hw/scsi/virtio-scsi-dataplane.c
> @@ -138,7 +138,7 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
>
>      aio_context_acquire(s->ctx);
>      virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
> -    virtio_queue_aio_attach_host_notifier(vs->event_vq, s->ctx);
> +    virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
>
>      for (i = 0; i < vs->conf.num_queues; i++) {
>          virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx);
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 9d637e043e..67a873f54a 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -3534,6 +3534,19 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
>  virtio_queue_host_notifier_aio_poll_end);
>  }
>
> +/*
> + * Same as virtio_queue_aio_attach_host_notifier() but without polling. Use
> + * this for rx virtqueues and similar cases where the virtqueue handler
> + * function does not pop all elements. When the virtqueue is left non-empty
> + * polling consumes CPU cycles and should not be used.
> + */
> +void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx)
> +{
> +    aio_set_event_notifier(ctx, &vq->host_notifier, true,
> +                           virtio_queue_host_notifier_read,
> +                           NULL, NULL);
> +}
> +
>  void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx)
>  {
>      aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL, NULL);
> --
> 2.35.1
>

I tested patches 1 and 2 on top of 34723f59371f3fd02ea59b94674314b875504426
and it solved the issue.

Tested-by: Nir Soffer 

Nir
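Why polling a queue that the handler never empties burns CPU can be shown with a toy model. This is a hypothetical Python sketch, not QEMU's AioContext: `run_loop`, `event_vq_handler` and `drain_handler` are illustrative names, and it assumes a poll-mode loop simply runs the handler whenever the queue is non-empty.

```python
from collections import deque


def run_loop(vq: deque, handler, poll: bool, iterations: int,
             kicks=frozenset()) -> int:
    """Toy event loop that returns how many times the handler ran.

    poll=True roughly models attaching with polling: every iteration checks
    whether the virtqueue is non-empty and, if so, runs the handler.
    poll=False models the no-poll variant: the handler runs only when the
    host notifier fires (a "kick" in iteration i).
    """
    calls = 0
    for i in range(iterations):
        if i in kicks or (poll and vq):
            handler(vq)
            calls += 1
    return calls


def event_vq_handler(vq: deque) -> None:
    """Like the virtio-scsi event queue: buffers stay queued until an event."""


def drain_handler(vq: deque) -> None:
    """Like a command queue handler: pops every available buffer."""
    vq.clear()
```

In this model, polling the event queue runs the handler on every iteration, because its buffers are never consumed, while the no-poll variant only runs when kicked. A handler that drains its queue stops being called after the first iteration, which is why polling stays useful for the command queues.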




[PATCH 2/3] iotests: Test qemu-img checksum

2022-09-01 Thread Nir Soffer
Add simple tests that create images with all kinds of extents, different
formats, different backing chains, different protocols, and different
image options. Since all images have the same guest-visible content, they
must have the same checksum.

To help debugging in case of failures, the output includes a JSON map of
every test image.
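The premise of the test, that very different layouts yield identical guest-visible bytes, can be modelled in a few lines. This is a hypothetical sketch: `guest_view`, `image_checksum` and the extent tuples are illustrative, layers are applied in order so upper layers override lower ones, and plain SHA-256 of the expanded bytes stands in for the blkhash checksum.

```python
import hashlib

MiB = 1024 * 1024


def guest_view(size: int, extents) -> bytes:
    """Expand (offset, length, kind, fill) extents into guest-visible bytes.

    kind is "data" (bytes of value `fill`), "zero" (allocated zeroes) or
    "hole" (unmapped zeroes). Extents are applied in order, so a later
    (upper) layer overrides an earlier one; ranges no layer mentions stay
    unallocated and read as zeroes.
    """
    buf = bytearray(size)
    for offset, length, kind, fill in extents:
        if kind == "data":
            buf[offset:offset + length] = bytes([fill]) * length
        else:
            # "zero" and "hole" both read as zeroes and hide lower layers.
            buf[offset:offset + length] = bytes(length)
    return bytes(buf)


def image_checksum(size: int, extents) -> str:
    return hashlib.sha256(guest_view(size, extents)).hexdigest()


# Layout of the single raw/qcow2 test image; 8-10 MiB stays unallocated.
single = [
    (0 * MiB, 2 * MiB, "data", 0x1),
    (2 * MiB, 2 * MiB, "data", 0x0),  # data that happens to be zeroes
    (4 * MiB, 2 * MiB, "zero", 0),
    (6 * MiB, 2 * MiB, "hole", 0),
]

# The same content split across a raw base and a qcow2 top layer.
base = [(0 * MiB, 2 * MiB, "data", 0x1), (2 * MiB, 2 * MiB, "data", 0x0)]
top = [(4 * MiB, 2 * MiB, "zero", 0), (6 * MiB, 2 * MiB, "hole", 0)]
```

Here `image_checksum(10 * MiB, single)` equals `image_checksum(10 * MiB, base + top)`, mirroring the assertion the test makes with real images.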

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/tests/qemu-img-checksum| 149 ++
 .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +
 2 files changed, 223 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out

diff --git a/tests/qemu-iotests/tests/qemu-img-checksum b/tests/qemu-iotests/tests/qemu-img-checksum
new file mode 100755
index 00..3a85ba33f2
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum
@@ -0,0 +1,149 @@
+#!/usr/bin/env python3
+# group: rw auto quick
+#
+# Test cases for qemu-img checksum.
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import re
+
+import iotests
+
+from iotests import (
+    filter_testfiles,
+    qemu_img,
+    qemu_img_log,
+    qemu_io,
+    qemu_nbd_popen,
+)
+
+
+def checksum_available():
+    out = qemu_img("--help").stdout
+    return re.search(r"\bchecksum .+ filename\b", out) is not None
+
+
+if not checksum_available():
+    iotests.notrun("checksum command not available")
+
+iotests.script_initialize(
+    supported_fmts=["raw", "qcow2"],
+    supported_cache_modes=["none", "writeback"],
+    supported_protocols=["file", "nbd"],
+    required_fmts=["raw", "qcow2"],
+)
+
+print()
+print("=== Test images ===")
+print()
+
+disk_raw = iotests.file_path('raw')
+qemu_img("create", "-f", "raw", disk_raw, "10m")
+qemu_io("-f", "raw",
+        "-c", "write -P 0x1 0 2m",    # data
+        "-c", "write -P 0x0 2m 2m",   # data with zeroes
+        "-c", "write -z 4m 2m",       # zero allocated
+        "-c", "write -z -u 6m 2m",    # zero hole
+                                      # unallocated
+        disk_raw)
+print(filter_testfiles(disk_raw))
+qemu_img_log("map", "--output", "json", disk_raw)
+
+disk_qcow2 = iotests.file_path('qcow2')
+qemu_img("create", "-f", "qcow2", disk_qcow2, "10m")
+qemu_io("-f", "qcow2",
+        "-c", "write -P 0x1 0 2m",    # data
+        "-c", "write -P 0x0 2m 2m",   # data with zeroes
+        "-c", "write -z 4m 2m",       # zero allocated
+        "-c", "write -z -u 6m 2m",    # zero hole
+                                      # unallocated
+        disk_qcow2)
+print(filter_testfiles(disk_qcow2))
+qemu_img_log("map", "--output", "json", disk_qcow2)
+
+disk_compressed = iotests.file_path('compressed')
+qemu_img("convert", "-f", "qcow2", "-O", "qcow2", "-c",
+         disk_qcow2, disk_compressed)
+print(filter_testfiles(disk_compressed))
+qemu_img_log("map", "--output", "json", disk_compressed)
+
+disk_base = iotests.file_path('base')
+qemu_img("create", "-f", "raw", disk_base, "10m")
+qemu_io("-f", "raw",
+        "-c", "write -P 0x1 0 2m",
+        "-c", "write -P 0x0 2m 2m",
+        disk_base)
+print(filter_testfiles(disk_base))
+qemu_img_log("map", "--output", "json", disk_base)
+
+disk_top = iotests.file_path('top')
+qemu_img("create", "-f", "qcow2", "-b", disk_base, "-F", "raw",
+         disk_top)
+qemu_io("-f", "qcow2",
+        "-c", "write -z 4m 2m",
+        "-c", "write -z -u 6m 2m",
+        disk_top)
+print(filter_testfiles(disk_top))
+qemu_img_log("

[PATCH 3/3] qemu-img: Speed up checksum

2022-09-01 Thread Nir Soffer
Add a coroutine-based loop inspired by the `qemu-img convert` design.

Changes compared to `qemu-img convert`:

- State for the entire image is kept in ImgChecksumState.

- State for a single worker coroutine is kept in ImgChecksumWorker.

- "Writes" are always in order, ensured using a queue.

- Block status is called once per image extent, when the current extent
  has been consumed by the workers.

- Using a 1 MiB buffer size; testing shows that this gives the best read
  performance with both buffered and direct I/O.

- The number of coroutines is not configurable. Testing does not show any
  improvement when using more than 8 coroutines.

- Progress includes the entire image, not only the allocated areas.

Compared to the simple read loop, this version is up to 4.67 times faster
when computing a checksum for an image full of zeroes. For real images it
is 1.59 times faster with direct I/O, while with buffered I/O there is no
significant difference.

Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:

| image    | size | i/o      | before         | after          | change |
|----------|------|----------|----------------|----------------|--------|
| zero [1] | 6g   | buffered | 1.600s ±0.014s | 0.342s ±0.016s | x4.67  |
| zero     | 6g   | direct   | 4.684s ±0.093s | 2.211s ±0.009s | x2.12  |
| real [2] | 6g   | buffered | 1.841s ±0.075s | 1.806s ±0.036s | x1.02  |
| real     | 6g   | direct   | 3.094s ±0.079s | 1.947s ±0.017s | x1.59  |
| nbd [3]  | 6g   | buffered | 2.455s ±0.183s | 1.808s ±0.016s | x1.36  |
| nbd      | 6g   | direct   | 3.540s ±0.020s | 1.749s ±0.018s | x2.02  |

[1] raw image full of zeroes
[2] raw fedora 35 image with additional random data, 50% full
[3] image [2] exported by qemu-nbd via unix socket
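The ordered-update scheme from the list above can be illustrated with asyncio standing in for QEMU coroutines. This is a hypothetical sketch: `Update` and `ordered_checksum` are illustrative names, each chunk gets its own queue entry (the patch queues the worker structs themselves), block status is not modelled, and plain SHA-256 is used because its result depends on update order, which makes the ordering easy to verify.

```python
import asyncio
import collections
import hashlib
import random


class Update:
    """One scheduled update: a chunk of the image and whether it was read."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end
        self.data = b""
        self.ready = False


async def ordered_checksum(image: bytes, workers: int = 8,
                           chunk_size: int = 1024) -> str:
    digest = hashlib.sha256()
    # Plays the role of the CoMutex; it would only be strictly needed if the
    # critical section awaited, e.g. for a block-status query.
    lock = asyncio.Lock()
    update_queue = collections.deque()  # updates in the order chunks were grabbed
    offset = 0

    def process_ready_updates() -> None:
        # Only the head may be hashed, so the digest sees chunks in image
        # order even when reads complete out of order.
        while update_queue and update_queue[0].ready:
            digest.update(update_queue.popleft().data)

    async def worker() -> None:
        nonlocal offset
        while True:
            async with lock:
                if offset >= len(image):
                    return
                update = Update(offset, min(offset + chunk_size, len(image)))
                offset = update.end
                update_queue.append(update)  # schedule at the tail
            # Simulated read with random latency, so completions are reordered.
            await asyncio.sleep(random.random() / 1000)
            update.data = image[update.start:update.end]
            update.ready = True
            process_ready_updates()

    await asyncio.gather(*(worker() for _ in range(workers)))
    return digest.hexdigest()
```

For example, `asyncio.run(ordered_checksum(image, chunk_size=100))` returns the same value as `hashlib.sha256(image).hexdigest()` even though the simulated reads finish out of order. The last update to become ready always finds every earlier one ready as well, so the queue is fully drained when the workers finish.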

Signed-off-by: Nir Soffer 
---
 qemu-img.c | 343 +
 1 file changed, 270 insertions(+), 73 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 7edcfe4bc8..bfa8e2862f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1613,48 +1613,288 @@ out:
 qemu_vfree(buf2);
 blk_unref(blk2);
 out2:
 blk_unref(blk1);
 out3:
 qemu_progress_end();
 return ret;
 }
 
 #ifdef CONFIG_BLKHASH
+
+#define CHECKSUM_COROUTINES 8
+#define CHECKSUM_BUF_SIZE (1 * MiB)
+#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
+
+typedef struct ImgChecksumState ImgChecksumState;
+
+typedef struct ImgChecksumWorker {
+    QTAILQ_ENTRY(ImgChecksumWorker) entry;
+    ImgChecksumState *state;
+    Coroutine *co;
+    uint8_t *buf;
+
+    /* The current chunk. */
+    int64_t offset;
+    int64_t length;
+    bool zero;
+
+    /* Always true for zero extent, false for data extent. Set to true
+     * when reading the chunk completes. */
+    bool ready;
+} ImgChecksumWorker;
+
+struct ImgChecksumState {
+    const char *filename;
+    BlockBackend *blk;
+    BlockDriverState *bs;
+    int64_t total_size;
+
+    /* Current extent, modified in checksum_co_next. */
+    int64_t offset;
+    int64_t length;
+    bool zero;
+
+    int running_coroutines;
+    CoMutex lock;
+    ImgChecksumWorker workers[CHECKSUM_COROUTINES];
+
+    /* Ensure in-order updates. Updates are scheduled at the tail of the
+     * queue and processed from the head of the queue when a worker is
+     * ready. */
+    QTAILQ_HEAD(, ImgChecksumWorker) update_queue;
+
+    struct blkhash *hash;
+    int ret;
+};
+
+static int checksum_block_status(ImgChecksumState *s)
+{
+    int64_t length;
+    int status;
+
+    /* Must be called when current extent is consumed. */
+    assert(s->length == 0);
+
+    status = bdrv_block_status_above(s->bs, NULL, s->offset,
+                                     s->total_size - s->offset, &length, NULL,
+                                     NULL);
+    if (status < 0) {
+        error_report("Error checking status at offset %" PRId64 " for %s",
+                     s->offset, s->filename);
+        s->ret = status;
+        return -1;
+    }
+
+    assert(length > 0);
+
+    s->length = length;
+    s->zero = !!(status & BDRV_BLOCK_ZERO);
+
+    return 0;
+}
+
+/**
+ * Grab the next chunk from the current extent, getting the next extent if
+ * needed, and schedule the next update at the end of the update queue.
+ *
+ * Return true if the worker has work to do, false if the worker has
+ * finished or there was an error getting the next extent.
+ */
+static coroutine_fn bool checksum_co_next(ImgChecksumWorker *w)
+{
+    ImgChecksumState *s = w->state;
+
+    qemu_co_mutex_lock(&s->lock);
+
+    if (s->offset == s->total_size || s->ret != -EINPROGRESS) {
+        qemu_co_mutex_unlock(&s->lock);
+        return false;
+    }
+
+    if (s->length == 0 && checksum_block_status(s)) {
+        qemu_co_mutex_unlock(&s->lock);
+        return false;
+    }
+
+    /* Grab one chunk from current extent. */
+    w->offset = s->offset;
+    w->length = MI

[PATCH 0/3] Add qemu-img checksum command using blkhash

2022-09-01 Thread Nir Soffer
Since blkhash is currently available only via copr, the new command is added
as an optional feature, built only if the blkhash-devel package is installed.

Nir Soffer (3):
  qemu-img: Add checksum command
  iotests: Test qemu-img checksum
  qemu-img: Speed up checksum

 docs/tools/qemu-img.rst   |  22 +
 meson.build   |  10 +-
 meson_options.txt |   2 +
 qemu-img-cmds.hx  |   8 +
 qemu-img.c| 388 ++
 tests/qemu-iotests/tests/qemu-img-checksum| 149 +++
 .../qemu-iotests/tests/qemu-img-checksum.out  |  74 
 7 files changed, 652 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out

-- 
2.37.2




[PATCH 1/3] qemu-img: Add checksum command

2022-09-01 Thread Nir Soffer
The checksum command computes a checksum of disk image content using the
blkhash library[1]. The blkhash library is not packaged yet, but it is
available via copr[2].

Example run:

$ ./qemu-img checksum -p fedora-35.qcow2
6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5  fedora-35.qcow2

The block checksum is constructed by splitting the image into fixed-size
blocks and computing a digest of every block. The image checksum is the
digest of all the block digests.

Internally the checksum uses the "sha256" algorithm, but it cannot be
compared with checksums created by other tools such as `sha256sum`.

The blkhash library supports sparse images and zero detection, and it
optimizes hashing of zero blocks (which is practically free). The library
uses multiple threads to speed up the computation.
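The two-level construction described above can be sketched as follows. This is a simplified illustration, not blkhash: `block_checksum` is a hypothetical helper, the 64 KiB block size is an assumption, SHA-256 is used at both levels, and threading is not modelled, so the result will not match `qemu-img checksum` output.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed block size, not necessarily blkhash's


def block_checksum(data: bytes, block_size: int = BLOCK_SIZE) -> str:
    """Digest of per-block digests, reusing one precomputed zero digest."""
    zero_block = bytes(block_size)
    zero_digest = hashlib.sha256(zero_block).digest()
    outer = hashlib.sha256()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        if block == zero_block:
            # Zero detection: no need to hash the block's bytes again.
            outer.update(zero_digest)
        else:
            outer.update(hashlib.sha256(block).digest())
    return outer.hexdigest()
```

Because the outer digest is computed over block digests rather than raw bytes, it differs from `hashlib.sha256(data).hexdigest()`, which is why the result cannot be compared with `sha256sum`. The zero shortcut is also why large empty images are cheap to hash.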

Compared to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
faster, depending on the amount of data in the image:

$ ./qemu-img info /scratch/50p.raw
file format: raw
virtual size: 6 GiB (6442450944 bytes)
disk size: 2.91 GiB

$ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
 "sha256sum /scratch/50p.raw"
Benchmark 1: ./qemu-img checksum /scratch/50p.raw
  Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s, System: 0.962 s]
  Range (min … max):    1.813 s …  1.908 s    5 runs

Benchmark 2: sha256sum /scratch/50p.raw
  Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s, System: 1.003 s]
  Range (min … max):   14.501 s … 14.697 s    5 runs

Summary
  './qemu-img checksum /scratch/50p.raw' ran
7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'

The new command is available only when `blkhash` is available during
build. To test the new command please install the `blkhash-devel`
package:

$ dnf copr enable nsoffer/blkhash
$ sudo dnf install blkhash-devel

[1] https://gitlab.com/nirs/blkhash
[2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
[3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
sha256sum (estimate): 17,749s

Signed-off-by: Nir Soffer 
---
 docs/tools/qemu-img.rst |  22 +
 meson.build |  10 ++-
 meson_options.txt   |   2 +
 qemu-img-cmds.hx|   8 ++
 qemu-img.c  | 191 
 5 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 85a6e05b35..8be9c45cbf 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -347,20 +347,42 @@ Command description:
 Check completed, image is corrupted
   3
 Check completed, image has leaked clusters, but is not corrupted
   63
 Checks are not supported by the image format
 
   If ``-r`` is specified, exit codes representing the image state refer to the
   state after (the attempt at) repairing it. That is, a successful ``-r all``
   will yield the exit code 0, independently of the image state before.
 
+.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
+
+  Print a checksum of the guest-visible content of image *FILENAME*. Images
+  with different formats or settings will have the same checksum.
+
+  The format is probed unless you specify it by ``-f``.
+
+  The checksum is computed for guest-visible content. Allocated areas full of
+  zeroes, zero clusters, and unallocated areas are all read as zeroes, so they
+  produce the same checksum. Images made of a single file or of multiple files
+  or backing files will have the same checksum if the guest sees the same
+  content when reading the image.
+
+  Image metadata that is not visible to the guest such as dirty bitmaps does
+  not affect the checksum.
+
+  Computing a checksum requires a read-only image. You cannot compute a
+  checksum of an active image used by a guest, but you can compute a checksum
+  of a guest's disk during pull-mode incremental backup by using an NBD URL.
+
+  The checksum is not compatible with other tools such as *sha256sum*.
+
 .. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
 
   Commit the changes recorded in *FILENAME* in its base image or backing file.
   If the backing file is smaller than the snapshot, then the backing file will be
   resized to be the same size as the snapshot.  If the snapshot is smaller than
   the backing file, the backing file will not be truncated.  If you want the
   backing file to match the size of the smaller snapshot, you can safely truncate
   it yourself once the commit operation successfully completes.
 
   The image *FILENAME* is emptied after the operation has succeeded. If you do
diff --git a/meson.build b/meson.build
index 20fddbd707..56b648d8a7 100644
--- a/meson.build
+++ b/meson.build
@@ -727,20 +727,24 @@ if not get_option('curl').a

Re: [PATCH 2/3] iotests: Test qemu-img checksum

2022-10-30 Thread Nir Soffer
On Wed, Oct 26, 2022 at 4:31 PM Hanna Reitz  wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > Add simple tests creating an image with all kinds of extents, different
> > formats, different backing chain, different protocol, and different
> > image options. Since all images have the same guest visible content they
> > must have the same checksum.
> >
> > To help debugging in case of failures, the output includes a json map of
> > every test image.
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >   tests/qemu-iotests/tests/qemu-img-checksum| 149 ++
> >   .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +
> >   2 files changed, 223 insertions(+)
> >   create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
> >   create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
> >
> > diff --git a/tests/qemu-iotests/tests/qemu-img-checksum b/tests/qemu-iotests/tests/qemu-img-checksum
> > new file mode 100755
> > index 00..3a85ba33f2
> > --- /dev/null
> > +++ b/tests/qemu-iotests/tests/qemu-img-checksum
> > @@ -0,0 +1,149 @@
> > +#!/usr/bin/env python3
> > +# group: rw auto quick
> > +#
> > +# Test cases for qemu-img checksum.
> > +#
> > +# Copyright (C) 2022 Red Hat, Inc.
> > +#
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 2 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# This program is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > +
> > +import re
> > +
> > +import iotests
> > +
> > +from iotests import (
> > +    filter_testfiles,
> > +    qemu_img,
> > +    qemu_img_log,
> > +    qemu_io,
> > +    qemu_nbd_popen,
> > +)
> > +
> > +
> > +def checksum_available():
> > +    out = qemu_img("--help").stdout
> > +    return re.search(r"\bchecksum .+ filename\b", out) is not None
> > +
> > +
> > +if not checksum_available():
> > +    iotests.notrun("checksum command not available")
> > +
> > +iotests.script_initialize(
> > +    supported_fmts=["raw", "qcow2"],
> > +    supported_cache_modes=["none", "writeback"],
>
> It doesn’t work with writeback, though, because it uses -T none below.
>

Good point


>
> Which by the way is a heavy cost, because I usually run tests in tmpfs,
> where this won’t work.  Is there any way of not doing the -T none below?
>

Testing on tmpfs is problematic since you cannot test -T none there. In oVirt
we always use /var/tmp, which is usually on a file system that supports direct I/O.

Do we have a way to specify cache mode in the tests, so we can use -T none
only when the option is set?


>
> > +    supported_protocols=["file", "nbd"],
> > +    required_fmts=["raw", "qcow2"],
> > +)
> > +
> > +print()
> > +print("=== Test images ===")
> > +print()
> > +
> > +disk_raw = iotests.file_path('raw')
> > +qemu_img("create", "-f", "raw", disk_raw, "10m")
> > +qemu_io("-f", "raw",
> > +        "-c", "write -P 0x1 0 2m",    # data
> > +        "-c", "write -P 0x0 2m 2m",   # data with zeroes
> > +        "-c", "write -z 4m 2m",       # zero allocated
> > +        "-c", "write -z -u 6m 2m",    # zero hole
> > +                                      # unallocated
> > +        disk_raw)
> > +print(filter_testfiles(disk_raw))
> > +qemu_img_log("map", "--output", "json", disk_raw)
> > +
> > +disk_qcow2 = iotests.file_path('qcow2')
> > +qemu_img("create", "-f", "qcow2", disk_qcow2, "10m")
> > +qemu_io("-f", "qcow2",
> > +        "-c", "write -P 0x1 0 2m",    # data
> > +        "-c", "write -P 0x0 2m 2m",   # data with zeroes
> > +

Re: [PATCH 3/3] qemu-img: Speed up checksum

2022-10-30 Thread Nir Soffer
On Wed, Oct 26, 2022 at 4:54 PM Hanna Reitz  wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > Add coroutine based loop inspired by `qemu-img convert` design.
> >
> > Changes compared to `qemu-img convert`:
> >
> > - State for the entire image is kept in ImgChecksumState
> >
> > - State for single worker coroutine is kept in ImgChecksumworker.
> >
> > - "Writes" are always in-order, ensured using a queue.
> >
> > - Calling block status once per image extent, when the current extent is
> >consumed by the workers.
> >
> > - Using 1m buffer size - testings shows that this gives best read
> >performance both with buffered and direct I/O.
>
> Why does patch 1 then choose to use 2 MB?
>

The first patch uses sync I/O, and in this case 2 MB is a little faster.


> > - Number of coroutines is not configurable. Testing does not show
> >improvement when using more than 8 coroutines.
> >
> > - Progress include entire image, not only the allocated state.
> >
> > Comparing to the simple read loop shows that this version is up to 4.67
> > times faster when computing a checksum for an image full of zeroes. For
> > real images it is 1.59 times faster with direct I/O, and with buffered
> > I/O there is no difference.
> >
> > Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:
> >
> > | image    | size | i/o      | before         | after          | change |
> > |----------|------|----------|----------------|----------------|--------|
> > | zero [1] | 6g   | buffered | 1.600s ±0.014s | 0.342s ±0.016s | x4.67  |
> > | zero     | 6g   | direct   | 4.684s ±0.093s | 2.211s ±0.009s | x2.12  |
> > | real [2] | 6g   | buffered | 1.841s ±0.075s | 1.806s ±0.036s | x1.02  |
> > | real     | 6g   | direct   | 3.094s ±0.079s | 1.947s ±0.017s | x1.59  |
> > | nbd [3]  | 6g   | buffered | 2.455s ±0.183s | 1.808s ±0.016s | x1.36  |
> > | nbd      | 6g   | direct   | 3.540s ±0.020s | 1.749s ±0.018s | x2.02  |
> >
> > [1] raw image full of zeroes
> > [2] raw fedora 35 image with additional random data, 50% full
> > [3] image [2] exported by qemu-nbd via unix socket
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >   qemu-img.c | 343 +
> >   1 file changed, 270 insertions(+), 73 deletions(-)
>
> Looks good!
>
> Just a couple of style comments below.
>
> > diff --git a/qemu-img.c b/qemu-img.c
> > index 7edcfe4bc8..bfa8e2862f 100644
> > --- a/qemu-img.c
> > +++ b/qemu-img.c
> > @@ -1613,48 +1613,288 @@ out:
> >   qemu_vfree(buf2);
> >   blk_unref(blk2);
> >   out2:
> >   blk_unref(blk1);
> >   out3:
> >   qemu_progress_end();
> >   return ret;
> >   }
> >
> >   #ifdef CONFIG_BLKHASH
> > +
> > +#define CHECKSUM_COROUTINES 8
> > +#define CHECKSUM_BUF_SIZE (1 * MiB)
> > +#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
> > +
> > +typedef struct ImgChecksumState ImgChecksumState;
> > +
> > +typedef struct ImgChecksumWorker {
> > +    QTAILQ_ENTRY(ImgChecksumWorker) entry;
> > +    ImgChecksumState *state;
> > +    Coroutine *co;
> > +    uint8_t *buf;
> > +
> > +    /* The current chunk. */
> > +    int64_t offset;
> > +    int64_t length;
> > +    bool zero;
> > +
> > +    /* Always true for zero extent, false for data extent. Set to true
> > +     * when reading the chunk completes. */
>
> Qemu codestyle requires /* and */ to be on separate lines for multi-line
> comments (see checkpatch.pl).
>

I'll change that. Do we have a good way to run checkpatch.pl when using
git-publish?

Maybe a way to run checkpatch.pl on all patches generated by git publish
automatically?


> > +    bool ready;
> > +} ImgChecksumWorker;
> > +
> > +struct ImgChecksumState {
> > +    const char *filename;
> > +    BlockBackend *blk;
> > +    BlockDriverState *bs;
> > +    int64_t total_size;
> > +
> > +    /* Current extent, modified in checksum_co_next. */
> > +    int64_t offset;
> > +    int64_t length;
> > +    bool zero;
> > +
> > +    int running_coroutines;
> > +    CoMutex lock;
> > +    ImgChecksumWorker workers[CHECKSUM_COROUTINES];
> > +
> > +    /* Ensure in-order updates. Updates are scheduled at the tail of the
> > +     * queue and processed from the head of the queue when a worker is
> > +     * ready. */
>
> Qemu codestyle requires /* and */ to be on separat

Re: [PATCH 1/3] qemu-img: Add checksum command

2022-10-30 Thread Nir Soffer
On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz  wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > The checksum command compute a checksum for disk image content using the
> > blkhash library[1]. The blkhash library is not packaged yet, but it is
> > available via copr[2].
> >
> > Example run:
> >
> >  $ ./qemu-img checksum -p fedora-35.qcow2
> >  6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5  fedora-35.qcow2
> >
> > The block checksum is constructed by splitting the image to fixed sized
> > blocks and computing a digest of every block. The image checksum is the
> > digest of the all block digests.
> >
> > The checksum uses internally the "sha256" algorithm but it cannot be
> > compared with checksums created by other tools such as `sha256sum`.
> >
> > The blkhash library supports sparse images, zero detection, and
> > optimizes zero block hashing (they are practically free). The library
> > uses multiple threads to speed up the computation.
> >
> > Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
> > faster, depending on the amount of data in the image:
> >
> >  $ ./qemu-img info /scratch/50p.raw
> >  file format: raw
> >  virtual size: 6 GiB (6442450944 bytes)
> >  disk size: 2.91 GiB
> >
> >  $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
> >   "sha256sum /scratch/50p.raw"
> >  Benchmark 1: ./qemu-img checksum /scratch/50p.raw
> >    Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s, System: 0.962 s]
> >    Range (min … max):    1.813 s …  1.908 s    5 runs
> >
> >  Benchmark 2: sha256sum /scratch/50p.raw
> >    Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s, System: 1.003 s]
> >    Range (min … max):   14.501 s … 14.697 s    5 runs
> >
> >  Summary
> >'./qemu-img checksum /scratch/50p.raw' ran
> >  7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'
> >
> > The new command is available only when `blkhash` is available during
> > build. To test the new command please install the `blkhash-devel`
> > package:
> >
> >  $ dnf copr enable nsoffer/blkhash
> >  $ sudo dnf install blkhash-devel
> >
> > [1] https://gitlab.com/nirs/blkhash
> > [2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
> > [3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
> >  sha256sum (estimate): 17,749s
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >   docs/tools/qemu-img.rst |  22 +
> >   meson.build |  10 ++-
> >   meson_options.txt   |   2 +
> >   qemu-img-cmds.hx|   8 ++
> >   qemu-img.c  | 191 
> >   5 files changed, 232 insertions(+), 1 deletion(-)
> >
> > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> > index 85a6e05b35..8be9c45cbf 100644
> > --- a/docs/tools/qemu-img.rst
> > +++ b/docs/tools/qemu-img.rst
> > @@ -347,20 +347,42 @@ Command description:
> >   Check completed, image is corrupted
> > 3
> >   Check completed, image has leaked clusters, but is not corrupted
> > 63
> >   Checks are not supported by the image format
> >
> > If ``-r`` is specified, exit codes representing the image state refer to the
> > state after (the attempt at) repairing it. That is, a successful ``-r all``
> > will yield the exit code 0, independently of the image state before.
> >
> > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
> > +
> > +  Print a checksum for image *FILENAME* guest visible content.
>
> Why not say which kind of checksum it is?
>

Do you mean the algorithm used? This may be confusing, for example we write

   Print a sha256 checksum ...

Users will expect to get the same result from "sha256sum disk.img". How about

   Print a blkhash checksum ...

And add a link to the blkhash project?


>
> > Images with
> > +  different format or settings wil have the same checksum.
>
> s/wil/will/
>

Fixing


>
> > +
> > +  The format is probed unless you specify it by ``-f``.
> > +
> > +  The checksum is computed for guest visible content. Allocated areas full of
> > +  zeroes, zero clusters, and unallocated

Re: [PATCH 3/3] qemu-img: Speed up checksum

2022-10-30 Thread Nir Soffer
On Sun, Oct 30, 2022 at 7:38 PM Nir Soffer  wrote:

> On Wed, Oct 26, 2022 at 4:54 PM Hanna Reitz  wrote:
>
>> On 01.09.22 16:32, Nir Soffer wrote:
>>
> [...]

>> > +    /* The current chunk. */
>> > +    int64_t offset;
>> > +    int64_t length;
>> > +    bool zero;
>> > +
>> > +    /* Always true for zero extent, false for data extent. Set to true
>> > +     * when reading the chunk completes. */
>>
>> Qemu codestyle requires /* and */ to be on separate lines for multi-line
>> comments (see checkpatch.pl).
>>
>
> I'll change that. Do we have a good way to run checkpatch.pl when using
> git-publish?
>
> Maybe a way to run checkpatch.pl on all patches generated by git publish
> automatically?
>

I found
https://blog.vmsplice.net/2011/03/how-to-automatically-run-checkpatchpl.html
and it seems to work well.


Re: [PATCH v2 2/5] Support format or cache specific out file

2022-12-13 Thread Nir Soffer
On Mon, Dec 12, 2022 at 12:38 PM Hanna Reitz  wrote:
>
> On 28.11.22 15:15, Nir Soffer wrote:
> > Extend the test finder to find tests with format (*.out.qcow2) or cache
> > specific (*.out.nocache) out file. This worked before only for the
> > numbered tests.
> > ---
> >   tests/qemu-iotests/findtests.py | 10 --
> >   1 file changed, 8 insertions(+), 2 deletions(-)
>
> This patch lacks an S-o-b, too.
>
> > diff --git a/tests/qemu-iotests/findtests.py b/tests/qemu-iotests/findtests.py
> > index dd77b453b8..f4344ce78c 100644
> > --- a/tests/qemu-iotests/findtests.py
> > +++ b/tests/qemu-iotests/findtests.py
> > @@ -38,31 +38,37 @@ def chdir(path: Optional[str] = None) -> Iterator[None]:
> >   os.chdir(saved_dir)
> >
> >
> >   class TestFinder:
> >   def __init__(self, test_dir: Optional[str] = None) -> None:
> >   self.groups = defaultdict(set)
> >
> >   with chdir(test_dir):
> >   self.all_tests = glob.glob('[0-9][0-9][0-9]')
> >   self.all_tests += [f for f in glob.iglob('tests/*')
> > -   if not f.endswith('.out') and
> > -   os.path.isfile(f + '.out')]
> > +   if self.is_test(f)]
>
> So previously a file was only considered a test file if there was a
> corresponding reference output file (`f + '.out'`), so files without
> such a reference output aren’t considered test files...
>
> >   for t in self.all_tests:
> >   with open(t, encoding="utf-8") as f:
> >   for line in f:
> >   if line.startswith('# group: '):
> >   for g in line.split()[2:]:
> >   self.groups[g].add(t)
> >   break
> >
> > +def is_test(self, fname: str) -> bool:
> > +"""
> > +The tests directory contains tests (no extension) and out files
> > +(*.out, *.out.{format}, *.out.{option}).
> > +"""
> > +return re.search(r'.+\.out(\.\w+)?$', fname) is None
>
> ...but this new function doesn’t check that.  I think we should check it
> (just whether there’s any variant of `/{fname}\.out(\.\w+)?/` to go with
> `fname`) so that behavior isn’t changed.

This means that you cannot add a test without a *.out* file, which may
be useful when you don't use the out file for validation, but we can
add this later if needed.

I'll change the code to check both conditions.
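A minimal sketch of what checking both conditions could look like, written as a standalone function instead of a `TestFinder` method and assuming the out-file naming from the patch docstring (`{test}.out` or `{test}.out.{suffix}`). This is an illustration, not the code that was eventually posted:

```python
import glob
import os
import re

# Matches reference output files: foo.out, foo.out.qcow2, foo.out.nocache, ...
OUT_FILE_RE = re.compile(r'.+\.out(\.\w+)?$')


def is_test(fname: str) -> bool:
    """Return True if fname is not an out file itself and at least one
    matching out file (fname.out or fname.out.<suffix>) exists."""
    if OUT_FILE_RE.search(fname):
        return False
    # glob finds candidates; fullmatch keeps only fname.out and
    # fname.out.<word>, so 'foo' does not pick up 'foo.output'.
    pattern = re.escape(fname) + r'\.out(\.\w+)?'
    for candidate in glob.glob(glob.escape(fname) + '.out*'):
        if re.fullmatch(pattern, candidate) and os.path.isfile(candidate):
            return True
    return False
```

With this, a file is considered a test only if it has a reference output, matching the previous behavior, while also accepting format- or cache-specific out files.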




Re: [PATCH v2 2/5] Support format or cache specific out file

2022-12-13 Thread Nir Soffer
On Tue, Dec 13, 2022 at 8:09 PM Hanna Reitz  wrote:
>
> On 13.12.22 16:56, Nir Soffer wrote:
> > On Mon, Dec 12, 2022 at 12:38 PM Hanna Reitz  wrote:
> >> On 28.11.22 15:15, Nir Soffer wrote:
> >>> Extend the test finder to find tests with format (*.out.qcow2) or cache
> >>> specific (*.out.nocache) out file. This worked before only for the
> >>> numbered tests.
> >>> ---
> >>>tests/qemu-iotests/findtests.py | 10 --
> >>>1 file changed, 8 insertions(+), 2 deletions(-)
> >> This patch lacks an S-o-b, too.
> >>
> >>> diff --git a/tests/qemu-iotests/findtests.py b/tests/qemu-iotests/findtests.py
> >>> index dd77b453b8..f4344ce78c 100644
> >>> --- a/tests/qemu-iotests/findtests.py
> >>> +++ b/tests/qemu-iotests/findtests.py
> >>> @@ -38,31 +38,37 @@ def chdir(path: Optional[str] = None) -> Iterator[None]:
> >>>          os.chdir(saved_dir)
> >>>
> >>>
> >>>  class TestFinder:
> >>>      def __init__(self, test_dir: Optional[str] = None) -> None:
> >>>          self.groups = defaultdict(set)
> >>>
> >>>          with chdir(test_dir):
> >>>              self.all_tests = glob.glob('[0-9][0-9][0-9]')
> >>>              self.all_tests += [f for f in glob.iglob('tests/*')
> >>> -                               if not f.endswith('.out') and
> >>> -                               os.path.isfile(f + '.out')]
> >>> +                               if self.is_test(f)]
> >> So previously a file was only considered a test file if there was a
> >> corresponding reference output file (`f + '.out'`), so files without
> >> such a reference output aren’t considered test files...
> >>
> >>>              for t in self.all_tests:
> >>>                  with open(t, encoding="utf-8") as f:
> >>>                      for line in f:
> >>>                          if line.startswith('# group: '):
> >>>                              for g in line.split()[2:]:
> >>>                                  self.groups[g].add(t)
> >>>                              break
> >>>
> >>> +    def is_test(self, fname: str) -> bool:
> >>> +        """
> >>> +        The tests directory contains tests (no extension) and out files
> >>> +        (*.out, *.out.{format}, *.out.{option}).
> >>> +        """
> >>> +        return re.search(r'.+\.out(\.\w+)?$', fname) is None
> >> ...but this new function doesn’t check that.  I think we should check it
> >> (just whether there’s any variant of `/{fname}\.out(\.\w+)?/` to go with
> >> `fname`) so that behavior isn’t changed.
> > This means that you cannot add a test without a *.out* file. Allowing that
> > may be useful when a test does not use the out file for validation, but we
> > can add it later if needed.
>
> I don’t think tests work without a reference output, do they?  At least
> a couple of years ago, the ./check script would refuse to run tests
> without a corresponding .out file.

This may be true, but most tests do not really need an out file and would
be better verified by assertions. Some Python tests have a pointless out
file containing the output of Python unittest:

$ cat tests/qemu-iotests/tests/nbd-multiconn.out
...
--
Ran 3 tests

OK

This is not only unhelpful (you have to update the output when adding a 4th
test) but also fragile: if unittest changes its output, maybe by adding info
about skipped tests, or by changing "---" to "", the test will break.

But for now I agree the test framework should keep the current behavior.

Nir
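To illustrate the assertion-based alternative, a sketch of a hypothetical runner-side check that judges a Python test by unittest's result object instead of diffing its printed summary against an out file. This is not how ./check works today; `ExampleTests` is a made-up stand-in for a test such as nbd-multiconn:

```python
import io
import unittest


class ExampleTests(unittest.TestCase):
    """Hypothetical stand-in for a Python iotest with three test cases."""

    def test_first(self):
        self.assertEqual(sum([1, 2]), 3)

    def test_second(self):
        self.assertIn("zero", "hole,zero")

    def test_third(self):
        self.assertIn(4096, [512, 4096])


def run_tests(case: type) -> unittest.TestResult:
    """Run a TestCase class and return its result object.

    The text summary ("Ran 3 tests", "OK") goes to a throwaway buffer,
    so a change in unittest's output format cannot affect the verdict.
    """
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


if __name__ == "__main__":
    result = run_tests(ExampleTests)
    print("passed" if result.wasSuccessful() else "failed")
```

Adding a 4th test here changes `testsRun` but not the pass/fail verdict, so there is no reference output to update.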




Re: [PATCH v2] Consider discard option when writing zeros

2024-06-24 Thread Nir Soffer
On Mon, Jun 24, 2024 at 7:08 PM Kevin Wolf  wrote:

> Am 24.06.2024 um 17:23 hat Stefan Hajnoczi geschrieben:
> > On Wed, Jun 19, 2024 at 08:43:25PM +0300, Nir Soffer wrote:
> > > Tested using:
> >
> > Hi Nir,
> > This looks like a good candidate for the qemu-iotests test suite. Adding
> > it to the automated tests will protect against future regressions.
> >
> > Please add the script and the expected output to
> > tests/qemu-iotests/tests/write-zeroes-unmap and run it using
> > `(cd build && tests/qemu-iotests/check write-zeroes-unmap)`.
> >
> > See the existing test cases in tests/qemu-iotests/ and
> > tests/qemu-iotests/tests/ for examples. Some are shell scripts and
> > others are Python. I think shell makes sense for this test case. You
> > can copy the test framework boilerplate from an existing test case.
>
> 'du' can't be used like this in qemu-iotests because it makes
> assumptions that depend on the filesystem. A test case replicating what
> Nir did manually would likely fail on XFS with its preallocation.
>

This is why I did not try to add a new qemu-iotest yet.


> Maybe we could operate on a file exposed by the FUSE export that is
> backed by qcow2, and then you can use 'qemu-img map' on that qcow2 image
> to verify the allocation status. Somewhat complicated, but I think it
> could work.
>

Do we have examples of using the FUSE export? It sounds complicated but
being able to test on any file system is awesome. The complexity can be
hidden behind simple test helpers.

Another option is to use a specific file system created for the tests, for
example on a loop device. We used userstorage[1] in oVirt to test on
specific file systems with known sector size.

But more important, are you ok with the change?

I'm not sure about not creating sparse images by default - this is not
consistent with qemu-img convert and qemu-nbd, which do sparsify by
default. The old behavior seems better.

[1] https://github.com/nirs/userstorage

Nir


Re: [PATCH v2] Consider discard option when writing zeros

2024-06-26 Thread Nir Soffer
On Wed, Jun 26, 2024 at 11:42 AM Kevin Wolf  wrote:

> Am 24.06.2024 um 23:12 hat Nir Soffer geschrieben:
> > On Mon, Jun 24, 2024 at 7:08 PM Kevin Wolf  wrote:
> >
> > > Am 24.06.2024 um 17:23 hat Stefan Hajnoczi geschrieben:
> > > > On Wed, Jun 19, 2024 at 08:43:25PM +0300, Nir Soffer wrote:
> > > > > Tested using:
> > > >
> > > > Hi Nir,
> > > > This looks like a good candidate for the qemu-iotests test suite. Adding
> > > > it to the automated tests will protect against future regressions.
> > > >
> > > > Please add the script and the expected output to
> > > > tests/qemu-iotests/tests/write-zeroes-unmap and run it using
> > > > `(cd build && tests/qemu-iotests/check write-zeroes-unmap)`.
> > > >
> > > > See the existing test cases in tests/qemu-iotests/ and
> > > > tests/qemu-iotests/tests/ for examples. Some are shell scripts and
> > > > others are Python. I think shell makes sense for this test case. You
> > > > can copy the test framework boilerplate from an existing test case.
> > >
> > > 'du' can't be used like this in qemu-iotests because it makes
> > > assumptions that depend on the filesystem. A test case replicating what
> > > Nir did manually would likely fail on XFS with its preallocation.
> >
> > This is why I did not try to add a new qemu-iotest yet.
> >
> > > Maybe we could operate on a file exposed by the FUSE export that is
> > > backed by qcow2, and then you can use 'qemu-img map' on that qcow2 image
> > > to verify the allocation status. Somewhat complicated, but I think it
> > > could work.
> >
> > Do we have examples of using the FUSE export? It sounds complicated but
> > being able to test on any file system is awesome. The complexity can be
> > hidden behind simple test helpers.
>
> We seem to have a few tests that use it, and then the fuse protocol
> implementation, too. 308 and file-io-error look relevant.
>
> > Another option is to use a specific file system created for the tests,
> > for example on a loop device. We used userstorage[1] in ovirt to test
> > on specific file systems with known sector size.
>
> Creating loop devices requires root privileges. If I understand
> correctly, userstorage solved that by having a setup phase as root
> before running the tests as a normal user? We don't really have that in
> qemu-iotests.
>
> Some tests require passwordless sudo and are skipped otherwise, but this
> means that in practice they are almost always skipped.
>

Yes, the assumption is that the storage is created before running the
tests, for example when setting up a development or CI environment, so the
tests can run as an unprivileged user.

> > But more important, are you ok with the change?
> >
> > I'm not sure about not creating sparse images by default - this is not
> > consistent with qemu-img convert and qemu-nbd, which do sparsify by
> > default. The old behavior seems better.
>
> Well, your patches make it do what we always claimed it would do, so
> that consistency is certainly a good thing. Unmapping on write_zeroes
> and ignoring truncate is a weird combination anyway that doesn't really
> make any sense to me, so I don't think it's worth preserving. The other
> way around could have been more defensible, but that's not how our bug
> works.
>
> Now, whether ignoring all discard requests is a good default these days is
> a separate question, and I'm not really sure. Maybe discard=unmap should
> be the default (and apply to both discard and write_zeroes, of course).
>

OK, let's limit the scope to fixing the code to match the current docs. We
can tweak the defaults later.


Re: [PATCH v2] Consider discard option when writing zeros

2024-06-26 Thread Nir Soffer
On Wed, Jun 26, 2024 at 12:17 PM Daniel P. Berrangé wrote:

> On Mon, Jun 24, 2024 at 06:08:26PM +0200, Kevin Wolf wrote:
> > Am 24.06.2024 um 17:23 hat Stefan Hajnoczi geschrieben:
> > > On Wed, Jun 19, 2024 at 08:43:25PM +0300, Nir Soffer wrote:
> > > > Tested using:
> > >
> > > Hi Nir,
> > > This looks like a good candidate for the qemu-iotests test suite. Adding
> > > it to the automated tests will protect against future regressions.
> > >
> > > Please add the script and the expected output to
> > > tests/qemu-iotests/tests/write-zeroes-unmap and run it using
> > > `(cd build && tests/qemu-iotests/check write-zeroes-unmap)`.
> > >
> > > See the existing test cases in tests/qemu-iotests/ and
> > > tests/qemu-iotests/tests/ for examples. Some are shell scripts and
> > > others are Python. I think shell makes sense for this test case. You
> > > can copy the test framework boilerplate from an existing test case.
> >
> > 'du' can't be used like this in qemu-iotests because it makes
> > assumptions that depend on the filesystem. A test case replicating what
> > Nir did manually would likely fail on XFS with its preallocation.
> >
> > Maybe we could operate on a file exposed by the FUSE export that is
> > backed by qcow2, and then you can use 'qemu-img map' on that qcow2 image
> > to verify the allocation status. Somewhat complicated, but I think it
> > could work.
>
> A simpler option would be to use 'du' but with a fuzzy range test,
> rather than an exact equality test.
>
> For the tests which write 1 MB, check the 'du' usage is "at least 1MB",
> for the tests which expect to unmap blocks, check that the 'du' usage
> is "less than 256kb". This should be within bounds of xfs speculative
> allocation.
>

This should work, I'll start with this approach.
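A sketch of the fuzzy range check described above, in Python rather than the bash the iotest uses, assuming a Linux host where `st_blocks` is counted in 512-byte units (the same figure `du` reports). The helper names are hypothetical, and the 1 MiB / 256 KiB bounds are the ones from Daniel's suggestion:

```python
import os

KiB = 1024
MiB = 1024 * KiB


def allocated_bytes(path: str) -> int:
    """Allocated size in bytes, as du reports it.

    On Linux st_blocks is in 512-byte units, independent of the
    file system block size.
    """
    return os.stat(path).st_blocks * 512


def looks_allocated(allocated: int, size: int = 1 * MiB) -> bool:
    """Expect at least `size` bytes; extra (speculative) preallocation,
    as XFS may do, only adds blocks and is tolerated."""
    return allocated >= size


def looks_unmapped(allocated: int, limit: int = 256 * KiB) -> bool:
    """Expect most of the image to be deallocated, with some slack
    instead of requiring exactly 0 bytes."""
    return allocated < limit


def check_image(path: str, expect_unmapped: bool) -> bool:
    allocated = allocated_bytes(path)
    if expect_unmapped:
        return looks_unmapped(allocated)
    return looks_allocated(allocated)
```

Comparing against a range rather than an exact `du -sh` string keeps the expected output stable across file systems that allocate slightly differently.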


Re: [PATCH v2] Consider discard option when writing zeros

2024-06-28 Thread Nir Soffer
On Thu, Jun 27, 2024 at 2:42 PM Kevin Wolf  wrote:

> Am 26.06.2024 um 18:27 hat Nir Soffer geschrieben:
> > On Wed, Jun 26, 2024 at 12:17 PM Daniel P. Berrangé wrote:
> >
> > > On Mon, Jun 24, 2024 at 06:08:26PM +0200, Kevin Wolf wrote:
> > > > Am 24.06.2024 um 17:23 hat Stefan Hajnoczi geschrieben:
> > > > > On Wed, Jun 19, 2024 at 08:43:25PM +0300, Nir Soffer wrote:
> > > > > > Tested using:
> > > > >
> > > > > Hi Nir,
> > > > > This looks like a good candidate for the qemu-iotests test suite. Adding
> > > > > it to the automated tests will protect against future regressions.
> > > > >
> > > > > Please add the script and the expected output to
> > > > > tests/qemu-iotests/tests/write-zeroes-unmap and run it using
> > > > > `(cd build && tests/qemu-iotests/check write-zeroes-unmap)`.
> > > > >
> > > > > See the existing test cases in tests/qemu-iotests/ and
> > > > > tests/qemu-iotests/tests/ for examples. Some are shell scripts and
> > > > > others are Python. I think shell makes sense for this test case. You
> > > > > can copy the test framework boilerplate from an existing test case.
> > > >
> > > > 'du' can't be used like this in qemu-iotests because it makes
> > > > assumptions that depend on the filesystem. A test case replicating what
> > > > Nir did manually would likely fail on XFS with its preallocation.
> > > >
> > > > Maybe we could operate on a file exposed by the FUSE export that is
> > > > backed by qcow2, and then you can use 'qemu-img map' on that qcow2 image
> > > > to verify the allocation status. Somewhat complicated, but I think it
> > > > could work.
> > >
> > > A simpler option would be to use 'du' but with a fuzzy range test,
> > > rather than an exact equality test.
> > >
> > > For the tests which write 1 MB, check the 'du' usage is "at least 1MB",
> > > for the tests which expect to unmap blocks, check that the 'du' usage
> > > is "less than 256kb". This should be within bounds of xfs speculative
> > > allocation.
> >
> > This should work, I'll start with this approach.
>
> If we're okay with accepting tests that depend on filesystem behaviour,
> then 'qemu-img map -f raw --output=json' would be a less risky
> approach than checking 'du'.
>

Unfortunately it does not work, since qemu-img map and qemu-nbd report the
allocated area as a zero area with no data.

I tried this:

$ cat test-print-allocation.sh
#!/bin/sh

qemu=${1:?Usage: $0 qemu-executable}
img=/tmp/qemu-test-unmap.img

echo
echo "discard=unmap - write zeroes"
fallocate -l 1m $img
echo -e 'qemu-io none0 "write -z 0 1m"\nquit' | $qemu -monitor stdio \
-drive if=none,file=$img,format=raw,discard=unmap >/dev/null

echo "du:"
du -sh $img
echo "qemu-img map:"
qemu-img map -f raw --output json $img
echo "nbdinfo --map:"
nbdinfo --map -- [ qemu-nbd -r -f raw $img ]

echo
echo "discard=unmap - write zeroes unmap"
fallocate -l 1m $img
echo -e 'qemu-io none0 "write -zu 0 1m"\nquit' | $qemu -monitor stdio \
-drive if=none,file=$img,format=raw,discard=unmap >/dev/null

echo "du:"
du -sh $img
echo "qemu-img map:"
qemu-img map -f raw --output json $img
echo "nbdinfo --map:"
nbdinfo --map -- [ qemu-nbd -r -f raw $img ]

rm $img


$ ./test-print-allocation.sh ./qemu-system-x86_64

discard=unmap - write zeroes
du:
1.0M /tmp/qemu-test-unmap.img
qemu-img map:
[{ "start": 0, "length": 1048576, "depth": 0, "present": true, "zero": true, "data": false, "offset": 0}]
nbdinfo --map:
         0     1048576    3  hole,zero

discard=unmap - write zeroes unmap
du:
0 /tmp/qemu-test-unmap.img
qemu-img map:
[{ "start": 0, "length": 1048576, "depth": 0, "present": true, "zero": true, "data": false, "offset": 0}]
nbdinfo --map:
         0     1048576    3  hole,zero

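A small Python sketch that makes the problem concrete by parsing the two `qemu-img map` results printed above (copied verbatim); `data_bytes` is a hypothetical helper:

```python
import json

# The two "qemu-img map -f raw --output json" results from the run above:
# after "write -z" (du: 1.0M) and after "write -zu" (du: 0).
MAP_WRITE_ZEROES = (
    '[{ "start": 0, "length": 1048576, "depth": 0, "present": true, '
    '"zero": true, "data": false, "offset": 0}]'
)
MAP_WRITE_ZEROES_UNMAP = (
    '[{ "start": 0, "length": 1048576, "depth": 0, "present": true, '
    '"zero": true, "data": false, "offset": 0}]'
)


def data_bytes(map_json: str) -> int:
    """Sum the lengths of the extents reported with "data": true."""
    return sum(extent["length"] for extent in json.loads(map_json)
               if extent["data"])


# Same extents and no data in both cases, so a check built on
# qemu-img map cannot tell the allocated image from the sparse one.
same = json.loads(MAP_WRITE_ZEROES) == json.loads(MAP_WRITE_ZEROES_UNMAP)
print(same, data_bytes(MAP_WRITE_ZEROES), data_bytes(MAP_WRITE_ZEROES_UNMAP))
```

Only `du` (st_blocks) distinguishes the two images here.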

[PATCH v3 1/2] qemu-iotest/245: Add missing discard=unmap

2024-06-28 Thread Nir Soffer
The test only works because we punch holes by default, even when the
image is opened without discard=on or discard=unmap. Fix the test to
enable discard.
---
 tests/qemu-iotests/245 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index a934c9d1e6..f96610f510 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -590,11 +590,11 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 
     # Insert (and remove) a compress filter
     @iotests.skip_if_unsupported(['compress'])
     def test_insert_compress_filter(self):
         # Add an image to the VM: hd (raw) -> hd0 (qcow2) -> hd0-file (file)
-        opts = {'driver': 'raw', 'node-name': 'hd', 'file': hd_opts(0)}
+        opts = {'driver': 'raw', 'node-name': 'hd', 'file': hd_opts(0), 'discard': 'unmap'}
         self.vm.cmd('blockdev-add', conv_keys = False, **opts)
 
         # Add a 'compress' filter
         filter_opts = {'driver': 'compress',
                        'node-name': 'compress0',
-- 
2.45.2




[PATCH v3 2/2] Consider discard option when writing zeros

2024-06-28 Thread Nir Soffer
When opening an image with discard=off, we punch holes in the image when
writing zeroes, making the image sparse. This breaks users that want to
ensure that writes cannot fail with ENOSPC by using fully allocated
images[1].

bdrv_co_pwrite_zeroes() correctly disables BDRV_REQ_MAY_UNMAP if we
opened the child without discard=unmap or discard=on. But we don't go
through this function when accessing the top node. Move the check down
to bdrv_co_do_pwrite_zeroes() which seems to be used in all code paths.

This change implements the documented behavior, punching holes only when
opening the image with discard=on or discard=unmap. This may not be the
best default, but we can improve it later.

The test depends on a file system that supports discard and deallocates the
entire file when punching a hole with the length of the entire file.
Tested with xfs, ext4, and tmpfs.

[1] https://lists.nongnu.org/archive/html/qemu-discuss/2024-06/msg3.html

Signed-off-by: Nir Soffer 
---
 block/io.c|   9 +-
 tests/qemu-iotests/tests/write-zeroes-unmap   | 127 ++
 .../qemu-iotests/tests/write-zeroes-unmap.out |  81 +++
 3 files changed, 213 insertions(+), 4 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/write-zeroes-unmap
 create mode 100644 tests/qemu-iotests/tests/write-zeroes-unmap.out

diff --git a/block/io.c b/block/io.c
index 7217cf811b..301514c880 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1860,10 +1860,15 @@ bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int64_t bytes,
     /* By definition there is no user buffer so this flag doesn't make sense */
     if (flags & BDRV_REQ_REGISTERED_BUF) {
         return -EINVAL;
     }
 
+    /* If opened with discard=off we should never unmap. */
+    if (!(bs->open_flags & BDRV_O_UNMAP)) {
+        flags &= ~BDRV_REQ_MAY_UNMAP;
+    }
+
     /* Invalidate the cached block-status data range if this write overlaps */
     bdrv_bsc_invalidate_range(bs, offset, bytes);
 
     assert(alignment % bs->bl.request_alignment == 0);
     head = offset % alignment;
@@ -2313,14 +2318,10 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
 {
     IO_CODE();
     trace_bdrv_co_pwrite_zeroes(child->bs, offset, bytes, flags);
     assert_bdrv_graph_readable();
 
-    if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
-        flags &= ~BDRV_REQ_MAY_UNMAP;
-    }
-
     return bdrv_co_pwritev(child, offset, bytes, NULL,
                            BDRV_REQ_ZERO_WRITE | flags);
 }
 
 /*
diff --git a/tests/qemu-iotests/tests/write-zeroes-unmap b/tests/qemu-iotests/tests/write-zeroes-unmap
new file mode 100755
index 00..7cfeeaf839
--- /dev/null
+++ b/tests/qemu-iotests/tests/write-zeroes-unmap
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+# group: quick
+#
+# Test write zeros unmap.
+#
+# Copyright (C) Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+trap _cleanup_test_img exit
+
+# get standard environment, filters and checks
+cd ..
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+create_test_image() {
+    _make_test_img -f $IMGFMT 1m
+}
+
+filter_command() {
+    _filter_testdir | _filter_qemu_io | _filter_qemu | _filter_hmp
+}
+
+print_disk_usage() {
+    du -sh $TEST_IMG | _filter_testdir
+}
+
+echo
+echo "=== defaults - write zeros ==="
+echo
+
+create_test_image
+echo -e 'qemu-io none0 "write -z 0 1m"\nquit' \
+    | $QEMU -monitor stdio -drive if=none,file=$TEST_IMG,format=$IMGFMT \
+    | filter_command
+print_disk_usage
+
+echo
+echo "=== defaults - write zeros unmap ==="
+echo
+
+create_test_image
+echo -e 'qemu-io none0 "write -zu 0 1m"\nquit' \
+    | $QEMU -monitor stdio -drive if=none,file=$TEST_IMG,format=$IMGFMT \
+    | filter_command
+print_disk_usage
+
+
+echo
+echo "=== defaults - write actual zeros ==="
+echo
+
+create_test_image
+echo -e 'qemu-io none0 "write -P 0 0 1m"\nquit' \
+    | $QEMU -monitor stdio -drive if=none,file=$TEST_IMG,format=$IMGFMT \
+    | filter_command
+print_disk_usage
+
+echo
+echo "=== discard=off - write zeroes unmap ==="
+echo
+
+create_test_image
+echo 

[PATCH v3 0/2] Consider discard option when writing zeros

2024-06-28 Thread Nir Soffer
Punch holes only when the image is opened with discard=on or discard=unmap.

Tested by:
- new write-zeroes-unmap iotest on xfs, ext4, and tmpfs
- tests/qemu-iotests/check -raw
- tests/qemu-iotests/check -qcow2

Changes since v2
- Add write-zeroes-unmap iotest
- Fix iotest missing discard=unmap

v2 was here:
https://lists.nongnu.org/archive/html/qemu-block/2024-06/msg00231.html

Nir Soffer (2):
  qemu-iotest/245: Add missing discard=unmap
  Consider discard option when writing zeros

 block/io.c|   9 +-
 tests/qemu-iotests/245|   2 +-
 tests/qemu-iotests/tests/write-zeroes-unmap   | 127 ++
 .../qemu-iotests/tests/write-zeroes-unmap.out |  81 +++
 4 files changed, 214 insertions(+), 5 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/write-zeroes-unmap
 create mode 100644 tests/qemu-iotests/tests/write-zeroes-unmap.out

-- 
2.45.2




Re: [PATCH 3/4] iotests: Change imports for Python 3.13

2024-07-02 Thread Nir Soffer
On Thu, Jun 27, 2024 at 2:23 AM John Snow  wrote:
>
> Python 3.13 isn't out yet, but it's in beta and Fedora is ramping up to
> make it the default system interpreter for Fedora 41.
>
> They moved our cheese for where ContextManager lives; add a conditional
> to locate it while we support both pre-3.9 and 3.13+.
>
> Signed-off-by: John Snow 
> ---
>  tests/qemu-iotests/testenv.py| 7 ++-
>  tests/qemu-iotests/testrunner.py | 9 ++---
>  2 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
> index 588f30a4f14..96d69e56963 100644
> --- a/tests/qemu-iotests/testenv.py
> +++ b/tests/qemu-iotests/testenv.py
> @@ -25,7 +25,12 @@
>  import random
>  import subprocess
>  import glob
> -from typing import List, Dict, Any, Optional, ContextManager
> +from typing import List, Dict, Any, Optional
> +
> +if sys.version_info >= (3, 9):
> +    from contextlib import AbstractContextManager as ContextManager
> +else:
> +    from typing import ContextManager

It would be cleaner to add a compat module hiding the details, so the
entire project has a single instance of this. Other code would then just use:

from compat import ContextManager

>
>  DEF_GDB_OPTIONS = 'localhost:12345'
>
> diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
> index 7b322272e92..2e236c8fa39 100644
> --- a/tests/qemu-iotests/testrunner.py
> +++ b/tests/qemu-iotests/testrunner.py
> @@ -27,11 +27,14 @@
>  import shutil
>  import sys
>  from multiprocessing import Pool
> -from typing import List, Optional, Any, Sequence, Dict, \
> -    ContextManager
> -
> +from typing import List, Optional, Any, Sequence, Dict
>  from testenv import TestEnv
>
> +if sys.version_info >= (3, 9):
> +    from contextlib import AbstractContextManager as ContextManager
> +else:
> +    from typing import ContextManager
> +
>
>  def silent_unlink(path: Path) -> None:
>  try:
> --
> 2.45.0
>
>




Re: [PATCH 3/4] iotests: Change imports for Python 3.13

2024-07-02 Thread Nir Soffer

> On 2 Jul 2024, at 17:44, John Snow  wrote:
> 
> 
> 
> On Tue, Jul 2, 2024 at 7:52 AM Nir Soffer <nsof...@redhat.com> wrote:
>> On Thu, Jun 27, 2024 at 2:23 AM John Snow <js...@redhat.com> wrote:
>> >
>> > Python 3.13 isn't out yet, but it's in beta and Fedora is ramping up to
>> > make it the default system interpreter for Fedora 41.
>> >
>> > They moved our cheese for where ContextManager lives; add a conditional
>> > to locate it while we support both pre-3.9 and 3.13+.
>> >
>> > Signed-off-by: John Snow <js...@redhat.com>
>> > ---
>> >  tests/qemu-iotests/testenv.py| 7 ++-
>> >  tests/qemu-iotests/testrunner.py | 9 ++---
>> >  2 files changed, 12 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
>> > index 588f30a4f14..96d69e56963 100644
>> > --- a/tests/qemu-iotests/testenv.py
>> > +++ b/tests/qemu-iotests/testenv.py
>> > @@ -25,7 +25,12 @@
>> >  import random
>> >  import subprocess
>> >  import glob
>> > -from typing import List, Dict, Any, Optional, ContextManager
>> > +from typing import List, Dict, Any, Optional
>> > +
>> > +if sys.version_info >= (3, 9):
>> > +    from contextlib import AbstractContextManager as ContextManager
>> > +else:
>> > +    from typing import ContextManager
>> 
>> It would be cleaner to add a compat module hiding the details, so the
>> entire project has a single instance of this. Other code would then just use:
>> 
>> from compat import ContextManager
> 
> If there were more than two uses, I'd consider it. As it stands, a compat.py 
> module with just one import conditional in it doesn't seem worth the hassle. 
> Are there more cases of compatibility goop inside iotests that need to be 
> factored out to make it worth it?

I don’t know about others. For me, even one instance is ugly enough :-)



[PATCH] block/file-posix: Consider discard flag when opening

2024-06-18 Thread Nir Soffer
Set has_discard only when BDRV_O_UNMAP is set. With this, users who
want to keep their images fully allocated can disable hole punching
when writing zeros or discarding by using:

   -drive file=thick.img,discard=off

This change is not entirely correct since it changes the default discard
behavior. Previously we always allowed punching holes, but now you
must use discard=unmap|on to enable it. We probably need to add the
BDRV_O_UNMAP flag by default.

make check still works, so maybe we don't have tests for sparsifying
images, or maybe you need to run special tests that do not run by
default. We need tests for keeping images non-sparse.

Signed-off-by: Nir Soffer 
---
 block/file-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index be25e35ff6..acac2abadc 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -738,11 +738,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
         ret = -EINVAL;
         goto fail;
     }
 #endif /* !defined(CONFIG_LINUX_IO_URING) */
 
-    s->has_discard = true;
+    s->has_discard = !!(bdrv_flags & BDRV_O_UNMAP);
     s->has_write_zeroes = true;
 
     if (fstat(s->fd, &st) < 0) {
         ret = -errno;
         error_setg_errno(errp, errno, "Could not stat file");
-- 
2.45.1




Re: [PATCH] block/file-posix: Consider discard flag when opening

2024-06-19 Thread Nir Soffer


> On 19 Jun 2024, at 11:16, Kevin Wolf  wrote:
> 
> Am 18.06.2024 um 23:24 hat Nir Soffer geschrieben:
>> Set has_discard only when BDRV_O_UNMAP is set. With this, users who
>> want to keep their images fully allocated can disable hole punching
>> when writing zeros or discarding by using:
>> 
>>   -drive file=thick.img,discard=off
>> 
>> This change is not entirely correct since it changes the default discard
>> behavior. Previously we always allowed punching holes, but now you
>> must use discard=unmap|on to enable it. We probably need to add the
>> BDRV_O_UNMAP flag by default.
>> 
>> make check still works, so maybe we don't have tests for sparsifying
>> images, or maybe you need to run special tests that do not run by
>> default. We need tests for keeping images non-sparse.
>> 
>> Signed-off-by: Nir Soffer 
> 
> So first of all, I agree with you that this patch is wrong. ;-)
> 
> At first, I failed to understand the problem this is trying to solve. I
> put a debug message in handle_aiocb_discard() and checked which options
> trigger it. [1] To me, this looked exactly as it should be.
> We only try to discard blocks when discard=unmap is given as an option.
> 
> That leaves the case of write_zeroes. And while at first sight the
> code looked good, we do seem to have a problem there: it tried to
> unmap even with discard=off.
> 
>> block/file-posix.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index be25e35ff6..acac2abadc 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -738,11 +738,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>>          ret = -EINVAL;
>>          goto fail;
>>      }
>>  #endif /* !defined(CONFIG_LINUX_IO_URING) */
>> 
>> -    s->has_discard = true;
>> +    s->has_discard = !!(bdrv_flags & BDRV_O_UNMAP);
>>      s->has_write_zeroes = true;
>> 
>>      if (fstat(s->fd, &st) < 0) {
>>          ret = -errno;
>>          error_setg_errno(errp, errno, "Could not stat file");
> 
> s->has_discard is about what the host supports, not about the semantics
> of the QEMU block node. So this doesn't feel right to me.
> 
> So for the buggy case, write_zeroes, bdrv_co_pwrite_zeroes() has code
> that considers the case and clears the BDRV_REQ_MAY_UNMAP flag:
> 
>     if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
>         flags &= ~BDRV_REQ_MAY_UNMAP;
>     }
> 
> But it turns out that we don't necessarily even go through this function
> for the top node which has discard=off, so it can't take effect:
> 
> (gdb) bt
> #0  0x74f2f144 in __pthread_kill_implementation () at /lib64/libc.so.6
> #1  0x74ed765e in raise () at /lib64/libc.so.6
> #2  0x74ebf902 in abort () at /lib64/libc.so.6
> #3  0x5615aff0 in raw_do_pwrite_zeroes (bs=0x57f4bcf0, offset=0, bytes=1048576, flags=BDRV_REQ_MAY_UNMAP, blkdev=false) at ../block/file-posix.c:3643
> #4  0x5615557e in raw_co_pwrite_zeroes (bs=0x57f4bcf0, offset=0, bytes=1048576, flags=BDRV_REQ_MAY_UNMAP) at ../block/file-posix.c:3655
> #5  0x560cde2a in bdrv_co_do_pwrite_zeroes (bs=0x57f4bcf0, offset=0, bytes=1048576, flags=6) at ../block/io.c:1901
> #6  0x560c72f9 in bdrv_aligned_pwritev (child=0x57f51460, req=0x7fffed5ff800, offset=0, bytes=1048576, align=1, qiov=0x0, qiov_offset=0, flags=6) at ../block/io.c:2100
> #7  0x560c6b41 in bdrv_co_do_zero_pwritev (child=0x57f51460, offset=0, bytes=1048576, flags=6, req=0x7fffed5ff800) at ../block/io.c:2183
> #8  0x560c6647 in bdrv_co_pwritev_part (child=0x57f51460, offset=0, bytes=1048576, qiov=0x0, qiov_offset=0, flags=6) at ../block/io.c:2283
> #9  0x560c634f in bdrv_co_pwritev (child=0x57f51460, offset=0, bytes=1048576, qiov=0x0, flags=6) at ../block/io.c:2216
> #10 0x560c75b5 in bdrv_co_pwrite_zeroes (child=0x57f51460, offset=0, bytes=1048576, flags=BDRV_REQ_MAY_UNMAP) at ../block/io.c:2322
> #11 0x56117d24 in raw_co_pwrite_zeroes (bs=0x57f44980, offset=0, bytes=1048576, flags=BDRV_REQ_MAY_UNMAP) at ../block/raw-format.c:307
> #12 0x560cde2a in bdrv_co_do_pwrite_zeroes (bs=0x57f44980, offset=0, bytes=1048576, flags=6) at ../block/io.c:1901
> #13 0x560c72f9 in bdrv_aligned_pwritev (child=0x57f513f0, req=0x7fffed5ffd90, offset=0, bytes=1048576, al

[PATCH v2] Consider discard option when writing zeros

2024-06-19 Thread Nir Soffer
When opening an image with discard=off, we punch holes in the image when
writing zeroes, making the image sparse. This breaks users that want to
ensure that writes cannot fail with ENOSPC by using fully allocated
images.

bdrv_co_pwrite_zeroes() correctly disables BDRV_REQ_MAY_UNMAP if we
opened the child without discard=unmap or discard=on. But we don't go
through this function when accessing the top node. Move the check down
to bdrv_co_do_pwrite_zeroes() which seems to be used in all code paths.

Issues:
- We don't punch holes by default, so images are kept allocated. Before
  this change we punched holes by default. I'm not sure this is a good
  change in behavior.
- Need to run all block tests
- Not sure that we have tests covering unmapping, we may need new tests
- We may need new tests to cover this change

Signed-off-by: Nir Soffer 
---

Changes since v1:
- Replace the incorrect has_discard change with the right fix

v1 was here:
https://lists.nongnu.org/archive/html/qemu-block/2024-06/msg00198.html

 block/io.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 7217cf811b..301514c880 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1860,10 +1860,15 @@ bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int64_t bytes,
     /* By definition there is no user buffer so this flag doesn't make sense */
     if (flags & BDRV_REQ_REGISTERED_BUF) {
         return -EINVAL;
     }
 
+    /* If opened with discard=off we should never unmap. */
+    if (!(bs->open_flags & BDRV_O_UNMAP)) {
+        flags &= ~BDRV_REQ_MAY_UNMAP;
+    }
+
     /* Invalidate the cached block-status data range if this write overlaps */
     bdrv_bsc_invalidate_range(bs, offset, bytes);
 
     assert(alignment % bs->bl.request_alignment == 0);
     head = offset % alignment;
@@ -2313,14 +2318,10 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
 {
     IO_CODE();
     trace_bdrv_co_pwrite_zeroes(child->bs, offset, bytes, flags);
     assert_bdrv_graph_readable();
 
-    if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
-        flags &= ~BDRV_REQ_MAY_UNMAP;
-    }
-
     return bdrv_co_pwritev(child, offset, bytes, NULL,
                            BDRV_REQ_ZERO_WRITE | flags);
 }
 
 /*
-- 
2.45.1




Re: [PATCH v2] Consider discard option when writing zeros

2024-06-19 Thread Nir Soffer
Tested using:

$ cat test-unmap.sh
#!/bin/bash

qemu=${1:?Usage: $0 qemu-executable}
img=/tmp/test.raw

echo
echo "defaults - write zeroes"
fallocate -l 1m $img
echo -e 'qemu-io none0 "write -z 0 1m"\nquit' | $qemu -monitor stdio \
-drive if=none,file=$img,format=raw >/dev/null
du -sh $img

echo
echo "defaults - write zeroes unmap"
fallocate -l 1m $img
echo -e 'qemu-io none0 "write -zu 0 1m"\nquit' | $qemu -monitor stdio \
-drive if=none,file=$img,format=raw >/dev/null
du -sh $img

echo
echo "defaults - write actual zeros"
fallocate -l 1m $img
echo -e 'qemu-io none0 "write -P 0 0 1m"\nquit' | $qemu -monitor stdio \
-drive if=none,file=$img,format=raw >/dev/null
du -sh $img

echo
echo "discard=off - write zeroes unmap"
fallocate -l 1m $img
echo -e 'qemu-io none0 "write -zu 0 1m"\nquit' | $qemu -monitor stdio \
-drive if=none,file=$img,format=raw,discard=off >/dev/null
du -sh $img

echo
echo "detect-zeros=on - write actual zeros"
fallocate -l 1m $img
echo -e 'qemu-io none0 "write -P 0 0 1m"\nquit' | $qemu -monitor stdio \
-drive if=none,file=$img,format=raw,detect-zeroes=on >/dev/null
du -sh $img

echo
echo "detect-zeros=unmap,discard=unmap - write actual zeros"
fallocate -l 1m $img
echo -e 'qemu-io none0 "write -P 0 0 1m"\nquit' | $qemu -monitor stdio \
-drive if=none,file=$img,format=raw,detect-zeroes=unmap,discard=unmap >/dev/null
du -sh $img

echo
echo "discard=unmap - write zeroes"
fallocate -l 1m $img
echo -e 'qemu-io none0 "write -z 0 1m"\nquit' | $qemu -monitor stdio \
-drive if=none,file=$img,format=raw,discard=unmap >/dev/null
du -sh $img

echo
echo "discard=unmap - write zeroes unmap"
fallocate -l 1m $img
echo -e 'qemu-io none0 "write -zu 0 1m"\nquit' | $qemu -monitor stdio \
-drive if=none,file=$img,format=raw,discard=unmap >/dev/null
du -sh $img

rm $img
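
The `du -sh` checks compare allocated blocks, not apparent size. If this script is turned into an automated test, the same measurement could be sketched in Python as below. The helper names are hypothetical, and the sketch assumes Linux semantics (`st_blocks` counted in 512-byte units, `os.posix_fallocate` available).

```python
import os
import tempfile

MIB = 1024 * 1024


def allocated_bytes(path):
    """Bytes allocated on disk, like du; Linux reports st_blocks in 512-byte units."""
    return os.stat(path).st_blocks * 512


def make_allocated(path, size):
    """Roughly what `fallocate -l 1m $img` does: reserve real blocks."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)


def make_sparse(path, size):
    """Same apparent size but no data blocks, like an image punched to zero."""
    with open(path, "wb") as f:  # "wb" first truncates to 0, freeing blocks
        f.truncate(size)


with tempfile.TemporaryDirectory() as d:
    img = os.path.join(d, "test.raw")
    make_allocated(img, MIB)
    print("allocated:", os.path.getsize(img), allocated_bytes(img))
    make_sparse(img, MIB)
    print("sparse:   ", os.path.getsize(img), allocated_bytes(img))
```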


Before this change:

$ cat before.out

defaults - write zeroes
1.0M /tmp/test.raw

defaults - write zeroes unmap
0 /tmp/test.raw

defaults - write actual zeros
1.0M /tmp/test.raw

discard=off - write zeroes unmap
0 /tmp/test.raw

detect-zeros=on - write actual zeros
1.0M /tmp/test.raw

detect-zeros=unmap,discard=unmap - write actual zeros
0 /tmp/test.raw

discard=unmap - write zeroes
1.0M /tmp/test.raw

discard=unmap - write zeroes unmap
0 /tmp/test.raw


After this change:

$ cat after.out

defaults - write zeroes
1.0M /tmp/test.raw

defaults - write zeroes unmap
1.0M /tmp/test.raw

defaults - write actual zeros
1.0M /tmp/test.raw

discard=off - write zeroes unmap
1.0M /tmp/test.raw

detect-zeros=on - write actual zeros
1.0M /tmp/test.raw

detect-zeros=unmap,discard=unmap - write actual zeros
0 /tmp/test.raw

discard=unmap - write zeroes
1.0M /tmp/test.raw

discard=unmap - write zeroes unmap
0 /tmp/test.raw


Differences:

$ diff -u before.out after.out
--- before.out 2024-06-19 20:24:09.234083713 +0300
+++ after.out 2024-06-19 20:24:20.526165573 +0300
@@ -3,13 +3,13 @@
 1.0M /tmp/test.raw

 defaults - write zeroes unmap
-0 /tmp/test.raw
+1.0M /tmp/test.raw

 defaults - write actual zeros
 1.0M /tmp/test.raw

 discard=off - write zeroes unmap
-0 /tmp/test.raw
+1.0M /tmp/test.raw

On Wed, Jun 19, 2024 at 8:40 PM Nir Soffer  wrote:

> When opening an image with discard=off, we punch holes in the image when
> writing zeroes, making the image sparse. This breaks users that want to
> ensure that writes cannot fail with ENOSPC by using fully allocated
> images.
>
> bdrv_co_pwrite_zeroes() correctly disables BDRV_REQ_MAY_UNMAP if we
> opened the child without discard=unmap or discard=on. But we don't go
> through this function when accessing the top node. Move the check down
> to bdrv_co_do_pwrite_zeroes() which seems to be used in all code paths.
>
> Issues:
> - We don't punch holes by default, so images are kept allocated. Before
>   this change we punched holes by default. I'm not sure this is a good
>   change in behavior.
> - Need to run all block tests
> - Not sure that we have tests covering unmapping; we may need new tests
> - We may need new tests to cover this change
>
> Signed-off-by: Nir Soffer 
> ---
>
> Changes since v1:
> - Replace the incorrect has_discard change with the right fix
>
> v1 was here:
> https://lists.nongnu.org/archive/html/qemu-block/2024-06/msg00198.html
>
>  block/io.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index 7217cf811b..301514c880 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1860,10 +1860,15 @@ bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
> int64_t offset, int64_t bytes,
> 

Re: [PATCH v2] Consider discard option when writing zeros

2024-06-19 Thread Nir Soffer
On Wed, Jun 19, 2024 at 8:40 PM Nir Soffer  wrote:

> - Need to run all block tests
>

Stale note; make check passes.


Re: [PATCH 0/3] Add qemu-img checksum command using blkhash

2022-09-18 Thread Nir Soffer
ping

Kevin, Hanna, I hope you have time to take a look.

https://lists.nongnu.org/archive/html/qemu-block/2022-09/msg00021.html


On Thu, Sep 1, 2022 at 5:32 PM Nir Soffer  wrote:
>
> Since blkhash is available only via copr now, the new command is added as
> an optional feature, built only if the blkhash-devel package is installed.
>
> Nir Soffer (3):
>   qemu-img: Add checksum command
>   iotests: Test qemu-img checksum
>   qemu-img: Speed up checksum
>
>  docs/tools/qemu-img.rst   |  22 +
>  meson.build   |  10 +-
>  meson_options.txt |   2 +
>  qemu-img-cmds.hx  |   8 +
>  qemu-img.c| 388 ++
>  tests/qemu-iotests/tests/qemu-img-checksum| 149 +++
>  .../qemu-iotests/tests/qemu-img-checksum.out  |  74 
>  7 files changed, 652 insertions(+), 1 deletion(-)
>  create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
>  create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
>
> --
> 2.37.2
>




Re: [PATCH 0/3] Add qemu-img checksum command using blkhash

2022-10-18 Thread Nir Soffer
On Sun, Sep 18, 2022 at 12:35 PM Nir Soffer  wrote:

> ping
>
> Kevin, Hanna, I hope you have time to take a look.
>
> https://lists.nongnu.org/archive/html/qemu-block/2022-09/msg00021.html


Ping again, hopefully someone has time to look at this :-)


>
>
>
> On Thu, Sep 1, 2022 at 5:32 PM Nir Soffer  wrote:
> >
> > Since blkhash is available only via copr now, the new command is added as
> > an optional feature, built only if the blkhash-devel package is installed.
> >
> > Nir Soffer (3):
> >   qemu-img: Add checksum command
> >   iotests: Test qemu-img checksum
> >   qemu-img: Speed up checksum
> >
> >  docs/tools/qemu-img.rst   |  22 +
> >  meson.build   |  10 +-
> >  meson_options.txt |   2 +
> >  qemu-img-cmds.hx  |   8 +
> >  qemu-img.c| 388 ++
> >  tests/qemu-iotests/tests/qemu-img-checksum| 149 +++
> >  .../qemu-iotests/tests/qemu-img-checksum.out  |  74 
> >  7 files changed, 652 insertions(+), 1 deletion(-)
> >  create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
> >  create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
> >
> > --
> > 2.37.2
> >
>


Re: [Libguestfs] [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length

2023-03-06 Thread Nir Soffer
On Sun, Mar 5, 2023 at 10:42 AM Wouter Verhelst  wrote:
>
> On Fri, Mar 03, 2023 at 04:17:40PM -0600, Eric Blake wrote:
> > On Fri, Dec 16, 2022 at 10:32:01PM +0300, Vladimir Sementsov-Ogievskiy 
> > wrote:
> > > s-o-b line missed.
> >
> > I'm not sure if the NBD project has a strict policy on including one,
> > but I don't mind adding it.
>
> I've never required it, mostly because it's something that I myself
> always forget, too, so, *shrug*.
>
> (if there were a way in git to make it add that automatically, that
> would help; I've looked but haven't found it)

What I'm using in all projects that require signed-off-by is:

$ cat .git/hooks/commit-msg
#!/bin/sh

# Add Signed-off-by trailer.
sob=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*>\).*$/Signed-off-by: \1/p')
git interpret-trailers --in-place --trailer "$sob" "$1"

You can also use a prepare-commit-msg hook, but the commit-msg hook is more
convenient.
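
To see what the `sed` step does with the output of `git var GIT_AUTHOR_IDENT` (name, email, timestamp, timezone), here is a Python equivalent of just that transformation. The sample identity is made up; the hook itself then passes the result to `git interpret-trailers`.

```python
import re

# Same pattern as the sed expression in the hook: keep everything up to the
# last ">" (name and email) and drop the timestamp and timezone after it.
IDENT_RE = re.compile(r"^(.*>).*$")


def signoff_trailer(author_ident):
    """Return the trailer line, or None where `sed -n` would print nothing."""
    m = IDENT_RE.match(author_ident)
    return f"Signed-off-by: {m.group(1)}" if m else None


print(signoff_trailer("Jane Doe <jane@example.com> 1678100000 +0200"))
# Signed-off-by: Jane Doe <jane@example.com>
```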

And on GitHub you can add the DCO application to the project:
https://github.com/apps/dco

Once installed, it will check that all commits are signed off, and
provide helpful error messages to contributors.

Nir




Re: [PATCH 1/1] block: improve alignment detection and fix 271 test

2023-10-14 Thread Nir Soffer
On Fri, Sep 8, 2023 at 12:54 AM Denis V. Lunev  wrote:

> Unfortunately 271 IO test is broken if started in non-cached mode.
>

Is this a real-world issue? For example, in oVirt you cannot create a disk
with size < 4k, so there is no way that 4k is not a good alignment.

Should we fix the test to reflect real-world usage?

_reset_img 2083k

I guess it works with:

_reset_img 2084k

Commits
> commit a6b257a08e3d72219f03e461a52152672fec0612
>     Author: Nir Soffer 
> Date:   Tue Aug 13 21:21:03 2019 +0300
> file-posix: Handle undetectable alignment
> and
> commit 9c60a5d1978e6dcf85c0e01b50e6f7f54ca09104
> Author: Kevin Wolf 
> Date:   Thu Jul 16 16:26:00 2020 +0200
> block: Require aligned image size to avoid assertion failure
> have an interesting side effect if used together.
>
> If the image size is not a multiple of 4k and the image falls under
> the original constraints of Nir's patch, the image cannot be opened
> due to the check in bdrv_check_perm().
>
> The patch tries to satisfy the requirements of bdrv_check_perm()
> inside raw_probe_alignment(). In my opinion this is better than just
> disallowing this test to run in non-cached mode. The operation is legal
> by itself.
>
> Signed-off-by: Denis V. Lunev 
> CC: Nir Soffer 
> CC: Kevin Wolf 
> CC: Hanna Reitz 
> CC: Alberto Garcia 
> ---
>  block/file-posix.c | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index b16e9c21a1..988cfdc76c 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -447,8 +447,21 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>          for (i = 0; i < ARRAY_SIZE(alignments); i++) {
>              align = alignments[i];
>              if (raw_is_io_aligned(fd, buf, align)) {
> -                /* Fallback to safe value. */
> -                bs->bl.request_alignment = (align != 1) ? align : max_align;
> +                if (align != 1) {
> +                    bs->bl.request_alignment = align;
> +                    break;
> +                }
> +                /*
> +                 * Fallback to safe value. max_align is perfect, but the size of the device must be multiple of
> +                 * the virtual length of the device. In the other case we will get a error in
> +                 * bdrv_node_refresh_perm().
> +                 */
> +                for (align = max_align; align > 1; align /= 2) {
> +                    if ((bs->total_sectors * BDRV_SECTOR_SIZE) % align == 0) {
>

Moving the image size calculation out of the loop would make the intent of
the code clearer:

if (image_size % align == 0) {

Since qemu does not enforce image size alignment, I can see how you could
create a 512-byte aligned image, and when qemu cannot detect the alignment,
we end up with align = 4k. In this case the loop would select align = 512,
but with the image size aligned to some strange value, the loop may select
align = 2 or some other value that does not make sense.
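
A quick Python model of the proposed loop, with the image size hoisted out as suggested, shows which values it can pick. It assumes max_align = 4096 (the "we end up with align = 4k" case) and is not the actual C code; 2083k is the size used by `_reset_img 2083k` in the test, and 1000002 is an arbitrary odd-looking size.

```python
def fallback_alignment(image_size, max_align=4096):
    """Model of the proposed fallback loop with image_size hoisted out.

    Halve the candidate from max_align while it does not divide the image
    size; like the C loop, end at 1 if no larger power of two fits.
    """
    align = max_align
    while align > 1 and image_size % align != 0:
        align //= 2
    return align


print(fallback_alignment(2083 * 1024))  # 1024
print(fallback_alignment(2084 * 1024))  # 4096
print(fallback_alignment(1000002))      # 2, not a sensible I/O alignment
```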

So I can see using 4k or 512 bytes as a good fallback value, but anything
else should not be possible. Maybe we should fix this in bdrv_check_perm()?
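
One hypothetical way to express "4k or 512, nothing else" as a rule, again as a Python model rather than QEMU code; where the rejection would actually be reported (bdrv_check_perm() or elsewhere) is left open.

```python
def strict_fallback_alignment(image_size):
    """Hypothetical stricter rule: accept only 4096 or 512, else refuse."""
    for align in (4096, 512):
        if image_size % align == 0:
            return align
    return None  # the caller would report an error instead of guessing


print(strict_fallback_alignment(2084 * 1024))  # 4096
print(strict_fallback_alignment(2083 * 1024))  # 512
print(strict_fallback_alignment(1000002))      # None
```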

Nir


Re: [Qemu-devel] [PATCH 0/3] qemu-img raw preallocation

2017-02-22 Thread Nir Soffer
On Wed, Feb 22, 2017 at 2:31 PM, Kevin Wolf  wrote:
> Am 17.02.2017 um 01:51 hat Nir Soffer geschrieben:
>> This series adds missing tests for raw image preallocation, refines
>> preallocation=full and improves documentation.
>>
>> Created on top of commit 10ddfe7b6044 (qemu-img: Do not truncate
>> before preallocation) from Kevin's block branch.
>
> Thanks, applied to the block branch.
>
> I changed the commit message of patch 2 so that it doesn't mention the
> commit ID. This is because the commit ID is only stable once the commit
> has made it into master. I also added a few words ("...so don't do that
> here") to the comment in patch 3 because outside the context of the
> patch, talking about a truncation seemed weird when there is no
> truncation happening. I hope you're okay with these changes.

Looks fine, thanks!



Re: [Qemu-devel] [PATCH] qmp-shell: add persistent command history

2017-03-01 Thread Nir Soffer
On Wed, Mar 1, 2017 at 9:44 PM, John Snow  wrote:
>
> Use the existing readline history function we are utilizing
> to provide persistent command history across instances of qmp-shell.
>
> This assists entering debug commands across sessions that may be
> interrupted by QEMU sessions terminating, where the qmp-shell has
> to be relaunched.
>
> Signed-off-by: John Snow 
> ---
>  scripts/qmp/qmp-shell | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
> index 0373b24..b19f44b 100755
> --- a/scripts/qmp/qmp-shell
> +++ b/scripts/qmp/qmp-shell
> @@ -70,6 +70,7 @@ import json
>  import ast
>  import readline
>  import sys
> +import os
>
>  class QMPCompleter(list):
>  def complete(self, text, state):
> @@ -109,6 +110,8 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>  self._pretty = pretty
>  self._transmode = False
>  self._actions = list()
> +self._histfile = os.path.join(os.path.expanduser('~'),
> +  '.qmp_history')

This can be a little more readable on one line.

>
>  def __get_address(self, arg):
>  """
> @@ -137,6 +140,16 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>  # XXX: default delimiters conflict with some command names (eg. 
> query-),
>  # clearing everything as it doesn't seem to matter
>  readline.set_completer_delims('')
> +try:
> +readline.read_history_file(self._histfile)
> +except:
> +pass

This hides all errors, including KeyboardInterrupt and SystemExit, and will
make debugging impossible.

It looks like you want to ignore a missing history file, but this way we also
ignore permission errors or even a typo in the code. For example, this will
fail silently:

try:
    readdline.read_history_file(self._histfile)
except:
    pass

The docs do not specify the possible errors, but the code raises IOError:
https://github.com/python/cpython/blob/2.7/Modules/readline.c#L126

So it would be best to handle only IOError, and ignore ENOENT. Any other
error should fail in a visible way.
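
A minimal sketch of that, written for Python 3 (where IOError is an alias of OSError); `load_history` is a hypothetical helper name, not part of qmp-shell:

```python
import errno
import readline


def load_history(path):
    """Read readline history; only a missing file is treated as normal.

    Returns False when there is no history file yet. Any other OSError
    (permission denied, I/O error, ...) propagates so it stays visible.
    A typo such as `readdline` raises NameError instead of being hidden.
    """
    try:
        readline.read_history_file(path)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
        return False
    return True
```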

> +
> +def __save_history(self):
> +try:
> +readline.write_history_file(self._histfile)
> +except:
> +pass

Same here, but I'm not sure which errors should be ignored. Do we want to
silently ignore a read-only file system? No space left?

I think a safe way would be to print a warning with the text from the
IOError if the history file cannot be saved.

>
>  def __parse_value(self, val):
>  try:
> @@ -244,6 +257,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>  print 'command format:  ',
>  print '[arg-name1=arg1] ... [arg-nameN=argN]'
>  return True
> +self.__save_history()

This saves the history after every command, which makes error handling
more complicated and is also unneeded, since we don't care about the
history if you kill the qmp-shell process, right?

We can invoke readline.write_history_file() using atexit. This is also
what the docs suggest, see:
https://docs.python.org/2/library/readline.html#example
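
Roughly, in Python 3 (the helper names are hypothetical; only `readline.read_history_file`, `readline.write_history_file` and `atexit.register` are real APIs), saving once at exit and warning instead of failing:

```python
import atexit
import readline
import sys


def save_history(path):
    """Write the history once; warn instead of failing if that is impossible."""
    try:
        readline.write_history_file(path)
    except OSError as e:
        print(f"warning: could not save history to {path}: {e}", file=sys.stderr)


def enable_history(path):
    """Hypothetical helper: load what exists, save once at interpreter exit."""
    try:
        readline.read_history_file(path)
    except FileNotFoundError:
        pass  # first run, no history yet
    atexit.register(save_history, path)
```

Error handling then happens in one place, once per session, instead of after every command.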

Nir

>  # For transaction mode, we may have just cached the action:
>  if qmpcmd is None:
>  return True
> --
> 2.9.3
>
>



Re: [Qemu-devel] [PATCH] qmp-shell: add persistent command history

2017-03-02 Thread Nir Soffer
On Thu, Mar 2, 2017 at 12:19 AM, John Snow  wrote:
>
>
> On 03/01/2017 05:01 PM, Nir Soffer wrote:
>> On Wed, Mar 1, 2017 at 9:44 PM, John Snow  wrote:
>>>
>>> Use the existing readline history function we are utilizing
>>> to provide persistent command history across instances of qmp-shell.
>>>
>>> This assists entering debug commands across sessions that may be
>>> interrupted by QEMU sessions terminating, where the qmp-shell has
>>> to be relaunched.
>>>
>>> Signed-off-by: John Snow 
>>> ---
>>>  scripts/qmp/qmp-shell | 14 ++
>>>  1 file changed, 14 insertions(+)
>>>
>>> diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
>>> index 0373b24..b19f44b 100755
>>> --- a/scripts/qmp/qmp-shell
>>> +++ b/scripts/qmp/qmp-shell
>>> @@ -70,6 +70,7 @@ import json
>>>  import ast
>>>  import readline
>>>  import sys
>>> +import os
>>>
>>>  class QMPCompleter(list):
>>>  def complete(self, text, state):
>>> @@ -109,6 +110,8 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>>>  self._pretty = pretty
>>>  self._transmode = False
>>>  self._actions = list()
>>> +self._histfile = os.path.join(os.path.expanduser('~'),
>>> +  '.qmp_history')
>>
>> This can be a little more readable on one line.
>>
>
> I thought I was over 80, but maybe not.
>
>>>
>>>  def __get_address(self, arg):
>>>  """
>>> @@ -137,6 +140,16 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>>>  # XXX: default delimiters conflict with some command names (eg. 
>>> query-),
>>>  # clearing everything as it doesn't seem to matter
>>>  readline.set_completer_delims('')
>>> +try:
>>> +readline.read_history_file(self._histfile)
>>> +except:
>>> +pass
>>
>> This hides all errors, including KeyboardInterrupt and SystemExit, and will
>> make debugging impossible.
>>
>
> Indeed, I want to ignore errors related to a missing history file. It
> wasn't documented, and this isn't an important feature (for a shell
> script only used for debugging), so I went with the dumb thing.
>
>> It looks like you want to ignore a missing history file, but this way we also
>> ignore permission errors or even a typo in the code. For example, this will
>> fail silently:
>>
>> try:
>>     readdline.read_history_file(self._histfile)
>> except:
>>     pass
>>
>> The docs do not specify the possible errors, but the code raises IOError:
>> https://github.com/python/cpython/blob/2.7/Modules/readline.c#L126
>>
>> So it would be best to handle only IOError, and ignore ENOENT. Any other
>> error should fail in a visible way.
>>
>
> Maybe not "fail," but perhaps "warn." This feature is not so important
> that it should inhibit normal operation.

Yes, a warning with the text from the IOError would be best.

>
>>> +
>>> +def __save_history(self):
>>> +try:
>>> +readline.write_history_file(self._histfile)
>>> +except:
>>> +pass
>>
>> Same here, but I'm not sure which errors should be ignored. Do we want to
>> silently ignore a read-only file system? No space left?
>>
>
> Pretty much my thought, yes. I could "warn" on the first failure and
> then stifle subsequent ones. I don't want this to be an irritant.
>
>> I think a safe way would be to print a warning with the text from the
>> IOError if the history file cannot be saved.
>>
>>>
>>>  def __parse_value(self, val):
>>>  try:
>>> @@ -244,6 +257,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>>>  print 'command format:  ',
>>>  print '[arg-name1=arg1] ... [arg-nameN=argN]'
>>>  return True
>>> +self.__save_history()
>>
>> This saves the history after every command, which makes error handling
>> more complicated and is also unneeded, since we don't care about the
>> history if you kill the qmp-shell process, right?
>>
>
> I suppose so. My thought was more along the lines of: "If the program
> explodes, I'd like to have the intervening history saved."

Python programs don't usually explode this way.

> I didn't
> think this would complicate performance of a debugging tool.
>
> Why do you feel this would make error handling more complicated?

Because we have to handle errors on each command, instead of once
during exit.

> Why do you think we wouldn't care about the history if we kill the
> qmp-shell process?

We care about the history, but do you expect the program to often fail
to handle SIGTERM properly?
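
One detail matters for the atexit approach: Python does not run atexit handlers when the process is killed by a signal Python does not handle, and SIGTERM falls in that category by default. A minimal sketch (the helper name is hypothetical) that turns SIGTERM into a normal exit so a history-saving handler still runs:

```python
import signal
import sys


def exit_on_sigterm():
    """Make SIGTERM raise SystemExit so atexit handlers still run.

    With the default disposition, SIGTERM kills the interpreter without
    running atexit handlers, so history saved there would be lost.
    """
    def handler(signum, frame):
        sys.exit(128 + signum)  # conventional "killed by signal" exit status

    signal.signal(signal.SIGTERM, handler)
```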

>> We can invoke readline.write_history_file() using atexit. This is also
>> what the docs suggest, see:
>> https://docs.python.org/2/library/readline.html#example
>>
>> Nir
>>
>>>  # For transaction mode, we may have just cached the action:
>>>  if qmpcmd is None:
>>>  return True
>>> --
>>> 2.9.3
>>>
>>>



[Qemu-devel] [PATCH] qemu-img: Do not truncate before preallocation

2017-01-27 Thread Nir Soffer
From: Nir Soffer 

When using a file system that does not support fallocate(),
posix_fallocate() falls back to emulation mode. In this mode, when
preallocating blocks before the end of the file, posix_fallocate() calls
one pread() and one pwrite() per block. But when preallocating blocks
after the end of the file, it calls only one pwrite() per block.

Truncating the file only when preallocation=off speeds up creating a raw
file in this situation.
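
The syscall pattern can be reproduced with a simplified Python model of the emulation as described above. It is not glibc's implementation; it assumes 4 KiB blocks, as in the strace output below, and a file that is either empty or already truncated to its final size.

```python
def emulated_fallocate_calls(file_size, length, block=4096):
    """Simplified model of the emulation described above (not glibc's code).

    For each block in [0, length) the last byte is touched. A byte inside
    the current file is read first (pread + pwrite); a byte past the end
    of the file is simply written (pwrite only).
    """
    calls = []
    for pos in range(block - 1, length, block):
        if pos < file_size:
            calls.append(("pread64", pos))
        calls.append(("pwrite64", pos))
    return calls


# ftruncate() first: every block is inside the file, so each needs a read.
print(emulated_fallocate_calls(file_size=8192, length=8192))
# no ftruncate(): every block lies past the end of the file.
print(emulated_fallocate_calls(file_size=0, length=8192))
```

For the 1 GiB image in the timing example this model gives 262,144 pwrite() calls in both cases, plus 262,144 pread() calls when the file is truncated first.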

Here are example runs without and with this change, tested on a Fedora 25
VM, creating a raw image on an NFS version 3 mount over a 1G NIC:

$ time ./qemu-img create -f raw -o preallocation=falloc mnt/test 1g
Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc

real 0m17.083s
user 0m0.020s
sys 0m0.404s

$ rm mnt/test
$ time ./qemu-img create -f raw -o preallocation=falloc mnt/test 1g
Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc

real 0m12.372s
user 0m0.020s
sys 0m0.376s

$ strace ./qemu-img-up create -f raw -o preallocation=falloc mnt/test 8192
...
pread64(9, "\0", 1, 4095)   = 1
pwrite64(9, "\0", 1, 4095)  = 1
pread64(9, "\0", 1, 8191)   = 1
pwrite64(9, "\0", 1, 8191)  = 1

$ strace ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 8192
...
pwrite64(9, "\0", 1, 4095)      = 1
pwrite64(9, "\0", 1, 8191)  = 1

Signed-off-by: Nir Soffer 
---
 block/file-posix.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 28b47d9..d7f6129 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1588,12 +1588,6 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 #endif
     }
 
-    if (ftruncate(fd, total_size) != 0) {
-        result = -errno;
-        error_setg_errno(errp, -result, "Could not resize file");
-        goto out_close;
-    }
-
     switch (prealloc) {
 #ifdef CONFIG_POSIX_FALLOCATE
     case PREALLOC_MODE_FALLOC:
@@ -1633,6 +1627,10 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
         break;
     }
     case PREALLOC_MODE_OFF:
+        if (ftruncate(fd, total_size) != 0) {
+            result = -errno;
+            error_setg_errno(errp, -result, "Could not resize file");
+        }
         break;
     default:
         result = -EINVAL;
@@ -1641,7 +1639,6 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
         break;
     }
 
-out_close:
     if (qemu_close(fd) != 0 && result == 0) {
         result = -errno;
         error_setg_errno(errp, -result, "Could not close the new file");
-- 
2.9.3




[Qemu-devel] [PATCH v2 1/2] qemu-io: Return non-zero exit code on failure

2017-01-27 Thread Nir Soffer
From: Nir Soffer 

The result of openfile was not checked, leading to a failure deep in the
actual command with a confusing error message, and exiting with exit code 0.

Here is a simple example - trying to read with the wrong format:

$ touch file
$ qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
no file open, try 'help open'
0

With this patch, we fail earlier with exit code 1:

$ ./qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
1

Signed-off-by: Nir Soffer 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---

Changes since v1:
- Improve commit message
- Add regression tests

 qemu-io.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 23a229f..427cbae 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -595,13 +595,17 @@ int main(int argc, char **argv)
                 exit(1);
             }
             opts = qemu_opts_to_qdict(qopts, NULL);
-            openfile(NULL, flags, writethrough, opts);
+            if (openfile(NULL, flags, writethrough, opts)) {
+                exit(1);
+            }
         } else {
             if (format) {
                 opts = qdict_new();
                 qdict_put(opts, "driver", qstring_from_str(format));
             }
-            openfile(argv[optind], flags, writethrough, opts);
+            if (openfile(argv[optind], flags, writethrough, opts)) {
+                exit(1);
+            }
         }
     }
     command_loop();
-- 
2.9.3




[Qemu-devel] [PATCH v2 2/2] qemu-io: Add regression tests

2017-01-27 Thread Nir Soffer
From: Nir Soffer 

Add regression tests checking that qemu-io fails with a non-zero exit code
when reading a non-existing file or using the wrong format.
---
 tests/qemu-iotests/173 | 59 ++
 tests/qemu-iotests/173.out |  9 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 69 insertions(+)
 create mode 100755 tests/qemu-iotests/173
 create mode 100644 tests/qemu-iotests/173.out

diff --git a/tests/qemu-iotests/173 b/tests/qemu-iotests/173
new file mode 100755
index 000..1d1fd6d
--- /dev/null
+++ b/tests/qemu-iotests/173
@@ -0,0 +1,59 @@
+#!/bin/bash
+#
+# Test that qemu-io fail with non-zero exit code
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=nir...@gmail.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+
+
+size=256K
+_make_test_img $size
+
+echo
+echo "== reading wrong format should fail =="
+$QEMU_IO -f qcow2 -c "read 0 $size" "$TEST_IMG" 2>&1 | _filter_testdir
+test "${PIPESTATUS[0]}" -eq 1 || _fail "did not fail"
+
+echo
+echo "== reading missing file should fail =="
+$QEMU_IO -c "read 0 $size" "$TEST_DIR/missing" 2>&1 | _filter_testdir
+test "${PIPESTATUS[0]}" -eq 1 || _fail "did not fail"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/173.out b/tests/qemu-iotests/173.out
new file mode 100644
index 000..47012a3
--- /dev/null
+++ b/tests/qemu-iotests/173.out
@@ -0,0 +1,9 @@
+QA output created by 173
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=262144
+
+== reading wrong format should fail ==
+can't open device TEST_DIR/t.raw: Image is not in qcow2 format
+
+== reading missing file should fail ==
+can't open device TEST_DIR/missing: Could not open 'TEST_DIR/missing': No such file or directory
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 866c1a0..069a5f3 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -165,3 +165,4 @@
 170 rw auto quick
 171 rw auto quick
 172 auto
+173 auto
-- 
2.9.3




[Qemu-devel] [PATCH v3 1/3] qemu-io: Return non-zero exit code on failure

2017-01-27 Thread Nir Soffer
From: Nir Soffer 

The result of openfile was not checked, leading to a failure deep in the
actual command with a confusing error message, and exiting with exit code 0.

Here is a simple example - trying to read with the wrong format:

$ touch file
$ qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
no file open, try 'help open'
0

With this patch, we fail earlier with exit code 1:

$ ./qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
1

Signed-off-by: Nir Soffer 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---

Changes since v2:
- Add missing Signed-off-by
- Fix tests expecting the wrong output

 qemu-io.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 23a229f..427cbae 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -595,13 +595,17 @@ int main(int argc, char **argv)
                 exit(1);
             }
             opts = qemu_opts_to_qdict(qopts, NULL);
-            openfile(NULL, flags, writethrough, opts);
+            if (openfile(NULL, flags, writethrough, opts)) {
+                exit(1);
+            }
         } else {
             if (format) {
                 opts = qdict_new();
                 qdict_put(opts, "driver", qstring_from_str(format));
             }
-            openfile(argv[optind], flags, writethrough, opts);
+            if (openfile(argv[optind], flags, writethrough, opts)) {
+                exit(1);
+            }
         }
     }
     command_loop();
-- 
2.9.3




[Qemu-devel] [PATCH v3 3/3] qemu-io: Fix tests expecting the wrong output

2017-01-27 Thread Nir Soffer
From: Nir Soffer 

Many tests expected the wrong behavior, where qemu-io calls into the
command after failing to open the file, printing this error:

no file open, try 'help open'

Now that we fail immediately when opening a file fails, this error does
not exist in the output; remove it from tests output.

Tested using:

./check 059 -vmdk (unrelated failure)
./check 070 -vhdx
./check 075 -cloop
./check 076 -parallels
./check 078 -bochs
./check 080 -qcow2
./check 083 -nbd
./check 088 -vpc
./check 092 -qcow
./check 116 -qed
./check 131 -parallels
./check 140 -raw
./check 140 -qcow2
./check -raw
./check -qcow2

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/059.out |  3 ---
 tests/qemu-iotests/070.out |  1 -
 tests/qemu-iotests/075.out |  7 ---
 tests/qemu-iotests/076.out |  3 ---
 tests/qemu-iotests/078.out |  6 --
 tests/qemu-iotests/080.out | 18 --
 tests/qemu-iotests/083.out | 17 -
 tests/qemu-iotests/088.out |  6 --
 tests/qemu-iotests/092.out | 12 
 tests/qemu-iotests/116.out |  7 ---
 tests/qemu-iotests/131.out |  1 -
 tests/qemu-iotests/140.out |  1 -
 12 files changed, 82 deletions(-)

diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index 678adb4..898b528 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -3,17 +3,14 @@ QA output created by 059
 === Testing invalid granularity ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: Invalid granularity, image may be corrupt
-no file open, try 'help open'
 
 === Testing too big L2 table size ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: L2 table size too big
-no file open, try 'help open'
 
 === Testing too big L1 table size ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: L1 size too big
-no file open, try 'help open'
 
 === Testing monolithicFlat creation and opening ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2147483648 subformat=monolithicFlat
diff --git a/tests/qemu-iotests/070.out b/tests/qemu-iotests/070.out
index 131a5b1..c269d99 100644
--- a/tests/qemu-iotests/070.out
+++ b/tests/qemu-iotests/070.out
@@ -4,7 +4,6 @@ QA output created by 070
 can't open device TEST_DIR/iotest-dirtylog-10G-4M.vhdx: VHDX image file 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx' opened read-only, but contains a log that needs to be replayed
 To replay the log, run:
 qemu-img check -r all 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx'
- no file open, try 'help open'
 === Verify open image replays log  ===
 read 18874368/18874368 bytes at offset 0
 18 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/075.out b/tests/qemu-iotests/075.out
index 87beae4..b234b75 100644
--- a/tests/qemu-iotests/075.out
+++ b/tests/qemu-iotests/075.out
@@ -10,29 +10,22 @@ read 512/512 bytes at offset 1048064
 
 == block_size must be a multiple of 512 ==
 can't open device TEST_DIR/simple-pattern.cloop: block_size 513 must be a multiple of 512
-no file open, try 'help open'
 
 == block_size cannot be zero ==
 can't open device TEST_DIR/simple-pattern.cloop: block_size cannot be zero
-no file open, try 'help open'
 
 == huge block_size ===
 can't open device TEST_DIR/simple-pattern.cloop: block_size 4294966784 must be 64 MB or less
-no file open, try 'help open'
 
 == offsets_size overflow ===
 can't open device TEST_DIR/simple-pattern.cloop: n_blocks 4294967295 must be 536870911 or less
-no file open, try 'help open'
 
 == refuse images that require too many offsets ===
 can't open device TEST_DIR/simple-pattern.cloop: image requires too many offsets, try increasing block size
-no file open, try 'help open'
 
 == refuse images with non-monotonically increasing offsets ==
 can't open device TEST_DIR/simple-pattern.cloop: offsets not monotonically increasing at index 1, image file is corrupt
-no file open, try 'help open'
 
 == refuse images with invalid compressed block size ==
 can't open device TEST_DIR/simple-pattern.cloop: invalid compressed block size at index 1, image file is corrupt
-no file open, try 'help open'
 *** done
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index 72645b2..9c66c5f 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -6,15 +6,12 @@ read 65536/65536 bytes at offset 0
 
 == Negative catalog size ==
 can't open device TEST_DIR/parallels-v1: Catalog too large
-no file open, try 'help open'
 
 == Overflow in catalog allocation ==
 can't open device TEST_DIR/parallels-v1: Catalog too large
-no file open, try 'help open'
 
 == Zero sectors per track ==
 can't open

[Qemu-devel] [PATCH v3 2/3] qemu-io: Add regression tests

2017-01-27 Thread Nir Soffer
From: Nir Soffer 

Add regression tests checking that qemu-io fails with a non-zero exit code
when reading a non-existent file or using the wrong format.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/173 | 59 ++
 tests/qemu-iotests/173.out |  9 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 69 insertions(+)
 create mode 100755 tests/qemu-iotests/173
 create mode 100644 tests/qemu-iotests/173.out

diff --git a/tests/qemu-iotests/173 b/tests/qemu-iotests/173
new file mode 100755
index 000..1d1fd6d
--- /dev/null
+++ b/tests/qemu-iotests/173
@@ -0,0 +1,59 @@
+#!/bin/bash
+#
+# Test that qemu-io fails with a non-zero exit code
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=nir...@gmail.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+
+
+size=256K
+_make_test_img $size
+
+echo
+echo "== reading wrong format should fail =="
+$QEMU_IO -f qcow2 -c "read 0 $size" "$TEST_IMG" 2>&1 | _filter_testdir
+test "${PIPESTATUS[0]}" -eq 1 || _fail "did not fail"
+
+echo
+echo "== reading missing file should fail =="
+$QEMU_IO -c "read 0 $size" "$TEST_DIR/missing" 2>&1 | _filter_testdir
+test "${PIPESTATUS[0]}" -eq 1 || _fail "did not fail"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/173.out b/tests/qemu-iotests/173.out
new file mode 100644
index 000..47012a3
--- /dev/null
+++ b/tests/qemu-iotests/173.out
@@ -0,0 +1,9 @@
+QA output created by 173
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=262144
+
+== reading wrong format should fail ==
+can't open device TEST_DIR/t.raw: Image is not in qcow2 format
+
+== reading missing file should fail ==
+can't open device TEST_DIR/missing: Could not open 'TEST_DIR/missing': No such file or directory
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 866c1a0..069a5f3 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -165,3 +165,4 @@
 170 rw auto quick
 171 rw auto quick
 172 auto
+173 auto
-- 
2.9.3
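Why test 173 checks "${PIPESTATUS[0]}" instead of $?: the qemu-io output is piped through _filter_testdir, and after a pipeline bash sets $? to the status of the last command (the filter), not of qemu-io. A minimal standalone sketch of that behavior follows; fake_qemu_io and filter are hypothetical stand-ins for $QEMU_IO and _filter_testdir, not the real helpers.

```shell
#!/bin/bash
# Hypothetical stand-in for "$QEMU_IO ...": prints an error on stderr and
# exits with status 1, as qemu-io does with this series when open fails.
fake_qemu_io()
{
    echo "can't open device /tmp/missing: No such file or directory" >&2
    return 1
}

# Hypothetical stand-in for _filter_testdir: rewrites the path, exits 0.
filter()
{
    sed -e 's#/tmp#TEST_DIR#g'
}

fake_qemu_io 2>&1 | filter
# Copy PIPESTATUS immediately: the next command overwrites it.
statuses=("${PIPESTATUS[@]}")

echo "qemu-io stand-in exit code: ${statuses[0]}"
echo "filter exit code (what \$? reports): ${statuses[1]}"
```

Running it prints the filtered error line, then reports 1 for the qemu-io stand-in and 0 for the filter, which is why a plain $? check would never notice the failure.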




[Qemu-devel] [PATCH v3 2/3] qemu-io: Add regression tests

2017-01-30 Thread Nir Soffer
From: Nir Soffer 

Add regression tests checking that qemu-io fails with a non-zero exit code
when reading a non-existent file or using the wrong format.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/173 | 59 ++
 tests/qemu-iotests/173.out |  9 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 69 insertions(+)
 create mode 100755 tests/qemu-iotests/173
 create mode 100644 tests/qemu-iotests/173.out

diff --git a/tests/qemu-iotests/173 b/tests/qemu-iotests/173
new file mode 100755
index 000..1d1fd6d
--- /dev/null
+++ b/tests/qemu-iotests/173
@@ -0,0 +1,59 @@
+#!/bin/bash
+#
+# Test that qemu-io fails with a non-zero exit code
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=nir...@gmail.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+
+
+size=256K
+_make_test_img $size
+
+echo
+echo "== reading wrong format should fail =="
+$QEMU_IO -f qcow2 -c "read 0 $size" "$TEST_IMG" 2>&1 | _filter_testdir
+test "${PIPESTATUS[0]}" -eq 1 || _fail "did not fail"
+
+echo
+echo "== reading missing file should fail =="
+$QEMU_IO -c "read 0 $size" "$TEST_DIR/missing" 2>&1 | _filter_testdir
+test "${PIPESTATUS[0]}" -eq 1 || _fail "did not fail"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/173.out b/tests/qemu-iotests/173.out
new file mode 100644
index 000..47012a3
--- /dev/null
+++ b/tests/qemu-iotests/173.out
@@ -0,0 +1,9 @@
+QA output created by 173
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=262144
+
+== reading wrong format should fail ==
+can't open device TEST_DIR/t.raw: Image is not in qcow2 format
+
+== reading missing file should fail ==
+can't open device TEST_DIR/missing: Could not open 'TEST_DIR/missing': No such file or directory
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 866c1a0..069a5f3 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -165,3 +165,4 @@
 170 rw auto quick
 171 rw auto quick
 172 auto
+173 auto
-- 
2.9.3




[Qemu-devel] [PATCH v3 1/3] qemu-io: Return non-zero exit code on failure

2017-01-30 Thread Nir Soffer
From: Nir Soffer 

The result of openfile() was not checked, leading to a failure deep in the
actual command with a confusing error message, and to exit code 0.

Here is a simple example - trying to read with the wrong format:

$ touch file
$ qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
no file open, try 'help open'
0

With this patch, we fail earlier with exit code 1:

$ ./qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
1

Signed-off-by: Nir Soffer 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---

Changes since v2:
- Add missing Signed-off-by
- Fix tests expecting the wrong output

 qemu-io.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 23a229f..427cbae 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -595,13 +595,17 @@ int main(int argc, char **argv)
                 exit(1);
             }
             opts = qemu_opts_to_qdict(qopts, NULL);
-            openfile(NULL, flags, writethrough, opts);
+            if (openfile(NULL, flags, writethrough, opts)) {
+                exit(1);
+            }
         } else {
             if (format) {
                 opts = qdict_new();
                 qdict_put(opts, "driver", qstring_from_str(format));
             }
-            openfile(argv[optind], flags, writethrough, opts);
+            if (openfile(argv[optind], flags, writethrough, opts)) {
+                exit(1);
+            }
         }
     }
     command_loop();
-- 
2.9.3




[Qemu-devel] [PATCH v3 3/3] qemu-io: Fix tests expecting the wrong output

2017-01-30 Thread Nir Soffer
From: Nir Soffer 

Many tests expected the wrong behavior, where qemu-io calls into the
command after failing to open the file and writes this error:

no file open, try 'help open'

Now that we fail immediately when opening the file fails, this error no
longer appears in the output; remove it from the tests' expected output.

Tested using:

./check 059 -vmdk (unrelated failure)
./check 070 -vhdx
./check 075 -cloop
./check 076 -parallels
./check 078 -bochs
./check 080 -qcow2
./check 083 -nbd
./check 088 -vpc
./check 092 -qcow
./check 116 -qed
./check 131 -parallels
./check 140 -raw
./check 140 -qcow2
./check -raw
./check -qcow2

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/059.out |  3 ---
 tests/qemu-iotests/070.out |  1 -
 tests/qemu-iotests/075.out |  7 ---
 tests/qemu-iotests/076.out |  3 ---
 tests/qemu-iotests/078.out |  6 --
 tests/qemu-iotests/080.out | 18 --
 tests/qemu-iotests/083.out | 17 -
 tests/qemu-iotests/088.out |  6 --
 tests/qemu-iotests/092.out | 12 
 tests/qemu-iotests/116.out |  7 ---
 tests/qemu-iotests/131.out |  1 -
 tests/qemu-iotests/140.out |  1 -
 12 files changed, 82 deletions(-)

diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index 678adb4..898b528 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -3,17 +3,14 @@ QA output created by 059
 === Testing invalid granularity ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: Invalid granularity, image may be corrupt
-no file open, try 'help open'
 
 === Testing too big L2 table size ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: L2 table size too big
-no file open, try 'help open'
 
 === Testing too big L1 table size ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: L1 size too big
-no file open, try 'help open'
 
 === Testing monolithicFlat creation and opening ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2147483648 subformat=monolithicFlat
diff --git a/tests/qemu-iotests/070.out b/tests/qemu-iotests/070.out
index 131a5b1..c269d99 100644
--- a/tests/qemu-iotests/070.out
+++ b/tests/qemu-iotests/070.out
@@ -4,7 +4,6 @@ QA output created by 070
 can't open device TEST_DIR/iotest-dirtylog-10G-4M.vhdx: VHDX image file 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx' opened read-only, but contains a log that needs to be replayed
 To replay the log, run:
 qemu-img check -r all 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx'
- no file open, try 'help open'
 === Verify open image replays log  ===
 read 18874368/18874368 bytes at offset 0
 18 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/075.out b/tests/qemu-iotests/075.out
index 87beae4..b234b75 100644
--- a/tests/qemu-iotests/075.out
+++ b/tests/qemu-iotests/075.out
@@ -10,29 +10,22 @@ read 512/512 bytes at offset 1048064
 
 == block_size must be a multiple of 512 ==
 can't open device TEST_DIR/simple-pattern.cloop: block_size 513 must be a multiple of 512
-no file open, try 'help open'
 
 == block_size cannot be zero ==
 can't open device TEST_DIR/simple-pattern.cloop: block_size cannot be zero
-no file open, try 'help open'
 
 == huge block_size ===
 can't open device TEST_DIR/simple-pattern.cloop: block_size 4294966784 must be 64 MB or less
-no file open, try 'help open'
 
 == offsets_size overflow ===
 can't open device TEST_DIR/simple-pattern.cloop: n_blocks 4294967295 must be 536870911 or less
-no file open, try 'help open'
 
 == refuse images that require too many offsets ===
 can't open device TEST_DIR/simple-pattern.cloop: image requires too many offsets, try increasing block size
-no file open, try 'help open'
 
 == refuse images with non-monotonically increasing offsets ==
 can't open device TEST_DIR/simple-pattern.cloop: offsets not monotonically increasing at index 1, image file is corrupt
-no file open, try 'help open'
 
 == refuse images with invalid compressed block size ==
 can't open device TEST_DIR/simple-pattern.cloop: invalid compressed block size at index 1, image file is corrupt
-no file open, try 'help open'
 *** done
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index 72645b2..9c66c5f 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -6,15 +6,12 @@ read 65536/65536 bytes at offset 0
 
 == Negative catalog size ==
 can't open device TEST_DIR/parallels-v1: Catalog too large
-no file open, try 'help open'
 
 == Overflow in catalog allocation ==
 can't open device TEST_DIR/parallels-v1: Catalog too large
-no file open, try 'help open'
 
 == Zero sectors per track ==
 can't open

Re: [Qemu-devel] [PATCH v3 1/3] qemu-io: Return non-zero exit code on failure

2017-01-31 Thread Nir Soffer
On Mon, Jan 30, 2017 at 6:44 PM, Eric Blake  wrote:
> On 01/27/2017 09:59 PM, Nir Soffer wrote:
>> From: Nir Soffer 
>>
>> The result of openfile was not checked, leading to failure deep in the
>> actual command with confusing error message, and exiting with exit code 0.
>>
>
> When posting a series, please ensure that your messages are all marked
> In-Reply-To a 0/3 cover letter (it may help if you do 'git config
> format.coverletter auto').
>
>> Here is a simple example - trying to read with the wrong format:
>>
>> $ touch file
>> $ qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
>> can't open device file: Image is not in qcow2 format
>> no file open, try 'help open'
>> 0
>>
>> With this patch, we fail earlier with exit code 1:
>>
>>     $ ./qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
>> can't open device file: Image is not in qcow2 format
>> 1
>>
>> Signed-off-by: Nir Soffer 
>> Reviewed-by: Eric Blake 
>> Reviewed-by: Fam Zheng 
>> ---
>>
>> Changes since v2:
>> - Adding missing signed-off-by
>> - Fix tests expecting the wrong output
>
> I don't see any tests changed...
>
>>
>>  qemu-io.c | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> ...in this diffstat.  If something really changed in this particular
> patch since v2, then you should drop the Reviewed-by lines in order to
> make sure I re-review it.  Or, if the changes you mention here are to
> other patches in the series, then the 0/3 cover letter would have been a
> better place to put that information.

This diffstat is a poor man's cover letter; I'll resend a proper one.

>
>>
>> diff --git a/qemu-io.c b/qemu-io.c
>> index 23a229f..427cbae 100644
>> --- a/qemu-io.c
>> +++ b/qemu-io.c
>> @@ -595,13 +595,17 @@ int main(int argc, char **argv)
>>                  exit(1);
>>              }
>>              opts = qemu_opts_to_qdict(qopts, NULL);
>> -            openfile(NULL, flags, writethrough, opts);
>> +            if (openfile(NULL, flags, writethrough, opts)) {
>> +                exit(1);
>> +            }
>>          } else {
>>              if (format) {
>>                  opts = qdict_new();
>>                  qdict_put(opts, "driver", qstring_from_str(format));
>>              }
>> -            openfile(argv[optind], flags, writethrough, opts);
>> +            if (openfile(argv[optind], flags, writethrough, opts)) {
>> +                exit(1);
>> +            }
>>          }
>>      }
>>      command_loop();
>>
>
> At any rate, I'm happy with this current patch, even if its presentation
> in a series is less than ideal, so you can keep my R-b.
>
> --
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>



[Qemu-devel] [PATCH v4 0/2] Fix qemu-io return value on failure

2017-01-31 Thread Nir Soffer
This series fixes qemu-io to fail with a non-zero exit code when it fails
to open the file.

Changes since v3:
- Add cover letter
- Squash the tests fix with the behavior change, so git bisect landing on the
  change in behavior does not hit unnecessarily-broken tests.

Tested by running qemu-io manually and by running tests/check-block.sh.
Note that test 059 has one unrelated test failure.

Nir Soffer (2):
  qemu-io: Return non-zero exit code on failure
  qemu-io: Add regression tests

 qemu-io.c  |  8 +--
 tests/qemu-iotests/059.out |  3 ---
 tests/qemu-iotests/070.out |  1 -
 tests/qemu-iotests/075.out |  7 --
 tests/qemu-iotests/076.out |  3 ---
 tests/qemu-iotests/078.out |  6 -
 tests/qemu-iotests/080.out | 18 --
 tests/qemu-iotests/083.out | 17 -
 tests/qemu-iotests/088.out |  6 -
 tests/qemu-iotests/092.out | 12 --
 tests/qemu-iotests/116.out |  7 --
 tests/qemu-iotests/131.out |  1 -
 tests/qemu-iotests/140.out |  1 -
 tests/qemu-iotests/173 | 59 ++
 tests/qemu-iotests/173.out |  9 +++
 tests/qemu-iotests/group   |  1 +
 16 files changed, 75 insertions(+), 84 deletions(-)
 create mode 100755 tests/qemu-iotests/173
 create mode 100644 tests/qemu-iotests/173.out

-- 
2.9.3




[Qemu-devel] [PATCH v4 2/2] qemu-io: Add regression tests

2017-01-31 Thread Nir Soffer
From: Nir Soffer 

Add regression tests checking that qemu-io fails with a non-zero exit code
when reading a non-existent file or using the wrong format.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/173 | 59 ++
 tests/qemu-iotests/173.out |  9 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 69 insertions(+)
 create mode 100755 tests/qemu-iotests/173
 create mode 100644 tests/qemu-iotests/173.out

diff --git a/tests/qemu-iotests/173 b/tests/qemu-iotests/173
new file mode 100755
index 000..1d1fd6d
--- /dev/null
+++ b/tests/qemu-iotests/173
@@ -0,0 +1,59 @@
+#!/bin/bash
+#
+# Test that qemu-io fails with a non-zero exit code
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=nir...@gmail.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+
+
+size=256K
+_make_test_img $size
+
+echo
+echo "== reading wrong format should fail =="
+$QEMU_IO -f qcow2 -c "read 0 $size" "$TEST_IMG" 2>&1 | _filter_testdir
+test "${PIPESTATUS[0]}" -eq 1 || _fail "did not fail"
+
+echo
+echo "== reading missing file should fail =="
+$QEMU_IO -c "read 0 $size" "$TEST_DIR/missing" 2>&1 | _filter_testdir
+test "${PIPESTATUS[0]}" -eq 1 || _fail "did not fail"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/173.out b/tests/qemu-iotests/173.out
new file mode 100644
index 000..47012a3
--- /dev/null
+++ b/tests/qemu-iotests/173.out
@@ -0,0 +1,9 @@
+QA output created by 173
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=262144
+
+== reading wrong format should fail ==
+can't open device TEST_DIR/t.raw: Image is not in qcow2 format
+
+== reading missing file should fail ==
+can't open device TEST_DIR/missing: Could not open 'TEST_DIR/missing': No such file or directory
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 866c1a0..069a5f3 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -165,3 +165,4 @@
 170 rw auto quick
 171 rw auto quick
 172 auto
+173 auto
-- 
2.9.3




[Qemu-devel] [PATCH v4 1/2] qemu-io: Return non-zero exit code on failure

2017-01-31 Thread Nir Soffer
From: Nir Soffer 

The result of openfile() was not checked, leading to a failure deep in the
actual command with a confusing error message, and to exit code 0.

Here is a simple example - trying to read with the wrong format:

$ touch file
$ qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
no file open, try 'help open'
0

With this patch, we fail earlier with exit code 1:

$ ./qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
1

Because we now fail earlier, we no longer log this error:

no file open, try 'help open'

Some tests expected it, so the line was removed from their expected output.

Signed-off-by: Nir Soffer 
---
 qemu-io.c  |  8 ++--
 tests/qemu-iotests/059.out |  3 ---
 tests/qemu-iotests/070.out |  1 -
 tests/qemu-iotests/075.out |  7 ---
 tests/qemu-iotests/076.out |  3 ---
 tests/qemu-iotests/078.out |  6 --
 tests/qemu-iotests/080.out | 18 --
 tests/qemu-iotests/083.out | 17 -
 tests/qemu-iotests/088.out |  6 --
 tests/qemu-iotests/092.out | 12 
 tests/qemu-iotests/116.out |  7 ---
 tests/qemu-iotests/131.out |  1 -
 tests/qemu-iotests/140.out |  1 -
 13 files changed, 6 insertions(+), 84 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 23a229f..427cbae 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -595,13 +595,17 @@ int main(int argc, char **argv)
                 exit(1);
             }
             opts = qemu_opts_to_qdict(qopts, NULL);
-            openfile(NULL, flags, writethrough, opts);
+            if (openfile(NULL, flags, writethrough, opts)) {
+                exit(1);
+            }
         } else {
             if (format) {
                 opts = qdict_new();
                 qdict_put(opts, "driver", qstring_from_str(format));
             }
-            openfile(argv[optind], flags, writethrough, opts);
+            if (openfile(argv[optind], flags, writethrough, opts)) {
+                exit(1);
+            }
         }
     }
     command_loop();
diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index 678adb4..898b528 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -3,17 +3,14 @@ QA output created by 059
 === Testing invalid granularity ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: Invalid granularity, image may be corrupt
-no file open, try 'help open'
 
 === Testing too big L2 table size ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: L2 table size too big
-no file open, try 'help open'
 
 === Testing too big L1 table size ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: L1 size too big
-no file open, try 'help open'
 
 === Testing monolithicFlat creation and opening ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2147483648 subformat=monolithicFlat
diff --git a/tests/qemu-iotests/070.out b/tests/qemu-iotests/070.out
index 131a5b1..c269d99 100644
--- a/tests/qemu-iotests/070.out
+++ b/tests/qemu-iotests/070.out
@@ -4,7 +4,6 @@ QA output created by 070
 can't open device TEST_DIR/iotest-dirtylog-10G-4M.vhdx: VHDX image file 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx' opened read-only, but contains a log that needs to be replayed
 To replay the log, run:
 qemu-img check -r all 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx'
- no file open, try 'help open'
 === Verify open image replays log  ===
 read 18874368/18874368 bytes at offset 0
 18 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/075.out b/tests/qemu-iotests/075.out
index 87beae4..b234b75 100644
--- a/tests/qemu-iotests/075.out
+++ b/tests/qemu-iotests/075.out
@@ -10,29 +10,22 @@ read 512/512 bytes at offset 1048064
 
 == block_size must be a multiple of 512 ==
 can't open device TEST_DIR/simple-pattern.cloop: block_size 513 must be a multiple of 512
-no file open, try 'help open'
 
 == block_size cannot be zero ==
 can't open device TEST_DIR/simple-pattern.cloop: block_size cannot be zero
-no file open, try 'help open'
 
 == huge block_size ===
 can't open device TEST_DIR/simple-pattern.cloop: block_size 4294966784 must be 64 MB or less
-no file open, try 'help open'
 
 == offsets_size overflow ===
 can't open device TEST_DIR/simple-pattern.cloop: n_blocks 4294967295 must be 536870911 or less
-no file open, try 'help open'
 
 == refuse images that require too many offsets ===
 can't open device TEST_DIR/simple-pattern.cloop: image requires too many offsets, try increasing block size
-no file open, try 'help o

[Qemu-devel] [PATCH v5 0/3] Fix qemu-io return value on failure

2017-01-31 Thread Nir Soffer
This series fixes qemu-io to fail with a non-zero exit code when it fails
to open the file.

Changes since v4:
- Added _unsupported_fmt helper
- Test any format except raw, instead of only raw
- Don't test stderr content, depends on the format
- New test moved to 174 since 173 is pending
- Private copyright for new test
- Fix commit message issues

Changes since v3:
- Add cover letter
- Squash the tests fix with the behavior change, so git bisect landing on the
  change in behavior does not hit unnecessarily-broken tests.

Tested by running qemu-io manually and by running tests/check-block.sh.
Note that test 059 has one unrelated test failure.

Nir Soffer (3):
  qemu-io: Return non-zero exit code on failure
  qemu-iotests: Add _unsupported_fmt helper
  qemu-io: Add failure regression tests

 qemu-io.c|  8 --
 tests/qemu-iotests/059.out   |  3 ---
 tests/qemu-iotests/070.out   |  1 -
 tests/qemu-iotests/075.out   |  7 --
 tests/qemu-iotests/076.out   |  3 ---
 tests/qemu-iotests/078.out   |  6 -
 tests/qemu-iotests/080.out   | 18 --
 tests/qemu-iotests/083.out   | 17 -
 tests/qemu-iotests/088.out   |  6 -
 tests/qemu-iotests/092.out   | 12 -
 tests/qemu-iotests/116.out   |  7 --
 tests/qemu-iotests/131.out   |  1 -
 tests/qemu-iotests/140.out   |  1 -
 tests/qemu-iotests/174   | 59 
 tests/qemu-iotests/174.out   |  7 ++
 tests/qemu-iotests/common.rc | 11 +
 tests/qemu-iotests/group |  1 +
 17 files changed, 84 insertions(+), 84 deletions(-)
 create mode 100755 tests/qemu-iotests/174
 create mode 100644 tests/qemu-iotests/174.out

-- 
2.9.3
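Since the v5 test no longer compares stderr (the message depends on the format), a check in that spirit can discard all output and look only at the exit status. The sketch below is illustrative only and is not the content of test 174; fake_qemu_io and expect_open_failure are hypothetical names, and the stub stands in for a real qemu-io binary so the snippet runs anywhere.

```shell
#!/bin/bash
# Hypothetical stub standing in for qemu-io: the error text varies by
# format, but any failure to open the image exits with status 1.
fake_qemu_io()
{
    echo "can't open device $*: some format-specific message" >&2
    return 1
}

QEMU_IO=fake_qemu_io

# Check only the exit status; stdout and stderr are discarded because the
# message depends on the image format under test.
expect_open_failure()
{
    "$QEMU_IO" -c "read 0 256k" "$1" >/dev/null 2>&1
    local rc=$?
    if [ "$rc" -ne 1 ]; then
        echo "did not fail: $1 (exit code $rc)"
        return 1
    fi
    echo "failed as expected: $1"
}

expect_open_failure /nonexistent/image
```

Because only the exit code is inspected, the same check works unchanged for every format the test is allowed to run with.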




[Qemu-devel] [PATCH v5 2/3] qemu-iotests: Add _unsupported_fmt helper

2017-01-31 Thread Nir Soffer
This helper allows adding tests that support any format except the
specified ones. This may be useful to test that many formats behave
in a common way.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/common.rc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 3213765..c6d5d81 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -355,6 +355,17 @@ _supported_fmt()
     _notrun "not suitable for this image format: $IMGFMT"
 }
 
+# tests whether $IMGFMT is one of the unsupported image formats for a test
+#
+_unsupported_fmt()
+{
+    for f; do
+        if [ "$f" = "$IMGFMT" ]; then
+            _notrun "not suitable for this image format: $IMGFMT"
+        fi
+    done
+}
+
 # tests whether $IMGPROTO is one of the supported image protocols for a test
 #
 _supported_proto()
-- 
2.9.3
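To see how _unsupported_fmt behaves, the sketch below copies the helper from the patch and pairs it with a stub _notrun. The stub is an assumption for illustration: the real _notrun in common.rc is assumed to record the reason and stop the test without counting it as a failure, which the stub approximates with a message and "exit 0".

```shell
#!/bin/bash
# Hypothetical stub for common.rc's _notrun: report the reason and stop.
_notrun()
{
    echo "[not run] $*"
    exit 0
}

# Helper as posted in the patch.
_unsupported_fmt()
{
    for f; do
        if [ "$f" = "$IMGFMT" ]; then
            _notrun "not suitable for this image format: $IMGFMT"
        fi
    done
}

# Each check runs in a command-substitution subshell, so the stub's
# "exit" ends only that subshell, not this script.
raw_result=$(IMGFMT=raw; _unsupported_fmt raw vmdk)
qcow2_result=$(IMGFMT=qcow2; _unsupported_fmt raw vmdk)

echo "raw:   ${raw_result:-runs}"
echo "qcow2: ${qcow2_result:-runs}"
```

With raw listed as unsupported, the raw run is skipped with the "not suitable" reason, while qcow2 falls through the loop and the test would continue.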




[Qemu-devel] [PATCH v5 1/3] qemu-io: Return non-zero exit code on failure

2017-01-31 Thread Nir Soffer
The result of openfile() was not checked, leading to a failure deep in the
actual command with a confusing error message, and to exit code 0.

Here is a simple example - trying to read with the wrong format:

$ touch file
$ qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
no file open, try 'help open'
0

With this patch, we fail earlier with exit code 1:

$ ./qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
1

Because we now fail earlier, we no longer log this error:

no file open, try 'help open'

Some tests expected it, so the line was removed from their expected output.

Signed-off-by: Nir Soffer 
Reviewed-by: Eric Blake 
---
 qemu-io.c  |  8 ++--
 tests/qemu-iotests/059.out |  3 ---
 tests/qemu-iotests/070.out |  1 -
 tests/qemu-iotests/075.out |  7 ---
 tests/qemu-iotests/076.out |  3 ---
 tests/qemu-iotests/078.out |  6 --
 tests/qemu-iotests/080.out | 18 --
 tests/qemu-iotests/083.out | 17 -
 tests/qemu-iotests/088.out |  6 --
 tests/qemu-iotests/092.out | 12 
 tests/qemu-iotests/116.out |  7 ---
 tests/qemu-iotests/131.out |  1 -
 tests/qemu-iotests/140.out |  1 -
 13 files changed, 6 insertions(+), 84 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 23a229f..427cbae 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -595,13 +595,17 @@ int main(int argc, char **argv)
                 exit(1);
             }
             opts = qemu_opts_to_qdict(qopts, NULL);
-            openfile(NULL, flags, writethrough, opts);
+            if (openfile(NULL, flags, writethrough, opts)) {
+                exit(1);
+            }
         } else {
             if (format) {
                 opts = qdict_new();
                 qdict_put(opts, "driver", qstring_from_str(format));
             }
-            openfile(argv[optind], flags, writethrough, opts);
+            if (openfile(argv[optind], flags, writethrough, opts)) {
+                exit(1);
+            }
         }
     }
     command_loop();
diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index 678adb4..898b528 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -3,17 +3,14 @@ QA output created by 059
 === Testing invalid granularity ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: Invalid granularity, image may be corrupt
-no file open, try 'help open'
 
 === Testing too big L2 table size ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: L2 table size too big
-no file open, try 'help open'
 
 === Testing too big L1 table size ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: L1 size too big
-no file open, try 'help open'
 
 === Testing monolithicFlat creation and opening ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2147483648 subformat=monolithicFlat
diff --git a/tests/qemu-iotests/070.out b/tests/qemu-iotests/070.out
index 131a5b1..c269d99 100644
--- a/tests/qemu-iotests/070.out
+++ b/tests/qemu-iotests/070.out
@@ -4,7 +4,6 @@ QA output created by 070
 can't open device TEST_DIR/iotest-dirtylog-10G-4M.vhdx: VHDX image file 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx' opened read-only, but contains a log that needs to be replayed
 To replay the log, run:
 qemu-img check -r all 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx'
- no file open, try 'help open'
 === Verify open image replays log  ===
 read 18874368/18874368 bytes at offset 0
 18 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/075.out b/tests/qemu-iotests/075.out
index 87beae4..b234b75 100644
--- a/tests/qemu-iotests/075.out
+++ b/tests/qemu-iotests/075.out
@@ -10,29 +10,22 @@ read 512/512 bytes at offset 1048064
 
 == block_size must be a multiple of 512 ==
 can't open device TEST_DIR/simple-pattern.cloop: block_size 513 must be a multiple of 512
-no file open, try 'help open'
 
 == block_size cannot be zero ==
 can't open device TEST_DIR/simple-pattern.cloop: block_size cannot be zero
-no file open, try 'help open'
 
 == huge block_size ===
 can't open device TEST_DIR/simple-pattern.cloop: block_size 4294966784 must be 64 MB or less
-no file open, try 'help open'
 
 == offsets_size overflow ===
 can't open device TEST_DIR/simple-pattern.cloop: n_blocks 4294967295 must be 536870911 or less
-no file open, try 'help open'
 
 == refuse images that require too many offsets ===
 can't open device TEST_DIR/simple-pattern.cloop: image requires too many offsets, try increasing block size
-no file open, try 'help o

[Qemu-devel] [PATCH v5 3/3] qemu-io: Add failure regression tests

2017-01-31 Thread Nir Soffer
Add regression tests checking that qemu-io fails with a non-zero exit code
when reading a non-existent file or using the wrong image format.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/174 | 59 ++
 tests/qemu-iotests/174.out |  7 ++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 67 insertions(+)
 create mode 100755 tests/qemu-iotests/174
 create mode 100644 tests/qemu-iotests/174.out

diff --git a/tests/qemu-iotests/174 b/tests/qemu-iotests/174
new file mode 100755
index 000..c1c20a1
--- /dev/null
+++ b/tests/qemu-iotests/174
@@ -0,0 +1,59 @@
+#!/bin/bash
+#
+# Test that qemu-io fails with non-zero exit code
+#
+# Copyright (C) 2017 Nir Soffer 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=nir...@gmail.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_unsupported_fmt raw
+
+
+size=256K
+IMGFMT=raw IMGOPTS= _make_test_img $size | _filter_imgfmt
+
+echo
+echo "== reading wrong format should fail =="
+$QEMU_IO -f $IMGFMT -c "read 0 $size" "$TEST_IMG" 2>/dev/null
+test $? -eq 1 || _fail "did not fail"
+
+echo
+echo "== reading missing file should fail =="
+$QEMU_IO -c "read 0 $size" "$TEST_DIR/missing" 2>/dev/null
+test $? -eq 1 || _fail "did not fail"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/174.out b/tests/qemu-iotests/174.out
new file mode 100644
index 000..a06d237
--- /dev/null
+++ b/tests/qemu-iotests/174.out
@@ -0,0 +1,7 @@
+QA output created by 174
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=262144
+
+== reading wrong format should fail ==
+
+== reading missing file should fail ==
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 866c1a0..1732a8b 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -165,3 +165,4 @@
 170 rw auto quick
 171 rw auto quick
 172 auto
+174 auto
-- 
2.9.3
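Test 174 checks the failure with "test $? -eq 1". The same check can be made from C, for example in a small harness that runs qemu-io directly. A minimal sketch, assuming a POSIX system; run_exit_status() is a hypothetical helper, not part of the qemu-iotests framework.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] (looked up in PATH) with the given arguments and return
 * its exit code, like $? in the shell. Returns -1 if fork() or
 * waitpid() fails or the child was killed by a signal. As in the
 * shell, 127 means the command could not be executed.
 */
static int run_exit_status(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the qemu-io change above applied, calling it with {"qemu-io", "-c", "read 0 256K", "/path/to/missing", NULL} would be expected to return 1, matching the second check in the test.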




[Qemu-devel] [PATCH] qemu-img: Do not truncate before preallocation

2017-02-03 Thread Nir Soffer
When using a file system that does not support fallocate() (e.g. NFS <
4.2), truncating the file only when preallocation=off speeds up creating
a raw file.

Here is an example run, tested on a Fedora 24 machine, creating a raw file
on an NFS version 3 server.

$ time ./qemu-img-master create -f raw -o preallocation=falloc mnt/test 1g
Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc

real    0m21.185s
user    0m0.022s
sys     0m0.574s

$ time ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 1g
Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc

real    0m11.601s
user    0m0.016s
sys     0m0.525s

$ time dd if=/dev/zero of=mnt/test bs=1M count=1024 oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.6627 s, 68.6 MB/s

real    0m16.104s
user    0m0.009s
sys     0m0.220s

Running with strace we can see that without this change we do one
pread() and one pwrite() for each block. With this change, we do only
one pwrite() per block.

$ strace ./qemu-img-master create -f raw -o preallocation=falloc mnt/test 8192
...
pread64(9, "\0", 1, 4095)   = 1
pwrite64(9, "\0", 1, 4095)  = 1
pread64(9, "\0", 1, 8191)   = 1
pwrite64(9, "\0", 1, 8191)  = 1

$ strace ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 8192
...
pwrite64(9, "\0", 1, 4095)  = 1
pwrite64(9, "\0", 1, 8191)  = 1

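This happens because the posix_fallocate() emulation in glibc, used
when the file system does not support fallocate(), reads one byte from
each block that lies inside the current file size to check whether the
block is already allocated, and writes a byte only if it reads back
zero. After truncating the file to its final size, every block lies
inside the file and reads back as zero, so each block costs a pread()
and a pwrite(). Without truncating, the file is empty and only the
pwrite() is needed.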
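The per-block behaviour described above can be modelled in a few lines of C. This is a simplified sketch, not the glibc source: touch_blocks(), count_touches() and the fixed 4096-byte block size are assumptions for illustration. For an 8192-byte file it is intended to reproduce the strace pattern: two pwrite() calls and no pread() when the file starts empty, and two pread() plus two pwrite() calls after ftruncate().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

struct touch_counts {
    int preads;
    int pwrites;
};

/*
 * Simplified model of a posix_fallocate() fallback: touch the last
 * byte of every blksize block in [0, len). Only blocks below the
 * current file size are read first; a non-zero byte means the block
 * already holds data and is skipped. Returns 0, or -1 with errno set.
 */
static int touch_blocks(int fd, off_t len, off_t blksize,
                        struct touch_counts *c)
{
    struct stat st;

    assert(blksize > 0 && c != NULL);
    if (fstat(fd, &st) != 0)
        return -1;

    for (off_t off = blksize - 1; off < len; off += blksize) {
        if (off < st.st_size) {
            unsigned char byte;
            ssize_t n;

            c->preads++;
            n = pread(fd, &byte, 1, off);
            if (n < 0)
                return -1;
            if (n == 1 && byte != 0)
                continue;               /* block already allocated */
        }
        c->pwrites++;
        if (pwrite(fd, "", 1, off) != 1)    /* "" holds one NUL byte */
            return -1;
    }
    return 0;
}

/*
 * Create (or empty) path, optionally ftruncate() it to size first as
 * qemu-img did before this patch, then touch every 4096-byte block.
 */
static int count_touches(const char *path, off_t size, int truncate_first,
                         struct touch_counts *c)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    int ret = -1;

    if (fd < 0)
        return -1;
    if (!truncate_first || ftruncate(fd, size) == 0)
        ret = touch_blocks(fd, size, 4096, c);
    close(fd);
    unlink(path);
    return ret;
}
```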

Signed-off-by: Nir Soffer 
---

I sent this a week ago:
http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg06123.html

Sending again with improved commit message.

 block/file-posix.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 2134e0e..442f080 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1591,12 +1591,6 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 #endif
 }
 
-if (ftruncate(fd, total_size) != 0) {
-result = -errno;
-error_setg_errno(errp, -result, "Could not resize file");
-goto out_close;
-}
-
 switch (prealloc) {
 #ifdef CONFIG_POSIX_FALLOCATE
 case PREALLOC_MODE_FALLOC:
@@ -1636,6 +1630,10 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 break;
 }
 case PREALLOC_MODE_OFF:
+if (ftruncate(fd, total_size) != 0) {
+result = -errno;
+error_setg_errno(errp, -result, "Could not resize file");
+}
 break;
 default:
 result = -EINVAL;
@@ -1644,7 +1642,6 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 break;
 }
 
-out_close:
 if (qemu_close(fd) != 0 && result == 0) {
 result = -errno;
 error_setg_errno(errp, -result, "Could not close the new file");
-- 
2.9.3




Re: [Qemu-devel] [PATCH v3 2/3] qemu-io: Add regression tests

2017-02-06 Thread Nir Soffer
This was sent by mistake with --no-thread and without a cover letter; I
was confused by the instructions in the wiki warning not to send
multiple patches in the same thread.

I already sent v4 and v5 properly.

Thanks for the comments,
Nir

On Mon, Feb 6, 2017 at 12:20 PM, Fam Zheng  wrote:
> On Sat, 01/28 05:59, Nir Soffer wrote:
>> From: Nir Soffer 
>>
>> Add regression tests checking that qemu-io fail with non-zero exit code
>> when reading non-existing file or using the wrong format.
>>
>> Signed-off-by: Nir Soffer 
>
> This message is not correctly threaded as a reply to a v3 cover letter, and it's
> hard to review. Please check that your git command lines conform to the
> instructions in
>
> http://wiki.qemu-project.org/Contribute/SubmitAPatch#Submitting_your_Patches
>
> In particular, --cover-letter and --thread should be used with git format-patch.
>
> Fam



Re: [Qemu-devel] Estimation of qcow2 image size converted from raw image

2017-02-15 Thread Nir Soffer
On Wed, Feb 15, 2017 at 5:14 PM, Stefan Hajnoczi  wrote:
> On Mon, Feb 13, 2017 at 05:46:19PM +0200, Maor Lipchuk wrote:
>> I was wondering if that is possible to provide a new API that
>> estimates the size of qcow2 image converted from a raw image. We could
>> use this new API to allocate the size more precisely before the convert
>> operation.
>>
> [...]
>> We think that the best way to solve this issue is to return this info
>> from qemu-img, maybe as a flag to qemu-img convert that will
>> calculate the size of the converted image without doing any writes.
>
> Sounds reasonable.  qcow2 actually already does some of this calculation
> internally for image preallocation in qcow2_create2().
>
> Let's try this syntax:
>
>   $ qemu-img query-max-size -f raw -O qcow2 input.raw
>   1234678000

This is a little bit verbose compared to other commands
(e.g. info, check, convert).

Since this is needed only during convert, maybe this can be
a convert flag?

qemu-img convert -f xxx -O yyy src dst --estimate-size --output json
{
"estimated size": 1234678000
}

> As John explained, it is only an estimate.  But it will be a
> conservative maximum.
>
> Internally BlockDriver needs a new interface:
>
> struct BlockDriver {
> /*
>  * Return a conservative estimate of the maximum host file size
>  * required by a new image given an existing BlockDriverState (not
>  * necessarily opened with this BlockDriver).
>  */
> uint64_t (*bdrv_query_max_size)(BlockDriverState *other_bs,
> Error **errp);
> };
>
> This interface allows individual block drivers to probe other_bs in
> whatever way necessary (e.g. querying block allocation status).
>
> Since this is a conservative max estimate there's no need to read all
> data to check for zero regions.  We should give the best estimate that
> can be generated quickly.

I think we need to check allocation (e.g. with SEEK_DATA); I hope this
is what you mean by not reading all the data.
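Checking allocation without reading the data can be sketched with lseek(2) SEEK_DATA/SEEK_HOLE. A minimal C sketch, assuming Linux and glibc; allocated_bytes() and allocated_bytes_path() are hypothetical helpers, not part of the proposed BlockDriver interface. For a raw source image, the result approximates how much guest data a converted image would need before adding the target format's metadata.

```c
#define _GNU_SOURCE             /* SEEK_DATA / SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the number of bytes in data extents of fd, walking the file
 * with lseek(SEEK_DATA) / lseek(SEEK_HOLE), or -1 on error. Only the
 * file system's extent information is consulted; no data is read.
 * File systems without hole reporting typically report the whole file
 * as data, which still gives a usable upper bound.
 */
static off_t allocated_bytes(int fd)
{
    off_t end = lseek(fd, 0, SEEK_END);
    off_t pos = 0, total = 0;

    if (end < 0)
        return -1;

    while (pos < end) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)
                break;          /* only a hole remains after pos */
            return -1;
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        total += hole - data;
        pos = hole;
    }
    assert(total <= end);
    return total;
}

/* Same as allocated_bytes(), for a path opened read-only. */
static off_t allocated_bytes_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    off_t n;
    int saved_errno;

    if (fd < 0)
        return -1;
    n = allocated_bytes(fd);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return n;
}
```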

Nir



Re: [Qemu-devel] Estimation of qcow2 image size converted from raw image

2017-02-15 Thread Nir Soffer
On Wed, Feb 15, 2017 at 5:20 PM, Daniel P. Berrange  wrote:
> On Wed, Feb 15, 2017 at 03:14:19PM +, Stefan Hajnoczi wrote:
>> On Mon, Feb 13, 2017 at 05:46:19PM +0200, Maor Lipchuk wrote:
>> > I was wondering if that is possible to provide a new API that
>> > estimates the size of qcow2 image converted from a raw image. We could
>> > use this new API to allocate the size more precisely before the convert
>> > operation.
>> >
>> [...]
>> > We think that the best way to solve this issue is to return this info
>> > from qemu-img, maybe as a flag to qemu-img convert that will
>> > calculate the size of the converted image without doing any writes.
>>
>> Sounds reasonable.  qcow2 actually already does some of this calculation
>> internally for image preallocation in qcow2_create2().
>>
>> Let's try this syntax:
>>
>>   $ qemu-img query-max-size -f raw -O qcow2 input.raw
>>   1234678000
>>
>> As John explained, it is only an estimate.  But it will be a
>> conservative maximum.
>
> This forces you to have an input file. It would be nice to be able to
> get the same information by merely giving the desired capacity e.g
>
>   $ qemu-img query-max-size -O qcow2 20G

Without a file, this will have to assume that all clusters will be allocated.

Do you have a use case for not using an existing file?

For oVirt, we need this when converting a file from one storage to another;
the capabilities of both the source and the destination storage matter.
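When only a virtual size is given, every cluster has to be counted as allocated. A rough worst-case estimate for qcow2 can be sketched in C under stated assumptions: 8-byte L1/L2 and refcount-table entries, 16-bit refcounts and a one-cluster header, with snapshots, bitmaps and header extensions ignored. qcow2_worst_case_size() is a hypothetical name; this is not the qemu-img implementation and real images may differ.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/*
 * Rough worst-case host size of a qcow2 image whose virtual_size bytes
 * are all allocated. Assumes 8-byte table entries, 16-bit refcounts
 * and a one-cluster header; ignores snapshots and extensions.
 */
static uint64_t qcow2_worst_case_size(uint64_t virtual_size,
                                      uint64_t cluster_size)
{
    assert(cluster_size >= 512 && (cluster_size & (cluster_size - 1)) == 0);

    uint64_t data = div_round_up(virtual_size, cluster_size);
    uint64_t l2 = div_round_up(data, cluster_size / 8);   /* L2 tables */
    uint64_t l1 = div_round_up(l2 * 8, cluster_size);     /* L1 table */
    uint64_t total = 1 + l1 + l2 + data;                  /* + header */

    /*
     * Refcount blocks must also cover themselves and the refcount
     * table, so grow both until the count stops changing.
     */
    uint64_t refs_per_block = cluster_size / 2;           /* 16-bit */
    uint64_t rc_blocks = 0, rc_table = 0, prev;
    do {
        prev = rc_blocks + rc_table;
        rc_blocks = div_round_up(total + prev, refs_per_block);
        rc_table = div_round_up(rc_blocks * 8, cluster_size);
    } while (rc_blocks + rc_table != prev);

    return (total + rc_blocks + rc_table) * cluster_size;
}
```

With 64 KiB clusters, a 1 GiB virtual size gives 16390 clusters (1074135040 bytes) under these assumptions: 16384 data clusters plus 6 metadata clusters.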

(Adding all)

Nir



Re: [Qemu-devel] [PATCH] qemu-img: Do not truncate before preallocation

2017-02-16 Thread Nir Soffer
Ping

On Fri, Feb 3, 2017 at 9:50 PM, Nir Soffer  wrote:
> When using file system that does not support fallocate() (e.g. NFS <
> 4.2), truncating the file only when preallocation=OFF speeds up creating
> raw file.
>
> [...]



Re: [Qemu-devel] [PATCH] qemu-img: Do not truncate before preallocation

2017-02-16 Thread Nir Soffer
On Thu, Feb 16, 2017 at 7:52 PM, Kevin Wolf  wrote:
> Am 03.02.2017 um 20:50 hat Nir Soffer geschrieben:
>> When using file system that does not support fallocate() (e.g. NFS <
>> 4.2), truncating the file only when preallocation=OFF speeds up creating
>> raw file.
>>
>> [...]
>>
>> Signed-off-by: Nir Soffer 
>
> Thanks, applied to the block branch.
>
> I'm not completely sure if doing an ftruncate() first couldn't improve
> PREALLOC_MODE_FULL somewhat in some cases, but I agree that the patch
> should still result in correct images.

Good point, I'll do some tests with full mode to check this.

Do you know which cases can benefit from ftruncate() before full preallocation?

Nir



[Qemu-devel] [PATCH 2/3] qemu-img: Truncate before full preallocation

2017-02-16 Thread Nir Soffer
In commit 10ddfe7b6044 (qemu-img: Do not truncate before preallocation)
we moved truncate to the PREALLOC_MODE_OFF branch to avoid a slowdown in
posix_fallocate().

However, this change is not optimal when using PREALLOC_MODE_FULL, since
knowing the final size from the beginning could allow the file system
driver to do fewer allocations and possibly avoid fragmentation of the
file.

Now we also truncate before doing full preallocation.
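A standalone C sketch of how the three modes handle the file size once this series is applied: off only truncates, falloc relies on posix_fallocate() alone, and full truncates before writing zeros in 64 KiB chunks. preallocate() and enum prealloc_mode are hypothetical names for illustration; the real code is raw_create() in block/file-posix.c.

```c
#define _XOPEN_SOURCE 700       /* posix_fallocate(), pwrite() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

enum prealloc_mode { PREALLOC_OFF, PREALLOC_FALLOC, PREALLOC_FULL };

/* Size fd to size bytes using mode; returns 0 or a negative errno. */
static int preallocate(int fd, off_t size, enum prealloc_mode mode)
{
    assert(size >= 0);

    switch (mode) {
    case PREALLOC_OFF:
        /* Sparse file: only set the size. */
        return ftruncate(fd, size) == 0 ? 0 : -errno;
    case PREALLOC_FALLOC:
        /*
         * No ftruncate() first: it would make the emulated fallback
         * read every block before writing it. posix_fallocate()
         * returns an error number instead of setting errno, and
         * rejects len == 0, so an empty file is handled here.
         */
        return size == 0 ? 0 : -posix_fallocate(fd, 0, size);
    case PREALLOC_FULL: {
        /* Set the final size first, then write zeros. */
        static const char zeros[65536];
        off_t off = 0;

        if (ftruncate(fd, size) != 0)
            return -errno;
        while (off < size) {
            size_t len = size - off < (off_t)sizeof(zeros)
                         ? (size_t)(size - off) : sizeof(zeros);
            ssize_t n = pwrite(fd, zeros, len, off);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -errno;
            }
            if (n == 0)
                return -EIO;    /* no progress, avoid looping forever */
            off += n;
        }
        return 0;
    }
    }
    return -EINVAL;
}
```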

Signed-off-by: Nir Soffer 
---
 block/file-posix.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 442f080..d24e34b 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1604,6 +1604,17 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 #endif
 case PREALLOC_MODE_FULL:
 {
+/*
+ * Knowing the final size from the beginning could allow the file
+ * system driver to do fewer allocations and possibly avoid
+ * fragmentation of the file.
+ */
+if (ftruncate(fd, total_size) != 0) {
+result = -errno;
+error_setg_errno(errp, -result, "Could not resize file");
+goto out_close;
+}
+
 int64_t num = 0, left = total_size;
 buf = g_malloc0(65536);
 
@@ -1642,6 +1653,7 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 break;
 }
 
+out_close:
 if (qemu_close(fd) != 0 && result == 0) {
 result = -errno;
 error_setg_errno(errp, -result, "Could not close the new file");
-- 
2.9.3




[Qemu-devel] [PATCH 0/3] qemu-img raw preallocation

2017-02-16 Thread Nir Soffer
This series adds missing tests for raw image preallocation, refines
preallocation=full and improves the documentation.

Created on top of commit 10ddfe7b6044 (qemu-img: Do not truncate
before preallocation) from Kevin's block branch.

Nir Soffer (3):
  qemu-img: Add tests for raw image preallocation
  qemu-img: Truncate before full preallocation
  qemu-img: Improve documentation for PREALLOC_MODE_FALLOC

 block/file-posix.c | 19 ++-
 tests/qemu-iotests/175 | 61 ++
 tests/qemu-iotests/175.out | 18 ++
 tests/qemu-iotests/group   |  1 +
 4 files changed, 98 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/175
 create mode 100644 tests/qemu-iotests/175.out

-- 
2.9.3




[Qemu-devel] [PATCH 3/3] qemu-img: Improve documentation for PREALLOC_MODE_FALLOC

2017-02-16 Thread Nir Soffer
Now that we are truncating the file in both PREALLOC_MODE_FULL and
PREALLOC_MODE_OFF, not truncating in PREALLOC_MODE_FALLOC looks odd.
Add a comment explaining why we do not truncate in this case.

Signed-off-by: Nir Soffer 
---
 block/file-posix.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index d24e34b..20a261f 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1594,9 +1594,14 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 switch (prealloc) {
 #ifdef CONFIG_POSIX_FALLOCATE
 case PREALLOC_MODE_FALLOC:
-/* posix_fallocate() doesn't set errno. */
+/*
+ * Truncating before posix_fallocate() makes it about twice as slow on
+ * file systems that do not support fallocate(), since it then checks
+ * whether each block is allocated before allocating it.
+ */
 result = -posix_fallocate(fd, 0, total_size);
 if (result != 0) {
+/* posix_fallocate() doesn't set errno. */
 error_setg_errno(errp, -result,
  "Could not preallocate data for the new file");
 }
-- 
2.9.3




[Qemu-devel] [PATCH 1/3] qemu-img: Add tests for raw image preallocation

2017-02-16 Thread Nir Soffer
Add tests for creating a raw image with and without the preallocation
option.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/175 | 61 ++
 tests/qemu-iotests/175.out | 18 ++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 80 insertions(+)
 create mode 100755 tests/qemu-iotests/175
 create mode 100644 tests/qemu-iotests/175.out

diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
new file mode 100755
index 000..ca56e82
--- /dev/null
+++ b/tests/qemu-iotests/175
@@ -0,0 +1,61 @@
+#!/bin/bash
+#
+# Test creating raw images with each preallocation mode
+#
+# Copyright (C) 2017 Nir Soffer 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=nir...@gmail.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+size=1m
+
+echo
+echo "== creating image with default preallocation =="
+_make_test_img $size | _filter_imgfmt
+stat -c "size=%s, blocks=%b" $TEST_IMG
+
+for mode in off full falloc; do
+echo
+echo "== creating image with preallocation $mode =="
+IMGOPTS=preallocation=$mode _make_test_img $size | _filter_imgfmt
+stat -c "size=%s, blocks=%b" $TEST_IMG
+done
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/175.out b/tests/qemu-iotests/175.out
new file mode 100644
index 000..76c02c6
--- /dev/null
+++ b/tests/qemu-iotests/175.out
@@ -0,0 +1,18 @@
+QA output created by 175
+
+== creating image with default preallocation ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+size=1048576, blocks=0
+
+== creating image with preallocation off ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=off
+size=1048576, blocks=0
+
+== creating image with preallocation full ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=full
+size=1048576, blocks=2048
+
+== creating image with preallocation falloc ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=falloc
+size=1048576, blocks=2048
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 985b9a6..1f4bf03 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -167,3 +167,4 @@
 172 auto
 173 rw auto
 174 auto
+175 auto quick
-- 
2.9.3
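The expected values in 175.out follow from how stat(1) reports allocation: %b counts blocks of %B bytes, which is 512 on Linux, so a fully allocated 1 MiB image shows blocks=2048 (1048576 / 512) and the sparse images show blocks=0 in the expected output. The same numbers can be read from C with stat(2); size_and_blocks() is a hypothetical helper, sketched assuming Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Format "size=..., blocks=..." like stat -c "size=%s, blocks=%b".
 * On Linux st_blocks is in 512-byte units, independent of the file
 * system block size. Returns 0, or -1 if stat() fails.
 */
static int size_and_blocks(const char *path, char *out, size_t outlen)
{
    struct stat st;

    assert(path != NULL && out != NULL && outlen > 0);
    if (stat(path, &st) != 0)
        return -1;
    snprintf(out, outlen, "size=%lld, blocks=%lld",
             (long long)st.st_size, (long long)st.st_blocks);
    return 0;
}
```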




Re: [Qemu-devel] [PATCH 1/2] Update README to accommodate markdown format

2017-02-16 Thread Nir Soffer
On Fri, Feb 17, 2017 at 2:54 AM, Pranith Kumar  wrote:
> Signed-off-by: Pranith Kumar 
> ---
>  README | 44 +---
>  1 file changed, 21 insertions(+), 23 deletions(-)
>
> diff --git a/README b/README
> index cb60d05bee..225afd6be7 100644
> --- a/README
> +++ b/README
> @@ -1,5 +1,5 @@
> - QEMU README
> - ===
> +QEMU
> +
>
>  QEMU is a generic and open source machine & userspace emulator and
>  virtualizer.
> @@ -31,22 +31,22 @@ version 2. For full licensing details, consult the LICENSE file.
>
>
>  Building
> -
> +
>
>  QEMU is multi-platform software intended to be buildable on all modern
>  Linux platforms, OS-X, Win32 (via the Mingw64 toolchain) and a variety
>  of other UNIX targets. The simple steps to build QEMU are:
>
> -  mkdir build
> -  cd build
> -  ../configure
> -  make
> +mkdir build
> +cd build
> +../configure
> +make
>
>  Additional information can also be found online via the QEMU website:
>
> -  http://qemu-project.org/Hosts/Linux
> -  http://qemu-project.org/Hosts/Mac
> -  http://qemu-project.org/Hosts/W32
> +  - http://qemu-project.org/Hosts/Linux
> +  - http://qemu-project.org/Hosts/Mac
> +  - http://qemu-project.org/Hosts/W32
>
>
>  Submitting patches

You missed this title

> @@ -54,7 +54,7 @@ Submitting patches
>
>  The QEMU source code is maintained under the GIT version control system.
>
> -   git clone git://git.qemu-project.org/qemu.git
> +git clone git://git.qemu-project.org/qemu.git
>
>  When submitting patches, the preferred approach is to use 'git
>  format-patch' and/or 'git send-email' to format & send the mail to the
> @@ -65,18 +65,18 @@ guidelines set out in the HACKING and CODING_STYLE files.
>  Additional information on submitting patches can be found online via
>  the QEMU website
>
> -  http://qemu-project.org/Contribute/SubmitAPatch
> -  http://qemu-project.org/Contribute/TrivialPatches
> +  - http://qemu-project.org/Contribute/SubmitAPatch
> +  - http://qemu-project.org/Contribute/TrivialPatches
>
>
>  Bug reporting
> -=
> +-
>
>  The QEMU project uses Launchpad as its primary upstream bug tracker. Bugs
>  found when running code built from QEMU git or upstream released sources
>  should be reported via:
>
> -  https://bugs.launchpad.net/qemu/
> +  - https://bugs.launchpad.net/qemu/
>
>  If using QEMU via an operating system vendor pre-built binary package, it
>  is preferable to report bugs to the vendor's own bug tracker first. If
> @@ -85,22 +85,20 @@ reported via launchpad.
>
>  For additional information on bug reporting consult:
>
> -  http://qemu-project.org/Contribute/ReportABug
> +  - http://qemu-project.org/Contribute/ReportABug
>
>
>  Contact
> -===
> +---
>
>  The QEMU community can be contacted in a number of ways, with the two
>  main methods being email and IRC
>
> - - qemu-devel@nongnu.org
> -   http://lists.nongnu.org/mailman/listinfo/qemu-devel
> - - #qemu on irc.oftc.net
> + - Mailing List: qemu-devel@nongnu.org
> + - Archives: http://lists.nongnu.org/mailman/listinfo/qemu-devel
> + - IRC: #qemu on irc.oftc.net
>
>  Information on additional methods of contacting the community can be
>  found online via the QEMU website:
>
> -  http://qemu-project.org/Contribute/StartHere
> -
> --- End
> +  - http://qemu-project.org/Contribute/StartHere

Much nicer now!

Nir



Re: [Qemu-devel] [PATCH 1/3] qemu-img: Add tests for raw image preallocation

2017-02-17 Thread Nir Soffer
On Fri, Feb 17, 2017 at 11:14 AM, Kevin Wolf  wrote:
> Am 17.02.2017 um 01:51 hat Nir Soffer geschrieben:
>> Add tests for creating raw image with and without the preallocation
>> option.
>>
>> Signed-off-by: Nir Soffer 
>
> Looks good, but 175 is already (multiply) taken. Not making this a
> blocker, but I just want to remind everyone to check the mailing list
> for pending patches which add new tests before using a new number in
> order to avoid unnecessary rebases for everyone. In general, it's as
> easy as searching for the string "175.out" in the mailbox.
>
> The next free one seems to be 177 currently.

Thanks, will change to 177 in the next version.

For future patches, what do you mean by "pending"? Patches sent
to the block mailing list?

Nir



Re: [Qemu-devel] [PATCH v2] qmp-shell: add persistent command history

2017-03-03 Thread Nir Soffer
On Fri, Mar 3, 2017 at 8:54 PM, John Snow  wrote:
> Use the existing readline history function we are utilizing
> to provide persistent command history across instances of qmp-shell.
>
> This assists entering debug commands across sessions that may be
> interrupted by QEMU sessions terminating, where the qmp-shell has
> to be relaunched.
>
> Signed-off-by: John Snow 
> ---
>
> v2: Adjusted the errors to whine about non-ENOENT errors, but still
> intercept all errors as non-fatal.
> Save history atexit() to match bash standard behavior
>
>  scripts/qmp/qmp-shell | 19 +++
>  1 file changed, 19 insertions(+)
>
> diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
> index 0373b24..55a8285 100755
> --- a/scripts/qmp/qmp-shell
> +++ b/scripts/qmp/qmp-shell
> @@ -70,6 +70,9 @@ import json
>  import ast
>  import readline
>  import sys
> +import os
> +import errno
> +import atexit
>
>  class QMPCompleter(list):
>  def complete(self, text, state):
> @@ -109,6 +112,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>  self._pretty = pretty
>  self._transmode = False
>  self._actions = list()
> +self._histfile = os.path.join(os.path.expanduser('~'), '.qmp_history')
>
>  def __get_address(self, arg):
>  """
> @@ -137,6 +141,21 @@ class QMPShell(qmp.QEMUMonitorProtocol):
> # XXX: default delimiters conflict with some command names (eg. query-),
>  # clearing everything as it doesn't seem to matter
>  readline.set_completer_delims('')
> +try:
> +readline.read_history_file(self._histfile)
> +except Exception as e:
> +if isinstance(e, IOError) and e.errno == errno.ENOENT:
> +# File not found. No problem.
> +pass
> +else:
> +print "Failed to read history '%s'; %s" % (self._histfile, e)

I would handle only IOError, since any other error means a bug in this code
or in the underlying readline library, and the best way to handle this is to
let it fail loudly.

> +atexit.register(self.__save_history)
> +
> +def __save_history(self):
> +try:
> +readline.write_history_file(self._histfile)
> +except Exception as e:
> +print "Failed to save history file '%s'; %s" % (self._histfile, e)
>
>  def __parse_value(self, val):
>  try:

But I think this is good enough and useful as is.

Reviewed-by: Nir Soffer 



Re: [Qemu-devel] [RFC 0/4] qemu-img: add max-size subcommand

2017-03-03 Thread Nir Soffer
On Fri, Mar 3, 2017 at 3:51 PM, Stefan Hajnoczi  wrote:
>
> RFCv1:
>  * Publishing patch series with just raw support, no qcow2 yet.  Please review
>the command-line interface and let me know if you are happy with this
>approach.
>
> Users and management tools sometimes need to know the size required for a new
> disk image so that an LVM volume, SAN LUN, etc can be allocated ahead of time.
> Image formats like qcow2 have non-trivial metadata that makes it hard to
> estimate the exact size without knowledge of file format internals.
>
> This patch series introduces a new qemu-img subcommand that calculates the
> required size for both image creation and conversion scenarios.
>
> The conversion scenario is:
>
>   $ qemu-img max-size -f raw -O qcow2 input.img
>   107374184448

Isn't this the minimal size required to convert input.img?

>
> Here an existing image file is taken and the output includes the space required
> for data from the input image file.
>
> The creation scenario is:
>
>   $ qemu-img max-size -O qcow2 --size 5G
>   196688

Again, this is the minimal size.

So maybe use min-size?

Or:

qemu-img measure -f raw -O qcow2 input.img

Works nicely with other verbs like create, convert, check.

Now, about the return value: do we want to return both the minimum size
and the maximum size?

For the oVirt use case, we currently calculate the maximum size by
multiplying the virtual size by 1.1. We use this when automatically
extending oVirt thin-provisioned disks: we start with a 1G LV and extend
it each time it becomes full, stopping when we reach virtual size * 1.1.
Using a more accurate calculation instead would be nicer.
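The extension policy described above can be sketched in C with hypothetical names. The chunk size is a parameter because only the initial 1G size is given here, and the cap is passed in so the 1.1 heuristic could be replaced by a measured maximum size.

```c
#include <assert.h>
#include <stdint.h>

/* Current heuristic cap: virtual size * 1.1, in integer arithmetic. */
static uint64_t heuristic_max_size(uint64_t virtual_size)
{
    return virtual_size + virtual_size / 10;
}

/*
 * Next size for a thin-provisioned LV that became full: grow by chunk,
 * but never past max_size. max_size can come from the heuristic above
 * or, more accurately, from a measured maximum image size.
 */
static uint64_t next_lv_size(uint64_t current, uint64_t chunk,
                             uint64_t max_size)
{
    assert(chunk > 0);
    if (current >= max_size)
        return current;             /* already at the cap: stop extending */
    return max_size - current < chunk ? max_size : current + chunk;
}
```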

So we could return:

{
"min-size": 196688,
"max-size": 5905580032
}

Anyway thanks for working on this!

>
> Stefan Hajnoczi (4):
>   block: add bdrv_max_size() API
>   raw-format: add bdrv_max_size() support
>   qemu-img: add max-size subcommand
>   iotests: add test 178 for qemu-img max-size
>
>  include/block/block.h  |   2 +
>  include/block/block_int.h  |   2 +
>  block.c|  37 +
>  block/raw-format.c |  16 
>  qemu-img.c | 196 +
>  qemu-img-cmds.hx   |   6 ++
>  tests/qemu-iotests/178 |  75 +
>  tests/qemu-iotests/178.out |  25 ++
>  tests/qemu-iotests/group   |   1 +
>  9 files changed, 360 insertions(+)
>  create mode 100755 tests/qemu-iotests/178
>  create mode 100644 tests/qemu-iotests/178.out
>
> --
> 2.9.3
>



  1   2   3   4   >