Re: [PATCH v2 31/31] iotests: Test block-export-* QMP interface

2020-09-25 Thread Max Reitz
On 24.09.20 17:27, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/307 | 132 +
>  tests/qemu-iotests/307.out | 124 ++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 257 insertions(+)
>  create mode 100755 tests/qemu-iotests/307
>  create mode 100644 tests/qemu-iotests/307.out

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 30/31] iotests: Allow supported and unsupported formats at the same time

2020-09-25 Thread Max Reitz
On 24.09.20 17:27, Kevin Wolf wrote:
> This is useful for specifying 'generic' as supported (which includes
> only writable image formats), but still excluding some incompatible
> writable formats.
> 
> It also removes more lines than it adds.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/iotests.py | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 29/31] iotests: Introduce qemu_nbd_list_log()

2020-09-25 Thread Max Reitz
On 24.09.20 17:27, Kevin Wolf wrote:
> Add a function to list the NBD exports offered by an NBD server.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/iotests.py | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 28/31] iotests: Factor out qemu_tool_pipe_and_status()

2020-09-25 Thread Max Reitz
On 24.09.20 17:27, Kevin Wolf wrote:
> We have three almost identical functions that call an external process
> and return its output and return code. Refactor them into small wrappers
> around a common function.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/iotests.py | 49 ---
>  1 file changed, 23 insertions(+), 26 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 26/31] nbd: Merge nbd_export_new() and nbd_export_create()

2020-09-25 Thread Max Reitz
On 24.09.20 17:27, Kevin Wolf wrote:
> There is no real reason any more why nbd_export_new() and
> nbd_export_create() should be separate functions. The latter only
> performs a few checks before it calls the former.
> 
> What makes the current state stand out is that it's the only function in
> BlockExportDriver that is not a static function inside nbd/server.c, but
> a small wrapper in blockdev-nbd.c that then calls back into nbd/server.c
> for the real functionality.

Thanks :)



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 27/31] nbd: Deprecate nbd-server-add/remove

2020-09-25 Thread Max Reitz
On 24.09.20 17:27, Kevin Wolf wrote:
> These QMP commands are replaced by block-export-add/del.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json | 11 +--
>  docs/system/deprecated.rst |  6 ++
>  2 files changed, 15 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 21/31] block/export: Add BLOCK_EXPORT_DELETED event

2020-09-25 Thread Max Reitz
On 24.09.20 17:27, Kevin Wolf wrote:
> Clients may want to know when an export has finally disappeard
> (block-export-del returns earlier than that in the general case), so add
> a QAPI event for it.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json | 12 
>  block/export/export.c  |  2 ++
>  tests/qemu-iotests/140 |  9 -
>  tests/qemu-iotests/140.out |  2 +-
>  tests/qemu-iotests/223.out |  4 
>  5 files changed, 27 insertions(+), 2 deletions(-)

[...]

> diff --git a/tests/qemu-iotests/140 b/tests/qemu-iotests/140
> index 8d2ce5d9e3..309b177e77 100755
> --- a/tests/qemu-iotests/140
> +++ b/tests/qemu-iotests/140
> @@ -81,10 +81,17 @@ $QEMU_IO_PROG -f raw -r -c 'read -P 42 0 64k' \
>  "nbd+unix:///drv?socket=$SOCK_DIR/nbd" 2>&1 \
>  | _filter_qemu_io | _filter_nbd
>  
> +# The order of 'return' and the BLOCK_EXPORT_DELETED event is undefined. Just
> +# wait until we've twice seen one of them. Filter the 'return' line out so 
> that
> +# the output is defined.
>  _send_qemu_cmd $QEMU_HANDLE \
>  "{ 'execute': 'eject',
> 'arguments': { 'device': 'drv' }}" \
> -'return'
> +'return\|BLOCK_EXPORT_DELETED' |
> +grep -v 'return'
> +
> +_send_qemu_cmd $QEMU_HANDLE '' 'return\|BLOCK_EXPORT_DELETED' |
> +grep -v 'return'

Funny.  I did basically the same thing (only I filtered the event, not
the return):

https://git.xanclic.moe/XanClic/qemu/commit/e6f7510149a4a26c868013639ec89d28f16857d8#diff-3

and considered it kind of a hack.

Oh well. :)

Reviewed-by: Max Reitz 

>  $QEMU_IO_PROG -f raw -r -c close \
>  "nbd+unix:///drv?socket=$SOCK_DIR/nbd" 2>&1 \



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v6 15/15] scripts/simplebench: add bench_prealloc.py

2020-09-25 Thread Max Reitz
On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Benchmark for new preallocate filter.
> 
> Example usage:
> ./bench_prealloc.py ../../build/qemu-img \
> ssd-ext4:/path/to/mount/point \
> ssd-xfs:/path2 hdd-ext4:/path3 hdd-xfs:/path4
> 
> The benchmark shows performance improvement (or degradation) when use
> new preallocate filter with qcow2 image.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  scripts/simplebench/bench_prealloc.py | 128 ++
>  1 file changed, 128 insertions(+)
>  create mode 100755 scripts/simplebench/bench_prealloc.py

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v6 14/15] scripts/simplebench: improve ascii table: add difference line

2020-09-25 Thread Max Reitz
On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Performance improvements / degradations are usually discussed in
> percentage. Let's make the script calculate it for us.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  scripts/simplebench/simplebench.py | 46 +++---
>  1 file changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/scripts/simplebench/simplebench.py 
> b/scripts/simplebench/simplebench.py
> index 56d3a91ea2..0ff05a38b8 100644
> --- a/scripts/simplebench/simplebench.py
> +++ b/scripts/simplebench/simplebench.py

[...]

> +for j in range(0, i):
> +env_j = results['envs'][j]
> +res_j = case_results[env_j['id']]
> +
> +if 'average' not in res_j:
> +# Failed result
> +cell += ' --'
> +continue
> +
> +col_j = chr(ord('A') + j)
> +avg_j = res_j['average']
> +delta = (res['average'] - avg_j) / avg_j * 100

I was wondering why you’d subtract, when percentage differences usually
mean a quotient.  Then I realized that this would usually be written as:

(res['average'] / avg_j - 1) * 100

> +delta_delta = (res['delta'] + res_j['delta']) / avg_j * 100

Why not use the new format_percent for both cases?

> +cell += f' {col_j}{round(delta):+}±{round(delta_delta)}%'

I don’t know what I should think about ±delta_delta.  If I saw “Compared
to run A, this is +42.1%±2.0%”, I would think that you calculated the
difference between each run result, and then based on that array
calculated average and standard deviation.

Furthermore, I don’t even know what the delta_delta is supposed to tell
you.  It isn’t even a delta_delta, it’s an average_delta.

The delta_delta would be (res['delta'] / res_j['delta'] - 1) * 100.0.
And that might be presented perhaps like “+42.1% Δ± +2.0%” (if delta
were the SD, “Δx̅=+42.1% Δσ=+2.0%” would also work; although, again, I do
interpret ± as the SD anyway).

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v6 13/15] scripts/simplebench: improve view of ascii table

2020-09-25 Thread Max Reitz
On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Introduce dynamic float precision and use percentage to show delta.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  scripts/simplebench/simplebench.py | 26 +-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/simplebench/simplebench.py 
> b/scripts/simplebench/simplebench.py
> index 716d7fe9b2..56d3a91ea2 100644
> --- a/scripts/simplebench/simplebench.py
> +++ b/scripts/simplebench/simplebench.py
> @@ -79,10 +79,34 @@ def bench_one(test_func, test_env, test_case, count=5, 
> initial_run=True):
>  return result
>  
>  
> +def format_float(x):
> +res = round(x)
> +if res >= 100:
> +return str(res)
> +
> +res = f'{x:.1f}'
> +if len(res) >= 4:
> +return res
> +
> +return f'{x:.2f}'

This itches me to ask for some log() calculation.

Like:

%.*f' % (math.ceil(math.log10(99.95 / x)), x)

(For three significant digits)

> +def format_percent(x):
> +x *= 100
> +
> +res = round(x)
> +if res >= 10:
> +return str(res)
> +
> +return f'{x:.1f}' if res >= 1 else f'{x:.2f}'

Same here.  (Also, why not append a % sign in this function?)

>  def ascii_one(result):
>  """Return ASCII representation of bench_one() returned dict."""
>  if 'average' in result:
> -s = '{:.2f} +- {:.2f}'.format(result['average'], result['delta'])
> +avg = result['average']
> +delta_pr = result['delta'] / avg
> +s = f'{format_float(avg)}±{format_percent(delta_pr)}%'

Pre-existing, but isn’t the ± range generally assumed to be the standard
deviation?

Max

>  if 'n-failed' in result:
>  s += '\n({} failed)'.format(result['n-failed'])
>  return s
> 




signature.asc
Description: OpenPGP digital signature


Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver

2020-09-25 Thread Max Reitz
On 25.09.20 10:49, Vladimir Sementsov-Ogievskiy wrote:
> 25.09.2020 11:26, Max Reitz wrote:
>> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   tests/qemu-iotests/298 | 186 +
>>>   tests/qemu-iotests/298.out |   5 +
>>>   tests/qemu-iotests/group   |   1 +
>>>   3 files changed, 192 insertions(+)
>>>   create mode 100644 tests/qemu-iotests/298
>>>   create mode 100644 tests/qemu-iotests/298.out

[...]

>>> +class TestTruncate(iotests.QMPTestCase):
>>
>> The same decorator could be placed here, although this class doesn’t
>> start a VM, and so is unaffected by the allowlist.  Still may be
>> relevant in case of block modules, I don’t know.
> 
> Or just global test skip at file top

Hm.  Like verify_quorum()?  Is there a generic function for that already?

[...]

>>> +    # Probably we'll want preallocate filter to keep align to
>>> cluster when
>>> +    # shrink preallocation, so, ignore small differece
>>> +    self.assertLess(abs(stat.st_size - refstat.st_size), 64 * 1024)
>>> +
>>> +    # Preallocate filter may leak some internal clusters (for
>>> example, if
>>> +    # guest write far over EOF, skipping some clusters - they
>>> will remain
>>> +    # fallocated, preallocate filter don't care about such
>>> leaks, it drops
>>> +    # only trailing preallocation.
>>
>> True, but that isn’t what’s happening here.  (We only write 10M at 0, so
>> there are no gaps.)  Why do we need this 1M margin?
> 
> We write 10M, but qcow2 also writes metadata as it wants

Ah, yes, sure.  Shouldn’t result in 1M, but why not.

>>> +    self.assertLess(abs(stat.st_blocks - refstat.st_blocks) * 512,
>>> +    1024 * 1024)
>>
>> [...]
>>
>>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>>> index ff59cfd2d4..15d5f9619b 100644
>>> --- a/tests/qemu-iotests/group
>>> +++ b/tests/qemu-iotests/group
>>> @@ -307,6 +307,7 @@
>>>   295 rw
>>>   296 rw
>>>   297 meta
>>> +298 auto quick
>>
>> I wouldn’t mark it as quick, there is at least one preallocate=full of
>> 140M, and one of 40M, plus multiple 10M data writes and falloc
>> preallocations.
>>
>> Also, since you mark it as “auto”, have you run this test on all
>> CI-relevant hosts?  (Among other things I can’t predict) I wonder how
>> preallocation behaves on macOS.  Just because that one was always a bit
>> weird about not-really-data areas.
>>
> 
> Ofcourse, I didn't run on all hosts. I'm a bit out of sync about this..

Well, someone has to do it.  The background story is that tests are
added to auto all the time (because “why not”), and then they fail on
BSD or macOS.  We have BSD docker test build targets at least, so they
can be easily tested.  (Well, it takes like half an hour, but you know.)

(We don’t have macOS builds, as far as I can tell, but I personally
don’t even know why we run the iotests on macOS at all.  (Well, I also
wonder about the BSDs, but given the test build targets, I shouldn’t
complain, I suppose.))

(Though macOS isn’t part of the gating CI, is it?  I seem to remember
macOS errors are generally only reported to me half a week after the
pull request is merged, which is even worse.)

Anyway.  I just ran the test for OpenBSD
(EXTRA_CONFIGURE_OPTS='--target-list=x86_64-softmmu' \
   make vm-build-openbsd)
and got some failures:

--- /home/qemu/qemu-test.PGo2ls/src/tests/qemu-iotests/298.out  Fri Sep
25 07:10:31 2020
+++ /home/qemu/qemu-test.PGo2ls/build/tests/qemu-iotests/298.out.bad
Fri Sep 25 08:57:56 2020
@@ -1,5 +1,67 @@
-.
+qemu-io: Failed to resize underlying file: Unsupported preallocation
mode: falloc
+qemu-io: Failed to resize underlying file: Unsupported preallocation
mode: falloc
+.F...F...
+==
+FAIL: test_external_snapshot (__main__.TestPreallocateFilter)
 --
+Traceback (most recent call last):
+  File "298", line 81, in test_external_snapshot
+self.test_prealloc()
+  File "298", line 78, in test_prealloc
+self.check_big()
+  File "298", line 48, in check_big
+self.assertTrue(os.path.getsize(disk) > 100 * MiB)
+AssertionError: False is not true
+
+==
+FAIL: test_prealloc (__main__.TestPreallocateFilter)
+--
+Trac

Re: [PATCH v6 12/15] scripts/simplebench: support iops

2020-09-25 Thread Max Reitz
On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Support benchmarks returning not seconds but iops. We'll use it for
> further new test.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  scripts/simplebench/simplebench.py | 35 +++---
>  1 file changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/scripts/simplebench/simplebench.py 
> b/scripts/simplebench/simplebench.py
> index 59e7314ff6..716d7fe9b2 100644
> --- a/scripts/simplebench/simplebench.py
> +++ b/scripts/simplebench/simplebench.py

[...]

> @@ -34,6 +37,7 @@ def bench_one(test_func, test_env, test_case, count=5, 
> initial_run=True):
>  
>  Returns dict with the following fields:
>  'runs': list of test_func results
> +'dimension': dimension of results, may be 'seconds' or 'iops'
>  'average':  average seconds per run (exists only if at least one run

s/seconds/value/ (or something like that)

>  succeeded)
>  'delta':maximum delta between test_func result and the average
> @@ -54,11 +58,20 @@ def bench_one(test_func, test_env, test_case, count=5, 
> initial_run=True):
>  
>  result = {'runs': runs}
>  
> -successed = [r for r in runs if ('seconds' in r)]
> +successed = [r for r in runs if ('seconds' in r or 'iops' in r)]
>  if successed:

((Pre-existing, but I feel the urge to point it that it should be
“succeeded”.  (Or perhaps “successes”.)

Sorry, not something that should be fixed here, but I just couldn’t
contain myself.))

> -avg = sum(r['seconds'] for r in successed) / len(successed)
> +dim = 'iops' if ('iops' in successed[0]) else 'seconds'

I think this line should be dropped, because it’s obsoleted by the
if-else that follows.

> +if 'iops' in successed[0]:
> +assert all('iops' in r for r in successed)
> +dim = 'iops'
> +else:
> +assert all('seconds' in r for r in successed)
> +assert all('iops' not in r for r in successed)
> +dim = 'seconds'

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver

2020-09-25 Thread Max Reitz
On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/298 | 186 +
>  tests/qemu-iotests/298.out |   5 +
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 192 insertions(+)
>  create mode 100644 tests/qemu-iotests/298
>  create mode 100644 tests/qemu-iotests/298.out
> 
> diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
> new file mode 100644
> index 00..fef10f6a7a
> --- /dev/null
> +++ b/tests/qemu-iotests/298

[...]

> +class TestPreallocateBase(iotests.QMPTestCase):

Perhaps a

@iotests.skip_if_unsupported(['preallocate'])

here?

> +def setUp(self):
> +iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))

[...]

> +class TestTruncate(iotests.QMPTestCase):

The same decorator could be placed here, although this class doesn’t
start a VM, and so is unaffected by the allowlist.  Still may be
relevant in case of block modules, I don’t know.

> +def setUp(self):
> +iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))
> +iotests.qemu_img_create('-f', iotests.imgfmt, refdisk, str(10 * MiB))
> +
> +def tearDown(self):
> +os.remove(disk)
> +os.remove(refdisk)
> +
> +def do_test(self, prealloc_mode, new_size):
> +ret = iotests.qemu_io_silent('--image-opts', '-c', 'write 0 10M', 
> '-c',
> + f'truncate -m {prealloc_mode} 
> {new_size}',
> + drive_opts)
> +self.assertEqual(ret, 0)
> +
> +ret = iotests.qemu_io_silent('-f', iotests.imgfmt, '-c', 'write 0 
> 10M',
> + '-c',
> + f'truncate -m {prealloc_mode} 
> {new_size}',
> + refdisk)
> +self.assertEqual(ret, 0)
> +
> +stat = os.stat(disk)
> +refstat = os.stat(refdisk)
> +
> +# Probably we'll want preallocate filter to keep align to cluster 
> when
> +# shrink preallocation, so, ignore small differece
> +self.assertLess(abs(stat.st_size - refstat.st_size), 64 * 1024)
> +
> +# Preallocate filter may leak some internal clusters (for example, if
> +# guest write far over EOF, skipping some clusters - they will remain
> +# fallocated, preallocate filter don't care about such leaks, it 
> drops
> +# only trailing preallocation.

True, but that isn’t what’s happening here.  (We only write 10M at 0, so
there are no gaps.)  Why do we need this 1M margin?

> +self.assertLess(abs(stat.st_blocks - refstat.st_blocks) * 512,
> +1024 * 1024)

[...]

> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
> index ff59cfd2d4..15d5f9619b 100644
> --- a/tests/qemu-iotests/group
> +++ b/tests/qemu-iotests/group
> @@ -307,6 +307,7 @@
>  295 rw
>  296 rw
>  297 meta
> +298 auto quick

I wouldn’t mark it as quick, there is at least one preallocate=full of
140M, and one of 40M, plus multiple 10M data writes and falloc
preallocations.

Also, since you mark it as “auto”, have you run this test on all
CI-relevant hosts?  (Among other things I can’t predict) I wonder how
preallocation behaves on macOS.  Just because that one was always a bit
weird about not-really-data areas.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v6 10/15] iotests: qemu_io_silent: support --image-opts

2020-09-25 Thread Max Reitz
On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/iotests.py | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v6 09/15] qemu-io: add preallocate mode parameter for truncate command

2020-09-24 Thread Max Reitz
On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> This will be used in further test.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  qemu-io-cmds.c | 46 --
>  1 file changed, 32 insertions(+), 14 deletions(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index baeae86d8c..64f0246a71 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1698,13 +1698,42 @@ static const cmdinfo_t flush_cmd = {
>  .oneline= "flush all in-core file state to disk",
>  };
>  
> +static int truncate_f(BlockBackend *blk, int argc, char **argv);
> +static const cmdinfo_t truncate_cmd = {
> +.name   = "truncate",
> +.altname= "t",
> +.cfunc  = truncate_f,
> +.perm   = BLK_PERM_WRITE | BLK_PERM_RESIZE,
> +.argmin = 1,
> +.argmax = 3,
> +.args   = "[-m prealloc_mode] off",
> +.oneline= "truncates the current file at the given offset",
> +};
> +

Forward-declaring truncate_cmd instead of truncate_f would mean a
smaller diffstat.

But that isn’t what other commands do, so.

*shrug*

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v6 08/15] block: introduce preallocate filter

2020-09-24 Thread Max Reitz
On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> It's intended to be inserted between format and protocol nodes to
> preallocate additional space (expanding protocol file) on writes
> crossing EOF. It improves performance for file-systems with slow
> allocation.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  docs/system/qemu-block-drivers.rst.inc |  26 ++
>  qapi/block-core.json   |  20 +-
>  block/preallocate.c| 556 +
>  block/meson.build  |   1 +
>  4 files changed, 602 insertions(+), 1 deletion(-)
>  create mode 100644 block/preallocate.c

Looks good to me in general.

[...]

> diff --git a/block/preallocate.c b/block/preallocate.c
> new file mode 100644
> index 00..6510ad0149
> --- /dev/null
> +++ b/block/preallocate.c

[...]

> +/*
> + * Handle reopen.
> + *
> + * We must implement reopen handlers, otherwise reopen just don't work. 
> Handle
> + * new options and don't care about preallocation state, as it is handled in
> + * set/check permission handlers.
> + */
> +
> +static int preallocate_reopen_prepare(BDRVReopenState *reopen_state,
> +  BlockReopenQueue *queue, Error **errp)
> +{
> +PreallocateOpts *opts = g_new0(PreallocateOpts, 1);
> +
> +if (!preallocate_absorb_opts(opts, reopen_state->options,
> + reopen_state->bs->file->bs, errp)) {

I tried to find out whether this refers to the old file child, or the
post-reopen one.  As far as I could find out, there is no generic
implementation for changing the file child as part of x-blockdev-reopen:

{"error": {"class": "GenericError", "desc": "Cannot change the option
'file'"}}

Now that’s a shame.  That means you can’t reasonably integrate a
preallocate filter into an existing node graph unless the format driver
checks for the respective child option and issues a replace_node on
commit or something, right?  I suppose any driver who’d want to
implement child replacement would need to attach the new node in
prepare() as some pseudo-child, and then drop the old one and rename the
new one in commit().  I don’t think any driver does that yet, so I
suppose no format driver allows replacement of children yet (except for
quorum...).

Does anyone know what the status on that is?  Are there any plans for
implementing child replacement in reopen, or did I just miss something?

(It looks like the backing child can be replaced, but that’s probably
not a child where the preallocate filter would be placed on top...).

> +g_free(opts);
> +return -EINVAL;
> +}
> +
> +reopen_state->opaque = opts;
> +
> +return 0;
> +}

[...]

> +/*
> + * Call on each write. Returns true if @want_merge_zero is true and the 
> region
> + * [offset, offset + bytes) is zeroed (as a result of this call or earlier
> + * preallocation).
> + *
> + * want_merge_zero is used to merge write-zero request with preallocation in
> + * one bdrv_co_pwrite_zeroes() call.
> + */
> +static bool coroutine_fn handle_write(BlockDriverState *bs, int64_t offset,
> +  int64_t bytes, bool want_merge_zero)
> +{
> +BDRVPreallocateState *s = bs->opaque;
> +int64_t end = offset + bytes;
> +int64_t prealloc_start, prealloc_end;
> +int ret;
> +
> +if (!has_prealloc_perms(bs)) {

Took me a bit to figure out, because it takes a trip to
preallocate_child_perm() to figure out exactly when we’re going to have
the necessary permissions for this to pass.

Then it turns out that this is going to be the case exactly when the
parents collectively have the same permissions (WRITE+RESIZE) on the
preallocate node.

Then I had to wonder whether it’s appropriate not to preallocate if
WRITE is taken, but RESIZE isn’t.  But that seems entirely correct, yes.
 If noone is going to grow the file, then there is no need for
preallocation.  (Vice versa, if noone is going to write, but only
resize, then there is no need for preallocation either.)

So this seems correct, yes.

(It could be argued that if one parent has WRITE, and another has RESIZE
(but neither has both), then we probably don’t need preallocation
either.  But in such an arcane case (which is impossible to figure out
in .bdrv_child_perm() anyway), we might as well just do preallocation.
Won’t hurt.)

> +/* We don't have state neither should try to recover it */
> +return false;
> +}
> +
> +if (s->data_end < 0) {
> +s->data_end = bdrv_getlength(bs->file->bs);
> +if (s->data_end < 0) {
> +return false;
> +}
> +
> +if (s->file_end < 0) {
> +s->file_end = s->data_end;
> +}
> +}
> +
> +if (end <= s->data_end) {
> +return false;
> +}
> +
> +/* We have valid s->data_end, and request writes beyond it. */
> +
> +s->data_end = end;
> +if (s->zero_start < 0 || !want_merge_zero) {
> +s->zero_start 

Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions

2020-09-24 Thread Max Reitz
On 24.09.20 16:51, Vladimir Sementsov-Ogievskiy wrote:
> 24.09.2020 16:25, Max Reitz wrote:
>> On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
>>> 17.09.2020 19:09, Andrey Shinkevich wrote:
>>>> On 04.09.2020 14:22, Max Reitz wrote:
>>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>>> Provide API for the COR-filter insertion/removal.
>>>> ...
>>>>>> Also, drop the filter child permissions for an inactive state when
>>>>>> the
>>>>>> filter node is being removed.
>>>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>>>> node (i.e. perm == 0 in cor_child_perm())?
>>>>>
>>>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>>>> similar field in the preallocate filter, and there already is in
>>>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>>>
>>>>> .bdrv_child_perm() should generally be able to decide what permissions
>>>>> it needs based on the information it gets.  If every driver needs to
>>>>> keep track of whether it has any parents and feed that information
>>>>> into
>>>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>>>
>>>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>>>> explicitly add a boolean that tells .bdrv_child_perm() whether
>>>>> there are
>>>>> any parents or not – shouldn’t be too difficult.
>>>>
>>>>
>>>> The issue is that we fail in the bdrv_check_update_perm() with
>>>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>>>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>>>> The filter is still in the backing chain by that moment and has a
>>>> parent with child->perm != 0.
>>>>
>>>> The implementation of  the .bdrv_child_perm()in the given patch is
>>>> similar to one in the block/mirror.c.
>>>>
>>>> How could we resolve the issue at the generic layer?
>>>>
>>>>
>>>
>>> The problem is that when we update permissions in bdrv_replace_node, we
>>> consider new placement for "to" node, but old placement for "from" node.
>>> So, during update, we may consider stricter permission requirements for
>>> "from" than needed and they conflict with new "to" permissions.
>>>
>>> We should consider all graph changes for permission update
>>> simultaneously, in same transaction. For this, we need refactor
>>> permission update system to work on queue of nodes instead of simple DFS
>>> recursion. And in the queue all the nodes to update should  be
>>> toplogically sorted. In this way, when we update node N, all its parents
>>> are already updated. And of course, we should make no-perm graph update
>>> before permission update, and rollback no-perm movement if permission
>>> update failed.
>>
>> OK, you’ve convinced me, .active is good enough for now. O:)
>>
>>> And we need topological sort anyway. Consider the following example:
>>>
>>> A -
>>> |  \
>>> |  v
>>> |  B
>>> |  |
>>> v  /
>>> C<-
>>>
>>> A is parent for B and C, B is parent for C.
>>>
>>> Obviously, to update permissions, we should go in order A B C, so, when
>>> we update C, all it's parents permission already updated. But with
>>> current approach (simple recursion) we can update in sequence A C B C (C
>>> is updated twice). On first update of C, we consider old B permissions,
>>> so doing wrong thing. If it succeed, all is OK, on second C update we
>>> will finish with correct graph. But if the wrong thing failed, we break
>>> the whole process for no reason (it's possible that updated B permission
>>> will be less strict, but we will never check it).
>>
>> True.
>>
>>> I'll work on a patch for it.
>>
>> Sounds great, though I fear for the complexity.  I’ll just wait and see
>> for now.
>>
> 
> If you are OK with .active for now, then I think, Andrey can resend with
> .active and I'll dig into permissions in parallel. If Andrey's series
> go first, I'll just drop .active later if my idea works.

Sure, that works for me.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v6 07/15] block: bdrv_check_perm(): process children anyway

2020-09-24 Thread Max Reitz
On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Do generic processing even for drivers which define .bdrv_check_perm
> handler. It's needed for further preallocate filter: it will need to do
> additional action on bdrv_check_perm, but don't want to reimplement
> generic logic.
> 
> The patch doesn't change existing behaviour: the only driver that
> implements bdrv_check_perm is file-posix, but it never has any
> children.
> 
> Also, bdrv_set_perm() don't stop processing if driver has
> .bdrv_set_perm handler as well.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 9538af4884..165c2d3cb2 100644
> --- a/block.c
> +++ b/block.c
> @@ -1964,8 +1964,7 @@ static void bdrv_child_perm(BlockDriverState *bs, 
> BlockDriverState *child_bs,
>  /*
>   * Check whether permissions on this node can be changed in a way that
>   * @cumulative_perms and @cumulative_shared_perms are the new cumulative
> - * permissions of all its parents. This involves checking whether all 
> necessary
> - * permission changes to child nodes can be performed.
> + * permissions of all its parents.

Why do you want to remove this sentence?

>   *
>   * Will set *tighten_restrictions to true if and only if new permissions 
> have to
>   * be taken or currently shared permissions are to be unshared.  Otherwise,
> @@ -2047,8 +2046,11 @@ static int bdrv_check_perm(BlockDriverState *bs, 
> BlockReopenQueue *q,
>  }
>  
>  if (drv->bdrv_check_perm) {
> -return drv->bdrv_check_perm(bs, cumulative_perms,
> -cumulative_shared_perms, errp);
> +ret = drv->bdrv_check_perm(bs, cumulative_perms,
> +   cumulative_shared_perms, errp);
> +if (ret < 0) {
> +return ret;
> +}
>  }

Sounds good.  It’s also consistent with how bdrv_abort_perm_update() and
bdrv_set_perm() don’t return after calling the respective driver
functions, but always recurse to the children.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions

2020-09-24 Thread Max Reitz
On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
> 17.09.2020 19:09, Andrey Shinkevich wrote:
>> On 04.09.2020 14:22, Max Reitz wrote:
>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>> Provide API for the COR-filter insertion/removal.
>> ...
>>>> Also, drop the filter child permissions for an inactive state when the
>>>> filter node is being removed.
>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>> node (i.e. perm == 0 in cor_child_perm())?
>>>
>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>> similar field in the preallocate filter, and there already is in
>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>
>>> .bdrv_child_perm() should generally be able to decide what permissions
>>> it needs based on the information it gets.  If every driver needs to
>>> keep track of whether it has any parents and feed that information into
>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>
>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>> explicitly add a boolean that tells .bdrv_child_perm() whether there are
>>> any parents or not – shouldn’t be too difficult.
>>
>>
>> The issue is that we fail in the bdrv_check_update_perm() with
>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>> The filter is still in the backing chain by that moment and has a
>> parent with child->perm != 0.
>>
>> The implementation of  the .bdrv_child_perm()in the given patch is
>> similar to one in the block/mirror.c.
>>
>> How could we resolve the issue at the generic layer?
>>
>>
> 
> The problem is that when we update permissions in bdrv_replace_node, we
> consider new placement for "to" node, but old placement for "from" node.
> So, during update, we may consider stricter permission requirements for
> "from" than needed and they conflict with new "to" permissions.
> 
> We should consider all graph changes for permission update
> simultaneously, in same transaction. For this, we need refactor
> permission update system to work on queue of nodes instead of simple DFS
> recursion. And in the queue all the nodes to update should  be
> toplogically sorted. In this way, when we update node N, all its parents
> are already updated. And of course, we should make no-perm graph update
> before permission update, and rollback no-perm movement if permission
> update failed.

OK, you’ve convinced me, .active is good enough for now. O:)

> And we need topological sort anyway. Consider the following example:
> 
> A -
> |  \
> |  v
> |  B
> |  |
> v  /
> C<-
> 
> A is parent for B and C, B is parent for C.
> 
> Obviously, to update permissions, we should go in order A B C, so, when
> we update C, all it's parents permission already updated. But with
> current approach (simple recursion) we can update in sequence A C B C (C
> is updated twice). On first update of C, we consider old B permissions,
> so doing wrong thing. If it succeed, all is OK, on second C update we
> will finish with correct graph. But if the wrong thing failed, we break
> the whole process for no reason (it's possible that updated B permission
> will be less strict, but we will never check it).

True.

> I'll work on a patch for it.

Sounds great, though I fear for the complexity.  I’ll just wait and see
for now.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v8 6/7] block-stream: freeze link to base node during stream job

2020-09-24 Thread Max Reitz
On 07.09.20 14:17, Vladimir Sementsov-Ogievskiy wrote:
> 07.09.2020 14:44, Max Reitz wrote:
>> On 04.09.20 15:48, Vladimir Sementsov-Ogievskiy wrote:
>>> 04.09.2020 16:21, Max Reitz wrote:
>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>> To keep the base node unchanged during the block-stream operation,
>>>>> freeze it as the other part of the backing chain with the intermediate
>>>>> nodes related to the job.
>>>>> This patch revers the change made with the commit c624b015bf as the
>>>>> correct base file name and its format have to be written down to the
>>>>> QCOW2 header on the disk when the backing file is being changed after
>>>>> the stream job completes.
>>>>> This reversion incurs changes in the tests 030, 245 and discards the
>>>>> test 258 where concurrent stream/commit jobs are tested. When the link
>>>>> to a base node is frozen, the concurrent job cannot change the common
>>>>> backing chain.
>>>>>
>>>>> Signed-off-by: Andrey Shinkevich 
>>>>> ---
>>>>>    block/stream.c |  29 ++--
>>>>>    tests/qemu-iotests/030 |  10 +--
>>>>>    tests/qemu-iotests/245 |   2 +-
>>>>>    tests/qemu-iotests/258 | 161
>>>>> -
>>>>>    tests/qemu-iotests/258.out |  33 --
>>>>>    5 files changed, 14 insertions(+), 221 deletions(-)
>>>>>    delete mode 100755 tests/qemu-iotests/258
>>>>>    delete mode 100644 tests/qemu-iotests/258.out
>>>>
>>>> Does this need to be in this series?  (I’m not entirely sure, based on
>>>> what I can see in patch 7.)
>>>>
>>>> When doing this, should we introduce a @bottom-node option
>>>> alongside, so
>>>> that we can make all the tests that are deleted here pass still, just
>>>> with changes?
>>>>
>>>
>>> That's a question to discuss, and I asked Andrey to make this patch
>>> in this
>>> simple way to see, how much damage we have with this change.
>>>
>>> Honestly, I doubt that we need the new option. Previously, we decided
>>> that
>>> we can make this change without a deprecation. If we still going to
>>> do it,
>>> we shouldn't care about these use cases. So, if we freeze base again
>>> wituhout
>>> a depreaction and:
>>>
>>> 1. add bottom-node
>>>
>>>   - we keep the iotests
>>>   - If (unlikely) someone will came and say: hey, you've broken my
>>> working scenario, we will say "use bottom-node option, sorry"
>>>   - Most likely we'll have unused option and corresponding unused logic,
>>> making code more complex for nothing (and we'll have to say "sorry"
>>> anyway)
>>>
>>> 2. without option
>>>
>>>   - we loose the iotests. this looks scary, but it is honest: we drop
>>> use-cases and corresponding iotests
>>>   - If (unlikely) someone will came and say: hey, you've broken my
>>> working scenario, he will have to wait for next release / package
>>> version / or update the downstream himself
>>>   - Most likely all will be OK.
>>
>> Well, yes, we’ll disrupt either way, but it is a difference whether we
>> can tell people immediately that there’s an alternative now, or whether
>> we’ll have to make them wait for the next release.
>>
>> Basically, the whole argument hinges on the question of whether anyone
>> uses this right now or not, and we just don’t know.
>>
>> The question remains whether this patch is necessary for this series.
> 
> Otherwise iotests fail :)
> 
>> We also have the option of introducing @bottom-node, leaving @base’s
>> behavior as-is
> 
> You mean not make it freeze base again, but just don't care?

Yes.  I think the only problem with that would be that it’s unintuitive
in case the graph is modified while the job is running, but I can’t find
that worse than forbidding that case completely.

(And I think it would be easier to explain if we introduced @bottom-node.)

>> and explaining it as a legacy option from which
>> @bottom-node is inferred.  Then specifying @base just becomes weird and
>> problem-prone when the graph is reconfigured while the job is active,
>> but you can get around that by simply using the non-legacy option.
> 
> Hmm. Last time, I thought that bottom-node was a bad idea, as we ha

Re: [PATCH v8 5/7] copy-on-read: limit guest writes to base in COR driver

2020-09-24 Thread Max Reitz
On 22.09.20 15:13, Andrey Shinkevich wrote:
> On 04.09.2020 16:59, Vladimir Sementsov-Ogievskiy wrote:
>> 04.09.2020 15:50, Max Reitz wrote:
>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>> Limit the guest's COR operations by the base node in the backing chain
>>>> during a stream job.
>>>
>>> I don’t understand.   Shouldn’t we limit the areas where we set the COR
>>> flag?
>>>
>>>> Signed-off-by: Andrey Shinkevich 
>>>> ---
>>>>   block/copy-on-read.c | 49
>>>> +
>>>>   1 file changed, 49 insertions(+)
>>>>
>>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>>> index 1f858bb..ecbd1f8 100644
>>>> --- a/block/copy-on-read.c
>>>> +++ b/block/copy-on-read.c
>>>> @@ -57,6 +57,37 @@ static BlockDriverState
>>>> *get_base_by_name(BlockDriverState *bs,
>>>>   return base_bs;
>>>>   }
>>>>   +/*
>>>> + * Returns 1 if the block is allocated in a node between
>>>> cor_filter_bs->file->bs
>>>> + * and the base_bs in the backing chain. Otherwise, returns 0.
>>>> + * The COR operation is allowed if the base_bs is not specified -
>>>> return 1.
>>>> + * Returns negative errno on failure.
>>>> + */
>>>> +static int node_above_base(BlockDriverState *cor_filter_bs,
>>>> uint64_t offset,
>>>> +   uint64_t bytes)
>>>> +{
>>>> +    int ret;
>>>> +    int64_t dummy;
>>>> +    BlockDriverState *file = NULL;
>>>> +    BDRVStateCOR *state = cor_filter_bs->opaque;
>>>> +
>>>> +    if (!state->base_bs) {
>>>> +    return 1;
>>>> +    }
>>>> +
>>>> +    ret = bdrv_block_status_above(cor_filter_bs->file->bs,
>>>> state->base_bs,
>>>> +  offset, bytes, , NULL, );
>>>> +    if (ret < 0) {
>>>> +    return ret;
>>>> +    }
>>>> +
>>>> +    if (file) {
>>>
>>> Why check file and not the return value?
>>>
>>>> +    return 1;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>
>>> “dummy” should really not be called that way, it should be evaluated
>>> whether it’s smaller than bytes.
>>>
>>> First, [offset, offset + dummy) may not be allocated above the base –
>>> but [offset + dummy, offset + bytes) may be.  Then this function returns
>>> 0 here, even though there is something in that range that’s allocated.
>>>
>>> Second, in that case we still shouldn’t return 1, but return the
>>> shrunken offset instead.  Or, if there are multiple distinct allocated
>>> areas, they should probably even all be copied separately.
>>>
>>>
>>> (But all of that of course only if this function is intended to be used
>>> to limit where we set the COR flag, because I don’t understand why we’d
>>> want to limit where something can be written.)
>>>
>>
>> Agree to all.
>>
>> 1. Write path shouldn't be changed: it's a copy-on-read filter.
>>
>> 2. On read we need is_allocated_above-driven loop, to insert the flag
>> only to regions allocated above base
>>  (and other regions we read just without the flag, don't skip them).
>> qiov_offset will help very well.
>>
>> 3. Like in many other places, let's ignore  errors (and just add the
>> flag if block_status fails)
> 
> 
> If "block_status" fails, the stream job does not copy. Shall we keep the
> same behavior in the cor_co_preadv_part()?

I think copying can’t really hurt, so I think it would be better to copy
if we aren’t sure (because block_status failed).  The difference to
streaming could well be considered a bug fix.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 00/20] block/export: Allow exporting BDSs via FUSE

2020-09-23 Thread Max Reitz
On 22.09.20 17:58, Daniel P. Berrangé wrote:
> On Tue, Sep 22, 2020 at 12:49:12PM +0200, Max Reitz wrote:
>> Based-on: <20200907182011.521007-1-kw...@redhat.com>
>>   (“block/export: Add infrastructure and QAPI for block exports”)
>>
>> (Though its patch 16 needs a s/= \_exp_nbd/= drv/ on top.)
>>
>> v1: https://lists.nongnu.org/archive/html/qemu-block/2019-12/msg00451.html
>>
>> Branch: https://github.com/XanClic/qemu.git fuse-exports-v2
>> Branch: https://git.xanclic.moe/XanClic/qemu.git fuse-exports-v2
>>
>>
>> Hi,
>>
>> Ever since I found out that you can mount FUSE filesystems on regular
>> files (not just directories), I had the idea of adding FUSE block
>> exports to qemu where you can export block nodes as raw images.  The
>> best thing is that you’d be able to mount an image on itself, so
>> whatever format it may be in, qemu lets it appear as a raw image (and
>> you can then use regular tools like dd on it).
>>
>> The performance is quite bad so far, but we can always try to improve it
>> if the need arises.  For now I consider it mostly a cute feature to get
>> easy access to the raw contents of image files in any format (without
>> requiring root rights).
> 
> Aside from the iotests, so you forsee any particular use cases
> where this feature is desirable / important ?

No.

I implemented this feature for fun last year (when I realized that FUSE
allows regular files to be mount points), and I got positive reactions.
 I assumed others would find it as nice as me to be able to quickly
access an image file without requiring root rights (and then device file
accesses), or setting up an NBD chain.

(Though it should be noted that when I first came up with the feature,
nbdfuse did not exist yet.)

(It should also be noted that my original idea was to have a new
executable qemu-blkfuse that would basically allow you to invoke
“qemu-blkfuse $img”, and then $img would appear as a raw image.  To me,
that appeared very useful because it was so simple.  I admit that the
current proposal, which relies on the storage-daemon, has none of that
simplicity.  But if that’s the problem that prevents this from being
considered useful, I’m sure we (I) can figure something out.  Perhaps a
simple script, bundled with qemu, that can generate -blockdev
invocations based on the result of file(1).)

> Looking at it from a security POV, I'm not thrilled about the
> idea of granting QEMU permission to use the mount syscall for
> seccomp or SELinux. IOW, I expect this feature won't be something
> we want to expose in QEMU guests managed by libvirt, which would
> limit how widely it can be used.

I don’t expect this to be used through QEMU very much, but through the
storage daemon.  I assume that for the storage daemon, the permissions
can effectively be fine-tuned for each export, because you can “just”
launch another instance.

> QEMU can export NBD. Would it make sense to do this as an NBD
> client ? There's already https://libguestfs.org/nbdfuse.1.html
> but IIUC that exposes it as a file within a dir. Presumably
> it is not too hard to make it support exposing it directly as
> a file too.

I don’t like that idea very much, because my main gripe with the current
state of my proposal is that it’s more cumbersome than
“qemu-blkfuse $img”.  Adding more indirections won’t make it simpler.

> I wonder how performance compares between your native FUSE
> impl in QEMU vs NBD FUSE ?

Last year, I tried various ways of improving performance and nothing
really amounted to much.  So in the end I settled for a simple and naive
implementation, for it to be improved in case anyone cares for it.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 01/20] configure: Detect libfuse

2020-09-22 Thread Max Reitz
On 22.09.20 13:14, Thomas Huth wrote:

[...]

> Could you turn this immediately into a meson test instead? See e.g.
> https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg07112.html as
> an example for how to do this.

Done (I think).  Until I send v3, the updated version lives here:

https://git.xanclic.moe/XanClic/qemu/commits/branch/fuse-exports-next
https://github.com/XanClic/qemu/commits/fuse-exports-next

I’ve replaced the compile checks (on test programs) by simpler version
checks (>=3.1 for libfuse itself, >=3.8 for fuse-lseek).

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 01/20] configure: Detect libfuse

2020-09-22 Thread Max Reitz
On 22.09.20 13:14, Thomas Huth wrote:

[...]

> Could you turn this immediately into a meson test instead? See e.g.
> https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg07112.html as
> an example for how to do this.

Sure!

Max



signature.asc
Description: OpenPGP digital signature


[PATCH v2 18/20] iotests: Allow testing FUSE exports

2020-09-22 Thread Max Reitz
This pretends FUSE exports are a kind of protocol.  As such, they are
always tested under the format node.  This is probably the best way to
test them, actually, because this will generate more I/O load and more
varied patterns.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/check |   6 ++
 tests/qemu-iotests/common.filter |   5 +-
 tests/qemu-iotests/common.rc | 124 +++
 3 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 467a7cf1b7..07232138d7 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -270,6 +270,7 @@ image protocol options
 -rbdtest rbd
 -sheepdog   test sheepdog
 -nbdtest nbd
+-fuse   test fuse
 -sshtest ssh
 -nfstest nfs
 
@@ -382,6 +383,11 @@ testlist options
 xpand=false
 ;;
 
+-fuse)
+IMGPROTO=fuse
+xpand=false
+;;
+
 -ssh)
 IMGPROTO=ssh
 xpand=false
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 838ed15793..172ea5752e 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -44,7 +44,8 @@ _filter_qom_path()
 _filter_testdir()
 {
 $SED -e "s#$TEST_DIR/#TEST_DIR/#g" \
- -e "s#$SOCK_DIR/#SOCK_DIR/#g"
+ -e "s#$SOCK_DIR/#SOCK_DIR/#g" \
+ -e "s#SOCK_DIR/fuse-#TEST_DIR/#g"
 }
 
 # replace occurrences of the actual IMGFMT value with IMGFMT
@@ -127,6 +128,7 @@ _filter_img_create_filenames()
 -e "s#$IMGPROTO:$TEST_DIR#TEST_DIR#g" \
 -e "s#$TEST_DIR#TEST_DIR#g" \
 -e "s#$SOCK_DIR#SOCK_DIR#g" \
+-e 's#SOCK_DIR/fuse-#TEST_DIR/#g' \
 -e "s#$IMGFMT#IMGFMT#g" \
 -e 's#nbd:127.0.0.1:[0-9]\\+#TEST_DIR/t.IMGFMT#g' \
 -e 's#nbd+unix:///\??socket=SOCK_DIR/nbd#TEST_DIR/t.IMGFMT#g'
@@ -227,6 +229,7 @@ _filter_img_info()
 -e "s#$IMGFMT#IMGFMT#g" \
 -e 's#nbd://127.0.0.1:[0-9]\\+$#TEST_DIR/t.IMGFMT#g' \
 -e 's#nbd+unix:///\??socket=SOCK_DIR/nbd#TEST_DIR/t.IMGFMT#g' \
+-e 's#SOCK_DIR/fuse-#TEST_DIR/#g' \
 -e "/encrypted: yes/d" \
 -e "/cluster_size: [0-9]\\+/d" \
 -e "/table_size: [0-9]\\+/d" \
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index e4751d4985..e17f813f06 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -257,6 +257,9 @@ if [ "$IMGOPTSSYNTAX" = "true" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 TEST_IMG="$DRIVER,file.driver=nbd,file.type=unix"
 TEST_IMG="$TEST_IMG,file.path=$SOCK_DIR/nbd"
+elif [ "$IMGPROTO" = "fuse" ]; then
+TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
+TEST_IMG="$DRIVER,file.filename=$SOCK_DIR/fuse-t.$IMGFMT"
 elif [ "$IMGPROTO" = "ssh" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 
TEST_IMG="$DRIVER,file.driver=ssh,file.host=127.0.0.1,file.path=$TEST_IMG_FILE"
@@ -273,6 +276,9 @@ else
 elif [ "$IMGPROTO" = "nbd" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 TEST_IMG="nbd+unix:///?socket=$SOCK_DIR/nbd"
+elif [ "$IMGPROTO" = "fuse" ]; then
+TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
+TEST_IMG="$SOCK_DIR/fuse-t.$IMGFMT"
 elif [ "$IMGPROTO" = "ssh" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 
REMOTE_TEST_DIR="ssh://\\($USER@\\)\\?127.0.0.1\\(:[0-9]\\+\\)\\?$TEST_DIR"
@@ -288,6 +294,9 @@ fi
 ORIG_TEST_IMG_FILE=$TEST_IMG_FILE
 ORIG_TEST_IMG="$TEST_IMG"
 
+FUSE_PIDS=()
+FUSE_EXPORTS=()
+
 if [ -z "$TEST_DIR" ]; then
 TEST_DIR=$PWD/scratch
 fi
@@ -357,6 +366,10 @@ _test_img_to_test_img_file()
 echo "$1"
 ;;
 
+fuse)
+echo "$1" | sed -e "s#$SOCK_DIR/fuse-#$TEST_DIR/#"
+;;
+
 nfs)
 echo "$1" | sed -e "s#nfs://127.0.0.1##"
 ;;
@@ -385,6 +398,11 @@ _make_test_img()
 local opts_param=false
 local misc_params=()
 
+if [[ $IMGPROTO == fuse && $TEST_IMG == $SOCK_DIR/fuse-* ]]; then
+# The caller may be trying to overwrite an existing image
+_rm_test_img "$TEST_IMG"
+fi
+
 if [ -z "$TEST_IMG_FILE" ]; then
 img_name=$TEST_IMG
 elif [ "$IMGOPTSSYNTAX" != "true" -a \
@@ -469,11 +487,105 @@ _make_test_img()
 eval "$QEMU_NBD -v -t -k '$SOCK_DIR/nbd' -f $IMGFMT -e 42 -x '' 
$TEST_IMG_FILE >/dev/null &"

[PATCH v2 15/20] iotests/287: Clean up subshell test image

2020-09-22 Thread Max Reitz
287 creates an image in a subshell (thanks to the pipe) to see whether
that is possible with compression_type=zstd.  If _make_test_img were to
modify any global state, this global state would then be lost before we
could cleanup the image.

When using FUSE as the test protocol, this global state is important, so
clean up the image before the state is lost.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/287 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
index f98a4cadc1..036cc09e82 100755
--- a/tests/qemu-iotests/287
+++ b/tests/qemu-iotests/287
@@ -51,8 +51,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 CLUSTER_SIZE=65536
 
 # Check if we can run this test.
-if IMGOPTS='compression_type=zstd' _make_test_img 64M |
-grep "Invalid parameter 'zstd'"; then
+output=$(_make_test_img -o 'compression_type=zstd' 64M; _cleanup_test_img)
+if echo "$output" | grep -q "Invalid parameter 'zstd'"; then
 _notrun "ZSTD is disabled"
 fi
 
-- 
2.26.2




[PATCH v2 17/20] iotests: Give access to the qemu-storage-daemon

2020-09-22 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/check | 11 +++
 tests/qemu-iotests/common.rc | 17 +
 2 files changed, 28 insertions(+)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index e14a1f354d..467a7cf1b7 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -644,6 +644,17 @@ if [ -z $QEMU_NBD_PROG ]; then
 fi
 export QEMU_NBD_PROG="$(type -p "$QEMU_NBD_PROG")"
 
+if [ -z "$QEMU_STGD_PROG" ]; then
+if [ -x "$build_iotests/qemu-storage-daemon" ]; then
+export QEMU_STGD_PROG="$build_iotests/qemu-storage-daemon"
+elif [ -x "$build_root/storage-daemon/qemu-storage-daemon" ]; then
+export QEMU_STGD_PROG="$build_root/storage-daemon/qemu-storage-daemon"
+else
+_init_error "qemu-storage-daemon not found"
+fi
+fi
+export QEMU_STGD_PROG="$(type -p "$QEMU_STGD_PROG")"
+
 if [ -x "$build_iotests/socket_scm_helper" ]
 then
 export SOCKET_SCM_HELPER="$build_iotests/socket_scm_helper"
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 23f46da2db..e4751d4985 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -124,6 +124,7 @@ fi
 : ${VALGRIND_QEMU_IMG=$VALGRIND_QEMU}
 : ${VALGRIND_QEMU_IO=$VALGRIND_QEMU}
 : ${VALGRIND_QEMU_NBD=$VALGRIND_QEMU}
+: ${VALGRIND_QEMU_STGD=$VALGRIND_QEMU}
 
 # The Valgrind own parameters may be set with
 # its environment variable VALGRIND_OPTS, e.g.
@@ -211,6 +212,21 @@ _qemu_nbd_wrapper()
 return $RETVAL
 }
 
+_qemu_storage_daemon_wrapper()
+{
+local VALGRIND_LOGFILE="${TEST_DIR}"/$$.valgrind
+(
+if [ -n "${QEMU_STGD_NEED_PID}" ]; then
+echo $BASHPID > "${QEMU_TEST_DIR}/qemu-storage-daemon.pid"
+fi
+VALGRIND_QEMU="${VALGRIND_QEMU_STGD}" _qemu_proc_exec 
"${VALGRIND_LOGFILE}" \
+"$QEMU_STGD_PROG" $QEMU_STGD_OPTIONS "$@"
+)
+RETVAL=$?
+_qemu_proc_valgrind_log "${VALGRIND_LOGFILE}" $RETVAL
+return $RETVAL
+}
+
 # Valgrind bug #409141 https://bugs.kde.org/show_bug.cgi?id=409141
 # Until valgrind 3.16+ is ubiquitous, we must work around a hang in
 # valgrind when issuing sigkill. Disable valgrind for this invocation.
@@ -223,6 +239,7 @@ export QEMU=_qemu_wrapper
 export QEMU_IMG=_qemu_img_wrapper
 export QEMU_IO=_qemu_io_wrapper
 export QEMU_NBD=_qemu_nbd_wrapper
+export QEMU_STGD=_qemu_storage_daemon_wrapper
 
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
 DRIVER="driver=$IMGFMT"
-- 
2.26.2




[PATCH v2 14/20] iotests: Let _make_test_img guess $TEST_IMG_FILE

2020-09-22 Thread Max Reitz
When most iotests want to create a test image that is named differently
from the default $TEST_IMG, they do something like this:

TEST_IMG="$TEST_IMG.base" _make_test_img $options

This works fine with the "file" protocol, but not so much for anything
else: _make_test_img tries to create an image under $TEST_IMG_FILE
first, and only under $TEST_IMG if the former is not set; and on
everything but "file", $TEST_IMG_FILE is set.

There are two ways we can fix this: First, we could make all tests
adjust not only TEST_IMG, but also TEST_IMG_FILE if that is present
(e.g. with something like _set_test_img_suffix $suffix that would affect
not only TEST_IMG but also TEST_IMG_FILE, if necessary).  This is a
pretty clean solution, and this is maybe what we should have done from
the start.

But it would also require changes to most existing bash tests.  So the
alternative is this: Let _make_test_img see whether $TEST_IMG_FILE still
points to the original value.  If so, it is possible that the caller has
adjusted $TEST_IMG but not $TEST_IMG_FILE.  In such a case, we can (for
most protocols) derive the corresponding $TEST_IMG_FILE value from
$TEST_IMG value and thus work around what technically is the caller
misbehaving.

This second solution is less clean, but it is robust against people
keeping their old habit of adjusting TEST_IMG only, and requires much
less changes.  So this patch implements it.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/common.rc | 40 +---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 494490a272..23f46da2db 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -268,6 +268,7 @@ else
 TEST_IMG=$IMGPROTO:$TEST_DIR/t.$IMGFMT
 fi
 fi
+ORIG_TEST_IMG_FILE=$TEST_IMG_FILE
 ORIG_TEST_IMG="$TEST_IMG"
 
 if [ -z "$TEST_DIR" ]; then
@@ -330,6 +331,30 @@ _get_data_file()
 | sed -e "s#\\\$TEST_IMG#$1#"
 }
 
+# Translate a $TEST_IMG to its corresponding $TEST_IMG_FILE for
+# different protocols
+_test_img_to_test_img_file()
+{
+case "$IMGPROTO" in
+file)
+echo "$1"
+;;
+
+nfs)
+echo "$1" | sed -e "s#nfs://127.0.0.1##"
+;;
+
+ssh)
+echo "$1" | \
+sed -e "s#ssh://\\($USER@\\)\\?127.0.0.1\\(:[0-9]\\+\\)\\?##"
+;;
+
+*)
+return 1
+;;
+esac
+}
+
 _make_test_img()
 {
 # extra qemu-img options can be added by tests
@@ -343,10 +368,19 @@ _make_test_img()
 local opts_param=false
 local misc_params=()
 
-if [ -n "$TEST_IMG_FILE" ]; then
-img_name=$TEST_IMG_FILE
-else
+if [ -z "$TEST_IMG_FILE" ]; then
 img_name=$TEST_IMG
+elif [ "$IMGOPTSSYNTAX" != "true" -a \
+   "$TEST_IMG_FILE" = "$ORIG_TEST_IMG_FILE" ]; then
+# Handle cases of tests only updating TEST_IMG, but not TEST_IMG_FILE
+img_name=$(_test_img_to_test_img_file "$TEST_IMG")
+if [ "$?" != 0 ]; then
+img_name=$TEST_IMG_FILE
+fi
+else
+# $TEST_IMG_FILE is not the default value, so it definitely has been
+# modified by the test
+img_name=$TEST_IMG_FILE
 fi
 
 if [ -n "$IMGOPTS" ]; then
-- 
2.26.2




[PATCH v2 13/20] iotests: Restrict some Python tests to file

2020-09-22 Thread Max Reitz
Most Python tests are restricted to the file protocol (without
explicitly saying so), but these are the ones that would break
./check -fuse -qcow2.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/206 | 3 ++-
 tests/qemu-iotests/242 | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/206 b/tests/qemu-iotests/206
index 11bc51f256..0a3ee5ef00 100755
--- a/tests/qemu-iotests/206
+++ b/tests/qemu-iotests/206
@@ -23,7 +23,8 @@
 import iotests
 from iotests import imgfmt
 
-iotests.script_initialize(supported_fmts=['qcow2'])
+iotests.script_initialize(supported_fmts=['qcow2'],
+  supported_protocols=['file'])
 iotests.verify_working_luks()
 
 with iotests.FilePath('t.qcow2') as disk_path, \
diff --git a/tests/qemu-iotests/242 b/tests/qemu-iotests/242
index 64f1bd95e4..a16de3085f 100755
--- a/tests/qemu-iotests/242
+++ b/tests/qemu-iotests/242
@@ -24,7 +24,8 @@ import struct
 from iotests import qemu_img_create, qemu_io, qemu_img_pipe, \
 file_path, img_info_log, log, filter_qemu_io
 
-iotests.script_initialize(supported_fmts=['qcow2'])
+iotests.script_initialize(supported_fmts=['qcow2'],
+  supported_protocols=['file'])
 
 disk = file_path('disk')
 chunk = 256 * 1024
-- 
2.26.2




[PATCH v2 20/20] iotests/308: Add test for FUSE exports

2020-09-22 Thread Max Reitz
We have good coverage of the normal I/O paths now, but what remains is a
test that tests some more special cases: Exporting an image on itself
(thus turning a formatted image into a raw one), some error cases, and
non-writable and non-growable exports.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/308 | 339 +
 tests/qemu-iotests/308.out |  97 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 437 insertions(+)
 create mode 100755 tests/qemu-iotests/308
 create mode 100644 tests/qemu-iotests/308.out

diff --git a/tests/qemu-iotests/308 b/tests/qemu-iotests/308
new file mode 100755
index 00..b30f4400f6
--- /dev/null
+++ b/tests/qemu-iotests/308
@@ -0,0 +1,339 @@
+#!/usr/bin/env bash
+#
+# Test FUSE exports (in ways that are not captured by the generic
+# tests)
+#
+# Copyright (C) 2020 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq=$(basename "$0")
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_qemu
+_cleanup_test_img
+rmdir "$EXT_MP" 2>/dev/null
+rm -f "$EXT_MP"
+rm -f "$COPIED_IMG"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+# Generic format, but needs a plain filename
+_supported_fmt generic
+if [ "$IMGOPTSSYNTAX" = "true" ]; then
+_unsupported_fmt $IMGFMT
+fi
+# We need the image to have exactly the specified size, and VPC does
+# not allow that by default
+_unsupported_fmt vpc
+
+_supported_proto file # We create the FUSE export manually
+_supported_os Linux # We need /dev/urandom
+
+# $1: Export ID
+# $2: Options (beyond the node-name and ID)
+# $3: Expected return value (defaults to 'return')
+# $4: Node to export (defaults to 'node-format')
+fuse_export_add()
+{
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'block-export-add',
+  'arguments': {
+  'type': 'fuse',
+  'id': '$1',
+  'node-name': '${4:-node-format}',
+  $2
+  } }" \
+"${3:-return}" \
+| _filter_imgfmt
+}
+
+# $1: Export ID
+fuse_export_del()
+{
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'block-export-del',
+  'arguments': {
+  'id': '$1'
+  } }" \
+'return'
+
+_send_qemu_cmd $QEMU_HANDLE \
+'' \
+'BLOCK_EXPORT_DELETED'
+}
+
+# Return the length of the protocol file
+# $1: Protocol node export mount point
+# $2: Original file (to compare)
+get_proto_len()
+{
+len1=$(stat -c '%s' "$1")
+len2=$(stat -c '%s' "$2")
+
+if [ "$len1" != "$len2" ]; then
+echo 'ERROR: Length of export and original differ:' >&2
+echo "$len1 != $len2" >&2
+else
+echo '(OK: Lengths of export and original are the same)' >&2
+fi
+
+echo "$len1"
+}
+
+COPIED_IMG="$TEST_IMG.copy"
+EXT_MP="$TEST_IMG.fuse"
+
+echo '=== Set up ==='
+
+# Create image with random data
+_make_test_img 64M
+$QEMU_IO -c 'write -s /dev/urandom 0 64M' "$TEST_IMG" | _filter_qemu_io
+
+_launch_qemu
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'qmp_capabilities'}" \
+'return'
+
+# Separate blockdev-add calls for format and protocol so we can remove
+# the format layer later on
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'blockdev-add',
+  'arguments': {
+  'driver': 'file',
+  'node-name': 'node-protocol',
+  'filename': '$TEST_IMG'
+  } }" \
+'return'
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'blockdev-add',
+  'arguments': {
+  'driver': '$IMGFMT',
+  'node-name': 'node-format',
+  'file': 'node-protocol'
+  } }" \
+'return'
+
+echo
+echo '=== Mountpoint not present ==='
+
+rmdir "$EXT_MP" 2>/dev/null
+rm -f "$EXT_MP"
+output=$(fuse_export_add 'export-err' "'mountpoint': '$EXT_MP'" error)
+
+if echo "$output" | grep -q "Invalid parameter 'fuse'"; then
+_notrun 'No FUSE support'
+fi
+
+echo "$output"
+

[PATCH v2 11/20] iotests: Derive image names from $TEST_IMG

2020-09-22 Thread Max Reitz
Avoid creating images with custom filenames in $TEST_DIR, because
non-file protocols may want to keep $TEST_IMG (and all other test
images) in some other directory.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/200 | 3 +--
 tests/qemu-iotests/200.out | 4 ++--
 tests/qemu-iotests/229 | 3 +--
 tests/qemu-iotests/229.out | 6 +++---
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
index 59f7854b9f..a7aabbd032 100755
--- a/tests/qemu-iotests/200
+++ b/tests/qemu-iotests/200
@@ -44,8 +44,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2 qed
 _supported_proto file
 
-BACKING_IMG="${TEST_DIR}/backing.img"
-TEST_IMG="${TEST_DIR}/test.img"
+BACKING_IMG="$TEST_IMG.base"
 
 TEST_IMG="$BACKING_IMG" _make_test_img 512M
 _make_test_img -F $IMGFMT -b "$BACKING_IMG" 512M
diff --git a/tests/qemu-iotests/200.out b/tests/qemu-iotests/200.out
index a6776070e4..5883f16ac3 100644
--- a/tests/qemu-iotests/200.out
+++ b/tests/qemu-iotests/200.out
@@ -1,6 +1,6 @@
 QA output created by 200
-Formatting 'TEST_DIR/backing.img', fmt=IMGFMT size=536870912
-Formatting 'TEST_DIR/test.img', fmt=IMGFMT size=536870912 
backing_file=TEST_DIR/backing.img backing_fmt=IMGFMT
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=536870912
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=536870912 
backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
 wrote 314572800/314572800 bytes at offset 512
 300 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
diff --git a/tests/qemu-iotests/229 b/tests/qemu-iotests/229
index 89a5359f32..5f759fa587 100755
--- a/tests/qemu-iotests/229
+++ b/tests/qemu-iotests/229
@@ -51,8 +51,7 @@ _supported_os Linux
 _unsupported_imgopts data_file
 
 
-DEST_IMG="$TEST_DIR/d.$IMGFMT"
-TEST_IMG="$TEST_DIR/b.$IMGFMT"
+DEST_IMG="$TEST_IMG.dest"
 BLKDEBUG_CONF="$TEST_DIR/blkdebug.conf"
 
 _make_test_img 2M
diff --git a/tests/qemu-iotests/229.out b/tests/qemu-iotests/229.out
index 4de6dfaa28..7eed393013 100644
--- a/tests/qemu-iotests/229.out
+++ b/tests/qemu-iotests/229.out
@@ -1,6 +1,6 @@
 QA output created by 229
-Formatting 'TEST_DIR/b.IMGFMT', fmt=IMGFMT size=2097152
-Formatting 'TEST_DIR/d.IMGFMT', fmt=IMGFMT size=2097152
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2097152
+Formatting 'TEST_DIR/t.IMGFMT.dest', fmt=IMGFMT size=2097152
 wrote 2097152/2097152 bytes at offset 0
 2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {'execute': 'qmp_capabilities'}
@@ -8,7 +8,7 @@ wrote 2097152/2097152 bytes at offset 0
 
 === Starting drive-mirror, causing error & stop  ===
 
-{'execute': 'drive-mirror', 'arguments': {'device': 'testdisk', 'format': 
'IMGFMT', 'target': 'blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/d.IMGFMT', 
'sync': 'full', 'mode': 'existing', 'on-source-error': 'stop', 
'on-target-error': 'stop' }}
+{'execute': 'drive-mirror', 'arguments': {'device': 'testdisk', 'format': 
'IMGFMT', 'target': 'blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT.dest', 
'sync': 'full', 'mode': 'existing', 'on-source-error': 'stop', 
'on-target-error': 'stop' }}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "created", "id": "testdisk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "running", "id": "testdisk"}}
 {"return": {}}
-- 
2.26.2




[PATCH v2 09/20] iotests: Use convert -n in some cases

2020-09-22 Thread Max Reitz
qemu-img convert (without -n) can often be replaced by a combination of
_make_test_img + qemu-img convert -n.  Doing so allows converting to
protocols that do not allow direct file creation, such as FUSE exports.
The only problem is that for formats other than qcow2 and qed (qcow1 at
least), this may lead to high disk usage for some reason, so we cannot
do it everywhere.

But we can do it in 028 and 089, so let us do that so they can run on
FUSE exports.  Also, in 028 this allows us to remove a 9-line comment
that used to explain why we cannot safely filter drive-backup's image
creation output.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/028 | 14 --
 tests/qemu-iotests/028.out |  3 +++
 tests/qemu-iotests/089 |  3 ++-
 tests/qemu-iotests/089.out |  1 +
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index 6dd3ae09a3..864dc4a4e2 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -116,16 +116,10 @@ else
 QEMU_COMM_TIMEOUT=1
 fi
 
-# Silence output since it contains the disk image path and QEMU's readline
-# character echoing makes it very hard to filter the output. Plus, there
-# is no telling how many times the command will repeat before succeeding.
-# (Note that creating the image results in a "Formatting..." message over
-# stdout, which is the same channel the monitor uses.  We cannot reliably
-# wait for it because the monitor output may interact with it in such a
-# way that _timed_wait_for cannot read it.  However, once the block job is
-# done, we know that the "Formatting..." message must have appeared
-# already, so the output is still deterministic.)
-silent=y _send_qemu_cmd $h "drive_backup disk ${TEST_IMG}.copy" "(qemu)"
+TEST_IMG="$TEST_IMG.copy" _make_test_img $image_size
+_send_qemu_cmd $h "drive_backup -n disk ${TEST_IMG}.copy" "(qemu)" \
+| _filter_imgfmt
+
 silent=y qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active 
jobs"
 _send_qemu_cmd $h "info block-jobs" "No active jobs"
 _send_qemu_cmd $h 'quit' ""
diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out
index 5a68de5c46..e580488216 100644
--- a/tests/qemu-iotests/028.out
+++ b/tests/qemu-iotests/028.out
@@ -468,6 +468,9 @@ No errors were found on the image.
 
 block-backup
 
+Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) drive_backup -n disk TEST_DIR/t.IMGFMT.copy
 (qemu) info block-jobs
 No active jobs
 === IO: pattern 195
diff --git a/tests/qemu-iotests/089 b/tests/qemu-iotests/089
index 66c5415abe..03a2ccf1e8 100755
--- a/tests/qemu-iotests/089
+++ b/tests/qemu-iotests/089
@@ -62,7 +62,8 @@ TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
 $QEMU_IO -c 'write -P 42 0 512' -c 'write -P 23 512 512' \
  -c 'write -P 66 1024 512' "$TEST_IMG.base" | _filter_qemu_io
 
-$QEMU_IMG convert -f raw -O $IMGFMT "$TEST_IMG.base" "$TEST_IMG"
+_make_test_img $IMG_SIZE
+$QEMU_IMG convert -f raw -O $IMGFMT -n "$TEST_IMG.base" "$TEST_IMG"
 
 $QEMU_IO_PROG --cache $CACHEMODE --aio $AIOMODE \
  -c 'read -P 42 0 512' -c 'read -P 23 512 512' \
diff --git a/tests/qemu-iotests/089.out b/tests/qemu-iotests/089.out
index 15682c2886..c53fc4823a 100644
--- a/tests/qemu-iotests/089.out
+++ b/tests/qemu-iotests/089.out
@@ -9,6 +9,7 @@ wrote 512/512 bytes at offset 512
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 512/512 bytes at offset 1024
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 512/512 bytes at offset 512
-- 
2.26.2




[PATCH v2 08/20] iotests: Do not pipe _make_test_img

2020-09-22 Thread Max Reitz
Executing _make_test_img as part of a pipe will undo all variable
changes it has done.  As such, this could not work with FUSE (because
we want to remember all of our exports and their qemu instances).

Replace the pipe by a temporary file in 071 and 174 (the two tests that
can run on FUSE).

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/071 | 19 +++
 tests/qemu-iotests/174 | 10 +-
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
index 88faebcc1d..18fe9054b0 100755
--- a/tests/qemu-iotests/071
+++ b/tests/qemu-iotests/071
@@ -61,8 +61,17 @@ echo
 echo "=== Testing blkverify through filename ==="
 echo
 
-TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE |\
-_filter_imgfmt
+# _make_test_img may set variables that we need to retain.  Everything
+# in a pipe is executed in a subshell, so doing so would throw away
+# all changes.  Therefore, we have to store the output in some temp
+# file and filter that.
+scratch_out="$TEST_DIR/img-create.out"
+
+TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE \
+>"$scratch_out"
+_filter_imgfmt <"$scratch_out"
+rm -f "$scratch_out"
+
 _make_test_img $IMG_SIZE
 $QEMU_IO -c "open -o 
driver=raw,file.driver=blkverify,file.raw.filename=$TEST_IMG.base $TEST_IMG" \
  -c 'read 0 512' -c 'write -P 42 0x38000 512' -c 'read -P 42 0x38000 
512' | _filter_qemu_io
@@ -76,8 +85,10 @@ echo
 echo "=== Testing blkverify through file blockref ==="
 echo
 
-TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE |\
-_filter_imgfmt
+TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE \
+>"$scratch_out"
+_filter_imgfmt <"$scratch_out"
+
 _make_test_img $IMG_SIZE
 $QEMU_IO -c "open -o 
driver=raw,file.driver=blkverify,file.raw.filename=$TEST_IMG.base,file.test.driver=$IMGFMT,file.test.file.filename=$TEST_IMG"
 \
  -c 'read 0 512' -c 'write -P 42 0x38000 512' -c 'read -P 42 0x38000 
512' | _filter_qemu_io
diff --git a/tests/qemu-iotests/174 b/tests/qemu-iotests/174
index e2f14a38c6..1b0dd2e8b7 100755
--- a/tests/qemu-iotests/174
+++ b/tests/qemu-iotests/174
@@ -40,7 +40,15 @@ _unsupported_fmt raw
 
 
 size=256K
-IMGFMT=raw IMGKEYSECRET= _make_test_img --no-opts $size | _filter_imgfmt
+
+# _make_test_img may set variables that we need to retain.  Everything
+# in a pipe is executed in a subshell, so doing so would throw away
+# all changes.  Therefore, we have to store the output in some temp
+# file and filter that.
+scratch_out="$TEST_DIR/img-create.out"
+IMGFMT=raw IMGKEYSECRET= _make_test_img --no-opts $size >"$scratch_out"
+_filter_imgfmt <"$scratch_out"
+rm -f "$scratch_out"
 
 echo
 echo "== reading wrong format should fail =="
-- 
2.26.2




[PATCH v2 19/20] iotests: Enable fuse for many tests

2020-09-22 Thread Max Reitz
Many tests (that do not support generic protocols) can run just fine
with FUSE-exported images, so allow them to.  Note that this is no
attempt at being definitely complete.  There are some tests that might
be modified to run on FUSE, but this patch still skips them.  This patch
only tries to pick the rather low-hanging fruits.

Note that 221 and 250 only pass when .lseek is correctly implemented,
which is only possible with a libfuse that is 3.8 or newer.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/025 | 2 +-
 tests/qemu-iotests/026 | 2 +-
 tests/qemu-iotests/028 | 2 +-
 tests/qemu-iotests/031 | 2 +-
 tests/qemu-iotests/034 | 2 +-
 tests/qemu-iotests/036 | 2 +-
 tests/qemu-iotests/037 | 2 +-
 tests/qemu-iotests/038 | 2 +-
 tests/qemu-iotests/039 | 2 +-
 tests/qemu-iotests/046 | 2 +-
 tests/qemu-iotests/050 | 2 +-
 tests/qemu-iotests/054 | 2 +-
 tests/qemu-iotests/060 | 2 +-
 tests/qemu-iotests/071 | 2 +-
 tests/qemu-iotests/079 | 2 +-
 tests/qemu-iotests/080 | 2 +-
 tests/qemu-iotests/089 | 2 +-
 tests/qemu-iotests/090 | 2 +-
 tests/qemu-iotests/091 | 2 +-
 tests/qemu-iotests/095 | 2 +-
 tests/qemu-iotests/097 | 2 +-
 tests/qemu-iotests/098 | 2 +-
 tests/qemu-iotests/102 | 2 +-
 tests/qemu-iotests/103 | 2 +-
 tests/qemu-iotests/106 | 2 +-
 tests/qemu-iotests/107 | 2 +-
 tests/qemu-iotests/108 | 2 +-
 tests/qemu-iotests/111 | 2 +-
 tests/qemu-iotests/112 | 2 +-
 tests/qemu-iotests/115 | 2 +-
 tests/qemu-iotests/117 | 2 +-
 tests/qemu-iotests/120 | 2 +-
 tests/qemu-iotests/121 | 2 +-
 tests/qemu-iotests/127 | 2 +-
 tests/qemu-iotests/133 | 2 +-
 tests/qemu-iotests/137 | 2 +-
 tests/qemu-iotests/138 | 2 +-
 tests/qemu-iotests/140 | 2 +-
 tests/qemu-iotests/154 | 2 +-
 tests/qemu-iotests/161 | 2 +-
 tests/qemu-iotests/171 | 2 +-
 tests/qemu-iotests/175 | 2 +-
 tests/qemu-iotests/176 | 2 +-
 tests/qemu-iotests/177 | 2 +-
 tests/qemu-iotests/179 | 2 +-
 tests/qemu-iotests/183 | 2 +-
 tests/qemu-iotests/186 | 2 +-
 tests/qemu-iotests/187 | 2 +-
 tests/qemu-iotests/191 | 2 +-
 tests/qemu-iotests/195 | 2 +-
 tests/qemu-iotests/200 | 2 +-
 tests/qemu-iotests/204 | 2 +-
 tests/qemu-iotests/214 | 2 +-
 tests/qemu-iotests/217 | 2 +-
 tests/qemu-iotests/220 | 2 +-
 tests/qemu-iotests/221 | 2 +-
 tests/qemu-iotests/229 | 2 +-
 tests/qemu-iotests/247 | 2 +-
 tests/qemu-iotests/249 | 2 +-
 tests/qemu-iotests/250 | 2 +-
 tests/qemu-iotests/252 | 2 +-
 tests/qemu-iotests/265 | 2 +-
 tests/qemu-iotests/268 | 2 +-
 tests/qemu-iotests/272 | 2 +-
 tests/qemu-iotests/273 | 2 +-
 tests/qemu-iotests/279 | 2 +-
 tests/qemu-iotests/286 | 2 +-
 tests/qemu-iotests/287 | 2 +-
 tests/qemu-iotests/289 | 2 +-
 tests/qemu-iotests/290 | 2 +-
 tests/qemu-iotests/291 | 2 +-
 tests/qemu-iotests/292 | 2 +-
 tests/qemu-iotests/293 | 2 +-
 tests/qemu-iotests/294 | 2 +-
 tests/qemu-iotests/305 | 2 +-
 75 files changed, 75 insertions(+), 75 deletions(-)

diff --git a/tests/qemu-iotests/025 b/tests/qemu-iotests/025
index e05d833452..1569d912f4 100755
--- a/tests/qemu-iotests/025
+++ b/tests/qemu-iotests/025
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 _supported_fmt raw qcow2 qed luks
-_supported_proto file sheepdog rbd nfs
+_supported_proto file sheepdog rbd nfs fuse
 
 echo "=== Creating image"
 echo
diff --git a/tests/qemu-iotests/026 b/tests/qemu-iotests/026
index b9713eb591..9ecc5880b1 100755
--- a/tests/qemu-iotests/026
+++ b/tests/qemu-iotests/026
@@ -41,7 +41,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # Currently only qcow2 supports rebasing
 _supported_fmt qcow2
-_supported_proto file
+_supported_proto file fuse
 _default_cache_mode writethrough
 _supported_cache_modes writethrough none
 # The refcount table tests expect a certain minimum width for refcount entries
diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index 864dc4a4e2..57d34aae99 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -46,7 +46,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # Any format supporting backing files except vmdk and qcow which do not support
 # smaller backing files.
 _supported_fmt qcow2 qed
-_supported_proto file
+_supported_proto file fuse
 _supported_os Linux
 
 # Choose a size that is not necessarily a cluster size multiple for image
diff --git a/tests/qemu-iotests/031 b/tests/qemu-iotests/031
index 646ecd593f..2bcbc5886e 100755
--- a/tests/qemu-iotests/031
+++ b/tests/qemu-iotests/031
@@ -39,7 +39,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # This tests qcow2-specific low-level functionality
 _supported_fmt qcow2
-_supported_proto file
+_supported_proto file fuse
 # We want to test compat=0.10, which does not support external data
 # files or refcount widths other than 16
 _unsupported_imgopts data_file 'refcount_bits=\([^1]\|.\([^6]\|$\)\)'
diff --git a/tests/qemu-iotests/034 b/tests/qemu-iotests/034
index ac2d687c71..08f7aea6d5 100755
--- a/tests/qemu-iotests/034
+++ b/tests/qemu-iotests/034
@@ -37,7 

[PATCH v2 07/20] iotests: Do not needlessly filter _make_test_img

2020-09-22 Thread Max Reitz
In most cases, _make_test_img does not need a _filter_imgfmt on top.  It
does that by itself.

(The exception is when IMGFMT has been overwritten but TEST_IMG has not.
In such cases, we do need a _filter_imgfmt on top to filter the test's
original IMGFMT from TEST_IMG.)

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/161 | 12 ++--
 tests/qemu-iotests/175 |  6 +++---
 tests/qemu-iotests/249 |  6 +++---
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/161 b/tests/qemu-iotests/161
index e270976d87..bbf7dbbc5c 100755
--- a/tests/qemu-iotests/161
+++ b/tests/qemu-iotests/161
@@ -48,9 +48,9 @@ _supported_os Linux
 IMG_SIZE=1M
 
 # Create the images
-TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE | _filter_imgfmt
-TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT | 
_filter_imgfmt
-_make_test_img -b "$TEST_IMG.int" -F $IMGFMT -F $IMGFMT | _filter_imgfmt
+TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
+TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT
+_make_test_img -b "$TEST_IMG.int" -F $IMGFMT -F $IMGFMT
 
 # First test: reopen $TEST.IMG changing the detect-zeroes option on
 # its backing file ($TEST_IMG.int).
@@ -105,9 +105,9 @@ echo
 echo "*** Commit and then change an option on the backing file"
 echo
 # Create the images again
-TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE | _filter_imgfmt
-TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT| 
_filter_imgfmt
-_make_test_img -b "$TEST_IMG.int" -F $IMGFMT | _filter_imgfmt
+TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
+TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT
+_make_test_img -b "$TEST_IMG.int" -F $IMGFMT
 
 _launch_qemu -drive if=none,file="${TEST_IMG}"
 _send_qemu_cmd $QEMU_HANDLE \
diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
index 00a626aa63..c3c2aed653 100755
--- a/tests/qemu-iotests/175
+++ b/tests/qemu-iotests/175
@@ -89,20 +89,20 @@ min_blocks=$(stat -c '%b' "$TEST_DIR/empty")
 
 echo
 echo "== creating image with default preallocation =="
-_make_test_img -o extent_size_hint=0 $size | _filter_imgfmt
+_make_test_img -o extent_size_hint=0 $size
 stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks 
$min_blocks $size
 
 for mode in off full falloc; do
 echo
 echo "== creating image with preallocation $mode =="
-_make_test_img -o preallocation=$mode,extent_size_hint=0 $size | 
_filter_imgfmt
+_make_test_img -o preallocation=$mode,extent_size_hint=0 $size
 stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks 
$min_blocks $size
 done
 
 for new_size in 4096 1048576; do
 echo
 echo "== resize empty image with block_resize =="
-_make_test_img -o extent_size_hint=0 0 | _filter_imgfmt
+_make_test_img -o extent_size_hint=0 0
 _block_resize $TEST_IMG $new_size >/dev/null
 stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks 
$min_blocks $new_size
 done
diff --git a/tests/qemu-iotests/249 b/tests/qemu-iotests/249
index 68f13ed328..a9aa9303eb 100755
--- a/tests/qemu-iotests/249
+++ b/tests/qemu-iotests/249
@@ -48,9 +48,9 @@ _supported_os Linux
 IMG_SIZE=1M
 
 # Create the images: base <- int <- active
-TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE | _filter_imgfmt
-TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT | 
_filter_imgfmt
-_make_test_img -b "$TEST_IMG.int" -F $IMGFMT | _filter_imgfmt
+TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
+TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT
+_make_test_img -b "$TEST_IMG.int" -F $IMGFMT
 
 # Launch QEMU with these two drives:
 # none0: base (read-only)
-- 
2.26.2




[PATCH v2 10/20] iotests/046: Avoid renaming images

2020-09-22 Thread Max Reitz
This generally does not work on non-file protocols.  It is better to
create the image with the final name from the start, and most tests do
this already.  Let 046 follow suit.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/046 | 5 +++--
 tests/qemu-iotests/046.out | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/046 b/tests/qemu-iotests/046
index 88b3363c19..40a9f30087 100755
--- a/tests/qemu-iotests/046
+++ b/tests/qemu-iotests/046
@@ -47,6 +47,8 @@ size=128M
 echo
 echo "== creating backing file for COW tests =="
 
+TEST_IMG_SAVE=$TEST_IMG
+TEST_IMG="$TEST_IMG.base"
 _make_test_img $size
 
 backing_io()
@@ -67,8 +69,7 @@ backing_io()
 
 backing_io 0 32 write | $QEMU_IO "$TEST_IMG" | _filter_qemu_io
 
-mv "$TEST_IMG" "$TEST_IMG.base"
-
+TEST_IMG=$TEST_IMG_SAVE
 _make_test_img -b "$TEST_IMG.base" -F $IMGFMT 6G
 
 echo
diff --git a/tests/qemu-iotests/046.out b/tests/qemu-iotests/046.out
index b022bcddd5..66ad987ab3 100644
--- a/tests/qemu-iotests/046.out
+++ b/tests/qemu-iotests/046.out
@@ -1,7 +1,7 @@
 QA output created by 046
 
 == creating backing file for COW tests ==
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 65536/65536 bytes at offset 65536
-- 
2.26.2




[PATCH v2 16/20] storage-daemon: Call bdrv_close_all() on exit

2020-09-22 Thread Max Reitz
Otherwise, exports and block devices are not properly shut down and
closed, unless the users explicitly issues blockdev-del and
block-export-del commands for each of them.

Signed-off-by: Max Reitz 
---
 storage-daemon/qemu-storage-daemon.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index e6157ff518..0b6d469751 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -324,6 +324,9 @@ int main(int argc, char *argv[])
 main_loop_wait(false);
 }
 
+bdrv_drain_all_begin();
+bdrv_close_all();
+
 monitor_cleanup();
 qemu_chr_cleanup();
 user_creatable_cleanup();
-- 
2.26.2




[PATCH v2 12/20] iotests/091: Use _cleanup_qemu instad of "wait"

2020-09-22 Thread Max Reitz
If the test environment has some other child processes running (like a
storage daemon that provides a FUSE export), then "wait" will never
finish.  Use wait=yes _cleanup_qemu instead.

(We need to discard the output so there is no change to the reference
output.)

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/091 | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/091 b/tests/qemu-iotests/091
index 68fbfd777b..8a4ce5b7e2 100755
--- a/tests/qemu-iotests/091
+++ b/tests/qemu-iotests/091
@@ -96,7 +96,8 @@ _send_qemu_cmd $h2 'qemu-io disk flush' "(qemu)"
 _send_qemu_cmd $h2 'quit' ""
 _send_qemu_cmd $h1 'quit' ""
 
-wait
+wait=yes _cleanup_qemu >/dev/null
+
 echo "Check image pattern"
 ${QEMU_IO} -c "read -P 0x22 0 4M" "${TEST_IMG}" | _filter_testdir | 
_filter_qemu_io
 
-- 
2.26.2




[PATCH v2 00/20] block/export: Allow exporting BDSs via FUSE

2020-09-22 Thread Max Reitz
Based-on: <20200907182011.521007-1-kw...@redhat.com>
  (“block/export: Add infrastructure and QAPI for block exports”)

(Though its patch 16 needs a s/= \_exp_nbd/= drv/ on top.)

v1: https://lists.nongnu.org/archive/html/qemu-block/2019-12/msg00451.html

Branch: https://github.com/XanClic/qemu.git fuse-exports-v2
Branch: https://git.xanclic.moe/XanClic/qemu.git fuse-exports-v2


Hi,

Ever since I found out that you can mount FUSE filesystems on regular
files (not just directories), I had the idea of adding FUSE block
exports to qemu where you can export block nodes as raw images.  The
best thing is that you’d be able to mount an image on itself, so
whatever format it may be in, qemu lets it appear as a raw image (and
you can then use regular tools like dd on it).

The performance is quite bad so far, but we can always try to improve it
if the need arises.  For now I consider it mostly a cute feature to get
easy access to the raw contents of image files in any format (without
requiring root rights).

In this version (as opposed to v1 linked above), I integrated the FUSE
export code into Kevin’s proposed common infrastructure for block
exports.


This series does the following:

First, add the FUSE export module (block/export/fuse.c) that implements
the basic file access functions.  (Note that you need libfuse 3.8.0 or
later for SEEK_HOLE/SEEK_DATA.)

Second, it allows using FUSE exports as a protocol in the iotests and
makes many iotests work with it.  (The file node is exported by a
background qemu instance to $SOCK_DIR.)

This gives us a lot of coverage for, well, not free (it does take twelve
patches), but for cheap; but there are still some more specialized
things we want to test, so third and last, this series adds an iotest
dedicated to FUSE exports.


(Note that as opposed to v1, I did run the iotests this time.)


Some notable changes from v1:
- Integrated everything into Kevin’s block-export infrastructure
- Use the storage daemon instead of full QEMU to provide FUSE exports
  when running the iotests with -fuse
- meson rebase
- Some other rebase conflicts


git-backport-diff against v1:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/20:[0007] [FC] 'configure: Detect libfuse'
002/20:[0255] [FC] 'fuse: Allow exporting BDSs via FUSE'
003/20:[0062] [FC] 'fuse: Implement standard FUSE operations'
004/20:[0018] [FC] 'fuse: Allow growable exports'
005/20:[0016] [FC] 'fuse: (Partially) implement fallocate()'
006/20:[0008] [FC] 'fuse: Implement hole detection through lseek'
007/20:[0036] [FC] 'iotests: Do not needlessly filter _make_test_img'
008/20:[] [--] 'iotests: Do not pipe _make_test_img'
009/20:[0012] [FC] 'iotests: Use convert -n in some cases'
010/20:[0006] [FC] 'iotests: Avoid renaming images'
011/20:[0008] [FC] 'iotests: Derive image names from $TEST_IMG'
012/20:[] [--] 'iotests/091: Use _cleanup_qemu instad of "wait"'
013/20:[0008] [FC] 'iotests: Restrict some Python tests to file'
014/20:[0010] [FC] 'iotests: Let _make_test_img guess $TEST_IMG_FILE'
015/20:[down] 'iotests/287: Clean up subshell test image'
016/20:[down] 'storage-daemon: Call bdrv_close_all() on exit'
017/20:[down] 'iotests: Give access to the qemu-storage-daemon'
018/20:[0042] [FC] 'iotests: Allow testing FUSE exports'
019/20:[0026] [FC] 'iotests: Enable fuse for many tests'
020/20:[0104] [FC] 'iotests/281: Add test for FUSE exports'


Max Reitz (20):
  configure: Detect libfuse
  fuse: Allow exporting BDSs via FUSE
  fuse: Implement standard FUSE operations
  fuse: Allow growable exports
  fuse: (Partially) implement fallocate()
  fuse: Implement hole detection through lseek
  iotests: Do not needlessly filter _make_test_img
  iotests: Do not pipe _make_test_img
  iotests: Use convert -n in some cases
  iotests/046: Avoid renaming images
  iotests: Derive image names from $TEST_IMG
  iotests/091: Use _cleanup_qemu instad of "wait"
  iotests: Restrict some Python tests to file
  iotests: Let _make_test_img guess $TEST_IMG_FILE
  iotests/287: Clean up subshell test image
  storage-daemon: Call bdrv_close_all() on exit
  iotests: Give access to the qemu-storage-daemon
  iotests: Allow testing FUSE exports
  iotests: Enable fuse for many tests
  iotests/308: Add test for FUSE exports

 configure|  66 +++
 qapi/block-export.json   |  28 +-
 include/block/fuse.h |  30 ++
 block.c  |   1 +
 block/export/export.c|   4 +
 block/export/fuse.c  | 645 +++
 storage-daemon/qemu-storage-daemon.c |   3 +
 block/export/meson.build |   1 +
 meson.build  |   7 +
 tests/qemu-iotests/025   |   2 +-
 tests/qemu-iotests/026 

[PATCH v2 06/20] fuse: Implement hole detection through lseek

2020-09-22 Thread Max Reitz
This is a relatively new feature in libfuse (available since 3.8.0,
which was released in November 2019), so we have to let configure check
whether it is available before making use of it.

Signed-off-by: Max Reitz 
---
 configure   | 32 +++
 block/export/fuse.c | 77 +
 meson.build |  1 +
 3 files changed, 110 insertions(+)

diff --git a/configure b/configure
index 21c31e4694..6b9184b62a 100755
--- a/configure
+++ b/configure
@@ -6226,11 +6226,39 @@ EOF
   fuse_libs=$(pkg-config --libs fuse3)
   if compile_prog "$fuse_cflags" "$fuse_libs"; then
 fuse=yes
+
+cat > $TMPC <
+#include 
+#include 
+#include 
+#include 
+static void fuse_lseek(fuse_req_t req, fuse_ino_t inode, off_t offset,
+   int whence, struct fuse_file_info *fi)
+{
+if (whence == SEEK_DATA || whence == SEEK_HOLE) {
+fuse_reply_lseek(req, offset);
+} else {
+fuse_reply_err(req, EINVAL);
+}
+}
+const struct fuse_lowlevel_ops fuse_ops = {
+.lseek = fuse_lseek,
+};
+int main(void) { return 0; }
+EOF
+if compile_prog "$fuse_cflags" "$fuse_libs"; then
+  fuse_lseek=yes
+else
+  fuse_lseek=no
+fi
   else
 if test "$fuse" = "yes"; then
   feature_not_found "fuse"
 fi
 fuse=no
+fuse_lseek=no
   fi
 fi
 
@@ -7425,6 +7453,10 @@ if test "$fuse" = "yes"; then
   echo "CONFIG_FUSE=y" >> $config_host_mak
   echo "FUSE_CFLAGS=$fuse_cflags" >> $config_host_mak
   echo "FUSE_LIBS=$fuse_libs" >> $config_host_mak
+
+  if test "$fuse_lseek" = "yes"; then
+echo "CONFIG_FUSE_LSEEK=y" >> $config_host_mak
+  fi
 fi
 
 if test "$tcg_interpreter" = "yes"; then
diff --git a/block/export/fuse.c b/block/export/fuse.c
index 7dfb4eb280..ee90f5a185 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -548,6 +548,80 @@ static void fuse_flush(fuse_req_t req, fuse_ino_t inode,
 fuse_reply_err(req, ret < 0 ? -ret : 0);
 }
 
+#ifdef CONFIG_FUSE_LSEEK
+/**
+ * Let clients inquire allocation status.
+ */
+static void fuse_lseek(fuse_req_t req, fuse_ino_t inode, off_t offset,
+   int whence, struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+
+if (whence != SEEK_HOLE && whence != SEEK_DATA) {
+fuse_reply_err(req, EINVAL);
+return;
+}
+
+while (true) {
+int64_t pnum;
+int ret;
+
+ret = bdrv_block_status_above(blk_bs(exp->common.blk), NULL,
+  offset, INT64_MAX, , NULL, NULL);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+
+if (!pnum && (ret & BDRV_BLOCK_EOF)) {
+int64_t blk_len;
+
+/*
+ * If blk_getlength() rounds (e.g. by sectors), then the
+ * export length will be rounded, too.  However,
+ * bdrv_block_status_above() may return EOF at unaligned
+ * offsets.  We must not let this become visible and thus
+ * always simulate a hole between @offset (the real EOF)
+ * and @blk_len (the client-visible EOF).
+ */
+
+blk_len = blk_getlength(exp->common.blk);
+if (blk_len < 0) {
+fuse_reply_err(req, -blk_len);
+return;
+}
+
+if (offset > blk_len || whence == SEEK_DATA) {
+fuse_reply_err(req, ENXIO);
+} else {
+fuse_reply_lseek(req, offset);
+}
+return;
+}
+
+if (ret & BDRV_BLOCK_DATA) {
+if (whence == SEEK_DATA) {
+fuse_reply_lseek(req, offset);
+return;
+}
+} else {
+if (whence == SEEK_HOLE) {
+fuse_reply_lseek(req, offset);
+return;
+}
+}
+
+/* Safety check against infinite loops */
+if (!pnum) {
+fuse_reply_err(req, ENXIO);
+return;
+}
+
+offset += pnum;
+}
+}
+#endif
+
 static const struct fuse_lowlevel_ops fuse_ops = {
 .lookup = fuse_lookup,
 .getattr= fuse_getattr,
@@ -557,6 +631,9 @@ static const struct fuse_lowlevel_ops fuse_ops = {
 .write  = fuse_write,
 .fallocate  = fuse_fallocate,
 .flush  = fuse_flush,
+#ifdef CONFIG_FUSE_LSEEK
+.lseek  = fuse_lseek,
+#endif
 };
 
 const BlockExportDriver blk_exp_fuse = {
diff --git a/meson.build b/meson.build
index 85addd8562..1db6a46d14 100644
--- a/meson.build
+++ b/meson.build
@@ -1537,6 +1537,7 @@ summary_info += {'thread sanitizer':  
config_host.has_key('CONFIG_TSAN')}
 summary_info += {'rng-none

[PATCH v2 04/20] fuse: Allow growable exports

2020-09-22 Thread Max Reitz
These will behave more like normal files in that writes beyond the EOF
will automatically grow the export size.

Signed-off-by: Max Reitz 
---
 qapi/block-export.json |  6 +-
 block/export/fuse.c| 12 +++-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index cb5bd54cbf..cb26daa98b 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -183,10 +183,14 @@
 # @mountpoint: Path on which to export the block device via FUSE.
 #  This must point to an existing regular file.
 #
+# @growable: Whether writes beyond the EOF should grow the block node
+#accordingly. (default: false)
+#
 # Since: 5.2
 ##
 { 'struct': 'BlockExportOptionsFuse',
-  'data': { 'mountpoint': 'str' },
+  'data': { 'mountpoint': 'str',
+'*growable': 'bool' },
   'if': 'defined(CONFIG_FUSE)' }
 
 ##
diff --git a/block/export/fuse.c b/block/export/fuse.c
index 8fc667231d..f3a84579ba 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -45,6 +45,7 @@ typedef struct FuseExport {
 
 char *mountpoint;
 bool writable;
+bool growable;
 } FuseExport;
 
 static GHashTable *exports;
@@ -101,6 +102,7 @@ static int fuse_export_create(BlockExport *blk_exp,
 
 exp->mountpoint = g_strdup(args->mountpoint);
 exp->writable = blk_exp_args->writable;
+exp->growable = args->growable;
 
 ret = setup_fuse_export(exp, args->mountpoint, errp);
 if (ret < 0) {
@@ -436,7 +438,15 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, 
const char *buf,
 
 size = MIN(size, BDRV_REQUEST_MAX_BYTES);
 if (offset + size > length) {
-size = length - offset;
+if (exp->growable) {
+ret = fuse_do_truncate(exp, offset + size, PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+} else {
+size = length - offset;
+}
 }
 
 ret = blk_pwrite(exp->common.blk, offset, buf, size, 0);
-- 
2.26.2




[PATCH v2 05/20] fuse: (Partially) implement fallocate()

2020-09-22 Thread Max Reitz
This allows allocating areas after the (old) EOF as part of a growing
resize, writing zeroes, and discarding.

Signed-off-by: Max Reitz 
---
 block/export/fuse.c | 79 +
 1 file changed, 79 insertions(+)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index f3a84579ba..7dfb4eb280 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -457,6 +457,84 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, 
const char *buf,
 }
 }
 
+/**
+ * Let clients perform various fallocate() operations.
+ */
+static void fuse_fallocate(fuse_req_t req, fuse_ino_t inode, int mode,
+   off_t offset, off_t length,
+   struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+int64_t blk_len;
+int ret;
+
+if (!exp->writable) {
+fuse_reply_err(req, EACCES);
+return;
+}
+
+blk_len = blk_getlength(exp->common.blk);
+if (blk_len < 0) {
+fuse_reply_err(req, -blk_len);
+return;
+}
+
+if (mode & FALLOC_FL_KEEP_SIZE) {
+length = MIN(length, blk_len - offset);
+}
+
+if (mode & FALLOC_FL_PUNCH_HOLE) {
+if (!(mode & FALLOC_FL_KEEP_SIZE)) {
+fuse_reply_err(req, EINVAL);
+return;
+}
+
+do {
+int size = MIN(length, BDRV_REQUEST_MAX_BYTES);
+
+ret = blk_pdiscard(exp->common.blk, offset, size);
+length -= size;
+} while (ret == 0 && length > 0);
+} else if (mode & FALLOC_FL_ZERO_RANGE) {
+if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + length > blk_len) {
+ret = fuse_do_truncate(exp, offset + length, PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+}
+
+do {
+int size = MIN(length, BDRV_REQUEST_MAX_BYTES);
+
+ret = blk_pwrite_zeroes(exp->common.blk,
+offset, size, 0);
+length -= size;
+} while (ret == 0 && length > 0);
+} else if (!mode) {
+/* We can only fallocate at the EOF with a truncate */
+if (offset < blk_len) {
+fuse_reply_err(req, EOPNOTSUPP);
+return;
+}
+
+if (offset > blk_len) {
+/* No preallocation needed here */
+ret = fuse_do_truncate(exp, offset, PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+}
+
+ret = fuse_do_truncate(exp, offset + length, PREALLOC_MODE_FALLOC);
+} else {
+ret = -EOPNOTSUPP;
+}
+
+fuse_reply_err(req, ret < 0 ? -ret : 0);
+}
+
 /**
  * Let clients flush the exported image.
  */
@@ -477,6 +555,7 @@ static const struct fuse_lowlevel_ops fuse_ops = {
 .open   = fuse_open,
 .read   = fuse_read,
 .write  = fuse_write,
+.fallocate  = fuse_fallocate,
 .flush  = fuse_flush,
 };
 
-- 
2.26.2




[PATCH v2 02/20] fuse: Allow exporting BDSs via FUSE

2020-09-22 Thread Max Reitz
block-export-add type=fuse allows mounting block graph nodes via FUSE on
some existing regular file.  That file should then appears like a raw
disk image, and accesses to it result in accesses to the exported BDS.

Right now, we only implement the necessary block export functions to set
it up and shut it down.  We do not implement any access functions, so
accessing the mount point only results in errors.  This will be
addressed by a followup patch.

We keep a hash table of exported mount points, because we want to be
able to detect when users try to use a mount point twice.  This is
because we invoke stat() to check whether the given mount point is a
regular file, but if that file is served by ourselves (because it is
already used as a mount point), then this stat() would have to be served
by ourselves, too, which is impossible to do while we (as the caller)
are waiting for it to settle.  Therefore, keep track of mount point
paths to at least catch the most obvious instances of that problem.

Signed-off-by: Max Reitz 
---
 qapi/block-export.json   |  24 +++-
 include/block/fuse.h |  30 +
 block.c  |   1 +
 block/export/export.c|   4 +
 block/export/fuse.c  | 253 +++
 block/export/meson.build |   1 +
 6 files changed, 311 insertions(+), 2 deletions(-)
 create mode 100644 include/block/fuse.h
 create mode 100644 block/export/fuse.c

diff --git a/qapi/block-export.json b/qapi/block-export.json
index 0c045dfe7b..cb5bd54cbf 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -174,6 +174,21 @@
 ##
 { 'command': 'nbd-server-stop' }
 
+##
+# @BlockExportOptionsFuse:
+#
+# Options for exporting a block graph node on some (file) mountpoint
+# as a raw image.
+#
+# @mountpoint: Path on which to export the block device via FUSE.
+#  This must point to an existing regular file.
+#
+# Since: 5.2
+##
+{ 'struct': 'BlockExportOptionsFuse',
+  'data': { 'mountpoint': 'str' },
+  'if': 'defined(CONFIG_FUSE)' }
+
 ##
 # @BlockExportType:
 #
@@ -181,10 +196,13 @@
 #
 # @nbd: NBD export
 #
+# @fuse: Fuse export (since: 5.2)
+#
 # Since: 4.2
 ##
 { 'enum': 'BlockExportType',
-  'data': [ 'nbd' ] }
+  'data': [ 'nbd',
+{ 'name': 'fuse', 'if': 'defined(CONFIG_FUSE)' } ] }
 
 ##
 # @BlockExportOptions:
@@ -213,7 +231,9 @@
 '*writethrough': 'bool' },
   'discriminator': 'type',
   'data': {
-  'nbd': 'BlockExportOptionsNbd'
+  'nbd': 'BlockExportOptionsNbd',
+  'fuse': { 'type': 'BlockExportOptionsFuse',
+'if': 'defined(CONFIG_FUSE)' }
} }
 
 ##
diff --git a/include/block/fuse.h b/include/block/fuse.h
new file mode 100644
index 00..ffa91fe364
--- /dev/null
+++ b/include/block/fuse.h
@@ -0,0 +1,30 @@
+/*
+ * Present a block device as a raw image through FUSE
+ *
+ * Copyright (c) 2020 Max Reitz 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; under version 2 or later of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef BLOCK_FUSE_H
+#define BLOCK_FUSE_H
+
+#ifdef CONFIG_FUSE
+
+#include "block/export.h"
+
+extern const BlockExportDriver blk_exp_fuse;
+
+#endif /* CONFIG_FUSE */
+
+#endif
diff --git a/block.c b/block.c
index acde53f92a..7bf45f6ede 100644
--- a/block.c
+++ b/block.c
@@ -26,6 +26,7 @@
 #include "block/trace.h"
 #include "block/block_int.h"
 #include "block/blockjob.h"
+#include "block/fuse.h"
 #include "block/nbd.h"
 #include "block/qdict.h"
 #include "qemu/error-report.h"
diff --git a/block/export/export.c b/block/export/export.c
index 64d39a6934..ba866d6ba7 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -16,6 +16,7 @@
 #include "block/block.h"
 #include "sysemu/block-backend.h"
 #include "block/export.h"
+#include "block/fuse.h"
 #include "block/nbd.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-block-export.h"
@@ -24,6 +25,9 @@
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 _exp_nbd,
+#ifdef CONFIG_FUSE
+_exp_fuse,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/fuse.c b/block/export/fuse.c
new file mode 100644
index 00..75f11d2514
--- /dev/null
+++ b/block/export/fuse.c
@@ -0,0 +1,253 @@
+/*
+ * Present a block device as a raw image through FUSE
+ *
+ * Copyright (c) 2020 Max Reitz 
+ *
+ * This program is fr

[PATCH v2 03/20] fuse: Implement standard FUSE operations

2020-09-22 Thread Max Reitz
This makes the export actually useful instead of only producing errors
whenever it is accessed.

Signed-off-by: Max Reitz 
---
 block/export/fuse.c | 226 
 1 file changed, 226 insertions(+)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 75f11d2514..8fc667231d 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -32,6 +32,10 @@
 #include 
 
 
+/* Prevent overly long bounce buffer allocations */
+#define FUSE_MAX_BOUNCE_BYTES (MIN(BDRV_REQUEST_MAX_BYTES, 64 * 1024 * 1024))
+
+
 typedef struct FuseExport {
 BlockExport common;
 
@@ -241,7 +245,229 @@ static bool is_regular_file(const char *path, Error 
**errp)
 return true;
 }
 
+
+/**
+ * Let clients look up files.  Always return ENOENT because we only
+ * care about the mountpoint itself.
+ */
+static void fuse_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
+{
+fuse_reply_err(req, ENOENT);
+}
+
+/**
+ * Let clients get file attributes (i.e., stat() the file).
+ */
+static void fuse_getattr(fuse_req_t req, fuse_ino_t inode,
+ struct fuse_file_info *fi)
+{
+struct stat statbuf;
+int64_t length, allocated_blocks;
+time_t now = time(NULL);
+ImageInfo *info = NULL;
+FuseExport *exp = fuse_req_userdata(req);
+mode_t mode;
+Error *local_error = NULL;
+
+length = blk_getlength(exp->common.blk);
+if (length < 0) {
+fuse_reply_err(req, -length);
+return;
+}
+
+bdrv_query_image_info(blk_bs(exp->common.blk), , _error);
+if (local_error) {
+allocated_blocks = DIV_ROUND_UP(length, 512);
+} else {
+allocated_blocks = DIV_ROUND_UP(info->actual_size, 512);
+}
+
+qapi_free_ImageInfo(info);
+error_free(local_error);
+local_error = NULL;
+
+mode = S_IFREG | S_IRUSR;
+if (exp->writable) {
+mode |= S_IWUSR;
+}
+
+statbuf = (struct stat) {
+.st_ino = inode,
+.st_mode= mode,
+.st_nlink   = 1,
+.st_uid = getuid(),
+.st_gid = getgid(),
+.st_size= length,
+.st_blksize = blk_bs(exp->common.blk)->bl.request_alignment,
+.st_blocks  = allocated_blocks,
+.st_atime   = now,
+.st_mtime   = now,
+.st_ctime   = now,
+};
+
+fuse_reply_attr(req, , 1.);
+}
+
+static int fuse_do_truncate(const FuseExport *exp, int64_t size,
+PreallocMode prealloc)
+{
+uint64_t blk_perm, blk_shared_perm;
+int ret;
+
+blk_get_perm(exp->common.blk, _perm, _shared_perm);
+
+ret = blk_set_perm(exp->common.blk, blk_perm | BLK_PERM_RESIZE,
+   blk_shared_perm, NULL);
+if (ret < 0) {
+return ret;
+}
+
+ret = blk_truncate(exp->common.blk, size, true, prealloc, 0, NULL);
+
+/* Must succeed, because we are only giving up the RESIZE permission */
+blk_set_perm(exp->common.blk, blk_perm, blk_shared_perm, _abort);
+
+return ret;
+}
+
+/**
+ * Let clients set file attributes.  Only resizing is supported.
+ */
+static void fuse_setattr(fuse_req_t req, fuse_ino_t inode, struct stat 
*statbuf,
+ int to_set, struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+int ret;
+
+if (!exp->writable) {
+fuse_reply_err(req, EACCES);
+return;
+}
+
+if (to_set & ~FUSE_SET_ATTR_SIZE) {
+fuse_reply_err(req, ENOTSUP);
+return;
+}
+
+ret = fuse_do_truncate(exp, statbuf->st_size, PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+
+fuse_getattr(req, inode, fi);
+}
+
+/**
+ * Let clients open a file (i.e., the exported image).
+ */
+static void fuse_open(fuse_req_t req, fuse_ino_t inode,
+  struct fuse_file_info *fi)
+{
+fuse_reply_open(req, fi);
+}
+
+/**
+ * Handle client reads from the exported image.
+ */
+static void fuse_read(fuse_req_t req, fuse_ino_t inode,
+  size_t size, off_t offset, struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+int64_t length;
+void *buf;
+int ret;
+
+/**
+ * Clients will expect short reads at EOF, so we have to limit
+ * offset+size to the image length.
+ */
+length = blk_getlength(exp->common.blk);
+if (length < 0) {
+fuse_reply_err(req, -length);
+return;
+}
+
+size = MIN(size, FUSE_MAX_BOUNCE_BYTES);
+if (offset + size > length) {
+size = length - offset;
+}
+
+buf = qemu_try_blockalign(blk_bs(exp->common.blk), size);
+if (!buf) {
+fuse_reply_err(req, ENOMEM);
+return;
+}
+
+ret = blk_pread(exp->common.blk, offset, buf, size);
+if (ret >= 0) {
+fuse_reply_buf(req, buf, size);
+} else {
+fuse_reply_err(req, -ret);
+}
+
+qemu_v

[PATCH v2 01/20] configure: Detect libfuse

2020-09-22 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 configure   | 34 ++
 meson.build |  6 ++
 2 files changed, 40 insertions(+)

diff --git a/configure b/configure
index ce27eafb0a..21c31e4694 100755
--- a/configure
+++ b/configure
@@ -538,6 +538,7 @@ meson=""
 ninja=""
 skip_meson=no
 gettext=""
+fuse=""
 
 bogus_os="no"
 malloc_trim=""
@@ -1621,6 +1622,10 @@ for opt do
   ;;
   --disable-libdaxctl) libdaxctl=no
   ;;
+  --enable-fuse) fuse=yes
+  ;;
+  --disable-fuse) fuse=no
+  ;;
   *)
   echo "ERROR: unknown option $opt"
   echo "Try '$0 --help' for more information"
@@ -1945,6 +1950,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   xkbcommon   xkbcommon support
   rng-nonedummy RNG, avoid using /dev/(u)random and getrandom()
   libdaxctl   libdaxctl support
+  fusefuse block device export
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -6206,6 +6212,28 @@ but not implemented on your system"
 fi
 fi
 
+##
+# FUSE support
+
+if test "$fuse" != "no"; then
+  cat > $TMPC <
+#include 
+int main(void) { return 0; }
+EOF
+  fuse_cflags=$(pkg-config --cflags fuse3)
+  fuse_libs=$(pkg-config --libs fuse3)
+  if compile_prog "$fuse_cflags" "$fuse_libs"; then
+fuse=yes
+  else
+if test "$fuse" = "yes"; then
+  feature_not_found "fuse"
+fi
+fuse=no
+  fi
+fi
+
 ##
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -7393,6 +7421,12 @@ if test "$secret_keyring" = "yes" ; then
   echo "CONFIG_SECRET_KEYRING=y" >> $config_host_mak
 fi
 
+if test "$fuse" = "yes"; then
+  echo "CONFIG_FUSE=y" >> $config_host_mak
+  echo "FUSE_CFLAGS=$fuse_cflags" >> $config_host_mak
+  echo "FUSE_LIBS=$fuse_libs" >> $config_host_mak
+fi
+
 if test "$tcg_interpreter" = "yes"; then
   QEMU_INCLUDES="-iquote ${source_path}/tcg/tci $QEMU_INCLUDES"
 elif test "$ARCH" = "sparc64" ; then
diff --git a/meson.build b/meson.build
index 86e1cca0ad..85addd8562 100644
--- a/meson.build
+++ b/meson.build
@@ -436,6 +436,11 @@ if 'CONFIG_TASN1' in config_host
 endif
 keyutils = dependency('libkeyutils', required: false,
   method: 'pkg-config', static: enable_static)
+libfuse = not_found
+if 'CONFIG_FUSE' in config_host
+  libfuse = declare_dependency(compile_args: 
config_host['FUSE_CFLAGS'].split(),
+   link_args: config_host['FUSE_LIBS'].split())
+endif
 
 has_gettid = cc.has_function('gettid')
 
@@ -1531,6 +1536,7 @@ endif
 summary_info += {'thread sanitizer':  config_host.has_key('CONFIG_TSAN')}
 summary_info += {'rng-none':  config_host.has_key('CONFIG_RNG_NONE')}
 summary_info += {'Linux keyring': 
config_host.has_key('CONFIG_SECRET_KEYRING')}
+summary_info += {'fuse exports':  config_host.has_key('CONFIG_FUSE')}
 summary(summary_info, bool_yn: true)
 
 if not supported_cpus.contains(cpu)
-- 
2.26.2




Re: [PATCH v2 04/20] block/block-copy: More explicit call_state

2020-09-21 Thread Max Reitz
On 18.09.20 22:11, Vladimir Sementsov-Ogievskiy wrote:
> 17.07.2020 16:45, Max Reitz wrote:
>> On 01.06.20 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>> Refactor common path to use BlockCopyCallState pointer as parameter, to
>>> prepare it for use in asynchronous block-copy (at least, we'll need to
>>> run block-copy in a coroutine, passing the whole parameters as one
>>> pointer).
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/block-copy.c | 51 ++
>>>   1 file changed, 38 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>> index 43a018d190..75882a094c 100644
>>> --- a/block/block-copy.c
>>> +++ b/block/block-copy.c
>>
>> [...]
>>
>>> @@ -646,16 +653,16 @@ out:
>>>    * it means that some I/O operation failed in context of _this_
>>> block_copy call,
>>>    * not some parallel operation.
>>>    */
>>> -int coroutine_fn block_copy(BlockCopyState *s, int64_t offset,
>>> int64_t bytes,
>>> -    bool *error_is_read)
>>> +static int coroutine_fn block_copy_common(BlockCopyCallState
>>> *call_state)
>>>   {
>>>   int ret;
>>>     do {
>>> -    ret = block_copy_dirty_clusters(s, offset, bytes,
>>> error_is_read);
>>> +    ret = block_copy_dirty_clusters(call_state);
>>
>> It’s possible that much of this code will change in a future patch of
>> this series, but as it is, it might be nice to make
>> block_copy_dirty_clusters’s argument a const pointer so it’s clear that
>> the call to block_copy_wait_one() below will use the original @offset
>> and @bytes values.
> 
> Hm. I'm trying this, and it doesn't work:
> 
> block_copy_task_entry() wants to change call_state:
> 
>    t->call_state->failed = true;

Too bad :)

>> *shrug*
>>
>> Reviewed-by: Max Reitz 
>>
>>>     if (ret == 0) {
>>> -    ret = block_copy_wait_one(s, offset, bytes);
>>> +    ret = block_copy_wait_one(call_state->s,
>>> call_state->offset,
>>> +  call_state->bytes);
>>>   }
>>>     /*
>>
> 
> 




signature.asc
Description: OpenPGP digital signature


[PATCH] iotests/046: Filter request length

2020-09-18 Thread Max Reitz
For its concurrent requests, 046 has always filtered the offset,
probably because concurrent requests may settle in any order.  However,
it did not filter the request length, and so if requests with different
lengths settle in an unexpected order (notably the longer request before
the shorter request), the test fails (for no good reason).

Filter the length, too.

Signed-off-by: Max Reitz 
---
This has annoyed me for quite some time now, but when rebasing (and
testing) my FUSE export series, it became apparent that on a FUSE export
qcow2 images with -o compat=0.10 always fail this test (because the
first 56k request settles before its accompanying 8k request), so now
I'm forced to do something about it.
---
 tests/qemu-iotests/046 |   3 +-
 tests/qemu-iotests/046.out | 104 ++---
 2 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/tests/qemu-iotests/046 b/tests/qemu-iotests/046
index ed6fae3529..17e71c4b5e 100755
--- a/tests/qemu-iotests/046
+++ b/tests/qemu-iotests/046
@@ -186,7 +186,8 @@ EOF
 }
 
 overlay_io | $QEMU_IO blkdebug::"$TEST_IMG" | _filter_qemu_io |\
-   sed -e 's/bytes at offset [0-9]*/bytes at offset XXX/g'
+sed -e 's/[0-9]*\/[0-9]* bytes at offset [0-9]*/XXX\/XXX bytes at offset 
XXX/g' \
+-e 's/^[0-9]* KiB/XXX KiB/g'
 
 echo
 echo "== Verify image content =="
diff --git a/tests/qemu-iotests/046.out b/tests/qemu-iotests/046.out
index 66ad987ab3..b1a03f4041 100644
--- a/tests/qemu-iotests/046.out
+++ b/tests/qemu-iotests/046.out
@@ -71,74 +71,74 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=6442450944 
backing_file=TEST_DIR
 == Some concurrent requests touching the same cluster ==
 blkdebug: Suspended request 'A'
 blkdebug: Resuming request 'A'
-wrote 8192/8192 bytes at offset XXX
-8 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 8192/8192 bytes at offset XXX
-8 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 8192/8192 bytes at offset XXX
-8 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 blkdebug: Suspended request 'A'
 blkdebug: Resuming request 'A'
-wrote 8192/8192 bytes at offset XXX
-8 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 65536/65536 bytes at offset XXX
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 blkdebug: Suspended request 'A'
 blkdebug: Resuming request 'A'
-wrote 8192/8192 bytes at offset XXX
-8 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 65536/65536 bytes at offset XXX
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 32768/32768 bytes at offset XXX
-32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 blkdebug: Suspended request 'A'
 blkdebug: Resuming request 'A'
-wrote 8192/8192 bytes at offset XXX
-8 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 57344/57344 bytes at offset XXX
-56 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 4096/4096 bytes at offset XXX
-4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 32768/32768 bytes at offset XXX
-32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-discard 65536/65536 bytes at offset XXX
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+discard XXX/XXX bytes at offset XXX
+XXX KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 blkdebug: Suspended request 'A'
 blkdebug: Resuming request 'A'
-wrote 8192/8192 bytes at offset XXX
-8 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 57344/57344 bytes at offset XXX
-56 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 4096/4096 bytes at offset XXX
-4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 65536/65536 bytes at offset XXX
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-discard 65536/65536 bytes at offset XXX
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote XXX/XXX bytes at offset XXX
+XXX Ki

Re: [PATCH 3/3] block: enable long IO requests report by default

2020-09-17 Thread Max Reitz
On 10.08.20 12:14, Denis V. Lunev wrote:
> Latency threshold is set to 10 seconds following guest request timeout
> on legacy storage controller.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Vladimir Sementsov-Ogievskiy 
> CC: Stefan Hajnoczi 
> CC: Kevin Wolf 
> CC: Max Reitz 
> ---
>  blockdev.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 66158d1292..733fdd36da 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -622,8 +622,13 @@ static BlockBackend *blockdev_init(const char *file, 
> QDict *bs_opts,
>  
>  bs->detect_zeroes = detect_zeroes;
>  
> +/*
> + * Set log threshold to 10 seconds. Timeout choosen by observation

*chosen

> + * of the guest behavior with legacy storage controllers. Linux
> + * could remount FS read-only if journal write takes this time.
> + */
>  block_acct_setup(blk_get_stats(blk), account_invalid, account_failed,
> -qemu_opt_get_number(opts, "latency-log-threshold", 0));
> +        qemu_opt_get_number(opts, "latency-log-threshold", 1));

Yeah, why not.

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/3] block/block-backend: add converter from BlockAcctStats to BlockBackend

2020-09-17 Thread Max Reitz
On 10.08.20 12:14, Denis V. Lunev wrote:
> Right now BlockAcctStats is always reside on BlockBackend. This structure
> is not used in any other place. Thus we are able to create a converter
> from one pointer to another.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> CC: Stefan Hajnoczi 
> CC: Kevin Wolf 
> CC: Max Reitz 
> ---
>  block/block-backend.c  | 5 +
>  include/sysemu/block-backend.h | 1 +
>  2 files changed, 6 insertions(+)

(Note: I’m just writing this mail because I already responded to patch
2.  I wouldn’t have if I didn’t have anything to say regarding the other
patches in this series, so nothing I say here is backed by a strong
opinion from me.)

> diff --git a/block/block-backend.c b/block/block-backend.c
> index 3a13cb5f0b..88e531df98 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -2153,6 +2153,11 @@ BlockAcctStats *blk_get_stats(BlockBackend *blk)
>  return >stats;
>  }
>  
> +BlockBackend *blk_stats2blk(BlockAcctStats *s)

(Extreme bikeshedding: I’d prefer something like blk_from_stats().)

> +{
> +return container_of(s, BlockBackend, stats);
> +}

Works, but I get all itchy, especially given the reasoning in the commit
message, which is basically “Right now this works”.

That sounds to me like maybe in the future someone could get the idea of
moving BlockAcctStats somewhere else and then this is something that we
absolutely must not forget about, lest horrible accidents occur.

Would a back pointer from BlockAcctStats to the BlockBackend work or do
you find that just too ugly and unnecessary?  (I think it would help at
least so that we do not forget this place here.)

Or maybe just a comment above BlockAcctStats would help quench my itch,
too, like “Note: blk_stats2blk() expects objects of this type only to
occur as part of a BlockBackend”.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/3] block: add logging facility for long standing IO requests

2020-09-17 Thread Max Reitz
On 10.08.20 12:14, Denis V. Lunev wrote:
> There are severe delays with IO requests processing if QEMU is running in
> virtual machine or over software defined storage. Such delays potentially
> results in unpredictable guest behavior. For example, guests over IDE or
> SATA drive could remount filesystem read-only if write is performed
> longer than 10 seconds.
> 
> Such reports are very complex to process. Some good starting point for this
> seems quite reasonable. This patch provides one. It adds logging of such
> potentially dangerous long IO operations.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> CC: Stefan Hajnoczi 
> CC: Kevin Wolf 
> CC: Max Reitz 
> ---
>  block/accounting.c | 59 +-
>  blockdev.c |  7 -
>  include/block/accounting.h |  5 +++-
>  3 files changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/block/accounting.c b/block/accounting.c
> index 8d41c8a83a..24f48c84b8 100644
> --- a/block/accounting.c
> +++ b/block/accounting.c

[...]

> @@ -182,6 +185,58 @@ void block_latency_histograms_clear(BlockAcctStats 
> *stats)
>  }
>  }
>  
> +static const char *block_account_type(enum BlockAcctType type)
> +{
> +switch (type) {
> +case BLOCK_ACCT_READ:
> +return "READ";
> +case BLOCK_ACCT_WRITE:
> +return "WRITE";
> +case BLOCK_ACCT_FLUSH:
> +return "DISCARD";
> +case BLOCK_ACCT_UNMAP:
> +return "TRUNCATE";

I don’t understand how FLUSH translates to DISCARD, and UNMAP to TRUNCATE.

> +case BLOCK_ACCT_NONE:
> +return "NONE";
> +case BLOCK_MAX_IOTYPE:
> +break;
> +}
> +return "UNKNOWN";
> +}
> +
> +static void block_acct_report_long(BlockAcctStats *stats,
> +   BlockAcctCookie *cookie, int64_t 
> latency_ns)
> +{
> +unsigned latency_ms = latency_ns / SCALE_MS;
> +int64_t start_time_host_s;
> +char buf[64];
> +struct tm t;
> +BlockDriverState *bs;
> +
> +if (cookie->type == BLOCK_ACCT_NONE) {
> +return;
> +}
> +if (stats->latency_log_threshold_ms == 0) {
> +return;
> +}
> +if (latency_ms < stats->latency_log_threshold_ms) {
> +return;
> +}
> +
> +start_time_host_s = cookie->start_time_ns / NANOSECONDS_PER_SECOND;
> +strftime(buf, sizeof(buf), "%m-%d %H:%M:%S",
> + localtime_r(_time_host_s, ));
> +
> +bs = blk_bs(blk_stats2blk(stats));
> +qemu_log("long %s[%ld] IO request: %d.03%d since %s.%03d bs: %s(%s, 
> %s)\n",
> + block_account_type(cookie->type), cookie->bytes,
> + (int)(latency_ms / 1000), (int)(latency_ms % 1000), buf,

Why not just latency_ms * .001f and %.3f?

> + (int)((cookie->start_time_ns / 100) % 1000),

s/100/SCALE_MS/? (But I’m not sure whether that’s what the SCALE_?S
are for.)

> + bs == NULL ? "unknown" : bdrv_get_node_name(bs),
> + bs == NULL ? "unknown" : bdrv_get_format_name(bs),
> + bs == NULL ? "unknown" : bs->filename);

Now that I’m writing this response already, I wonder whether a QMP event
wouldn’t be nice.  (But considering that accounting apparently just
doesn’t with -blockdev, I suppose that’s not that big of a worry.)

> +}
> +
>  static void block_account_one_io(BlockAcctStats *stats, BlockAcctCookie 
> *cookie,
>   bool failed)
>  {

[...]

> diff --git a/blockdev.c b/blockdev.c
> index 3848a9c8ab..66158d1292 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -622,7 +622,8 @@ static BlockBackend *blockdev_init(const char *file, 
> QDict *bs_opts,
>  
>  bs->detect_zeroes = detect_zeroes;
>  
> -block_acct_setup(blk_get_stats(blk), account_invalid, 
> account_failed);
> +block_acct_setup(blk_get_stats(blk), account_invalid, account_failed,
> +qemu_opt_get_number(opts, "latency-log-threshold", 0));

latency_log_threshold_ms is an unsigned int and so this will silently
overflow for values >= 2^32.

(Or for user-specified values < 0, which are wrapped around.)

>  
>  if (!parse_stats_intervals(blk_get_stats(blk), interval_list, errp)) 
> {
>  blk_unref(blk);
> @@ -3727,6 +3728,10 @@ QemuOptsList qemu_common_drive_opts = {
>  .type = QEMU_OPT_BOOL,
>  .help = "whether to account for failed I/O operations "
>  "in the statistics",
> +},{
> +.name = "latency-log-threshold",
> +.type = QEMU_OPT_NUMBER,
> +.help = "threshold for long I/O report (disabled if <=0), in ms",

Because of that overflow, negative values will not necessarily disable
reporting, because truncating them to 32 bit may make them small
absolute values again.

In any case, I’d just say “disabled if 0”, and not mention anything
about <0.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] file-posix: Correctly read max_segments of SG nodes

2020-09-17 Thread Max Reitz
On 12.08.20 00:51, Dmitry Fomichev wrote:
> If scsi-generic driver is in use, an SG node can be specified in
> the command line in place of a regular SCSI device. In this case,
> sg_get_max_segments() fails to open max_segments entry in sysfs
> because /dev/sgX is a character device. As the result, the maximum
> transfer size for the device may be calculated incorrectly, causing
> I/O errors if the maximum transfer size at the guest ends up to be
> larger compared to the host.
> 
> Check system device type in sg_get_max_segments() and read the
> max_segments value differently if it is a character device.
> 
> Reported-by: Johannes Thumshirn 
> Fixes: 9103f1ceb46614b150bcbc3c9a4fbc72b47fedcc
> Signed-off-by: Dmitry Fomichev 
> ---
>  block/file-posix.c | 55 +++---
>  1 file changed, 32 insertions(+), 23 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 094e3b0212..f9e2424e8f 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1108,6 +1108,7 @@ static int sg_get_max_segments(int fd)
>  int ret;
>  int sysfd = -1;
>  long max_segments;
> +unsigned int max_segs;
>  struct stat st;
>  
>  if (fstat(fd, )) {
> @@ -1115,30 +1116,38 @@ static int sg_get_max_segments(int fd)
>  goto out;
>  }
>  
> -sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
> -major(st.st_rdev), minor(st.st_rdev));
> -sysfd = open(sysfspath, O_RDONLY);
> -if (sysfd == -1) {
> -ret = -errno;
> -goto out;
> +if (S_ISBLK(st.st_mode)) {
> +sysfspath = 
> g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
> +major(st.st_rdev), minor(st.st_rdev));

Sounds reasonable, but this function is (naturally) only called if
bs->sg is true, which is set by hdev_is_sg(), which returns true only if
the device file is a character device.

So is this path ever taken, or can we just replace it all with the ioctl?

(Before 867eccfed84, this function was used for all host devices, which
might explain why the code even exists.)

Max




Re: [PATCH v2] qemu-img: Support bitmap --merge into backing image

2020-09-17 Thread Max Reitz
On 15.09.20 15:31, Eric Blake wrote:
> On 9/15/20 3:57 AM, Max Reitz wrote:
>> On 14.09.20 21:10, Eric Blake wrote:
>>> If you have the chain 'base.qcow2 <- top.qcow2' and want to merge a
>>> bitmap from top into base, qemu-img was failing with:
>>>
>>> qemu-img: Could not open 'top.qcow2': Could not open backing file:
>>> Failed to get shared "write" lock
>>> Is another process using the image [base.qcow2]?
>>>
>>> The easiest fix is to not open the entire backing chain of either
>>> image (source or destination); after all, the point of 'qemu-img
>>> bitmap' is solely to manipulate bitmaps directly within a single qcow2
>>> image, and this is made more precise if we don't pay attention to
>>> other images in the chain that may happen to have a bitmap by the same
>>> name.
>>>
>>> However, note that during normal usage, it is a feature that qemu will
>>> allow a bitmap from a backing image to be exposed by an overlay BDS;
>>> doing so makes it easier to perform incremental backup, where we have:
>>>
>>> Base <- Active <- temporrary
>>
>> *temporary
>>
>> (Also it’s a bit strange that “Base” and “Active” are capitalized, but
>> “temporary” isn’t)
>>
>>>    \--block job ->/
>>>
>>> with temporary being fed by a block-copy 'sync' job; when exposing
>>
>> s/block-copy 'sync'/backup 'sync=none'/?
> 
> Will fix both.
> 
>>
>>> temporary over NBD, referring to a bitmap that lives only in Active is
>>> less effort than having to copy a bitmap into temporary [1].  So the
>>> testsuite additions in this patch check both where bitmaps get
>>> allocated (the qemu-img info output), and, when NOT using 'qemu-img
>>> bitmap', that bitmaps are indeed visible through a backing chain.
>>
>> Well.  It is useful over NBD but I would doubt that it isn’t useful in
>> general.  For example, the QMP commands that refer to bitmaps always do
>> so through a node-name + bitmap-name combination, and they require that
>> the given bitmap is exactly on the given node.
>>
>> So I think this is a very much a case-by-case question.  (And in
>> practice, NBD seems to be the outlier, not qemu-img bitmap.)
>>
> 
> I'm happy to reword slightly to give that caveat.
> 
>>> [1] Full disclosure: prior to the recent commit 374eedd1c4 and
>>> friends, we were NOT able to see bitmaps through filters, which meant
>>> that we actually did not have nice clean semantics for uniformly being
>>> able to pick up bitmaps from anywhere in the backing chain (seen as a
>>> change in behavior between qemu 4.1 and 4.2 at commit 00e30f05de, when
>>> block-copy swapped from a one-off to a filter).  Which means libvirt
>>> was already coded to copy bitmaps around for the sake of older qemu,
>>> even though modern qemu no longer needs it.  Oh well.
>>>
>>> Fixes: http://bugzilla.redhat.com/1877209
>>> Reported-by: Eyal Shenitzky 
>>> Signed-off-by: Eric Blake 
>>> ---
>>>
>>> In v2:
>>> - also use BDRV_O_NO_BACKING on source [Max]
>>> - improved commit message [Max]
>>>
>>>   qemu-img.c | 11 ++--
>>>   tests/qemu-iotests/291 | 12 
>>>   tests/qemu-iotests/291.out | 56 ++
>>>   3 files changed, 76 insertions(+), 3 deletions(-)
>>
>> The code looks good to me, but I wonder whether in the commit message it
>> should be noted that we don’t want to let bitmaps from deeper nodes
>> shine through by default everywhere, but just in specific cases where
>> that’s useful (i.e. only NBD so far AFAIA).
> 
> So is this a Reviewed-by?  I'm happy to queue it through my bitmaps
> tree, if so.

It wasn’t meant as an R-b, because my R-b depends on how the commit
message addresses the question of when exactly bitmaps from the backing
chain should be visible on the top image.  Whether qemu-img bitmap is an
exception, or whether this is really a case-by-case question.

(I wanted to know whether you agree on including it.  Normally, I’m
happy to give an R-b on the basis of “with that done”, but in this case
I wasn’t entirely sure whether my request was reasonable, but I also
felt that in case it was, it wasn’t optional, given that you do have an
entire paragraph in the commit message dedicated to why the backing
image’s bitmap is visible on an image exported over NBD.)

I have to say I would like to see how you do phrase it in the end, but
given that you do agree on including it, I can give a

Reviewed-by: Max Reitz 

Now.



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 21/29] block/export: Add BLOCK_EXPORT_DELETED event

2020-09-16 Thread Max Reitz
On 07.09.20 20:20, Kevin Wolf wrote:
> Clients may want to know when an export has finally disappeard
> (block-export-del returns earlier than that in the general case), so add
> a QAPI event for it.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json | 12 
>  block/export/export.c  |  2 ++
>  tests/qemu-iotests/140.out |  1 +
>  tests/qemu-iotests/223.out |  4 
>  4 files changed, 19 insertions(+)
> 
> diff --git a/qapi/block-export.json b/qapi/block-export.json
> index 77a6b595e8..dac3250f08 100644
> --- a/qapi/block-export.json
> +++ b/qapi/block-export.json
> @@ -233,3 +233,15 @@
>  ##
>  { 'command': 'block-export-del',
>'data': { 'id': 'str', '*mode': 'BlockExportRemoveMode' } }
> +
> +##
> +# @BLOCK_EXPORT_DELETED:
> +#
> +# Emitted when a block export is removed and it's id can be reused.
> +#
> +# @id: Block export id.
> +#
> +# Since: 5.2
> +##
> +{ 'event': 'BLOCK_EXPORT_DELETED',
> +  'data': { 'id': 'str' } }
> diff --git a/block/export/export.c b/block/export/export.c
> index 4af3b69186..ae7879e6a9 100644
> --- a/block/export/export.c
> +++ b/block/export/export.c
> @@ -19,6 +19,7 @@
>  #include "block/nbd.h"
>  #include "qapi/error.h"
>  #include "qapi/qapi-commands-block-export.h"
> +#include "qapi/qapi-events-block-export.h"
>  #include "qemu/id.h"
>  
>  static const BlockExportDriver *blk_exp_drivers[] = {
> @@ -113,6 +114,7 @@ static void blk_exp_delete_bh(void *opaque)
>  assert(exp->refcount == 0);
>  QLIST_REMOVE(exp, next);
>  exp->drv->delete(exp);
> +qapi_event_send_block_export_deleted(exp->id);
>  g_free(exp->id);
>  g_free(exp);
>  
> diff --git a/tests/qemu-iotests/140.out b/tests/qemu-iotests/140.out
> index 86b985da75..f1db409b26 100644
> --- a/tests/qemu-iotests/140.out
> +++ b/tests/qemu-iotests/140.out
> @@ -15,6 +15,7 @@ read 65536/65536 bytes at offset 0
>  qemu-io: can't open device nbd+unix:///drv?socket=SOCK_DIR/nbd: Requested 
> export not available
>  server reported: export 'drv' not present
>  { 'execute': 'quit' }
> +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "BLOCK_EXPORT_DELETED", "data": {"id": "drv"}}
>  {"return": {}}
>  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
>  *** done

With -qed, this event appears immediately when the drive is ejected, so
this test now fails (for qed).

I’m not sure whether there’s much of a point in fixing it or just
marking qed as unsupported.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 16/29] block/export: Allocate BlockExport in blk_exp_add()

2020-09-16 Thread Max Reitz
On 07.09.20 20:19, Kevin Wolf wrote:
> Instead of letting the driver allocate and return the BlockExport
> object, allocate it already in blk_exp_add() and pass it. This allows us
> to initialise the generic part before calling into the driver so that
> the driver can just use these values instead of having to parse the
> options a second time.
> 
> For symmetry, move freeing the BlockExport to blk_exp_unref().
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Max Reitz 
> ---
>  include/block/export.h |  8 +++-
>  include/block/nbd.h| 11 ++-
>  block/export/export.c  | 18 +-
>  blockdev-nbd.c | 26 ++
>  nbd/server.c   | 30 +-
>  5 files changed, 57 insertions(+), 36 deletions(-)

[...]

> diff --git a/block/export/export.c b/block/export/export.c
> index 8635318ef1..03ff155f97 100644
> --- a/block/export/export.c
> +++ b/block/export/export.c
> @@ -39,6 +39,8 @@ static const BlockExportDriver 
> *blk_exp_find_driver(BlockExportType type)
>  BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
>  {
>  const BlockExportDriver *drv;
> +BlockExport *exp;
> +int ret;
>  
>  drv = blk_exp_find_driver(export->type);
>  if (!drv) {
> @@ -46,7 +48,20 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error 
> **errp)
>  return NULL;
>  }
>  
> -return drv->create(export, errp);
> +assert(drv->instance_size >= sizeof(BlockExport));
> +exp = g_malloc0(drv->instance_size);
> +*exp = (BlockExport) {
> +.drv= _exp_nbd,

Only noticed now when trying to base my FUSE stuff on this series:
s/blk_exp_nbd/drv/.

(I’d like to say “obviously”, but, well.  Evidently not obvious enough
for me.)

Max



signature.asc
Description: OpenPGP digital signature


[PULL 15/22] qcow2: Handle QCowL2Meta on error in preallocate_co()

2020-09-15 Thread Max Reitz
From: Alberto Garcia 

If qcow2_alloc_cluster_offset() or qcow2_alloc_cluster_link_l2() fail
then this function simply returns the error code, potentially leaking
the QCowL2Meta structure and leaving stale items in s->cluster_allocs.

A second problem is that this function calls qcow2_free_any_clusters()
on failure but passing a host cluster offset instead of an L2 entry.
Luckily for normal uncompressed clusters a raw offset also works like
a valid L2 entry so it works just the same, but we should be using
qcow2_free_clusters() instead.

This patch fixes both problems by using qcow2_handle_l2meta().

Signed-off-by: Alberto Garcia 
Message-Id: 

Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 block/qcow2.c | 40 +---
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3e8114dcf8..d241fb734c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2101,7 +2101,6 @@ static coroutine_fn int 
qcow2_handle_l2meta(BlockDriverState *bs,
 QCowL2Meta *next;
 
 if (link_l2) {
-assert(!l2meta->prealloc);
 ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
 if (ret) {
 goto out;
@@ -3123,7 +3122,7 @@ static int coroutine_fn preallocate_co(BlockDriverState 
*bs, uint64_t offset,
 int64_t file_length;
 unsigned int cur_bytes;
 int ret;
-QCowL2Meta *meta;
+QCowL2Meta *meta = NULL, *m;
 
 assert(offset <= new_length);
 bytes = new_length - offset;
@@ -3134,27 +3133,17 @@ static int coroutine_fn preallocate_co(BlockDriverState 
*bs, uint64_t offset,
  _offset, );
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Allocating clusters failed");
-return ret;
+goto out;
 }
 
-while (meta) {
-QCowL2Meta *next = meta->next;
-meta->prealloc = true;
-
-ret = qcow2_alloc_cluster_link_l2(bs, meta);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "Mapping clusters failed");
-qcow2_free_any_clusters(bs, meta->alloc_offset,
-meta->nb_clusters, 
QCOW2_DISCARD_NEVER);
-return ret;
-}
-
-/* There are no dependent requests, but we need to remove our
- * request from the list of in-flight requests */
-QLIST_REMOVE(meta, next_in_flight);
+for (m = meta; m != NULL; m = m->next) {
+m->prealloc = true;
+}
 
-g_free(meta);
-meta = next;
+ret = qcow2_handle_l2meta(bs, , true);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Mapping clusters failed");
+goto out;
 }
 
 /* TODO Preallocate data if requested */
@@ -3171,7 +3160,8 @@ static int coroutine_fn preallocate_co(BlockDriverState 
*bs, uint64_t offset,
 file_length = bdrv_getlength(s->data_file->bs);
 if (file_length < 0) {
 error_setg_errno(errp, -file_length, "Could not get file size");
-return file_length;
+ret = file_length;
+goto out;
 }
 
 if (host_offset + cur_bytes > file_length) {
@@ -3181,11 +3171,15 @@ static int coroutine_fn preallocate_co(BlockDriverState 
*bs, uint64_t offset,
 ret = bdrv_co_truncate(s->data_file, host_offset + cur_bytes, false,
mode, 0, errp);
 if (ret < 0) {
-return ret;
+goto out;
 }
 }
 
-return 0;
+ret = 0;
+
+out:
+qcow2_handle_l2meta(bs, , false);
+return ret;
 }
 
 /* qcow2_refcount_metadata_size:
-- 
2.26.2




[PULL 22/22] block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[]

2020-09-15 Thread Max Reitz
From: Stefano Garzarella 

Commit 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
introduced namespace support for RBD, but we forgot to add the
new 'namespace' options to qemu_rbd_strong_runtime_opts[].

The 'namespace' is used to identify the image, so it is a strong
option since it can changes the data of a BDS.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1821528
Fixes: 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
Cc: Florian Florensa 
Signed-off-by: Stefano Garzarella 
Message-Id: <20200914190553.74871-1-sgarz...@redhat.com>
Reviewed-by: Jason Dillaman 
Signed-off-by: Max Reitz 
---
 block/rbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/rbd.c b/block/rbd.c
index 171c67e3a0..9bd2bce716 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1247,6 +1247,7 @@ static QemuOptsList qemu_rbd_create_opts = {
 
 static const char *const qemu_rbd_strong_runtime_opts[] = {
 "pool",
+"namespace",
 "image",
 "conf",
 "snapshot",
-- 
2.26.2




[PULL 12/22] qemu-img: Explicit number replaced by a constant

2020-09-15 Thread Max Reitz
From: Yi Li 

Signed-off-by: Yi Li 
Message-Id: <20200819013607.32280-1-y...@winhong.com>
Reviewed-by: Alberto Garcia 
Reviewed-by: Stefano Garzarella 
Signed-off-by: Max Reitz 
---
 qemu-img.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 37365d06fa..1d8c5cd778 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1201,10 +1201,10 @@ static int is_allocated_sectors(const uint8_t *buf, int 
n, int *pnum,
 *pnum = 0;
 return 0;
 }
-is_zero = buffer_is_zero(buf, 512);
+is_zero = buffer_is_zero(buf, BDRV_SECTOR_SIZE);
 for(i = 1; i < n; i++) {
-buf += 512;
-if (is_zero != buffer_is_zero(buf, 512)) {
+buf += BDRV_SECTOR_SIZE;
+if (is_zero != buffer_is_zero(buf, BDRV_SECTOR_SIZE)) {
 break;
 }
 }
@@ -2513,8 +2513,8 @@ static int img_convert(int argc, char **argv)
 }
 }
 
-qemu_opt_set_number(opts, BLOCK_OPT_SIZE, s.total_sectors * 512,
-_abort);
+qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
+s.total_sectors * BDRV_SECTOR_SIZE, _abort);
 ret = add_old_style_options(out_fmt, opts, out_baseimg, NULL);
 if (ret < 0) {
 goto out;
-- 
2.26.2




[PULL 18/22] block/rbd: remove runtime_opts

2020-09-15 Thread Max Reitz
From: John Snow 

This saw its last use in 4bfb274165ba.

Signed-off-by: John Snow 
Message-Id: <20200806211345.2925343-2-js...@redhat.com>
Signed-off-by: Max Reitz 
---
 block/rbd.c | 42 --
 1 file changed, 42 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 688074c64b..171c67e3a0 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -341,48 +341,6 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
 }
 }
 
-static QemuOptsList runtime_opts = {
-.name = "rbd",
-.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
-.desc = {
-{
-.name = "pool",
-.type = QEMU_OPT_STRING,
-.help = "Rados pool name",
-},
-{
-.name = "namespace",
-.type = QEMU_OPT_STRING,
-.help = "Rados namespace name in the pool",
-},
-{
-.name = "image",
-.type = QEMU_OPT_STRING,
-.help = "Image name in the pool",
-},
-{
-.name = "conf",
-.type = QEMU_OPT_STRING,
-.help = "Rados config file location",
-},
-{
-.name = "snapshot",
-.type = QEMU_OPT_STRING,
-.help = "Ceph snapshot name",
-},
-{
-/* maps to 'id' in rados_create() */
-.name = "user",
-.type = QEMU_OPT_STRING,
-.help = "Rados id name",
-},
-/*
- * server.* extracted manually, see qemu_rbd_mon_host()
- */
-{ /* end of list */ }
-},
-};
-
 /* FIXME Deprecate and remove keypairs or make it available in QMP. */
 static int qemu_rbd_do_create(BlockdevCreateOptions *options,
   const char *keypairs, const char 
*password_secret,
-- 
2.26.2




[PULL 16/22] qcow2: Make qcow2_free_any_clusters() free only one cluster

2020-09-15 Thread Max Reitz
From: Alberto Garcia 

This function takes an L2 entry and a number of clusters to free.
Although in principle it can free any type of cluster (using the L2
entry to determine its type) in practice the API is broken because
compressed clusters have a variable size and there is no way to free
more than one without having the L2 entry of each one of them.

The good news all callers are passing nb_clusters=1 so we can simply
get rid of that parameter.

Signed-off-by: Alberto Garcia 
Message-Id: 
<77cea0f4616f921d37e971b3c5b18a2faa24b173.1599573989.git.be...@igalia.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 block/qcow2.h  | 4 ++--
 block/qcow2-cluster.c  | 6 +++---
 block/qcow2-refcount.c | 8 
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 83cfa0c391..b73a4cf1f8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -861,8 +861,8 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size);
 void qcow2_free_clusters(BlockDriverState *bs,
   int64_t offset, int64_t size,
   enum qcow2_discard_type type);
-void qcow2_free_any_clusters(BlockDriverState *bs, uint64_t l2_entry,
- int nb_clusters, enum qcow2_discard_type type);
+void qcow2_free_any_cluster(BlockDriverState *bs, uint64_t l2_entry,
+enum qcow2_discard_type type);
 
 int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 int64_t l1_table_offset, int l1_size, int addend);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ce1260e746..1a67b2d928 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1096,7 +1096,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
  */
 if (!m->keep_old_clusters && j != 0) {
 for (i = 0; i < j; i++) {
-qcow2_free_any_clusters(bs, old_cluster[i], 1, 
QCOW2_DISCARD_NEVER);
+qcow2_free_any_cluster(bs, old_cluster[i], QCOW2_DISCARD_NEVER);
 }
 }
 
@@ -1913,7 +1913,7 @@ static int discard_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
 }
 /* Then decrease the refcount */
-qcow2_free_any_clusters(bs, old_l2_entry, 1, type);
+qcow2_free_any_cluster(bs, old_l2_entry, type);
 }
 
 qcow2_cache_put(s->l2_table_cache, (void **) _slice);
@@ -2005,7 +2005,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, 
uint64_t offset,
 
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
 if (unmap) {
-qcow2_free_any_clusters(bs, old_l2_entry, 1, 
QCOW2_DISCARD_REQUEST);
+qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
 }
 set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
 if (has_subclusters(s)) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 51e9778ed8..8e649b008e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1157,8 +1157,8 @@ void qcow2_free_clusters(BlockDriverState *bs,
  * Free a cluster using its L2 entry (handles clusters of all types, e.g.
  * normal cluster, compressed cluster, etc.)
  */
-void qcow2_free_any_clusters(BlockDriverState *bs, uint64_t l2_entry,
- int nb_clusters, enum qcow2_discard_type type)
+void qcow2_free_any_cluster(BlockDriverState *bs, uint64_t l2_entry,
+enum qcow2_discard_type type)
 {
 BDRVQcow2State *s = bs->opaque;
 QCow2ClusterType ctype = qcow2_get_cluster_type(bs, l2_entry);
@@ -1169,7 +1169,7 @@ void qcow2_free_any_clusters(BlockDriverState *bs, 
uint64_t l2_entry,
  ctype == QCOW2_CLUSTER_ZERO_ALLOC))
 {
 bdrv_pdiscard(s->data_file, l2_entry & L2E_OFFSET_MASK,
-  nb_clusters << s->cluster_bits);
+  s->cluster_size);
 }
 return;
 }
@@ -1192,7 +1192,7 @@ void qcow2_free_any_clusters(BlockDriverState *bs, 
uint64_t l2_entry,
 l2_entry & L2E_OFFSET_MASK);
 } else {
 qcow2_free_clusters(bs, l2_entry & L2E_OFFSET_MASK,
-nb_clusters << s->cluster_bits, type);
+s->cluster_size, type);
 }
 break;
 case QCOW2_CLUSTER_ZERO_PLAIN:
-- 
2.26.2




[PULL 11/22] qcow2: Rewrite the documentation of qcow2_alloc_cluster_offset()

2020-09-15 Thread Max Reitz
From: Alberto Garcia 

The current text corresponds to an earlier, simpler version of this
function and it does not explain how it works now.

Signed-off-by: Alberto Garcia 
Message-Id: 

Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f65ccc5840..ce1260e746 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1714,18 +1714,22 @@ out:
 }
 
 /*
- * alloc_cluster_offset
+ * For a given area on the virtual disk defined by @offset and @bytes,
+ * find the corresponding area on the qcow2 image, allocating new
+ * clusters (or subclusters) if necessary. The result can span a
+ * combination of allocated and previously unallocated clusters.
  *
- * For a given offset on the virtual disk, find the cluster offset in qcow2
- * file. If the offset is not found, allocate a new cluster.
+ * On return, @host_offset is set to the beginning of the requested
+ * area. This area is guaranteed to be contiguous on the qcow2 file
+ * but it can be smaller than initially requested. In this case @bytes
+ * is updated with the actual size.
  *
- * If the cluster was already allocated, m->nb_clusters is set to 0 and
- * other fields in m are meaningless.
- *
- * If the cluster is newly allocated, m->nb_clusters is set to the number of
- * contiguous clusters that have been allocated. In this case, the other
- * fields of m are valid and contain information about the first allocated
- * cluster.
+ * If any clusters or subclusters were allocated then @m contains a
+ * list with the information of all the affected regions. Note that
+ * this can happen regardless of whether this function succeeds or
+ * not. The caller is responsible for updating the L2 metadata of the
+ * allocated clusters (on success) or freeing them (on failure), and
+ * for clearing the contents of @m afterwards in both cases.
  *
  * If the request conflicts with another write request in flight, the coroutine
  * is queued and will be reentered when the dependency has completed.
-- 
2.26.2




[PULL 05/22] qemu-iotests: Simplify FilePath __init__

2020-09-15 Thread Max Reitz
From: Nir Soffer 

Use list comprehension instead of append loop.

Signed-off-by: Nir Soffer 
Reviewed-by: Max Reitz 
Message-Id: <20200828232152.205833-6-nsof...@redhat.com>
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 1799338f8b..91e4a57126 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -470,9 +470,8 @@ class FilePath:
 
 """
 def __init__(self, *names, base_dir=test_dir):
-self.paths = []
-for name in names:
-self.paths.append(os.path.join(base_dir, file_pattern(name)))
+self.paths = [os.path.join(base_dir, file_pattern(name))
+  for name in names]
 
 def __enter__(self):
 if len(self.paths) == 1:
-- 
2.26.2




[PULL 20/22] qcow2: Make preallocate_co() resize the image to the correct size

2020-09-15 Thread Max Reitz
From: Alberto Garcia 

This function preallocates metadata structures and then extends the
image to its new size, but that new size calculation is wrong because
it doesn't take into account that the host_offset variable is always
cluster-aligned.

This problem can be reproduced with preallocation=metadata when the
original size is not cluster-aligned but the new size is. In this case
the final image size will be shorter than expected.

   qemu-img create -f qcow2 img.qcow2 31k
   qemu-img resize --preallocation=metadata img.qcow2 128k

Signed-off-by: Alberto Garcia 
Message-Id: 

Reviewed-by: Max Reitz 
[mreitz: Mark compat=0.10 unsupported for iotest 125]
Signed-off-by: Max Reitz 
---
 block/qcow2.c  |  1 +
 tests/qemu-iotests/125 | 44 ++
 tests/qemu-iotests/125.out | 28 ++--
 3 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 77c43ce178..1cb5daf39a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3135,6 +3135,7 @@ static int coroutine_fn preallocate_co(BlockDriverState 
*bs, uint64_t offset,
 error_setg_errno(errp, -ret, "Allocating clusters failed");
 goto out;
 }
+host_offset += offset_into_cluster(s, offset);
 
 for (m = meta; m != NULL; m = m->next) {
 m->prealloc = true;
diff --git a/tests/qemu-iotests/125 b/tests/qemu-iotests/125
index 7cb1c19730..5720e86dce 100755
--- a/tests/qemu-iotests/125
+++ b/tests/qemu-iotests/125
@@ -43,6 +43,10 @@ get_image_size_on_host()
 
 _supported_fmt qcow2
 _supported_proto file
+# Growing a file with a backing file (without preallocation=full or
+# =falloc) requires zeroing the newly added area, which is impossible
+# to do quickly for v2 images, and hence is unsupported.
+_unsupported_imgopts 'compat=0.10'
 
 if [ -z "$TEST_IMG_FILE" ]; then
 TEST_IMG_FILE=$TEST_IMG
@@ -168,24 +172,28 @@ done
 $QEMU_IMG create -f raw "$TEST_IMG.base" 128k | _filter_img_create
 $QEMU_IO -c 'write -q -P 1 0 128k' -f raw "$TEST_IMG.base"
 for orig_size in 31k 33k; do
-echo "--- Resizing image from $orig_size to 96k ---"
-_make_test_img -F raw -b "$TEST_IMG.base" -o cluster_size=64k "$orig_size"
-$QEMU_IMG resize -f "$IMGFMT" --preallocation=full "$TEST_IMG" 96k
-# The first part of the image should contain data from the backing file
-$QEMU_IO -c "read -q -P 1 0 ${orig_size}" "$TEST_IMG"
-# The resized part of the image should contain zeroes
-$QEMU_IO -c "read -q -P 0 ${orig_size} 63k" "$TEST_IMG"
-# If the image does not have an external data file we can also verify its
-# actual size. The resized image should have 7 clusters:
-# header, L1 table, L2 table, refcount table, refcount block, 2 data 
clusters
-if ! _get_data_file "$TEST_IMG" > /dev/null; then
-expected_file_length=$((65536 * 7))
-file_length=$(stat -c '%s' "$TEST_IMG_FILE")
-if [ "$file_length" != "$expected_file_length" ]; then
-echo "ERROR: file length $file_length (expected 
$expected_file_length)"
-fi
-fi
-echo
+for dst_size in 96k 128k; do
+for prealloc in metadata full; do
+echo "--- Resizing image from $orig_size to $dst_size 
(preallocation=$prealloc) ---"
+_make_test_img -F raw -b "$TEST_IMG.base" -o cluster_size=64k 
"$orig_size"
+$QEMU_IMG resize -f "$IMGFMT" --preallocation="$prealloc" 
"$TEST_IMG" "$dst_size"
+# The first part of the image should contain data from the backing 
file
+$QEMU_IO -c "read -q -P 1 0 ${orig_size}" "$TEST_IMG"
+# The resized part of the image should contain zeroes
+$QEMU_IO -c "read -q -P 0 ${orig_size} 63k" "$TEST_IMG"
+# If the image does not have an external data file we can also 
verify its
+# actual size. The resized image should have 7 clusters:
+# header, L1 table, L2 table, refcount table, refcount block, 2 
data clusters
+if ! _get_data_file "$TEST_IMG" > /dev/null; then
+expected_file_length=$((65536 * 7))
+file_length=$(stat -c '%s' "$TEST_IMG_FILE")
+if [ "$file_length" != "$expected_file_length" ]; then
+echo "ERROR: file length $file_length (expected 
$expected_file_length)"
+fi
+fi
+echo
+done
+done
 done
 
 # success, all done
diff --git a/tests/qemu-iotests/125.out b/tests/qemu-iotests/125.out
index 7f76f7af20..63a6e9e8a9 100644
--- a/tests/qemu-iotests/125.out
+++ b/tests/qemu-iote

[PULL 21/22] qcow2: Convert qcow2_alloc_cluster_offset() into qcow2_alloc_host_offset()

2020-09-15 Thread Max Reitz
From: Alberto Garcia 

qcow2_alloc_cluster_offset() takes an (unaligned) guest offset and
returns the (aligned) offset of the corresponding cluster in the qcow2
image.

In practice none of the callers need to know where the cluster starts
so this patch makes the function calculate and return the final host
offset directly. The function is also renamed accordingly.

See 388e581615 for a similar change to qcow2_get_cluster_offset().

Signed-off-by: Alberto Garcia 
Message-Id: 
<9bfef50ec9200d752413be4fc2aeb22a28378817.1599833007.git.be...@igalia.com>
Reviewed-by: Max Reitz 
Signed-off-by: Max Reitz 
---
 block/qcow2.h |  6 +++---
 block/qcow2-cluster.c | 14 ++
 block/qcow2.c | 36 +---
 3 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index b73a4cf1f8..b71e444fca 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -901,9 +901,9 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
   unsigned int *bytes, uint64_t *host_offset,
   QCow2SubclusterType *subcluster_type);
-int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
-   unsigned int *bytes, uint64_t *host_offset,
-   QCowL2Meta **m);
+int qcow2_alloc_host_offset(BlockDriverState *bs, uint64_t offset,
+unsigned int *bytes, uint64_t *host_offset,
+QCowL2Meta **m);
 int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
   uint64_t offset,
   int compressed_size,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 1a67b2d928..9acc6ce4ae 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1719,6 +1719,10 @@ out:
  * clusters (or subclusters) if necessary. The result can span a
  * combination of allocated and previously unallocated clusters.
  *
+ * Note that offset may not be cluster aligned. In this case, the returned
+ * *host_offset points to exact byte referenced by offset and therefore
+ * isn't cluster aligned as well.
+ *
  * On return, @host_offset is set to the beginning of the requested
  * area. This area is guaranteed to be contiguous on the qcow2 file
  * but it can be smaller than initially requested. In this case @bytes
@@ -1736,9 +1740,9 @@ out:
  *
  * Return 0 on success and -errno in error cases
  */
-int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
-   unsigned int *bytes, uint64_t *host_offset,
-   QCowL2Meta **m)
+int qcow2_alloc_host_offset(BlockDriverState *bs, uint64_t offset,
+unsigned int *bytes, uint64_t *host_offset,
+QCowL2Meta **m)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t start, remaining;
@@ -1759,7 +1763,7 @@ again:
 while (true) {
 
 if (*host_offset == INV_OFFSET && cluster_offset != INV_OFFSET) {
-*host_offset = start_of_cluster(s, cluster_offset);
+*host_offset = cluster_offset;
 }
 
 assert(remaining >= cur_bytes);
@@ -1842,6 +1846,8 @@ again:
 *bytes -= remaining;
 assert(*bytes > 0);
 assert(*host_offset != INV_OFFSET);
+assert(offset_into_cluster(s, *host_offset) ==
+   offset_into_cluster(s, offset));
 
 return 0;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index 1cb5daf39a..b05512718c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2559,7 +2559,7 @@ static coroutine_fn int qcow2_co_pwritev_part(
 int offset_in_cluster;
 int ret;
 unsigned int cur_bytes; /* number of sectors in current iteration */
-uint64_t cluster_offset;
+uint64_t host_offset;
 QCowL2Meta *l2meta = NULL;
 AioTaskPool *aio = NULL;
 
@@ -2580,16 +2580,13 @@ static coroutine_fn int qcow2_co_pwritev_part(
 
 qemu_co_mutex_lock(>lock);
 
-ret = qcow2_alloc_cluster_offset(bs, offset, _bytes,
- _offset, );
+ret = qcow2_alloc_host_offset(bs, offset, _bytes,
+  _offset, );
 if (ret < 0) {
 goto out_locked;
 }
 
-assert(offset_into_cluster(s, cluster_offset) == 0);
-
-ret = qcow2_pre_write_overlap_check(bs, 0,
-cluster_offset + offset_in_cluster,
+ret = qcow2_pre_write_overlap_check(bs, 0, host_offset,
 cur_bytes, true);
 if (ret < 0) {
 goto out_locked;
@@ -2601,7 +2598,7 @@ static coroutine_fn int qcow2_co_pwritev_part(
 aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
 }
 ret = qcow2_add_task(bs, aio

[PULL 06/22] block/quorum.c: stable children names

2020-09-15 Thread Max Reitz
From: Lukas Straub 

If we remove the child with the highest index from the quorum,
decrement s->next_child_index. This way we get stable children
names as long as we only remove the last child.

Signed-off-by: Lukas Straub 
Fixes: https://bugs.launchpad.net/bugs/1881231
Reviewed-by: Zhang Chen 
Reviewed-by: Alberto Garcia 
Message-Id: 
<5d5f930424c1c770754041aa8ad6421dc4e2b58e.1596536719.git.lukasstra...@web.de>
Signed-off-by: Max Reitz 
---
 block/quorum.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 6df9449fc2..e846a7e892 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -29,6 +29,8 @@
 
 #define HASH_LENGTH 32
 
+#define INDEXSTR_LEN 32
+
 #define QUORUM_OPT_VOTE_THRESHOLD "vote-threshold"
 #define QUORUM_OPT_BLKVERIFY  "blkverify"
 #define QUORUM_OPT_REWRITE"rewrite-corrupted"
@@ -970,9 +972,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 opened = g_new0(bool, s->num_children);
 
 for (i = 0; i < s->num_children; i++) {
-char indexstr[32];
-ret = snprintf(indexstr, 32, "children.%d", i);
-assert(ret < 32);
+char indexstr[INDEXSTR_LEN];
+ret = snprintf(indexstr, INDEXSTR_LEN, "children.%d", i);
+assert(ret < INDEXSTR_LEN);
 
 s->children[i] = bdrv_open_child(NULL, options, indexstr, bs,
  _of_bds, BDRV_CHILD_DATA, false,
@@ -1024,7 +1026,7 @@ static void quorum_add_child(BlockDriverState *bs, 
BlockDriverState *child_bs,
 {
 BDRVQuorumState *s = bs->opaque;
 BdrvChild *child;
-char indexstr[32];
+char indexstr[INDEXSTR_LEN];
 int ret;
 
 if (s->is_blkverify) {
@@ -1039,8 +1041,8 @@ static void quorum_add_child(BlockDriverState *bs, 
BlockDriverState *child_bs,
 return;
 }
 
-ret = snprintf(indexstr, 32, "children.%u", s->next_child_index);
-if (ret < 0 || ret >= 32) {
+ret = snprintf(indexstr, INDEXSTR_LEN, "children.%u", s->next_child_index);
+if (ret < 0 || ret >= INDEXSTR_LEN) {
 error_setg(errp, "cannot generate child name");
 return;
 }
@@ -1068,6 +1070,7 @@ static void quorum_del_child(BlockDriverState *bs, 
BdrvChild *child,
  Error **errp)
 {
 BDRVQuorumState *s = bs->opaque;
+char indexstr[INDEXSTR_LEN];
 int i;
 
 for (i = 0; i < s->num_children; i++) {
@@ -1089,6 +1092,11 @@ static void quorum_del_child(BlockDriverState *bs, 
BdrvChild *child,
 /* We know now that num_children > threshold, so blkverify must be false */
 assert(!s->is_blkverify);
 
+snprintf(indexstr, INDEXSTR_LEN, "children.%u", s->next_child_index - 1);
+if (!strncmp(child->name, indexstr, INDEXSTR_LEN)) {
+s->next_child_index--;
+}
+
 bdrv_drained_begin(bs);
 
 /* We can safely remove this child now */
-- 
2.26.2




[PULL 19/22] block/qcow: remove runtime opts

2020-09-15 Thread Max Reitz
From: John Snow 

Introduced by d85f4222b468,
These were seemingly never used at all.

Signed-off-by: John Snow 
Message-Id: <20200806211345.2925343-3-js...@redhat.com>
Signed-off-by: Max Reitz 
---
 block/qcow.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index e514a86fe5..f8919a44d1 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -105,15 +105,6 @@ static int qcow_probe(const uint8_t *buf, int buf_size, 
const char *filename)
 return 0;
 }
 
-static QemuOptsList qcow_runtime_opts = {
-.name = "qcow",
-.head = QTAILQ_HEAD_INITIALIZER(qcow_runtime_opts.head),
-.desc = {
-BLOCK_CRYPTO_OPT_DEF_QCOW_KEY_SECRET("encrypt."),
-{ /* end of list */ }
-},
-};
-
 static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
 {
-- 
2.26.2




[PULL 03/22] qemu-iotests: Support varargs syntax in FilePaths

2020-09-15 Thread Max Reitz
From: Nir Soffer 

Accept variable number of names instead of a sequence:

with FilePaths("a", "b", "c") as (a, b, c):

The disadvantage is that base_dir must be used as kwarg:

with FilePaths("a", "b", base_dir=soc_dir) as (sock1, sock2):

But this is more clear and calling optional argument as positional
arguments is bad idea anyway.

Signed-off-by: Nir Soffer 
Reviewed-by: Max Reitz 
Message-Id: <20200828232152.205833-4-nsof...@redhat.com>
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/194|  4 ++--
 tests/qemu-iotests/257| 10 --
 tests/qemu-iotests/iotests.py |  8 
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index da7c4265ec..08389f474e 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -26,8 +26,8 @@ iotests.script_initialize(supported_fmts=['qcow2', 'qed', 
'raw'],
 
 with iotests.FilePath('source.img') as source_img_path, \
  iotests.FilePath('dest.img') as dest_img_path, \
- iotests.FilePaths(['migration.sock', 'nbd.sock'], iotests.sock_dir) as \
- [migration_sock_path, nbd_sock_path], \
+ iotests.FilePaths('migration.sock', 'nbd.sock', 
base_dir=iotests.sock_dir) \
+as (migration_sock_path, nbd_sock_path), \
  iotests.VM('source') as source_vm, \
  iotests.VM('dest') as dest_vm:
 
diff --git a/tests/qemu-iotests/257 b/tests/qemu-iotests/257
index e1e6077219..a9aa65bbe3 100755
--- a/tests/qemu-iotests/257
+++ b/tests/qemu-iotests/257
@@ -275,10 +275,9 @@ def test_bitmap_sync(bsync_mode, msync_mode='bitmap', 
failure=None):
 an incomplete backup. Testing limitations prevent
 testing competing writes.
 """
-with iotests.FilePaths(['img', 'bsync1', 'bsync2',
-'fbackup0', 'fbackup1', 'fbackup2']) as \
-(img_path, bsync1, bsync2,
- fbackup0, fbackup1, fbackup2), \
+with iotests.FilePaths(
+'img', 'bsync1', 'bsync2', 'fbackup0', 'fbackup1', 'fbackup2') as \
+(img_path, bsync1, bsync2, fbackup0, fbackup1, fbackup2), \
  iotests.VM() as vm:
 
 mode = "Mode {:s}; Bitmap Sync {:s}".format(msync_mode, bsync_mode)
@@ -441,8 +440,7 @@ def test_backup_api():
 """
 Test malformed and prohibited invocations of the backup API.
 """
-with iotests.FilePaths(['img', 'bsync1']) as \
- (img_path, backup_path), \
+with iotests.FilePaths('img', 'bsync1') as (img_path, backup_path), \
  iotests.VM() as vm:
 
 log("\n=== API failure tests ===\n")
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index bbe63a6da0..635ec99431 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -455,7 +455,7 @@ class FilePaths:
 
 Example usage:
 
-with FilePaths(['a.img', 'b.img']) as (img_a, img_b):
+with FilePaths('a.img', 'b.img') as (img_a, img_b):
 # Use img_a and img_b here...
 
 # a.img and b.img are automatically removed here.
@@ -463,10 +463,10 @@ class FilePaths:
 By default images are created in iotests.test_dir. To create sockets use
 iotests.sock_dir:
 
-   with FilePaths(['a.sock'], base_dir=iotests.sock_dir) as (sock,):
+   with FilePaths('a.sock', base_dir=iotests.sock_dir) as (sock,):
 
 """
-def __init__(self, names, base_dir=test_dir):
+def __init__(self, *names, base_dir=test_dir):
 self.paths = []
 for name in names:
 self.paths.append(os.path.join(base_dir, file_pattern(name)))
@@ -487,7 +487,7 @@ class FilePath(FilePaths):
 FilePath is a specialization of FilePaths that takes a single filename.
 """
 def __init__(self, name, base_dir=test_dir):
-super(FilePath, self).__init__([name], base_dir)
+super(FilePath, self).__init__(name, base_dir=base_dir)
 
 def __enter__(self):
 return self.paths[0]
-- 
2.26.2




[PULL 17/22] qcow2: Return the original error code in qcow2_co_pwrite_zeroes()

2020-09-15 Thread Max Reitz
From: Alberto Garcia 

This function checks the current status of a (sub)cluster in order to
see if an unaligned 'write zeroes' request can be done efficiently by
simply updating the L2 metadata and without having to write actual
zeroes to disk.

If the situation does not allow using the fast path then the function
returns -ENOTSUP and the caller falls back to writing zeroes.

If can happen however that the aforementioned check returns an actual
error code so in this case we should pass it to the caller.

Signed-off-by: Alberto Garcia 
Message-Id: <20200909123739.719-1-be...@igalia.com>
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 block/qcow2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d241fb734c..77c43ce178 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3907,7 +3907,7 @@ static coroutine_fn int 
qcow2_co_pwrite_zeroes(BlockDriverState *bs,
  type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
  type != QCOW2_SUBCLUSTER_ZERO_ALLOC)) {
 qemu_co_mutex_unlock(>lock);
-return -ENOTSUP;
+return ret < 0 ? ret : -ENOTSUP;
 }
 } else {
 qemu_co_mutex_lock(>lock);
-- 
2.26.2




[PULL 07/22] qemu-img: avoid unaligned read requests during convert

2020-09-15 Thread Max Reitz
From: Peter Lieven 

in case of large continous areas that share the same allocation status
it happens that the value of s->sector_next_status is unaligned to the
cluster size or even request alignment of the source. Avoid this by
stripping down the s->sector_next_status position to cluster boundaries.

Signed-off-by: Peter Lieven 
Message-Id: <20200901125129.6398-1...@kamp.de>
[mreitz: Disable vhdx for 251]
Signed-off-by: Max Reitz 
---
 qemu-img.c | 22 ++
 tests/qemu-iotests/251 |  7 +--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index d0b1c97562..37365d06fa 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1666,6 +1666,7 @@ enum ImgConvertBlockStatus {
 typedef struct ImgConvertState {
 BlockBackend **src;
 int64_t *src_sectors;
+int *src_alignment;
 int src_num;
 int64_t total_sectors;
 int64_t allocated_sectors;
@@ -1732,6 +1733,7 @@ static int convert_iteration_sectors(ImgConvertState *s, 
int64_t sector_num)
 if (s->sector_next_status <= sector_num) {
 uint64_t offset = (sector_num - src_cur_offset) * BDRV_SECTOR_SIZE;
 int64_t count;
+int tail;
 BlockDriverState *src_bs = blk_bs(s->src[src_cur]);
 BlockDriverState *base;
 
@@ -1772,6 +1774,16 @@ static int convert_iteration_sectors(ImgConvertState *s, 
int64_t sector_num)
 
 n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
 
+/*
+ * Avoid that s->sector_next_status becomes unaligned to the source
+ * request alignment and/or cluster size to avoid unnecessary read
+ * cycles.
+ */
+tail = (sector_num - src_cur_offset + n) % s->src_alignment[src_cur];
+if (n > tail) {
+n -= tail;
+}
+
 if (ret & BDRV_BLOCK_ZERO) {
 s->status = post_backing_zero ? BLK_BACKING_FILE : BLK_ZERO;
 } else if (ret & BDRV_BLOCK_DATA) {
@@ -2410,8 +2422,10 @@ static int img_convert(int argc, char **argv)
 
 s.src = g_new0(BlockBackend *, s.src_num);
 s.src_sectors = g_new(int64_t, s.src_num);
+s.src_alignment = g_new(int, s.src_num);
 
 for (bs_i = 0; bs_i < s.src_num; bs_i++) {
+BlockDriverState *src_bs;
 s.src[bs_i] = img_open(image_opts, argv[optind + bs_i],
fmt, src_flags, src_writethrough, s.quiet,
force_share);
@@ -2426,6 +2440,13 @@ static int img_convert(int argc, char **argv)
 ret = -1;
 goto out;
 }
+src_bs = blk_bs(s.src[bs_i]);
+s.src_alignment[bs_i] = DIV_ROUND_UP(src_bs->bl.request_alignment,
+ BDRV_SECTOR_SIZE);
+if (!bdrv_get_info(src_bs, )) {
+s.src_alignment[bs_i] = MAX(s.src_alignment[bs_i],
+bdi.cluster_size / BDRV_SECTOR_SIZE);
+}
 s.total_sectors += s.src_sectors[bs_i];
 }
 
@@ -2708,6 +2729,7 @@ out:
 g_free(s.src);
 }
 g_free(s.src_sectors);
+g_free(s.src_alignment);
 fail_getopt:
 g_free(options);
 
diff --git a/tests/qemu-iotests/251 b/tests/qemu-iotests/251
index 7918ba3559..294773bdc1 100755
--- a/tests/qemu-iotests/251
+++ b/tests/qemu-iotests/251
@@ -46,8 +46,11 @@ if [ "$IMGOPTSSYNTAX" = "true" ]; then
 # We use json:{} filenames here, so we cannot work with additional options.
 _unsupported_fmt $IMGFMT
 else
-# With VDI, the output is ordered differently.  Just disable it.
-_unsupported_fmt vdi
+# - With VDI, the output is ordered differently.  Just disable it.
+# - VHDX has large clusters; because qemu-img convert tries to
+#   align the requests to the cluster size, the output is ordered
+#   differently, so disable it, too.
+_unsupported_fmt vdi vhdx
 fi
 
 
-- 
2.26.2




[PULL 08/22] qcow2: Use macros for the L1, refcount and bitmap table entry sizes

2020-09-15 Thread Max Reitz
From: Alberto Garcia 

This patch replaces instances of sizeof(uint64_t) in the qcow2 driver
with macros that indicate what those sizes are actually referring to.

Signed-off-by: Alberto Garcia 
Message-Id: <20200828110828.13833-1-be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2.h  |  6 +++
 block/qcow2-bitmap.c   | 11 --
 block/qcow2-cluster.c  | 24 ++--
 block/qcow2-refcount.c | 89 ++
 block/qcow2-snapshot.c | 20 +-
 block/qcow2.c  | 27 ++---
 6 files changed, 94 insertions(+), 83 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 065ec3df0b..83cfa0c391 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -99,6 +99,12 @@
 #define L2E_SIZE_NORMAL   (sizeof(uint64_t))
 #define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
 
+/* Size of L1 table entries */
+#define L1E_SIZE (sizeof(uint64_t))
+
+/* Size of reftable entries */
+#define REFTABLE_ENTRY_SIZE (sizeof(uint64_t))
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 8c34b2aef7..d7a31a8ddc 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -42,6 +42,9 @@
 #define BME_MIN_GRANULARITY_BITS 9
 #define BME_MAX_NAME_SIZE 1023
 
+/* Size of bitmap table entries */
+#define BME_TABLE_ENTRY_SIZE (sizeof(uint64_t))
+
 QEMU_BUILD_BUG_ON(BME_MAX_NAME_SIZE != BDRV_BITMAP_MAX_NAME_SIZE);
 
 #if BME_MAX_TABLE_SIZE * 8ULL > INT_MAX
@@ -232,7 +235,7 @@ static int bitmap_table_load(BlockDriverState *bs, 
Qcow2BitmapTable *tb,
 
 assert(tb->size <= BME_MAX_TABLE_SIZE);
 ret = bdrv_pread(bs->file, tb->offset,
- table, tb->size * sizeof(uint64_t));
+ table, tb->size * BME_TABLE_ENTRY_SIZE);
 if (ret < 0) {
 goto fail;
 }
@@ -265,7 +268,7 @@ static int free_bitmap_clusters(BlockDriverState *bs, 
Qcow2BitmapTable *tb)
 }
 
 clear_bitmap_table(bs, bitmap_table, tb->size);
-qcow2_free_clusters(bs, tb->offset, tb->size * sizeof(uint64_t),
+qcow2_free_clusters(bs, tb->offset, tb->size * BME_TABLE_ENTRY_SIZE,
 QCOW2_DISCARD_OTHER);
 g_free(bitmap_table);
 
@@ -690,7 +693,7 @@ int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
 ret = qcow2_inc_refcounts_imrt(bs, res,
refcount_table, refcount_table_size,
bm->table.offset,
-   bm->table.size * sizeof(uint64_t));
+   bm->table.size * BME_TABLE_ENTRY_SIZE);
 if (ret < 0) {
 goto out;
 }
@@ -1797,7 +1800,7 @@ uint64_t 
qcow2_get_persistent_dirty_bitmap_size(BlockDriverState *in_bs,
 /* Assume the entire bitmap is allocated */
 bitmaps_size += bmclusters * cluster_size;
 /* Also reserve space for the bitmap table entries */
-bitmaps_size += ROUND_UP(bmclusters * sizeof(uint64_t),
+bitmaps_size += ROUND_UP(bmclusters * BME_TABLE_ENTRY_SIZE,
  cluster_size);
 /* And space for contribution to bitmap directory size */
 bitmap_dir_size += calc_dir_entry_size(strlen(name), 0);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index fcdf7af8e6..4530e5e471 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -47,8 +47,8 @@ int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t 
exact_size)
 
 BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
 ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
-   new_l1_size * sizeof(uint64_t),
- (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+   new_l1_size * L1E_SIZE,
+ (s->l1_size - new_l1_size) * L1E_SIZE, 0);
 if (ret < 0) {
 goto fail;
 }
@@ -76,7 +76,7 @@ fail:
  * l1_table in memory to avoid possible image corruption.
  */
 memset(s->l1_table + new_l1_size, 0,
-   (s->l1_size - new_l1_size) * sizeof(uint64_t));
+   (s->l1_size - new_l1_size) * L1E_SIZE);
 return ret;
 }
 
@@ -96,7 +96,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t 
min_size,
 /* Do a sanity check on min_size before trying to calculate new_l1_size
  * (this prevents overflows during the while loop for the calculation of
  * new_l1_size) */
-if (min_size > INT_MAX / sizeof(uint64_t)) {
+if (min_size > INT_MAX / L1E_SIZE) {
 return -EFBIG;
 }
 
@@ -114,7 +114,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t 
min_size,
 }
 
 QEMU_BUILD_BUG_ON(QCOW_MAX_L1_SIZE > INT_MAX);
-if (new_l1_size > QCOW_MAX_L1_SIZE / sizeof(uint64_t)) {
+if (new_l

[PULL 04/22] qemu-iotests: Merge FilePaths and FilePath

2020-09-15 Thread Max Reitz
From: Nir Soffer 

FilePath creates now one temporary file:

with FilePath("a") as a:

Or more:

with FilePath("a", "b", "c") as (a, b, c):

This is also the behavior of the file_path() helper, used by some of the
tests. Now we have only 2 helpers for creating temporary files instead
of 3.

Signed-off-by: Nir Soffer 
Reviewed-by: Max Reitz 
Message-Id: <20200828232152.205833-5-nsof...@redhat.com>
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/194|  2 +-
 tests/qemu-iotests/208|  2 +-
 tests/qemu-iotests/222|  2 +-
 tests/qemu-iotests/257|  4 ++--
 tests/qemu-iotests/iotests.py | 23 ++-
 5 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index 08389f474e..7a4863ab18 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -26,7 +26,7 @@ iotests.script_initialize(supported_fmts=['qcow2', 'qed', 
'raw'],
 
 with iotests.FilePath('source.img') as source_img_path, \
  iotests.FilePath('dest.img') as dest_img_path, \
- iotests.FilePaths('migration.sock', 'nbd.sock', 
base_dir=iotests.sock_dir) \
+ iotests.FilePath('migration.sock', 'nbd.sock', base_dir=iotests.sock_dir) 
\
 as (migration_sock_path, nbd_sock_path), \
  iotests.VM('source') as source_vm, \
  iotests.VM('dest') as dest_vm:
diff --git a/tests/qemu-iotests/208 b/tests/qemu-iotests/208
index 6cb642f821..54aa4be273 100755
--- a/tests/qemu-iotests/208
+++ b/tests/qemu-iotests/208
@@ -26,7 +26,7 @@ iotests.script_initialize(supported_fmts=['generic'])
 
 with iotests.FilePath('disk.img') as disk_img_path, \
  iotests.FilePath('disk-snapshot.img') as disk_snapshot_img_path, \
- iotests.FilePath('nbd.sock', iotests.sock_dir) as nbd_sock_path, \
+ iotests.FilePath('nbd.sock', base_dir=iotests.sock_dir) as nbd_sock_path, 
\
  iotests.VM() as vm:
 
 img_size = '10M'
diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/222
index 6602f6b4ba..14d67c875b 100755
--- a/tests/qemu-iotests/222
+++ b/tests/qemu-iotests/222
@@ -49,7 +49,7 @@ remainder = [("0xd5", "0x108000",  "32k"), # Right-end of 
partial-left [1]
 
 with iotests.FilePath('base.img') as base_img_path, \
  iotests.FilePath('fleece.img') as fleece_img_path, \
- iotests.FilePath('nbd.sock', iotests.sock_dir) as nbd_sock_path, \
+ iotests.FilePath('nbd.sock', base_dir=iotests.sock_dir) as nbd_sock_path, 
\
  iotests.VM() as vm:
 
 log('--- Setting up images ---')
diff --git a/tests/qemu-iotests/257 b/tests/qemu-iotests/257
index a9aa65bbe3..c80e06ae28 100755
--- a/tests/qemu-iotests/257
+++ b/tests/qemu-iotests/257
@@ -275,7 +275,7 @@ def test_bitmap_sync(bsync_mode, msync_mode='bitmap', 
failure=None):
 an incomplete backup. Testing limitations prevent
 testing competing writes.
 """
-with iotests.FilePaths(
+with iotests.FilePath(
 'img', 'bsync1', 'bsync2', 'fbackup0', 'fbackup1', 'fbackup2') as \
 (img_path, bsync1, bsync2, fbackup0, fbackup1, fbackup2), \
  iotests.VM() as vm:
@@ -440,7 +440,7 @@ def test_backup_api():
 """
 Test malformed and prohibited invocations of the backup API.
 """
-with iotests.FilePaths('img', 'bsync1') as (img_path, backup_path), \
+with iotests.FilePath('img', 'bsync1') as (img_path, backup_path), \
  iotests.VM() as vm:
 
 log("\n=== API failure tests ===\n")
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 635ec99431..1799338f8b 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -448,14 +448,14 @@ class Timeout:
 def file_pattern(name):
 return "{0}-{1}".format(os.getpid(), name)
 
-class FilePaths:
+class FilePath:
 """
 Context manager generating multiple file names. The generated files are
 removed when exiting the context.
 
 Example usage:
 
-with FilePaths('a.img', 'b.img') as (img_a, img_b):
+with FilePath('a.img', 'b.img') as (img_a, img_b):
 # Use img_a and img_b here...
 
 # a.img and b.img are automatically removed here.
@@ -463,7 +463,10 @@ class FilePaths:
 By default images are created in iotests.test_dir. To create sockets use
 iotests.sock_dir:
 
-   with FilePaths('a.sock', base_dir=iotests.sock_dir) as (sock,):
+   with FilePath('a.sock', base_dir=iotests.sock_dir) as sock:
+
+For convenience, calling with one argument yields a single file instead of
+a tuple with one item.
 
 """
 def __init__(self, *names, base_dir=test_dir):
@@ -472,7 +475,10 @@ class FilePaths:
 self.paths.append(os.path.join(base_dir, file_pattern(name)))
 
 def __enter__(self):
-return self.

[PULL 02/22] qemu-iotests: Fix FilePaths docstring

2020-09-15 Thread Max Reitz
From: Nir Soffer 

When this class was extracted from FilePath, the docstring was not
updated for generating multiple files, and the example usage was
referencing unrelated file.

While fixing the docstring, add example for creating sockets, which
should use iotests.sock_dir instead of the default base_dir.

Fixes: de263986b5dc
Signed-off-by: Nir Soffer 
Message-Id: <20200828232152.205833-3-nsof...@redhat.com>
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 36814daf84..bbe63a6da0 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -450,14 +450,21 @@ def file_pattern(name):
 
 class FilePaths:
 """
-FilePaths is an auto-generated filename that cleans itself up.
+Context manager generating multiple file names. The generated files are
+removed when exiting the context.
 
-Use this context manager to generate filenames and ensure that the file
-gets deleted::
+Example usage:
+
+with FilePaths(['a.img', 'b.img']) as (img_a, img_b):
+# Use img_a and img_b here...
+
+# a.img and b.img are automatically removed here.
+
+By default images are created in iotests.test_dir. To create sockets use
+iotests.sock_dir:
+
+   with FilePaths(['a.sock'], base_dir=iotests.sock_dir) as (sock,):
 
-with FilePaths(['test.img']) as img_path:
-qemu_img('create', img_path, '1G')
-# migration_sock_path is automatically deleted
 """
 def __init__(self, names, base_dir=test_dir):
 self.paths = []
-- 
2.26.2




[PULL 10/22] qcow2: Don't check nb_clusters when removing l2meta from the list

2020-09-15 Thread Max Reitz
From: Alberto Garcia 

In the past, when a new cluster was allocated the l2meta structure was
a variable in the stack so it was necessary to have a way to tell
whether it had been initialized and contained valid data or not. The
nb_clusters field was used for this purpose. Since commit f50f88b9fe
this is no longer the case, l2meta (nowadays a pointer to a list) is
only allocated when needed and nb_clusters is guaranteed to be > 0 so
this check is unnecessary.

Signed-off-by: Alberto Garcia 
Message-Id: 

Signed-off-by: Max Reitz 
---
 block/qcow2.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ef9a45e82f..3e8114dcf8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2111,9 +2111,7 @@ static coroutine_fn int 
qcow2_handle_l2meta(BlockDriverState *bs,
 }
 
 /* Take the request off the list of running requests */
-if (l2meta->nb_clusters != 0) {
-QLIST_REMOVE(l2meta, next_in_flight);
-}
+QLIST_REMOVE(l2meta, next_in_flight);
 
 qemu_co_queue_restart_all(>dependent_requests);
 
-- 
2.26.2




[PULL 01/22] qemu-iotests: Fix FilePaths cleanup

2020-09-15 Thread Max Reitz
From: Nir Soffer 

If os.remove() fails to remove one of the paths, for example if the file
was removed by the test, the cleanup loop would exit silently, without
removing the rest of the files.

Fixes: de263986b5dc
Signed-off-by: Nir Soffer 
Reviewed-by: Max Reitz 
Message-Id: <20200828232152.205833-2-nsof...@redhat.com>
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 64ccaf9061..36814daf84 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -468,11 +468,11 @@ class FilePaths:
 return self.paths
 
 def __exit__(self, exc_type, exc_val, exc_tb):
-try:
-for path in self.paths:
+for path in self.paths:
+try:
 os.remove(path)
-except OSError:
-pass
+except OSError:
+pass
 return False
 
 class FilePath(FilePaths):
-- 
2.26.2




[PULL 13/22] iotests: Skip test_stream_parallel in test 030 when doing "make check"

2020-09-15 Thread Max Reitz
From: Thomas Huth 

The test_stream_parallel test still occasionally fails in the CI.
Thus let's disable it during "make check" for now so that it does
not cause trouble during merge tests. We can enable it again once
the problem has been resolved.

Signed-off-by: Thomas Huth 
Message-Id: <20200907113824.134788-1-th...@redhat.com>
Signed-off-by: Max Reitz 
---
 tests/check-block.sh   | 3 +++
 tests/qemu-iotests/030 | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/tests/check-block.sh b/tests/check-block.sh
index 8e29c868e5..a5a69060e1 100755
--- a/tests/check-block.sh
+++ b/tests/check-block.sh
@@ -55,6 +55,9 @@ fi
 
 cd tests/qemu-iotests
 
+# QEMU_CHECK_BLOCK_AUTO is used to disable some unstable sub-tests
+export QEMU_CHECK_BLOCK_AUTO=1
+
 ret=0
 for fmt in $format_list ; do
 ./check -makecheck -$fmt $group || ret=1
diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 31c028306b..dcb4b5d6a6 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -21,6 +21,7 @@
 import time
 import os
 import iotests
+import unittest
 from iotests import qemu_img, qemu_io
 
 backing_img = os.path.join(iotests.test_dir, 'backing.img')
@@ -228,6 +229,7 @@ class TestParallelOps(iotests.QMPTestCase):
 
 # Test that it's possible to run several block-stream operations
 # in parallel in the same snapshot chain
+@unittest.skipIf(os.environ.get('QEMU_CHECK_BLOCK_AUTO'), 'disabled in CI')
 def test_stream_parallel(self):
 self.assert_no_active_block_jobs()
 
-- 
2.26.2




[PULL 14/22] block/vhdx: Support vhdx image only with 512 bytes logical sector size

2020-09-15 Thread Max Reitz
From: Swapnil Ingle 

block/vhdx uses qemu block layer where sector size is always 512 bytes.
This may have issues  with 4K logical sector sized vhdx image.

For e.g qemu-img convert on such images fails with following assert:

$qemu-img convert -f vhdx -O raw 4KTest1.vhdx test.raw
qemu-img: util/iov.c:388: qiov_slice: Assertion `offset + len <=
qiov->size' failed.
Aborted

This patch adds an check to return ENOTSUP for vhdx images which
have logical sector size other than 512 bytes.

Signed-off-by: Swapnil Ingle 
Message-Id: <1596794594-44531-1-git-send-email-swapnil.in...@nutanix.com>
Signed-off-by: Max Reitz 
---
 block/vhdx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 791eb90263..356ec4c455 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -816,9 +816,9 @@ static int vhdx_parse_metadata(BlockDriverState *bs, 
BDRVVHDXState *s)
 goto exit;
 }
 
-/* only 2 supported sector sizes */
-if (s->logical_sector_size != 512 && s->logical_sector_size != 4096) {
-ret = -EINVAL;
+/* Currently we only support 512 */
+if (s->logical_sector_size != 512) {
+ret = -ENOTSUP;
 goto exit;
 }
 
-- 
2.26.2




[PULL 09/22] qcow2: Fix removal of list members from BDRVQcow2State.cluster_allocs

2020-09-15 Thread Max Reitz
From: Alberto Garcia 

When a write request needs to allocate new clusters (or change the L2
bitmap of existing ones) a QCowL2Meta structure is created so the L2
metadata can be later updated and any copy-on-write can be performed
if necessary.

A write request can span a region consisting of an arbitrary
combination of previously unallocated and allocated clusters, and if
the unallocated ones can be put contiguous to the existing ones then
QEMU will do so in order to minimize the number of write operations.

In practice this means that a write request has not just one but a
number of QCowL2Meta structures. All of them are added to the
cluster_allocs list that is stored in BDRVQcow2State and is used to
detect overlapping requests. After the write request finishes all its
associated QCowL2Meta are removed from that list. calculate_l2_meta()
takes care of creating and putting those structures in the list, and
qcow2_handle_l2meta() takes care of removing them.

The problem is that the error path in handle_alloc() also tries to
remove an item in that list, a remnant from the time when this was
handled there (that code would not even be correct anymore because
it only removes one struct and not all the ones from the same write
request).

This can trigger a double removal of the same item from the list,
causing a crash. This is not easy to reproduce in practice because
it requires that do_alloc_cluster_offset() fails after a successful
previous allocation during the same write request, but it can be
reproduced with the included test case.

Signed-off-by: Alberto Garcia 
Message-Id: 
<3440a1c4d53c4fe48312b478c96accb338cbef7c.1599150873.git.be...@igalia.com>
Signed-off-by: Max Reitz 
---
 block/qcow2-cluster.c  |  3 --
 tests/qemu-iotests/305 | 74 ++
 tests/qemu-iotests/305.out | 16 +
 tests/qemu-iotests/group   |  1 +
 4 files changed, 91 insertions(+), 3 deletions(-)
 create mode 100755 tests/qemu-iotests/305
 create mode 100644 tests/qemu-iotests/305.out

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4530e5e471..f65ccc5840 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1710,9 +1710,6 @@ static int handle_alloc(BlockDriverState *bs, uint64_t 
guest_offset,
 
 out:
 qcow2_cache_put(s->l2_table_cache, (void **) _slice);
-if (ret < 0 && *m && (*m)->nb_clusters > 0) {
-QLIST_REMOVE(*m, next_in_flight);
-}
 return ret;
 }
 
diff --git a/tests/qemu-iotests/305 b/tests/qemu-iotests/305
new file mode 100755
index 00..768818af4a
--- /dev/null
+++ b/tests/qemu-iotests/305
@@ -0,0 +1,74 @@
+#!/usr/bin/env bash
+#
+# Test the handling of errors in write requests with multiple allocations
+#
+# Copyright (C) 2020 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=be...@igalia.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1# failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+_unsupported_imgopts cluster_size refcount_bits extended_l2 compat=0.10 
data_file
+
+echo '### Create the image'
+_make_test_img -o refcount_bits=64,cluster_size=1k 1M
+
+# The reference counts of the clusters for the first 123k of this
+# write request are stored in the first refcount block. The last
+# cluster (guest offset 123k) is referenced in the second refcount
+# block.
+echo '### Fill the first refcount block and one data cluster from the second'
+$QEMU_IO -c 'write 0 124k' "$TEST_IMG" | _filter_qemu_io
+
+echo '### Discard two of the last data clusters, leave one in the middle'
+$QEMU_IO -c 'discard 121k 1k' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c 'discard 123k 1k' "$TEST_IMG" | _filter_qemu_io
+
+echo '### Corrupt the offset of the second refcount block'
+refcount_table_offset=$(peek_file_be "$TEST_IMG" 48 8)
+poke_file "$TEST_IMG" $(($refcount_table_offset+14)) "\x06"
+
+# This tries to allocate the two clusters discarded earlier (guest
+# offsets 121k and 123k). Their reference counts are in the first a

[PULL 00/22] Block patches

2020-09-15 Thread Max Reitz
The following changes since commit 2d2c73d0e3d504a61f868e46e6abd5643f38091b:

  Merge remote-tracking branch 
'remotes/pmaydell/tags/pull-target-arm-20200914-1' into staging (2020-09-14 
16:03:08 +0100)

are available in the Git repository at:

  https://github.com/XanClic/qemu.git tags/pull-block-2020-09-15

for you to fetch changes up to 7bae7c805d82675eb3a02c744093703d84ada2d6:

  block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[] (2020-09-15 
11:31:10 +0200)


Block patches:
- Several qcow2 fixes and refactorings
- Let qemu-img convert try to stay at cluster boundaries
- Stable child names for quorum (with x-blockdev-change)
- Explicitly drop vhdx 4k sector support, as it was never actually
  working
- rbd: Mark @namespace a strong runtime option
- iotests.py improvements
- Drop unused runtime_opts objects
- Skip a test case in 030 when run through make check-block


Alberto Garcia (9):
  qcow2: Use macros for the L1, refcount and bitmap table entry sizes
  qcow2: Fix removal of list members from BDRVQcow2State.cluster_allocs
  qcow2: Don't check nb_clusters when removing l2meta from the list
  qcow2: Rewrite the documentation of qcow2_alloc_cluster_offset()
  qcow2: Handle QCowL2Meta on error in preallocate_co()
  qcow2: Make qcow2_free_any_clusters() free only one cluster
  qcow2: Return the original error code in qcow2_co_pwrite_zeroes()
  qcow2: Make preallocate_co() resize the image to the correct size
  qcow2: Convert qcow2_alloc_cluster_offset() into
qcow2_alloc_host_offset()

John Snow (2):
  block/rbd: remove runtime_opts
  block/qcow: remove runtime opts

Lukas Straub (1):
  block/quorum.c: stable children names

Nir Soffer (5):
  qemu-iotests: Fix FilePaths cleanup
  qemu-iotests: Fix FilePaths docstring
  qemu-iotests: Support varargs syntax in FilePaths
  qemu-iotests: Merge FilePaths and FilePath
  qemu-iotests: Simplify FilePath __init__

Peter Lieven (1):
  qemu-img: avoid unaligned read requests during convert

Stefano Garzarella (1):
  block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[]

Swapnil Ingle (1):
  block/vhdx: Support vhdx image only with 512 bytes logical sector size

Thomas Huth (1):
  iotests: Skip test_stream_parallel in test 030 when doing "make check"

Yi Li (1):
  qemu-img: Explicit number replaced by a constant

 block/qcow2.h |  16 +++--
 block/qcow.c  |   9 ---
 block/qcow2-bitmap.c  |  11 ++--
 block/qcow2-cluster.c |  69 --
 block/qcow2-refcount.c|  97 +++---
 block/qcow2-snapshot.c|  20 +++
 block/qcow2.c | 108 ++
 block/quorum.c|  20 +--
 block/rbd.c   |  43 +-
 block/vhdx.c  |   6 +-
 qemu-img.c|  32 --
 tests/check-block.sh  |   3 +
 tests/qemu-iotests/030|   2 +
 tests/qemu-iotests/125|  44 --
 tests/qemu-iotests/125.out|  28 -
 tests/qemu-iotests/194|   4 +-
 tests/qemu-iotests/208|   2 +-
 tests/qemu-iotests/222|   2 +-
 tests/qemu-iotests/251|   7 ++-
 tests/qemu-iotests/257|  10 ++--
 tests/qemu-iotests/305|  74 +++
 tests/qemu-iotests/305.out|  16 +
 tests/qemu-iotests/group  |   1 +
 tests/qemu-iotests/iotests.py |  53 +
 24 files changed, 395 insertions(+), 282 deletions(-)
 create mode 100755 tests/qemu-iotests/305
 create mode 100644 tests/qemu-iotests/305.out

-- 
2.26.2




Re: [PATCH 1/2] qcow2: Make preallocate_co() resize the image to the correct size

2020-09-15 Thread Max Reitz
On 11.09.20 16:09, Alberto Garcia wrote:
> This function preallocates metadata structures and then extends the
> image to its new size, but that new size calculation is wrong because
> it doesn't take into account that the host_offset variable is always
> cluster-aligned.
> 
> This problem can be reproduced with preallocation=metadata when the
> original size is not cluster-aligned but the new size is. In this case
> the final image size will be shorter than expected.
> 
>qemu-img create -f qcow2 img.qcow2 31k
>qemu-img resize --preallocation=metadata img.qcow2 128k
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2.c  |  1 +
>  tests/qemu-iotests/125 | 40 +-
>  tests/qemu-iotests/125.out | 28 --
>  3 files changed, 49 insertions(+), 20 deletions(-)

The test additions make this test fail with compat=0.10.  Are you OK
with disabling compat=0.10 by squashing this in?

diff --git a/tests/qemu-iotests/125 b/tests/qemu-iotests/125
index 1f35598b2b..894d53f2bd 100755
--- a/tests/qemu-iotests/125
+++ b/tests/qemu-iotests/125
@@ -43,6 +43,10 @@ get_image_size_on_host()

 _supported_fmt qcow2
 _supported_proto file
+# Growing a file with a backing file (without preallocation=full or
+# =falloc) requires zeroing the newly added area, which is impossible
+# to do quickly for v2 images, and hence is unsupported.
+_unsupported_imgopts 'compat=0.10'

 if [ -z "$TEST_IMG_FILE" ]; then
 TEST_IMG_FILE=$TEST_IMG



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[]

2020-09-15 Thread Max Reitz
On 14.09.20 21:05, Stefano Garzarella wrote:
> Commit 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
> introduced namespace support for RBD, but we forgot to add the
> new 'namespace' options to qemu_rbd_strong_runtime_opts[].
> 
> The 'namespace' is used to identify the image, so it is a strong
> option since it can changes the data of a BDS.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1821528
> Fixes: 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
> Cc: Florian Florensa 
> Signed-off-by: Stefano Garzarella 
> ---
>  block/rbd.c | 1 +
>  1 file changed, 1 insertion(+)

Thanks, applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2] qemu-img: Support bitmap --merge into backing image

2020-09-15 Thread Max Reitz
On 14.09.20 21:10, Eric Blake wrote:
> If you have the chain 'base.qcow2 <- top.qcow2' and want to merge a
> bitmap from top into base, qemu-img was failing with:
> 
> qemu-img: Could not open 'top.qcow2': Could not open backing file: Failed to 
> get shared "write" lock
> Is another process using the image [base.qcow2]?
> 
> The easiest fix is to not open the entire backing chain of either
> image (source or destination); after all, the point of 'qemu-img
> bitmap' is solely to manipulate bitmaps directly within a single qcow2
> image, and this is made more precise if we don't pay attention to
> other images in the chain that may happen to have a bitmap by the same
> name.
> 
> However, note that during normal usage, it is a feature that qemu will
> allow a bitmap from a backing image to be exposed by an overlay BDS;
> doing so makes it easier to perform incremental backup, where we have:
> 
> Base <- Active <- temporrary

*temporary

(Also it’s a bit strange that “Base” and “Active” are capitalized, but
“temporary” isn’t)

>   \--block job ->/
> 
> with temporary being fed by a block-copy 'sync' job; when exposing

s/block-copy 'sync'/backup 'sync=none'/?

> temporary over NBD, referring to a bitmap that lives only in Active is
> less effort than having to copy a bitmap into temporary [1].  So the
> testsuite additions in this patch check both where bitmaps get
> allocated (the qemu-img info output), and, when NOT using 'qemu-img
> bitmap', that bitmaps are indeed visible through a backing chain.

Well.  It is useful over NBD but I would doubt that it isn’t useful in
general.  For example, the QMP commands that refer to bitmaps always do
so through a node-name + bitmap-name combination, and they require that
the given bitmap is exactly on the given node.

So I think this is a very much a case-by-case question.  (And in
practice, NBD seems to be the outlier, not qemu-img bitmap.)

> [1] Full disclosure: prior to the recent commit 374eedd1c4 and
> friends, we were NOT able to see bitmaps through filters, which meant
> that we actually did not have nice clean semantics for uniformly being
> able to pick up bitmaps from anywhere in the backing chain (seen as a
> change in behavior between qemu 4.1 and 4.2 at commit 00e30f05de, when
> block-copy swapped from a one-off to a filter).  Which means libvirt
> was already coded to copy bitmaps around for the sake of older qemu,
> even though modern qemu no longer needs it.  Oh well.
> 
> Fixes: http://bugzilla.redhat.com/1877209
> Reported-by: Eyal Shenitzky 
> Signed-off-by: Eric Blake 
> ---
> 
> In v2:
> - also use BDRV_O_NO_BACKING on source [Max]
> - improved commit message [Max]
> 
>  qemu-img.c | 11 ++--
>  tests/qemu-iotests/291 | 12 
>  tests/qemu-iotests/291.out | 56 ++
>  3 files changed, 76 insertions(+), 3 deletions(-)

The code looks good to me, but I wonder whether in the commit message it
should be noted that we don’t want to let bitmaps from deeper nodes
shine through by default everywhere, but just in specific cases where
that’s useful (i.e. only NBD so far AFAIA).

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 0/2] Make preallocate_co() resize the image to the correct size

2020-09-15 Thread Max Reitz
On 11.09.20 16:09, Alberto Garcia wrote:
> preallocate_co() does not resize the image correctly if the original
> size was not cluster-aligned.
> 
> This series should be applied on top of Max's block branch (I tested
> it with commit 8e66c829eda997dad661d49d73668b1fd3e6043d).
> 
>https://git.xanclic.moe/XanClic/qemu/commits/branch/block
> 
> Alberto Garcia (2):
>   qcow2: Make preallocate_co() resize the image to the correct size
>   qcow2: Convert qcow2_alloc_cluster_offset() into
> qcow2_alloc_host_offset()
> 
>  block/qcow2.h  |  6 +++---
>  block/qcow2-cluster.c  | 14 +
>  block/qcow2.c  | 35 +
>  tests/qemu-iotests/125 | 40 +-
>  tests/qemu-iotests/125.out | 28 --
>  5 files changed, 74 insertions(+), 49 deletions(-)

Thanks, applied to my block branch.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/2] qcow2: Convert qcow2_alloc_cluster_offset() into qcow2_alloc_host_offset()

2020-09-15 Thread Max Reitz
On 14.09.20 18:42, Alberto Garcia wrote:
> On Mon 14 Sep 2020 02:14:36 PM CEST, Max Reitz wrote:
> 
>> However, I wonder what you think about “cluster_offset” in
>> qcow2_alloc_host_offset.  It isn’t a cluster offset anymore.
>> Can/should we rename it?
> 
> That variable was not a cluster offset before this patch either (at
> least not during the first iteration of the loop).

Hm, yes.  Doesn’t make it less wrong, but you’re right, there is no
argument why this should be addressed in this patch (or series).

> The difference is that *host_offset is always the offset of the
> beginning of the requested region, and cluster_offset increases with
> every iteration of the loop. Maybe current_offset / current_host_offset?
> I don't know, but I'm fine with changing it if you have a good name.

current_host_offset sounds nice.  But let’s just leave it how it is for now.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] iotests: Work around failing readlink -f

2020-09-14 Thread Max Reitz
On 14.09.20 14:51, Peter Maydell wrote:
> On Mon, 14 Sep 2020 at 13:32, Max Reitz  wrote:
>>
>> On 14.09.20 14:31, Peter Maydell wrote:
>>> On Mon, 14 Sep 2020 at 12:39, Max Reitz  wrote:
>>>>
>>>> On macOS, (out of the box) readlink does not have -f.  If the recent
>>>> "readlink -f" call introduced by b1cbc33a397 fails, just fall back to
>>>> the old behavior (which means you can run the iotests only from the
>>>> build tree, but that worked fine for six years, so it should be fine
>>>> still).
>>>>
>>>> Keep any potential error message on stderr.  If users want to run the
>>>> iotests from outside the build tree, this may point them to what's wrong
>>>> (with their system).
>>>>
>>>> Fixes: b1cbc33a3971b6bb005d5ac3569feae35a71de0f
>>>>("iotests: Allow running from different directory")
>>>> Reported-by: Claudio Fontana 
>>>> Reported-by: Thomas Huth 
>>>> Signed-off-by: Max Reitz 
>>>> ---
>>>> Hi Thomas,
>>>>
>>>> I thought this would be quicker than writing a witty response on whether
>>>> you or me should write this patch. O:)
>>>> ---
>>>>  tests/qemu-iotests/check | 4 
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
>>>> index e14a1f354d..75675e1a18 100755
>>>> --- a/tests/qemu-iotests/check
>>>> +++ b/tests/qemu-iotests/check
>>>> @@ -45,6 +45,10 @@ then
>>>>  fi
>>>>  source_iotests=$(cd "$source_iotests"; pwd) || _init_error "failed to 
>>>> enter source tree"
>>>>  build_iotests=$(readlink -f $(dirname "$0"))
>>>> +if [ "$?" -ne 0 ]; then
>>>> +# Perhaps -f is unsupported, revert to pre-b1cbc33a397 behavior
>>>> +build_iotests=$PWD
>>>> +fi
>>>>  else
>>>
>>> This still prints
>>>   readlink: illegal option -- f
>>>   usage: readlink [-n] [file ...]
>>>
>>> (you can see it in the build log that Thomas links to).
>>>
>>>build_iotests=$(readlink -f $(dirname "$0") 2>/dev/null)
>>>
>>> should avoid that, I think.
>>
>> I mentioned in the commit message that I find this useful and desirable
>> behavior.  Something isn’t working that perhaps users are expecting to
>> work (because it will work on other systems), so I don’t think the error
>> message should be suppressed.
> 
> I disagree. Either iotests can handle readlink not having '-f',
> in which case it should not let readlink spew junk to the log,
> or it can't, in which case it should bail out.

I find this a bit complicated, because we can’t exactly handle readlink
without -f.  We can fall back to the old behavior on such systems, which
I think is good enough and I assume you think, too.

> If you want to tell the user something, you should catch the
> failure and print something yourself, because then you can do
> so with a message that will make sense to somebody trying to
> run the test suite and point them in the direction of what
> they can do to deal with the problem, eg something like
>  "readlink version doesn't support '-f'. Assuming that iotests
>   are being run from the build tree. If this isn't true then
>   we will fail later on: you can either run from the build tree,
>   or install a newer readlink."

Doesn’t sound bad, it isn’t “bail out”, though, so I don’t fully
understand how this relates to your paragraph above.  (Because it seems
like you suggest printing this unconditionally.  I think it would be
better to print it only if readlink -f failed, and then check finds $PWD
isn’t $build_tree/tests/qemu-iotests.  But...)

> Printing "readlink: illegal option" is just going to cause
> people to assume QEMU's scripts are broken and send us bug
> reports, so please don't do that.

(That’s absolutely true.)

...given everything, I think the best is then to indeed just suppress
readlink’s error message.  If readlink doesn’t work, and build_iotests
defaults to $PWD, and $PWD is not the build tree, then you’ll end up
with the six-year-old error message “(make sure the qemu-iotests are run
from tests/qemu-iotests in the build tree)”.  Because doing so is
probably easier anyway than trying to find a readlink that works.

Max



signature.asc
Description: OpenPGP digital signature


[PATCH v3] iotests: Drop readlink -f

2020-09-14 Thread Max Reitz
On macOS, (out of the box) readlink does not have -f.  We do not really
need readlink here, though, it was just a replacement for realpath
(which is not available on our BSD test systems), which we needed to
make the $(dirname) into an absolute path.

Instead of using either, just use "cd; pwd" like is done for
$source_iotests.

Fixes: b1cbc33a3971b6bb005d5ac3569feae35a71de0f
   ("iotests: Allow running from different directory")
Suggested-by: Peter Maydell 
Reported-by: Claudio Fontana 
Reported-by: Thomas Huth 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/check | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index e14a1f354d..678b6e4910 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -44,7 +44,7 @@ then
 _init_error "failed to obtain source tree name from check symlink"
 fi
 source_iotests=$(cd "$source_iotests"; pwd) || _init_error "failed to 
enter source tree"
-build_iotests=$(readlink -f $(dirname "$0"))
+build_iotests=$(cd "$(dirname "$0")"; pwd)
 else
 # called from the source tree
 source_iotests=$PWD
-- 
2.26.2




[PATCH v2] iotests: Work around failing readlink -f

2020-09-14 Thread Max Reitz
On macOS, (out of the box) readlink does not have -f.  If the recent
"readlink -f" call introduced by b1cbc33a397 fails, just fall back to
the old behavior (which means you can run the iotests only from the
build tree, but that worked fine for six years, so it should be fine
still).

Suppress all error messages, so in case using $PWD works out, we do not
cause the user to worry.  If it does not work, we will end up printing
the following error message anyway:

  check: failed to source common.env (make sure the qemu-iotests are run
  from tests/qemu-iotests in the build tree)

Following that hint (running check from $build_tree/tests/qemu-iotests)
will make it work, and is probably even easier than obtaining a readlink
that understands -f.

Fixes: b1cbc33a3971b6bb005d5ac3569feae35a71de0f
   ("iotests: Allow running from different directory")
Reported-by: Claudio Fontana 
Reported-by: Thomas Huth 
Signed-off-by: Max Reitz 
---
v2: Suppress stderr (as requested and suggested by Peter)
---
 tests/qemu-iotests/check | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index e14a1f354d..3c9ccc117b 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -44,7 +44,11 @@ then
 _init_error "failed to obtain source tree name from check symlink"
 fi
 source_iotests=$(cd "$source_iotests"; pwd) || _init_error "failed to 
enter source tree"
-build_iotests=$(readlink -f $(dirname "$0"))
+build_iotests=$(readlink -f $(dirname "$0") 2>/dev/null)
+if [ "$?" -ne 0 ]; then
+# Perhaps -f is unsupported, revert to pre-b1cbc33a397 behavior
+build_iotests=$PWD
+fi
 else
 # called from the source tree
 source_iotests=$PWD
-- 
2.26.2




Re: [PATCH v2] iotests: Work around failing readlink -f

2020-09-14 Thread Max Reitz
On 14.09.20 16:26, Peter Maydell wrote:
> On Mon, 14 Sep 2020 at 15:17, Max Reitz  wrote:
>>
>> On macOS, (out of the box) readlink does not have -f.  If the recent
>> "readlink -f" call introduced by b1cbc33a397 fails, just fall back to
>> the old behavior (which means you can run the iotests only from the
>> build tree, but that worked fine for six years, so it should be fine
>> still).
>>
>> Suppress all error messages, so in case using $PWD works out, we do not
>> cause the user to worry.  If it does not work, we will end up printing
>> the following error message anyway:
>>
>>   check: failed to source common.env (make sure the qemu-iotests are run
>>   from tests/qemu-iotests in the build tree)
>>
>> Following that hint (running check from $build_tree/tests/qemu-iotests)
>> will make it work, and is probably even easier than obtaining a readlink
>> that understands -f.
>>
>> Fixes: b1cbc33a3971b6bb005d5ac3569feae35a71de0f
>>    ("iotests: Allow running from different directory")
>> Reported-by: Claudio Fontana 
>> Reported-by: Thomas Huth 
>> Signed-off-by: Max Reitz 
>> ---
>> v2: Suppress stderr (as requested and suggested by Peter)
>> ---
>>  tests/qemu-iotests/check | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
>> index e14a1f354d..3c9ccc117b 100755
>> --- a/tests/qemu-iotests/check
>> +++ b/tests/qemu-iotests/check
>> @@ -44,7 +44,11 @@ then
>>  _init_error "failed to obtain source tree name from check symlink"
>>  fi
>>  source_iotests=$(cd "$source_iotests"; pwd) || _init_error "failed to 
>> enter source tree"
>> -build_iotests=$(readlink -f $(dirname "$0"))
>> +build_iotests=$(readlink -f $(dirname "$0") 2>/dev/null)
> 
> 
> Having woken up and actually looked at the context for what
> we're doing with readlink here, my usual rune for "give me
> the absolute path that this script is in" is
> 
> thisdir="$(cd "$(dirname "$0")"; pwd)"
> 
> which should be more portable than readlink. It doesn't
> give quite the same behaviour if it's run via a path which
> is a symlink to a directory, eg if bar/ is a symlink to
> foo/ and you run a script as bar/thescript then you'll get
> back /path/to/bar/ rather than /path/to/foo/, but do you
> really need the path with all the symlinks followed
> rather than just some valid absolute path to the build dir?

Interesting.  Sounds good.

The only reason we did use readlink here was because realpath wasn’t
available on the BSDs.  We don’t really need the symlink, we just need
the full absolute path to $(dirname $0).

Thanks!

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] iotests: Work around failing readlink -f

2020-09-14 Thread Max Reitz
On 14.09.20 14:31, Peter Maydell wrote:
> On Mon, 14 Sep 2020 at 12:39, Max Reitz  wrote:
>>
>> On macOS, (out of the box) readlink does not have -f.  If the recent
>> "readlink -f" call introduced by b1cbc33a397 fails, just fall back to
>> the old behavior (which means you can run the iotests only from the
>> build tree, but that worked fine for six years, so it should be fine
>> still).
>>
>> Keep any potential error message on stderr.  If users want to run the
>> iotests from outside the build tree, this may point them to what's wrong
>> (with their system).
>>
>> Fixes: b1cbc33a3971b6bb005d5ac3569feae35a71de0f
>>("iotests: Allow running from different directory")
>> Reported-by: Claudio Fontana 
>> Reported-by: Thomas Huth 
>> Signed-off-by: Max Reitz 
>> ---
>> Hi Thomas,
>>
>> I thought this would be quicker than writing a witty response on whether
>> you or me should write this patch. O:)
>> ---
>>  tests/qemu-iotests/check | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
>> index e14a1f354d..75675e1a18 100755
>> --- a/tests/qemu-iotests/check
>> +++ b/tests/qemu-iotests/check
>> @@ -45,6 +45,10 @@ then
>>  fi
>>  source_iotests=$(cd "$source_iotests"; pwd) || _init_error "failed to 
>> enter source tree"
>>  build_iotests=$(readlink -f $(dirname "$0"))
>> +if [ "$?" -ne 0 ]; then
>> +# Perhaps -f is unsupported, revert to pre-b1cbc33a397 behavior
>> +build_iotests=$PWD
>> +fi
>>  else
> 
> This still prints
>   readlink: illegal option -- f
>   usage: readlink [-n] [file ...]
> 
> (you can see it in the build log that Thomas links to).
> 
>build_iotests=$(readlink -f $(dirname "$0") 2>/dev/null)
> 
> should avoid that, I think.

I mentioned in the commit message that I find this useful and desirable
behavior.  Something isn’t working that perhaps users are expecting to
work (because it will work on other systems), so I don’t think the error
message should be suppressed.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] qcow2: Make preallocate_co() resize the image to the correct size

2020-09-14 Thread Max Reitz
On 11.09.20 16:09, Alberto Garcia wrote:
> This function preallocates metadata structures and then extends the
> image to its new size, but that new size calculation is wrong because
> it doesn't take into account that the host_offset variable is always
> cluster-aligned.
> 
> This problem can be reproduced with preallocation=metadata when the
> original size is not cluster-aligned but the new size is. In this case
> the final image size will be shorter than expected.
> 
>qemu-img create -f qcow2 img.qcow2 31k
>qemu-img resize --preallocation=metadata img.qcow2 128k
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2.c  |  1 +
>  tests/qemu-iotests/125 | 40 +-
>  tests/qemu-iotests/125.out | 28 --
>  3 files changed, 49 insertions(+), 20 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/2] qcow2: Convert qcow2_alloc_cluster_offset() into qcow2_alloc_host_offset()

2020-09-14 Thread Max Reitz
On 11.09.20 16:09, Alberto Garcia wrote:
> qcow2_alloc_cluster_offset() takes an (unaligned) guest offset and
> returns the (aligned) offset of the corresponding cluster in the qcow2
> image.
> 
> In practice none of the callers need to know where the cluster starts
> so this patch makes the function calculate and return the final host
> offset directly. The function is also renamed accordingly.
> 
> See 388e581615 for a similar change to qcow2_get_cluster_offset().
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2.h |  6 +++---
>  block/qcow2-cluster.c | 14 ++
>  block/qcow2.c | 36 +---
>  3 files changed, 26 insertions(+), 30 deletions(-)

First of all:

Reviewed-by: Max Reitz 

However, I wonder what you think about “cluster_offset” in
qcow2_alloc_host_offset.  It isn’t a cluster offset anymore.  Can/should
we rename it?

(Perhaps call the parameter host_offset_ptr, and then rename the local
cluster_offset to host_offset?)



signature.asc
Description: OpenPGP digital signature


[PATCH] iotests: Work around failing readlink -f

2020-09-14 Thread Max Reitz
On macOS, (out of the box) readlink does not have -f.  If the recent
"readlink -f" call introduced by b1cbc33a397 fails, just fall back to
the old behavior (which means you can run the iotests only from the
build tree, but that worked fine for six years, so it should be fine
still).

Keep any potential error message on stderr.  If users want to run the
iotests from outside the build tree, this may point them to what's wrong
(with their system).

Fixes: b1cbc33a3971b6bb005d5ac3569feae35a71de0f
   ("iotests: Allow running from different directory")
Reported-by: Claudio Fontana 
Reported-by: Thomas Huth 
Signed-off-by: Max Reitz 
---
Hi Thomas,

I thought this would be quicker than writing a witty response on whether
you or me should write this patch. O:)
---
 tests/qemu-iotests/check | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index e14a1f354d..75675e1a18 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -45,6 +45,10 @@ then
 fi
 source_iotests=$(cd "$source_iotests"; pwd) || _init_error "failed to 
enter source tree"
 build_iotests=$(readlink -f $(dirname "$0"))
+if [ "$?" -ne 0 ]; then
+# Perhaps -f is unsupported, revert to pre-b1cbc33a397 behavior
+build_iotests=$PWD
+fi
 else
 # called from the source tree
 source_iotests=$PWD
-- 
2.26.2




Re: [PATCH] tests/check-block: Do not run the iotests with old versions of bash

2020-09-14 Thread Max Reitz
On 14.09.20 12:50, Thomas Huth wrote:
> On 14/09/2020 11.19, Max Reitz wrote:
>> On 12.09.20 14:14, Thomas Huth wrote:
>>> macOS is shipped with a very old version of the bash (3.2), which
>>> is currently not suitable for running the iotests anymore. Add
>>> a check to skip the iotests in this case - if someone still wants
>>> to run the iotests on macOS, they can install a newer version from
>>> homebrew, for example.
>>>
>>> Signed-off-by: Thomas Huth 
>>> ---
>>>  tests/check-block.sh | 5 +
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/tests/check-block.sh b/tests/check-block.sh
>>> index 8e29c868e5..bfe1630c1e 100755
>>> --- a/tests/check-block.sh
>>> +++ b/tests/check-block.sh
>>> @@ -46,6 +46,11 @@ if ! command -v bash >/dev/null 2>&1 ; then
>>>  exit 0
>>>  fi
>>>  
>>> +if bash --version | grep 'GNU bash, version [123]' > /dev/null 2>&1 ; then
>>
>> grep -q instead of the redirections, perhaps?
>>
>> But more importantly, I think this needs a LANG=C prefix.  (If I expand
>> the rejected major versions to [12345], it doesn’t skip without a
>> prefix, because the string reads “GNU bash, Version 5...” here in
>> LANG=de_DE.UTF-8.)
> 
> Ouch, ok. But actually, I'm not quite sure anymore whether the patch is
> really required. I ran into the "readlink -f" problem and thought that
> it occurred due to the ancient version of bash on macOS, but as a I now
> know, readlink is a separate program and not a bash built-in, so it's a
> different issue... thus let's skip this patch here for now until we hit
> a real issue with bash again.

Yes, I had hoped this patch would fix that issue.  Or perhaps at least
hide it, because if you have a newer bash, chances are your readlink has
-f, too.

So should we just effectively revert b1cbc33a397 if readlink -f didn’t
work, i.e. check "$?" and on failure use $PWD as it was before b1cbc33a397?

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] qemu-img: Support bitmap --merge into backing image

2020-09-14 Thread Max Reitz
On 11.09.20 17:17, Eric Blake wrote:
> On 9/11/20 3:31 AM, Max Reitz wrote:
>> On 09.09.20 14:33, Eric Blake wrote:
>>> If you have the chain 'base.qcow2 <- top.qcow2' and want to merge a
>>> bitmap from top into base, qemu-img was failing with:
>>>
>>> qemu-img: Could not open 'top.qcow2': Could not open backing file:
>>> Failed to get shared "write" lock
>>> Is another process using the image [base.qcow2]?
>>>
>>> The easiest fix is to not open the entire backing chain of the source
>>> image, so that we aren't worrying about competing BDS visiting the
>>> backing image as both a read-only backing of the source and the
>>> writeable destination.
>>>
>>> Fixes: http://bugzilla.redhat.com/1877209
>>> Reported-by: Eyal Shenitzky 
>>> Signed-off-by: Eric Blake 
>>> ---
>>>   qemu-img.c |  3 +-
>>>   tests/qemu-iotests/291 | 12 
>>>   tests/qemu-iotests/291.out | 56 ++
>>>   3 files changed, 70 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qemu-img.c b/qemu-img.c
>>> index eb2fc1f86243..b15098a2f9b3 100644
>>> --- a/qemu-img.c
>>> +++ b/qemu-img.c
>>> @@ -4755,7 +4755,8 @@ static int img_bitmap(int argc, char **argv)
>>>   }
>>>   bs = blk_bs(blk);
>>>   if (src_filename) {
>>> -    src = img_open(false, src_filename, src_fmt, 0, false,
>>> false, false);
>>> +    src = img_open(false, src_filename, src_fmt, BDRV_O_NO_BACKING,
>>> +   false, false, false);
>>
>> Why not do the same for the destination BB?
> 
> Yeah, that should work, too.
> 
>>
>>>   if (!src) {
>>>   goto out;
>>>   }
>>> diff --git a/tests/qemu-iotests/291 b/tests/qemu-iotests/291
>>> index 1e0bb76959bb..4f837b205655 100755
>>> --- a/tests/qemu-iotests/291
>>> +++ b/tests/qemu-iotests/291
>>
>> [...]
>>
>>> @@ -107,6 +116,9 @@ $QEMU_IMG map --output=json --image-opts \
>>>   nbd_server_start_unix_socket -r -f qcow2 -B b2 "$TEST_IMG"
>>>   $QEMU_IMG map --output=json --image-opts \
>>>   "$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b2" | _filter_qemu_img_map
>>> +nbd_server_start_unix_socket -r -f qcow2 -B b3 "$TEST_IMG"
>>
>> Why not look into $TEST_IMG.base to see specifically whether the bitmap
>> is there?
> 
> We did just that, several lines earlier, with the qemu-img info
> --backing-chain.

OK, perfect.

>>
>> (I also am quite surprised that it’s even possible to export bitmaps
>> from backing nodes, but, well.)
> 
> I actually ought to call that out in the commit message.  It used to be
> that we were inconsistent on what we could see from the backing chain (a
> filter would make it so we can't), but as soon as your filter patches
> land, then we _do_ want to always be able to find a bitmap from the
> backing chain (incremental backup depends on that: we create an overlay
> disk to run the block-copy job as a filter, and _want_ to expose that
> overlay image with the bitmap it inherits from the original image).  So
> being able to export bitmaps from a backing node is normally a feature;
> and it is only in 'qemu-img bitmap' where we don't want accidental
> inheritance to get in the way from what we are actually merging.

Understood.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] tests/check-block: Do not run the iotests with old versions of bash

2020-09-14 Thread Max Reitz
On 12.09.20 14:14, Thomas Huth wrote:
> macOS is shipped with a very old version of the bash (3.2), which
> is currently not suitable for running the iotests anymore. Add
> a check to skip the iotests in this case - if someone still wants
> to run the iotests on macOS, they can install a newer version from
> homebrew, for example.
> 
> Signed-off-by: Thomas Huth 
> ---
>  tests/check-block.sh | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/tests/check-block.sh b/tests/check-block.sh
> index 8e29c868e5..bfe1630c1e 100755
> --- a/tests/check-block.sh
> +++ b/tests/check-block.sh
> @@ -46,6 +46,11 @@ if ! command -v bash >/dev/null 2>&1 ; then
>  exit 0
>  fi
>  
> +if bash --version | grep 'GNU bash, version [123]' > /dev/null 2>&1 ; then

grep -q instead of the redirections, perhaps?

But more importantly, I think this needs a LANG=C prefix.  (If I expand
the rejected major versions to [12345], it doesn’t skip without a
prefix, because the string reads “GNU bash, Version 5...” here in
LANG=de_DE.UTF-8.)

Max

> +echo "bash version too old ==> Not running the qemu-iotests."
> +exit 0
> +fi
> +
>  if ! (sed --version | grep 'GNU sed') > /dev/null 2>&1 ; then
>  if ! command -v gsed >/dev/null 2>&1; then
>  echo "GNU sed not available ==> Not running the qemu-iotests."
> 



signature.asc
Description: OpenPGP digital signature


Re: qemu-img create with preallocation=full and cluster-size=2M produces a bad file

2020-09-14 Thread Max Reitz
On 14.09.20 10:22, Max Reitz wrote:
> On 11.09.20 17:39, Alberto Garcia wrote:
>> On Fri 11 Sep 2020 04:55:08 PM CEST, Alberto Garcia wrote:
>>>> $ qemu-img check prealloc_4G_2M.qcow2
>>>> ERROR: coffset=0x180a0: copied flag must never be set for 
>>>> compressed clusters
>>
>> I see, this was fixed recently (QEMU 5.0.0 is still affected), here is
>> the patch:
>>
>> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=348fcc4f7ace1718006e646078d88c8cd8c1d97e
>>
>> I think this should also be set to qemu-stable, Kevin, Max?
> 
> Well, how do you set things to stable after the commit has been merged?
> 
> Judging from stable-process.rst, we’ll have to reply to the patch in
> question and CC qemu-stable.  So let’s do that.

On second thought, I see that the 5.0.1 stable freeze was last Thursday,
so it’s not going into 5.0.1 anyway.

Max



signature.asc
Description: OpenPGP digital signature


Re: qemu-img create with preallocation=full and cluster-size=2M produces a bad file

2020-09-14 Thread Max Reitz
On 11.09.20 17:39, Alberto Garcia wrote:
> On Fri 11 Sep 2020 04:55:08 PM CEST, Alberto Garcia wrote:
>>> $ qemu-img check prealloc_4G_2M.qcow2
>>> ERROR: coffset=0x180a0: copied flag must never be set for 
>>> compressed clusters
> 
> I see, this was fixed recently (QEMU 5.0.0 is still affected), here is
> the patch:
> 
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=348fcc4f7ace1718006e646078d88c8cd8c1d97e
> 
> I think this should also be set to qemu-stable, Kevin, Max?

Well, how do you set things to stable after the commit has been merged?

Judging from stable-process.rst, we’ll have to reply to the patch in
question and CC qemu-stable.  So let’s do that.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 0/2] block: remove stale runtime_opts

2020-09-11 Thread Max Reitz
On 06.08.20 23:13, John Snow wrote:
> 
> 
> John Snow (2):
>   block/rbd: remove runtime_opts
>   block/qcow: remove runtime opts
> 
>  block/qcow.c |  9 -
>  block/rbd.c  | 42 --
>  2 files changed, 51 deletions(-)

Thanks, applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] qcow2: Return the original error code in qcow2_co_pwrite_zeroes()

2020-09-11 Thread Max Reitz
On 09.09.20 14:37, Alberto Garcia wrote:
> This function checks the current status of a (sub)cluster in order to
> see if an unaligned 'write zeroes' request can be done efficiently by
> simply updating the L2 metadata and without having to write actual
> zeroes to disk.
> 
> If the situation does not allow using the fast path then the function
> returns -ENOTSUP and the caller falls back to writing zeroes.
> 
> If can happen however that the aforementioned check returns an actual
> error code so in this case we should pass it to the caller.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 0/2] Fix error handling in preallocate_co()

2020-09-11 Thread Max Reitz
On 08.09.20 16:08, Alberto Garcia wrote:
> This is a follow-up to "Fix removal of list members from
> BDRVQcow2State.cluster_allocs":
> 
>https://lists.gnu.org/archive/html/qemu-block/2020-09/msg00247.html
> 
> However the patches themselves are independent and can be applied
> separately.
> 
> Regards,
> 
> Berto
> 
> Alberto Garcia (2):
>   qcow2: Handle QCowL2Meta on error in preallocate_co()
>   qcow2: Make qcow2_free_any_clusters() free only one cluster
> 
>  block/qcow2.h  |  4 ++--
>  block/qcow2-cluster.c  |  6 +++---
>  block/qcow2-refcount.c |  8 
>  block/qcow2.c  | 40 +---
>  4 files changed, 26 insertions(+), 32 deletions(-)

Thanks, applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] qemu-img: Support bitmap --merge into backing image

2020-09-11 Thread Max Reitz
On 09.09.20 14:33, Eric Blake wrote:
> If you have the chain 'base.qcow2 <- top.qcow2' and want to merge a
> bitmap from top into base, qemu-img was failing with:
> 
> qemu-img: Could not open 'top.qcow2': Could not open backing file: Failed to 
> get shared "write" lock
> Is another process using the image [base.qcow2]?
> 
> The easiest fix is to not open the entire backing chain of the source
> image, so that we aren't worrying about competing BDS visiting the
> backing image as both a read-only backing of the source and the
> writeable destination.
> 
> Fixes: http://bugzilla.redhat.com/1877209
> Reported-by: Eyal Shenitzky 
> Signed-off-by: Eric Blake 
> ---
>  qemu-img.c |  3 +-
>  tests/qemu-iotests/291 | 12 
>  tests/qemu-iotests/291.out | 56 ++
>  3 files changed, 70 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index eb2fc1f86243..b15098a2f9b3 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -4755,7 +4755,8 @@ static int img_bitmap(int argc, char **argv)
>  }
>  bs = blk_bs(blk);
>  if (src_filename) {
> -src = img_open(false, src_filename, src_fmt, 0, false, false, false);
> +src = img_open(false, src_filename, src_fmt, BDRV_O_NO_BACKING,
> +   false, false, false);

Why not do the same for the destination BB?

>  if (!src) {
>  goto out;
>  }
> diff --git a/tests/qemu-iotests/291 b/tests/qemu-iotests/291
> index 1e0bb76959bb..4f837b205655 100755
> --- a/tests/qemu-iotests/291
> +++ b/tests/qemu-iotests/291

[...]

> @@ -107,6 +116,9 @@ $QEMU_IMG map --output=json --image-opts \
>  nbd_server_start_unix_socket -r -f qcow2 -B b2 "$TEST_IMG"
>  $QEMU_IMG map --output=json --image-opts \
>  "$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b2" | _filter_qemu_img_map
> +nbd_server_start_unix_socket -r -f qcow2 -B b3 "$TEST_IMG"

Why not look into $TEST_IMG.base to see specifically whether the bitmap
is there?

(I also am quite surprised that it’s even possible to export bitmaps
from backing nodes, but, well.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] qemu-img: avoid unaligned read requests during convert

2020-09-10 Thread Max Reitz
On 01.09.20 14:51, Peter Lieven wrote:
> in case of large continous areas that share the same allocation status
> it happens that the value of s->sector_next_status is unaligned to the
> cluster size or even request alignment of the source. Avoid this by
> stripping down the s->sector_next_status position to cluster boundaries.
> 
> Signed-off-by: Peter Lieven 
> ---
>  qemu-img.c | 22 ++
>  1 file changed, 22 insertions(+)

I've just noticed that with this patch, the iotest 251 fails for vhdx.
Would you be OK with squashing this in?

Max
diff --git a/tests/qemu-iotests/251 b/tests/qemu-iotests/251
index 7918ba3559..294773bdc1 100755
--- a/tests/qemu-iotests/251
+++ b/tests/qemu-iotests/251
@@ -46,8 +46,11 @@ if [ "$IMGOPTSSYNTAX" = "true" ]; then
 # We use json:{} filenames here, so we cannot work with additional options.
 _unsupported_fmt $IMGFMT
 else
-# With VDI, the output is ordered differently.  Just disable it.
-_unsupported_fmt vdi
+# - With VDI, the output is ordered differently.  Just disable it.
+# - VHDX has large clusters; because qemu-img convert tries to
+#   align the requests to the cluster size, the output is ordered
+#   differently, so disable it, too.
+_unsupported_fmt vdi vhdx
 fi
 
 


signature.asc
Description: OpenPGP digital signature


  1   2   3   4   5   6   7   8   9   10   >