Re: [PATCH 3/3] nbd: Add 'qemu-nbd -A' to expose allocation depth

2020-09-26 Thread Vladimir Sementsov-Ogievskiy

25.09.2020 23:32, Eric Blake wrote:

Allow the server to expose an additional metacontext to be requested
by savvy clients.  qemu-nbd adds a new option -A to expose the
qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this
can also be set via QMP when using nbd-server-add.

qemu as client can be hacked into viewing this new context by using
the now-misnamed x-dirty-bitmap option when creating an NBD blockdev;


Maybe rename it to x-block-status?


although it is worth noting the decoding of how such context
information will appear in 'qemu-img map --output=json':

NBD_STATE_DEPTH_UNALLOC => "zero":false, "data":true
NBD_STATE_DEPTH_LOCAL => "zero":false, "data":false
NBD_STATE_DEPTH_BACKING => "zero":true, "data":true
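The table above is consistent with the client reading the new context's bits as if they were base:allocation flags (bit 0 = NBD_STATE_HOLE, bit 1 = NBD_STATE_ZERO, as defined by the NBD protocol). Here is a minimal Python sketch of that decoding. It assumes the tri-state values 0, 1 and 2 described in patch 2/3, and map_fields() is a hypothetical helper, not QEMU code:

```python
# Tri-state values of qemu:allocation-depth (as described in patch 2/3)
NBD_STATE_DEPTH_UNALLOC = 0
NBD_STATE_DEPTH_LOCAL = 1 << 0
NBD_STATE_DEPTH_BACKING = 1 << 1

# base:allocation flag bits, as defined by the NBD protocol
NBD_STATE_HOLE = 1 << 0
NBD_STATE_ZERO = 1 << 1


def map_fields(flags):
    """Return the qemu-img map fields a client would print if it decoded
    @flags with base:allocation semantics (hypothetical helper)."""
    return {
        "zero": bool(flags & NBD_STATE_ZERO),
        "data": not (flags & NBD_STATE_HOLE),
    }


for name, value in [("UNALLOC", NBD_STATE_DEPTH_UNALLOC),
                    ("LOCAL", NBD_STATE_DEPTH_LOCAL),
                    ("BACKING", NBD_STATE_DEPTH_BACKING)]:
    print(name, map_fields(value))
```

A LOCAL extent shows up as a "hole" (bit 0), and a BACKING extent shows up as "zero" (bit 1). That is why the JSON output reads oddly until you know which context was requested.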


It wouldn't be so simple if we decided to export the exact depth number.


--
Best regards,
Vladimir



Re: [PATCH 2/3] nbd: Add new qemu:allocation-depth metacontext

2020-09-26 Thread Vladimir Sementsov-Ogievskiy

26.09.2020 10:33, Richard W.M. Jones wrote:

On Fri, Sep 25, 2020 at 03:32:48PM -0500, Eric Blake wrote:

+The second is related to exposing the source of various extents within
+the image, with a single context named:
+
+qemu:allocation-depth
+
+In the allocation depth context, bits 0 and 1 form a tri-state value:
+
+bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated
+bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image
+bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a
+   backing layer


From the cover description I imagined it would show the actual depth, i.e.:

  top -> backing -> backing -> backing
  depth:   12 3     (0 = unallocated)

I wonder if that is possible?  (Perhaps there's something I don't
understand here.)


That's possible if we want it. Probably the best way is to add a *depth parameter to
bdrv_is_allocated_above (preferably on top of my "[PATCH v7 0/5] fix & merge
block_status_above and is_allocated_above").
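The following minimal Python model shows how an exact depth relates to the tri-state encoding. It assumes each layer of the chain is represented as a set of allocated cluster indices. This is a simplification: a real implementation would get the answer from bdrv_is_allocated_above() or similar. allocation_depth() and tri_state() are hypothetical names:

```python
from typing import List, Set

# Tri-state values from the qemu:allocation-depth description in patch 2/3
NBD_STATE_DEPTH_UNALLOC = 0
NBD_STATE_DEPTH_LOCAL = 1
NBD_STATE_DEPTH_BACKING = 2


def allocation_depth(chain: List[Set[int]], cluster: int) -> int:
    """Return the 1-based depth of the first layer (top first) that
    allocates @cluster, or 0 if no layer in the chain does."""
    for depth, allocated in enumerate(chain, start=1):
        if cluster in allocated:
            return depth
    return 0


def tri_state(depth: int) -> int:
    """Collapse an exact depth into the tri-state value of patch 2/3."""
    if depth == 0:
        return NBD_STATE_DEPTH_UNALLOC
    if depth == 1:
        return NBD_STATE_DEPTH_LOCAL
    return NBD_STATE_DEPTH_BACKING


# top -> backing -> backing, each modelled as a set of allocated clusters
chain = [{0}, {1}, {2}]
print([allocation_depth(chain, c) for c in range(4)])  # [1, 2, 3, 0]
print([tri_state(allocation_depth(chain, c)) for c in range(4)])  # [1, 2, 2, 0]
```

An exact depth carries strictly more information. The tri-state value can always be derived from it, but not the other way round.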





Re: [PATCH 2/3] nbd: Add new qemu:allocation-depth metacontext

2020-09-26 Thread Vladimir Sementsov-Ogievskiy

25.09.2020 23:32, Eric Blake wrote:

'qemu-img map' provides a way to determine which extents of an image
come from the top layer vs. inherited from a backing chain.  This is
useful information worth exposing over NBD.  There is a proposal to
add a QMP command block-dirty-bitmap-populate which can create a dirty
bitmap that reflects allocation information, at which point
qemu:dirty-bitmap:NAME can expose that information via the creation of
a temporary bitmap, but we can shorten the effort by adding a new
qemu:allocation-depth context that does the same thing without an
intermediate bitmap (this patch does not eliminate the need for that
proposal, as it will have other uses as well).

For this patch, I just encoded a tri-state value (unallocated, from
this layer, from any of the backing layers); we could instead or in
addition report an actual depth count per extent, if that proves more
useful.

Note that this patch does not actually enable any way to request a
server to enable this context; that will come in the next patch.

Signed-off-by: Eric Blake 


Looks good to me overall; it will need a rebase if patch 01 changes (as I propose, or in
some better way).




Re: [PATCH 1/3] nbd: Simplify meta-context parsing

2020-09-26 Thread Vladimir Sementsov-Ogievskiy

25.09.2020 23:32, Eric Blake wrote:

We had a premature optimization of trying to read as little from the
wire as possible while handling NBD_OPT_SET_META_CONTEXT in phases.
But in reality, we HAVE to read the entire string from the client
before we can get to the next command, and it is easier to just read
it all at once than it is to read it in pieces.  And once we do that,
several functions end up no longer performing I/O, and no longer need
to return a value.

While simplifying this, take advantage of g_str_has_prefix for less
repetition of boilerplate string length computation.

Our iotests still pass; I also checked that libnbd's testsuite (which
covers more corner cases of odd meta context requests) still passes.

Signed-off-by: Eric Blake 
---
  nbd/server.c | 172 ++-
  1 file changed, 47 insertions(+), 125 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 982de67816a7..0d2d7e52058f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,5 +1,5 @@
  /*
- *  Copyright (C) 2016-2018 Red Hat, Inc.
+ *  Copyright (C) 2016-2020 Red Hat, Inc.
   *  Copyright (C) 2005  Anthony Liguori 
   *
   *  Network Block Device Server Side
@@ -792,135 +792,64 @@ static int nbd_negotiate_send_meta_context(NBDClient *client,
     return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? -EIO : 0;
 }

-/* Read strlen(@pattern) bytes, and set @match to true if they match @pattern.
- * @match is never set to false.
- *
- * Return -errno on I/O error, 0 if option was completely handled by
- * sending a reply about inconsistent lengths, or 1 on success.
- *
- * Note: return code = 1 doesn't mean that we've read exactly @pattern.
- * It only means that there are no errors.
+
+/*
+ * Check @ns with @len bytes, and set @match to true if it matches @pattern,
+ * or if @len is 0 and the client is performing _LIST_. @match is never set
+ * to false.
   */
-static int nbd_meta_pattern(NBDClient *client, const char *pattern, bool *match,
-                            Error **errp)
+static void nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern,
+                                      const char *ns, uint32_t len,


@ns has changed its meaning: it's no longer just a namespace, but the whole query. I
think it's better to rename it.

Also, it's unusual to pass a length together with a NUL-terminated string; it seems
redundant. And the length is only compared with zero, and strlen(ns) == 0 or
ns[0] == 0 is not slower.

Also, @errp is an unused argument, and it violates the Error API recommendation not to
create void functions with errp.

Also, we can return a bool instead of returning the result through a pointer.
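The signature suggested above (no @len, no @errp, result returned as a bool) can be modelled in a few lines of Python. This is a sketch of the matching logic only, not of the patch. The function names are hypothetical, and is_list stands in for client->opt == NBD_OPT_LIST_META_CONTEXT:

```python
def meta_empty_or_pattern(query: str, pattern: str, is_list: bool) -> bool:
    """Return True if @query selects @pattern.

    An empty query matches only while the client is listing contexts;
    otherwise the whole query must equal @pattern.  Emptiness is checked
    on the string itself, so no separate length argument is needed.
    """
    if not query:
        return is_list
    return query == pattern


def meta_base_query(query: str, is_list: bool) -> bool:
    """Handle 'base:' queries; only base:allocation exists."""
    prefix = "base:"
    if not query.startswith(prefix):  # plays the role of g_str_has_prefix()
        return False
    return meta_empty_or_pattern(query[len(prefix):], "allocation", is_list)


print(meta_base_query("base:allocation", False))  # True
print(meta_base_query("base:", False))            # False
```

With a bool result, a caller would accumulate with `match |= ...`. That preserves the "@match is never set to false" behavior of the current code.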


+                                      bool *match, Error **errp)
 {
-    int ret;
-    char *query;
-    size_t len = strlen(pattern);
-
-    assert(len);
-
-    query = g_malloc(len);
-    ret = nbd_opt_read(client, query, len, errp);
-    if (ret <= 0) {
-        g_free(query);
-        return ret;
-    }
-
-    if (strncmp(query, pattern, len) == 0) {
+    if (len == 0) {
+        if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
+            *match = true;
+        }
+        trace_nbd_negotiate_meta_query_parse("empty");
+    } else if (strcmp(pattern, ns) == 0) {
         trace_nbd_negotiate_meta_query_parse(pattern);
         *match = true;
     } else {
         trace_nbd_negotiate_meta_query_skip("pattern not matched");
     }
-    g_free(query);
-
-    return 1;
-}
-
-/*
- * Read @len bytes, and set @match to true if they match @pattern, or if @len
- * is 0 and the client is performing _LIST_. @match is never set to false.
- *
- * Return -errno on I/O error, 0 if option was completely handled by
- * sending a reply about inconsistent lengths, or 1 on success.
- *
- * Note: return code = 1 doesn't mean that we've read exactly @pattern.
- * It only means that there are no errors.
- */
-static int nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern,
-                                     uint32_t len, bool *match, Error **errp)
-{
-    if (len == 0) {
-        if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-            *match = true;
-        }
-        trace_nbd_negotiate_meta_query_parse("empty");
-        return 1;
-    }
-
-    if (len != strlen(pattern)) {
-        trace_nbd_negotiate_meta_query_skip("different lengths");
-        return nbd_opt_skip(client, len, errp);
-    }
-
-    return nbd_meta_pattern(client, pattern, match, errp);
 }

  /* nbd_meta_base_query
   *
   * Handle queries to 'base' namespace. For now, only the base:allocation
- * context is available.  'len' is the amount of text remaining to be read from
- * the current name, after the 'base:' portion has been stripped.
- *
- * Return -errno on I/O error, 0 if option was completely handled by
- * sending a reply about inconsistent lengths, or 1 on success.
+ * context is available.  @len is the length of @ns, including the 'base:'
+ * prefix.
   */
-static int nbd_meta_base_query(NBDClient 

Re: [PATCH v2 07/20] block/block-copy: add ratelimit to block-copy

2020-09-25 Thread Vladimir Sementsov-Ogievskiy

22.07.2020 14:05, Max Reitz wrote:

On 01.06.20 20:11, Vladimir Sementsov-Ogievskiy wrote:

We are going to directly use one async block-copy operation for the backup
job, so we need a rate limitator.


%s/limitator/limiter/g, I think.


We want to maintain the current backup behavior: only background copying is
limited, and copy-before-write operations only participate in the limit
calculation. Therefore we need one rate limitator for the block-copy state
and a boolean flag in the block-copy call state for actual limitation.

Note that we can't just account each chunk in the limitator after
successful copying: that would not stop us from starting a lot of async
sub-requests, which would exceed the limit by far too much. Instead, let's
use the following scheme on sub-request creation:
1. If the limit is not exceeded at the moment, create the request and
account it immediately.
2. If the limit is already exceeded at the moment, drop the created
sub-request and handle the limit instead (by sleeping).
With this approach we'll never exceed the limit by more than one
sub-request (which pretty much matches the current backup behavior).


Sounds reasonable.
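The scheme can be simulated deterministically. The following Python sketch is illustrative only. RateLimit, FakeClock and copy_chunks() are hypothetical, the limiter is a simplified slice-based one rather than a port of QEMU's ratelimit code, and a refused chunk is simply retried after sleeping. It shows that each slice overshoots the quota by at most one sub-request:

```python
class FakeClock:
    """Deterministic nanosecond clock so the sketch can be tested."""

    def __init__(self):
        self.now = 0

    def __call__(self):
        return self.now

    def sleep(self, ns):
        self.now += ns


class RateLimit:
    """Simplified slice-based limiter (hypothetical; loosely modelled on
    the idea behind ratelimit_calculate_delay(), not a port of it)."""

    def __init__(self, quota_per_slice, slice_ns, clock):
        self.quota = quota_per_slice
        self.slice_ns = slice_ns
        self.clock = clock
        self.slice_start = clock()
        self.dispatched = 0

    def calculate_delay(self, n):
        """Account @n bytes and return how long to wait (0 = go ahead)."""
        now = self.clock()
        if now >= self.slice_start + self.slice_ns:
            self.slice_start = now  # previous slice is over: start afresh
            self.dispatched = 0
        self.dispatched += n
        if self.dispatched < self.quota:
            return 0
        return self.slice_start + self.slice_ns - now


def copy_chunks(chunks, limit, clock):
    """Create sub-requests following steps 1 and 2 of the scheme above."""
    created = []
    pending = list(chunks)
    while pending:
        ns = limit.calculate_delay(0)  # check only, account nothing
        if ns > 0:
            clock.sleep(ns)  # step 2: limit exceeded, sleep instead
            continue
        size = pending.pop(0)
        limit.calculate_delay(size)  # step 1: account at creation time
        created.append((clock(), size))
    return created


clock = FakeClock()
limit = RateLimit(quota_per_slice=100, slice_ns=1000, clock=clock)
print(copy_chunks([64] * 5, limit, clock))
# [(0, 64), (0, 64), (1000, 64), (1000, 64), (2000, 64)]
```

In each slice, the second 64-byte request starts while the quota still has room. That request pushes the slice to 128 bytes, which is the single sub-request of overshoot mentioned above.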


Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block-copy.h |  8 +++
  block/block-copy.c | 44 ++
  2 files changed, 52 insertions(+)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 600984c733..d40e691123 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -59,6 +59,14 @@ BlockCopyCallState *block_copy_async(BlockCopyState *s,
   int64_t max_chunk,
   BlockCopyAsyncCallbackFunc cb);
  
+/*
+ * Set speed limit for block-copy instance. All block-copy operations related to
+ * this BlockCopyState will participate in speed calculation, but only
+ * block_copy_async calls with @ratelimit=true will be actually limited.
+ */
+void block_copy_set_speed(BlockCopyState *s, BlockCopyCallState *call_state,
+                          uint64_t speed);
+
  BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s);
  void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip);
  
diff --git a/block/block-copy.c b/block/block-copy.c
index 4114d1fd25..851d9c8aaf 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -26,6 +26,7 @@
 #define BLOCK_COPY_MAX_BUFFER (1 * MiB)
 #define BLOCK_COPY_MAX_MEM (128 * MiB)
 #define BLOCK_COPY_MAX_WORKERS 64
+#define BLOCK_COPY_SLICE_TIME 1ULL /* ns */
 
 static coroutine_fn int block_copy_task_entry(AioTask *task);
 
@@ -36,11 +37,13 @@ typedef struct BlockCopyCallState {
     int64_t bytes;
     int max_workers;
     int64_t max_chunk;
+    bool ratelimit;
     BlockCopyAsyncCallbackFunc cb;
 
     /* State */
     bool failed;
     bool finished;
+    QemuCoSleepState *sleep_state;
 
     /* OUT parameters */
     bool error_is_read;
@@ -103,6 +106,9 @@ typedef struct BlockCopyState {
     void *progress_opaque;
 
     SharedResource *mem;
+
+    uint64_t speed;
+    RateLimit rate_limit;
 } BlockCopyState;
 
 static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
@@ -611,6 +617,21 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
         }
         task->zeroes = ret & BDRV_BLOCK_ZERO;
 
+        if (s->speed) {
+            if (call_state->ratelimit) {
+                uint64_t ns = ratelimit_calculate_delay(&s->rate_limit, 0);
+                if (ns > 0) {
+                    block_copy_task_end(task, -EAGAIN);
+                    g_free(task);
+                    qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, ns,
+                                              &call_state->sleep_state);
+                    continue;
+                }
+            }
+
+            ratelimit_calculate_delay(&s->rate_limit, task->bytes);
+        }
+

Looks good.


         trace_block_copy_process(s, task->offset);
 
         co_get_from_shres(s->mem, task->bytes);
@@ -649,6 +670,13 @@ out:
     return ret < 0 ? ret : found_dirty;
 }
 
+static void block_copy_kick(BlockCopyCallState *call_state)
+{
+    if (call_state->sleep_state) {
+        qemu_co_sleep_wake(call_state->sleep_state);
+    }
+}
+
 /*
  * block_copy_common
  *
@@ -729,6 +757,7 @@ BlockCopyCallState *block_copy_async(BlockCopyState *s,
         .s = s,
         .offset = offset,
         .bytes = bytes,
+        .ratelimit = ratelimit,

Hm, same problem/question as in patch 6: Should the @ratelimit parameter
really be added in patch 5 if it’s used only now?


         .cb = cb,
         .max_workers = max_workers ?: BLOCK_COPY_MAX_WORKERS,
         .max_chunk = max_chunk,
@@ -752,3 +781,18 @@ void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip)
 {
     s->skip_unallocated = skip;
 }
+
+void block_copy_set_speed(BlockCopyState *s, BlockCopyCallState *call_state,
+                          uint64_t speed

Re: [PATCH v6 14/15] scripts/simplebench: improve ascii table: add difference line

2020-09-25 Thread Vladimir Sementsov-Ogievskiy

25.09.2020 13:24, Max Reitz wrote:

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:

Performance improvements / degradations are usually discussed in
percentages. Let's make the script calculate them for us.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  scripts/simplebench/simplebench.py | 46 +++---
  1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
index 56d3a91ea2..0ff05a38b8 100644
--- a/scripts/simplebench/simplebench.py
+++ b/scripts/simplebench/simplebench.py


[...]


+            for j in range(0, i):
+                env_j = results['envs'][j]
+                res_j = case_results[env_j['id']]
+
+                if 'average' not in res_j:
+                    # Failed result
+                    cell += ' --'
+                    continue
+
+                col_j = chr(ord('A') + j)
+                avg_j = res_j['average']
+                delta = (res['average'] - avg_j) / avg_j * 100


I was wondering why you’d subtract, when percentage differences usually
mean a quotient.  Then I realized that this would usually be written as:

(res['average'] / avg_j - 1) * 100


+                delta_delta = (res['delta'] + res_j['delta']) / avg_j * 100


Why not use the new format_percent for both cases?


Because I want less precision here.




+                cell += f' {col_j}{round(delta):+}±{round(delta_delta)}%'


I don’t know what I should think about ±delta_delta.  If I saw “Compared
to run A, this is +42.1%±2.0%”, I would think that you calculated the
difference between each run result, and then based on that array
calculated average and standard deviation.

Furthermore, I don’t even know what the delta_delta is supposed to tell
you.  It isn’t even a delta_delta, it’s an average_delta.


Not an average, but the sum of the errors. It shows the error of the delta.


The delta_delta would be (res['delta'] / res_j['delta'] - 1) * 100.0.


And this shows nothing.

Assume we have A = 10+-2 and B = 15+-2.

The difference is (15-10)+-(2+2) = 5+-4.
And your formula will give (2/2 - 1) * 100 = 0, which is wrong.

Anyway, my code is a mess :)
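Here is the example above, worked through in Python. percent_diff() uses the quotient form Max suggests, which is algebraically the same as the subtraction form in the patch. percent_diff_error() is the worst-case sum of the two error bounds, as argued above. percent_diff_sd() is an extra, hypothetical variant for the case where ± denotes a standard deviation of independent measurements. All three ignore the uncertainty of the reference average in the denominator, so they are approximations:

```python
import math


def percent_diff(avg, avg_ref):
    """Relative difference of @avg against @avg_ref, in percent.
    Equivalent to (avg - avg_ref) / avg_ref * 100."""
    return (avg / avg_ref - 1) * 100


def percent_diff_error(delta, delta_ref, avg_ref):
    """Worst-case error bound of avg - avg_ref (delta + delta_ref),
    expressed in percent of the reference average."""
    return (delta + delta_ref) / avg_ref * 100


def percent_diff_sd(sd, sd_ref, avg_ref):
    """If the deltas are standard deviations of independent results, the
    standard deviation of the difference is sqrt(sd^2 + sd_ref^2)."""
    return math.sqrt(sd ** 2 + sd_ref ** 2) / avg_ref * 100


# A = 10 +- 2 (reference), B = 15 +- 2
print(percent_diff(15, 10))                  # 50.0
print(percent_diff_error(2, 2, 10))          # 40.0
print(round(percent_diff_sd(2, 2, 10), 1))   # 28.3
```

So B is 50% slower than A, with a worst-case error of 40 percentage points. If ± denotes a standard deviation instead, the spread is about 28%.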


And that might be presented perhaps like “+42.1% Δ± +2.0%” (if delta
were the SD, “Δx̅=+42.1% Δσ=+2.0%” would also work; although, again, I do
interpret ± as the SD anyway).



I feel that I'm bad at statistics :( I'll learn a bit and make a new version.




Re: [PATCH v6 13/15] scripts/simplebench: improve view of ascii table

2020-09-25 Thread Vladimir Sementsov-Ogievskiy

25.09.2020 12:31, Max Reitz wrote:

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:

Introduce dynamic float precision and use percentage to show delta.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  scripts/simplebench/simplebench.py | 26 +-
  1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
index 716d7fe9b2..56d3a91ea2 100644
--- a/scripts/simplebench/simplebench.py
+++ b/scripts/simplebench/simplebench.py
@@ -79,10 +79,34 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
     return result
 
 
+def format_float(x):
+    res = round(x)
+    if res >= 100:
+        return str(res)
+
+    res = f'{x:.1f}'
+    if len(res) >= 4:
+        return res
+
+    return f'{x:.2f}'


This makes me itch to ask for some log() calculation.

Like:

'%.*f' % (math.ceil(math.log10(99.95 / x)), x)



Oh yes, that's cool.
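Here is Max's expression turned into a complete function. This is a sketch: the zero check, abs() and the max(0, ...) clamp are additions, so that zero, negative values and values of 100 or more are handled. The result has three significant digits, except that values of 1000 and above are printed without decimals and so keep more digits:

```python
import math


def format_float(x):
    """Format @x with three significant digits, without an exponent.

    99.95 is used instead of 100 so that values which would round up to
    100.0 with one decimal lose that decimal as well.
    """
    if x == 0:
        return '0'
    digits = max(0, math.ceil(math.log10(99.95 / abs(x))))
    return '%.*f' % (digits, x)


for v in (0.5, 5, 50, 99.94, 99.96, 150, 1500):
    print(format_float(v))
# 0.500, 5.00, 50.0, 99.9, 100, 150, 1500 (one per line)
```

Unlike the patch's version, values below 1 also get three significant digits (0.5 becomes 0.500 rather than 0.50).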




+def format_percent(x):
+    x *= 100
+
+    res = round(x)
+    if res >= 10:
+        return str(res)
+
+    return f'{x:.1f}' if res >= 1 else f'{x:.2f}'


Same here.  (Also, why not append a % sign in this function?)


OK




 def ascii_one(result):
     """Return ASCII representation of bench_one() returned dict."""
     if 'average' in result:
-        s = '{:.2f} +- {:.2f}'.format(result['average'], result['delta'])
+        avg = result['average']
+        delta_pr = result['delta'] / avg
+        s = f'{format_float(avg)}±{format_percent(delta_pr)}%'


Pre-existing, but isn’t the ± range generally assumed to be the standard
deviation?



Hmm. Actually, why not; let's just use the standard deviation. I wanted to show the
maximum deviation, not an average one, so as not to miss problems in the experiment (a big
deviation in a single test run). Still, the standard deviation seems good enough for that.
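To compare the two spreads on a single outlying run, here is a small sketch using Python's standard statistics module. summarize() is a hypothetical helper, not simplebench code:

```python
import statistics


def summarize(samples):
    """Return average, maximum deviation and sample standard deviation of
    benchmark run times (hypothetical helper)."""
    avg = statistics.mean(samples)
    max_dev = max(abs(s - avg) for s in samples)
    sd = statistics.stdev(samples)  # sample SD, needs at least 2 samples
    return avg, max_dev, sd


avg, max_dev, sd = summarize([10.0, 10.0, 10.0, 10.0, 14.0])
print(f'{avg:.2f} max-dev {max_dev:.2f} sd {sd:.2f}')
# 10.80 max-dev 3.20 sd 1.79
```

Here the maximum deviation is driven entirely by the one slow run, and the standard deviation is smaller. Both still make the outlier visible, and the standard deviation is the conventional meaning of ±.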




         if 'n-failed' in result:
             s += '\n({} failed)'.format(result['n-failed'])
         return s










Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver

2020-09-25 Thread Vladimir Sementsov-Ogievskiy

25.09.2020 18:11, Vladimir Sementsov-Ogievskiy wrote:

25.09.2020 12:11, Max Reitz wrote:

On 25.09.20 10:49, Vladimir Sementsov-Ogievskiy wrote:

25.09.2020 11:26, Max Reitz wrote:

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
   tests/qemu-iotests/298 | 186 +
   tests/qemu-iotests/298.out |   5 +
   tests/qemu-iotests/group   |   1 +
   3 files changed, 192 insertions(+)
   create mode 100644 tests/qemu-iotests/298
   create mode 100644 tests/qemu-iotests/298.out


[...]


+class TestTruncate(iotests.QMPTestCase):


The same decorator could be placed here, although this class doesn’t
start a VM, and so is unaffected by the allowlist.  Still may be
relevant in case of block modules, I don’t know.


Or just a global test skip at the top of the file.


Hm.  Like verify_quorum()?  Is there a generic function for that already?

[...]


+        # Probably we'll want preallocate filter to keep align to cluster when
+        # shrink preallocation, so, ignore small difference
+        self.assertLess(abs(stat.st_size - refstat.st_size), 64 * 1024)
+
+        # Preallocate filter may leak some internal clusters (for example, if
+        # guest write far over EOF, skipping some clusters - they will remain
+        # fallocated, preallocate filter don't care about such leaks, it drops
+        # only trailing preallocation.


True, but that isn’t what’s happening here.  (We only write 10M at 0, so
there are no gaps.)  Why do we need this 1M margin?


We write 10M, but qcow2 also writes metadata as it sees fit.


Ah, yes, sure.  Shouldn’t result in 1M, but why not.


+        self.assertLess(abs(stat.st_blocks - refstat.st_blocks) * 512,
+                        1024 * 1024)


[...]


diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index ff59cfd2d4..15d5f9619b 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -307,6 +307,7 @@
   295 rw
   296 rw
   297 meta
+298 auto quick


I wouldn’t mark it as quick, there is at least one preallocate=full of
140M, and one of 40M, plus multiple 10M data writes and falloc
preallocations.

Also, since you mark it as “auto”, have you run this test on all
CI-relevant hosts?  (Among other things I can’t predict) I wonder how
preallocation behaves on macOS.  Just because that one was always a bit
weird about not-really-data areas.



Of course, I didn't run it on all hosts. I'm a bit out of the loop on this.


Sorry, I see now that it sounds rude.



Well, someone has to do it.  The background story is that tests are
added to auto all the time (because “why not”), and then they fail on
BSD or macOS.  We have BSD docker test build targets at least, so they
can be easily tested.  (Well, it takes like half an hour, but you know.)

(We don’t have macOS builds, as far as I can tell, but I personally
don’t even know why we run the iotests on macOS at all.  (Well, I also
wonder about the BSDs, but given the test build targets, I shouldn’t
complain, I suppose.))

(Though macOS isn’t part of the gating CI, is it?  I seem to remember
macOS errors are generally only reported to me half a week after the
pull request is merged, which is even worse.)

Anyway.  I just ran the test for OpenBSD
(EXTRA_CONFIGURE_OPTS='--target-list=x86_64-softmmu' \
    make vm-build-openbsd)


Oh, I didn't know it was that simple. What other things do you run before
sending a PULL?


and got some failures:

--- /home/qemu/qemu-test.PGo2ls/src/tests/qemu-iotests/298.out  Fri Sep 25 07:10:31 2020
+++ /home/qemu/qemu-test.PGo2ls/build/tests/qemu-iotests/298.out.bad  Fri Sep 25 08:57:56 2020
@@ -1,5 +1,67 @@
-.
+qemu-io: Failed to resize underlying file: Unsupported preallocation mode: falloc


[..]


-OK
+FAILED (failures=6)


If I don't put the new test in "auto", is there any chance that it would
be run automatically somewhere?


I run all tests before pull requests at least.



OK, so it doesn't work on BSD at all. I don't really want to investigate, but it seems to be
because fallocate is missing there. So let's drop the "auto" group.
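In plain unittest terms, a "skip when the host lacks fallocate" guard could look like the sketch below. HAVE_FALLOCATE and TestFallocate are hypothetical. os.posix_fallocate is only a rough proxy for the host capability: QEMU's own falloc support appears to be decided when QEMU is configured, so the two checks could disagree:

```python
import os
import tempfile
import unittest

# Assumption: Python's os.posix_fallocate (absent on some Unix hosts) is
# used as a rough proxy for host fallocate support; QEMU decides its own
# falloc support at configure time, so the two may disagree.
HAVE_FALLOCATE = hasattr(os, 'posix_fallocate')


@unittest.skipUnless(HAVE_FALLOCATE, 'posix_fallocate() is not available')
class TestFallocate(unittest.TestCase):
    """Hypothetical test class; all of its tests are skipped as a whole."""

    def test_fallocate_grows_file(self):
        size = 1024 * 1024
        with tempfile.TemporaryFile() as f:
            os.posix_fallocate(f.fileno(), 0, size)
            self.assertEqual(os.fstat(f.fileno()).st_size, size)
```

A class-level decorator like this reports the tests as skipped rather than failed, which is the behavior you would want on BSD or macOS.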

Another thing: maybe add an auto-linux test group that runs only in Linux builds? Then new
tests would be added to it "because why not", and we wouldn't have to bother with BSD and
macOS?




Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver

2020-09-25 Thread Vladimir Sementsov-Ogievskiy

25.09.2020 12:11, Max Reitz wrote:

On 25.09.20 10:49, Vladimir Sementsov-Ogievskiy wrote:

25.09.2020 11:26, Max Reitz wrote:

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
   tests/qemu-iotests/298 | 186 +
   tests/qemu-iotests/298.out |   5 +
   tests/qemu-iotests/group   |   1 +
   3 files changed, 192 insertions(+)
   create mode 100644 tests/qemu-iotests/298
   create mode 100644 tests/qemu-iotests/298.out


[...]


+class TestTruncate(iotests.QMPTestCase):


The same decorator could be placed here, although this class doesn’t
start a VM, and so is unaffected by the allowlist.  Still may be
relevant in case of block modules, I don’t know.


Or just a global test skip at the top of the file.


Hm.  Like verify_quorum()?  Is there a generic function for that already?

[...]


+        # Probably we'll want preallocate filter to keep align to cluster when
+        # shrink preallocation, so, ignore small difference
+        self.assertLess(abs(stat.st_size - refstat.st_size), 64 * 1024)
+
+        # Preallocate filter may leak some internal clusters (for example, if
+        # guest write far over EOF, skipping some clusters - they will remain
+        # fallocated, preallocate filter don't care about such leaks, it drops
+        # only trailing preallocation.


True, but that isn’t what’s happening here.  (We only write 10M at 0, so
there are no gaps.)  Why do we need this 1M margin?


We write 10M, but qcow2 also writes metadata as it sees fit.


Ah, yes, sure.  Shouldn’t result in 1M, but why not.


+        self.assertLess(abs(stat.st_blocks - refstat.st_blocks) * 512,
+                        1024 * 1024)


[...]


diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index ff59cfd2d4..15d5f9619b 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -307,6 +307,7 @@
   295 rw
   296 rw
   297 meta
+298 auto quick


I wouldn’t mark it as quick, there is at least one preallocate=full of
140M, and one of 40M, plus multiple 10M data writes and falloc
preallocations.

Also, since you mark it as “auto”, have you run this test on all
CI-relevant hosts?  (Among other things I can’t predict) I wonder how
preallocation behaves on macOS.  Just because that one was always a bit
weird about not-really-data areas.



Of course, I didn't run it on all hosts. I'm a bit out of the loop on this.


Well, someone has to do it.  The background story is that tests are
added to auto all the time (because “why not”), and then they fail on
BSD or macOS.  We have BSD docker test build targets at least, so they
can be easily tested.  (Well, it takes like half an hour, but you know.)

(We don’t have macOS builds, as far as I can tell, but I personally
don’t even know why we run the iotests on macOS at all.  (Well, I also
wonder about the BSDs, but given the test build targets, I shouldn’t
complain, I suppose.))

(Though macOS isn’t part of the gating CI, is it?  I seem to remember
macOS errors are generally only reported to me half a week after the
pull request is merged, which is even worse.)

Anyway.  I just ran the test for OpenBSD
(EXTRA_CONFIGURE_OPTS='--target-list=x86_64-softmmu' \
make vm-build-openbsd)


Oh, I didn't know it was that simple. What other things do you run before
sending a PULL?


and got some failures:

--- /home/qemu/qemu-test.PGo2ls/src/tests/qemu-iotests/298.out  Fri Sep 25 07:10:31 2020
+++ /home/qemu/qemu-test.PGo2ls/build/tests/qemu-iotests/298.out.bad  Fri Sep 25 08:57:56 2020
@@ -1,5 +1,67 @@
-.
+qemu-io: Failed to resize underlying file: Unsupported preallocation mode: falloc
+qemu-io: Failed to resize underlying file: Unsupported preallocation mode: falloc
+.F...F...
+==
+FAIL: test_external_snapshot (__main__.TestPreallocateFilter)
  --
+Traceback (most recent call last):
+  File "298", line 81, in test_external_snapshot
+    self.test_prealloc()
+  File "298", line 78, in test_prealloc
+    self.check_big()
+  File "298", line 48, in check_big
+    self.assertTrue(os.path.getsize(disk) > 100 * MiB)
+AssertionError: False is not true
+
+==
+FAIL: test_prealloc (__main__.TestPreallocateFilter)
+--
+Traceback (most recent call last):
+  File "298", line 78, in test_prealloc
+    self.check_big()
+  File "298", line 48, in check_big
+    self.assertTrue(os.path.getsize(disk) > 100 * MiB)
+AssertionError: False is not true
+
+==
+FAIL: test_reopen_opts (__main__.TestPreallocateFilter)
+--
+Tr

Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver

2020-09-25 Thread Vladimir Sementsov-Ogievskiy

25.09.2020 11:26, Max Reitz wrote:

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/298 | 186 +
  tests/qemu-iotests/298.out |   5 +
  tests/qemu-iotests/group   |   1 +
  3 files changed, 192 insertions(+)
  create mode 100644 tests/qemu-iotests/298
  create mode 100644 tests/qemu-iotests/298.out

diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
new file mode 100644
index 00..fef10f6a7a
--- /dev/null
+++ b/tests/qemu-iotests/298


[...]


+class TestPreallocateBase(iotests.QMPTestCase):


Perhaps a

@iotests.skip_if_unsupported(['preallocate'])

here?


+    def setUp(self):
+        iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))


[...]


+class TestTruncate(iotests.QMPTestCase):


The same decorator could be placed here, although this class doesn’t
start a VM, and so is unaffected by the allowlist.  Still may be
relevant in case of block modules, I don’t know.


Or just a global test skip at the top of the file.




+    def setUp(self):
+        iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))
+        iotests.qemu_img_create('-f', iotests.imgfmt, refdisk, str(10 * MiB))
+
+    def tearDown(self):
+        os.remove(disk)
+        os.remove(refdisk)
+
+    def do_test(self, prealloc_mode, new_size):
+        ret = iotests.qemu_io_silent('--image-opts', '-c', 'write 0 10M', '-c',
+                                     f'truncate -m {prealloc_mode} {new_size}',
+                                     drive_opts)
+        self.assertEqual(ret, 0)
+
+        ret = iotests.qemu_io_silent('-f', iotests.imgfmt, '-c', 'write 0 10M',
+                                     '-c',
+                                     f'truncate -m {prealloc_mode} {new_size}',
+                                     refdisk)
+        self.assertEqual(ret, 0)
+
+        stat = os.stat(disk)
+        refstat = os.stat(refdisk)
+
+        # Probably we'll want preallocate filter to keep align to cluster when
+        # shrink preallocation, so, ignore small difference
+        self.assertLess(abs(stat.st_size - refstat.st_size), 64 * 1024)
+
+        # Preallocate filter may leak some internal clusters (for example, if
+        # guest write far over EOF, skipping some clusters - they will remain
+        # fallocated, preallocate filter don't care about such leaks, it drops
+        # only trailing preallocation.


True, but that isn’t what’s happening here.  (We only write 10M at 0, so
there are no gaps.)  Why do we need this 1M margin?


We write 10M, but qcow2 also writes metadata as it sees fit.




+        self.assertLess(abs(stat.st_blocks - refstat.st_blocks) * 512,
+                        1024 * 1024)


[...]


diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index ff59cfd2d4..15d5f9619b 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -307,6 +307,7 @@
  295 rw
  296 rw
  297 meta
+298 auto quick


I wouldn’t mark it as quick, there is at least one preallocate=full of
140M, and one of 40M, plus multiple 10M data writes and falloc
preallocations.

Also, since you mark it as “auto”, have you run this test on all
CI-relevant hosts?  (Among other things I can’t predict) I wonder how
preallocation behaves on macOS.  Just because that one was always a bit
weird about not-really-data areas.



Of course, I didn't run it on all hosts. I'm a bit out of the loop on this. If I don't put
the new test in "auto", is there any chance that it would be run automatically
somewhere?





Re: [PATCH v7 0/5] fix & merge block_status_above and is_allocated_above

2020-09-25 Thread Vladimir Sementsov-Ogievskiy

It's all because of the underlying "[PATCH v9 0/7] coroutines: generate wrapper code" series;
I've answered in the "[PATCH v9 0/7] coroutines: generate wrapper code" thread.

25.09.2020 00:45, no-re...@patchew.org wrote:

Patchew URL: 
https://patchew.org/QEMU/20200924194003.22080-1-vsement...@virtuozzo.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

C linker for the host machine: cc ld.bfd 2.27-43
Host machine cpu family: x86_64
Host machine cpu: x86_64
../src/meson.build:10: WARNING: Module unstable-keyval has no backwards or forwards compatibility and might not exist in future releases.
Program sh found: YES
Program python3 found: YES (/usr/bin/python3)
Configuring ninjatool using configuration
---
 return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 11406: ordinal not in range(128)
Generating 'libqemu-aarch64-softmmu.fa.p/decode-neon-shared.c.inc'.
make: *** [block/block-gen.c.stamp] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 709, in <module>
---
 raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--rm', 
'--label', 'com.qemu.instance.uuid=528b329e049d459c994676e3ba6dc69a', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '-e', 'TARGET_LIST=', '-e', 
'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-e72243g9/src/docker-src.2020-09-24-17.42.11.20907:/var/tmp/qemu:z,ro',
 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=528b329e049d459c994676e3ba6dc69a
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-e72243g9/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    3m12.373s
user    0m16.084s


The full log is available at
http://patchew.org/logs/20200924194003.22080-1-vsement...@virtuozzo.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
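The UnicodeDecodeError in the log above is what Python 3 raises when a file containing non-ASCII bytes is read with the ASCII codec. Byte 0xe2 is the first byte of the UTF-8 encoding of ’. Besides dropping the character, as patch 0.5/7 does, a generator script could pass an explicit encoding to open(). This is likely relevant here because an older Python 3 under a C locale defaults to ASCII. Here is a minimal sketch; read_source() is a hypothetical helper, not code from the actual script:

```python
import os
import tempfile


def read_source(path):
    """Read a source file as UTF-8, independent of the host locale.

    Without an explicit encoding, open() falls back to the locale's
    preferred encoding, which is ASCII under a C/POSIX locale on
    Python 3.6 and older (they lack the PEP 538 locale coercion).
    """
    with open(path, encoding='utf-8') as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'block.h')
    with open(path, 'w', encoding='utf-8') as f:
        f.write("/* data that isn\u2019t allocated */\n")
    text = read_source(path)

print(repr(text))
```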







[PATCH 0.5/7] include/block/block.h: drop non-ascii quotation mark

2020-09-25 Thread Vladimir Sementsov-Ogievskiy
This is the only non-ASCII character in the file and it isn't really
needed here. Let's use the normal "'" symbol for consistency with the
other 11 occurrences of "'" in the file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/block/block.h b/include/block/block.h
index 8b87df69a1..ce2ac39299 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -304,7 +304,7 @@ enum BdrvChildRoleBits {
 BDRV_CHILD_FILTERED = (1 << 2),
 
 /*
- * Child from which to read all data that isn’t allocated in the
+ * Child from which to read all data that isn't allocated in the
  * parent (i.e., the backing child); such data is copied to the
  * parent through COW (and optionally COR).
  * This field is mutually exclusive with DATA, METADATA, and
-- 
2.21.3




Re: [PATCH v9 4/7] scripts: add block-coroutine-wrapper.py

2020-09-25 Thread Vladimir Sementsov-Ogievskiy

24.09.2020 21:54, Vladimir Sementsov-Ogievskiy wrote:

We have a very frequent pattern of creating a coroutine from a function
with several arguments:

   - create a structure to pack parameters
   - create _entry function to call original function taking parameters
 from struct
   - do different magic to handle completion: set ret to NOT_DONE or
 EINPROGRESS or use separate bool field
   - fill the struct and create coroutine from _entry function with this
 struct as a parameter
   - do coroutine enter and BDRV_POLL_WHILE loop
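For reference, the hand-written boilerplate listed above looks roughly like the following minimal sketch. It is self-contained, so the coroutine machinery is faked: `fake_coroutine_run()`, `co_read_at()` and `read_at()` are hypothetical names, the "coroutine" simply runs to completion when entered, and the empty loop only marks where QEMU would poll with BDRV_POLL_WHILE(). The `NOT_DONE` sentinel mirrors the "set ret to NOT_DONE" completion trick from the list.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for coroutine create + enter: runs to completion. */
typedef void CoroutineEntry(void *opaque);

static void fake_coroutine_run(CoroutineEntry *entry, void *opaque)
{
    entry(opaque);
}

#define NOT_DONE 0x7fffffff /* sentinel: "coroutine still running" */

/* The coroutine_fn we want to call from non-coroutine context. */
static int co_read_at(int64_t offset, int64_t bytes)
{
    if (offset < 0 || bytes < 0) {
        return -EINVAL;
    }
    return 0;
}

/* Step 1: a structure to pack the parameters (and the result). */
typedef struct ReadAtCo {
    int64_t offset;
    int64_t bytes;
    int ret;
} ReadAtCo;

/* Step 2: an _entry function that unpacks them and calls the original. */
static void read_at_entry(void *opaque)
{
    ReadAtCo *s = opaque;
    s->ret = co_read_at(s->offset, s->bytes);
}

/* Remaining steps: fill the struct, mark it NOT_DONE, run the coroutine
 * and wait until the entry function has stored a real return value. */
static int read_at(int64_t offset, int64_t bytes)
{
    ReadAtCo s = { .offset = offset, .bytes = bytes, .ret = NOT_DONE };

    fake_coroutine_run(read_at_entry, &s);
    while (s.ret == NOT_DONE) {
        /* QEMU would run BDRV_POLL_WHILE() here; nothing to poll in this stub. */
    }
    return s.ret;
}
```

Multiplied over every function that needs a synchronous interface, this is the duplication the generator removes.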

Let's reduce code duplication by generating coroutine wrappers.

This patch adds scripts/block-coroutine-wrapper.py together with some
friends, which will generate functions with declared prototypes marked
by the 'generated_co_wrapper' specifier.

The usage of new code generation is as follows:

 1. define the coroutine function somewhere

 int coroutine_fn bdrv_co_NAME(...) {...}

 2. declare in some header file

 int generated_co_wrapper bdrv_NAME(...);

with the same list of parameters (generated_co_wrapper is
defined in "include/block/block.h").

 3. Make sure the block_gen_c declaration in block/meson.build
mentions the file with your marker function.

No function is marked yet; that work is left for the following
commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  docs/devel/block-coroutine-wrapper.rst |  54 +++
  docs/devel/index.rst   |   1 +
  block/block-gen.h  |  49 +++
  include/block/block.h  |  10 ++
  block/meson.build  |   8 ++
  scripts/block-coroutine-wrapper.py | 188 +
  6 files changed, 310 insertions(+)
  create mode 100644 docs/devel/block-coroutine-wrapper.rst
  create mode 100644 block/block-gen.h
  create mode 100644 scripts/block-coroutine-wrapper.py

diff --git a/docs/devel/block-coroutine-wrapper.rst 
b/docs/devel/block-coroutine-wrapper.rst
new file mode 100644
index 0000000000..d09fff2cc5
--- /dev/null
+++ b/docs/devel/block-coroutine-wrapper.rst
@@ -0,0 +1,54 @@
+=======================
+block-coroutine-wrapper
+=======================
+
+A lot of functions in the QEMU block layer (see ``block/*``) can only be
+called in coroutine context. Such functions are normally marked by the
+coroutine_fn specifier. Still, sometimes we need to call them from
+non-coroutine context; for this we need to start a coroutine, run the
+needed function from it and wait for the coroutine to finish in a
+BDRV_POLL_WHILE() loop. To run a coroutine we need a function with one
+void* argument. So for each coroutine_fn function which needs a
+non-coroutine interface, we should define a structure to pack the
+parameters, define a separate function to unpack the parameters and
+call the original function, and finally define a new interface function
+with the same list of arguments as the original one, which will pack the
+parameters into a struct, create a coroutine, run it and wait in a
+BDRV_POLL_WHILE() loop. It's boring to create such wrappers by hand,
+so we have a script to generate them.
+
+Usage
+=====
+
+Assume we have defined the ``coroutine_fn`` function
+``bdrv_co_foo()`` and need a non-coroutine interface for it,
+called ``bdrv_foo()``. In this case the script can help. To
+trigger the generation:
+
+1. You need a ``bdrv_foo`` declaration somewhere (for example, in
+   ``block/coroutines.h``) with the ``generated_co_wrapper`` mark,
+   like this:
+
+.. code-block:: c
+
+    int generated_co_wrapper bdrv_foo();
+
+2. You need to feed this declaration to the block-coroutine-wrapper script.
+   For this, add the .h (or .c) file with the declaration to the
+   ``input: files(...)`` list of the ``block_gen_c`` target declaration in
+   ``block/meson.build``.
+
+You are done. During the build, coroutine wrappers will be generated in
+``/block/block-gen.c``.
+
+Links
+=====
+
+1. The script location is ``scripts/block-coroutine-wrapper.py``.
+
+2. Generic place for private ``generated_co_wrapper`` declarations is
+   ``block/coroutines.h``, for public declarations:
+   ``include/block/block.h``
+
+3. The core API of generated coroutine wrappers is placed in
+   (not generated) ``block/block-gen.h``
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 04773ce076..cb0abe1e69 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -31,3 +31,4 @@ Contents:
 reset
 s390-dasd-ipl
 clocks
+   block-coroutine-wrapper
diff --git a/block/block-gen.h b/block/block-gen.h
new file mode 100644
index 0000000000..f80cf4897d
--- /dev/null
+++ b/block/block-gen.h
@@ -0,0 +1,49 @@
+/*
+ * Block coroutine wrapping core, used by auto-generated block/block-gen.c
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ * Copyright (c) 2020 Virtuozzo International GmbH
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Softw

Re: [PATCH v9 0/7] coroutines: generate wrapper code

2020-09-25 Thread Vladimir Sementsov-Ogievskiy

24.09.2020 23:32, no-re...@patchew.org wrote:

Patchew URL: 
https://patchew.org/QEMU/20200924185414.28642-1-vsement...@virtuozzo.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

C linker for the host machine: cc ld.bfd 2.27-43
Host machine cpu family: x86_64
Host machine cpu: x86_64
../src/meson.build:10: WARNING: Module unstable-keyval has no backwards or 
forwards compatibility and might not exist in future releases.
Program sh found: YES
Program python3 found: YES (/usr/bin/python3)
Configuring ninjatool using configuration
---
 return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 11406: 
ordinal not in range(128)
Generating 'libqemu-aarch64-softmmu.fa.p/decode-vfp-uncond.c.inc'.
make: *** [block/block-gen.c.stamp] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
   File "./tests/docker/docker.py", line 709, in 
---
 raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--rm', 
'--label', 'com.qemu.instance.uuid=ea922a0a6ce34dfc90f59bf1b059b9d5', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '-e', 'TARGET_LIST=', '-e', 
'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-rctf_xsm/src/docker-src.2020-09-24-16.29.53.14429:/var/tmp/qemu:z,ro',
 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=ea922a0a6ce34dfc90f59bf1b059b9d5
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-rctf_xsm/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    3m4.047s
user    0m21.648s


The full log is available at
http://patchew.org/logs/20200924185414.28642-1-vsement...@virtuozzo.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com



Generating 'libqemu-aarch64-softmmu.fa.p/decode-vfp.c.inc'.
Traceback (most recent call last):
  File "/tmp/qemu-test/src/block/../scripts/block-coroutine-wrapper.py", line 187, in 

f_out.write(gen_wrappers(f_in.read()))
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 11406: 
ordinal not in range(128)


Interesting:

[root@kvm up-coroutine-wrapper]# grep --color='auto' -P -n '[^\x00-\x7F]' 
include/block/block.h
307: * Child from which to read all data that isn’t allocated in the
 ^

The file really does contain one non-ASCII symbol. I think it's worth a
separate patch. Still, it shouldn't break the build process. On my system
it works as is, probably because UTF-8 is my default encoding.
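The offending byte is easy to confirm by hand: U+2019 (RIGHT SINGLE QUOTATION MARK) is encoded in UTF-8 as E2 80 99, which is why Python's ascii codec reports byte 0xe2. A minimal C sketch of the same check that `grep -P '[^\x00-\x7F]'` performs; `first_non_ascii()` is a hypothetical helper, not part of QEMU:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the byte offset of the first non-ASCII byte in buf, or -1 if
 * every byte is in the 0x00-0x7F range.
 */
static long first_non_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 0x7F) {
            return (long)i;
        }
    }
    return -1;
}
```

Run over "isn’t", it stops at offset 3, the 0xe2 lead byte of the curly quote; over "isn't" it returns -1.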

Aha, from the "open" documentation:

   if encoding is not specified the encoding used is platform dependent: 
locale.getpreferredencoding(False) is called to get the current locale encoding.



Is it OK that UTF-8 is not the default on the test system?

So, possible solutions are:

1. Enforce UTF-8 I/O in scripts/block-coroutine-wrapper.py (patch 4)
2. Drop non-ascii quotation mark from block.h
3. Fix the test system default to be utf-8

Do we want them all?

--
Best regards,
Vladimir



[PATCH v7 4/5] block/io: fix bdrv_is_allocated_above

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
bdrv_is_allocated_above handles short backing files incorrectly: it
reports after-EOF space as UNALLOCATED, which is wrong, because on read
the data is generated at the level of the short backing file (if all
overlays have an unallocated area at that place).

Reusing bdrv_common_block_status_above fixes the issue and unifies the
code paths.
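To make the bug concrete, here is a toy model (hypothetical types and names, not QEMU code) in which each layer has a length in clusters and a per-cluster allocation flag, index 0 is the top image, and the whole chain down to and including the base is in range. A cluster beyond a lower layer's EOF is read as zeros produced by that layer, so it must count as allocated instead of letting the search continue further down:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLUSTERS 8

/* Toy layer: length in clusters plus a per-cluster "allocated" flag. */
typedef struct Layer {
    int length;
    bool allocated[MAX_CLUSTERS];
} Layer;

/*
 * Is cluster c allocated somewhere in layers[0..n-1] (top to base)?
 * A layer that is shorter than c produces zeros itself, so the search
 * stops there and reports "allocated".
 */
static bool is_allocated_above(const Layer *layers, int n, int c)
{
    assert(c < layers[0].length);    /* the request must lie inside top */
    for (int i = 0; i < n; i++) {
        if (c >= layers[i].length || layers[i].allocated[c]) {
            return true;
        }
    }
    return false;
}
```

With a 4-cluster top, a 2-cluster mid and a 4-cluster base, cluster 3 is unallocated in top but lies past mid's EOF, so it is reported allocated; per the description above, the old code reported such space as UNALLOCATED.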

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
---
 block/io.c | 43 +--
 1 file changed, 5 insertions(+), 38 deletions(-)

diff --git a/block/io.c b/block/io.c
index 82a3afa3dc..36baf4fed4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2477,52 +2477,19 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState 
*bs, int64_t offset,
  * at 'offset + *pnum' may return the same allocation status (in other
  * words, the result is not necessarily the maximum possible range);
  * but 'pnum' will only be 0 when end of file is reached.
- *
  */
 int bdrv_is_allocated_above(BlockDriverState *top,
 BlockDriverState *base,
 bool include_base, int64_t offset,
 int64_t bytes, int64_t *pnum)
 {
-BlockDriverState *intermediate;
-int ret;
-int64_t n = bytes;
-
-assert(base || !include_base);
-
-intermediate = top;
-while (include_base || intermediate != base) {
-int64_t pnum_inter;
-int64_t size_inter;
-
-assert(intermediate);
-ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
-if (ret < 0) {
-return ret;
-}
-if (ret) {
-*pnum = pnum_inter;
-return 1;
-}
-
-size_inter = bdrv_getlength(intermediate);
-if (size_inter < 0) {
-return size_inter;
-}
-if (n > pnum_inter &&
-(intermediate == top || offset + pnum_inter < size_inter)) {
-n = pnum_inter;
-}
-
-if (intermediate == base) {
-break;
-}
-
-intermediate = bdrv_filter_or_cow_bs(intermediate);
+int ret = bdrv_common_block_status_above(top, base, include_base, false,
+ offset, bytes, pnum, NULL, NULL);
+if (ret < 0) {
+return ret;
 }
 
-*pnum = n;
-return 0;
+return !!(ret & BDRV_BLOCK_ALLOCATED);
 }
 
 int coroutine_fn
-- 
2.21.3




[PATCH v7 5/5] iotests: add commit top->base cases to 274

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
These cases are fixed by previous patches around block_status and
is_allocated.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
---
 tests/qemu-iotests/274 | 20 +++
 tests/qemu-iotests/274.out | 68 ++
 2 files changed, 88 insertions(+)

diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
index d4571c5465..76b1ba6a52 100755
--- a/tests/qemu-iotests/274
+++ b/tests/qemu-iotests/274
@@ -115,6 +115,26 @@ with iotests.FilePath('base') as base, \
 iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
 iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
 
+iotests.log('=== Testing qemu-img commit (top -> base) ===')
+
+create_chain()
+iotests.qemu_img_log('commit', '-b', base, top)
+iotests.img_info_log(base)
+iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
+iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
+
+iotests.log('=== Testing QMP active commit (top -> base) ===')
+
+create_chain()
+with create_vm() as vm:
+vm.launch()
+vm.qmp_log('block-commit', device='top', base_node='base',
+   job_id='job0', auto_dismiss=False)
+vm.run_job('job0', wait=5)
+
+iotests.img_info_log(mid)
+iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
+iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
 
 iotests.log('== Resize tests ==')
 
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
index bf5abd4c10..cfe17a8659 100644
--- a/tests/qemu-iotests/274.out
+++ b/tests/qemu-iotests/274.out
@@ -135,6 +135,74 @@ read 1048576/1048576 bytes at offset 0
 read 1048576/1048576 bytes at offset 1048576
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+=== Testing qemu-img commit (top -> base) ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base 
backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid 
backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Image committed.
+
+image: TEST_IMG
+file format: IMGFMT
+virtual size: 2 MiB (2097152 bytes)
+cluster_size: 65536
+Format specific information:
+compat: 1.1
+compression type: zlib
+lazy refcounts: false
+refcount bits: 16
+corrupt: false
+extended l2: false
+
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing QMP active commit (top -> base) ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base 
backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid 
backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": 
"base", "device": "top", "job-id": "job0"}}
+{"return": {}}
+{"execute": "job-complete", "arguments": {"id": "job0"}}
+{"return": {}}
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, 
"type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": 
"USECS", "seconds": "SECS"}}
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, 
"type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "job-dismiss", "arguments": {"id": "job0"}}
+{"return": {}}
+imag

[PATCH v7 2/5] block/io: bdrv_common_block_status_above: support include_base

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
In order to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above, let's add support for an include_base parameter.
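The effect of include_base on the backing-chain walk can be modelled with a short sketch. Layers are numbered by depth (0 is `bs`), the loop condition and the `p == base` break copy the ones in the patch, and `layers_inspected()` is a hypothetical helper; unlike the real function, it folds the separate first-layer check into the loop:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count the layers a status query inspects. base_depth is the depth of
 * "base" in the chain (0 means base == bs).
 *   include_base == false: inspect [bs, base), stopping before base.
 *   include_base == true:  inspect [bs, base], base included.
 */
static int layers_inspected(int base_depth, bool include_base)
{
    int count = 0;

    assert(base_depth >= 0);
    for (int p = 0; include_base || p != base_depth; p++) {
        count++;
        if (p == base_depth) {
            break;    /* only reachable when include_base is set */
        }
    }
    return count;
}
```

With `base_depth == 0` and include_base set, only `bs` itself is inspected, which matches how bdrv_is_allocated() is expressed in this patch (base == bs, include_base == true). With `base_depth == 0` and include_base unset, the sub-chain is empty.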

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/coroutines.h |  2 ++
 block/io.c | 21 ++---
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index f69179f5ef..1cb3128b94 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -41,6 +41,7 @@ bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int 
bytes,
 int coroutine_fn
 bdrv_co_common_block_status_above(BlockDriverState *bs,
   BlockDriverState *base,
+  bool include_base,
   bool want_zero,
   int64_t offset,
   int64_t bytes,
@@ -50,6 +51,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 int generated_co_wrapper
 bdrv_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
+   bool include_base,
bool want_zero,
int64_t offset,
int64_t bytes,
diff --git a/block/io.c b/block/io.c
index 4697e67a85..b88c7a6314 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2343,6 +2343,7 @@ early_out:
 int coroutine_fn
 bdrv_co_common_block_status_above(BlockDriverState *bs,
   BlockDriverState *base,
+  bool include_base,
   bool want_zero,
   int64_t offset,
   int64_t bytes,
@@ -2354,10 +2355,11 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 BlockDriverState *p;
 int64_t eof = 0;
 
-assert(bs != base);
+assert(include_base || bs != base);
+assert(!include_base || base); /* Can't include NULL base */
 
 ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
-if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
+if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
 return ret;
 }
 
@@ -2368,7 +2370,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 assert(*pnum <= bytes);
 bytes = *pnum;
 
-for (p = bdrv_filter_or_cow_bs(bs); p != base;
+for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
  p = bdrv_filter_or_cow_bs(p))
 {
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
@@ -2406,6 +2408,11 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 break;
 }
 
+if (p == base) {
+assert(include_base);
+break;
+}
+
 /*
  * OK, [offset, offset + *pnum) region is unallocated on this layer,
  * let's continue the diving.
@@ -2425,7 +2432,7 @@ int bdrv_block_status_above(BlockDriverState *bs, 
BlockDriverState *base,
 int64_t offset, int64_t bytes, int64_t *pnum,
 int64_t *map, BlockDriverState **file)
 {
-return bdrv_common_block_status_above(bs, base, true, offset, bytes,
+return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
   pnum, map, file);
 }
 
@@ -2442,9 +2449,9 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, 
int64_t offset,
 int ret;
 int64_t dummy;
 
-ret = bdrv_common_block_status_above(bs, bdrv_filter_or_cow_bs(bs), false,
- offset, bytes, pnum ? pnum : &dummy,
- NULL, NULL);
+ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
+ bytes, pnum ? pnum : &dummy, NULL,
+ NULL);
 if (ret < 0) {
 return ret;
 }
-- 
2.21.3




[PATCH v7 1/5] block/io: fix bdrv_co_block_status_above

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
bdrv_co_block_status_above has several design problems with handling
short backing files:

1. With want_zero=true, it may return ret with BDRV_BLOCK_ZERO but
without the BDRV_BLOCK_ALLOCATED flag, even when the short backing file
which produces these after-EOF zeros is inside the requested backing
sequence.

2. With want_zero=false, it may return pnum=0 prior to the actual EOF,
because of the EOF of a short backing file.

Fix these things, making the logic around short backing files clearer.

With bdrv_block_status_above fixed, we also have to improve is_zero in
the qcow2 code, otherwise iotest 154 fails: with this patch we stop
merging zeros of different types (zeros from regions that are
unallocated in the whole backing chain vs. zeros produced by short
backing files).
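Why is_zero needs a loop can be shown with a toy block-status function that, like the fixed code, does not merge adjacent zero extents of different origin. `Extent`, `status()` and `is_zero()` below are hypothetical stand-ins for illustration, not the qcow2 code itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy extent list standing in for block-status answers. */
typedef struct Extent {
    int64_t start, len;
    bool zero;
} Extent;

/*
 * Report the status at offset; *pnum gets the rest of the containing
 * extent, clamped to bytes. Returns 1 for zero, 0 for data, -1 if
 * offset is not covered by any extent.
 */
static int status(const Extent *ext, int n, int64_t offset, int64_t bytes,
                  int64_t *pnum)
{
    for (int i = 0; i < n; i++) {
        if (offset >= ext[i].start && offset < ext[i].start + ext[i].len) {
            int64_t avail = ext[i].start + ext[i].len - offset;
            *pnum = avail < bytes ? avail : bytes;
            return ext[i].zero;
        }
    }
    return -1;
}

/* A single call may stop at the boundary between two kinds of zeros,
 * so keep asking until the whole range is covered. */
static bool is_zero(const Extent *ext, int n, int64_t offset, int64_t bytes)
{
    while (bytes > 0) {
        int64_t pnum;
        int res = status(ext, n, offset, bytes, &pnum);

        if (res <= 0) {
            return false;    /* error or data */
        }
        offset += pnum;
        bytes -= pnum;
    }
    return true;
}
```

A single `status()` call over two adjacent zero extents only covers the first one, so a check of the form `nr == bytes` after one call would wrongly answer "not zero".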

Note also that this patch leaves for another day the general problem
around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated
vs go-to-backing.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/io.c| 68 ---
 block/qcow2.c | 16 ++--
 2 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/block/io.c b/block/io.c
index 449b99b92c..4697e67a85 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2350,34 +2350,74 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
   int64_t *map,
   BlockDriverState **file)
 {
+int ret;
 BlockDriverState *p;
-int ret = 0;
-bool first = true;
+int64_t eof = 0;
 
 assert(bs != base);
-for (p = bs; p != base; p = bdrv_filter_or_cow_bs(p)) {
+
+ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
+if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
+return ret;
+}
+
+if (ret & BDRV_BLOCK_EOF) {
+eof = offset + *pnum;
+}
+
+assert(*pnum <= bytes);
+bytes = *pnum;
+
+for (p = bdrv_filter_or_cow_bs(bs); p != base;
+ p = bdrv_filter_or_cow_bs(p))
+{
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
file);
 if (ret < 0) {
-break;
+return ret;
 }
-if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+if (*pnum == 0) {
 /*
- * Reading beyond the end of the file continues to read
- * zeroes, but we can only widen the result to the
- * unallocated length we learned from an earlier
- * iteration.
+ * The top layer deferred to this layer, and because this layer is
+ * short, any zeroes that we synthesize beyond EOF behave as if 
they
+ * were allocated at this layer.
+ *
+ * We don't include BDRV_BLOCK_EOF into ret, as upper layer may be
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
+ * below.
  */
+assert(ret & BDRV_BLOCK_EOF);
 *pnum = bytes;
+if (file) {
+*file = p;
+}
+ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
+break;
 }
-if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
+if (ret & BDRV_BLOCK_ALLOCATED) {
+/*
+ * We've found the node and the status, we must break.
+ *
+ * Drop BDRV_BLOCK_EOF, as it's not for upper layer, which may be
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
+ * below.
+ */
+ret &= ~BDRV_BLOCK_EOF;
 break;
 }
-/* [offset, pnum] unallocated on this layer, which could be only
- * the first part of [offset, bytes].  */
-bytes = MIN(bytes, *pnum);
-first = false;
+
+/*
+ * OK, [offset, offset + *pnum) region is unallocated on this layer,
+ * let's continue the diving.
+ */
+assert(*pnum <= bytes);
+bytes = *pnum;
 }
+
+if (offset + *pnum == eof) {
+ret |= BDRV_BLOCK_EOF;
+}
+
 return ret;
 }
 
diff --git a/block/qcow2.c b/block/qcow2.c
index b05512718c..a1bc16e202 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3860,8 +3860,20 @@ static bool is_zero(BlockDriverState *bs, int64_t 
offset, int64_t bytes)
 if (!bytes) {
 return true;
 }
-res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
-return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
+
+/*
+ * bdrv_block_status_above doesn't merge different types of zeros, for
+ * example, zeros which come from the region which is unallocated in
+ * the whole backing chain, and zeros which comes because of a short
+ * backing file. So, we need a loop.
+ */
+do {
+res = bdr

[PATCH v7 3/5] block/io: bdrv_common_block_status_above: support bs == base

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
We are going to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
include_base == false and still bs == base (e.g. from img_rebase()).

So, support this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
---
 block/io.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index b88c7a6314..82a3afa3dc 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2355,9 +2355,13 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 BlockDriverState *p;
 int64_t eof = 0;
 
-assert(include_base || bs != base);
 assert(!include_base || base); /* Can't include NULL base */
 
+if (!include_base && bs == base) {
+*pnum = bytes;
+return 0;
+}
+
 ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
 if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
 return ret;
-- 
2.21.3




[PATCH v7 0/5] fix & merge block_status_above and is_allocated_above

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
Hi all!

This series addresses the following problem:
block-status-above functions may consider the space after EOF of
intermediate backing files as unallocated, which is wrong: these
backing files are what produces the zeroes, and we never go further down
the backing chain past a short backing file. So, if such a short backing
file is _inside_ the requested sub-chain of the backing chain, we should
never report the space after its EOF as unallocated.

See patches 01,04,05 for details.

Note that this series leaves for another day the general problem
around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated
vs go-to-backing.
Audit for this problem is done here:
"backing chain & block status & filters"
https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg04706.html
And I'm going to prepare series to address this problem.

Also, the get_block_status function has the same problem, but it remains
unfixed here: I want to make a separate series for it.

v7:
- add Alberto's r-bs to all patches
- rebase to the new backing-chain handling: bdrv_filter_or_cow_bs is used
  instead of backing_bs


Based on series "[PATCH v9 0/7] coroutines: generate wrapper code" or
in other words:
Based-on: <20200924185414.28642-1-vsement...@virtuozzo.com>

Vladimir Sementsov-Ogievskiy (5):
  block/io: fix bdrv_co_block_status_above
  block/io: bdrv_common_block_status_above: support include_base
  block/io: bdrv_common_block_status_above: support bs == base
  block/io: fix bdrv_is_allocated_above
  iotests: add commit top->base cases to 274

 block/coroutines.h |   2 +
 block/io.c | 132 +
 block/qcow2.c  |  16 -
 tests/qemu-iotests/274 |  20 ++
 tests/qemu-iotests/274.out |  68 +++
 5 files changed, 179 insertions(+), 59 deletions(-)

-- 
2.21.3




[PATCH v9 6/7] block: drop bdrv_prwv

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
Now that we are not maintaining boilerplate code for coroutine
wrappers, there is no more sense in keeping the extra indirection layer
of bdrv_prwv().  Let's drop it and instead generate pure bdrv_preadv()
and bdrv_pwritev().

Currently, bdrv_pwritev() and bdrv_preadv() return the number of bytes on
success; the auto-generated functions will instead return zero, like their
_co_ prototypes. Still, it's simple to make the conversion safe: the
only external user of bdrv_pwritev() is test-bdrv-drain, and it is
comfortable enough with bdrv_co_pwritev() instead. So the prototypes are
moved to the local block/coroutines.h. Next, the only internal users are
bdrv_pread() and bdrv_pwrite(), which are modified to return bytes on
success.
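The bytes-on-success conversion for bdrv_pread()/bdrv_pwrite() amounts to a thin adapter like the following sketch. `fake_preadv()` and `pread_compat()` are hypothetical names standing in for the generated 0-on-success wrapper and the caller that keeps the legacy contract:

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for a generated wrapper: 0 on success, -errno on failure. */
static int fake_preadv(long long offset, int bytes)
{
    return (offset < 0 || bytes < 0) ? -EINVAL : 0;
}

/* Keep the historical "number of bytes on success" contract on top of
 * a callee that returns 0 on success. */
static int pread_compat(long long offset, int bytes)
{
    int ret = fake_preadv(offset, bytes);

    if (ret < 0) {
        return ret;
    }
    return bytes;
}
```

Keeping the conversion in the two callers means the generated wrappers can mirror their _co_ prototypes exactly.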

Of course, it would be great to convert bdrv_pread() and bdrv_pwrite()
to return 0 on success. But this requires an audit (and probably
conversion) of all their users; let's leave that refactoring for
another day.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefan Hajnoczi 
---
 block/coroutines.h  | 10 -
 include/block/block.h   |  2 --
 block/io.c  | 49 -
 tests/test-bdrv-drain.c |  2 +-
 4 files changed, 15 insertions(+), 48 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index c62b3a2697..6c63a819c9 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -31,12 +31,12 @@ int coroutine_fn bdrv_co_check(BlockDriverState *bs,
BdrvCheckResult *res, BdrvCheckMode fix);
 int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp);
 
-int coroutine_fn
-bdrv_co_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
- bool is_write, BdrvRequestFlags flags);
 int generated_co_wrapper
-bdrv_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
-  bool is_write, BdrvRequestFlags flags);
+bdrv_preadv(BdrvChild *child, int64_t offset, unsigned int bytes,
+QEMUIOVector *qiov, BdrvRequestFlags flags);
+int generated_co_wrapper
+bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags);
 
 int coroutine_fn
 bdrv_co_common_block_status_above(BlockDriverState *bs,
diff --git a/include/block/block.h b/include/block/block.h
index f2d85f2cf1..eef4cceaf0 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -383,9 +383,7 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
int bytes, BdrvRequestFlags flags);
 int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags);
 int bdrv_pread(BdrvChild *child, int64_t offset, void *buf, int bytes);
-int bdrv_preadv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov);
 int bdrv_pwrite(BdrvChild *child, int64_t offset, const void *buf, int bytes);
-int bdrv_pwritev(BdrvChild *child, int64_t offset, QEMUIOVector *qiov);
 int bdrv_pwrite_sync(BdrvChild *child, int64_t offset,
  const void *buf, int count);
 /*
diff --git a/block/io.c b/block/io.c
index c1360ba57d..cd5b689473 100644
--- a/block/io.c
+++ b/block/io.c
@@ -890,23 +890,11 @@ static int bdrv_check_byte_request(BlockDriverState *bs, 
int64_t offset,
 return 0;
 }
 
-int coroutine_fn bdrv_co_prwv(BdrvChild *child, int64_t offset,
-  QEMUIOVector *qiov, bool is_write,
-  BdrvRequestFlags flags)
-{
-if (is_write) {
-return bdrv_co_pwritev(child, offset, qiov->size, qiov, flags);
-} else {
-return bdrv_co_preadv(child, offset, qiov->size, qiov, flags);
-}
-}
-
 int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
int bytes, BdrvRequestFlags flags)
 {
-QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, bytes);
-
-return bdrv_prwv(child, offset, &qiov, true, BDRV_REQ_ZERO_WRITE | flags);
+return bdrv_pwritev(child, offset, bytes, NULL,
+BDRV_REQ_ZERO_WRITE | flags);
 }
 
 /*
@@ -950,41 +938,19 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags 
flags)
 }
 }
 
-/* return < 0 if error. See bdrv_pwrite() for the return codes */
-int bdrv_preadv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
-{
-int ret;
-
-ret = bdrv_prwv(child, offset, qiov, false, 0);
-if (ret < 0) {
-return ret;
-}
-
-return qiov->size;
-}
-
 /* See bdrv_pwrite() for the return codes */
 int bdrv_pread(BdrvChild *child, int64_t offset, void *buf, int bytes)
 {
+int ret;
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
 
 if (bytes < 0) {
 return -EINVAL;
 }
 
-return bdrv_preadv(child, offset, &qiov);
-}
-
-int bdrv_pwritev(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
-{
-int ret;
+ret = bdrv_preadv(child, offset, bytes, &qiov,  0);
 
-ret = bdrv_prwv(child, offset, qiov, true, 0);
-if (ret < 0) {
-return ret;
-   

[PATCH v9 4/7] scripts: add block-coroutine-wrapper.py

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
We have a very frequent pattern of creating a coroutine from a function
with several arguments:

  - create a structure to pack parameters
  - create _entry function to call original function taking parameters
from struct
  - do different magic to handle completion: set ret to NOT_DONE or
EINPROGRESS or use separate bool field
  - fill the struct and create coroutine from _entry function with this
struct as a parameter
  - do coroutine enter and BDRV_POLL_WHILE loop

Let's reduce code duplication by generating coroutine wrappers.

This patch adds scripts/block-coroutine-wrapper.py together with some
friends, which will generate functions with declared prototypes marked
by the 'generated_co_wrapper' specifier.

The usage of new code generation is as follows:

1. define the coroutine function somewhere

int coroutine_fn bdrv_co_NAME(...) {...}

2. declare in some header file

int generated_co_wrapper bdrv_NAME(...);

   with the same list of parameters (generated_co_wrapper is
   defined in "include/block/block.h").

3. Make sure the block_gen_c declaration in block/meson.build
   mentions the file containing your marked declaration.

No functions are marked yet; that is done in the following
commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 docs/devel/block-coroutine-wrapper.rst |  54 +++
 docs/devel/index.rst   |   1 +
 block/block-gen.h  |  49 +++
 include/block/block.h  |  10 ++
 block/meson.build  |   8 ++
 scripts/block-coroutine-wrapper.py | 188 +
 6 files changed, 310 insertions(+)
 create mode 100644 docs/devel/block-coroutine-wrapper.rst
 create mode 100644 block/block-gen.h
 create mode 100644 scripts/block-coroutine-wrapper.py

diff --git a/docs/devel/block-coroutine-wrapper.rst 
b/docs/devel/block-coroutine-wrapper.rst
new file mode 100644
index 00..d09fff2cc5
--- /dev/null
+++ b/docs/devel/block-coroutine-wrapper.rst
@@ -0,0 +1,54 @@
+===
+block-coroutine-wrapper
+===
+
+A lot of functions in QEMU block layer (see ``block/*``) can only be
+called in coroutine context. Such functions are normally marked by the
+coroutine_fn specifier. Still, sometimes we need to call them from
+non-coroutine context; for this we need to start a coroutine, run the
+needed function from it and wait for coroutine finish in
+BDRV_POLL_WHILE() loop. To run a coroutine we need a function with one
+void* argument. So for each coroutine_fn function which needs a
+non-coroutine interface, we should define a structure to pack the
+parameters, define a separate function to unpack the parameters and
+call the original function and finally define a new interface function
+with same list of arguments as original one, which will pack the
+parameters into a struct, create a coroutine, run it and wait in
+BDRV_POLL_WHILE() loop. It's boring to create such wrappers by hand,
+so we have a script to generate them.
+
+Usage
+=
+
+Assume we have defined the ``coroutine_fn`` function
+``bdrv_co_foo()`` and need a non-coroutine interface for it,
+called ``bdrv_foo()``. In this case the script can help. To
+trigger the generation:
+
+1. You need ``bdrv_foo`` declaration somewhere (for example, in
+   ``block/coroutines.h``) with the ``generated_co_wrapper`` mark,
+   like this:
+
+.. code-block:: c
+
+int generated_co_wrapper bdrv_foo();
+
+2. You need to feed this declaration to block-coroutine-wrapper script.
+   For this, add the .h (or .c) file with the declaration to the
+   ``input: files(...)`` list of ``block_gen_c`` target declaration in
+   ``block/meson.build``
+
+You are done. During the build, coroutine wrappers will be generated in
+``/block/block-gen.c``.
+
+Links
+=
+
+1. The script location is ``scripts/block-coroutine-wrapper.py``.
+
+2. Generic place for private ``generated_co_wrapper`` declarations is
+   ``block/coroutines.h``, for public declarations:
+   ``include/block/block.h``
+
+3. The core API of generated coroutine wrappers is placed in
+   (not generated) ``block/block-gen.h``
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 04773ce076..cb0abe1e69 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -31,3 +31,4 @@ Contents:
reset
s390-dasd-ipl
clocks
+   block-coroutine-wrapper
diff --git a/block/block-gen.h b/block/block-gen.h
new file mode 100644
index 00..f80cf4897d
--- /dev/null
+++ b/block/block-gen.h
@@ -0,0 +1,49 @@
+/*
+ * Block coroutine wrapping core, used by auto-generated block/block-gen.c
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ * Copyright (c) 2020 Virtuozzo International GmbH
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without l

[PATCH v9 5/7] block: generate coroutine-wrapper code

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
Use the code generation implemented in the previous commit to generate
coroutine wrappers in block.c and block/io.c.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/coroutines.h|   6 +-
 include/block/block.h |  16 ++--
 block.c   |  73 ---
 block/io.c| 212 --
 4 files changed, 13 insertions(+), 294 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index 9ce1730a09..c62b3a2697 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -34,7 +34,7 @@ int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp);
 int coroutine_fn
 bdrv_co_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
  bool is_write, BdrvRequestFlags flags);
-int
+int generated_co_wrapper
 bdrv_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
   bool is_write, BdrvRequestFlags flags);
 
@@ -47,7 +47,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
   int64_t *pnum,
   int64_t *map,
   BlockDriverState **file);
-int
+int generated_co_wrapper
 bdrv_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
bool want_zero,
@@ -60,7 +60,7 @@ bdrv_common_block_status_above(BlockDriverState *bs,
 int coroutine_fn
 bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
bool is_read);
-int
+int generated_co_wrapper
 bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
 bool is_read);
 
diff --git a/include/block/block.h b/include/block/block.h
index 0f0ddc51b4..f2d85f2cf1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -403,8 +403,9 @@ void bdrv_refresh_filename(BlockDriverState *bs);
 int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
  PreallocMode prealloc, BdrvRequestFlags flags,
   Error **errp);
-int bdrv_truncate(BdrvChild *child, int64_t offset, bool exact,
-  PreallocMode prealloc, BdrvRequestFlags flags, Error **errp);
+int generated_co_wrapper
+bdrv_truncate(BdrvChild *child, int64_t offset, bool exact,
+  PreallocMode prealloc, BdrvRequestFlags flags, Error **errp);
 
 int64_t bdrv_nb_sectors(BlockDriverState *bs);
 int64_t bdrv_getlength(BlockDriverState *bs);
@@ -446,7 +447,8 @@ typedef enum {
 BDRV_FIX_ERRORS   = 2,
 } BdrvCheckMode;
 
-int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix);
+int generated_co_wrapper bdrv_check(BlockDriverState *bs, BdrvCheckResult *res,
+BdrvCheckMode fix);
 
 /* The units of offset and total_work_size may be chosen arbitrarily by the
  * block driver; total_work_size may change during the course of the amendment
@@ -470,12 +472,13 @@ void bdrv_aio_cancel_async(BlockAIOCB *acb);
 int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
 
 /* Invalidate any cached metadata used by image formats */
-int bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
+int generated_co_wrapper bdrv_invalidate_cache(BlockDriverState *bs,
+   Error **errp);
 void bdrv_invalidate_cache_all(Error **errp);
 int bdrv_inactivate_all(void);
 
 /* Ensure contents are flushed to disk.  */
-int bdrv_flush(BlockDriverState *bs);
+int generated_co_wrapper bdrv_flush(BlockDriverState *bs);
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
 int bdrv_flush_all(void);
 void bdrv_close_all(void);
@@ -490,7 +493,8 @@ void bdrv_drain_all(void);
 AIO_WAIT_WHILE(bdrv_get_aio_context(bs_),  \
cond); })
 
-int bdrv_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
+int generated_co_wrapper bdrv_pdiscard(BdrvChild *child, int64_t offset,
+   int64_t bytes);
 int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
diff --git a/block.c b/block.c
index 6e2bfb93d8..de056c695a 100644
--- a/block.c
+++ b/block.c
@@ -4691,43 +4691,6 @@ int coroutine_fn bdrv_co_check(BlockDriverState *bs,
 return bs->drv->bdrv_co_check(bs, res, fix);
 }
 
-typedef struct CheckCo {
-BlockDriverState *bs;
-BdrvCheckResult *res;
-BdrvCheckMode fix;
-int ret;
-} CheckCo;
-
-static void coroutine_fn bdrv_check_co_entry(void *opaque)
-{
-CheckCo *cco = opaque;
-cco->ret = bdrv_co_check(cco->bs, cco->res, cco->fix);
-aio_wait_kick();
-}
-
-int bdrv_check(BlockDriverState *bs,
-   BdrvCheckResult *res, BdrvCheckMode fix)
-{
-Coroutine *co;
-CheckCo cco = {
-.bs = bs,
-.res = 

[PATCH v9 1/7] block: return error-code from bdrv_invalidate_cache

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
This is the only coroutine wrapper from block.c and block/io.c which
doesn't return a value, so let's convert it to the common behavior, to
simplify moving to generated coroutine wrappers in a further commit.

Also, bdrv_invalidate_cache is a void function that reports errors only
through its **errp parameter. This is considered bad practice, as it
forces callers to define and propagate a local_err variable, so the
conversion is worthwhile anyway.

This patch leaves the conversion of .bdrv_co_invalidate_cache() driver
callbacks and bdrv_invalidate_cache_all() for another day.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefan Hajnoczi 
---
 include/block/block.h |  2 +-
 block.c   | 32 ++--
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 981ab5b314..81d591dd4c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -460,7 +460,7 @@ void bdrv_aio_cancel_async(BlockAIOCB *acb);
 int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
 
 /* Invalidate any cached metadata used by image formats */
-void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
+int bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
 void bdrv_invalidate_cache_all(Error **errp);
 int bdrv_inactivate_all(void);
 
diff --git a/block.c b/block.c
index 11ab55f80b..47b3845e14 100644
--- a/block.c
+++ b/block.c
@@ -5781,8 +5781,8 @@ void bdrv_init_with_whitelist(void)
 bdrv_init();
 }
 
-static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
-  Error **errp)
+static int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
+ Error **errp)
 {
 BdrvChild *child, *parent;
 uint64_t perm, shared_perm;
@@ -5791,14 +5791,14 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 BdrvDirtyBitmap *bm;
 
 if (!bs->drv)  {
-return;
+return -ENOMEDIUM;
 }
 
 QLIST_FOREACH(child, &bs->children, next) {
 bdrv_co_invalidate_cache(child->bs, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
-return;
+return -EINVAL;
 }
 }
 
@@ -5821,7 +5821,7 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 ret = bdrv_check_perm(bs, NULL, perm, shared_perm, NULL, NULL, errp);
 if (ret < 0) {
 bs->open_flags |= BDRV_O_INACTIVE;
-return;
+return ret;
 }
 bdrv_set_perm(bs, perm, shared_perm);
 
@@ -5830,7 +5830,7 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 if (local_err) {
 bs->open_flags |= BDRV_O_INACTIVE;
 error_propagate(errp, local_err);
-return;
+return -EINVAL;
 }
 }
 
@@ -5842,7 +5842,7 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 if (ret < 0) {
 bs->open_flags |= BDRV_O_INACTIVE;
 error_setg_errno(errp, -ret, "Could not refresh total sector count");
-return;
+return ret;
 }
 }
 
@@ -5852,27 +5852,30 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 if (local_err) {
 bs->open_flags |= BDRV_O_INACTIVE;
 error_propagate(errp, local_err);
-return;
+return -EINVAL;
 }
 }
 }
+
+return 0;
 }
 
 typedef struct InvalidateCacheCo {
 BlockDriverState *bs;
 Error **errp;
 bool done;
+int ret;
 } InvalidateCacheCo;
 
 static void coroutine_fn bdrv_invalidate_cache_co_entry(void *opaque)
 {
 InvalidateCacheCo *ico = opaque;
-bdrv_co_invalidate_cache(ico->bs, ico->errp);
+ico->ret = bdrv_co_invalidate_cache(ico->bs, ico->errp);
 ico->done = true;
 aio_wait_kick();
 }
 
-void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
+int bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
 Coroutine *co;
 InvalidateCacheCo ico = {
@@ -5889,22 +5892,23 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
 bdrv_coroutine_enter(bs, co);
 BDRV_POLL_WHILE(bs, !ico.done);
 }
+
+return ico.ret;
 }
 
 void bdrv_invalidate_cache_all(Error **errp)
 {
 BlockDriverState *bs;
-Error *local_err = NULL;
 BdrvNextIterator it;
 
 for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
+int ret;
 
 aio_context_acquire(aio_context);
-bdrv_invalidate_cache(bs, &local_err);
+ret = bdrv_invalidate_cache(bs, errp);
 aio_context_release(aio_context);
-   

[PATCH v9 2/7] block/io: refactor coroutine wrappers

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
Most of our coroutine wrappers already follow this convention:

We have 'coroutine_fn bdrv_co_<something>()' as the core function, and a
wrapper 'bdrv_<something>()' which does parameter packing and calls
bdrv_run_co().

The only outsiders are the bdrv_prwv_co and
bdrv_common_block_status_above wrappers. Let's refactor them to behave
as the others, it simplifies further conversion of coroutine wrappers.

This patch adds an indirection layer, but a later commit compensates
for it by dropping bdrv_co_prwv together with the is_write logic, to
keep the read and write paths separate.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c | 60 +-
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/block/io.c b/block/io.c
index a2389bb38c..24a7de3463 100644
--- a/block/io.c
+++ b/block/io.c
@@ -933,27 +933,31 @@ typedef struct RwCo {
 BdrvRequestFlags flags;
 } RwCo;
 
+static int coroutine_fn bdrv_co_prwv(BdrvChild *child, int64_t offset,
+ QEMUIOVector *qiov, bool is_write,
+ BdrvRequestFlags flags)
+{
+if (is_write) {
+return bdrv_co_pwritev(child, offset, qiov->size, qiov, flags);
+} else {
+return bdrv_co_preadv(child, offset, qiov->size, qiov, flags);
+}
+}
+
 static int coroutine_fn bdrv_rw_co_entry(void *opaque)
 {
 RwCo *rwco = opaque;
 
-if (!rwco->is_write) {
-return bdrv_co_preadv(rwco->child, rwco->offset,
-  rwco->qiov->size, rwco->qiov,
-  rwco->flags);
-} else {
-return bdrv_co_pwritev(rwco->child, rwco->offset,
-   rwco->qiov->size, rwco->qiov,
-   rwco->flags);
-}
+return bdrv_co_prwv(rwco->child, rwco->offset, rwco->qiov,
+rwco->is_write, rwco->flags);
 }
 
 /*
  * Process a vectored synchronous request using coroutines
  */
-static int bdrv_prwv_co(BdrvChild *child, int64_t offset,
-QEMUIOVector *qiov, bool is_write,
-BdrvRequestFlags flags)
+static int bdrv_prwv(BdrvChild *child, int64_t offset,
+ QEMUIOVector *qiov, bool is_write,
+ BdrvRequestFlags flags)
 {
 RwCo rwco = {
 .child = child,
@@ -971,8 +975,7 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
 {
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, bytes);
 
-return bdrv_prwv_co(child, offset, &qiov, true,
-BDRV_REQ_ZERO_WRITE | flags);
+return bdrv_prwv(child, offset, &qiov, true, BDRV_REQ_ZERO_WRITE | flags);
 }
 
 /*
@@ -1021,7 +1024,7 @@ int bdrv_preadv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
 {
 int ret;
 
-ret = bdrv_prwv_co(child, offset, qiov, false, 0);
+ret = bdrv_prwv(child, offset, qiov, false, 0);
 if (ret < 0) {
 return ret;
 }
@@ -1045,7 +1048,7 @@ int bdrv_pwritev(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
 {
 int ret;
 
-ret = bdrv_prwv_co(child, offset, qiov, true, 0);
+ret = bdrv_prwv(child, offset, qiov, true, 0);
 if (ret < 0) {
 return ret;
 }
@@ -2449,14 +2452,15 @@ early_out:
 return ret;
 }
 
-static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
-   BlockDriverState *base,
-   bool want_zero,
-   int64_t offset,
-   int64_t bytes,
-   int64_t *pnum,
-   int64_t *map,
-   BlockDriverState **file)
+static int coroutine_fn
+bdrv_co_common_block_status_above(BlockDriverState *bs,
+  BlockDriverState *base,
+  bool want_zero,
+  int64_t offset,
+  int64_t bytes,
+  int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file)
 {
 BlockDriverState *p;
 int ret = 0;
@@ -2494,10 +2498,10 @@ static int coroutine_fn bdrv_block_status_above_co_entry(void *opaque)
 {
 BdrvCoBlockStatusData *data = opaque;
 
-return bdrv_co_block_status_above(data->bs, data->base,
-  data->want_zero,
-  data->offset, data->bytes,
-  data->pnum, data->map, data->file);
+return bdrv_co_common_block_status

[PATCH v9 3/7] block: declare some coroutine functions in block/coroutines.h

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
We are going to keep the coroutine-wrapper code (parameter-packing
structures, BDRV_POLL wrapper functions) in separate auto-generated
files. So we'll need a header declaring the original _co_ functions
that are currently static, as well as the wrapper functions. Add these
declarations now, as a preparation step.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefan Hajnoczi 
---
 block/coroutines.h | 67 ++
 block.c|  8 +++---
 block/io.c | 34 +++
 3 files changed, 88 insertions(+), 21 deletions(-)
 create mode 100644 block/coroutines.h

diff --git a/block/coroutines.h b/block/coroutines.h
new file mode 100644
index 00..9ce1730a09
--- /dev/null
+++ b/block/coroutines.h
@@ -0,0 +1,67 @@
+/*
+ * Block layer I/O functions
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef BLOCK_COROUTINES_INT_H
+#define BLOCK_COROUTINES_INT_H
+
+#include "block/block_int.h"
+
+int coroutine_fn bdrv_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res, BdrvCheckMode fix);
+int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp);
+
+int coroutine_fn
+bdrv_co_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
+ bool is_write, BdrvRequestFlags flags);
+int
+bdrv_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
+  bool is_write, BdrvRequestFlags flags);
+
+int coroutine_fn
+bdrv_co_common_block_status_above(BlockDriverState *bs,
+  BlockDriverState *base,
+  bool want_zero,
+  int64_t offset,
+  int64_t bytes,
+  int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file);
+int
+bdrv_common_block_status_above(BlockDriverState *bs,
+   BlockDriverState *base,
+   bool want_zero,
+   int64_t offset,
+   int64_t bytes,
+   int64_t *pnum,
+   int64_t *map,
+   BlockDriverState **file);
+
+int coroutine_fn
+bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
+   bool is_read);
+int
+bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
+bool is_read);
+
+#endif /* BLOCK_COROUTINES_INT_H */
diff --git a/block.c b/block.c
index 47b3845e14..6e2bfb93d8 100644
--- a/block.c
+++ b/block.c
@@ -48,6 +48,7 @@
 #include "qemu/timer.h"
 #include "qemu/cutils.h"
 #include "qemu/id.h"
+#include "block/coroutines.h"
 
 #ifdef CONFIG_BSD
 #include 
@@ -4676,8 +4677,8 @@ static void bdrv_delete(BlockDriverState *bs)
  * free of errors) or -errno when an internal error occurred. The results of the
  * check are stored in res.
  */
-static int coroutine_fn bdrv_co_check(BlockDriverState *bs,
-  BdrvCheckResult *res, BdrvCheckMode fix)
+int coroutine_fn bdrv_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res, BdrvCheckMode fix)
 {
 if (bs->drv == NULL) {
 return -ENOMEDIUM;
@@ -5781,8 +5782,7 @@ void bdrv_init_with_whitelist(void)
 bdrv_init();
 }
 
-static int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
- Error **errp)
+int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
 BdrvChild *child, *parent;
 uint64_t perm, shared_perm;
diff --git a/block/io.

[PATCH v9 7/7] block/io: refactor save/load vmstate

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
As was done for read/write in a previous commit, drop the extra
indirection layer and generate bdrv_readv_vmstate() and
bdrv_writev_vmstate() directly.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/coroutines.h| 10 +++
 include/block/block.h |  6 ++--
 block/io.c| 68 ++-
 3 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index 6c63a819c9..f69179f5ef 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -57,11 +57,9 @@ bdrv_common_block_status_above(BlockDriverState *bs,
int64_t *map,
BlockDriverState **file);
 
-int coroutine_fn
-bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
-   bool is_read);
-int generated_co_wrapper
-bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
-bool is_read);
+int coroutine_fn bdrv_co_readv_vmstate(BlockDriverState *bs,
+   QEMUIOVector *qiov, int64_t pos);
+int coroutine_fn bdrv_co_writev_vmstate(BlockDriverState *bs,
+QEMUIOVector *qiov, int64_t pos);
 
 #endif /* BLOCK_COROUTINES_INT_H */
diff --git a/include/block/block.h b/include/block/block.h
index eef4cceaf0..8b87df69a1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -572,8 +572,10 @@ int path_has_protocol(const char *path);
 int path_is_absolute(const char *path);
 char *path_combine(const char *base_path, const char *filename);
 
-int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
-int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
+int generated_co_wrapper
+bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
+int generated_co_wrapper
+bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
   int64_t pos, int size);
 
diff --git a/block/io.c b/block/io.c
index cd5b689473..449b99b92c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2475,67 +2475,69 @@ int bdrv_is_allocated_above(BlockDriverState *top,
 }
 
 int coroutine_fn
-bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
-   bool is_read)
+bdrv_co_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
 {
 BlockDriver *drv = bs->drv;
 BlockDriverState *child_bs = bdrv_primary_bs(bs);
 int ret = -ENOTSUP;
 
+if (!drv) {
+return -ENOMEDIUM;
+}
+
 bdrv_inc_in_flight(bs);
 
-if (!drv) {
-ret = -ENOMEDIUM;
-} else if (drv->bdrv_load_vmstate) {
-if (is_read) {
-ret = drv->bdrv_load_vmstate(bs, qiov, pos);
-} else {
-ret = drv->bdrv_save_vmstate(bs, qiov, pos);
-}
+if (drv->bdrv_load_vmstate) {
+ret = drv->bdrv_load_vmstate(bs, qiov, pos);
 } else if (child_bs) {
-ret = bdrv_co_rw_vmstate(child_bs, qiov, pos, is_read);
+ret = bdrv_co_readv_vmstate(child_bs, qiov, pos);
 }
 
 bdrv_dec_in_flight(bs);
+
 return ret;
 }
 
-int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
-  int64_t pos, int size)
+int coroutine_fn
+bdrv_co_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
 {
-QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, size);
-int ret;
+BlockDriver *drv = bs->drv;
+BlockDriverState *child_bs = bdrv_primary_bs(bs);
+int ret = -ENOTSUP;
 
-ret = bdrv_writev_vmstate(bs, &qiov, pos);
-if (ret < 0) {
-return ret;
+if (!drv) {
+return -ENOMEDIUM;
 }
 
-return size;
-}
+bdrv_inc_in_flight(bs);
 
-int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
-{
-return bdrv_rw_vmstate(bs, qiov, pos, false);
+if (drv->bdrv_save_vmstate) {
+ret = drv->bdrv_save_vmstate(bs, qiov, pos);
+} else if (child_bs) {
+ret = bdrv_co_writev_vmstate(child_bs, qiov, pos);
+}
+
+bdrv_dec_in_flight(bs);
+
+return ret;
 }
 
-int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
+int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
   int64_t pos, int size)
 {
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, size);
-int ret;
-
-ret = bdrv_readv_vmstate(bs, &qiov, pos);
-if (ret < 0) {
-return ret;
-}
+int ret = bdrv_writev_vmstate(bs, &qiov, pos);
 
-return size;
+return ret < 0 ? ret : size;
 }
 
-int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
+int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
+  int64_t pos, int size)
 {
-return bdrv_rw_vmstate(bs, qiov, pos, true);
+QEMUIOVector qiov =

[PATCH v9 0/7] coroutines: generate wrapper code

2020-09-24 Thread Vladimir Sementsov-Ogievskiy
Hi all!

The aim of the series is to reduce code duplication and the hand-written
structure packing of parameters around coroutine function wrappers.

Benefits:
 - no code duplication
 - less indirection

v9: Thanks to Eric, I used the commit message tweaks and rebase-conflict
resolutions from his git tree.
01: add Philippe's, Stefan's r-bs
02: - add Philippe's, Stefan's r-bs
- commit message tweaks stolen from Eric's git :)
03: add Philippe's, Stefan's r-bs
04: - wording/grammar by Eric (partly, stolen from repo)
- ref new file in docs/devel/index.rst
- use 644 rights and recommended header for python script
- call gen_header() once
- rename gen_wrappers_file to gen_wrappers
05: add Stefan's r-b
06: add Philippe's, Stefan's r-bs
07: Stefan's r-b

Vladimir Sementsov-Ogievskiy (7):
  block: return error-code from bdrv_invalidate_cache
  block/io: refactor coroutine wrappers
  block: declare some coroutine functions in block/coroutines.h
  scripts: add block-coroutine-wrapper.py
  block: generate coroutine-wrapper code
  block: drop bdrv_prwv
  block/io: refactor save/load vmstate

 docs/devel/block-coroutine-wrapper.rst |  54 
 docs/devel/index.rst   |   1 +
 block/block-gen.h  |  49 
 block/coroutines.h |  65 +
 include/block/block.h  |  34 ++-
 block.c|  97 +--
 block/io.c | 337 -
 tests/test-bdrv-drain.c|   2 +-
 block/meson.build  |   8 +
 scripts/block-coroutine-wrapper.py | 188 ++
 10 files changed, 454 insertions(+), 381 deletions(-)
 create mode 100644 docs/devel/block-coroutine-wrapper.rst
 create mode 100644 block/block-gen.h
 create mode 100644 block/coroutines.h
 create mode 100644 scripts/block-coroutine-wrapper.py

-- 
2.21.3




Re: [PATCH v8 4/7] scripts: add block-coroutine-wrapper.py

2020-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2020 20:56, Eric Blake wrote:

On 9/15/20 11:44 AM, Vladimir Sementsov-Ogievskiy wrote:

We have a very frequent pattern of creating coroutine from function
with several arguments:




+++ b/docs/devel/block-coroutine-wrapper.rst
@@ -0,0 +1,54 @@
+===
+block-coroutine-wrapper
+===
+
+A lot of functions in QEMJ block layer (see ``block/*``) can by called


My editor italicized everything after block/*...


+only in coroutine context. Such functions are normally marked by
+coroutine_fn specifier. Still, sometimes we need to call them from
+non-coroutine context, for this we need to start a coroutine, run the
+needed function from it and wait for coroutine finish in
+BDRV_POLL_WHILE() loop. To run a coroutine we need a function with one
+void* argument. So for each coroutine_fn function, which needs


...through void*.  I wonder if you need to use \* to let .rst know that these 
are literal *, and not markup for a different font style. Although I did not 
check the actual generated docs to see how they look.



Intuitively, ``...`` (inline literal) markup should take precedence over *.

I have now checked ./build/docs/devel/block-coroutine-wrapper.html, and it looks OK:

   A lot of functions in QEMU block layer (see block/*) can only be

--
Best regards,
Vladimir



Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions

2020-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2020 20:40, Andrey Shinkevich wrote:

On 24.09.2020 20:29, Andrey Shinkevich wrote:

On 24.09.2020 18:00, Max Reitz wrote:

On 24.09.20 16:51, Vladimir Sementsov-Ogievskiy wrote:

24.09.2020 16:25, Max Reitz wrote:

On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:

17.09.2020 19:09, Andrey Shinkevich wrote:

On 04.09.2020 14:22, Max Reitz wrote:

On 28.08.20 18:52, Andrey Shinkevich wrote:

Provide API for the COR-filter insertion/removal.

...

Also, drop the filter child permissions for an inactive state when the
filter node is being removed.

Do we need .active for that?  Shouldn’t it be sufficient to just not
require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
node (i.e. perm == 0 in cor_child_perm())?

Of course, using an .active field works, too.  But Vladimir wanted a
similar field in the preallocate filter, and there already is in
backup-top.  I feel like there shouldn’t be a need for this.

.bdrv_child_perm() should generally be able to decide what permissions
it needs based on the information it gets.  If every driver needs to
keep track of whether it has any parents and feed that information into
.bdrv_child_perm() as external state, then something feels wrong.

If perm == 0 is not sufficient to rule out any parents, we should just
explicitly add a boolean that tells .bdrv_child_perm() whether there are
any parents or not – shouldn’t be too difficult.


The issue is that bdrv_check_update_perm() fails with "Conflicts with
use by..." unless *nperm = 0 and *nshared = BLK_PERM_ALL are set before
the call to bdrv_replace_node(). The filter is still in the backing
chain at that moment and has a parent with child->perm != 0.

The implementation of .bdrv_child_perm() in the given patch is similar
to the one in block/mirror.c.

How could we resolve the issue at the generic layer?



The problem is that when we update permissions in bdrv_replace_node, we
consider the new placement for the "to" node but the old placement for
the "from" node. So, during the update, we may impose stricter
permission requirements on "from" than needed, and they conflict with
the new "to" permissions.

We should consider all graph changes for a permission update
simultaneously, in the same transaction. For this, we need to refactor
the permission update system to work on a queue of nodes instead of a
simple DFS recursion, with all the nodes to update topologically sorted
in the queue. This way, when we update node N, all its parents are
already updated. And of course, we should make the no-perm graph update
before the permission update, and roll back the no-perm movement if the
permission update fails.

OK, you’ve convinced me, .active is good enough for now. O:)


And we need a topological sort anyway. Consider the following example:

A -
|  \
|  v
|  B
|  |
v  /
C<-

A is parent for B and C, B is parent for C.

Obviously, to update permissions, we should go in the order A B C, so
that when we update C, all its parents' permissions are already
updated. But with the current approach (simple recursion) we can update
in the sequence A C B C (C is updated twice). On the first update of C,
we consider the old B permissions, so we do the wrong thing. If it
succeeds, all is OK: on the second C update we will finish with the
correct graph. But if the wrong thing fails, we break the whole process
for no reason (it's possible that the updated B permissions will be less
strict, but we will never check it).

True.


I'll work on a patch for it.

Sounds great, though I fear for the complexity.  I’ll just wait and see
for now.


If you are OK with .active for now, then I think Andrey can resend with
.active and I'll dig into permissions in parallel. If Andrey's series
goes first, I'll just drop .active later if my idea works.

Sure, that works for me.

Max



So I should keep the filter insert/remove functions in the COR driver code for 
now rather than moving them to the generic block layer, right?

Andrey



As a compromise, we could invoke the .bdrv_insert/remove driver functions from 
within the generic API functions.

Andrey



No, I don't think such handlers will help. Until we improve the 
permission-update system, we can't implement good generic insertion code. So I'd 
keep the patch as is for now.

--
Best regards,
Vladimir



Re: [PATCH v6 08/15] block: introduce preallocate filter

2020-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2020 19:50, Max Reitz wrote:

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:

It's intended to be inserted between format and protocol nodes to
preallocate additional space (expanding protocol file) on writes
crossing EOF. It improves performance for file-systems with slow
allocation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  docs/system/qemu-block-drivers.rst.inc |  26 ++
  qapi/block-core.json   |  20 +-
  block/preallocate.c| 556 +
  block/meson.build  |   1 +
  4 files changed, 602 insertions(+), 1 deletion(-)
  create mode 100644 block/preallocate.c


Looks good to me in general.

[...]


diff --git a/block/preallocate.c b/block/preallocate.c
new file mode 100644
index 00..6510ad0149
--- /dev/null
+++ b/block/preallocate.c


[...]


+/*
+ * Handle reopen.
+ *
+ * We must implement reopen handlers, otherwise reopen just don't work. Handle
+ * new options and don't care about preallocation state, as it is handled in
+ * set/check permission handlers.
+ */
+
+static int preallocate_reopen_prepare(BDRVReopenState *reopen_state,
+  BlockReopenQueue *queue, Error **errp)
+{
+PreallocateOpts *opts = g_new0(PreallocateOpts, 1);
+
+if (!preallocate_absorb_opts(opts, reopen_state->options,
+ reopen_state->bs->file->bs, errp)) {


I tried to find out whether this refers to the old file child, or the
post-reopen one.


Note that it's needed only to check request_alignment. It's probably better to 
pass request_alignment to preallocate_absorb_opts() rather than the whole child.
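A minimal sketch of that suggestion. It assumes the only thing the helper needs from the child is a check that the configured preallocation alignment is a multiple of the child's request alignment. The struct and the helper below are hypothetical stand-ins, not the code in block/preallocate.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down options: only the field checked here. */
typedef struct PreallocateOptsSketch {
    int64_t prealloc_align;
} PreallocateOptsSketch;

/*
 * Takes the child's request alignment as a plain value instead of the
 * child node itself, so the helper no longer cares which child node the
 * value came from.
 */
static bool absorb_opts_sketch(const PreallocateOptsSketch *opts,
                               uint32_t request_alignment)
{
    assert(request_alignment > 0);
    /* Assumed rule: the preallocation step must be a multiple of it. */
    return opts->prealloc_align > 0 &&
           opts->prealloc_align % request_alignment == 0;
}
```

With this shape, the caller only has to look up one value from whichever child it considers current.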


 As far as I could find out, there is no generic
implementation for changing the file child as part of x-blockdev-reopen:

{"error": {"class": "GenericError", "desc": "Cannot change the option
'file'"}}

Now that’s a shame.  That means you can’t reasonably integrate a
preallocate filter into an existing node graph unless the format driver
checks for the respective child option and issues a replace_node on
commit or something, right?  I suppose any driver who’d want to
implement child replacement would need to attach the new node in
prepare() as some pseudo-child, and then drop the old one and rename the
new one in commit().  I don’t think any driver does that yet, so I
suppose no format driver allows replacement of children yet (except for
quorum...).
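The prepare()/commit() scheme described above could look roughly like this in a toy model. Node, Parent, and the reopen_*_sketch() functions are hypothetical; real reopen handlers operate on BDRVReopenState and BdrvChild. The replacement is attached as a pseudo-child in prepare(), the old child is dropped and the pseudo-child takes its place in commit(), and abort() only has to drop the pseudo-child.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node that only counts how many parents reference it. */
typedef struct Node {
    const char *name;
    int nparents;
} Node;

/* Hypothetical parent with a current "file" child and a staged one. */
typedef struct Parent {
    Node *file;
    Node *new_file; /* pseudo-child attached during prepare() */
} Parent;

static void attach(Node *n)
{
    n->nparents++;
}

static void detach(Node *n)
{
    assert(n->nparents > 0);
    n->nparents--;
}

/* prepare(): attach the replacement next to the old child; drop nothing. */
static int reopen_prepare_sketch(Parent *p, Node *replacement)
{
    if (!replacement) {
        return -1;
    }
    attach(replacement);
    p->new_file = replacement;
    return 0;
}

/* commit(): drop the old child and let the pseudo-child take its place. */
static void reopen_commit_sketch(Parent *p)
{
    assert(p->new_file);
    detach(p->file);
    p->file = p->new_file;
    p->new_file = NULL;
}

/* abort(): drop the pseudo-child; the old child was never touched. */
static void reopen_abort_sketch(Parent *p)
{
    if (p->new_file) {
        detach(p->new_file);
        p->new_file = NULL;
    }
}
```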

Does anyone know what the status on that is?  Are there any plans for
implementing child replacement in reopen, or did I just miss something?

(It looks like the backing child can be replaced, but that’s probably
not a child where the preallocate filter would be placed on top...).


Hm. I didn't care about it, because the main use case is to insert the filter at 
startup, specifying it in -drive or -blockdev options.

We probably need a separate API that inserts/removes filters, as is done in 
the block job code, rather than reopening the block device.




+g_free(opts);
+return -EINVAL;
+}
+
+reopen_state->opaque = opts;
+
+return 0;
+}


[...]


+/*
+ * Call on each write. Returns true if @want_merge_zero is true and the region
+ * [offset, offset + bytes) is zeroed (as a result of this call or earlier
+ * preallocation).
+ *
+ * want_merge_zero is used to merge write-zero request with preallocation in
+ * one bdrv_co_pwrite_zeroes() call.
+ */
+static bool coroutine_fn handle_write(BlockDriverState *bs, int64_t offset,
+  int64_t bytes, bool want_merge_zero)
+{
+BDRVPreallocateState *s = bs->opaque;
+int64_t end = offset + bytes;
+int64_t prealloc_start, prealloc_end;
+int ret;
+
+if (!has_prealloc_perms(bs)) {


Took me a bit to figure out, because it takes a trip to
preallocate_child_perm() to figure out exactly when we’re going to have
the necessary permissions for this to pass.

Then it turns out that this is going to be the case exactly when the
parents collectively have the same permissions (WRITE+RESIZE) on the
preallocate node.

Then I had to wonder whether it’s appropriate not to preallocate if
WRITE is taken, but RESIZE isn’t.  But that seems entirely correct, yes.
  If no one is going to grow the file, then there is no need for
preallocation.  (Vice versa, if no one is going to write, but only
resize, then there is no need for preallocation either.)

So this seems correct, yes.

(It could be argued that if one parent has WRITE, and another has RESIZE
(but neither has both), then we probably don’t need preallocation
either.  But in such an arcane case (which is impossible to figure out
in .bdrv_child_perm() anyway), we might as well just do preallocation.
Won’t hurt.)
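The condition described above can be condensed into a bitmask check. This sketch uses made-up bit values rather than QEMU's BLK_PERM_* constants and a hypothetical want_prealloc() helper. It also covers the case where one parent has WRITE and another has RESIZE: the cumulative permissions still enable preallocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up bits for illustration; QEMU has its own BLK_PERM_* values. */
enum {
    PERM_WRITE  = 1 << 0,
    PERM_RESIZE = 1 << 1,
};

/* Cumulative permissions are the union of what all parents take. */
static uint64_t cumulative_perm(const uint64_t *parent_perms, size_t n)
{
    uint64_t perm = 0;

    assert(parent_perms || n == 0);
    for (size_t i = 0; i < n; i++) {
        perm |= parent_perms[i];
    }
    return perm;
}

/* Preallocate only if someone may write and someone may grow the file. */
static bool want_prealloc(uint64_t cumulative)
{
    const uint64_t need = PERM_WRITE | PERM_RESIZE;

    return (cumulative & need) == need;
}
```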


+/* We don't have state neither should try to recover it */
+return false;
+}
+
+if (s->data_end < 0) {
+s->data_end = bdrv_getle

Re: [PATCH v6 07/15] block: bdrv_check_perm(): process children anyway

2020-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2020 17:25, Max Reitz wrote:

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:

Do generic processing even for drivers that define a .bdrv_check_perm
handler. It's needed for the upcoming preallocate filter: it will need to
do an additional action on bdrv_check_perm, but doesn't want to
reimplement the generic logic.

The patch doesn't change existing behaviour: the only driver that
implements bdrv_check_perm is file-posix, but it never has any
children.

Also, bdrv_set_perm() doesn't stop processing if the driver has a
.bdrv_set_perm handler as well.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 9538af4884..165c2d3cb2 100644
--- a/block.c
+++ b/block.c
@@ -1964,8 +1964,7 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
  /*
   * Check whether permissions on this node can be changed in a way that
   * @cumulative_perms and @cumulative_shared_perms are the new cumulative
- * permissions of all its parents. This involves checking whether all necessary
- * permission changes to child nodes can be performed.
+ * permissions of all its parents.


Why do you want to remove this sentence?


Really strange :) I don't know. I remember that I modified some comment while 
working on this series, and it was important... But this sentence becomes even 
more obviously correct with this patch.




   *
   * Will set *tighten_restrictions to true if and only if new permissions have to
   * be taken or currently shared permissions are to be unshared.  Otherwise,
@@ -2047,8 +2046,11 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
  }
  
  if (drv->bdrv_check_perm) {

-return drv->bdrv_check_perm(bs, cumulative_perms,
-cumulative_shared_perms, errp);
+ret = drv->bdrv_check_perm(bs, cumulative_perms,
+   cumulative_shared_perms, errp);
+if (ret < 0) {
+return ret;
+}
  }


Sounds good.  It’s also consistent with how bdrv_abort_perm_update() and
bdrv_set_perm() don’t return after calling the respective driver
functions, but always recurse to the children.
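The control flow after this patch, reduced to a toy model. The Driver/Node types and hooks are hypothetical, and the same permission mask is passed down unchanged, whereas the real code computes child permissions via bdrv_child_perm(). A driver hook can still veto the update, but a successful hook no longer skips the generic recursion into the children.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical driver with an optional permission-check hook. */
typedef struct Driver {
    int (*check_perm)(Node *n, unsigned perm);
} Driver;

struct Node {
    const Driver *drv;
    Node *children[4];
    size_t nchildren;
    int checked; /* how often generic processing reached this node */
};

/* Example hooks: one accepts everything, one refuses bit 0 ("write"). */
static int allow_all(Node *n, unsigned perm)
{
    (void)n;
    (void)perm;
    return 0;
}

static int deny_write(Node *n, unsigned perm)
{
    (void)n;
    return (perm & 1u) ? -EPERM : 0;
}

static int check_perm_sketch(Node *n, unsigned perm)
{
    int ret;

    assert(n->nchildren <= 4);
    if (n->drv && n->drv->check_perm) {
        ret = n->drv->check_perm(n, perm);
        if (ret < 0) {
            return ret; /* a driver veto still aborts the update */
        }
        /* Before the patch, the driver's result was returned right here. */
    }

    /* Generic processing now also runs for drivers that have a hook. */
    n->checked++;
    for (size_t i = 0; i < n->nchildren; i++) {
        ret = check_perm_sketch(n->children[i], perm);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```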

Max




--
Best regards,
Vladimir



Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions

2020-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2020 16:25, Max Reitz wrote:

On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:

17.09.2020 19:09, Andrey Shinkevich wrote:

On 04.09.2020 14:22, Max Reitz wrote:

On 28.08.20 18:52, Andrey Shinkevich wrote:

Provide API for the COR-filter insertion/removal.

...

Also, drop the filter child permissions for an inactive state when the
filter node is being removed.

Do we need .active for that?  Shouldn’t it be sufficient to just not
require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
node (i.e. perm == 0 in cor_child_perm())?

Of course, using an .active field works, too.  But Vladimir wanted a
similar field in the preallocate filter, and there already is in
backup-top.  I feel like there shouldn’t be a need for this.

.bdrv_child_perm() should generally be able to decide what permissions
it needs based on the information it gets.  If every driver needs to
keep track of whether it has any parents and feed that information into
.bdrv_child_perm() as external state, then something feels wrong.

If perm == 0 is not sufficient to rule out any parents, we should just
explicitly add a boolean that tells .bdrv_child_perm() whether there are
any parents or not – shouldn’t be too difficult.
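A sketch of the perm == 0 idea. It uses made-up permission bits instead of BLK_PERM_* and a hypothetical cor_child_perm_sketch() that does not reproduce the real cor_child_perm() logic: when no permissions are taken on the filter, it takes nothing on its child and shares everything. As the reply below explains, this alone does not help while the filter still has a parent with child->perm != 0 during bdrv_replace_node().

```c
#include <assert.h>
#include <stdint.h>

/* Made-up permission bits for illustration (not QEMU's BLK_PERM_*). */
enum {
    PERM_CONSISTENT_READ = 1 << 0,
    PERM_WRITE           = 1 << 1,
    PERM_WRITE_UNCHANGED = 1 << 2,
    PERM_RESIZE          = 1 << 3,
    PERM_ALL             = (1 << 4) - 1,
};

/*
 * @perm/@shared: cumulative permissions of the filter's parents.
 * @nperm/@nshared: what the filter takes/shares on its child.
 */
static void cor_child_perm_sketch(uint64_t perm, uint64_t shared,
                                  uint64_t *nperm, uint64_t *nshared)
{
    assert(nperm && nshared);

    if (perm == 0) {
        /* Nobody uses the filter: take nothing, share everything. */
        *nperm = 0;
        *nshared = PERM_ALL;
        return;
    }

    /* Pass the parents' needs through; COR also writes back read data. */
    *nperm = perm | PERM_WRITE_UNCHANGED;
    *nshared = shared;
}
```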



The issue is that we fail in bdrv_check_update_perm() with
"Conflicts with use by..." if *nperm = 0 and *nshared =
BLK_PERM_ALL are not set before the call to bdrv_replace_node().
The filter is still in the backing chain at that moment and has a
parent with child->perm != 0.

The implementation of .bdrv_child_perm() in the given patch is
similar to the one in block/mirror.c.

How could we resolve the issue at the generic layer?




The problem is that when we update permissions in bdrv_replace_node(), we
consider the new placement for the "to" node, but the old placement for the
"from" node. So, during the update, we may apply stricter permission
requirements to "from" than needed, and they conflict with the new "to"
permissions.

We should consider all graph changes for a permission update
simultaneously, in the same transaction. For this, we need to refactor the
permission update system to work on a queue of nodes instead of simple DFS
recursion. And in the queue, all the nodes to update should be
topologically sorted. This way, when we update node N, all its parents
are already updated. And of course, we should make the no-perm graph update
before the permission update, and roll back the no-perm movement if the
permission update fails.


OK, you’ve convinced me, .active is good enough for now. O:)


And we need topological sort anyway. Consider the following example:

A -
|  \
|  v
|  B
|  |
v  /
C<-

A is a parent of B and C; B is a parent of C.

Obviously, to update permissions, we should go in the order A, B, C, so
that when we update C, all its parents' permissions are already updated.
But with the current approach (simple recursion), we can update in the
sequence A, C, B, C (C is updated twice). On the first update of C, we
consider the old B permissions, and so do the wrong thing. If it succeeds,
all is OK: on the second update of C we finish with the correct graph. But
if the wrong thing fails, we break the whole process for no reason (it's
possible that the updated B permissions would be less strict, but we never
check that).


True.


I'll work on a patch for it.


Sounds great, though I fear for the complexity.  I’ll just wait and see
for now.



If you are OK with .active for now, then I think Andrey can resend with
.active and I'll dig into permissions in parallel. If Andrey's series
goes first, I'll just drop .active later if my idea works.

--
Best regards,
Vladimir



Re: [PATCH v8 7/7] block/io: refactor save/load vmstate

2020-09-24 Thread Vladimir Sementsov-Ogievskiy

23.09.2020 23:10, Eric Blake wrote:

On 9/15/20 11:44 AM, Vladimir Sementsov-Ogievskiy wrote:

As for read/write in a previous commit, drop the extra indirection layer
and generate bdrv_readv_vmstate() and bdrv_writev_vmstate() directly.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
  block/coroutines.h    | 10 +++
  include/block/block.h |  6 ++--
  block/io.c    | 67 ++-
  3 files changed, 42 insertions(+), 41 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h



  int coroutine_fn
-bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
-   bool is_read)
+bdrv_co_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
  {
  BlockDriver *drv = bs->drv;
  int ret = -ENOTSUP;
+    if (!drv) {
+    return -ENOMEDIUM;
+    }
+
  bdrv_inc_in_flight(bs);
-    if (!drv) {
-    ret = -ENOMEDIUM;
-    } else if (drv->bdrv_load_vmstate) {
-    if (is_read) {
-    ret = drv->bdrv_load_vmstate(bs, qiov, pos);
-    } else {
-    ret = drv->bdrv_save_vmstate(bs, qiov, pos);
-    }
+    if (drv->bdrv_load_vmstate) {
+    ret = drv->bdrv_load_vmstate(bs, qiov, pos);


This one makes sense;


  } else if (bs->file) {
-    ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
+    ret = bdrv_co_readv_vmstate(bs->file->bs, qiov, pos);
  }
  bdrv_dec_in_flight(bs);
+
  return ret;
  }
-int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
-  int64_t pos, int size)
+int coroutine_fn
+bdrv_co_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
  {
-    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, size);
-    int ret;
+    BlockDriver *drv = bs->drv;
+    int ret = -ENOTSUP;
-    ret = bdrv_writev_vmstate(bs, , pos);
-    if (ret < 0) {
-    return ret;
+    if (!drv) {
+    return -ENOMEDIUM;
  }
-    return size;
-}
+    bdrv_inc_in_flight(bs);
-int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
-{
-    return bdrv_rw_vmstate(bs, qiov, pos, false);
+    if (drv->bdrv_load_vmstate) {
+    ret = drv->bdrv_save_vmstate(bs, qiov, pos);


but this one looks awkward. It represents the pre-patch logic, but it would be 
nicer to check for bdrv_save_vmstate.  With that tweak, my R-b still stands.


Agree.
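A toy version of the write path with Eric's tweak applied. The Drv/BDS types and writev_vmstate_sketch() are hypothetical, and in-flight accounting is omitted. The hook that is checked is the hook that gets called, so a driver that implements only the load hook falls through to its file child. With the original check, such a driver would end up calling a NULL save hook.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ENOMEDIUM
#define ENOMEDIUM ENODEV /* ENOMEDIUM is Linux-specific */
#endif

typedef struct BDS BDS;

/* Hypothetical driver with only the two vmstate hooks. */
typedef struct Drv {
    int (*load_vmstate)(BDS *bs, void *buf, long pos);
    int (*save_vmstate)(BDS *bs, void *buf, long pos);
} Drv;

struct BDS {
    const Drv *drv;
    BDS *file; /* child to forward to, or NULL */
};

static int load_ok(BDS *bs, void *buf, long pos)
{
    (void)bs; (void)buf; (void)pos;
    return 0;
}

static int save_ok(BDS *bs, void *buf, long pos)
{
    (void)bs; (void)buf; (void)pos;
    return 0;
}

static int writev_vmstate_sketch(BDS *bs, void *buf, long pos)
{
    assert(bs);
    if (!bs->drv) {
        return -ENOMEDIUM;
    }
    /* Check the hook that is actually called, not load_vmstate. */
    if (bs->drv->save_vmstate) {
        return bs->drv->save_vmstate(bs, buf, pos);
    } else if (bs->file) {
        return writev_vmstate_sketch(bs->file, buf, pos);
    }
    return -ENOTSUP;
}
```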



I had an interesting time applying this patch due to merge conflicts with the 
new bdrv_primary_bs() changes that landed in the meantime.



Thanks a lot!

To clarify: have you staged the series for a pull request, or should Stefan 
do it? Should I make a v9?

--
Best regards,
Vladimir



Re: [PATCH v8 4/7] scripts: add block-coroutine-wrapper.py

2020-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2020 04:20, Eric Blake wrote:

On 9/23/20 7:00 PM, Eric Blake wrote:



Tested-by: Eric Blake 

There are enough grammar fixes, plus the fact that John is working on Python 
cleanups, to make me wonder whether we need a v9, or whether I should just stage it 
as is, with any other cleanups as follow-ups.  But I'm liking the reduced 
maintenance burden once it is in, and don't want to drag it out to the point 
that it needs more rebasing as other things land first.



Here's what I've squashed in and temporarily pushed to my tree if you want to 
double-check my rebase work:
https://repo.or.cz/qemu/ericb.git/shortlog/refs/heads/master

diff --git a/docs/devel/block-coroutine-wrapper.rst b/docs/devel/block-coroutine-wrapper.rst
index f7050bbc8fa6..d09fff2cc539 100644
--- a/docs/devel/block-coroutine-wrapper.rst
+++ b/docs/devel/block-coroutine-wrapper.rst
@@ -2,43 +2,43 @@
  block-coroutine-wrapper
  ===

-A lot of functions in QEMJ block layer (see ``block/*``) can by called
-only in coroutine context. Such functions are normally marked by
+A lot of functions in QEMU block layer (see ``block/*``) can only be
+called in coroutine context. Such functions are normally marked by the
  coroutine_fn specifier. Still, sometimes we need to call them from
-non-coroutine context, for this we need to start a coroutine, run the
+non-coroutine context; for this we need to start a coroutine, run the
  needed function from it and wait for coroutine finish in
  BDRV_POLL_WHILE() loop. To run a coroutine we need a function with one
-void* argument. So for each coroutine_fn function, which needs
+void* argument. So for each coroutine_fn function which needs a
  non-coroutine interface, we should define a structure to pack the
  parameters, define a separate function to unpack the parameters and
  call the original function and finally define a new interface function
  with same list of arguments as original one, which will pack the
  parameters into a struct, create a coroutine, run it and wait in
-BDRV_POLL_WHILE() loop. It's boring to create such wrappers by hand, so
-we have a script to generate them.
+BDRV_POLL_WHILE() loop. It's boring to create such wrappers by hand,
+so we have a script to generate them.

  Usage
  =

-Assume we have defined ``coroutine_fn`` function
+Assume we have defined the ``coroutine_fn`` function
  ``bdrv_co_foo()`` and need a non-coroutine interface for it,
  called ``bdrv_foo()``. In this case the script can help. To
  trigger the generation:

-1. You need ``bdrv_foo`` declaration somewhere (for example in
-   ``block/coroutines.h`` with ``generated_co_wrapper`` mark,
+1. You need ``bdrv_foo`` declaration somewhere (for example, in
+   ``block/coroutines.h``) with the ``generated_co_wrapper`` mark,
     like this:

  .. code-block:: c

-    int generated_co_wrapper bdrv_foor();
+    int generated_co_wrapper bdrv_foo();

  2. You need to feed this declaration to block-coroutine-wrapper script.
-   For this, add .h (or .c) file with the declaration to
+   For this, add the .h (or .c) file with the declaration to the
     ``input: files(...)`` list of ``block_gen_c`` target declaration in
     ``block/meson.build``

-You are done. On build, coroutine wrappers will be generated in
+You are done. During the build, coroutine wrappers will be generated in
  ``/block/block-gen.c``.

  Links
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 04773ce076b3..cb0abe1e6988 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -31,3 +31,4 @@ Contents:
     reset
     s390-dasd-ipl
     clocks
+   block-coroutine-wrapper
diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index d859c07a5f55..8c0a08d9b020 100755
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -2,7 +2,7 @@
  """Generate coroutine wrappers for block subsystem.

  The program parses one or several concatenated c files from stdin,
-searches for functions with 'generated_co_wrapper' specifier
+searches for functions with the 'generated_co_wrapper' specifier
  and generates corresponding wrappers on stdout.

  Usage: block-coroutine-wrapper.py generated-file.c FILE.[ch]...
@@ -39,7 +39,7 @@ def prettify(code: str) -> str:
  'BraceWrapping': {'AfterFunction': True},
  'BreakBeforeBraces': 'Custom',
  'SortIncludes': False,
-    'MaxEmptyLinesToKeep': 2
+    'MaxEmptyLinesToKeep': 2,
  })
  p = subprocess.run(['clang-format', f'-style={style}'], check=True,
     encoding='utf-8', input=code,
@@ -168,7 +168,7 @@ int {func.name}({ func.gen_list('{decl}') })


  def gen_wrappers_file(input_code: str) -> str:
-    res = gen_header()
+    res = ''
  for func in func_decl_iter(input_code):
  res += '\n\n\n'
  res += gen_wrapper(func)
@@ -181,6 +181,7 @@ if __name__ == '__main__':
  exit(f'Usage: {sys.argv[0]} OUT_FILE.c 

Re: [PATCH v8 4/7] scripts: add block-coroutine-wrapper.py

2020-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2020 03:18, Eric Blake wrote:

On 9/15/20 11:44 AM, Vladimir Sementsov-Ogievskiy wrote:

We have a very frequent pattern of creating coroutine from function
with several arguments:




+++ b/scripts/block-coroutine-wrapper.py
@@ -0,0 +1,187 @@
+#!/usr/bin/env python3
+"""Generate coroutine wrappers for block subsystem.


Looking at the generated file after patch 5 is applied,...



+
+def gen_header():
+    copyright = re.sub('^.*Copyright', 'Copyright', __doc__, flags=re.DOTALL)
+    copyright = re.sub('^(?=.)', ' * ', copyright.strip(), flags=re.MULTILINE)
+    copyright = re.sub('^$', ' *', copyright, flags=re.MULTILINE)
+    return f"""\


This generated comment...



+
+
+def gen_wrappers_file(input_code: str) -> str:
+    res = gen_header()


...is getting inserted into the generated file...


+    for func in func_decl_iter(input_code):
+    res += '\n\n\n'
+    res += gen_wrapper(func)
+
+    return prettify(res)  # prettify to wrap long lines
+
+
+if __name__ == '__main__':
+    if len(sys.argv) < 3:
+    exit(f'Usage: {sys.argv[0]} OUT_FILE.c IN_FILE.[ch]...')
+
+    with open(sys.argv[1], 'w') as f_out:
+    for fname in sys.argv[2:]:
+    with open(fname) as f_in:
+    f_out.write(gen_wrappers_file(f_in.read()))


multiple times.  You'll want to hoist the call to gen_header outside the loop 
over fname in sys.argv[2:].



Right, thanks for fixing it. I missed it when rebasing onto the meson build system 
(and moving to calling gen_wrappers_file() several times). Hmm, gen_wrappers_file() 
is now a somewhat misleading name; it would be better as just gen_wrappers().

--
Best regards,
Vladimir



Re: [PATCH v8 4/7] scripts: add block-coroutine-wrapper.py

2020-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2020 03:00, Eric Blake wrote:

On 9/15/20 3:02 PM, Vladimir Sementsov-Ogievskiy wrote:

15.09.2020 19:44, Vladimir Sementsov-Ogievskiy wrote:

We have a very frequent pattern of creating coroutine from function
with several arguments:

   - create structure to pack parameters
   - create _entry function to call original function taking parameters
 from struct
   - do different magic to handle completion: set ret to NOT_DONE or
 EINPROGRESS or use separate bool field
   - fill the struct and create coroutine from _entry function and this
 struct as a parameter
   - do coroutine enter and BDRV_POLL_WHILE loop

Let's reduce code duplication by generating coroutine wrappers.
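For reference, the hand-written pattern listed above boils down to something like the following sketch. The coroutine machinery is replaced by a direct call (run_as_coroutine() and bdrv_co_foo_sketch() are hypothetical stand-ins), so this only illustrates the shape of the boilerplate the script is meant to generate: a parameter struct, an entry function with a single void * argument, and a wrapper with the original argument list that waits for completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the coroutine_fn we want to call (hypothetical). */
static int bdrv_co_foo_sketch(int64_t offset, int64_t bytes)
{
    return (offset >= 0 && bytes >= 0) ? 0 : -1;
}

/* 1. Structure to pack the parameters, plus result and completion flag. */
typedef struct FooCo {
    int64_t offset;
    int64_t bytes;
    int ret;
    bool done;
} FooCo;

/* 2. Entry function with the single void * argument a coroutine needs. */
static void foo_entry(void *opaque)
{
    FooCo *s = opaque;

    s->ret = bdrv_co_foo_sketch(s->offset, s->bytes);
    s->done = true;
}

/*
 * Stand-in for "create a coroutine and enter it". Here the entry simply
 * runs to completion; real code creates and enters a coroutine instead.
 */
static void run_as_coroutine(void (*entry)(void *), void *opaque)
{
    entry(opaque);
}

/* 3. Non-coroutine wrapper with the same argument list as the original. */
static int foo_sketch(int64_t offset, int64_t bytes)
{
    FooCo s = { .offset = offset, .bytes = bytes };

    run_as_coroutine(foo_entry, &s);
    /*
     * Real code would wait here in a BDRV_POLL_WHILE() loop, because the
     * coroutine may yield. The stand-in above always runs to completion.
     */
    assert(s.done);
    return s.ret;
}
```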

This patch adds scripts/block-coroutine-wrapper.py together with some
friends, which will generate functions with declared prototypes marked
by 'generated_co_wrapper' specifier.





 4. add header with generated_co_wrapper declaration into
    COROUTINE_HEADERS list in Makefile


This phrase is out-of-date.  I also see 4 steps here,...



Still, no function is marked yet; this work is for the following
commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
  docs/devel/block-coroutine-wrapper.rst |  54 +++
  block/block-gen.h  |  49 +++
  include/block/block.h  |  10 ++
  block/meson.build  |   8 ++
  scripts/block-coroutine-wrapper.py | 187 +
  5 files changed, 308 insertions(+)
  create mode 100644 docs/devel/block-coroutine-wrapper.rst
  create mode 100644 block/block-gen.h
  create mode 100755 scripts/block-coroutine-wrapper.py



Also needed:

diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 04773ce076..cb0abe1e69 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -31,3 +31,4 @@ Contents:
 reset
 s390-dasd-ipl
 clocks
+   block-coroutine-wrapper


I've squashed that in.


+++ b/docs/devel/block-coroutine-wrapper.rst
@@ -0,0 +1,54 @@
+===
+block-coroutine-wrapper
+===
+
+A lot of functions in QEMJ block layer (see ``block/*``) can by called


QEMU

s/by called only/only be called/


+only in coroutine context. Such functions are normally marked by


by the


+coroutine_fn specifier. Still, sometimes we need to call them from
+non-coroutine context, for this we need to start a coroutine, run the



s/context,/context;/


+needed function from it and wait for coroutine finish in


in a


+BDRV_POLL_WHILE() loop. To run a coroutine we need a function with one
+void* argument. So for each coroutine_fn function, which needs


needs a


+non-coroutine interface, we should define a structure to pack the
+parameters, define a separate function to unpack the parameters and
+call the original function and finally define a new interface function
+with same list of arguments as original one, which will pack the
+parameters into a struct, create a coroutine, run it and wait in
+BDRV_POLL_WHILE() loop. It's boring to create such wrappers by hand, so
+we have a script to generate them.

+Usage
+=
+
+Assume we have defined ``coroutine_fn`` function
+``bdrv_co_foo()`` and need a non-coroutine interface for it,
+called ``bdrv_foo()``. In this case the script can help. To
+trigger the generation:
+
+1. You need ``bdrv_foo`` declaration somewhere (for example in
+   ``block/coroutines.h`` with ``generated_co_wrapper`` mark,
+   like this:


Missing a closing ).


+
+.. code-block:: c
+
+    int generated_co_wrapper bdrv_foor();


s/foor/foo/


+
+2. You need to feed this declaration to block-coroutine-wrapper script.


to the block-


+   For this, add .h (or .c) file with the declaration to
+   ``input: files(...)`` list of ``block_gen_c`` target declaration in
+   ``block/meson.build``
+
+You are done. On build, coroutine wrappers will be generated in


s/On/During the/


+``/block/block-gen.c``.


...but 2 in the .rst.  Presumably, the .rst steps belong in the commit message 
as well.


+++ b/block/block-gen.h



+++ b/include/block/block.h
@@ -10,6 +10,16 @@
 #include "block/blockjob.h"
 #include "qemu/hbitmap.h"

+/*
+ * generated_co_wrapper
+ *
+ * Function specifier, which does nothing but marking functions to be


s/marking/mark/


+ * generated by scripts/block-coroutine-wrapper.py
+ *
+ * Read more in docs/devel/block-coroutine-wrapper.rst
+ */
+#define generated_co_wrapper
+
 /* block.c */
 typedef struct BlockDriver BlockDriver;
 typedef struct BdrvChild BdrvChild;
diff --git a/block/meson.build b/block/meson.build
index a3e56b7cd1..88ad73583a 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -107,6 +107,14 @@ module_block_h = custom_target('module_block.h',
    command: [module_block_py, '@OUTPUT0@', modsrc])
 block_ss.add(module_block_h)

+wrapper_py = find_program('../scripts/block-coroutine-wrapper.py')
+block_gen_c = custom_target('block-gen.c',
+    

Re: [PATCH v6 2/5] block/io: bdrv_common_block_status_above: support include_base

2020-09-23 Thread Vladimir Sementsov-Ogievskiy

23.09.2020 19:18, Alberto Garcia wrote:

On Wed 16 Sep 2020 02:20:05 PM CEST, Vladimir Sementsov-Ogievskiy wrote:

-for (p = backing_bs(bs); p != base; p = backing_bs(p)) {
+for (p = backing_bs(bs); include_base || p != base; p = backing_bs(p)) {
  ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
 file);
  if (ret < 0) {
@@ -2420,6 +2422,11 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
  break;
  }
  
+if (p == base) {

+assert(include_base);
+break;
+}
+


Another option is something like:

BlockDriverState *last_bs = include_base ? base : backing_bs(base);


Hmm, in the case when include_base is false, the last bs is not backing_bs(base) but 
the parent of base.



and you get a simpler 'for' loop.

But why do we need include_base at all? Can't the caller just pass
backing_bs(base) instead? I'm talking also about the existing case of
bdrv_is_allocated_above().




include_base was introduced for the case when the caller doesn't own 
backing_bs(base) and therefore shouldn't do operations that depend on 
backing_bs(base) and may yield (block_status can). In particular, this is the 
case in block-stream, where the link to base is not frozen.
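To illustrate why include_base is useful, here is a toy backing-chain walk with the same loop shape as the patch (the Node type and walk_chain() are hypothetical). With include_base, the walk visits base and then breaks; in neither mode does it read base->backing, which is the node the caller may not own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: just a name and a backing pointer. */
typedef struct Node {
    const char *name;
    struct Node *backing;
} Node;

/* Record the names of the nodes below @top, up to (maybe including) @base. */
static size_t walk_chain(Node *top, Node *base, bool include_base,
                         const char **visited, size_t max)
{
    size_t n = 0;

    for (Node *p = top->backing; include_base || p != base; p = p->backing) {
        assert(p && n < max);
        visited[n++] = p->name; /* stands in for the block-status query */
        if (p == base) {
            assert(include_base);
            break; /* never step to base->backing */
        }
    }
    return n;
}
```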


--
Best regards,
Vladimir



Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions

2020-09-23 Thread Vladimir Sementsov-Ogievskiy

17.09.2020 19:09, Andrey Shinkevich wrote:

On 04.09.2020 14:22, Max Reitz wrote:

On 28.08.20 18:52, Andrey Shinkevich wrote:

Provide API for the COR-filter insertion/removal.

...

Also, drop the filter child permissions for an inactive state when the
filter node is being removed.

Do we need .active for that?  Shouldn’t it be sufficient to just not
require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
node (i.e. perm == 0 in cor_child_perm())?

Of course, using an .active field works, too.  But Vladimir wanted a
similar field in the preallocate filter, and there already is in
backup-top.  I feel like there shouldn’t be a need for this.

.bdrv_child_perm() should generally be able to decide what permissions
it needs based on the information it gets.  If every driver needs to
keep track of whether it has any parents and feed that information into
.bdrv_child_perm() as external state, then something feels wrong.

If perm == 0 is not sufficient to rule out any parents, we should just
explicitly add a boolean that tells .bdrv_child_perm() whether there are
any parents or not – shouldn’t be too difficult.



The issue is that we fail in bdrv_check_update_perm() with "Conflicts with use 
by..." if *nperm = 0 and *nshared = BLK_PERM_ALL are not set before the call to 
bdrv_replace_node(). The filter is still in the backing chain at that moment and has a parent 
with child->perm != 0.

The implementation of .bdrv_child_perm() in the given patch is similar to 
the one in block/mirror.c.

How could we resolve the issue at the generic layer?




The problem is that when we update permissions in bdrv_replace_node(), we consider the new placement for the "to" node, 
but the old placement for the "from" node. So, during the update, we may apply stricter permission requirements to 
"from" than needed, and they conflict with the new "to" permissions.

We should consider all graph changes for a permission update simultaneously, in 
the same transaction. For this, we need to refactor the permission update system to work 
on a queue of nodes instead of simple DFS recursion. And in the queue, all the 
nodes to update should be topologically sorted. This way, when we update 
node N, all its parents are already updated. And of course, we should make 
the no-perm graph update before the permission update, and roll back the no-perm movement if 
the permission update fails.

And we need topological sort anyway. Consider the following example:

A -
|  \
|  v
|  B
|  |
v  /
C<-

A is a parent of B and C; B is a parent of C.

Obviously, to update permissions, we should go in the order A, B, C, so that when we 
update C, all its parents' permissions are already updated. But with the current 
approach (simple recursion), we can update in the sequence A, C, B, C (C is updated 
twice). On the first update of C, we consider the old B permissions, and so do the wrong 
thing. If it succeeds, all is OK: on the second update of C we finish with the correct 
graph. But if the wrong thing fails, we break the whole process for no reason 
(it's possible that the updated B permissions would be less strict, but we never 
check that).

I'll work on a patch for it.

--
Best regards,
Vladimir



Re: [PATCH v2 0/2] block: deprecate the sheepdog driver

2020-09-22 Thread Vladimir Sementsov-Ogievskiy

22.09.2020 21:11, Neal Gompa wrote:

On Tue, Sep 22, 2020 at 1:43 PM Daniel P. Berrangé  wrote:


On Tue, Sep 22, 2020 at 01:09:06PM -0400, Neal Gompa wrote:

On Tue, Sep 22, 2020 at 12:16 PM Daniel P. Berrangé  wrote:


2 years back I proposed dropping the sheepdog mailing list from the
MAINTAINERS file, but somehow the patch never got picked up:

   https://lists.gnu.org/archive/html/qemu-block/2018-03/msg01048.html

So here I am with the same patch again.

This time I go further and deprecate the sheepdog driver entirely.
See the rationale in the second patch commit message.

Daniel P. Berrangé (2):
   block: drop moderated sheepdog mailing list from MAINTAINERS file
   block: deprecate the sheepdog block driver

  MAINTAINERS|  1 -
  block/sheepdog.c   | 15 +++
  configure  |  5 +++--
  docs/system/deprecated.rst |  9 +
  4 files changed, 27 insertions(+), 3 deletions(-)

--
2.26.2




I don't know of anyone shipping this other than Fedora, and I
certainly don't use it there.

Upstream looks like it's unmaintained now, with no commits in over two
years: https://github.com/sheepdog/sheepdog/commits/master

Can we also get a corresponding change to eliminate this from libvirt?


This is only deprecation in QEMU, the feature still exists and is
intended to be as fully functional as in previous releases.

Assuming QEMU actually deletes the feature at end of the deprecation
cycle, then libvirt would look at removing its own support.



Does that stop us from deprecating it in libvirt though?



I think not. Libvirt is not intended to support all QEMU features, and
I'm sure it doesn't. And it can safely deprecate features even if they
are not deprecated in QEMU.

The only possible conflict is when QEMU wants to deprecate something
that libvirt wants to continue supporting (or even can't live without).

--
Best regards,
Vladimir



Re: [PATCH v2 2/2] block: deprecate the sheepdog block driver

2020-09-22 Thread Vladimir Sementsov-Ogievskiy

22.09.2020 19:16, Daniel P. Berrangé wrote:

This thread from a little over a year ago:

   http://lists.wpkg.org/pipermail/sheepdog/2019-March/thread.html

states that sheepdog is no longer actively developed. The only mentioned
users are some companies who are said to have it for legacy reasons with
plans to replace it by Ceph. There is talk about cutting out existing
features to turn it into a simple demo of how to write a distributed
block service. There is no evidence of anyone working on that idea:

   https://github.com/sheepdog/sheepdog/commits/master

No real commits to git since Jan 2018, and before then just some minor
technical debt cleanup.

There is essentially no activity on the mailing list aside from
patches to QEMU that get CC'd due to our MAINTAINERS entry.

Fedora packages for sheepdog failed to build from upstream source
because of the stricter linker that no longer merges duplicate
global symbols. Fedora patches it to add the missing "extern"
annotations and presumably other distros do too, but upstream source
remains broken.

There is only basic compile testing, no functional testing of the
driver.

Since there are no build prerequisites, the sheepdog driver is currently
enabled unconditionally. Once deprecated, this would result in configure
issuing a deprecation warning by default for all users. Thus the configure
default is changed to disable it, requiring users to pass --enable-sheepdog
to build the driver.

Signed-off-by: Daniel P. Berrangé


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v2 1/2] block: drop moderated sheepdog mailing list from MAINTAINERS file

2020-09-22 Thread Vladimir Sementsov-Ogievskiy

22.09.2020 19:16, Daniel P. Berrangé wrote:

The sheepdog mailing list is set up to stop and queue messages from
non-subscribers, pending moderator approval. Unfortunately it seems
that the moderation queue is not actively dealt with. Even when messages
are approved, the sender is never added to the whitelist, so every
future mail from the same sender continues to get stopped for moderation.

MAINTAINERS entries should be responsive and not unnecessarily block
mails from QEMU contributors, so drop the sheepdog mailing list.

Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Daniel P. Berrangé


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v2 17/20] backup: move to block-copy

2020-09-21 Thread Vladimir Sementsov-Ogievskiy

23.07.2020 12:47, Max Reitz wrote:

On 01.06.20 20:11, Vladimir Sementsov-Ogievskiy wrote:

This brings async request handling and block-status driven chunk sizes
to backup out of the box, which improves backup performance.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block-copy.h |   9 +--
  block/backup.c | 145 +++--
  block/block-copy.c |  21 ++
  3 files changed, 83 insertions(+), 92 deletions(-)


This patch feels like it should be multiple ones.  I don’t see why a
patch that lets backup use block-copy (more) should have to modify the
block-copy code.

More specifically: I think that block_copy_set_progress_callback()
should be removed in a separate patch.  Also, moving
@cb_opaque/@progress_opaque from BlockCopyState into BlockCopyCallState
seems like a separate patch to me, too.

(I presume if the cb had had its own opaque object from patch 5 on,
there wouldn’t be this problem now.  We’d just stop using the progress
callback in backup, and remove it in one separate patch.)

[...]


diff --git a/block/backup.c b/block/backup.c
index ec2676abc2..59c00f5293 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -44,42 +44,19 @@ typedef struct BackupBlockJob {
  BlockdevOnError on_source_error;
  BlockdevOnError on_target_error;
  uint64_t len;
-uint64_t bytes_read;
  int64_t cluster_size;
  int max_workers;
  int64_t max_chunk;
  
  BlockCopyState *bcs;

+
+BlockCopyCallState *bcs_call;


Can this be more descriptive?  E.g. background_bcs?  bg_bcs_call?  bg_bcsc?


+int ret;
+bool error_is_read;
  } BackupBlockJob;
  
  static const BlockJobDriver backup_job_driver;
  


[...]


  static int coroutine_fn backup_loop(BackupBlockJob *job)
  {
-bool error_is_read;
-int64_t offset;
-BdrvDirtyBitmapIter *bdbi;
-int ret = 0;
+while (true) { /* retry loop */
+assert(!job->bcs_call);
+job->bcs_call = block_copy_async(job->bcs, 0,
+ QEMU_ALIGN_UP(job->len,
+   job->cluster_size),
+ true, job->max_workers, job->max_chunk,
+ backup_block_copy_callback, job);
  
-bdbi = bdrv_dirty_iter_new(block_copy_dirty_bitmap(job->bcs));

-while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
-do {
-if (yield_and_check(job)) {
-goto out;
+while (job->bcs_call && !job->common.job.cancelled) {
+/* wait and handle pauses */


Doesn’t someone need to reset BlockCopyCallState.cancelled when the job
is unpaused?  I can’t see anyone doing that.

Well, or even just reentering the block-copy operation after
backup_pause() has cancelled it.  Is there some magic I’m missing?

Does pausing (which leads to cancelling the block-copy operation) just
yield to the CB being invoked, so the copy operation is considered done,
and then we just enter the next iteration of the loop and try again?


Yes, that's my idea: we cancel on pause and just run a new block_copy operation
on resume.


But cancelling the block-copy operation leads to it returning 0, AFAIR,
so...


Looks like it should be fixed, either by returning ECANCELED or by checking the bitmap.
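To make the concern concrete, here is a toy Python model of the retry loop, not QEMU code. A set of cluster indices stands in for the dirty bitmap, and `copy_pass()` and `backup_loop()` are hypothetical names. A pass cut short by a pause still reports 0, so a loop that trusts the return value stops with work left; checking that the bitmap is empty (one of the two fixes mentioned above) catches it.

```python
def copy_pass(dirty, pause_after=None):
    """Copy clusters from the dirty set, stopping early if paused.

    Like the cancelled block-copy call discussed above, it returns 0
    even when it stopped before copying everything.
    """
    copied = 0
    for cluster in sorted(dirty):
        if pause_after is not None and copied == pause_after:
            return 0  # cancelled by pause, but reported as success
        dirty.discard(cluster)  # copied: clear its bit
        copied += 1
    return 0


def backup_loop(dirty, pauses, check_bitmap):
    """Run passes until done; pauses[i] cuts pass i short after N clusters.

    Returns the number of passes that were run.
    """
    passes = 0
    while True:
        pause_after = pauses[passes] if passes < len(pauses) else None
        ret = copy_pass(dirty, pause_after)
        passes += 1
        if ret == 0 and (not check_bitmap or not dirty):
            return passes
```

With `check_bitmap=False` the loop returns after the paused pass with seven of ten clusters still dirty; with `check_bitmap=True` it runs a second pass and finishes the copy.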




+
+job_pause_point(&job->common.job);
+
+if (job->bcs_call && !job->common.job.cancelled) {
+job_yield(&job->common.job);
  }
-ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
-if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
-   BLOCK_ERROR_ACTION_REPORT)
-{
-goto out;
+}
+
+if (!job->bcs_call && job->ret == 0) {
+/* Success */
+return 0;


...I would assume we return here when the job is paused.


+}
+
+if (job->common.job.cancelled) {
+if (job->bcs_call) {
+block_copy_cancel(job->bcs_call);
  }
-} while (ret < 0);
+return 0;
+}
+
+if (!job->bcs_call && job->ret < 0 &&


Is it even possible for bcs_call to be non-NULL here?


+(backup_error_action(job, job->error_is_read, -job->ret) ==
+ BLOCK_ERROR_ACTION_REPORT))
+{
+return job->ret;
+}


So if we get an error, and the error action isn’t REPORT, we just try
the whole operation again?  That doesn’t sound very IGNORE-y to me.


Not the whole operation. We have the dirty bitmap: clusters that were already copied are
not touched again.
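A toy Python model of that answer, with hypothetical names and the assumption that errors are transient: each retry starts a new pass over the dirty set, but clusters copied by an earlier pass are already clean and are skipped, so nothing is copied twice.

```python
import errno


def copy_pass(dirty, copied, failing):
    """One block-copy pass over the dirty clusters, in order.

    A cluster in `failing` fails once (a transient error) and stays
    dirty; the pass stops at the first error.
    """
    for cluster in sorted(dirty):
        if cluster in failing:
            failing.discard(cluster)
            return -errno.EIO
        dirty.discard(cluster)  # copied: clear its bit in the bitmap
        copied.append(cluster)
    return 0


def backup_ignoring_errors(dirty, failing):
    """Re-run passes after errors, as when the error action is not REPORT.

    Returns the clusters in copy order and the number of passes.
    """
    copied = []
    passes = 1
    while copy_pass(dirty, copied, failing) < 0:
        passes += 1
    return copied, passes
```

With six clusters and transient failures on clusters 2 and 4, three passes are needed, yet every cluster appears exactly once in the copy log.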



--
Best regards,
Vladimir



Re: [PATCH v2 04/20] block/block-copy: More explicit call_state

2020-09-18 Thread Vladimir Sementsov-Ogievskiy

17.07.2020 16:45, Max Reitz wrote:

On 01.06.20 20:11, Vladimir Sementsov-Ogievskiy wrote:

Refactor common path to use BlockCopyCallState pointer as parameter, to
prepare it for use in asynchronous block-copy (at least, we'll need to
run block-copy in a coroutine, passing all the parameters as one
pointer).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/block-copy.c | 51 ++
  1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 43a018d190..75882a094c 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c


[...]


@@ -646,16 +653,16 @@ out:
   * it means that some I/O operation failed in context of _this_ block_copy call,
   * not some parallel operation.
   */
-int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
-bool *error_is_read)
+static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
  {
  int ret;
  
  do {

-ret = block_copy_dirty_clusters(s, offset, bytes, error_is_read);
+ret = block_copy_dirty_clusters(call_state);


It’s possible that much of this code will change in a future patch of
this series, but as it is, it might be nice to make
block_copy_dirty_clusters’s argument a const pointer so it’s clear that
the call to block_copy_wait_one() below will use the original @offset
and @bytes values.


Hm. I'm trying this, and it doesn't work:

block_copy_task_entry() wants to change call_state:

   t->call_state->failed = true;



*shrug*

Reviewed-by: Max Reitz 

  
  if (ret == 0) {

-ret = block_copy_wait_one(s, offset, bytes);
+ret = block_copy_wait_one(call_state->s, call_state->offset,
+  call_state->bytes);
  }
  
  /*





--
Best regards,
Vladimir



[PATCH v6 15/15] scripts/simplebench: add bench_prealloc.py

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
Benchmark for new preallocate filter.

Example usage:
./bench_prealloc.py ../../build/qemu-img \
ssd-ext4:/path/to/mount/point \
ssd-xfs:/path2 hdd-ext4:/path3 hdd-xfs:/path4

The benchmark shows the performance improvement (or degradation) from
using the new preallocate filter with a qcow2 image.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/simplebench/bench_prealloc.py | 128 ++
 1 file changed, 128 insertions(+)
 create mode 100755 scripts/simplebench/bench_prealloc.py

diff --git a/scripts/simplebench/bench_prealloc.py b/scripts/simplebench/bench_prealloc.py
new file mode 100755
index 00..fda4b3410e
--- /dev/null
+++ b/scripts/simplebench/bench_prealloc.py
@@ -0,0 +1,128 @@
+#!/usr/bin/env python3
+#
+# Benchmark preallocate filter
+#
+# Copyright (c) 2020 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+
+import sys
+import os
+import subprocess
+import re
+
+import simplebench
+
+
+def qemu_img_bench(args):
+    p = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
+                       universal_newlines=True)
+
+    if p.returncode == 0:
+        try:
+            m = re.search(r'Run completed in (\d+.\d+) seconds.', p.stdout)
+            return {'seconds': float(m.group(1))}
+        except Exception:
+            return {'error': f'failed to parse qemu-img output: {p.stdout}'}
+    else:
+        return {'error': f'qemu-img failed: {p.returncode}: {p.stdout}'}
+
+
+def bench_func(env, case):
+    fname = f"{case['dir']}/prealloc-test.qcow2"
+    try:
+        os.remove(fname)
+    except OSError:
+        pass
+
+    subprocess.run([env['qemu-img-binary'], 'create', '-f', 'qcow2', fname,
+                    '16G'], stdout=subprocess.DEVNULL,
+                   stderr=subprocess.DEVNULL, check=True)
+
+    args = [env['qemu-img-binary'], 'bench', '-c', str(case['count']),
+            '-d', '64', '-s', case['block-size'], '-t', 'none', '-n', '-w']
+    if env['prealloc']:
+        args += ['--image-opts',
+                 'driver=qcow2,file.driver=preallocate,file.file.driver=file,'
+                 f'file.file.filename={fname}']
+    else:
+        args += ['-f', 'qcow2', fname]
+
+    return qemu_img_bench(args)
+
+
+def auto_count_bench_func(env, case):
+    case['count'] = 100
+    while True:
+        res = bench_func(env, case)
+        if 'error' in res:
+            return res
+
+        if res['seconds'] >= 1:
+            break
+
+        case['count'] *= 10
+
+    if res['seconds'] < 5:
+        case['count'] = round(case['count'] * 5 / res['seconds'])
+        res = bench_func(env, case)
+        if 'error' in res:
+            return res
+
+    res['iops'] = case['count'] / res['seconds']
+    return res
+
+
+if __name__ == '__main__':
+    if len(sys.argv) < 2:
+        print(f'USAGE: {sys.argv[0]}  '
+              'DISK_NAME:DIR_PATH ...')
+        exit(1)
+
+    qemu_img = sys.argv[1]
+
+    envs = [
+        {
+            'id': 'no-prealloc',
+            'qemu-img-binary': qemu_img,
+            'prealloc': False
+        },
+        {
+            'id': 'prealloc',
+            'qemu-img-binary': qemu_img,
+            'prealloc': True
+        }
+    ]
+
+    aligned_cases = []
+    unaligned_cases = []
+
+    for disk in sys.argv[2:]:
+        name, path = disk.split(':')
+        aligned_cases.append({
+            'id': f'{name}, aligned sequential 16k',
+            'block-size': '16k',
+            'dir': path
+        })
+        unaligned_cases.append({
+            'id': f'{name}, unaligned sequential 64k',
+            'block-size': '16k',
+            'dir': path
+        })
+
+    result = simplebench.bench(auto_count_bench_func, envs,
+                               aligned_cases + unaligned_cases, count=5)
+    print(simplebench.ascii(result))
-- 
2.21.3
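The request-count calibration in auto_count_bench_func() (start at 100 requests, multiply by ten until a run takes at least one second, then rescale so the final run lasts about five seconds) can be checked without qemu-img. A minimal sketch: `auto_count()` and the fake `run` timing function are illustrative stand-ins, not part of the patch.

```python
def auto_count(run, start=100, min_seconds=1, target_seconds=5):
    """Calibrate a request count the way auto_count_bench_func() does.

    `run(count)` stands in for bench_func(): it returns the elapsed
    seconds for `count` requests.  Returns (count, iops).
    """
    count = start
    while True:
        seconds = run(count)
        if seconds >= min_seconds:
            break
        count *= 10  # too short to measure reliably: ten times more work

    if seconds < target_seconds:
        # Scale linearly so the final run lasts about target_seconds.
        count = round(count * target_seconds / seconds)
        seconds = run(count)

    return count, count / seconds


# A fake device doing 512 requests per second needs one rescale:
print(auto_count(lambda c: c / 512))   # (2560, 512.0)
# At 1024 requests per second, 10000 requests already exceed 5 s:
print(auto_count(lambda c: c / 1024))  # (10000, 1024.0)
```

The linear rescale assumes throughput does not depend on the request count, which is reasonable for a sequential write benchmark but is only an approximation.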




Re: [PATCH 0/4] nbd reconnect new fixes

2020-09-18 Thread Vladimir Sementsov-Ogievskiy

ping

03.09.2020 22:02, Vladimir Sementsov-Ogievskiy wrote:

Hi! Let's continue fixing the nbd reconnect feature.

This series is based on "[PULL 0/6] NBD patches for 2020-09-02"
Based-on: <20200902215400.2673028-1-ebl...@redhat.com>

Vladimir Sementsov-Ogievskiy (4):
   block/nbd: fix drain dead-lock because of nbd reconnect-delay
   block/nbd: correctly use qio_channel_detach_aio_context when needed
   block/nbd: fix reconnect-delay
   block/nbd: nbd_co_reconnect_loop(): don't connect if drained

  block/nbd.c | 71 -
  1 file changed, 60 insertions(+), 11 deletions(-)




--
Best regards,
Vladimir



[PATCH v6 13/15] scripts/simplebench: improve view of ascii table

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
Introduce dynamic float precision and use percentage to show delta.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/simplebench/simplebench.py | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
index 716d7fe9b2..56d3a91ea2 100644
--- a/scripts/simplebench/simplebench.py
+++ b/scripts/simplebench/simplebench.py
@@ -79,10 +79,34 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
     return result
 
 
+def format_float(x):
+    res = round(x)
+    if res >= 100:
+        return str(res)
+
+    res = f'{x:.1f}'
+    if len(res) >= 4:
+        return res
+
+    return f'{x:.2f}'
+
+
+def format_percent(x):
+    x *= 100
+
+    res = round(x)
+    if res >= 10:
+        return str(res)
+
+    return f'{x:.1f}' if res >= 1 else f'{x:.2f}'
+
+
 def ascii_one(result):
     """Return ASCII representation of bench_one() returned dict."""
     if 'average' in result:
-        s = '{:.2f} +- {:.2f}'.format(result['average'], result['delta'])
+        avg = result['average']
+        delta_pr = result['delta'] / avg
+        s = f'{format_float(avg)}±{format_percent(delta_pr)}%'
         if 'n-failed' in result:
             s += '\n({} failed)'.format(result['n-failed'])
         return s
-- 
2.21.3
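To see what the new cell format looks like, the two helpers from this patch can be run on their own. `format_cell()` is a hypothetical wrapper that mirrors the new `ascii_one()` expression for a successful result: about three significant digits for the value, and the spread as a percentage of it.

```python
def format_float(x):
    """Roughly three significant digits: 250, 12.3, 1.25."""
    res = round(x)
    if res >= 100:
        return str(res)

    res = f'{x:.1f}'
    if len(res) >= 4:
        return res

    return f'{x:.2f}'


def format_percent(x):
    """Format a fraction (e.g. 0.034) as a percentage string ('3.4')."""
    x *= 100

    res = round(x)
    if res >= 10:
        return str(res)

    return f'{x:.1f}' if res >= 1 else f'{x:.2f}'


def format_cell(average, delta):
    # Same expression as the new ascii_one(): value, then relative spread.
    return f'{format_float(average)}±{format_percent(delta / average)}%'


print(format_cell(12.345, 0.42))   # 12.3±3.4%
print(format_cell(250.0, 50.0))    # 250±20%
print(format_cell(1.25, 0.00125))  # 1.25±0.10%
```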




[PATCH v6 14/15] scripts/simplebench: improve ascii table: add difference line

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
Performance improvements / degradations are usually discussed in
percentage. Let's make the script calculate it for us.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/simplebench/simplebench.py | 46 +++---
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
index 56d3a91ea2..0ff05a38b8 100644
--- a/scripts/simplebench/simplebench.py
+++ b/scripts/simplebench/simplebench.py
@@ -153,14 +153,22 @@ def bench(test_func, test_envs, test_cases, *args, **vargs):
 
 def ascii(results):
     """Return ASCII representation of bench() returned dict."""
-    from tabulate import tabulate
+    import tabulate
+
+    # We want leading whitespace for difference row cells (see below)
+    tabulate.PRESERVE_WHITESPACE = True
 
     dim = None
-    tab = [[""] + [c['id'] for c in results['envs']]]
+    tab = [
+        # Environment columns are named A, B, ...
+        [""] + [chr(ord('A') + i) for i in range(len(results['envs']))],
+        [""] + [c['id'] for c in results['envs']]
+    ]
     for case in results['cases']:
         row = [case['id']]
+        case_results = results['tab'][case['id']]
         for env in results['envs']:
-            res = results['tab'][case['id']][env['id']]
+            res = case_results[env['id']]
             if dim is None:
                 dim = res['dimension']
             else:
@@ -168,4 +176,34 @@ def ascii(results):
             row.append(ascii_one(res))
         tab.append(row)
 
-    return f'All results are in {dim}\n\n' + tabulate(tab)
+        # Add row of difference between columns. For each column starting from
+        # B we calculate difference with all previous columns.
+        row = ['', '']  # case name and first column
+        for i in range(1, len(results['envs'])):
+            cell = ''
+            env = results['envs'][i]
+            res = case_results[env['id']]
+
+            if 'average' not in res:
+                # Failed result
+                row.append(cell)
+                continue
+
+            for j in range(0, i):
+                env_j = results['envs'][j]
+                res_j = case_results[env_j['id']]
+
+                if 'average' not in res_j:
+                    # Failed result
+                    cell += ' --'
+                    continue
+
+                col_j = chr(ord('A') + j)
+                avg_j = res_j['average']
+                delta = (res['average'] - avg_j) / avg_j * 100
+                delta_delta = (res['delta'] + res_j['delta']) / avg_j * 100
+                cell += f' {col_j}{round(delta):+}±{round(delta_delta)}%'
+            row.append(cell)
+        tab.append(row)
+
+    return f'All results are in {dim}\n\n' + tabulate.tabulate(tab)
-- 
2.21.3
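The arithmetic behind the new difference row can be exercised without tabulate. In this sketch a hypothetical `diff_cell()` holds the patch's inner loop for one cell: for column i it reports, against every earlier column j, the relative change of the averages and the sum of both spreads, each as a percentage of column j's average.

```python
def diff_cell(results, i):
    """Build the difference-row cell for column i (i >= 1).

    `results` lists per-environment dicts with 'average' and 'delta';
    a dict without 'average' represents a failed result.
    """
    res = results[i]
    if 'average' not in res:
        return ''  # this column failed: leave the cell empty
    cell = ''
    for j in range(i):
        res_j = results[j]
        if 'average' not in res_j:
            cell += ' --'  # nothing to compare against
            continue
        col_j = chr(ord('A') + j)
        avg_j = res_j['average']
        delta = (res['average'] - avg_j) / avg_j * 100
        delta_delta = (res['delta'] + res_j['delta']) / avg_j * 100
        cell += f' {col_j}{round(delta):+}±{round(delta_delta)}%'
    return cell


results = [{'average': 100.0, 'delta': 2.0},
           {'average': 125.0, 'delta': 1.0},
           {'average': 80.0, 'delta': 4.0}]
print(diff_cell(results, 1))  # " A+25±3%"
print(diff_cell(results, 2))  # " A-20±6% B-36±4%"
```

The leading space in each cell is why the patch sets `tabulate.PRESERVE_WHITESPACE`.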




[PATCH v6 09/15] qemu-io: add preallocate mode parameter for truncate command

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
This will be used in a further test.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qemu-io-cmds.c | 46 --
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index baeae86d8c..64f0246a71 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1698,13 +1698,42 @@ static const cmdinfo_t flush_cmd = {
 .oneline= "flush all in-core file state to disk",
 };
 
+static int truncate_f(BlockBackend *blk, int argc, char **argv);
+static const cmdinfo_t truncate_cmd = {
+.name   = "truncate",
+.altname= "t",
+.cfunc  = truncate_f,
+.perm   = BLK_PERM_WRITE | BLK_PERM_RESIZE,
+.argmin = 1,
+.argmax = 3,
+.args   = "[-m prealloc_mode] off",
+.oneline= "truncates the current file at the given offset",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv)
 {
 Error *local_err = NULL;
 int64_t offset;
-int ret;
+int c, ret;
+PreallocMode prealloc = PREALLOC_MODE_OFF;
 
-offset = cvtnum(argv[1]);
+while ((c = getopt(argc, argv, "m:")) != -1) {
+switch (c) {
+case 'm':
+prealloc = qapi_enum_parse(&PreallocMode_lookup, optarg,
+   PREALLOC_MODE__MAX, NULL);
+if (prealloc == PREALLOC_MODE__MAX) {
+error_report("Invalid preallocation mode '%s'", optarg);
+return -EINVAL;
+}
+break;
+default:
+qemuio_command_usage(&truncate_cmd);
+return -EINVAL;
+}
+}
+
+offset = cvtnum(argv[optind]);
 if (offset < 0) {
-print_cvtnum_err(offset, argv[1]);
+print_cvtnum_err(offset, argv[optind]);
 return offset;
@@ -1715,7 +1744,7 @@ static int truncate_f(BlockBackend *blk, int argc, char **argv)
  * exact=true.  It is better to err on the "emit more errors" side
  * than to be overly permissive.
  */
-ret = blk_truncate(blk, offset, false, PREALLOC_MODE_OFF, 0, &local_err);
+ret = blk_truncate(blk, offset, false, prealloc, 0, &local_err);
 if (ret < 0) {
 error_report_err(local_err);
 return ret;
@@ -1724,17 +1753,6 @@ static int truncate_f(BlockBackend *blk, int argc, char **argv)
 return 0;
 }
 
-static const cmdinfo_t truncate_cmd = {
-.name   = "truncate",
-.altname= "t",
-.cfunc  = truncate_f,
-.perm   = BLK_PERM_WRITE | BLK_PERM_RESIZE,
-.argmin = 1,
-.argmax = 1,
-.args   = "off",
-.oneline= "truncates the current file at the given offset",
-};
-
 static int length_f(BlockBackend *blk, int argc, char **argv)
 {
 int64_t size;
-- 
2.21.3
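The option handling added to truncate_f() follows the usual getopt pattern: options first, then the offset as the first remaining argument. As a rough Python model (standard `getopt` module; `parse_truncate_args()` and the mode tuple are illustrative, assuming the four QAPI PreallocMode values off, metadata, falloc and full):

```python
import getopt

# Assumed values of the QAPI PreallocMode enum.
PREALLOC_MODES = ('off', 'metadata', 'falloc', 'full')


def parse_truncate_args(argv):
    """Parse 'truncate [-m prealloc_mode] off' like the patched truncate_f().

    argv[0] is the command name, as in qemu-io.  Returns (offset, mode)
    and raises ValueError on bad input.  Only plain integers are accepted
    for the offset here; qemu-io's cvtnum() also understands suffixes.
    """
    try:
        opts, rest = getopt.getopt(argv[1:], 'm:')
    except getopt.GetoptError as err:
        raise ValueError(str(err)) from err

    prealloc = 'off'  # default, as PREALLOC_MODE_OFF in the C code
    for opt, arg in opts:
        if opt == '-m':
            if arg not in PREALLOC_MODES:
                raise ValueError(f"Invalid preallocation mode '{arg}'")
            prealloc = arg

    if len(rest) != 1:
        raise ValueError('usage: truncate [-m prealloc_mode] off')
    # The offset is the first non-option argument, not argv[1].
    return int(rest[0]), prealloc


print(parse_truncate_args(['truncate', '-m', 'falloc', '1048576']))
# (1048576, 'falloc')
```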




[PATCH v6 12/15] scripts/simplebench: support iops

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
Support benchmarks that return iops rather than seconds. We'll use it in
a further new test.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/simplebench/simplebench.py | 35 +++---
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
index 59e7314ff6..716d7fe9b2 100644
--- a/scripts/simplebench/simplebench.py
+++ b/scripts/simplebench/simplebench.py
@@ -24,9 +24,12 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
 
     test_func   -- benchmarking function with prototype
                    test_func(env, case), which takes test_env and test_case
-                   arguments and returns {'seconds': int} (which is benchmark
-                   result) on success and {'error': str} on error. Returned
-                   dict may contain any other additional fields.
+                   arguments and on success returns a dict with 'seconds'
+                   or 'iops' (or both) fields, specifying the benchmark
+                   result. If both are provided, 'iops' is considered the
+                   main result and 'seconds' is just additional info. On
+                   failure test_func should return {'error': str}. Returned
+                   dict may contain any other additional fields.
     test_env    -- test environment - opaque first argument for test_func
     test_case   -- test case - opaque second argument for test_func
     count       -- how many times to call test_func, to calculate average
@@ -34,6 +37,7 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
 
     Returns dict with the following fields:
         'runs':     list of test_func results
+        'dimension': dimension of results, may be 'seconds' or 'iops'
         'average':  average seconds per run (exists only if at least one run
                     succeeded)
         'delta':    maximum delta between test_func result and the average
@@ -54,11 +58,20 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
 
     result = {'runs': runs}
 
-    successed = [r for r in runs if ('seconds' in r)]
+    successed = [r for r in runs if ('seconds' in r or 'iops' in r)]
     if successed:
-        avg = sum(r['seconds'] for r in successed) / len(successed)
+        dim = 'iops' if ('iops' in successed[0]) else 'seconds'
+        if 'iops' in successed[0]:
+            assert all('iops' in r for r in successed)
+            dim = 'iops'
+        else:
+            assert all('seconds' in r for r in successed)
+            assert all('iops' not in r for r in successed)
+            dim = 'seconds'
+        avg = sum(r[dim] for r in successed) / len(successed)
+        result['dimension'] = dim
         result['average'] = avg
-        result['delta'] = max(abs(r['seconds'] - avg) for r in successed)
+        result['delta'] = max(abs(r[dim] - avg) for r in successed)
 
     if len(successed) < count:
         result['n-failed'] = count - len(successed)
@@ -118,11 +131,17 @@ def ascii(results):
     """Return ASCII representation of bench() returned dict."""
     from tabulate import tabulate
 
+    dim = None
     tab = [[""] + [c['id'] for c in results['envs']]]
     for case in results['cases']:
         row = [case['id']]
         for env in results['envs']:
-            row.append(ascii_one(results['tab'][case['id']][env['id']]))
+            res = results['tab'][case['id']][env['id']]
+            if dim is None:
+                dim = res['dimension']
+            else:
+                assert dim == res['dimension']
+            row.append(ascii_one(res))
         tab.append(row)
 
-    return tabulate(tab)
+    return f'All results are in {dim}\n\n' + tabulate(tab)
-- 
2.21.3
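A standalone sketch of the aggregation rule this patch gives bench_one(): pick the dimension from the first successful run ('iops' wins when present), then average and take the maximum deviation in that dimension. `summarize()` is a hypothetical, slightly condensed restatement, not the patch's function.

```python
def summarize(runs, count):
    """Aggregate runs like the patched bench_one().

    Each run is {'seconds': x}, {'iops': x} (optionally with 'seconds'
    as extra info) or {'error': str}.
    """
    result = {'runs': runs}
    ok = [r for r in runs if 'seconds' in r or 'iops' in r]
    if ok:
        dim = 'iops' if 'iops' in ok[0] else 'seconds'
        # All successful runs must agree on the main dimension.
        assert all(dim in r for r in ok)
        if dim == 'seconds':
            assert not any('iops' in r for r in ok)
        avg = sum(r[dim] for r in ok) / len(ok)
        result['dimension'] = dim
        result['average'] = avg
        result['delta'] = max(abs(r[dim] - avg) for r in ok)
    if len(ok) < count:
        result['n-failed'] = count - len(ok)
    return result


res = summarize([{'iops': 100.0, 'seconds': 2.0}, {'iops': 120.0},
                 {'error': 'qemu-img failed'}], count=3)
print(res['dimension'], res['average'], res['delta'], res['n-failed'])
# iops 110.0 10.0 1
```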




[PATCH v6 07/15] block: bdrv_check_perm(): process children anyway

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
Do generic processing even for drivers which define a .bdrv_check_perm
handler. It's needed for the upcoming preallocate filter: it will need to
do an additional action on bdrv_check_perm, but doesn't want to
reimplement the generic logic.

The patch doesn't change existing behaviour: the only driver that
implements bdrv_check_perm is file-posix, but it never has any
children.

Note that bdrv_set_perm() already doesn't stop processing when the driver
has a .bdrv_set_perm handler.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 9538af4884..165c2d3cb2 100644
--- a/block.c
+++ b/block.c
@@ -1964,8 +1964,7 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
 /*
  * Check whether permissions on this node can be changed in a way that
  * @cumulative_perms and @cumulative_shared_perms are the new cumulative
- * permissions of all its parents. This involves checking whether all necessary
- * permission changes to child nodes can be performed.
+ * permissions of all its parents.
  *
  * Will set *tighten_restrictions to true if and only if new permissions have to
  * be taken or currently shared permissions are to be unshared.  Otherwise,
@@ -2047,8 +2046,11 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
 }
 
 if (drv->bdrv_check_perm) {
-return drv->bdrv_check_perm(bs, cumulative_perms,
-cumulative_shared_perms, errp);
+ret = drv->bdrv_check_perm(bs, cumulative_perms,
+   cumulative_shared_perms, errp);
+if (ret < 0) {
+return ret;
+}
 }
 
 /* Drivers that never have children can omit .bdrv_child_perm() */
-- 
2.21.3
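A toy Python model of the control-flow change. The `Node` class and `check_perm()` are hypothetical, and the cumulative and shared permission masks of the real code are left out: before the patch, a driver hook's result was returned directly and the children were never visited; after it, only an error returns early.

```python
class Node:
    def __init__(self, name, children=(), driver_check=None):
        self.name = name
        self.children = list(children)
        # Optional driver hook: returns 0 or a negative errno value.
        self.driver_check = driver_check


def check_perm(node, visited, early_return):
    """Recursive check that records the order in which nodes are visited.

    early_return=True models the old code, which returned the hook's
    result directly; False models the patched code, which only stops on
    error and otherwise goes on to the generic processing of children.
    """
    visited.append(node.name)
    if node.driver_check is not None:
        ret = node.driver_check(node)
        if early_return or ret < 0:
            return ret
    for child in node.children:
        ret = check_perm(child, visited, early_return)
        if ret < 0:
            return ret
    return 0


file_node = Node('file')
filt = Node('filter', [file_node], driver_check=lambda n: 0)
old, new = [], []
check_perm(filt, old, early_return=True)
check_perm(filt, new, early_return=False)
print(old, new)  # ['filter'] ['filter', 'file']
```

For a node without children, such as file-posix, both variants behave the same, which matches the claim that existing behaviour is unchanged.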




[PATCH v6 08/15] block: introduce preallocate filter

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
It's intended to be inserted between format and protocol nodes to
preallocate additional space (expanding the protocol file) on writes
crossing EOF. It improves performance on file systems with slow
allocation.
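How far the file grows is governed by the prealloc-size and prealloc-align options (defaults 128M and 1M, per the documentation in this patch). The following is a sketch of the assumed arithmetic, not code taken from block/preallocate.c (whose body is cut off below): extend past the end of the write by prealloc-size, then round up to prealloc-align. `prealloc_end()` is a hypothetical helper.

```python
MiB = 1024 * 1024


def align_up(n, align):
    """Round n up to a multiple of align."""
    return (n + align - 1) // align * align


def prealloc_end(offset, nbytes, prealloc_size=128 * MiB,
                 prealloc_align=1 * MiB):
    """Assumed new protocol-file length after a write crossing EOF."""
    return align_up(offset + nbytes + prealloc_size, prealloc_align)


print(prealloc_end(0, 1 * MiB) // MiB)                     # 129
print(prealloc_end(0, 1 * MiB, 20 * MiB, 5 * MiB) // MiB)  # 25
```

With prealloc-size=20M and prealloc-align=5M, a 1M write near the start of the file gives 25M, the same size that test_reopen_opts in test 298 expects.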

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 docs/system/qemu-block-drivers.rst.inc |  26 ++
 qapi/block-core.json   |  20 +-
 block/preallocate.c| 556 +
 block/meson.build  |   1 +
 4 files changed, 602 insertions(+), 1 deletion(-)
 create mode 100644 block/preallocate.c

diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index b052a6d14e..60a064b232 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -952,3 +952,29 @@ on host and see if there are locks held by the QEMU process on the image file.
 More than one byte could be locked by the QEMU instance, each byte of which
 reflects a particular permission that is acquired or protected by the running
 block driver.
+
+Filter drivers
+~~
+
+QEMU supports several filter drivers, which don't store any data but perform
+some additional tasks, hooking I/O requests.
+
+.. program:: filter-drivers
+.. option:: preallocate
+
+  The preallocate filter driver is intended to be inserted between format
+  and protocol nodes and preallocates some additional space
+  (expanding the protocol file) when writing past the file’s end. This can be
+  useful for file-systems with slow allocation.
+
+  Supported options:
+
+  .. program:: preallocate
+  .. option:: prealloc-align
+
+    On preallocation, align the file length to this value (in bytes), default 1M.
+
+  .. program:: preallocate
+  .. option:: prealloc-size
+
+    How much to preallocate (in bytes), default 128M.
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2d94873ca0..c8030e19b4 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2819,7 +2819,7 @@
 'cloop', 'compress', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps',
 'gluster', 'host_cdrom', 'host_device', 'http', 'https', 'iscsi',
 'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
-'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
+'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
 { 'name': 'replication', 'if': 'defined(CONFIG_REPLICATION)' },
 'sheepdog',
 'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
@@ -3088,6 +3088,23 @@
   'data': { 'aes': 'QCryptoBlockOptionsQCow',
 'luks': 'QCryptoBlockOptionsLUKS'} }
 
+##
+# @BlockdevOptionsPreallocate:
+#
+# Filter driver intended to be inserted between format and protocol node
+# and do preallocation in protocol node on write.
+#
+# @prealloc-align: on preallocation, align file length to this number,
+#  default 1048576 (1M)
+#
+# @prealloc-size: how much to preallocate, default 134217728 (128M)
+#
+# Since: 5.2
+##
+{ 'struct': 'BlockdevOptionsPreallocate',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { '*prealloc-align': 'int', '*prealloc-size': 'int' } }
+
 ##
 # @BlockdevOptionsQcow2:
 #
@@ -3993,6 +4010,7 @@
   'null-co':'BlockdevOptionsNull',
   'nvme':   'BlockdevOptionsNVMe',
   'parallels':  'BlockdevOptionsGenericFormat',
+  'preallocate':'BlockdevOptionsPreallocate',
   'qcow2':  'BlockdevOptionsQcow2',
   'qcow':   'BlockdevOptionsQcow',
   'qed':'BlockdevOptionsGenericCOWFormat',
diff --git a/block/preallocate.c b/block/preallocate.c
new file mode 100644
index 00..6510ad0149
--- /dev/null
+++ b/block/preallocate.c
@@ -0,0 +1,556 @@
+/*
+ * preallocate filter driver
+ *
+ * The driver performs preallocate operation: it is injected above
+ * some node, and before each write over EOF it does additional preallocating
+ * write-zeroes request.
+ *
+ * Copyright (c) 2020 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "qemu/option.h"
+#include "qemu/units.h"
+#include "block/block_int.h"
+
+
+typedef struct Preallo

[PATCH v6 10/15] iotests: qemu_io_silent: support --image-opts

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 91e4a57126..3d48108f3a 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -197,7 +197,12 @@ def qemu_io_log(*args):
 
 def qemu_io_silent(*args):
     '''Run qemu-io and return the exit code, suppressing stdout'''
-    args = qemu_io_args + list(args)
+    if '-f' in args or '--image-opts' in args:
+        default_args = qemu_io_args_no_fmt
+    else:
+        default_args = qemu_io_args
+
+    args = default_args + list(args)
     exitcode = subprocess.call(args, stdout=open('/dev/null', 'w'))
     if exitcode < 0:
         sys.stderr.write('qemu-io received signal %i: %s\n' %
-- 
2.21.3




[PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/298 | 186 +
 tests/qemu-iotests/298.out |   5 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 192 insertions(+)
 create mode 100644 tests/qemu-iotests/298
 create mode 100644 tests/qemu-iotests/298.out

diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
new file mode 100644
index 00..fef10f6a7a
--- /dev/null
+++ b/tests/qemu-iotests/298
@@ -0,0 +1,186 @@
+#!/usr/bin/env python3
+#
+# Test for preallocate filter
+#
+# Copyright (c) 2020 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+
+MiB = 1024 * 1024
+disk = os.path.join(iotests.test_dir, 'disk')
+overlay = os.path.join(iotests.test_dir, 'overlay')
+refdisk = os.path.join(iotests.test_dir, 'refdisk')
+drive_opts = f'node-name=disk,driver={iotests.imgfmt},' \
+    f'file.node-name=filter,file.driver=preallocate,' \
+    f'file.file.node-name=file,file.file.filename={disk}'
+
+
+class TestPreallocateBase(iotests.QMPTestCase):
+    def setUp(self):
+        iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))
+
+    def tearDown(self):
+        try:
+            self.check_small()
+            check = iotests.qemu_img_check(disk)
+            self.assertFalse('leaks' in check)
+            self.assertFalse('corruptions' in check)
+            self.assertEqual(check['check-errors'], 0)
+        finally:
+            os.remove(disk)
+
+    def check_big(self):
+        self.assertTrue(os.path.getsize(disk) > 100 * MiB)
+
+    def check_small(self):
+        self.assertTrue(os.path.getsize(disk) < 10 * MiB)
+
+
+class TestQemuImg(TestPreallocateBase):
+    def test_qemu_img(self):
+        p = iotests.QemuIoInteractive('--image-opts', drive_opts)
+
+        p.cmd('write 0 1M')
+        p.cmd('flush')
+
+        self.check_big()
+
+        p.close()
+
+
+class TestPreallocateFilter(TestPreallocateBase):
+    def setUp(self):
+        super().setUp()
+        self.vm = iotests.VM().add_drive(path=None, opts=drive_opts)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        super().tearDown()
+
+    def test_prealloc(self):
+        self.vm.hmp_qemu_io('drive0', 'write 0 1M')
+        self.check_big()
+
+    def test_external_snapshot(self):
+        self.test_prealloc()
+
+        result = self.vm.qmp('blockdev-snapshot-sync', node_name='disk',
+                             snapshot_file=overlay,
+                             snapshot_node_name='overlay')
+        self.assert_qmp(result, 'return', {})
+
+        # on reopen to r-o, base preallocation should be dropped
+        self.check_small()
+
+        self.vm.hmp_qemu_io('drive0', 'write 1M 1M')
+
+        result = self.vm.qmp('block-commit', device='overlay')
+        self.assert_qmp(result, 'return', {})
+        self.complete_and_wait()
+
+        # commit of new megabyte should trigger preallocation
+        self.check_big()
+
+    def test_reopen_opts(self):
+        result = self.vm.qmp('x-blockdev-reopen', **{
+            'node-name': 'disk',
+            'driver': iotests.imgfmt,
+            'file': {
+                'node-name': 'filter',
+                'driver': 'preallocate',
+                'prealloc-size': 20 * MiB,
+                'prealloc-align': 5 * MiB,
+                'file': {
+                    'node-name': 'file',
+                    'driver': 'file',
+                    'filename': disk
+                }
+            }
+        })
+        self.assert_qmp(result, 'return', {})
+
+        self.vm.hmp_qemu_io('drive0', 'write 0 1M')
+        self.assertTrue(os.path.getsize(disk) == 25 * MiB)
+
+
+class TestTruncate(iotests.QMPTestCase):
+    def setUp(self):
+        iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))
+        iotests.qemu_img_create('-f', iotests.imgfmt, refdisk, str(10 * MiB))
+
+    def tearDown(self):
+        os.remove(disk)
+        os.remove(refdisk)
+
+    def do_test(self, prealloc_mode, new_size):
+        ret = iotests.qemu_io_silent('--image-opts', '-c', 'write 0 10M', '-c',
+                                     f'truncate -m {prealloc_mode} {new_size}',
+                                     drive_opts)
+        self.assertEqual(ret, 0)
+
+        ret = iotests.qemu_io_silent('-f', io

[PATCH v6 06/15] block: introduce BDRV_REQ_NO_WAIT flag

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
Add a flag to make a serialising request not wait: if there are conflicting
requests, just return an error immediately. It will be used in the upcoming
preallocate filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 include/block/block.h |  9 -
 block/io.c| 11 ++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index ef948e3f34..e7188fea05 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -67,8 +67,15 @@ typedef enum {
  * written to qiov parameter which may be NULL.
  */
 BDRV_REQ_PREFETCH  = 0x200,
+
+/*
+ * If we need to wait for other requests, just fail immediately. Used
+ * only together with BDRV_REQ_SERIALISING.
+ */
+BDRV_REQ_NO_WAIT = 0x400,
+
 /* Mask of valid flags */
-BDRV_REQ_MASK   = 0x3ff,
+BDRV_REQ_MASK   = 0x7ff,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/block/io.c b/block/io.c
index 9b148bb8ea..fdcac4888e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1912,9 +1912,18 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
 assert(!(bs->open_flags & BDRV_O_INACTIVE));
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
 assert(!(flags & ~BDRV_REQ_MASK));
+assert(!((flags & BDRV_REQ_NO_WAIT) && !(flags & BDRV_REQ_SERIALISING)));
 
 if (flags & BDRV_REQ_SERIALISING) {
-bdrv_make_request_serialising(req, bdrv_get_cluster_size(bs));
+QEMU_LOCK_GUARD(&bs->reqs_lock);
+
+tracked_request_set_serialising(req, bdrv_get_cluster_size(bs));
+
+if ((flags & BDRV_REQ_NO_WAIT) && bdrv_find_conflicting_request(req)) {
+return -EBUSY;
+}
+
+bdrv_wait_serialising_requests_locked(req);
 } else {
 bdrv_wait_serialising_requests(req);
 }
-- 
2.21.3

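The fail-fast behaviour of BDRV_REQ_NO_WAIT can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the block layer's code: `ToyRequest`, `toy_find_conflict()` and `toy_write_prepare()` are hypothetical names, there is no reqs_lock and no coroutine queue, and where bdrv_wait_serialising_requests_locked() would block, the toy simply returns 1. The flag values mirror BDRV_REQ_SERIALISING (0x80) and BDRV_REQ_NO_WAIT (0x400) from the hunks above, and the precondition assertion mirrors the one the patch adds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for BdrvTrackedRequest: only the
 * fields needed to decide whether two requests conflict.
 */
typedef struct ToyRequest {
    int64_t overlap_offset;
    int64_t overlap_bytes;
    bool serialising;
    bool done;              /* completed requests are ignored */
} ToyRequest;

/* Values copied from BdrvRequestFlags for illustration only. */
enum { TOY_REQ_SERIALISING = 0x80, TOY_REQ_NO_WAIT = 0x400 };

static bool toy_overlaps(const ToyRequest *a, const ToyRequest *b)
{
    return a->overlap_offset < b->overlap_offset + b->overlap_bytes &&
           b->overlap_offset < a->overlap_offset + a->overlap_bytes;
}

/* First unfinished request that conflicts with @self, or NULL. */
static ToyRequest *toy_find_conflict(ToyRequest *reqs, size_t n,
                                     const ToyRequest *self)
{
    for (size_t i = 0; i < n; i++) {
        ToyRequest *r = &reqs[i];
        if (r == self || r->done || (!r->serialising && !self->serialising)) {
            continue;
        }
        if (toy_overlaps(r, self)) {
            return r;
        }
    }
    return NULL;
}

/*
 * Returns -EBUSY when NO_WAIT is set and a conflict exists, 1 when the
 * caller would have to wait, and 0 when it may proceed immediately.
 */
static int toy_write_prepare(ToyRequest *reqs, size_t n, ToyRequest *self,
                             int flags)
{
    /* NO_WAIT is only meaningful together with SERIALISING. */
    assert(!((flags & TOY_REQ_NO_WAIT) && !(flags & TOY_REQ_SERIALISING)));

    if (flags & TOY_REQ_SERIALISING) {
        self->serialising = true;
    }
    if (toy_find_conflict(reqs, n, self)) {
        return (flags & TOY_REQ_NO_WAIT) ? -EBUSY : 1;
    }
    return 0;
}
```

For example, a serialising request covering [32K, 96K) against an in-flight write covering [0, 64K) gets -EBUSY when NO_WAIT is also set; once the earlier request is marked done, the same call returns 0.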



[PATCH v6 01/15] block: simplify comment to BDRV_REQ_SERIALISING

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
1. BDRV_REQ_NO_SERIALISING doesn't exist anymore, so don't mention it.

2. We are going to add one more user of BDRV_REQ_SERIALISING, so the
   comment about backup becomes a bit confusing here. The use case in
   backup is documented in block/backup.c, so let's just drop the
   duplication here.

3. The comment omits the fact that BDRV_REQ_SERIALISING is only for
   write requests. Add a note.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Stefan Hajnoczi 
---
 include/block/block.h | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 981ab5b314..ef948e3f34 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -53,16 +53,7 @@ typedef enum {
  * content. */
 BDRV_REQ_WRITE_UNCHANGED= 0x40,
 
-/*
- * BDRV_REQ_SERIALISING forces request serialisation for writes.
- * It is used to ensure that writes to the backing file of a backup process
- * target cannot race with a read of the backup target that defers to the
- * backing file.
- *
- * Note, that BDRV_REQ_SERIALISING is _not_ opposite in meaning to
- * BDRV_REQ_NO_SERIALISING. A more descriptive name for the latter might be
- * _DO_NOT_WAIT_FOR_SERIALISING, except that is too long.
- */
+/* Forces request serialisation. Use only with write requests. */
 BDRV_REQ_SERIALISING= 0x80,
 
 /* Execute the request only if the operation can be offloaded or otherwise
-- 
2.21.3




[PATCH v6 04/15] block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
bs is linked in req, so there is no need to pass it separately. Most of
the tracked-requests API doesn't take a bs argument. In fact, after this
patch only tracked_request_begin has it, and that is intentional.

While here, also add a comment about what "_locked" means.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index c58fd36091..ab9ef7fd1a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -761,16 +761,16 @@ bdrv_find_conflicting_request(BdrvTrackedRequest *self)
 return NULL;
 }
 
+/* Called with self->bs->reqs_lock held */
 static bool coroutine_fn
-bdrv_wait_serialising_requests_locked(BlockDriverState *bs,
-  BdrvTrackedRequest *self)
+bdrv_wait_serialising_requests_locked(BdrvTrackedRequest *self)
 {
 BdrvTrackedRequest *req;
 bool waited = false;
 
 while ((req = bdrv_find_conflicting_request(self))) {
 self->waiting_for = req;
-qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
+qemu_co_queue_wait(&req->wait_queue, &self->bs->reqs_lock);
 self->waiting_for = NULL;
 waited = true;
 }
@@ -794,7 +794,7 @@ bool bdrv_mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
 
 req->overlap_offset = MIN(req->overlap_offset, overlap_offset);
 req->overlap_bytes = MAX(req->overlap_bytes, overlap_bytes);
-waited = bdrv_wait_serialising_requests_locked(bs, req);
+waited = bdrv_wait_serialising_requests_locked(req);
 qemu_co_mutex_unlock(&bs->reqs_lock);
 return waited;
 }
@@ -876,7 +876,7 @@ static bool coroutine_fn bdrv_wait_serialising_requests(BdrvTrackedRequest *self
 }
 
 qemu_co_mutex_lock(&bs->reqs_lock);
-waited = bdrv_wait_serialising_requests_locked(bs, self);
+waited = bdrv_wait_serialising_requests_locked(self);
 qemu_co_mutex_unlock(&bs->reqs_lock);
 
 return waited;
-- 
2.21.3




[PATCH v6 05/15] block: bdrv_mark_request_serialising: split non-waiting function

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
We'll need a separate function that only "marks" a request as
serialising with the specified align but does not wait for conflicting
requests. So it will be like the old bdrv_mark_request_serialising(),
before bdrv_wait_serialising_requests_locked() was merged into it.

To reduce the possible mess, let's do the following:

The public function that does both marking and waiting will be called
bdrv_make_request_serialising(), and the private function that only
"marks" will be called tracked_request_set_serialising().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 include/block/block_int.h |  3 ++-
 block/file-posix.c|  2 +-
 block/io.c| 35 +++
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 38cad9d15c..887b0668d8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1052,7 +1052,8 @@ extern unsigned int bdrv_drain_all_count;
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent);
 void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent);
 
-bool coroutine_fn bdrv_mark_request_serialising(BdrvTrackedRequest *req, uint64_t align);
+bool coroutine_fn bdrv_make_request_serialising(BdrvTrackedRequest *req,
+uint64_t align);
 BdrvTrackedRequest *coroutine_fn bdrv_co_get_self_request(BlockDriverState *bs);
 
 int get_tmp_filename(char *filename, int size);
diff --git a/block/file-posix.c b/block/file-posix.c
index c63926d592..37d9266f6a 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2953,7 +2953,7 @@ raw_do_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int bytes,
 req->bytes = end - req->offset;
 req->overlap_bytes = req->bytes;
 
-bdrv_mark_request_serialising(req, bs->bl.request_alignment);
+bdrv_make_request_serialising(req, bs->bl.request_alignment);
 }
 #endif
 
diff --git a/block/io.c b/block/io.c
index ab9ef7fd1a..9b148bb8ea 100644
--- a/block/io.c
+++ b/block/io.c
@@ -778,15 +778,14 @@ bdrv_wait_serialising_requests_locked(BdrvTrackedRequest *self)
 return waited;
 }
 
-bool bdrv_mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
+/* Called with req->bs->reqs_lock held */
+static void tracked_request_set_serialising(BdrvTrackedRequest *req,
+uint64_t align)
 {
-BlockDriverState *bs = req->bs;
 int64_t overlap_offset = req->offset & ~(align - 1);
 uint64_t overlap_bytes = ROUND_UP(req->offset + req->bytes, align)
- overlap_offset;
-bool waited;
 
-qemu_co_mutex_lock(&bs->reqs_lock);
 if (!req->serialising) {
 atomic_inc(&req->bs->serialising_in_flight);
 req->serialising = true;
@@ -794,9 +793,6 @@ bool bdrv_mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
 
 req->overlap_offset = MIN(req->overlap_offset, overlap_offset);
 req->overlap_bytes = MAX(req->overlap_bytes, overlap_bytes);
-waited = bdrv_wait_serialising_requests_locked(req);
-qemu_co_mutex_unlock(&bs->reqs_lock);
-return waited;
 }
 
 /**
@@ -882,6 +878,21 @@ static bool coroutine_fn bdrv_wait_serialising_requests(BdrvTrackedRequest *self
 return waited;
 }
 
+bool coroutine_fn bdrv_make_request_serialising(BdrvTrackedRequest *req,
+uint64_t align)
+{
+bool waited;
+
+qemu_co_mutex_lock(&req->bs->reqs_lock);
+
+tracked_request_set_serialising(req, align);
+waited = bdrv_wait_serialising_requests_locked(req);
+
+qemu_co_mutex_unlock(&req->bs->reqs_lock);
+
+return waited;
+}
+
 static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
size_t size)
 {
@@ -1492,7 +1503,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
  * with each other for the same cluster.  For example, in copy-on-read
  * it ensures that the CoR read and write operations are atomic and
  * guest writes cannot interleave between them. */
-bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
+bdrv_make_request_serialising(req, bdrv_get_cluster_size(bs));
 } else {
 bdrv_wait_serialising_requests(req);
 }
@@ -1903,7 +1914,7 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
 assert(!(flags & ~BDRV_REQ_MASK));
 
 if (flags & BDRV_REQ_SERIALISING) {
-bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
+bdrv_make_request_serialising(req, bdrv_get_cluster_size(bs));
 } else {
 bdrv_wait_serialising_requests(req);
 }
@@ -2069,7 +2080,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 
 padding =

[PATCH v6 02/15] block/io.c: drop assertion on double waiting for request serialisation

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
The comment states that for a misaligned request we should have already
waited. But for bdrv_padding_rmw_read, we called
bdrv_mark_request_serialising with align = request_alignment, and now
we serialise with align = cluster_size. So we may have to wait again
with the larger alignment.

Note that the only user of BDRV_REQ_SERIALISING is backup, which issues
cluster-aligned requests, so it seems the assertion should not fire for
now. But it's wrong anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Paolo Bonzini 
---
 block/io.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/block/io.c b/block/io.c
index a2389bb38c..67617bb9b2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1881,7 +1881,6 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
   BdrvTrackedRequest *req, int flags)
 {
 BlockDriverState *bs = child->bs;
-bool waited;
 int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
 
 if (bs->read_only) {
@@ -1893,15 +1892,7 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
 assert(!(flags & ~BDRV_REQ_MASK));
 
 if (flags & BDRV_REQ_SERIALISING) {
-waited = bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
-/*
- * For a misaligned request we should have already waited earlier,
- * because we come after bdrv_padding_rmw_read which must be called
- * with the request already marked as serialising.
- */
-assert(!waited ||
-   (req->offset == req->overlap_offset &&
-req->bytes == req->overlap_bytes));
+bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
 } else {
 bdrv_wait_serialising_requests(req);
 }
-- 
2.21.3

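The arithmetic behind "we may have to wait again with the larger alignment" can be checked with a toy copy of the overlap-window computation done when a request is marked serialising (round the start down and the end up to align, and only ever widen the window). `ToyReq`, `toy_set_serialising()` and `toy_windows_overlap()` are hypothetical names, and 512 and 65536 are assumed example values for request_alignment and the cluster size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of BdrvTrackedRequest fields. */
typedef struct ToyReq {
    int64_t offset;
    uint64_t bytes;
    int64_t overlap_offset;   /* starts equal to offset */
    uint64_t overlap_bytes;   /* starts equal to bytes */
} ToyReq;

#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))
#define TOY_MAX(a, b) ((a) > (b) ? (a) : (b))
#define TOY_ROUND_UP(n, d) (((n) + (d) - 1) / (d) * (d))

/* Round the request out to @align and widen (never shrink) its window. */
static void toy_set_serialising(ToyReq *req, uint64_t align)
{
    assert(align && !(align & (align - 1)));  /* power of two */

    int64_t overlap_offset = req->offset & ~(int64_t)(align - 1);
    uint64_t overlap_bytes =
        TOY_ROUND_UP(req->offset + req->bytes, align) - overlap_offset;

    req->overlap_offset = TOY_MIN(req->overlap_offset, overlap_offset);
    req->overlap_bytes = TOY_MAX(req->overlap_bytes, overlap_bytes);
}

/* Do the two overlap windows intersect? */
static bool toy_windows_overlap(const ToyReq *a, const ToyReq *b)
{
    return a->overlap_offset < b->overlap_offset + (int64_t)b->overlap_bytes &&
           b->overlap_offset < a->overlap_offset + (int64_t)a->overlap_bytes;
}
```

Marking a 4 KiB write at offset 70000 with align 512 gives the window [69632, 74240), which does not touch a request at [100000, 104096). Re-marking with align 65536 widens it to [65536, 131072), which does. So a second wait is possible, and afterwards offset no longer equals overlap_offset, which is exactly the combination the removed assertion rejected.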



[PATCH v6 03/15] block/io: split out bdrv_find_conflicting_request

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
Split it out so that it can be reused separately.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c | 71 +++---
 1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/block/io.c b/block/io.c
index 67617bb9b2..c58fd36091 100644
--- a/block/io.c
+++ b/block/io.c
@@ -727,43 +727,54 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
 return true;
 }
 
+/* Called with self->bs->reqs_lock held */
+static BdrvTrackedRequest *
+bdrv_find_conflicting_request(BdrvTrackedRequest *self)
+{
+BdrvTrackedRequest *req;
+
+QLIST_FOREACH(req, &self->bs->tracked_requests, list) {
+if (req == self || (!req->serialising && !self->serialising)) {
+continue;
+}
+if (tracked_request_overlaps(req, self->overlap_offset,
+ self->overlap_bytes))
+{
+/*
+ * Hitting this means there was a reentrant request, for
+ * example, a block driver issuing nested requests.  This must
+ * never happen since it means deadlock.
+ */
+assert(qemu_coroutine_self() != req->co);
+
+/*
+ * If the request is already (indirectly) waiting for us, or
+ * will wait for us as soon as it wakes up, then just go on
+ * (instead of producing a deadlock in the former case).
+ */
+if (!req->waiting_for) {
+return req;
+}
+}
+}
+
+return NULL;
+}
+
 static bool coroutine_fn
 bdrv_wait_serialising_requests_locked(BlockDriverState *bs,
   BdrvTrackedRequest *self)
 {
 BdrvTrackedRequest *req;
-bool retry;
 bool waited = false;
 
-do {
-retry = false;
-QLIST_FOREACH(req, &bs->tracked_requests, list) {
-if (req == self || (!req->serialising && !self->serialising)) {
-continue;
-}
-if (tracked_request_overlaps(req, self->overlap_offset,
- self->overlap_bytes))
-{
-/* Hitting this means there was a reentrant request, for
- * example, a block driver issuing nested requests.  This must
- * never happen since it means deadlock.
- */
-assert(qemu_coroutine_self() != req->co);
-
-/* If the request is already (indirectly) waiting for us, or
- * will wait for us as soon as it wakes up, then just go on
- * (instead of producing a deadlock in the former case). */
-if (!req->waiting_for) {
-self->waiting_for = req;
-qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
-self->waiting_for = NULL;
-retry = true;
-waited = true;
-break;
-}
-}
-}
-} while (retry);
+while ((req = bdrv_find_conflicting_request(self))) {
+self->waiting_for = req;
+qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
+self->waiting_for = NULL;
+waited = true;
+}
+
 return waited;
 }
 
-- 
2.21.3

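The refactoring replaces the `retry`/`break` scan with a "find a conflict, wait, look again" loop. Below is a single-threaded sketch of that control flow. All names are hypothetical, and the stand-in for qemu_co_queue_wait() simply marks the conflicting request as finished; there are no coroutines and no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of BdrvTrackedRequest. */
typedef struct ToyReq ToyReq;
struct ToyReq {
    int64_t overlap_offset;
    int64_t overlap_bytes;
    bool serialising;
    bool done;             /* finished requests are no longer tracked */
    ToyReq *waiting_for;   /* non-NULL while this request is waiting */
};

static bool toy_overlaps(const ToyReq *req, int64_t offset, int64_t bytes)
{
    return req->overlap_offset < offset + bytes &&
           offset < req->overlap_offset + req->overlap_bytes;
}

/* Same shape as bdrv_find_conflicting_request(): first blocker or NULL. */
static ToyReq *toy_find_conflicting_request(ToyReq *reqs, size_t n,
                                            ToyReq *self)
{
    for (size_t i = 0; i < n; i++) {
        ToyReq *req = &reqs[i];
        if (req == self || req->done ||
            (!req->serialising && !self->serialising)) {
            continue;
        }
        /* Requests that are themselves waiting are skipped. */
        if (toy_overlaps(req, self->overlap_offset, self->overlap_bytes) &&
            !req->waiting_for) {
            return req;
        }
    }
    return NULL;
}

/* Stand-in for qemu_co_queue_wait(): pretend @req runs to completion. */
static void toy_wait_for(ToyReq *req)
{
    assert(!req->done);
    req->done = true;
}

/* Same loop shape as the new bdrv_wait_serialising_requests_locked(). */
static bool toy_wait_serialising_requests(ToyReq *reqs, size_t n,
                                          ToyReq *self, int *nwaits)
{
    ToyReq *req;
    bool waited = false;

    *nwaits = 0;
    while ((req = toy_find_conflicting_request(reqs, n, self))) {
        self->waiting_for = req;
        toy_wait_for(req);
        self->waiting_for = NULL;
        waited = true;
        (*nwaits)++;
    }
    return waited;
}
```

With serialising requests at [0, 64K) and [64K, 128K) and a new non-serialising request at [32K, 96K), the loop waits twice and then finds no further conflict; a serialising request at 1 MiB is never waited on.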



[PATCH v6 00/15] preallocate filter

2020-09-18 Thread Vladimir Sementsov-Ogievskiy
Hi all!

Here is a filter that does preallocation on write.

In Virtuozzo we have to deal with a custom distributed storage
solution, where allocation is a very expensive operation. We have to
work around it in QEMU, so here is a new filter.

Still, the filter shows good results for me even for xfs and ext4.
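To make "preallocation on write" concrete, here is a minimal sketch of how far such a filter might grow the file when a write goes past the current end. It assumes the new length is the write end plus prealloc-size, rounded up to prealloc-align; the option names come from test 298 in this series, but the formula and the helper `toy_prealloc_new_length()` are assumptions, not a reading of block/preallocate.c. Under that assumption, with prealloc-size=20M and prealloc-align=5M, any write ending above 0 and at or below 5 MiB yields a 25 MiB file, which matches what test_reopen_opts expects.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MIB (1024 * 1024LL)

/*
 * Hypothetical helper: file length after a write ending at write_end,
 * assuming the filter grows the file to write_end + prealloc_size,
 * rounded up to a multiple of prealloc_align. Writes that end inside the
 * current (possibly already preallocated) length do not grow it.
 */
static int64_t toy_prealloc_new_length(int64_t file_len, int64_t write_end,
                                       int64_t prealloc_size,
                                       int64_t prealloc_align)
{
    assert(prealloc_align > 0 && prealloc_size >= 0);
    if (write_end <= file_len) {
        return file_len;
    }
    int64_t target = write_end + prealloc_size;
    return (target + prealloc_align - 1) / prealloc_align * prealloc_align;
}
```

The alignment keeps the number of growth operations low: after one 25 MiB step, subsequent writes below that length need no further allocation.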

Here are the results produced by the new benchmark (last 4 patches):

All results are in iops (larger means better)

----------------------------------  -----------  -----------
                                    A            B
                                    no-prealloc  prealloc
ssd-ext4, aligned sequential 16k    19934±1.2%   27108±0.27%
                                                 A+36±2%
ssd-xfs, aligned sequential 16k     15528±5.5%   25953±3.3%
                                                 A+67±11%
hdd-ext4, aligned sequential 16k    5079±29%     3165±11%
                                                 A-38±36%
hdd-xfs, aligned sequential 16k     4096±95%     3321±7.6%
                                                 A-19±101%
ssd-ext4, unaligned sequential 64k  19969±1.9%   27043±0.49%
                                                 A+35±3%
ssd-xfs, unaligned sequential 64k   15403±2.8%   25725±6.4%
                                                 A+67±13%
hdd-ext4, unaligned sequential 64k  5250±17%     3239±8.7%
                                                 A-38±23%
hdd-xfs, unaligned sequential 64k   5291±8.2%    3336±4.2%
                                                 A-37±11%
----------------------------------  -----------  -----------

Note: it's on Fedora 30, kernel 5.6.13-100.fc30.x86_64
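The difference lines (A+36±2% and so on) appear to be the relative change of B against A. A minimal check of that reading for the central values; the function name is hypothetical, and the ± part, which scripts/simplebench computes, is not modelled here:

```c
#include <assert.h>

/*
 * Hypothetical helper: relative change of B (prealloc) against
 * A (no-prealloc) in percent, rounded half away from zero.
 */
static int toy_diff_percent(double a_iops, double b_iops)
{
    assert(a_iops > 0);
    double pct = (b_iops / a_iops - 1.0) * 100.0;
    return (int)(pct + (pct >= 0 ? 0.5 : -0.5));
}
```

For instance, 27108 / 19934 is about 1.36, giving the +36 shown for ssd-ext4 aligned; the same formula reproduces the central value of every difference line in the table.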

The tests are actually qemu-img bench, run like:

  ./qemu-img create -f qcow2 $img 16G

aligned:
  ./qemu-img bench -c 1 -d 64 -f qcow2  -s 16k -t none -n -w $img

unaligned:
  ./qemu-img bench -c 1 -d 64 -f qcow2 -o 1k -s 64k -t none -n -w $img

For preallocation, drop -f qcow2, add --image-opts, and instead of just
$img use:
  driver=qcow2,file.driver=preallocate,file.file.driver=file,file.file.filename=$img

v6:
05: add Max's r-b
06: add Max's r-b
07: new
08: Changed a lot, really: no .active now, more use cases supported.
Somehow I now see a performance benefit on xfs too :)
probably due to the .zero_start feature.
09: new
10: new
11: mostly rewritten, a lot more cases, drop r-b
12-15: new, to produce final benchmark table

Vladimir Sementsov-Ogievskiy (15):
  block: simplify comment to BDRV_REQ_SERIALISING
  block/io.c: drop assertion on double waiting for request serialisation
  block/io: split out bdrv_find_conflicting_request
  block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg
  block: bdrv_mark_request_serialising: split non-waiting function
  block: introduce BDRV_REQ_NO_WAIT flag
  block: bdrv_check_perm(): process children anyway
  block: introduce preallocate filter
  qemu-io: add preallocate mode parameter for truncate command
  iotests: qemu_io_silent: support --image-opts
  iotests: add 298 to test new preallocate filter driver
  scripts/simplebench: support iops
  scripts/simplebench: improve view of ascii table
  scripts/simplebench: improve ascii table: add difference line
  scripts/simplebench: add bench_prealloc.py

 docs/system/qemu-block-drivers.rst.inc |  26 ++
 qapi/block-core.json   |  20 +-
 include/block/block.h  |  20 +-
 include/block/block_int.h  |   3 +-
 block.c|  10 +-
 block/file-posix.c |   2 +-
 block/io.c | 130 +++---
 block/preallocate.c| 556 +
 qemu-io-cmds.c |  46 +-
 block/meson.build  |   1 +
 scripts/simplebench/bench_prealloc.py  | 128 ++
 scripts/simplebench/simplebench.py | 103 -
 tests/qemu-iotests/298 | 186 +
 tests/qemu-iotests/298.out |   5 +
 tests/qemu-iotests/group   |   1 +
 tests/qemu-iotests/iotests.py  |   7 +-
 16 files changed, 1146 insertions(+), 98 deletions(-)
 create mode 100644 block/preallocate.c
 create mode 100755 scripts/simplebench/bench_prealloc.py
 create mode 100644 tests/qemu-iotests/298
 create mode 100644 tests/qemu-iotests/298.out

-- 
2.21.3




Re: [PATCH v2 12/13] block/qcow2: simplify qcow2_co_invalidate_cache()

2020-09-18 Thread Vladimir Sementsov-Ogievskiy

18.09.2020 18:51, Alberto Garcia wrote:

On Fri 18 Sep 2020 05:30:06 PM CEST, Greg Kurz wrote:

qcow2_do_open correctly sets errp on each failure path. So, we can
simplify code in qcow2_co_invalidate_cache() and drop explicit error
propagation. We should use ERRP_GUARD() (accordingly to comment in
include/qapi/error.h) together with error_append() call which we add to
avoid problems with error_fatal.



The wording gives the impression that we add error_append() to avoid problems
with error_fatal which is certainly not true. Also it isn't _append() but
_prepend() :)

What about ?

"Add ERRP_GUARD() as mandated by the documentation in include/qapi/error.h
  to avoid problems with the error_prepend() call if errp is
  &error_fatal."


OK for me.



I had to go to the individual error functions to see what "it doesn't
work with &error_fatal" actually means.

So in a case like qcow2_do_open() which has:

error_setg(errp, ...)
error_append_hint(errp, ...)

As far as I can see this works just fine without ERRP_GUARD() and with
error_fatal; the difference is that if we don't use the guard then the
process exits during error_setg(), and if we use the guard it exits
during the implicit error_propagate() call triggered by its destruction
at the end of the function. In this latter case the printed error
message would include the hint.



Yes, the only problem is that without ERRP_GUARD we lose the hint in case of error_fatal.

--
Best regards,
Vladimir

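Alberto's description can be replayed with a toy model of the error API. Everything below is hypothetical (`ToyError`, `TOY_FATAL`, `toy_error_setg()` and friends are not QEMU functions), and instead of calling exit() the toy records what would have been printed. The `use_guard` branch models what ERRP_GUARD() does when errp is &error_fatal: errors go to a local pointer first and are propagated when the function returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy error object (not QEMU's Error). */
typedef struct ToyError {
    char msg[64];
    char hint[64];
} ToyError;

static ToyError toy_err_storage;   /* one error at a time is enough here */
static ToyError *toy_fatal_slot;   /* its address plays &error_fatal */
#define TOY_FATAL (&toy_fatal_slot)

static char toy_printed[160];      /* what the fatal "exit" printed */
static bool toy_exited;            /* set instead of calling exit(1) */

static void toy_reset(void)
{
    toy_exited = false;
    toy_printed[0] = '\0';
}

static void toy_die(const ToyError *err)
{
    assert(err);
    snprintf(toy_printed, sizeof(toy_printed), "%s%s", err->msg, err->hint);
    toy_exited = true;
}

static void toy_error_setg(ToyError **errp, const char *msg)
{
    if (toy_exited || !errp) {
        return;
    }
    snprintf(toy_err_storage.msg, sizeof(toy_err_storage.msg), "%s", msg);
    toy_err_storage.hint[0] = '\0';
    if (errp == TOY_FATAL) {
        toy_die(&toy_err_storage);  /* reported and "exited" right here */
        return;
    }
    *errp = &toy_err_storage;
}

static void toy_append_hint(ToyError **errp, const char *hint)
{
    if (toy_exited || !errp || !*errp) {
        return;  /* process gone, or no error object to attach the hint to */
    }
    char *h = (*errp)->hint;
    strncat(h, hint, sizeof((*errp)->hint) - strlen(h) - 1);
}

static void toy_propagate(ToyError **dst, ToyError *local)
{
    if (toy_exited || !local || !dst) {
        return;
    }
    if (dst == TOY_FATAL) {
        toy_die(local);
        return;
    }
    *dst = local;
}

/* Like an open function that sets an error and then appends a hint. */
static void toy_open(ToyError **errp, bool use_guard)
{
    ToyError *local = NULL;
    ToyError **target = use_guard ? &local : errp;

    toy_error_setg(target, "Could not open image");
    toy_append_hint(target, " (example hint)");

    if (use_guard) {
        toy_propagate(errp, local);  /* what the guard does on scope exit */
    }
}
```

Without the guard, the "process" dies inside toy_error_setg() and prints only the message; with the guard, the hint is attached first and the printed text includes it.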


Re: [PATCH v2 09/13] block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface

2020-09-18 Thread Vladimir Sementsov-Ogievskiy

18.09.2020 17:54, Alberto Garcia wrote:

On Thu 17 Sep 2020 09:55:15 PM CEST, Vladimir Sementsov-Ogievskiy wrote:

It's recommended for bool functions with errp to return true on success
and false on failure. Non-standard interfaces don't help to understand
the code. The change is also needed to reduce error propagation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 



+/*
+ * Return true on success, false on failure. Anyway, if header_updated
+ * provided set it appropriately.
   */


I'm not a native speaker but it sounds a bit odd to me. Maybe "If
header_updated is not NULL then it is set appropriately regardless of
the return value".


That's better I think, thanks.



But I'm fine with your version, so

Reviewed-by: Alberto Garcia 

Berto




--
Best regards,
Vladimir

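A minimal sketch of the convention discussed here (return true on success, false on failure with errp set), and of why it removes the need for a local Error plus error_propagate() in callers, as in the block_job_set_speed() patch later in this thread. `ToyError` and the `toy_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Error. */
typedef struct ToyError {
    const char *msg;
} ToyError;

static ToyError toy_negative_speed = { "speed must be non-negative" };

/*
 * Convention: true on success; on failure return false and set *errp
 * (if errp is non-NULL).
 */
static bool toy_set_speed(long *speed_out, long speed, ToyError **errp)
{
    if (speed < 0) {
        if (errp) {
            *errp = &toy_negative_speed;
        }
        return false;
    }
    *speed_out = speed;
    return true;
}

/*
 * Because the status is returned, the caller tests it directly and passes
 * its own errp through: no local error variable, no propagation step.
 */
static bool toy_job_create(long *speed_out, long speed, ToyError **errp)
{
    if (!toy_set_speed(speed_out, speed, errp)) {
        return false;
    }
    /* ... the rest of the setup would go here ... */
    return true;
}
```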


Re: [PATCH v2 00/13] block: deal with errp: part I

2020-09-18 Thread Vladimir Sementsov-Ogievskiy

17.09.2020 23:15, no-re...@patchew.org wrote:

Patchew URL: 
https://patchew.org/QEMU/20200917195519.19589-1-vsement...@virtuozzo.com/



Hi,

This series failed build test on FreeBSD host. Please find the details below.






The full log is available at
http://patchew.org/logs/20200917195519.19589-1-vsement...@virtuozzo.com/testing.FreeBSD/?type=message.


Link is broken, it shows: "N/A. Internal error while reading log file"


--
Best regards,
Vladimir



[PATCH v2 13/13] block/qed: bdrv_qed_do_open: deal with errp

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
Always set errp on failure. The generic bdrv_open_driver supports driver
functions that return a negative value and forget to set errp.
That's a strange thing. Let's improve bdrv_qed_do_open so that it doesn't
behave this way. This allows us to simplify the code in
bdrv_qed_co_invalidate_cache().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/qed.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index b27e7546ca..f45c640513 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -393,6 +393,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options,
 
 ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
 if (ret < 0) {
+error_setg(errp, "Failed to read QED header");
 return ret;
 }
 qed_header_le_to_cpu(&le_header, &s->header);
@@ -408,25 +409,30 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options,
 return -ENOTSUP;
 }
 if (!qed_is_cluster_size_valid(s->header.cluster_size)) {
+error_setg(errp, "QED cluster size is invalid");
 return -EINVAL;
 }
 
 /* Round down file size to the last cluster */
 file_size = bdrv_getlength(bs->file->bs);
 if (file_size < 0) {
+error_setg(errp, "Failed to get file length");
 return file_size;
 }
 s->file_size = qed_start_of_cluster(s, file_size);
 
 if (!qed_is_table_size_valid(s->header.table_size)) {
+error_setg(errp, "QED table size is invalid");
 return -EINVAL;
 }
 if (!qed_is_image_size_valid(s->header.image_size,
  s->header.cluster_size,
  s->header.table_size)) {
+error_setg(errp, "QED image size is invalid");
 return -EINVAL;
 }
 if (!qed_check_table_offset(s, s->header.l1_table_offset)) {
+error_setg(errp, "QED table offset is invalid");
 return -EINVAL;
 }
 
@@ -438,6 +444,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options,
 
 /* Header size calculation must not overflow uint32_t */
 if (s->header.header_size > UINT32_MAX / s->header.cluster_size) {
+error_setg(errp, "QED header size is too large");
 return -EINVAL;
 }
 
@@ -445,6 +452,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options,
 if ((uint64_t)s->header.backing_filename_offset +
 s->header.backing_filename_size >
 s->header.cluster_size * s->header.header_size) {
+error_setg(errp, "QED backing filename offset is invalid");
 return -EINVAL;
 }
 
@@ -453,6 +461,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options,
   bs->auto_backing_file,
   sizeof(bs->auto_backing_file));
 if (ret < 0) {
+error_setg(errp, "Failed to read backing filename");
 return ret;
 }
 pstrcpy(bs->backing_file, sizeof(bs->backing_file),
@@ -475,6 +484,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options,
 
 ret = qed_write_header_sync(s);
 if (ret) {
+error_setg(errp, "Failed to update header");
 return ret;
 }
 
@@ -487,6 +497,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options,
 
 ret = qed_read_l1_table_sync(s);
 if (ret) {
+error_setg(errp, "Failed to read L1 table");
 goto out;
 }
 
@@ -503,6 +514,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options,
 
 ret = qed_check(s, &result, true);
 if (ret) {
+error_setg(errp, "Image corrupted");
 goto out;
 }
 }
@@ -1537,22 +1549,16 @@ static void coroutine_fn bdrv_qed_co_invalidate_cache(BlockDriverState *bs,
   Error **errp)
 {
 BDRVQEDState *s = bs->opaque;
-Error *local_err = NULL;
 int ret;
 
 bdrv_qed_close(bs);
 
 bdrv_qed_init_state(bs);
 qemu_co_mutex_lock(&s->table_lock);
-ret = bdrv_qed_do_open(bs, NULL, bs->open_flags, &local_err);
+ret = bdrv_qed_do_open(bs, NULL, bs->open_flags, errp);
 qemu_co_mutex_unlock(&s->table_lock);
-if (local_err) {
-error_propagate_prepend(errp, local_err,
-"Could not reopen qed layer: ");
-return;
-} else if (ret < 0) {
-error_setg_errno(errp, -ret, "Could not reopen qed layer");
-return;
+if (ret < 0) {
+error_prepend(errp, "Could not reopen qed layer: ");
 }
 }
 
-- 
2.21.3




[PATCH v2 08/13] block/qcow2: qcow2_get_specific_info(): drop error propagation

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
Don't use error propagation in qcow2_get_specific_info(). To achieve this,
refactor qcow2_get_bitmap_info_list(), whose current interface is rather
weird.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Greg Kurz 
---
 block/qcow2.h|  4 ++--
 block/qcow2-bitmap.c | 27 +--
 block/qcow2.c| 10 +++---
 3 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index b71e444fca..6eac088f1c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -973,8 +973,8 @@ int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
   void **refcount_table,
   int64_t *refcount_table_size);
 bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp);
-Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
-Error **errp);
+bool qcow2_get_bitmap_info_list(BlockDriverState *bs,
+Qcow2BitmapInfoList **info_list, Error **errp);
 int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
 int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
 void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index d7a31a8ddc..4f6138f544 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -1093,30 +1093,29 @@ static Qcow2BitmapInfoFlagsList *get_bitmap_info_flags(uint32_t flags)
 /*
  * qcow2_get_bitmap_info_list()
  * Returns a list of QCOW2 bitmap details.
- * In case of no bitmaps, the function returns NULL and
- * the @errp parameter is not set.
- * When bitmap information can not be obtained, the function returns
- * NULL and the @errp parameter is set.
+ * On success return true with info_list set (possibly to NULL, if no bitmaps),
+ * on failure return false with errp set.
  */
-Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
-Error **errp)
+bool qcow2_get_bitmap_info_list(BlockDriverState *bs,
+Qcow2BitmapInfoList **info_list, Error **errp)
 {
 BDRVQcow2State *s = bs->opaque;
 Qcow2BitmapList *bm_list;
 Qcow2Bitmap *bm;
-Qcow2BitmapInfoList *list = NULL;
-Qcow2BitmapInfoList **plist = &list;
 
 if (s->nb_bitmaps == 0) {
-return NULL;
+*info_list = NULL;
+return true;
 }
 
 bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
s->bitmap_directory_size, errp);
-if (bm_list == NULL) {
-return NULL;
+if (!bm_list) {
+return false;
 }
 
+*info_list = NULL;
+
 QSIMPLEQ_FOREACH(bm, bm_list, entry) {
 Qcow2BitmapInfo *info = g_new0(Qcow2BitmapInfo, 1);
 Qcow2BitmapInfoList *obj = g_new0(Qcow2BitmapInfoList, 1);
@@ -1124,13 +1123,13 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
 info->name = g_strdup(bm->name);
 info->flags = get_bitmap_info_flags(bm->flags & ~BME_RESERVED_FLAGS);
 obj->value = info;
-*plist = obj;
-plist = &obj->next;
+*info_list = obj;
+info_list = &obj->next;
 }
 
 bitmap_list_free(bm_list);
 
-return list;
+return true;
 }
 
 int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp)
diff --git a/block/qcow2.c b/block/qcow2.c
index 41a29072e6..8c89c98978 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5038,12 +5038,10 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs,
 BDRVQcow2State *s = bs->opaque;
 ImageInfoSpecific *spec_info;
 QCryptoBlockInfo *encrypt_info = NULL;
-Error *local_err = NULL;
 
 if (s->crypto != NULL) {
-encrypt_info = qcrypto_block_get_info(s->crypto, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+encrypt_info = qcrypto_block_get_info(s->crypto, errp);
+if (!encrypt_info) {
 return NULL;
 }
 }
@@ -5060,9 +5058,7 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs,
 };
 } else if (s->qcow_version == 3) {
 Qcow2BitmapInfoList *bitmaps;
-bitmaps = qcow2_get_bitmap_info_list(bs, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!qcow2_get_bitmap_info_list(bs, &bitmaps, errp)) {
 qapi_free_ImageInfoSpecific(spec_info);
 qapi_free_QCryptoBlockInfo(encrypt_info);
 return NULL;
-- 
2.21.3

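qcow2_get_bitmap_info_list() now returns bool and fills an out parameter, appending each element through a tail pointer (`*info_list = obj; info_list = &obj->next;`). A self-contained sketch of that idiom with a hypothetical list type and function names; unlike g_new0(), calloc() can fail, so the sketch frees the partial list on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Qcow2BitmapInfoList. */
typedef struct ToyInfoList {
    const char *value;
    struct ToyInfoList *next;
} ToyInfoList;

static void toy_free_list(ToyInfoList *list)
{
    while (list) {
        ToyInfoList *next = list->next;
        free(list);
        list = next;
    }
}

/*
 * On success return true with *info_list set (to NULL if n == 0);
 * on allocation failure free what was built and return false.
 */
static bool toy_get_info_list(const char *const *names, size_t n,
                              ToyInfoList **info_list)
{
    ToyInfoList **head = info_list;

    *info_list = NULL;
    for (size_t i = 0; i < n; i++) {
        ToyInfoList *obj = calloc(1, sizeof(*obj));
        if (!obj) {
            toy_free_list(*head);
            *head = NULL;
            return false;
        }
        obj->value = names[i];
        *info_list = obj;        /* link the new node at the tail */
        info_list = &obj->next;  /* the tail is now this node's next field */
    }
    return true;
}
```

The tail pointer keeps elements in directory order without a separate "last node" variable or a reversal pass at the end.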



[PATCH v2 12/13] block/qcow2: simplify qcow2_co_invalidate_cache()

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
qcow2_do_open correctly sets errp on each failure path. So, we can
simplify code in qcow2_co_invalidate_cache() and drop explicit error
propagation. Add ERRP_GUARD() as mandated by the documentation in
include/qapi/error.h to avoid problems with the error_prepend() call if
errp is &error_fatal.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2b6ec4b757..cd5f48d3fb 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2702,11 +2702,11 @@ static void qcow2_close(BlockDriverState *bs)
 static void coroutine_fn qcow2_co_invalidate_cache(BlockDriverState *bs,
Error **errp)
 {
+ERRP_GUARD();
 BDRVQcow2State *s = bs->opaque;
 int flags = s->flags;
 QCryptoBlock *crypto = NULL;
 QDict *options;
-Error *local_err = NULL;
 int ret;
 
 /*
@@ -2724,16 +2724,11 @@ static void coroutine_fn qcow2_co_invalidate_cache(BlockDriverState *bs,
 
 flags &= ~BDRV_O_INACTIVE;
qemu_co_mutex_lock(&s->lock);
-ret = qcow2_do_open(bs, options, flags, &local_err);
+ret = qcow2_do_open(bs, options, flags, errp);
qemu_co_mutex_unlock(&s->lock);
 qobject_unref(options);
-if (local_err) {
-error_propagate_prepend(errp, local_err,
-"Could not reopen qcow2 layer: ");
-bs->drv = NULL;
-return;
-} else if (ret < 0) {
-error_setg_errno(errp, -ret, "Could not reopen qcow2 layer");
+if (ret < 0) {
+error_prepend(errp, "Could not reopen qcow2 layer: ");
 bs->drv = NULL;
 return;
 }
-- 
2.21.3
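To illustrate why ERRP_GUARD() is needed next to error_prepend(), here is a
minimal standalone sketch. It does not use QEMU's real include/qapi/error.h:
toy_error_set(), toy_error_prepend(), toy_error_propagate() and the
toy_error_fatal sentinel are hypothetical stand-ins, and instead of exiting,
the fatal sentinel records the message QEMU would print. reopen_guarded()
redirects a NULL or &error_fatal errp to a local Error *, which is roughly
what ERRP_GUARD() does automatically through a cleanup handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for QEMU's Error; not the real include/qapi/error.h. */
typedef struct Error { char msg[256]; } Error;

/* Plays the role of &error_fatal. QEMU reports and exits as soon as an
 * error is stored through it; here we only record what would be printed. */
static Error *toy_error_fatal;
static char toy_fatal_report[256];

static void toy_error_set(Error **errp, const char *msg)
{
    if (!errp) {
        return;                                   /* caller ignores errors */
    }
    if (errp == &toy_error_fatal) {
        /* Real QEMU would exit(1) here, so later prepends never run. */
        snprintf(toy_fatal_report, sizeof(toy_fatal_report), "%s", msg);
        return;
    }
    Error *e = calloc(1, sizeof(*e));
    if (!e) {
        abort();
    }
    snprintf(e->msg, sizeof(e->msg), "%s", msg);
    *errp = e;
}

static void toy_error_prepend(Error **errp, const char *prefix)
{
    char tmp[256];

    /* NULL: nothing stored; fatal sentinel: QEMU would already have exited. */
    if (!errp || errp == &toy_error_fatal || !*errp) {
        return;
    }
    snprintf(tmp, sizeof(tmp), "%s%s", prefix, (*errp)->msg);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", tmp);
}

static void toy_error_propagate(Error **dst, Error *local)
{
    if (!local) {
        return;
    }
    if (dst == &toy_error_fatal) {
        snprintf(toy_fatal_report, sizeof(toy_fatal_report), "%s", local->msg);
    } else if (dst) {
        *dst = local;                             /* ownership moves to caller */
        return;
    }
    free(local);
}

/* Stand-in for qcow2_do_open() failing and setting errp. */
static int toy_do_open(Error **errp)
{
    toy_error_set(errp, "bad header");
    return -EINVAL;
}

/* Without a guard the prefix is silently lost for the fatal sentinel. */
static int reopen_unguarded(Error **errp)
{
    int ret = toy_do_open(errp);
    if (ret < 0) {
        toy_error_prepend(errp, "Could not reopen qcow2 layer: ");
    }
    return ret;
}

/* Hand-written equivalent of what ERRP_GUARD() arranges. */
static int reopen_guarded(Error **errp)
{
    Error *local = NULL;
    Error **orig = errp;

    if (!errp || errp == &toy_error_fatal) {
        errp = &local;
    }
    int ret = toy_do_open(errp);
    if (ret < 0) {
        toy_error_prepend(errp, "Could not reopen qcow2 layer: ");
    }
    if (errp == &local) {
        toy_error_propagate(orig, local);         /* done at scope exit in QEMU */
    }
    return ret;
}
```

With the fatal sentinel, reopen_unguarded() reports the bare "bad header",
while reopen_guarded() reports "Could not reopen qcow2 layer: bad header".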




[PATCH v2 07/13] blockjob: return status from block_job_set_speed()

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
It is better to return a status together with setting errp. This allows
the caller to avoid error propagation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Greg Kurz 
Reviewed-by: Alberto Garcia 
---
 include/block/blockjob.h |  2 +-
 blockjob.c   | 18 --
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 35faa3aa26..d200f33c10 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -139,7 +139,7 @@ bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs);
  * Set a rate-limiting parameter for the job; the actual meaning may
  * vary depending on the job type.
  */
-void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
+bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
 
 /**
  * block_job_query:
diff --git a/blockjob.c b/blockjob.c
index 470facfd47..afddf7a1fb 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -254,28 +254,30 @@ static bool job_timer_pending(Job *job)
return timer_pending(&job->sleep_timer);
 }
 
-void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
+bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 {
 int64_t old_speed = job->speed;
 
-if (job_apply_verb(&job->job, JOB_VERB_SET_SPEED, errp)) {
-return;
+if (job_apply_verb(&job->job, JOB_VERB_SET_SPEED, errp) < 0) {
+return false;
 }
 if (speed < 0) {
 error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "speed",
"a non-negative value");
-return;
+return false;
 }
 
ratelimit_set_speed(&job->limit, speed, BLOCK_JOB_SLICE_TIME);
 
 job->speed = speed;
 if (speed && speed <= old_speed) {
-return;
+return true;
 }
 
 /* kick only if a timer is pending */
job_enter_cond(&job->job, job_timer_pending);
+
+return true;
 }
 
 int64_t block_job_ratelimit_get_delay(BlockJob *job, uint64_t n)
@@ -448,12 +450,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
 
 /* Only set speed when necessary to avoid NotSupported error */
 if (speed != 0) {
-Error *local_err = NULL;
-
-block_job_set_speed(job, speed, &local_err);
-if (local_err) {
+if (!block_job_set_speed(job, speed, errp)) {
job_early_fail(&job->job);
-error_propagate(errp, local_err);
 return NULL;
 }
 }
-- 
2.21.3
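A minimal sketch of the convention this patch adopts, using hypothetical toy
types (ToyJob, toy_set_speed(), toy_error_setg()) rather than QEMU's real
BlockJob and error API. The bool return and *errp are set consistently, so
the caller passes its own errp straight through and branches on the return
value, with no local_err/error_propagate() pair. A side benefit: a caller
that passes errp == NULL can still detect the failure, which a void function
cannot offer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy error type with static storage; not QEMU's include/qapi/error.h. */
typedef struct Error { const char *msg; } Error;
static Error toy_error = { NULL };

static void toy_error_setg(Error **errp, const char *msg)
{
    if (errp) {                 /* NULL errp: caller does not want details */
        toy_error.msg = msg;
        *errp = &toy_error;
    }
}

typedef struct ToyJob { int64_t speed; } ToyJob;

/* Same shape as the patched block_job_set_speed(): false + errp on error. */
static bool toy_set_speed(ToyJob *job, int64_t speed, Error **errp)
{
    if (speed < 0) {
        toy_error_setg(errp, "speed: expected a non-negative value");
        return false;
    }
    job->speed = speed;
    return true;
}

/* Caller in the style of block_job_create(): no local_err needed. */
static ToyJob *toy_job_create(ToyJob *storage, int64_t speed, Error **errp)
{
    storage->speed = 0;
    if (speed != 0 && !toy_set_speed(storage, speed, errp)) {
        return NULL;            /* errp, if non-NULL, already says why */
    }
    return storage;
}
```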




[PATCH v2 06/13] block/mirror: drop extra error propagation in commit_active_start()

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
Let's check the return value of mirror_start_job() to detect failure
instead of checking local_err.

Rename ret to job, as ret is usually an integer variable.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Greg Kurz 
Reviewed-by: Alberto Garcia 
---
 block/mirror.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b3778248b8..f7c624d6a9 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1851,8 +1851,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
   bool auto_complete, Error **errp)
 {
 bool base_read_only;
-Error *local_err = NULL;
-BlockJob *ret;
+BlockJob *job;
 
 base_read_only = bdrv_is_read_only(base);
 
@@ -1862,19 +1861,18 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
 }
 }
 
-ret = mirror_start_job(
+job = mirror_start_job(
  job_id, bs, creation_flags, base, NULL, speed, 0, 0,
  MIRROR_LEAVE_BACKING_CHAIN, false,
  on_error, on_error, true, cb, opaque,
&commit_active_job_driver, false, base, auto_complete,
  filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
- &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+ errp);
+if (!job) {
 goto error_restore_flags;
 }
 
-return ret;
+return job;
 
 error_restore_flags:
 /* ignore error and errp for bdrv_reopen, because we want to propagate
-- 
2.21.3




[PATCH v2 11/13] block/qcow2: read_cache_sizes: return status value

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
It is better to return a status together with setting errp. This allows
us to reduce error propagation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Greg Kurz 
Reviewed-by: Alberto Garcia 
---
 block/qcow2.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c4b86df7c0..2b6ec4b757 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -869,7 +869,7 @@ static void qcow2_attach_aio_context(BlockDriverState *bs,
 cache_clean_timer_init(bs, new_context);
 }
 
-static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
+static bool read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
  uint64_t *l2_cache_size,
  uint64_t *l2_cache_entry_size,
  uint64_t *refcount_cache_size, Error **errp)
@@ -907,16 +907,16 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
 error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE
" and " QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not be set "
"at the same time");
-return;
+return false;
 } else if (l2_cache_size_set &&
(l2_cache_max_setting > combined_cache_size)) {
 error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed "
QCOW2_OPT_CACHE_SIZE);
-return;
+return false;
 } else if (*refcount_cache_size > combined_cache_size) {
 error_setg(errp, QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not exceed "
QCOW2_OPT_CACHE_SIZE);
-return;
+return false;
 }
 
 if (l2_cache_size_set) {
@@ -955,8 +955,10 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
 error_setg(errp, "L2 cache entry size must be a power of two "
"between %d and the cluster size (%d)",
1 << MIN_CLUSTER_BITS, s->cluster_size);
-return;
+return false;
 }
+
+return true;
 }
 
 typedef struct Qcow2ReopenState {
@@ -983,7 +985,6 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
 int i;
 const char *encryptfmt;
 QDict *encryptopts = NULL;
-Error *local_err = NULL;
 int ret;
 
qdict_extract_subqdict(options, &encryptopts, "encrypt.");
@@ -996,10 +997,8 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
 }
 
 /* get L2 table/refcount block cache size from command line options */
-read_cache_sizes(bs, opts, &l2_cache_size, &l2_cache_entry_size,
- &refcount_cache_size, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!read_cache_sizes(bs, opts, &l2_cache_size, &l2_cache_entry_size,
+  &refcount_cache_size, errp)) {
 ret = -EINVAL;
 goto fail;
 }
-- 
2.21.3




[PATCH v2 05/13] block: drop extra error propagation for bdrv_set_backing_hd

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
bdrv_set_backing_hd() now returns a status; let's use it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Greg Kurz 
Reviewed-by: Alberto Garcia 
---
 block.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index b4e36d6dd7..1cf825c349 100644
--- a/block.c
+++ b/block.c
@@ -3015,11 +3015,9 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
 
 /* Hook up the backing file link; drop our reference, bs owns the
  * backing_hd reference now */
-bdrv_set_backing_hd(bs, backing_hd, &local_err);
+ret = bdrv_set_backing_hd(bs, backing_hd, errp);
 bdrv_unref(backing_hd);
-if (local_err) {
-error_propagate(errp, local_err);
-ret = -EINVAL;
+if (ret < 0) {
 goto free_exit;
 }
 
-- 
2.21.3
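The int-status pattern from patches 01 and 05 can be sketched with hypothetical
toy types (ToyNode, toy_set_backing_hd(), toy_open_backing_file()); reference
counting is reduced to a counter and bdrv_refresh_limits() to a flag, so this
is only the control-flow shape, not QEMU's code. The callee keeps a single
out: label so its cleanup runs on both success and failure, while the caller
stores the status, drops its own reference unconditionally, and only then
branches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy error type with static storage; not QEMU's include/qapi/error.h. */
typedef struct Error { const char *msg; } Error;
static Error toy_error;

static void toy_error_setg(Error **errp, const char *msg)
{
    if (errp) {
        toy_error.msg = msg;
        *errp = &toy_error;
    }
}

/* Hypothetical node; a plain counter stands in for BDS refcounting. */
typedef struct ToyNode {
    struct ToyNode *backing;
    int refcnt;
    int frozen;            /* simulate a frozen backing link */
    int attach_fails;      /* simulate the child attach failing */
    int limits_refreshed;  /* stands in for bdrv_refresh_limits() */
} ToyNode;

/* Callee: int status plus errp, one shared exit label for cleanup. */
static int toy_set_backing_hd(ToyNode *bs, ToyNode *backing_hd, Error **errp)
{
    int ret = 0;

    if (bs->frozen) {
        toy_error_setg(errp, "Cannot change frozen backing link");
        return -EPERM;                 /* returns before the cleanup label */
    }
    if (bs->attach_fails) {
        toy_error_setg(errp, "Cannot attach backing child");
        ret = -EINVAL;
        goto out;
    }
    bs->backing = backing_hd;
    backing_hd->refcnt++;              /* the new link holds its own ref */

out:
    bs->limits_refreshed = 1;          /* runs on success and on -EINVAL */
    return ret;
}

/* Caller: keep the status, do unconditional cleanup, then branch. */
static int toy_open_backing_file(ToyNode *bs, ToyNode *backing_hd, Error **errp)
{
    int ret = toy_set_backing_hd(bs, backing_hd, errp);

    backing_hd->refcnt--;              /* like bdrv_unref(): always dropped */
    if (ret < 0) {
        return ret;                    /* errp was already set by the callee */
    }
    return 0;
}
```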




[PATCH v2 09/13] block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
Functions that take errp and return bool are recommended to return true
on success and false on failure. Non-standard interfaces make the code
harder to understand. The change is also needed to reduce error propagation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h|  3 ++-
 block/qcow2-bitmap.c | 25 ++---
 block/qcow2.c|  6 ++
 3 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 6eac088f1c..3c64dcda33 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -972,7 +972,8 @@ void qcow2_cache_discard(Qcow2Cache *c, void *table);
 int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
   void **refcount_table,
   int64_t *refcount_table_size);
-bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp);
+bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, bool *header_updated,
+  Error **errp);
 bool qcow2_get_bitmap_info_list(BlockDriverState *bs,
 Qcow2BitmapInfoList **info_list, Error **errp);
 int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 4f6138f544..500175f4e8 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -962,25 +962,26 @@ static void set_readonly_helper(gpointer bitmap, gpointer value)
 bdrv_dirty_bitmap_set_readonly(bitmap, (bool)value);
 }
 
-/* qcow2_load_dirty_bitmaps()
- * Return value is a hint for caller: true means that the Qcow2 header was
- * updated. (false doesn't mean that the header should be updated by the
- * caller, it just means that updating was not needed or the image cannot be
- * written to).
- * On failure the function returns false.
+/*
+ * Return true on success, false on failure. In either case, if
+ * header_updated is provided, set it appropriately.
  */
-bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp)
+bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, bool *header_updated,
+  Error **errp)
 {
 BDRVQcow2State *s = bs->opaque;
 Qcow2BitmapList *bm_list;
 Qcow2Bitmap *bm;
 GSList *created_dirty_bitmaps = NULL;
-bool header_updated = false;
 bool needs_update = false;
 
+if (header_updated) {
+*header_updated = false;
+}
+
 if (s->nb_bitmaps == 0) {
 /* No bitmaps - nothing to do */
-return false;
+return true;
 }
 
 bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
@@ -1036,7 +1037,9 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp)
 error_setg_errno(errp, -ret, "Can't update bitmap directory");
 goto fail;
 }
-header_updated = true;
+if (header_updated) {
+*header_updated = true;
+}
 }
 
 if (!can_write(bs)) {
@@ -1047,7 +1050,7 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp)
 g_slist_free(created_dirty_bitmaps);
 bitmap_list_free(bm_list);
 
-return header_updated;
+return true;
 
 fail:
 g_slist_foreach(created_dirty_bitmaps, release_dirty_bitmap_helper, bs);
diff --git a/block/qcow2.c b/block/qcow2.c
index 8c89c98978..c4b86df7c0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1297,7 +1297,6 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 unsigned int len, i;
 int ret = 0;
 QCowHeader header;
-Error *local_err = NULL;
 uint64_t ext_end;
 uint64_t l1_vm_state_index;
 bool update_header = false;
@@ -1785,9 +1784,8 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 
 if (!(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) {
 /* It's case 1, 2 or 3.2. Or 3.1 which is BUG in management layer. */
-bool header_updated = qcow2_load_dirty_bitmaps(bs, &local_err);
-if (local_err != NULL) {
-error_propagate(errp, local_err);
+bool header_updated;
+if (!qcow2_load_dirty_bitmaps(bs, &header_updated, errp)) {
 ret = -EINVAL;
 goto fail;
 }
-- 
2.21.3
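The contract in the new comment (the return value reports success, and the
optional out-parameter is written on every path when non-NULL) can be
sketched in isolation. toy_load_bitmaps() and its nb_bitmaps, needs_update
and write_fails inputs are hypothetical; only the control-flow shape follows
qcow2_load_dirty_bitmaps(). Initializing *header_updated at the top is what
makes the early "no bitmaps" return safe, the case raised in review of v1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy error type with static storage; not QEMU's include/qapi/error.h. */
typedef struct Error { const char *msg; } Error;
static Error toy_error;

static void toy_error_setg(Error **errp, const char *msg)
{
    if (errp) {
        toy_error.msg = msg;
        *errp = &toy_error;
    }
}

static bool toy_load_bitmaps(int nb_bitmaps, bool needs_update,
                             bool write_fails, bool *header_updated,
                             Error **errp)
{
    if (header_updated) {
        *header_updated = false;   /* defined before any early return */
    }
    if (nb_bitmaps == 0) {
        return true;               /* nothing to do is still success */
    }
    if (needs_update) {
        if (write_fails) {
            toy_error_setg(errp, "Can't update bitmap directory");
            return false;
        }
        if (header_updated) {
            *header_updated = true;
        }
    }
    return true;
}
```

Callers that don't care about the hint can pass NULL for header_updated.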




[PATCH v2 02/13] block: use return status of bdrv_append()

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
Now that bdrv_append() returns a status, we can drop all the local_err
handling around it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Greg Kurz 
Reviewed-by: Alberto Garcia 
---
 block.c |  5 +
 block/backup-top.c  | 20 
 block/commit.c  |  5 +
 block/mirror.c  |  6 ++
 blockdev.c  |  4 +---
 tests/test-bdrv-graph-mod.c |  6 +++---
 6 files changed, 16 insertions(+), 30 deletions(-)

diff --git a/block.c b/block.c
index f922c6d8f4..b4e36d6dd7 100644
--- a/block.c
+++ b/block.c
@@ -3160,7 +3160,6 @@ static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
 int64_t total_size;
 QemuOpts *opts = NULL;
 BlockDriverState *bs_snapshot = NULL;
-Error *local_err = NULL;
 int ret;
 
 /* if snapshot, we create a temporary backing file and open it
@@ -3207,9 +3206,7 @@ static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
  * order to be able to return one, we have to increase
  * bs_snapshot's refcount here */
 bdrv_ref(bs_snapshot);
-bdrv_append(bs_snapshot, bs, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (bdrv_append(bs_snapshot, bs, errp) < 0) {
 bs_snapshot = NULL;
 goto out;
 }
diff --git a/block/backup-top.c b/block/backup-top.c
index fe6883cc97..eb6a34b726 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -190,7 +190,7 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
  BlockCopyState **bcs,
  Error **errp)
 {
-Error *local_err = NULL;
+ERRP_GUARD();
 BDRVBackupTopState *state;
 BlockDriverState *top;
 bool appended = false;
@@ -223,9 +223,8 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
 bdrv_drained_begin(source);
 
 bdrv_ref(top);
-bdrv_append(top, source, &local_err);
-if (local_err) {
-error_prepend(&local_err, "Cannot append backup-top filter: ");
+if (bdrv_append(top, source, errp) < 0) {
+error_prepend(errp, "Cannot append backup-top filter: ");
 goto fail;
 }
 appended = true;
@@ -235,18 +234,16 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
  * we want.
  */
 state->active = true;
-bdrv_child_refresh_perms(top, top->backing, &local_err);
-if (local_err) {
-error_prepend(&local_err,
-  "Cannot set permissions for backup-top filter: ");
+if (bdrv_child_refresh_perms(top, top->backing, errp) < 0) {
+error_prepend(errp, "Cannot set permissions for backup-top filter: ");
 goto fail;
 }
 
 state->cluster_size = cluster_size;
 state->bcs = block_copy_state_new(top->backing, state->target,
-  cluster_size, write_flags, &local_err);
-if (local_err) {
-error_prepend(_err, "Cannot create block-copy-state: ");
+  cluster_size, write_flags, errp);
+if (!state->bcs) {
+error_prepend(errp, "Cannot create block-copy-state: ");
 goto fail;
 }
 *bcs = state->bcs;
@@ -264,7 +261,6 @@ fail:
 }
 
 bdrv_drained_end(source);
-error_propagate(errp, local_err);
 
 return NULL;
 }
diff --git a/block/commit.c b/block/commit.c
index 1e85c306cc..6da0902f9d 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -254,7 +254,6 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 BlockDriverState *iter;
 BlockDriverState *commit_top_bs = NULL;
 BlockDriverState *filtered_base;
-Error *local_err = NULL;
 int64_t base_size, top_size;
 uint64_t base_perms, iter_shared_perms;
 int ret;
@@ -312,10 +311,8 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 
 commit_top_bs->total_sectors = top->total_sectors;
 
-bdrv_append(commit_top_bs, top, &local_err);
-if (local_err) {
+if (bdrv_append(commit_top_bs, top, errp) < 0) {
 commit_top_bs = NULL;
-error_propagate(errp, local_err);
 goto fail;
 }
 
diff --git a/block/mirror.c b/block/mirror.c
index 26acf4af6f..b3778248b8 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1560,7 +1560,6 @@ static BlockJob *mirror_start_job(
 BlockDriverState *mirror_top_bs;
 bool target_is_backing;
 uint64_t target_perms, target_shared_perms;
-Error *local_err = NULL;
 int ret;
 
 if (granularity == 0) {
@@ -1609,12 +1608,11 @@ static BlockJob *mirror_start_job(
  * it alive until block_job_create() succeeds even if bs has no parent. */
 bdrv_ref(mirror_top_bs);
 bdrv_drained_begin(bs);
-bdrv_append(mirror_top_bs, bs, &local_err);
+ret = bdrv_append(mirror_top_bs, bs, errp);
 bdrv_drained_end(bs);
 
-if (local_err) {
+if (ret &

[PATCH v2 03/13] block: check return value of bdrv_open_child and drop error propagation

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
This patch was generated by the following Coccinelle script:

@@
symbol bdrv_open_child, errp, local_err;
expression file;
@@

  file = bdrv_open_child(...,
-&local_err
+errp
);
- if (local_err)
+ if (!file)
  {
  ...
- error_propagate(errp, local_err);
  ...
  }

with command

spatch --sp-file x.cocci --macro-file scripts/cocci-macro-file.h \
--in-place --no-show-diff --max-width 80 --use-gitgrep block

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Greg Kurz 
Reviewed-by: Alberto Garcia 
---
 block/blkdebug.c |  6 ++
 block/blklogwrites.c | 10 --
 block/blkreplay.c|  6 ++
 block/blkverify.c| 11 ---
 block/qcow2.c|  5 ++---
 block/quorum.c   |  6 ++
 6 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index eecbf3e5c4..5716b817ae 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -464,7 +464,6 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
 {
 BDRVBlkdebugState *s = bs->opaque;
 QemuOpts *opts;
-Error *local_err = NULL;
 int ret;
 uint64_t align;
 
@@ -494,10 +493,9 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
 bs->file = bdrv_open_child(qemu_opt_get(opts, "x-image"), options, "image",
bs, &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-   false, &local_err);
-if (local_err) {
+   false, errp);
+if (!bs->file) {
 ret = -EINVAL;
-error_propagate(errp, local_err);
 goto out;
 }
 
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 13ae63983b..b7579370a3 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -157,19 +157,17 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict *options, int flags,
 /* Open the file */
bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, false,
-   &local_err);
-if (local_err) {
+   errp);
+if (!bs->file) {
 ret = -EINVAL;
-error_propagate(errp, local_err);
 goto fail;
 }
 
 /* Open the log file */
s->log_file = bdrv_open_child(NULL, options, "log", bs, &child_of_bds,
-  BDRV_CHILD_METADATA, false, &local_err);
-if (local_err) {
+  BDRV_CHILD_METADATA, false, errp);
+if (!s->log_file) {
 ret = -EINVAL;
-error_propagate(errp, local_err);
 goto fail;
 }
 
diff --git a/block/blkreplay.c b/block/blkreplay.c
index 30a0f5d57a..4a247752fd 100644
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -23,16 +23,14 @@ typedef struct Request {
 static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
-Error *local_err = NULL;
 int ret;
 
 /* Open the image file */
bs->file = bdrv_open_child(NULL, options, "image", bs, &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-   false, &local_err);
-if (local_err) {
+   false, errp);
+if (!bs->file) {
 ret = -EINVAL;
-error_propagate(errp, local_err);
 goto fail;
 }
 
diff --git a/block/blkverify.c b/block/blkverify.c
index 4aed53ab59..95ae73e2aa 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -112,7 +112,6 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
 {
 BDRVBlkverifyState *s = bs->opaque;
 QemuOpts *opts;
-Error *local_err = NULL;
 int ret;
 
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
@@ -125,20 +124,18 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
 bs->file = bdrv_open_child(qemu_opt_get(opts, "x-raw"), options, "raw",
bs, &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-   false, &local_err);
-if (local_err) {
+   false, errp);
+if (!bs->file) {
 ret = -EINVAL;
-error_propagate(errp, local_err);
 goto fail;
 }
 
 /* Open the test file */
s->test_file = bdrv_open_child(qemu_opt_get(opts, "x-image"), options,
"test", bs, &child_of_bds, BDRV_CHILD_DATA,
-   false, &local_err);
-if (local_err) {
+   false, errp);
+if (!s->test_file) {
 ret = -EINVAL;
-error_propagate(errp, local_err);
 goto fail;
 }
 
diff --git a/block/

[PATCH v2 10/13] block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
It is better to return a status together with setting errp. This makes
it possible to avoid error propagation.

While here, add ERRP_GUARD() to fix the error_prepend(errp, ...) usage
inside qcow2_store_persistent_dirty_bitmaps() (see the comment above the
ERRP_GUARD() definition in include/qapi/error.h).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Greg Kurz 
---
 block/qcow2.h|  2 +-
 block/qcow2-bitmap.c | 13 ++---
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 3c64dcda33..7884a5088d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -978,7 +978,7 @@ bool qcow2_get_bitmap_info_list(BlockDriverState *bs,
 Qcow2BitmapInfoList **info_list, Error **errp);
 int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
 int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
-void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
+bool qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
   bool release_stored, Error **errp);
 int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp);
 bool qcow2_co_can_store_new_dirty_bitmap(BlockDriverState *bs,
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 500175f4e8..b8ff347885 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -1534,9 +1534,10 @@ out:
  * readonly to begin with, and whether we opened directly or reopened to that
  * state shouldn't matter for the state we get afterward.
  */
-void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
+bool qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
   bool release_stored, Error **errp)
 {
+ERRP_GUARD();
 BdrvDirtyBitmap *bitmap;
 BDRVQcow2State *s = bs->opaque;
 uint32_t new_nb_bitmaps = s->nb_bitmaps;
@@ -1556,7 +1557,7 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
 bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
s->bitmap_directory_size, errp);
 if (bm_list == NULL) {
-return;
+return false;
 }
 }
 
@@ -1671,7 +1672,7 @@ success:
 }
 
 bitmap_list_free(bm_list);
-return;
+return true;
 
 fail:
 QSIMPLEQ_FOREACH(bm, bm_list, entry) {
@@ -1689,16 +1690,14 @@ fail:
 }
 
 bitmap_list_free(bm_list);
+return false;
 }
 
 int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp)
 {
 BdrvDirtyBitmap *bitmap;
-Error *local_err = NULL;
 
-qcow2_store_persistent_dirty_bitmaps(bs, false, &local_err);
-if (local_err != NULL) {
-error_propagate(errp, local_err);
+if (!qcow2_store_persistent_dirty_bitmaps(bs, false, errp)) {
 return -EINVAL;
 }
 
-- 
2.21.3




[PATCH v2 00/13] block: deal with errp: part I

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
v2:
01-07: add Greg's and Alberto's r-bs
08: fix wording in commit message
add Greg's r-b
09: fix header_updated logic, add comment, drop unrelated style-change [Alberto]
10: - fix commit header
- add note about ERRP_GUARD in commit message
- add Greg's r-b
11: add Greg's and Alberto's r-bs
12: was "[PATCH 13/14] block/qcow2: qcow2_do_open: deal with errp"
- drop the wrong update of qcow2_do_open() (the ERRP_GUARD() fix
for qcow2_do_open() then becomes unrelated, so drop it too; will
update it later)
- reword the commit message accordingly
13: add Alberto's r-b

"[PATCH 07/14] block/blklogwrites: drop error propagation" is dropped
for now, as it needs a separate small series.

Vladimir Sementsov-Ogievskiy (13):
  block: return status from bdrv_append and friends
  block: use return status of bdrv_append()
  block: check return value of bdrv_open_child and drop error
propagation
  blockdev: fix drive_backup_prepare() missed error
  block: drop extra error propagation for bdrv_set_backing_hd
  block/mirror: drop extra error propagation in commit_active_start()
  blockjob: return status from block_job_set_speed()
  block/qcow2: qcow2_get_specific_info(): drop error propagation
  block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface
  block/qcow2-bitmap: return status from
qcow2_store_persistent_dirty_bitmaps
  block/qcow2: read_cache_sizes: return status value
  block/qcow2: simplify qcow2_co_invalidate_cache()
  block/qed: bdrv_qed_do_open: deal with errp

 block/qcow2.h   |  9 ++---
 include/block/block.h   | 12 +++
 include/block/blockjob.h|  2 +-
 block.c | 50 +++-
 block/backup-top.c  | 20 +---
 block/blkdebug.c|  6 ++--
 block/blklogwrites.c| 10 +++---
 block/blkreplay.c   |  6 ++--
 block/blkverify.c   | 11 +++
 block/commit.c  |  5 +--
 block/mirror.c  | 18 --
 block/qcow2-bitmap.c| 65 +++--
 block/qcow2.c   | 53 --
 block/qed.c | 24 +-
 block/quorum.c  |  6 ++--
 blockdev.c  |  7 ++--
 blockjob.c  | 18 +-
 tests/test-bdrv-graph-mod.c |  6 ++--
 18 files changed, 150 insertions(+), 178 deletions(-)

-- 
2.21.3




[PATCH v2 01/13] block: return status from bdrv_append and friends

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
The recommended use of the QEMU error API is to return a status together
with setting errp, and to avoid void functions with an errp parameter.
Let's improve bdrv_append() and some friends to reduce error-propagation
overhead in further patches.

Choose an int return status, because bdrv_replace_node() calls
bdrv_check_update_perm(), which reports an int status that seems correct
to propagate.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Greg Kurz 
Reviewed-by: Alberto Garcia 
---
 include/block/block.h | 12 ++--
 block.c   | 39 ---
 2 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 981ab5b314..a997dbf95b 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -336,10 +336,10 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp);
 
 BlockDriverState *bdrv_new(void);
-void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
- Error **errp);
-void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
-   Error **errp);
+int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
+Error **errp);
+int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
+  Error **errp);
 
 int bdrv_parse_aio(const char *mode, int *flags);
 int bdrv_parse_cache_mode(const char *mode, int *flags, bool *writethrough);
@@ -351,8 +351,8 @@ BdrvChild *bdrv_open_child(const char *filename,
BdrvChildRole child_role,
bool allow_none, Error **errp);
 BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp);
-void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
- Error **errp);
+int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
+Error **errp);
 int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
const char *bdref_key, Error **errp);
 BlockDriverState *bdrv_open(const char *filename, const char *reference,
diff --git a/block.c b/block.c
index 9538af4884..f922c6d8f4 100644
--- a/block.c
+++ b/block.c
@@ -2869,14 +2869,15 @@ static BdrvChildRole bdrv_backing_role(BlockDriverState *bs)
  * Sets the bs->backing link of a BDS. A new reference is created; callers
  * which don't need their own reference any more must call bdrv_unref().
  */
-void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
+int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
  Error **errp)
 {
+int ret = 0;
 bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
 bdrv_inherits_from_recursive(backing_hd, bs);
 
 if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {
-return;
+return -EPERM;
 }
 
 if (backing_hd) {
@@ -2895,15 +2896,22 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
 
bs->backing = bdrv_attach_child(bs, backing_hd, "backing", &child_of_bds,
 bdrv_backing_role(bs), errp);
+if (!bs->backing) {
+ret = -EINVAL;
+goto out;
+}
+
 /* If backing_hd was already part of bs's backing chain, and
  * inherits_from pointed recursively to bs then let's update it to
  * point directly to bs (else it will become NULL). */
-if (bs->backing && update_inherits_from) {
+if (update_inherits_from) {
 backing_hd->inherits_from = bs;
 }
 
 out:
 bdrv_refresh_limits(bs, NULL);
+
+return ret;
 }
 
 /*
@@ -4553,8 +4561,8 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
 return ret;
 }
 
-void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
-   Error **errp)
+int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
+  Error **errp)
 {
 BdrvChild *c, *next;
 GSList *list = NULL, *p;
@@ -4576,6 +4584,7 @@ void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
 continue;
 }
 if (c->frozen) {
+ret = -EPERM;
 error_setg(errp, "Cannot change '%s' link to '%s'",
c->name, from->node_name);
 goto out;
@@ -4611,6 +4620,8 @@ out:
 g_slist_free(list);
 bdrv_drained_end(from);
 bdrv_unref(from);
+
+return ret;
 }
 
 /*
@@ -4629,20 +4640,16 @@ out:
  * parents of bs_top after bdrv_append() returns. If the caller needs to keep a
  * reference of its own, it must call bdrv_ref().
  */
-void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
- Error **errp)
+int bdrv_append(BlockDriver

[PATCH v2 04/13] blockdev: fix drive_backup_prepare() missed error

2020-09-17 Thread Vladimir Sementsov-Ogievskiy
We leak local_err and do not report the failure to the caller. This is
clearly wrong; let's fix it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Greg Kurz 
Reviewed-by: Alberto Garcia 
---
 blockdev.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index b9803e553f..d6bde81ad4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1813,8 +1813,7 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
 aio_context_acquire(aio_context);
 
 if (set_backing_hd) {
-bdrv_set_backing_hd(target_bs, source, &local_err);
-if (local_err) {
+if (bdrv_set_backing_hd(target_bs, source, errp) < 0) {
 goto unref;
 }
 }
-- 
2.21.3
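A reduced sketch of the bug and of the fix, using a hypothetical toy error
type rather than QEMU's API. In the buggy shape the error is stored only in
local_err, so the caller's errp stays NULL and the failure is invisible
(with QEMU's heap-allocated Error the object would also leak). In the fixed
shape the callee writes errp directly and the int return drives the branch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy error type with static storage; not QEMU's heap-allocated Error. */
typedef struct Error { const char *msg; } Error;
static Error toy_error;

static void toy_error_setg(Error **errp, const char *msg)
{
    if (errp) {
        toy_error.msg = msg;
        *errp = &toy_error;
    }
}

/* Stand-in for bdrv_set_backing_hd() after patch 01: int status + errp. */
static int toy_set_backing(int fail, Error **errp)
{
    if (fail) {
        toy_error_setg(errp, "Cannot set backing file");
        return -1;
    }
    return 0;
}

/* Buggy shape: the failure is caught in local_err but never handed on. */
static void toy_prepare_buggy(int fail, Error **errp)
{
    Error *local_err = NULL;

    (void)errp;                        /* never written: this is the bug */
    toy_set_backing(fail, &local_err);
    if (local_err) {
        return;                        /* caller cannot tell this failed */
    }
    /* ... rest of the preparation ... */
}

/* Fixed shape: pass errp through and branch on the returned status. */
static void toy_prepare_fixed(int fail, Error **errp)
{
    if (toy_set_backing(fail, errp) < 0) {
        return;                        /* errp already describes the error */
    }
    /* ... rest of the preparation ... */
}
```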




Re: [PATCH 02/14] block: use return status of bdrv_append()

2020-09-17 Thread Vladimir Sementsov-Ogievskiy

10.09.2020 19:10, Greg Kurz wrote:

On Wed,  9 Sep 2020 21:59:18 +0300
Vladimir Sementsov-Ogievskiy wrote:


Now that bdrv_append() returns a status, we can drop all the local_err
handling around it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---


Reviewed-by: Greg Kurz 

Just one suggestion for a follow-up below...


  block.c |  5 +
  block/backup-top.c  | 20 


[..]


@@ -253,7 +253,6 @@ void commit_start(const char *job_id, BlockDriverState *bs,
  CommitBlockJob *s;
  BlockDriverState *iter;
  BlockDriverState *commit_top_bs = NULL;
-Error *local_err = NULL;
  int ret;
  


... this is unrelated but while reviewing I've noticed that the ret
variable isn't really needed.



Looking at this now, I don't quite agree. I think that avoiding multi-line if 
conditions is a good reason for additional variables, like this ret. It's a matter of 
taste, of course, so feel free to post a patch, but I don't want to do it :)

--
Best regards,
Vladimir



Re: [PATCH 10/14] block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface

2020-09-17 Thread Vladimir Sementsov-Ogievskiy

17.09.2020 19:35, Alberto Garcia wrote:

On Wed 09 Sep 2020 08:59:26 PM CEST, Vladimir Sementsov-Ogievskiy wrote:

-/* qcow2_load_dirty_bitmaps()
- * Return value is a hint for caller: true means that the Qcow2 header was
- * updated. (false doesn't mean that the header should be updated by the
- * caller, it just means that updating was not needed or the image cannot be
- * written to).
- * On failure the function returns false.
- */
-bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp)
+/* Return true on success, false on failure. */
+bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, bool *header_updated,
+  Error **errp)


I think that the documentation should clarify under what conditions
'header_updated' is modified.


  if (s->nb_bitmaps == 0) {
  /* No bitmaps - nothing to do */
-return false;
+return true;
  }


Here it is not set, for example (should it be set to false?).


Ha, I think it just shows that the patch is wrong :) We should set header_updated
at least on every success path, or better, always (if it is non-NULL, of course).
Thanks for the careful review!
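A sketch of the contract suggested here, under stated assumptions: toy_load_bitmaps() is a hypothetical stand-in for qcow2_load_dirty_bitmaps(), with the image state reduced to a bitmap count and a read-only flag. The return value reports success, and header_updated is written on every path whenever the caller passed a non-NULL pointer, so the caller never reads an uninitialized value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for qcow2_load_dirty_bitmaps(). A negative
 * 'nb_bitmaps' simulates a failure. Returns true on success.
 */
static bool toy_load_bitmaps(int nb_bitmaps, bool read_only,
                             bool *header_updated)
{
    bool updated = false;
    bool ok = true;

    if (nb_bitmaps == 0) {
        /* No bitmaps: nothing to do, success, header untouched. */
        goto out;
    }
    if (nb_bitmaps < 0) {
        /* Simulated failure, e.g. a corrupt bitmap directory. */
        ok = false;
        goto out;
    }
    /* Loading bitmaps into a writable image updates the header. */
    updated = !read_only;

out:
    /* Single exit: the out-parameter is set on every path. */
    if (header_updated) {
        *header_updated = updated;
    }
    return ok;
}
```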




-if (bm_list == NULL) {
+if (!bm_list) {
  return false;
  }


This looks like a cosmetic change unrelated to the rest of the patch.

Berto




--
Best regards,
Vladimir



Re: [PATCH 09/14] block/qcow2: qcow2_get_specific_info(): drop error propagation

2020-09-17 Thread Vladimir Sementsov-Ogievskiy

17.09.2020 19:32, Alberto Garcia wrote:

On Wed 09 Sep 2020 08:59:25 PM CEST, Vladimir Sementsov-Ogievskiy wrote:


+ * On success return true with bm_list set (probably to NULL, if no bitmaps),


" probably " ? :-)


I noted this because "set to NULL" is not an obvious thing (is it "unset"? :). And by "probably" I mean
"maybe", i.e. NULL is just one of the possible cases. Perhaps I'm using "probably" the wrong way?




+ * on failure return false with errp set.
   */
-Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
-Error **errp)
+bool qcow2_get_bitmap_info_list(BlockDriverState *bs,
+Qcow2BitmapInfoList **info_list, Error **errp)
  {
  BDRVQcow2State *s = bs->opaque;
  Qcow2BitmapList *bm_list;
  Qcow2Bitmap *bm;
-Qcow2BitmapInfoList *list = NULL;
-Qcow2BitmapInfoList **plist = &list;


So here 'list' points at NULL and 'plist' at &list.


Hmm, to be precise, list _is_ NULL (and points nowhere), and plist points to 
list.




-*plist = obj;
-plist = &obj->next;


In the original code 'plist' is updated when you add a new element, so
it always points at the end of the list. But 'list' is unchanged and it
still points at the first element.

So the caller receives a pointer to the first element.


+*info_list = obj;
+info_list = &obj->next;


But in the new code there is only one variable (passed by the caller),
which always points at the end of the list.



No: with the first "*info_list = obj" we set the result that the caller will get; the caller's
pointer now points to the first object in the list.
Then, with "info_list = &obj->next", we reassign info_list to another pointer: the "next"
field of the first list item. So all further "*info_list = obj" assignments are not visible to the caller.

Actually, the logic is not changed: info_list is simply used instead of plist, and instead of list
there is a variable which should be defined in the caller. Look: in the old code, the first "*plist = obj"
sets the "list" variable, but all further "*plist = obj" assignments don't change the "list" variable.
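The pointer-to-pointer idiom can be checked in isolation. This sketch uses a hypothetical IntList type in place of Qcow2BitmapInfoList; as in the patch, the caller's pointer doubles as the tail cursor, so only the first store reaches the caller's variable and every later store writes into the previous node's next field.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list type standing in for Qcow2BitmapInfoList. */
typedef struct IntList {
    int value;
    struct IntList *next;
} IntList;

/*
 * Builds the list 0, 1, ..., n-1. 'info_list' starts as the address of the
 * caller's head pointer and is then reused as the tail cursor.
 */
static void build_list(int n, IntList **info_list)
{
    *info_list = NULL;                 /* empty result when n == 0 */
    for (int i = 0; i < n; i++) {
        IntList *obj = calloc(1, sizeof(*obj));   /* obj->next == NULL */
        if (!obj) {
            abort();                   /* like g_new0(): never returns NULL */
        }
        obj->value = i;
        *info_list = obj;              /* caller's head, then prev->next */
        info_list = &obj->next;        /* move the cursor to the new tail */
    }
}

/* Frees a list built by build_list(). */
static void free_list(IntList *list)
{
    while (list) {
        IntList *next = list->next;
        free(list);
        list = next;
    }
}
```

After build_list(3, &head), head points at the node holding 0, which is exactly what the old code returned through its separate 'list' variable.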


--
Best regards,
Vladimir



Re: [PATCH 13/14] block/qcow2: qcow2_do_open: deal with errp

2020-09-17 Thread Vladimir Sementsov-Ogievskiy

17.09.2020 19:23, Alberto Garcia wrote:

On Wed 09 Sep 2020 08:59:29 PM CEST, Vladimir Sementsov-Ogievskiy wrote:

1. Drop extra error propagation

2. Set errp always on failure. The generic bdrv_open_driver() supports driver
functions which can return a negative value and forget to set errp.
That's a strange thing. Let's improve qcow2_do_open() to not behave this
way. This allows simplifying the code in qcow2_co_invalidate_cache().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2.c | 16 +++-
  1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 31dd28d19e..cc4e7dd461 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1292,6 +1292,7 @@ static int validate_compression_type(BDRVQcow2State *s, Error **errp)
  static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
int flags, Error **errp)
  {
+ERRP_GUARD();


Why is this necessary?


Because error_append_hint() is used in the function. Without ERRP_GUARD,
error_append_hint() won't work if errp == &error_fatal.
Read more in include/qapi/error.h near the ERRP_GUARD definition.

But yes, it's good to note it in the commit message.
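A toy model of the problem, with hypothetical toy_* names standing in for QEMU's error API (the real error_fatal exits instead of recording a report, and real QEMU keeps the hint in a separate field rather than appending it to the message). When errp is &error_fatal, the error is reported at error_setg() time, so a hint appended afterwards is lost. Swapping in a local Error, as ERRP_GUARD() does for a NULL or &error_fatal errp, keeps the hint until the error is propagated.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: these names are hypothetical, not QEMU's error API. */
typedef struct Error {
    char msg[128];
} Error;

static Error *toy_error_fatal;      /* only its address is used, as a flag */
static char fatal_report[128];      /* what error_fatal would print */

static void toy_error_setg(Error **errp, const char *msg)
{
    if (errp == &toy_error_fatal) {
        /* The real error_fatal reports and exits right here. */
        snprintf(fatal_report, sizeof(fatal_report), "%s", msg);
        return;
    }
    if (errp) {
        *errp = calloc(1, sizeof(Error));
        if (!*errp) {
            abort();
        }
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    }
}

/* A hint can only be added to an Error object that actually exists. */
static void toy_error_append_hint(Error **errp, const char *hint)
{
    if (errp && errp != &toy_error_fatal && *errp) {
        size_t used = strlen((*errp)->msg);
        snprintf((*errp)->msg + used, sizeof((*errp)->msg) - used, "%s", hint);
    }
}

static void toy_error_propagate(Error **dst, Error *src)
{
    if (!src) {
        return;
    }
    if (dst == &toy_error_fatal) {
        snprintf(fatal_report, sizeof(fatal_report), "%s", src->msg);
        free(src);
    } else if (dst) {
        *dst = src;
    } else {
        free(src);
    }
}

static void open_without_guard(Error **errp)
{
    toy_error_setg(errp, "unknown feature");
    toy_error_append_hint(errp, "; try a newer version");
}

/* Effect of ERRP_GUARD(): a NULL or fatal errp is swapped for a local one. */
static void open_with_guard(Error **errp)
{
    Error *local_err = NULL;
    Error **orig_errp = errp;

    if (!errp || errp == &toy_error_fatal) {
        errp = &local_err;
    }
    toy_error_setg(errp, "unknown feature");
    toy_error_append_hint(errp, "; try a newer version");
    toy_error_propagate(orig_errp, local_err);  /* no-op if not swapped */
}
```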




  BDRVQcow2State *s = bs->opaque;
  unsigned int len, i;
  int ret = 0;
@@ -1426,6 +1427,8 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
  report_unsupported_feature(errp, feature_table,
 s->incompatible_features &
 ~QCOW2_INCOMPAT_MASK);
+error_setg(errp,
+   "qcow2 header contains unknown incompatible_feature bits");


I think that this is a mistake because the previous call to
report_unsupported_feature() already calls error_setg();


Oops, you are right.




@@ -2709,11 +2712,11 @@ static void qcow2_close(BlockDriverState *bs)
  static void coroutine_fn qcow2_co_invalidate_cache(BlockDriverState *bs,
 Error **errp)
  {
+ERRP_GUARD();


Again, why is this necessary?



Because it uses error_prepend() after conversion (same reason as for 
error_append_hint()).

Thanks for review! I'll post v2 soon.

--
Best regards,
Vladimir



Re: [PATCH v5 00/10] preallocate filter

2020-09-17 Thread Vladimir Sementsov-Ogievskiy

01.09.2020 18:07, Max Reitz wrote:

On 27.08.20 23:08, Vladimir Sementsov-Ogievskiy wrote:

21.08.2020 17:11, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

Here is a filter which does preallocation on write.

In Virtuozzo we have to deal with a custom distributed storage
solution, where allocation is a relatively expensive operation. We have to
work around it in Qemu, so here is a new filter.


I have a problem now with this thing.

We need preallocation. But we don't want to explicitly specify it in all
the management tools.


Why?


So it should be inserted by default.


Why?  You mean without any option?  That seems...  Interesting?

(Also like a recipe for reports of performance regression in some cases.)


It's OK for
us to keep this default different from upstream... But there are
problems with the implicitly inserted filter (actually iotests fail and
I failed to fix them)


I would suspect even if the iotests passed we would end up with a heap
of problems that we would only notice at some later point.  I thought
you too weren’t too fond of the idea of implicit filters.


1. I have to set bs->inherits_from for the filter and its child by hand
after bdrv_replace_node(), otherwise bdrv_check_perm doesn't work.

2. I have to set filter_bs->implicit and teach bdrv_refresh_filename()
to ignore implicit filters when it checks for drv->bdrv_file_open, to
avoid JSON appearing in backing file names.

3. And the real design problem, which seems impossible to fix: reopen is
broken, simply because the user is not prepared for the fact that the file
child is a filter, not a file node: it has other options and doesn't
support the options of file-posix.


Well, what should I say.  I feel like we have made efforts in the past
years to make the block graph fully visible to users and yield the
responsibility of managing it to the users, too, so I’m not surprised if
a step backwards breaks that.


And it seems all of this (mostly [3]) shows that implicitly inserting the
filter is nearly impossible.

So, what are the possible solutions?

In virtuozzo7 we have the preallocation feature implemented inside the qcow2
driver. This is very inconvenient: we have to handle each possible over-EOF
write to the underlying node (to keep data_end in sync, so that we can shrink
the preallocation on close()). I don't like this way and don't want to port
it.

Another option is implementing preallocation inside the file-posix driver.
Then, instead of the BDRV_REQ_NO_WAIT flag, I'll need to extend the serialising
requests API (bdrv_make_request_serialising() is already used in
file-posix.c) to support no-wait behavior + expanding the serialising
request bounds. This option seems feasible, so I'll try this way if there
are no other ideas.


Possible, but you haven’t yet explained what the problem with the
management layer inserting the preallocation filter is.


A filter is obviously the right way: we use the generic block layer for native
request serialising, don't need to catch every write in the qcow2 driver,
don't need to modify any other driver, and get a universal thing. But how
do we insert it implicitly (or at least automatically in some cases) and
avoid all the problems?


I don’t understand why inserting it implicitly is important.



You are right. Thanks for the strong point of view; it made me revise my own.
Now I'm working on v6.

--
Best regards,
Vladimir



Re: [PATCH v2] qemu-img: Support bitmap --merge into backing image

2020-09-16 Thread Vladimir Sementsov-Ogievskiy

14.09.2020 22:10, Eric Blake wrote:

If you have the chain 'base.qcow2 <- top.qcow2' and want to merge a
bitmap from top into base, qemu-img was failing with:

qemu-img: Could not open 'top.qcow2': Could not open backing file: Failed to get shared "write" lock
Is another process using the image [base.qcow2]?

The easiest fix is to not open the entire backing chain of either
image (source or destination); after all, the point of 'qemu-img
bitmap' is solely to manipulate bitmaps directly within a single qcow2
image, and this is made more precise if we don't pay attention to
other images in the chain that may happen to have a bitmap by the same
name.

However, note that during normal usage, it is a feature that qemu will
allow a bitmap from a backing image to be exposed by an overlay BDS;
doing so makes it easier to perform incremental backup, where we have:

Base <- Active <- temporary
   \--block job ->/

with temporary being fed by a block-copy 'sync' job; when exposing
temporary over NBD, referring to a bitmap that lives only in Active is
less effort than having to copy a bitmap into temporary [1].  So the
testsuite additions in this patch check both where bitmaps get
allocated (the qemu-img info output), and, when NOT using 'qemu-img
bitmap', that bitmaps are indeed visible through a backing chain.

[1] Full disclosure: prior to the recent commit 374eedd1c4 and
friends, we were NOT able to see bitmaps through filters, which meant
that we actually did not have nice clean semantics for uniformly being
able to pick up bitmaps from anywhere in the backing chain (seen as a
change in behavior between qemu 4.1 and 4.2 at commit 00e30f05de, when
block-copy swapped from a one-off to a filter).  Which means libvirt
was already coded to copy bitmaps around for the sake of older qemu,
even though modern qemu no longer needs it.  Oh well.

Fixes: http://bugzilla.redhat.com/1877209
Reported-by: Eyal Shenitzky 
Signed-off-by: Eric Blake 
---


Honestly, I don't want to bother with carefully checking the new test output;
at least I see that it doesn't produce errors :)
The code change seems obvious.

Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



[PATCH v6 4/5] block/io: fix bdrv_is_allocated_above

2020-09-16 Thread Vladimir Sementsov-Ogievskiy
bdrv_is_allocated_above wrongly handles short backing files: it reports
after-EOF space as UNALLOCATED, which is wrong, as on read the data is
generated at the level of the short backing file (if all overlays have an
unallocated area at that place).

Reusing bdrv_common_block_status_above fixes the issue and unifies the code
path.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 block/io.c | 43 +--
 1 file changed, 5 insertions(+), 38 deletions(-)

diff --git a/block/io.c b/block/io.c
index d864d035ac..95b86429ca 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2491,52 +2491,19 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
  * at 'offset + *pnum' may return the same allocation status (in other
  * words, the result is not necessarily the maximum possible range);
  * but 'pnum' will only be 0 when end of file is reached.
- *
  */
 int bdrv_is_allocated_above(BlockDriverState *top,
 BlockDriverState *base,
 bool include_base, int64_t offset,
 int64_t bytes, int64_t *pnum)
 {
-BlockDriverState *intermediate;
-int ret;
-int64_t n = bytes;
-
-assert(base || !include_base);
-
-intermediate = top;
-while (include_base || intermediate != base) {
-int64_t pnum_inter;
-int64_t size_inter;
-
-assert(intermediate);
-ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
-if (ret < 0) {
-return ret;
-}
-if (ret) {
-*pnum = pnum_inter;
-return 1;
-}
-
-size_inter = bdrv_getlength(intermediate);
-if (size_inter < 0) {
-return size_inter;
-}
-if (n > pnum_inter &&
-(intermediate == top || offset + pnum_inter < size_inter)) {
-n = pnum_inter;
-}
-
-if (intermediate == base) {
-break;
-}
-
-intermediate = backing_bs(intermediate);
+int ret = bdrv_common_block_status_above(top, base, include_base, false,
+ offset, bytes, pnum, NULL, NULL);
+if (ret < 0) {
+return ret;
 }
 
-*pnum = n;
-return 0;
+return !!(ret & BDRV_BLOCK_ALLOCATED);
 }
 
 int coroutine_fn
-- 
2.21.3




[PATCH v6 3/5] block/io: bdrv_common_block_status_above: support bs == base

2020-09-16 Thread Vladimir Sementsov-Ogievskiy
We are going to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
include_base == false and still bs == base (e.g. from img_rebase()).

So, support this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block/io.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 0cc2dd7a3e..d864d035ac 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2371,9 +2371,13 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 BlockDriverState *p;
 int64_t eof = 0;
 
-assert(include_base || bs != base);
 assert(!include_base || base); /* Can't include NULL base */
 
+if (!include_base && bs == base) {
+*pnum = bytes;
+return 0;
+}
+
 ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
 if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
 return ret;
-- 
2.21.3




[PATCH v6 2/5] block/io: bdrv_common_block_status_above: support include_base

2020-09-16 Thread Vladimir Sementsov-Ogievskiy
In order to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above, let's support the include_base parameter.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/coroutines.h |  2 ++
 block/io.c | 17 -
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index f69179f5ef..1cb3128b94 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -41,6 +41,7 @@ bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes,
 int coroutine_fn
 bdrv_co_common_block_status_above(BlockDriverState *bs,
   BlockDriverState *base,
+  bool include_base,
   bool want_zero,
   int64_t offset,
   int64_t bytes,
@@ -50,6 +51,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 int generated_co_wrapper
 bdrv_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
+   bool include_base,
bool want_zero,
int64_t offset,
int64_t bytes,
diff --git a/block/io.c b/block/io.c
index e381d2da35..0cc2dd7a3e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2359,6 +2359,7 @@ early_out:
 int coroutine_fn
 bdrv_co_common_block_status_above(BlockDriverState *bs,
   BlockDriverState *base,
+  bool include_base,
   bool want_zero,
   int64_t offset,
   int64_t bytes,
@@ -2370,10 +2371,11 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 BlockDriverState *p;
 int64_t eof = 0;
 
-assert(bs != base);
+assert(include_base || bs != base);
+assert(!include_base || base); /* Can't include NULL base */
 
 ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
-if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
+if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
 return ret;
 }
 
@@ -2384,7 +2386,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 assert(*pnum <= bytes);
 bytes = *pnum;
 
-for (p = backing_bs(bs); p != base; p = backing_bs(p)) {
+for (p = backing_bs(bs); include_base || p != base; p = backing_bs(p)) {
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
file);
 if (ret < 0) {
@@ -2420,6 +2422,11 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 break;
 }
 
+if (p == base) {
+assert(include_base);
+break;
+}
+
 /*
  * OK, [offset, offset + *pnum) region is unallocated on this layer,
  * let's continue the diving.
@@ -2439,7 +2446,7 @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
 int64_t offset, int64_t bytes, int64_t *pnum,
 int64_t *map, BlockDriverState **file)
 {
-return bdrv_common_block_status_above(bs, base, true, offset, bytes,
+return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
   pnum, map, file);
 }
 
@@ -2456,7 +2463,7 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
 int ret;
 int64_t dummy;
 
-ret = bdrv_common_block_status_above(bs, backing_bs(bs), false, offset,
+ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
  bytes, pnum ? pnum : &dummy, NULL,
  NULL);
 if (ret < 0) {
-- 
2.21.3
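To make the include_base loop condition concrete, here is a sketch over a hypothetical array-indexed chain (layer 0 on top, the backing file of layer i is layer i + 1, and -1 stands in for NULL); it is not the real block layer. It records which layers below bs the patched loop visits: without include_base the walk stops before base, with it base is examined and then the loop breaks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chain: backing file of layer i is layer i + 1; -1 is NULL. */
static int toy_backing(int layer, int nb_layers)
{
    return layer + 1 < nb_layers ? layer + 1 : -1;
}

/*
 * Stores in 'visited' the layers examined below 'bs', following the shape
 * of the patched loop:
 *   for (p = backing_bs(bs); include_base || p != base; p = backing_bs(p))
 *       ... if (p == base) { break; }
 * Returns how many layers were visited.
 */
static size_t walk_below(int bs, int base, bool include_base, int nb_layers,
                         int *visited)
{
    size_t n = 0;

    assert(bs != base);                 /* the patches handle bs == base earlier */
    assert(!include_base || base >= 0); /* can't include a NULL base */
    for (int p = toy_backing(bs, nb_layers); include_base || p != base;
         p = toy_backing(p, nb_layers)) {
        assert(p >= 0);                 /* base must lie below bs */
        visited[n++] = p;
        if (p == base) {
            break;
        }
    }
    return n;
}
```

With a four-layer chain and base = 2, the walk visits only layer 1 when include_base is false, and layers 1 and 2 when it is true.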




[PATCH v6 1/5] block/io: fix bdrv_co_block_status_above

2020-09-16 Thread Vladimir Sementsov-Ogievskiy
bdrv_co_block_status_above has several design problems with handling
short backing files:

1. With want_zero=true, it may return ret with BDRV_BLOCK_ZERO but
without the BDRV_BLOCK_ALLOCATED flag, when actually the short backing file
which produces these after-EOF zeros is inside the requested backing
sequence.

2. With want_zero=false, it may return pnum=0 prior to the actual EOF,
because of the EOF of a short backing file.

Fix these things, making the logic about short backing files clearer.

With the fixed bdrv_block_status_above we also have to improve is_zero in
the qcow2 code, otherwise iotest 154 will fail, because with this patch we
stop merging zeros of different types (zeros from regions that are
unallocated in the whole backing chain vs zeros produced by short backing
files).

Note also that this patch leaves for another day the general problem
around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated
vs go-to-backing.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c| 66 ---
 block/qcow2.c | 16 +++--
 2 files changed, 66 insertions(+), 16 deletions(-)

diff --git a/block/io.c b/block/io.c
index 84f82bc069..e381d2da35 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2366,34 +2366,72 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
   int64_t *map,
   BlockDriverState **file)
 {
+int ret;
 BlockDriverState *p;
-int ret = 0;
-bool first = true;
+int64_t eof = 0;
 
 assert(bs != base);
-for (p = bs; p != base; p = backing_bs(p)) {
+
+ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
+if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
+return ret;
+}
+
+if (ret & BDRV_BLOCK_EOF) {
+eof = offset + *pnum;
+}
+
+assert(*pnum <= bytes);
+bytes = *pnum;
+
+for (p = backing_bs(bs); p != base; p = backing_bs(p)) {
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
file);
 if (ret < 0) {
-break;
+return ret;
 }
-if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+if (*pnum == 0) {
 /*
- * Reading beyond the end of the file continues to read
- * zeroes, but we can only widen the result to the
- * unallocated length we learned from an earlier
- * iteration.
+ * The top layer deferred to this layer, and because this layer is
+ * short, any zeroes that we synthesize beyond EOF behave as if they
+ * were allocated at this layer.
+ *
+ * We don't include BDRV_BLOCK_EOF into ret, as upper layer may be
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
+ * below.
  */
+assert(ret & BDRV_BLOCK_EOF);
 *pnum = bytes;
+if (file) {
+*file = p;
+}
+ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
+break;
 }
-if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
+if (ret & BDRV_BLOCK_ALLOCATED) {
+/*
+ * We've found the node and the status, we must break.
+ *
+ * Drop BDRV_BLOCK_EOF, as it's not for upper layer, which may be
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
+ * below.
+ */
+ret &= ~BDRV_BLOCK_EOF;
 break;
 }
-/* [offset, pnum] unallocated on this layer, which could be only
- * the first part of [offset, bytes].  */
-bytes = MIN(bytes, *pnum);
-first = false;
+
+/*
+ * OK, [offset, offset + *pnum) region is unallocated on this layer,
+ * let's continue the diving.
+ */
+assert(*pnum <= bytes);
+bytes = *pnum;
 }
+
+if (offset + *pnum == eof) {
+ret |= BDRV_BLOCK_EOF;
+}
+
 return ret;
 }
 
diff --git a/block/qcow2.c b/block/qcow2.c
index da56b1a4df..15ba0ce81a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3872,8 +3872,20 @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
 if (!bytes) {
 return true;
 }
-res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
-return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
+
+/*
+ * bdrv_block_status_above doesn't merge different types of zeros, for
+ * example, zeros which come from the region which is unallocated in
+ * the whole backing chain, and zeros which come because of a short
+ * backing file. So, we need a loop.
+ */
+do {
+res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
+offset +

[PATCH v6 5/5] iotests: add commit top->base cases to 274

2020-09-16 Thread Vladimir Sementsov-Ogievskiy
These cases are fixed by previous patches around block_status and
is_allocated.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 tests/qemu-iotests/274 | 20 +++
 tests/qemu-iotests/274.out | 68 ++
 2 files changed, 88 insertions(+)

diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
index d4571c5465..76b1ba6a52 100755
--- a/tests/qemu-iotests/274
+++ b/tests/qemu-iotests/274
@@ -115,6 +115,26 @@ with iotests.FilePath('base') as base, \
 iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
 iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
 
+iotests.log('=== Testing qemu-img commit (top -> base) ===')
+
+create_chain()
+iotests.qemu_img_log('commit', '-b', base, top)
+iotests.img_info_log(base)
+iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
+iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
+
+iotests.log('=== Testing QMP active commit (top -> base) ===')
+
+create_chain()
+with create_vm() as vm:
+vm.launch()
+vm.qmp_log('block-commit', device='top', base_node='base',
+   job_id='job0', auto_dismiss=False)
+vm.run_job('job0', wait=5)
+
+iotests.img_info_log(mid)
+iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
+iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
 
 iotests.log('== Resize tests ==')
 
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
index bf5abd4c10..cfe17a8659 100644
--- a/tests/qemu-iotests/274.out
+++ b/tests/qemu-iotests/274.out
@@ -135,6 +135,74 @@ read 1048576/1048576 bytes at offset 0
 read 1048576/1048576 bytes at offset 1048576
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+=== Testing qemu-img commit (top -> base) ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Image committed.
+
+image: TEST_IMG
+file format: IMGFMT
+virtual size: 2 MiB (2097152 bytes)
+cluster_size: 65536
+Format specific information:
+compat: 1.1
+compression type: zlib
+lazy refcounts: false
+refcount bits: 16
+corrupt: false
+extended l2: false
+
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing QMP active commit (top -> base) ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": "base", "device": "top", "job-id": "job0"}}
+{"return": {}}
+{"execute": "job-complete", "arguments": {"id": "job0"}}
+{"return": {}}
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "job-dismiss", "arguments": {"id": "job0"}}
+{"return": {}}
+image: TEST_IMG
+file format: 

[PATCH v6 0/5] fix & merge block_status_above and is_allocated_above

2020-09-16 Thread Vladimir Sementsov-Ogievskiy
Hi all!

This series addresses the following problem:
block-status-above functions may consider space after EOF of
intermediate backing files as unallocated, which is wrong, as these
backing files are the source of the zeroes: we never go further down the
backing chain after a short backing file. So, if such a short backing file
is _inside_ the requested sub-chain of the backing chain, we should never
report the space after its EOF as unallocated.
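A simplified model of this rule, with hypothetical types (not QEMU code) and allocation tracked per cluster: walking the requested sub-chain from the top, a cluster counts as allocated as soon as some layer either has it allocated or ends before it, since reads past a short layer's EOF return zeroes produced by that layer and never go further down the chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layer description: length in clusters plus allocation map. */
typedef struct ToyLayer {
    size_t nb_clusters;        /* image length, in clusters */
    const bool *allocated;     /* allocated[i] is valid for i < nb_clusters */
} ToyLayer;

/*
 * Is 'cluster' allocated in layers[0..nb_layers-1] (top first)? The layers
 * array is the requested sub-chain, i.e. everything above 'base'.
 */
static bool toy_is_allocated_above(const ToyLayer *layers, size_t nb_layers,
                                   size_t cluster)
{
    assert(nb_layers > 0 && cluster < layers[0].nb_clusters);
    for (size_t i = 0; i < nb_layers; i++) {
        if (cluster >= layers[i].nb_clusters) {
            /* After EOF of a short layer: its zeroes count as allocated. */
            return true;
        }
        if (layers[i].allocated[cluster]) {
            return true;
        }
    }
    return false;   /* unallocated in every layer of the sub-chain */
}
```

For a chain shaped like the one in iotest 274 (a top layer longer than mid), a cluster past mid's EOF is reported as allocated in the [top, mid] sub-chain, which is the behavior the series argues for.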

See patches 01,04,05 for details.

Note that this series leaves for another day the general problem
around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated
vs go-to-backing.
An audit of this problem is here:
"backing chain & block status & filters"
https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg04706.html
And I'm going to prepare a series to address this problem.

Also, the get_block_status function has the same disease, but it remains unfixed here:
I want to make a separate series for it.

v6:

01: handle EOF better, don't merge reported ZEROes with automatic after-EOF zeroes,
handle the first layer outside the loop to make the code simpler to read. Drop r-b.
02: rebase on 01
05: update test output for extended l2 and backing file format, keep r-b

Based on series "[PATCH v8 0/7] coroutines: generate wrapper code" or
in other words:
Based-on: <20200915164411.20590-1-vsement...@virtuozzo.com>

Vladimir Sementsov-Ogievskiy (5):
  block/io: fix bdrv_co_block_status_above
  block/io: bdrv_common_block_status_above: support include_base
  block/io: bdrv_common_block_status_above: support bs == base
  block/io: fix bdrv_is_allocated_above
  iotests: add commit top->base cases to 274

 block/coroutines.h |   2 +
 block/io.c | 126 +
 block/qcow2.c  |  16 -
 tests/qemu-iotests/274 |  20 ++
 tests/qemu-iotests/274.out |  68 
 5 files changed, 175 insertions(+), 57 deletions(-)

-- 
2.21.3




Re: [PATCH v8 4/7] scripts: add block-coroutine-wrapper.py

2020-09-15 Thread Vladimir Sementsov-Ogievskiy

15.09.2020 19:44, Vladimir Sementsov-Ogievskiy wrote:

We have a very frequent pattern of creating a coroutine from a function
with several arguments:

   - create structure to pack parameters
   - create _entry function to call original function taking parameters
 from struct
   - do different magic to handle completion: set ret to NOT_DONE or
 EINPROGRESS or use separate bool field
   - fill the struct and create coroutine from _entry function and this
 struct as a parameter
   - do coroutine enter and BDRV_POLL_WHILE loop

Let's reduce code duplication by generating coroutine wrappers.
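For reference, a sketch of the hand-written boilerplate that the script is meant to replace, following the steps listed above. It is a toy: bdrv_co_frob() and bdrv_frob() are hypothetical, the "coroutine" is simply run to completion by toy_coroutine_run(), and the comments mark where QEMU would instead use bdrv_coroutine_enter() and BDRV_POLL_WHILE(). Real wrappers also call the coroutine function directly when qemu_in_coroutine() is true.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical coroutine function that the wrapper should expose. */
static int bdrv_co_frob(int64_t offset, int64_t bytes)
{
    return offset + bytes > 100 ? -EINVAL : 0;
}

/* 1. Structure packing the parameters, the result and a "done" flag. */
typedef struct FrobCo {
    int64_t offset;
    int64_t bytes;
    int ret;
    bool done;
} FrobCo;

/* 2. Entry function: unpack the struct and call the real function. */
static void bdrv_frob_entry(void *opaque)
{
    FrobCo *s = opaque;

    s->ret = bdrv_co_frob(s->offset, s->bytes);
    s->done = true;         /* QEMU then calls aio_wait_kick() */
}

/* Toy stand-in for creating and entering a coroutine: runs it to the end. */
static void toy_coroutine_run(void (*entry)(void *), void *opaque)
{
    entry(opaque);
}

/* 3. Wrapper for callers outside coroutine context. */
static int bdrv_frob(int64_t offset, int64_t bytes)
{
    FrobCo s = { .offset = offset, .bytes = bytes, .done = false };

    toy_coroutine_run(bdrv_frob_entry, &s);
    /* QEMU would poll here with BDRV_POLL_WHILE(bs, !s.done). */
    assert(s.done);
    return s.ret;
}
```

Every wrapper repeats steps 1 to 3 with only the parameter list changing, which is what makes it a good candidate for generation.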

This patch adds scripts/block-coroutine-wrapper.py together with some
friends, which will generate functions with declared prototypes marked
by the 'generated_co_wrapper' specifier.

The usage of new code generation is as follows:

 1. define somewhere

 int coroutine_fn bdrv_co_NAME(...) {...}

function

 2. declare in some header file

 int generated_co_wrapper bdrv_NAME(...);

function with the same list of parameters. (you'll need to include
"block/generated-co-wrapper.h" to get the specifier)

 3. both declarations should be available through block/coroutines.h
header.

 4. add header with generated_co_wrapper declaration into
COROUTINE_HEADERS list in Makefile

Still, no function is marked yet; that work is for the following
commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
  docs/devel/block-coroutine-wrapper.rst |  54 +++
  block/block-gen.h  |  49 +++
  include/block/block.h  |  10 ++
  block/meson.build  |   8 ++
  scripts/block-coroutine-wrapper.py | 187 +
  5 files changed, 308 insertions(+)
  create mode 100644 docs/devel/block-coroutine-wrapper.rst
  create mode 100644 block/block-gen.h
  create mode 100755 scripts/block-coroutine-wrapper.py



Also needed:

diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 04773ce076..cb0abe1e69 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -31,3 +31,4 @@ Contents:
reset
s390-dasd-ipl
clocks
+   block-coroutine-wrapper

--
Best regards,
Vladimir



[PATCH v8 1/7] block: return error-code from bdrv_invalidate_cache

2020-09-15 Thread Vladimir Sementsov-Ogievskiy
This is the only coroutine wrapper from block.c and block/io.c which
doesn't return a value, so let's convert it to the common behavior, to
simplify moving to generated coroutine wrappers in a further commit.

Also, bdrv_invalidate_cache is a void function, returning an error only
through the **errp parameter, which is considered bad practice, as
it forces callers to define and propagate a local_err variable, so the
conversion is good anyway.

This patch leaves the conversion of .bdrv_co_invalidate_cache() driver
callbacks and bdrv_invalidate_cache_all() for another day.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 include/block/block.h |  2 +-
 block.c   | 32 ++--
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 6e36154061..8aef849a75 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -460,7 +460,7 @@ void bdrv_aio_cancel_async(BlockAIOCB *acb);
 int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
 
 /* Invalidate any cached metadata used by image formats */
-void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
+int bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
 void bdrv_invalidate_cache_all(Error **errp);
 int bdrv_inactivate_all(void);
 
diff --git a/block.c b/block.c
index 2ba76b2c36..ccfe1d851b 100644
--- a/block.c
+++ b/block.c
@@ -5649,8 +5649,8 @@ void bdrv_init_with_whitelist(void)
 bdrv_init();
 }
 
-static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
-  Error **errp)
+static int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
+ Error **errp)
 {
 BdrvChild *child, *parent;
 uint64_t perm, shared_perm;
@@ -5659,14 +5659,14 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 BdrvDirtyBitmap *bm;
 
 if (!bs->drv)  {
-return;
+return -ENOMEDIUM;
 }
 
 QLIST_FOREACH(child, &bs->children, next) {
 bdrv_co_invalidate_cache(child->bs, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
-return;
+return -EINVAL;
 }
 }
 
@@ -5689,7 +5689,7 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 ret = bdrv_check_perm(bs, NULL, perm, shared_perm, NULL, NULL, errp);
 if (ret < 0) {
 bs->open_flags |= BDRV_O_INACTIVE;
-return;
+return ret;
 }
 bdrv_set_perm(bs, perm, shared_perm);
 
@@ -5698,7 +5698,7 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 if (local_err) {
 bs->open_flags |= BDRV_O_INACTIVE;
 error_propagate(errp, local_err);
-return;
+return -EINVAL;
 }
 }
 
@@ -5710,7 +5710,7 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 if (ret < 0) {
 bs->open_flags |= BDRV_O_INACTIVE;
 error_setg_errno(errp, -ret, "Could not refresh total sector count");
-return;
+return ret;
 }
 }
 
@@ -5720,27 +5720,30 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 if (local_err) {
 bs->open_flags |= BDRV_O_INACTIVE;
 error_propagate(errp, local_err);
-return;
+return -EINVAL;
 }
 }
 }
+
+return 0;
 }
 
 typedef struct InvalidateCacheCo {
 BlockDriverState *bs;
 Error **errp;
 bool done;
+int ret;
 } InvalidateCacheCo;
 
 static void coroutine_fn bdrv_invalidate_cache_co_entry(void *opaque)
 {
 InvalidateCacheCo *ico = opaque;
-bdrv_co_invalidate_cache(ico->bs, ico->errp);
+ico->ret = bdrv_co_invalidate_cache(ico->bs, ico->errp);
 ico->done = true;
 aio_wait_kick();
 }
 
-void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
+int bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
 Coroutine *co;
 InvalidateCacheCo ico = {
@@ -5757,22 +5760,23 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
 bdrv_coroutine_enter(bs, co);
 BDRV_POLL_WHILE(bs, !ico.done);
 }
+
+return ico.ret;
 }
 
 void bdrv_invalidate_cache_all(Error **errp)
 {
 BlockDriverState *bs;
-Error *local_err = NULL;
 BdrvNextIterator it;
 
 for (bs = bdrv_first(); bs; bs = bdrv_next()) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
+int ret;
 
 aio_context_acquire(aio_context);
-bdrv_invalidate_cache(bs, &local_err);
+ret = bdrv_invalidate_cache(bs, errp);
 aio_context_release(aio_context);
-if (local_err) {
-error_propagate(errp, local_err);

[PATCH v8 7/7] block/io: refactor save/load vmstate

2020-09-15 Thread Vladimir Sementsov-Ogievskiy
As with read/write in a previous commit, drop the extra indirection
layer and generate bdrv_readv_vmstate() and bdrv_writev_vmstate() directly.
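The shape of the split can be sketched with hypothetical `Driver` and `Node` types standing in for `BlockDriver` and `BlockDriverState`. `example_load()` and `example_save()` are illustrative callbacks. In this sketch, each direction gets its own function. It returns `-ENOMEDIUM` before touching the in-flight counter when there is no driver, calls the driver callback if one exists, and otherwise recurses into the `file` child, with `-ENOTSUP` as the default. Unlike the patch, the sketch's write path tests the callback it is about to call. `ENOMEDIUM` is assumed to come from Linux's `<errno.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver with optional vmstate callbacks. */
typedef struct Driver {
    int (*load_vmstate)(long pos);
    int (*save_vmstate)(long pos);
} Driver;

/* Hypothetical node: optional driver, optional "file" child, counter. */
typedef struct Node {
    const Driver *drv;
    struct Node *file;
    int in_flight;
} Node;

/* Example callbacks for a driver that handles vmstate itself. */
static int example_load(long pos) { return (int)pos; }
static int example_save(long pos) { (void)pos; return 0; }

static int node_readv_vmstate(Node *n, long pos)
{
    int ret = -ENOTSUP;

    if (!n->drv) {
        return -ENOMEDIUM;          /* bail out before touching in_flight */
    }

    n->in_flight++;
    if (n->drv->load_vmstate) {
        ret = n->drv->load_vmstate(pos);
    } else if (n->file) {
        ret = node_readv_vmstate(n->file, pos);   /* delegate to child */
    }
    assert(n->in_flight > 0);
    n->in_flight--;

    return ret;
}

static int node_writev_vmstate(Node *n, long pos)
{
    int ret = -ENOTSUP;

    if (!n->drv) {
        return -ENOMEDIUM;
    }

    n->in_flight++;
    if (n->drv->save_vmstate) {
        ret = n->drv->save_vmstate(pos);
    } else if (n->file) {
        ret = node_writev_vmstate(n->file, pos);
    }
    assert(n->in_flight > 0);
    n->in_flight--;

    return ret;
}
```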

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 block/coroutines.h| 10 +++
 include/block/block.h |  6 ++--
 block/io.c| 67 ++-
 3 files changed, 42 insertions(+), 41 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index 6c63a819c9..f69179f5ef 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -57,11 +57,9 @@ bdrv_common_block_status_above(BlockDriverState *bs,
int64_t *map,
BlockDriverState **file);
 
-int coroutine_fn
-bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
-   bool is_read);
-int generated_co_wrapper
-bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
-bool is_read);
+int coroutine_fn bdrv_co_readv_vmstate(BlockDriverState *bs,
+   QEMUIOVector *qiov, int64_t pos);
+int coroutine_fn bdrv_co_writev_vmstate(BlockDriverState *bs,
+QEMUIOVector *qiov, int64_t pos);
 
 #endif /* BLOCK_COROUTINES_INT_H */
diff --git a/include/block/block.h b/include/block/block.h
index b8b4c177de..6cd789724b 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -572,8 +572,10 @@ int path_has_protocol(const char *path);
 int path_is_absolute(const char *path);
 char *path_combine(const char *base_path, const char *filename);
 
-int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
-int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
+int generated_co_wrapper
+bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
+int generated_co_wrapper
+bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
   int64_t pos, int size);
 
diff --git a/block/io.c b/block/io.c
index 68d7d9cf80..84f82bc069 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2491,66 +2491,67 @@ int bdrv_is_allocated_above(BlockDriverState *top,
 }
 
 int coroutine_fn
-bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
-   bool is_read)
+bdrv_co_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
 {
 BlockDriver *drv = bs->drv;
 int ret = -ENOTSUP;
 
+if (!drv) {
+return -ENOMEDIUM;
+}
+
 bdrv_inc_in_flight(bs);
 
-if (!drv) {
-ret = -ENOMEDIUM;
-} else if (drv->bdrv_load_vmstate) {
-if (is_read) {
-ret = drv->bdrv_load_vmstate(bs, qiov, pos);
-} else {
-ret = drv->bdrv_save_vmstate(bs, qiov, pos);
-}
+if (drv->bdrv_load_vmstate) {
+ret = drv->bdrv_load_vmstate(bs, qiov, pos);
 } else if (bs->file) {
-ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
+ret = bdrv_co_readv_vmstate(bs->file->bs, qiov, pos);
 }
 
 bdrv_dec_in_flight(bs);
+
 return ret;
 }
 
-int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
-  int64_t pos, int size)
+int coroutine_fn
+bdrv_co_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
 {
-QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, size);
-int ret;
+BlockDriver *drv = bs->drv;
+int ret = -ENOTSUP;
 
-ret = bdrv_writev_vmstate(bs, &qiov, pos);
-if (ret < 0) {
-return ret;
+if (!drv) {
+return -ENOMEDIUM;
 }
 
-return size;
-}
+bdrv_inc_in_flight(bs);
 
-int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
-{
-return bdrv_rw_vmstate(bs, qiov, pos, false);
+if (drv->bdrv_load_vmstate) {
+ret = drv->bdrv_save_vmstate(bs, qiov, pos);
+} else if (bs->file) {
+ret = bdrv_co_writev_vmstate(bs->file->bs, qiov, pos);
+}
+
+bdrv_dec_in_flight(bs);
+
+return ret;
 }
 
-int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
+int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
   int64_t pos, int size)
 {
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, size);
-int ret;
-
-ret = bdrv_readv_vmstate(bs, &qiov, pos);
-if (ret < 0) {
-return ret;
-}
+int ret = bdrv_writev_vmstate(bs, &qiov, pos);
 
-return size;
+return ret < 0 ? ret : size;
 }
 
-int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
+int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
+  int64_t pos, int size)
 {
-return bdrv_rw_vmstate(bs, qiov, pos, true);
+QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, size);
+int ret = bdrv_

[PATCH v8 5/7] block: generate coroutine-wrapper code

2020-09-15 Thread Vladimir Sementsov-Ogievskiy
Use the code generation implemented in the previous commit to generate
coroutine wrappers in block.c and block/io.c.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 block/coroutines.h|   6 +-
 include/block/block.h |  16 ++--
 block.c   |  73 ---
 block/io.c| 212 --
 4 files changed, 13 insertions(+), 294 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index 9ce1730a09..c62b3a2697 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -34,7 +34,7 @@ int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp);
 int coroutine_fn
 bdrv_co_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
  bool is_write, BdrvRequestFlags flags);
-int
+int generated_co_wrapper
 bdrv_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
   bool is_write, BdrvRequestFlags flags);
 
@@ -47,7 +47,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
   int64_t *pnum,
   int64_t *map,
   BlockDriverState **file);
-int
+int generated_co_wrapper
 bdrv_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
bool want_zero,
@@ -60,7 +60,7 @@ bdrv_common_block_status_above(BlockDriverState *bs,
 int coroutine_fn
 bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
bool is_read);
-int
+int generated_co_wrapper
 bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
 bool is_read);
 
diff --git a/include/block/block.h b/include/block/block.h
index a0655b84d6..d8fb02fa2a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -403,8 +403,9 @@ void bdrv_refresh_filename(BlockDriverState *bs);
 int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
   PreallocMode prealloc, BdrvRequestFlags flags,
   Error **errp);
-int bdrv_truncate(BdrvChild *child, int64_t offset, bool exact,
-  PreallocMode prealloc, BdrvRequestFlags flags, Error **errp);
+int generated_co_wrapper
+bdrv_truncate(BdrvChild *child, int64_t offset, bool exact,
+  PreallocMode prealloc, BdrvRequestFlags flags, Error **errp);
 
 int64_t bdrv_nb_sectors(BlockDriverState *bs);
 int64_t bdrv_getlength(BlockDriverState *bs);
@@ -446,7 +447,8 @@ typedef enum {
 BDRV_FIX_ERRORS   = 2,
 } BdrvCheckMode;
 
-int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix);
+int generated_co_wrapper bdrv_check(BlockDriverState *bs, BdrvCheckResult *res,
+BdrvCheckMode fix);
 
 /* The units of offset and total_work_size may be chosen arbitrarily by the
  * block driver; total_work_size may change during the course of the amendment
@@ -470,12 +472,13 @@ void bdrv_aio_cancel_async(BlockAIOCB *acb);
 int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
 
 /* Invalidate any cached metadata used by image formats */
-int bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
+int generated_co_wrapper bdrv_invalidate_cache(BlockDriverState *bs,
+   Error **errp);
 void bdrv_invalidate_cache_all(Error **errp);
 int bdrv_inactivate_all(void);
 
 /* Ensure contents are flushed to disk.  */
-int bdrv_flush(BlockDriverState *bs);
+int generated_co_wrapper bdrv_flush(BlockDriverState *bs);
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
 int bdrv_flush_all(void);
 void bdrv_close_all(void);
@@ -490,7 +493,8 @@ void bdrv_drain_all(void);
 AIO_WAIT_WHILE(bdrv_get_aio_context(bs_),  \
cond); })
 
-int bdrv_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
+int generated_co_wrapper bdrv_pdiscard(BdrvChild *child, int64_t offset,
+   int64_t bytes);
 int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
diff --git a/block.c b/block.c
index ec5a8cbd7b..d49d591917 100644
--- a/block.c
+++ b/block.c
@@ -4655,43 +4655,6 @@ int coroutine_fn bdrv_co_check(BlockDriverState *bs,
 return bs->drv->bdrv_co_check(bs, res, fix);
 }
 
-typedef struct CheckCo {
-BlockDriverState *bs;
-BdrvCheckResult *res;
-BdrvCheckMode fix;
-int ret;
-} CheckCo;
-
-static void coroutine_fn bdrv_check_co_entry(void *opaque)
-{
-CheckCo *cco = opaque;
-cco->ret = bdrv_co_check(cco->bs, cco->res, cco->fix);
-aio_wait_kick();
-}
-
-int bdrv_check(BlockDriverState *bs,
-   BdrvCheckResult *res, BdrvCheckMode fix)
-{
-Coroutine *co;
-CheckCo cco = {
-.bs = bs,
-.res = res,
-.ret = -EINPROGR

[PATCH v8 4/7] scripts: add block-coroutine-wrapper.py

2020-09-15 Thread Vladimir Sementsov-Ogievskiy
We have a very frequent pattern of creating a coroutine from a function
with several arguments:

  - create a structure to pack the parameters
  - create an _entry function that calls the original function, taking
    the parameters from the struct
  - use various tricks to detect completion: set ret to NOT_DONE or
    EINPROGRESS, or use a separate bool field
  - fill the struct and create a coroutine from the _entry function, with
    the struct as its parameter
  - enter the coroutine and run a BDRV_POLL_WHILE loop

Let's reduce code duplication by generating coroutine wrappers.
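For illustration, here is the boilerplate from the list above written out by hand for a hypothetical `co_add()` function. `AddCo`, `add_co_entry()`, `fake_coroutine_enter()` and `add()` are illustrative names, not QEMU code. Coroutines and `BDRV_POLL_WHILE()` are simulated: the "coroutine" runs to completion immediately, and the wait loop only spins on the `done` flag. The sketch shows the shape of the code and nothing more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical coroutine body; in QEMU this would be a coroutine_fn. */
static int co_add(int a, int b)
{
    return a + b;
}

/* A structure packing the parameters, plus result and completion flag. */
typedef struct AddCo {
    int a;
    int b;
    int ret;
    bool done;
} AddCo;

/* The _entry function with the single void * argument a coroutine takes. */
static void add_co_entry(void *opaque)
{
    AddCo *s = opaque;

    s->ret = co_add(s->a, s->b);
    s->done = true;                 /* completion signalled by a bool */
}

/* Simulated coroutine start: runs the entry point to completion at once. */
static void fake_coroutine_enter(void (*entry)(void *), void *opaque)
{
    entry(opaque);
}

/* The non-coroutine wrapper that callers actually use. */
static int add(int a, int b)
{
    AddCo s = { .a = a, .b = b, .done = false };

    fake_coroutine_enter(add_co_entry, &s);
    while (!s.done) {
        /* stands in for BDRV_POLL_WHILE(bs, !s.done) */
    }
    assert(s.done);
    return s.ret;
}
```

Only `co_add()` carries real logic. Everything else is mechanical and repeated for every wrapper, which makes it a good candidate for generation.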

This patch adds scripts/block-coroutine-wrapper.py, together with some
supporting files, which generates function bodies for declared
prototypes marked with the 'generated_co_wrapper' specifier.

The new code generation is used as follows:

1. Define the coroutine function somewhere:

int coroutine_fn bdrv_co_NAME(...) {...}

2. Declare, in some header file, a wrapper with the same parameter list:

int generated_co_wrapper bdrv_NAME(...);

   (you'll need to include "block/generated-co-wrapper.h" to get the
   specifier)

3. Both declarations should be available through the block/coroutines.h
   header.

4. Add the header containing the generated_co_wrapper declaration to the
   COROUTINE_HEADERS list in the Makefile.

No function is marked yet; that work is left for the following commit.
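Here is a rough sketch of steps 1 and 2. In this series `generated_co_wrapper` is a specifier that expands to nothing, so to the compiler a marked declaration is an ordinary prototype. The script only uses the marker to find which declarations it should implement. Below, `coroutine_fn` is also defined as empty so that the snippet compiles on its own. `bdrv_co_double()` and `bdrv_double()` are hypothetical names. The body of `bdrv_double()` is a hand-written stand-in for what the script would emit, and it calls the coroutine function directly instead of creating a coroutine.

```c
#include <assert.h>

/* Both markers are empty macros: they only annotate declarations. */
#define coroutine_fn
#define generated_co_wrapper

/* Step 1: the coroutine function (hypothetical example). */
int coroutine_fn bdrv_co_double(int x);

/* Step 2: the marked wrapper declaration, with the same parameter list. */
int generated_co_wrapper bdrv_double(int x);

int coroutine_fn bdrv_co_double(int x)
{
    return 2 * x;
}

/*
 * Stand-in for the generated definition. The real generated code would
 * pack x into a struct, start a coroutine and poll until it finishes;
 * here the coroutine function is simply called directly.
 */
int bdrv_double(int x)
{
    return bdrv_co_double(x);
}
```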

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 docs/devel/block-coroutine-wrapper.rst |  54 +++
 block/block-gen.h  |  49 +++
 include/block/block.h  |  10 ++
 block/meson.build  |   8 ++
 scripts/block-coroutine-wrapper.py | 187 +
 5 files changed, 308 insertions(+)
 create mode 100644 docs/devel/block-coroutine-wrapper.rst
 create mode 100644 block/block-gen.h
 create mode 100755 scripts/block-coroutine-wrapper.py

diff --git a/docs/devel/block-coroutine-wrapper.rst b/docs/devel/block-coroutine-wrapper.rst
new file mode 100644
index 00..f7050bbc8f
--- /dev/null
+++ b/docs/devel/block-coroutine-wrapper.rst
@@ -0,0 +1,54 @@
+=======================
+block-coroutine-wrapper
+=======================
+
+A lot of functions in the QEMU block layer (see ``block/*``) can only be
+called in coroutine context. Such functions are normally marked by the
+coroutine_fn specifier. Still, sometimes we need to call them from
+non-coroutine context; for this we need to start a coroutine, run the
+needed function from it and wait for the coroutine to finish in a
+BDRV_POLL_WHILE() loop. To run a coroutine we need a function with one
+void* argument. So for each coroutine_fn function which needs a
+non-coroutine interface, we should define a structure to pack the
+parameters, define a separate function to unpack the parameters and
+call the original function, and finally define a new interface function
+with the same list of arguments as the original one, which will pack the
+parameters into a struct, create a coroutine, run it and wait in a
+BDRV_POLL_WHILE() loop. It's boring to create such wrappers by hand, so
+we have a script to generate them.
+
+Usage
+=====
+
+Assume we have defined a ``coroutine_fn`` function
+``bdrv_co_foo()`` and need a non-coroutine interface for it,
+called ``bdrv_foo()``. In this case the script can help. To
+trigger the generation:
+
+1. You need a ``bdrv_foo`` declaration somewhere (for example, in
+   ``block/coroutines.h``) with the ``generated_co_wrapper`` mark,
+   like this:
+
+.. code-block:: c
+
+    int generated_co_wrapper bdrv_foo();
+
+2. You need to feed this declaration to the block-coroutine-wrapper script.
+   To do so, add the .h (or .c) file with the declaration to the
+   ``input: files(...)`` list of the ``block_gen_c`` target declaration in
+   ``block/meson.build``.
+
+You are done. On build, coroutine wrappers will be generated in
+``/block/block-gen.c``.
+
+Links
+=====
+
+1. The script location is ``scripts/block-coroutine-wrapper.py``.
+
+2. The generic place for private ``generated_co_wrapper`` declarations is
+   ``block/coroutines.h``; for public declarations it is
+   ``include/block/block.h``.
+
+3. The core API of generated coroutine wrappers is placed in
+   (not generated) ``block/block-gen.h``
diff --git a/block/block-gen.h b/block/block-gen.h
new file mode 100644
index 00..f80cf4897d
--- /dev/null
+++ b/block/block-gen.h
@@ -0,0 +1,49 @@
+/*
+ * Block coroutine wrapping core, used by auto-generated block/block-gen.c
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ * Copyright (c) 2020 Virtuozzo International GmbH
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ *

[PATCH v8 6/7] block: drop bdrv_prwv

2020-09-15 Thread Vladimir Sementsov-Ogievskiy
Now that we are not maintaining boilerplate code for coroutine
wrappers, there is no point in keeping the extra indirection layer
of bdrv_prwv().  Let's drop it and instead generate pure bdrv_preadv()
and bdrv_pwritev().

Currently, bdrv_pwritev() and bdrv_preadv() return the number of bytes
on success, whereas the auto-generated functions will return zero, like
their _co_ prototypes. Still, it's simple to make the conversion safe:
the only external user of bdrv_pwritev() is test-bdrv-drain, and it
works fine with bdrv_co_pwritev() instead, so the prototypes are moved
to the local block/coroutines.h. The only internal users are
bdrv_pread() and bdrv_pwrite(), which are modified to return bytes on
success.

Of course, it would be great to convert bdrv_pread() and bdrv_pwrite()
to return 0 on success. But this requires an audit (and probably
conversion) of all their users, so let's leave that refactoring for
another day.
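Here is a sketch of the return-value adjustment, modelled on the `ret < 0 ? ret : bytes` pattern in the patch. `preadv_zero_on_success()` is a hypothetical stand-in for a generated wrapper that returns 0 or a negative errno. `pread_bytes()` is a hypothetical stand-in for the legacy interface, which must keep returning the byte count. Both read from an in-memory buffer instead of a block device.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for a generated wrapper: 0 on success,
 * negative errno on failure, like the _co_ function it wraps.
 */
static int preadv_zero_on_success(const char *src, int src_len,
                                  long offset, char *buf, int bytes)
{
    if (offset < 0 || bytes < 0 || offset + bytes > src_len) {
        return -EIO;
    }
    memcpy(buf, src + offset, (size_t)bytes);
    return 0;
}

/* Legacy-style interface: still returns the number of bytes on success. */
static int pread_bytes(const char *src, int src_len,
                       long offset, char *buf, int bytes)
{
    int ret;

    if (bytes < 0) {
        return -EINVAL;
    }
    ret = preadv_zero_on_success(src, src_len, offset, buf, bytes);
    return ret < 0 ? ret : bytes;
}
```

Keeping the conversion in the one legacy function means that its existing callers, which may test for `ret == bytes`, do not have to change.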

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 block/coroutines.h  | 10 -
 include/block/block.h   |  2 --
 block/io.c  | 49 -
 tests/test-bdrv-drain.c |  2 +-
 4 files changed, 15 insertions(+), 48 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index c62b3a2697..6c63a819c9 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -31,12 +31,12 @@ int coroutine_fn bdrv_co_check(BlockDriverState *bs,
BdrvCheckResult *res, BdrvCheckMode fix);
 int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp);
 
-int coroutine_fn
-bdrv_co_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
- bool is_write, BdrvRequestFlags flags);
 int generated_co_wrapper
-bdrv_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
-  bool is_write, BdrvRequestFlags flags);
+bdrv_preadv(BdrvChild *child, int64_t offset, unsigned int bytes,
+QEMUIOVector *qiov, BdrvRequestFlags flags);
+int generated_co_wrapper
+bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags);
 
 int coroutine_fn
 bdrv_co_common_block_status_above(BlockDriverState *bs,
diff --git a/include/block/block.h b/include/block/block.h
index d8fb02fa2a..b8b4c177de 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -383,9 +383,7 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
int bytes, BdrvRequestFlags flags);
 int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags);
 int bdrv_pread(BdrvChild *child, int64_t offset, void *buf, int bytes);
-int bdrv_preadv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov);
 int bdrv_pwrite(BdrvChild *child, int64_t offset, const void *buf, int bytes);
-int bdrv_pwritev(BdrvChild *child, int64_t offset, QEMUIOVector *qiov);
 int bdrv_pwrite_sync(BdrvChild *child, int64_t offset,
  const void *buf, int count);
 /*
diff --git a/block/io.c b/block/io.c
index 5270d68d72..68d7d9cf80 100644
--- a/block/io.c
+++ b/block/io.c
@@ -890,23 +890,11 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
 return 0;
 }
 
-int coroutine_fn bdrv_co_prwv(BdrvChild *child, int64_t offset,
-  QEMUIOVector *qiov, bool is_write,
-  BdrvRequestFlags flags)
-{
-if (is_write) {
-return bdrv_co_pwritev(child, offset, qiov->size, qiov, flags);
-} else {
-return bdrv_co_preadv(child, offset, qiov->size, qiov, flags);
-}
-}
-
 int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
int bytes, BdrvRequestFlags flags)
 {
-QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, bytes);
-
-return bdrv_prwv(child, offset, &qiov, true, BDRV_REQ_ZERO_WRITE | flags);
+return bdrv_pwritev(child, offset, bytes, NULL,
+BDRV_REQ_ZERO_WRITE | flags);
 }
 
 /*
@@ -950,41 +938,19 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 }
 }
 
-/* return < 0 if error. See bdrv_pwrite() for the return codes */
-int bdrv_preadv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
-{
-int ret;
-
-ret = bdrv_prwv(child, offset, qiov, false, 0);
-if (ret < 0) {
-return ret;
-}
-
-return qiov->size;
-}
-
 /* See bdrv_pwrite() for the return codes */
 int bdrv_pread(BdrvChild *child, int64_t offset, void *buf, int bytes)
 {
+int ret;
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
 
 if (bytes < 0) {
 return -EINVAL;
 }
 
-return bdrv_preadv(child, offset, &qiov);
-}
-
-int bdrv_pwritev(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
-{
-int ret;
+ret = bdrv_preadv(child, offset, bytes, &qiov, 0);
 
-ret = bdrv_prwv(child, offset, qiov, true, 0);
-if (ret < 0) {
-return ret;
-}
-
-return qiov->size;
+return ret < 0 ? ret : bytes;
 

[PATCH v8 3/7] block: declare some coroutine functions in block/coroutines.h

2020-09-15 Thread Vladimir Sementsov-Ogievskiy
We are going to keep the coroutine-wrapper code (parameter-packing
structures, BDRV_POLL wrapper functions) in separate auto-generated
files. So we'll need a header declaring the original _co_ functions,
including those which are currently static. We'll also need
declarations for the wrapper functions. Add these declarations now, as
a preparation step.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 block/coroutines.h | 67 ++
 block.c|  8 +++---
 block/io.c | 34 +++
 3 files changed, 88 insertions(+), 21 deletions(-)
 create mode 100644 block/coroutines.h

diff --git a/block/coroutines.h b/block/coroutines.h
new file mode 100644
index 00..9ce1730a09
--- /dev/null
+++ b/block/coroutines.h
@@ -0,0 +1,67 @@
+/*
+ * Block layer I/O functions
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef BLOCK_COROUTINES_INT_H
+#define BLOCK_COROUTINES_INT_H
+
+#include "block/block_int.h"
+
+int coroutine_fn bdrv_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res, BdrvCheckMode fix);
+int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp);
+
+int coroutine_fn
+bdrv_co_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
+ bool is_write, BdrvRequestFlags flags);
+int
+bdrv_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
+  bool is_write, BdrvRequestFlags flags);
+
+int coroutine_fn
+bdrv_co_common_block_status_above(BlockDriverState *bs,
+  BlockDriverState *base,
+  bool want_zero,
+  int64_t offset,
+  int64_t bytes,
+  int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file);
+int
+bdrv_common_block_status_above(BlockDriverState *bs,
+   BlockDriverState *base,
+   bool want_zero,
+   int64_t offset,
+   int64_t bytes,
+   int64_t *pnum,
+   int64_t *map,
+   BlockDriverState **file);
+
+int coroutine_fn
+bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
+   bool is_read);
+int
+bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
+bool is_read);
+
+#endif /* BLOCK_COROUTINES_INT_H */
diff --git a/block.c b/block.c
index ccfe1d851b..ec5a8cbd7b 100644
--- a/block.c
+++ b/block.c
@@ -48,6 +48,7 @@
 #include "qemu/timer.h"
 #include "qemu/cutils.h"
 #include "qemu/id.h"
+#include "block/coroutines.h"
 
 #ifdef CONFIG_BSD
 #include 
@@ -4640,8 +4641,8 @@ static void bdrv_delete(BlockDriverState *bs)
  * free of errors) or -errno when an internal error occurred. The results of the
  * check are stored in res.
  */
-static int coroutine_fn bdrv_co_check(BlockDriverState *bs,
-  BdrvCheckResult *res, BdrvCheckMode fix)
+int coroutine_fn bdrv_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res, BdrvCheckMode fix)
 {
 if (bs->drv == NULL) {
 return -ENOMEDIUM;
@@ -5649,8 +5650,7 @@ void bdrv_init_with_whitelist(void)
 bdrv_init();
 }
 
-static int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
- Error **errp)
+int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
 BdrvChild *child, *parent;
 uint64_t perm, shared_perm;
diff --git a/block/io.c b/block/io.c
index 2e2c89ce31..676c932caf 100644
--- a/block/io.c

[PATCH v8 2/7] block/io: refactor coroutine wrappers

2020-09-15 Thread Vladimir Sementsov-Ogievskiy
Most of our coroutine wrappers already follow this convention:

We have 'coroutine_fn bdrv_co_()' as
the core function, and a wrapper 'bdrv_()' which packs the parameters and calls bdrv_run_co().

The only outsiders are the bdrv_prwv_co and
bdrv_common_block_status_above wrappers. Let's refactor them to behave
like the others; this simplifies further conversion of the coroutine
wrappers.

This patch adds an indirection layer, but it will be compensated for by
a further commit, which will drop bdrv_co_prwv together with the
is_write logic, keeping the read and write paths separate.
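The convention described above can be sketched as a core function plus a thin wrapper that passes an int-returning entry point and packed parameters to one shared runner. `run_co()` below is a simplified, hypothetical stand-in for `bdrv_run_co()`, not its actual code. Coroutine context is simulated with a global flag. As a design choice of this sketch, the runner calls the entry directly when it is already "in a coroutine". Otherwise it runs the entry through a simulated coroutine and waits on a `done` flag. `SubCo`, `co_sub()`, `sub_entry()` and `sub()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated "are we running inside a coroutine?" state. */
static bool fake_in_coroutine;

typedef int RunEntry(void *opaque);

typedef struct RunCo {
    RunEntry *entry;
    void *opaque;
    int ret;
    bool done;
} RunCo;

static void run_co_entry(void *opaque)
{
    RunCo *s = opaque;

    s->ret = s->entry(s->opaque);
    s->done = true;
}

/* Hypothetical generic runner shared by all wrappers. */
static int run_co(RunEntry *entry, void *opaque)
{
    if (fake_in_coroutine) {
        return entry(opaque);           /* fast path: call directly */
    } else {
        RunCo s = { .entry = entry, .opaque = opaque, .done = false };

        fake_in_coroutine = true;       /* simulated coroutine runs now */
        run_co_entry(&s);
        fake_in_coroutine = false;
        while (!s.done) {
            /* stands in for BDRV_POLL_WHILE() */
        }
        return s.ret;
    }
}

/* Core function, its parameter struct and its int-returning entry. */
typedef struct SubCo {
    int a;
    int b;
} SubCo;

static int co_sub(int a, int b)
{
    return a - b;
}

static int sub_entry(void *opaque)
{
    SubCo *p = opaque;

    return co_sub(p->a, p->b);
}

/* The wrapper only packs parameters and calls the common runner. */
static int sub(int a, int b)
{
    SubCo p = { .a = a, .b = b };

    return run_co(sub_entry, &p);
}
```

Because the completion handling lives in `run_co()`, each wrapper shrinks to a struct, an entry function and a one-line call. That uniform shape is what makes the later generation step straightforward.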

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 block/io.c | 60 +-
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/block/io.c b/block/io.c
index ad3a51ed53..2e2c89ce31 100644
--- a/block/io.c
+++ b/block/io.c
@@ -933,27 +933,31 @@ typedef struct RwCo {
 BdrvRequestFlags flags;
 } RwCo;
 
+static int coroutine_fn bdrv_co_prwv(BdrvChild *child, int64_t offset,
+ QEMUIOVector *qiov, bool is_write,
+ BdrvRequestFlags flags)
+{
+if (is_write) {
+return bdrv_co_pwritev(child, offset, qiov->size, qiov, flags);
+} else {
+return bdrv_co_preadv(child, offset, qiov->size, qiov, flags);
+}
+}
+
 static int coroutine_fn bdrv_rw_co_entry(void *opaque)
 {
 RwCo *rwco = opaque;
 
-if (!rwco->is_write) {
-return bdrv_co_preadv(rwco->child, rwco->offset,
-  rwco->qiov->size, rwco->qiov,
-  rwco->flags);
-} else {
-return bdrv_co_pwritev(rwco->child, rwco->offset,
-   rwco->qiov->size, rwco->qiov,
-   rwco->flags);
-}
+return bdrv_co_prwv(rwco->child, rwco->offset, rwco->qiov,
+rwco->is_write, rwco->flags);
 }
 
 /*
  * Process a vectored synchronous request using coroutines
  */
-static int bdrv_prwv_co(BdrvChild *child, int64_t offset,
-QEMUIOVector *qiov, bool is_write,
-BdrvRequestFlags flags)
+static int bdrv_prwv(BdrvChild *child, int64_t offset,
+ QEMUIOVector *qiov, bool is_write,
+ BdrvRequestFlags flags)
 {
 RwCo rwco = {
 .child = child,
@@ -971,8 +975,7 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
 {
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, bytes);
 
-return bdrv_prwv_co(child, offset, &qiov, true,
-BDRV_REQ_ZERO_WRITE | flags);
+return bdrv_prwv(child, offset, &qiov, true, BDRV_REQ_ZERO_WRITE | flags);
 }
 
 /*
@@ -1021,7 +1024,7 @@ int bdrv_preadv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
 {
 int ret;
 
-ret = bdrv_prwv_co(child, offset, qiov, false, 0);
+ret = bdrv_prwv(child, offset, qiov, false, 0);
 if (ret < 0) {
 return ret;
 }
@@ -1045,7 +1048,7 @@ int bdrv_pwritev(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
 {
 int ret;
 
-ret = bdrv_prwv_co(child, offset, qiov, true, 0);
+ret = bdrv_prwv(child, offset, qiov, true, 0);
 if (ret < 0) {
 return ret;
 }
@@ -2465,14 +2468,15 @@ early_out:
 return ret;
 }
 
-static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
-   BlockDriverState *base,
-   bool want_zero,
-   int64_t offset,
-   int64_t bytes,
-   int64_t *pnum,
-   int64_t *map,
-   BlockDriverState **file)
+static int coroutine_fn
+bdrv_co_common_block_status_above(BlockDriverState *bs,
+  BlockDriverState *base,
+  bool want_zero,
+  int64_t offset,
+  int64_t bytes,
+  int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file)
 {
 BlockDriverState *p;
 int ret = 0;
@@ -2510,10 +2514,10 @@ static int coroutine_fn bdrv_block_status_above_co_entry(void *opaque)
 {
 BdrvCoBlockStatusData *data = opaque;
 
-return bdrv_co_block_status_above(data->bs, data->base,
-  data->want_zero,
-  data->offset, data->bytes,
-  data->pnum, data->map, data->file);
+return bdrv_co_common_block_status_above(data->bs, data->base,
+ 

[PATCH v8 0/7] coroutines: generate wrapper code

2020-09-15 Thread Vladimir Sementsov-Ogievskiy
Hi all!

The aim of the series is to reduce code duplication and the hand-written
parameter structure packing around coroutine function wrappers.

Benefits:
 - no code duplication
 - less indirection

v8:
04: - rebase on meson build
- script interface is changed to satisfy meson custom_target
- rename script s/coroutine-wrapper.py/block-coroutine-wrapper.py/
- add docs/devel/block-coroutine-wrapper.rst

Vladimir Sementsov-Ogievskiy (7):
  block: return error-code from bdrv_invalidate_cache
  block/io: refactor coroutine wrappers
  block: declare some coroutine functions in block/coroutines.h
  scripts: add block-coroutine-wrapper.py
  block: generate coroutine-wrapper code
  block: drop bdrv_prwv
  block/io: refactor save/load vmstate

 docs/devel/block-coroutine-wrapper.rst |  54 
 block/block-gen.h  |  49 
 block/coroutines.h |  65 +
 include/block/block.h  |  34 ++-
 block.c|  97 ++-
 block/io.c | 336 -
 tests/test-bdrv-drain.c|   2 +-
 block/meson.build  |   8 +
 scripts/block-coroutine-wrapper.py | 187 ++
 9 files changed, 451 insertions(+), 381 deletions(-)
 create mode 100644 docs/devel/block-coroutine-wrapper.rst
 create mode 100644 block/block-gen.h
 create mode 100644 block/coroutines.h
 create mode 100755 scripts/block-coroutine-wrapper.py

-- 
2.21.3




Re: [PATCH v5 0/5] fix & merge block_status_above and is_allocated_above

2020-09-14 Thread Vladimir Sementsov-Ogievskiy

14.09.2020 16:06, Stefan Hajnoczi wrote:

On Wed, Jun 10, 2020 at 03:04:21PM +0300, Vladimir Sementsov-Ogievskiy wrote:

v5: rebase on coroutine-wrappers series, 02 changed correspondingly

Based on series "[PATCH v7 0/7] coroutines: generate wrapper code", or
in other words:
Based-on: <20200610100336.23451-1-vsement...@virtuozzo.com>


Hi Vladimir,
Please rebase this series and the coroutine wrapper series onto
qemu.git/master so the meson build system change is resolved.



Will do this week.

--
Best regards,
Vladimir


