Re: [PATCH 10/12] block.c: add subtree_drains where needed

2022-02-04 Thread Vladimir Sementsov-Ogievskiy

04.02.2022 16:30, Emanuele Giuseppe Esposito wrote:


  From the other thread:

So you mean that backing_hd is modified as its parent is changed, do
I understand correctly?


Yes



As we started to discuss in a thread started with my "[PATCH] block:
bdrv_set_backing_hd(): use drained section", I think draining the whole
subtree when we modify some parent-child relation is too much. In-flight
requests in subtree don't touch these relations, so why to wait/stop
them? Imagine two disks A and B with same backing file C. If we want to
detach A from C, what is the reason to drain in-fligth read request of B
in C?


If I am not mistaken, bdrv_set_backing_hd adds a new node (yes removes
the old backing_hd, but let's not think about this for a moment).
So we have disk B with backing file C, and new disk A that wants to have
backing file C.

I think I understand what you mean, so in theory the operation would be
- create new child
- add child to A->children list
- add child to C->parents list

So in theory we need to
* drain A (without subtree), because it can't happen that child nodes of
    A have in-flight requests that look at A status (children list),
right?
    In other words, if A has another node X, can a request in X inspect
    A->children


It should not happen. In my understanding, IO request never access
parents of the node.


* drain C, as parents can inspect C status (like B). Same assumption
    here, C->children[x]->bs cannot have requests inspecting C->parents
    list?


No, I think we should not drain C. IO requests don't inspect parents
list. And if some _other_ operations do inspect it, drain will not help,
as drain is only about IO requests.


I am still not convinced about this, because of the draining.

While drain can only be called by either main loop or the "home
iothread" (the one running the AioContext), it still means there are 2
threads involved. So while the iothread runs set_backing_hd, main loop
could easily drain one of the children of the nodes. Or the other way
around.
I am not saying that this happens, but it *might* happen.


I agree that it might happen. So, parallel drains are a problem. But drain is 
always a part of graph modification, and any graph modifications running in 
parallel are already a problem, we should forbid it somehow.

When you drain node, you say: please no in-flight requests now at this node. 
But IO requests doesn't do drain themselves, so why to restrict them?

And anyway, I don't see how this help. Ok, assume you drain additional node C 
or even the whole subtree. What protect us from parallel drain from another 
thread?



I am a little bit confused about this, it would be nice to hear opinions
from others as well.

Once we figure this, I will know where exactly to put the assertions,
and consequently what to drain and with which function.

Emanuele




--
Best regards,
Vladimir



Re: [PATCH 10/12] block.c: add subtree_drains where needed

2022-02-04 Thread Emanuele Giuseppe Esposito
>>
>>  From the other thread:
>>> So you mean that backing_hd is modified as its parent is changed, do
>>> I understand correctly?
>>
>> Yes
>>
>>>
>>> As we started to discuss in a thread started with my "[PATCH] block:
>>> bdrv_set_backing_hd(): use drained section", I think draining the whole
>>> subtree when we modify some parent-child relation is too much. In-flight
>>> requests in subtree don't touch these relations, so why to wait/stop
>>> them? Imagine two disks A and B with same backing file C. If we want to
>>> detach A from C, what is the reason to drain in-fligth read request of B
>>> in C?
>>
>> If I am not mistaken, bdrv_set_backing_hd adds a new node (yes removes
>> the old backing_hd, but let's not think about this for a moment).
>> So we have disk B with backing file C, and new disk A that wants to have
>> backing file C.
>>
>> I think I understand what you mean, so in theory the operation would be
>> - create new child
>> - add child to A->children list
>> - add child to C->parents list
>>
>> So in theory we need to
>> * drain A (without subtree), because it can't happen that child nodes of
>>    A have in-flight requests that look at A status (children list),
>> right?
>>    In other words, if A has another node X, can a request in X inspect
>>    A->children
> 
> It should not happen. In my understanding, IO request never access
> parents of the node.
> 
>> * drain C, as parents can inspect C status (like B). Same assumption
>>    here, C->children[x]->bs cannot have requests inspecting C->parents
>>    list?
> 
> No, I think we should not drain C. IO requests don't inspect parents
> list. And if some _other_ operations do inspect it, drain will not help,
> as drain is only about IO requests.

I am still not convinced about this, because of the draining.

While drain can only be called by either main loop or the "home
iothread" (the one running the AioContext), it still means there are 2
threads involved. So while the iothread runs set_backing_hd, main loop
could easily drain one of the children of the nodes. Or the other way
around.
I am not saying that this happens, but it *might* happen.

I am a little bit confused about this, it would be nice to hear opinions
from others as well.

Once we figure this, I will know where exactly to put the assertions,
and consequently what to drain and with which function.

Emanuele




Re: [PATCH 10/12] block.c: add subtree_drains where needed

2022-02-04 Thread Vladimir Sementsov-Ogievskiy

02.02.2022 18:37, Emanuele Giuseppe Esposito wrote:



On 01/02/2022 15:47, Vladimir Sementsov-Ogievskiy wrote:

18.01.2022 19:27, Emanuele Giuseppe Esposito wrote:

Protect bdrv_replace_child_noperm, as it modifies the
graph by adding/removing elements to .children and .parents
list of a bs. Use the newly introduced
bdrv_subtree_drained_{begin/end}_unlocked drains to achieve
that and be free from the aiocontext lock.

One important criteria to keep in mind is that if the caller of
bdrv_replace_child_noperm creates a transaction, we need to make sure
that the
whole transaction is under the same drain block. This is imperative,
as having
multiple drains also in the .abort() class of functions causes
discrepancies
in the drained counters (as nodes are put back into the original
positions),
making it really hard to retourn all to zero and leaving the code very
buggy.
See https://patchew.org/QEMU/20211213104014.69858-1-eespo...@redhat.com/
for more explanations.

Unfortunately we still need to have bdrv_subtree_drained_begin/end
in bdrv_detach_child() releasing and then holding the AioContext
lock, since it later invokes bdrv_try_set_aio_context() that is
not safe yet. Once all is cleaned up, we can also remove the
acquire/release locks in job_unref, artificially added because of this.

Signed-off-by: Emanuele Giuseppe Esposito 
---
   block.c | 50 --
   1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index fcc44a49a0..6196c95aae 100644
--- a/block.c
+++ b/block.c
@@ -3114,8 +3114,22 @@ static void bdrv_detach_child(BdrvChild **childp)
   BlockDriverState *old_bs = (*childp)->bs;
     assert(qemu_in_main_thread());
+    if (old_bs) {
+    /*
+ * TODO: this is called by job_unref with lock held, because
+ * afterwards it calls bdrv_try_set_aio_context.
+ * Once all of this is fixed, take care of removing
+ * the aiocontext lock and make this function _unlocked.
+ */
+    bdrv_subtree_drained_begin(old_bs);
+    }
+
   bdrv_replace_child_noperm(childp, NULL, true);
   +    if (old_bs) {
+    bdrv_subtree_drained_end(old_bs);
+    }
+
   if (old_bs) {
   /*
    * Update permissions for old node. We're just taking a
parent away, so
@@ -3154,6 +3168,7 @@ BdrvChild
*bdrv_root_attach_child(BlockDriverState *child_bs,
   Transaction *tran = tran_new();
     assert(qemu_in_main_thread());
+    bdrv_subtree_drained_begin_unlocked(child_bs);
     ret = bdrv_attach_child_common(child_bs, child_name, child_class,
  child_role, perm, shared_perm,
opaque,
@@ -3168,6 +3183,7 @@ out:
   tran_finalize(tran, ret);
   /* child is unset on failure by bdrv_attach_child_common_abort() */
   assert((ret < 0) == !child);
+    bdrv_subtree_drained_end_unlocked(child_bs);
     bdrv_unref(child_bs);
   return child;
@@ -3197,6 +3213,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState
*parent_bs,
     assert(qemu_in_main_thread());
   +    bdrv_subtree_drained_begin_unlocked(parent_bs);
+    bdrv_subtree_drained_begin_unlocked(child_bs);
+
   ret = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
child_class,
  child_role, , tran, errp);
   if (ret < 0) {
@@ -3211,6 +3230,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState
*parent_bs,
   out:
   tran_finalize(tran, ret);
   /* child is unset on failure by bdrv_attach_child_common_abort() */
+    bdrv_subtree_drained_end_unlocked(child_bs);
+    bdrv_subtree_drained_end_unlocked(parent_bs);
+
   assert((ret < 0) == !child);
     bdrv_unref(child_bs);
@@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs,
BlockDriverState *backing_hd,
     assert(qemu_in_main_thread());
   +    bdrv_subtree_drained_begin_unlocked(bs);
+    if (backing_hd) {
+    bdrv_subtree_drained_begin_unlocked(backing_hd);
+    }
+
   ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
   if (ret < 0) {
   goto out;
@@ -3464,6 +3491,10 @@ int bdrv_set_backing_hd(BlockDriverState *bs,
BlockDriverState *backing_hd,
   ret = bdrv_refresh_perms(bs, errp);
   out:
   tran_finalize(tran, ret);
+    if (backing_hd) {
+    bdrv_subtree_drained_end_unlocked(backing_hd);
+    }
+    bdrv_subtree_drained_end_unlocked(bs);
     return ret;
   }
@@ -5266,7 +5297,8 @@ static int
bdrv_replace_node_common(BlockDriverState *from,
     assert(qemu_get_current_aio_context() == qemu_get_aio_context());
   assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
-    bdrv_drained_begin(from);
+    bdrv_subtree_drained_begin_unlocked(from);
+    bdrv_subtree_drained_begin_unlocked(to);
     /*
    * Do the replacement without permission update.
@@ -5298,7 +5330,8 @@ static int
bdrv_replace_node_common(BlockDriverState *from,
   out:
   tran_finalize(tran, 

Re: [PATCH 10/12] block.c: add subtree_drains where needed

2022-02-03 Thread Emanuele Giuseppe Esposito



On 19/01/2022 10:47, Paolo Bonzini wrote:
> 
> About this:
> 
>> + * TODO: this is called by job_unref with lock held, because
>> + * afterwards it calls bdrv_try_set_aio_context.
>> + * Once all of this is fixed, take care of removing
>> + * the aiocontext lock and make this function _unlocked.
> 
> It may be clear to you, but it's quite cryptic:
> 

I think you figured it by looking at the job patches, but:

> - which lock is held by job_unref()?  Also, would it make more sense to
> talk about block_job_free() rather than job_unref()?  I can't quite
> follow where the AioContext lock is taken.

AioContext lock. I think this is a change I made in the job patches, so
comparing it with the master would make this piece harder to understand.
In the job series, I reduce the granularity of the AioContext lock,
ending up having it only around few callbacks of JobDriver, namely
->free(). This is why I talk about job_unref, because it calls ->free.

The actual lock is taken in job_unref, but the caller (->free) is
block_job_free. Yes it makes more sense mentioning block_job_free.

> - what is "all of this", and what do you mean by "not safe yet"?  Do
> both refer to bdrv_try_set_aio_context() needing the AioContext lock?

Yes

> - what is "this function" (that should become _unlocked)?

bdrv_subtree_drained_begin

This is the new comment I intend to put:
/*
* TODO: this function is called by BlockJobDriver's ->free()
* callback, block_job_free.
* We unfortunately must still take the AioContext lock around
* ->free() in job_unref, because of the bdrv_try_set_aio_context
* call below that still releases/acquires it.
* Once bdrv_try_set_aio_context does not require the AioContext lock,
* take care of removing it around ->free() and replace
* the following bdrv_subtree_drained_begin with its
* _unlocked version.
*/
> 
> I think you could also split the patch in multiple parts for different
> call chains.  In particular bdrv_set_backing_hd can be merged with the
> patch to bdrv_reopen_parse_file_or_backing, since both of them deal with
> bdrv_set_file_or_backing_noperm.
> Ok, I will try to do that.

Emanuele




Re: [PATCH 10/12] block.c: add subtree_drains where needed

2022-02-03 Thread Emanuele Giuseppe Esposito



On 02/02/2022 18:38, Paolo Bonzini wrote:
> On 2/2/22 16:37, Emanuele Giuseppe Esposito wrote:
>> So we have disk B with backing file C, and new disk A that wants to have
>> backing file C.
>>
>> I think I understand what you mean, so in theory the operation would be
>> - create new child
>> - add child to A->children list
>> - add child to C->parents list
>>
>> So in theory we need to
>> * drain A (without subtree), because it can't happen that child nodes of
>>    A have in-flight requests that look at A status (children list),
>> right?
>>    In other words, if A has another node X, can a request in X inspect
>>    A->children

I am not sure if this can happen. It doesn't seem so, though. All
functions inspecting ->parents are either GS or don't call recursive
function that go down on children list again.

>> * drain C, as parents can inspect C status (like B). Same assumption
>>    here, C->children[x]->bs cannot have requests inspecting C->parents
>>    list?

What if C->children[x]->bs drains? we would have a child inspecting
C->parents.

> 
> In that case (i.e. if parents have to be drained, but children need not)
> bdrv_drained_begin_unlocked would be enough, right?

Should be, if that is the case.

> 
> That would mean that ->children is I/O state but ->parents is global
> state.  I think it's quite a bit more complicated to analyze and to
> understand.

Not sure I follow this one, why is ->parents GS? it is also used by the
drain API...

Emanuele




Re: [PATCH 10/12] block.c: add subtree_drains where needed

2022-02-02 Thread Paolo Bonzini

On 2/2/22 16:37, Emanuele Giuseppe Esposito wrote:

So we have disk B with backing file C, and new disk A that wants to have
backing file C.

I think I understand what you mean, so in theory the operation would be
- create new child
- add child to A->children list
- add child to C->parents list

So in theory we need to
* drain A (without subtree), because it can't happen that child nodes of
   A have in-flight requests that look at A status (children list), right?
   In other words, if A has another node X, can a request in X inspect
   A->children
* drain C, as parents can inspect C status (like B). Same assumption
   here, C->children[x]->bs cannot have requests inspecting C->parents
   list?


In that case (i.e. if parents have to be drained, but children need not) 
bdrv_drained_begin_unlocked would be enough, right?


That would mean that ->children is I/O state but ->parents is global 
state.  I think it's quite a bit more complicated to analyze and to 
understand.


Paolo



Re: [PATCH 10/12] block.c: add subtree_drains where needed

2022-02-02 Thread Emanuele Giuseppe Esposito



On 01/02/2022 15:47, Vladimir Sementsov-Ogievskiy wrote:
> 18.01.2022 19:27, Emanuele Giuseppe Esposito wrote:
>> Protect bdrv_replace_child_noperm, as it modifies the
>> graph by adding/removing elements to .children and .parents
>> list of a bs. Use the newly introduced
>> bdrv_subtree_drained_{begin/end}_unlocked drains to achieve
>> that and be free from the aiocontext lock.
>>
>> One important criteria to keep in mind is that if the caller of
>> bdrv_replace_child_noperm creates a transaction, we need to make sure
>> that the
>> whole transaction is under the same drain block. This is imperative,
>> as having
>> multiple drains also in the .abort() class of functions causes
>> discrepancies
>> in the drained counters (as nodes are put back into the original
>> positions),
>> making it really hard to retourn all to zero and leaving the code very
>> buggy.
>> See https://patchew.org/QEMU/20211213104014.69858-1-eespo...@redhat.com/
>> for more explanations.
>>
>> Unfortunately we still need to have bdrv_subtree_drained_begin/end
>> in bdrv_detach_child() releasing and then holding the AioContext
>> lock, since it later invokes bdrv_try_set_aio_context() that is
>> not safe yet. Once all is cleaned up, we can also remove the
>> acquire/release locks in job_unref, artificially added because of this.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito 
>> ---
>>   block.c | 50 --
>>   1 file changed, 44 insertions(+), 6 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index fcc44a49a0..6196c95aae 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -3114,8 +3114,22 @@ static void bdrv_detach_child(BdrvChild **childp)
>>   BlockDriverState *old_bs = (*childp)->bs;
>>     assert(qemu_in_main_thread());
>> +    if (old_bs) {
>> +    /*
>> + * TODO: this is called by job_unref with lock held, because
>> + * afterwards it calls bdrv_try_set_aio_context.
>> + * Once all of this is fixed, take care of removing
>> + * the aiocontext lock and make this function _unlocked.
>> + */
>> +    bdrv_subtree_drained_begin(old_bs);
>> +    }
>> +
>>   bdrv_replace_child_noperm(childp, NULL, true);
>>   +    if (old_bs) {
>> +    bdrv_subtree_drained_end(old_bs);
>> +    }
>> +
>>   if (old_bs) {
>>   /*
>>    * Update permissions for old node. We're just taking a
>> parent away, so
>> @@ -3154,6 +3168,7 @@ BdrvChild
>> *bdrv_root_attach_child(BlockDriverState *child_bs,
>>   Transaction *tran = tran_new();
>>     assert(qemu_in_main_thread());
>> +    bdrv_subtree_drained_begin_unlocked(child_bs);
>>     ret = bdrv_attach_child_common(child_bs, child_name, child_class,
>>  child_role, perm, shared_perm,
>> opaque,
>> @@ -3168,6 +3183,7 @@ out:
>>   tran_finalize(tran, ret);
>>   /* child is unset on failure by bdrv_attach_child_common_abort() */
>>   assert((ret < 0) == !child);
>> +    bdrv_subtree_drained_end_unlocked(child_bs);
>>     bdrv_unref(child_bs);
>>   return child;
>> @@ -3197,6 +3213,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState
>> *parent_bs,
>>     assert(qemu_in_main_thread());
>>   +    bdrv_subtree_drained_begin_unlocked(parent_bs);
>> +    bdrv_subtree_drained_begin_unlocked(child_bs);
>> +
>>   ret = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
>> child_class,
>>  child_role, , tran, errp);
>>   if (ret < 0) {
>> @@ -3211,6 +3230,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState
>> *parent_bs,
>>   out:
>>   tran_finalize(tran, ret);
>>   /* child is unset on failure by bdrv_attach_child_common_abort() */
>> +    bdrv_subtree_drained_end_unlocked(child_bs);
>> +    bdrv_subtree_drained_end_unlocked(parent_bs);
>> +
>>   assert((ret < 0) == !child);
>>     bdrv_unref(child_bs);
>> @@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs,
>> BlockDriverState *backing_hd,
>>     assert(qemu_in_main_thread());
>>   +    bdrv_subtree_drained_begin_unlocked(bs);
>> +    if (backing_hd) {
>> +    bdrv_subtree_drained_begin_unlocked(backing_hd);
>> +    }
>> +
>>   ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
>>   if (ret < 0) {
>>   goto out;
>> @@ -3464,6 +3491,10 @@ int bdrv_set_backing_hd(BlockDriverState *bs,
>> BlockDriverState *backing_hd,
>>   ret = bdrv_refresh_perms(bs, errp);
>>   out:
>>   tran_finalize(tran, ret);
>> +    if (backing_hd) {
>> +    bdrv_subtree_drained_end_unlocked(backing_hd);
>> +    }
>> +    bdrv_subtree_drained_end_unlocked(bs);
>>     return ret;
>>   }
>> @@ -5266,7 +5297,8 @@ static int
>> bdrv_replace_node_common(BlockDriverState *from,
>>     assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>>   assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
>> -    bdrv_drained_begin(from);
>> +    

Re: [PATCH 10/12] block.c: add subtree_drains where needed

2022-02-01 Thread Vladimir Sementsov-Ogievskiy

18.01.2022 19:27, Emanuele Giuseppe Esposito wrote:

Protect bdrv_replace_child_noperm, as it modifies the
graph by adding/removing elements to .children and .parents
list of a bs. Use the newly introduced
bdrv_subtree_drained_{begin/end}_unlocked drains to achieve
that and be free from the aiocontext lock.

One important criteria to keep in mind is that if the caller of
bdrv_replace_child_noperm creates a transaction, we need to make sure that the
whole transaction is under the same drain block. This is imperative, as having
multiple drains also in the .abort() class of functions causes discrepancies
in the drained counters (as nodes are put back into the original positions),
making it really hard to retourn all to zero and leaving the code very buggy.
See https://patchew.org/QEMU/20211213104014.69858-1-eespo...@redhat.com/
for more explanations.

Unfortunately we still need to have bdrv_subtree_drained_begin/end
in bdrv_detach_child() releasing and then holding the AioContext
lock, since it later invokes bdrv_try_set_aio_context() that is
not safe yet. Once all is cleaned up, we can also remove the
acquire/release locks in job_unref, artificially added because of this.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block.c | 50 --
  1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index fcc44a49a0..6196c95aae 100644
--- a/block.c
+++ b/block.c
@@ -3114,8 +3114,22 @@ static void bdrv_detach_child(BdrvChild **childp)
  BlockDriverState *old_bs = (*childp)->bs;
  
  assert(qemu_in_main_thread());

+if (old_bs) {
+/*
+ * TODO: this is called by job_unref with lock held, because
+ * afterwards it calls bdrv_try_set_aio_context.
+ * Once all of this is fixed, take care of removing
+ * the aiocontext lock and make this function _unlocked.
+ */
+bdrv_subtree_drained_begin(old_bs);
+}
+
  bdrv_replace_child_noperm(childp, NULL, true);
  
+if (old_bs) {

+bdrv_subtree_drained_end(old_bs);
+}
+
  if (old_bs) {
  /*
   * Update permissions for old node. We're just taking a parent away, 
so
@@ -3154,6 +3168,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState 
*child_bs,
  Transaction *tran = tran_new();
  
  assert(qemu_in_main_thread());

+bdrv_subtree_drained_begin_unlocked(child_bs);
  
  ret = bdrv_attach_child_common(child_bs, child_name, child_class,

 child_role, perm, shared_perm, opaque,
@@ -3168,6 +3183,7 @@ out:
  tran_finalize(tran, ret);
  /* child is unset on failure by bdrv_attach_child_common_abort() */
  assert((ret < 0) == !child);
+bdrv_subtree_drained_end_unlocked(child_bs);
  
  bdrv_unref(child_bs);

  return child;
@@ -3197,6 +3213,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  
  assert(qemu_in_main_thread());
  
+bdrv_subtree_drained_begin_unlocked(parent_bs);

+bdrv_subtree_drained_begin_unlocked(child_bs);
+
  ret = bdrv_attach_child_noperm(parent_bs, child_bs, child_name, 
child_class,
 child_role, , tran, errp);
  if (ret < 0) {
@@ -3211,6 +3230,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  out:
  tran_finalize(tran, ret);
  /* child is unset on failure by bdrv_attach_child_common_abort() */
+bdrv_subtree_drained_end_unlocked(child_bs);
+bdrv_subtree_drained_end_unlocked(parent_bs);
+
  assert((ret < 0) == !child);
  
  bdrv_unref(child_bs);

@@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd,
  
  assert(qemu_in_main_thread());
  
+bdrv_subtree_drained_begin_unlocked(bs);

+if (backing_hd) {
+bdrv_subtree_drained_begin_unlocked(backing_hd);
+}
+
  ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
  if (ret < 0) {
  goto out;
@@ -3464,6 +3491,10 @@ int bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd,
  ret = bdrv_refresh_perms(bs, errp);
  out:
  tran_finalize(tran, ret);
+if (backing_hd) {
+bdrv_subtree_drained_end_unlocked(backing_hd);
+}
+bdrv_subtree_drained_end_unlocked(bs);
  
  return ret;

  }
@@ -5266,7 +5297,8 @@ static int bdrv_replace_node_common(BlockDriverState 
*from,
  
  assert(qemu_get_current_aio_context() == qemu_get_aio_context());

  assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
-bdrv_drained_begin(from);
+bdrv_subtree_drained_begin_unlocked(from);
+bdrv_subtree_drained_begin_unlocked(to);
  
  /*

   * Do the replacement without permission update.
@@ -5298,7 +5330,8 @@ static int bdrv_replace_node_common(BlockDriverState 
*from,
  out:
  tran_finalize(tran, ret);
  
-bdrv_drained_end(from);

+bdrv_subtree_drained_end_unlocked(to);
+

Re: [PATCH 10/12] block.c: add subtree_drains where needed

2022-01-19 Thread Paolo Bonzini
Subject doesn't say what needs them: "block: drain block devices across 
graph modifications"


On 1/18/22 17:27, Emanuele Giuseppe Esposito wrote:

Protect bdrv_replace_child_noperm, as it modifies the
graph by adding/removing elements to .children and .parents
list of a bs. Use the newly introduced
bdrv_subtree_drained_{begin/end}_unlocked drains to achieve
that and be free from the aiocontext lock.

One important criteria to keep in mind is that if the caller of
bdrv_replace_child_noperm creates a transaction, we need to make sure that the
whole transaction is under the same drain block.


Okay, this is the important part and it should be mentioned in patch 8 
as well.  It should also be in a comment above bdrv_replace_child_noperm().



This is imperative, as having
multiple drains also in the .abort() class of functions causes discrepancies
in the drained counters (as nodes are put back into the original positions),
making it really hard to retourn all to zero and leaving the code very buggy.
See https://patchew.org/QEMU/20211213104014.69858-1-eespo...@redhat.com/
for more explanations.

Unfortunately we still need to have bdrv_subtree_drained_begin/end
in bdrv_detach_child() releasing and then holding the AioContext
lock, since it later invokes bdrv_try_set_aio_context() that is
not safe yet. Once all is cleaned up, we can also remove the
acquire/release locks in job_unref, artificially added because of this.


About this:


+ * TODO: this is called by job_unref with lock held, because
+ * afterwards it calls bdrv_try_set_aio_context.
+ * Once all of this is fixed, take care of removing
+ * the aiocontext lock and make this function _unlocked.


It may be clear to you, but it's quite cryptic:

- which lock is held by job_unref()?  Also, would it make more sense to 
talk about block_job_free() rather than job_unref()?  I can't quite 
follow where the AioContext lock is taken.


- what is "all of this", and what do you mean by "not safe yet"?  Do 
both refer to bdrv_try_set_aio_context() needing the AioContext lock?


- what is "this function" (that should become _unlocked)?

I think you could also split the patch in multiple parts for different 
call chains.  In particular bdrv_set_backing_hd can be merged with the 
patch to bdrv_reopen_parse_file_or_backing, since both of them deal with 
bdrv_set_file_or_backing_noperm.


Paolo


Signed-off-by: Emanuele Giuseppe Esposito 
---
  block.c | 50 --
  1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index fcc44a49a0..6196c95aae 100644
--- a/block.c
+++ b/block.c
@@ -3114,8 +3114,22 @@ static void bdrv_detach_child(BdrvChild **childp)
  BlockDriverState *old_bs = (*childp)->bs;
  
  assert(qemu_in_main_thread());

+if (old_bs) {
+/*
+ * TODO: this is called by job_unref with lock held, because
+ * afterwards it calls bdrv_try_set_aio_context.
+ * Once all of this is fixed, take care of removing
+ * the aiocontext lock and make this function _unlocked.
+ */
+bdrv_subtree_drained_begin(old_bs);
+}
+
  bdrv_replace_child_noperm(childp, NULL, true);
  
+if (old_bs) {

+bdrv_subtree_drained_end(old_bs);
+}
+
  if (old_bs) {
  /*
   * Update permissions for old node. We're just taking a parent away, 
so
@@ -3154,6 +3168,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState 
*child_bs,
  Transaction *tran = tran_new();
  
  assert(qemu_in_main_thread());

+bdrv_subtree_drained_begin_unlocked(child_bs);
  
  ret = bdrv_attach_child_common(child_bs, child_name, child_class,

 child_role, perm, shared_perm, opaque,
@@ -3168,6 +3183,7 @@ out:
  tran_finalize(tran, ret);
  /* child is unset on failure by bdrv_attach_child_common_abort() */
  assert((ret < 0) == !child);
+bdrv_subtree_drained_end_unlocked(child_bs);
  
  bdrv_unref(child_bs);

  return child;
@@ -3197,6 +3213,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  
  assert(qemu_in_main_thread());
  
+bdrv_subtree_drained_begin_unlocked(parent_bs);

+bdrv_subtree_drained_begin_unlocked(child_bs);
+
  ret = bdrv_attach_child_noperm(parent_bs, child_bs, child_name, 
child_class,
 child_role, , tran, errp);
  if (ret < 0) {
@@ -3211,6 +3230,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  out:
  tran_finalize(tran, ret);
  /* child is unset on failure by bdrv_attach_child_common_abort() */
+bdrv_subtree_drained_end_unlocked(child_bs);
+bdrv_subtree_drained_end_unlocked(parent_bs);
+
  assert((ret < 0) == !child);
  
  bdrv_unref(child_bs);

@@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd,
  
  assert(qemu_in_main_thread());
  
+

[PATCH 10/12] block.c: add subtree_drains where needed

2022-01-18 Thread Emanuele Giuseppe Esposito
Protect bdrv_replace_child_noperm, as it modifies the
graph by adding/removing elements to .children and .parents
list of a bs. Use the newly introduced
bdrv_subtree_drained_{begin/end}_unlocked drains to achieve
that and be free from the aiocontext lock.

One important criteria to keep in mind is that if the caller of
bdrv_replace_child_noperm creates a transaction, we need to make sure that the
whole transaction is under the same drain block. This is imperative, as having
multiple drains also in the .abort() class of functions causes discrepancies
in the drained counters (as nodes are put back into the original positions),
making it really hard to retourn all to zero and leaving the code very buggy.
See https://patchew.org/QEMU/20211213104014.69858-1-eespo...@redhat.com/
for more explanations.

Unfortunately we still need to have bdrv_subtree_drained_begin/end
in bdrv_detach_child() releasing and then holding the AioContext
lock, since it later invokes bdrv_try_set_aio_context() that is
not safe yet. Once all is cleaned up, we can also remove the
acquire/release locks in job_unref, artificially added because of this.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block.c | 50 --
 1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index fcc44a49a0..6196c95aae 100644
--- a/block.c
+++ b/block.c
@@ -3114,8 +3114,22 @@ static void bdrv_detach_child(BdrvChild **childp)
 BlockDriverState *old_bs = (*childp)->bs;
 
 assert(qemu_in_main_thread());
+if (old_bs) {
+/*
+ * TODO: this is called by job_unref with lock held, because
+ * afterwards it calls bdrv_try_set_aio_context.
+ * Once all of this is fixed, take care of removing
+ * the aiocontext lock and make this function _unlocked.
+ */
+bdrv_subtree_drained_begin(old_bs);
+}
+
 bdrv_replace_child_noperm(childp, NULL, true);
 
+if (old_bs) {
+bdrv_subtree_drained_end(old_bs);
+}
+
 if (old_bs) {
 /*
  * Update permissions for old node. We're just taking a parent away, so
@@ -3154,6 +3168,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState 
*child_bs,
 Transaction *tran = tran_new();
 
 assert(qemu_in_main_thread());
+bdrv_subtree_drained_begin_unlocked(child_bs);
 
 ret = bdrv_attach_child_common(child_bs, child_name, child_class,
child_role, perm, shared_perm, opaque,
@@ -3168,6 +3183,7 @@ out:
 tran_finalize(tran, ret);
 /* child is unset on failure by bdrv_attach_child_common_abort() */
 assert((ret < 0) == !child);
+bdrv_subtree_drained_end_unlocked(child_bs);
 
 bdrv_unref(child_bs);
 return child;
@@ -3197,6 +3213,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
 assert(qemu_in_main_thread());
 
+bdrv_subtree_drained_begin_unlocked(parent_bs);
+bdrv_subtree_drained_begin_unlocked(child_bs);
+
 ret = bdrv_attach_child_noperm(parent_bs, child_bs, child_name, 
child_class,
child_role, , tran, errp);
 if (ret < 0) {
@@ -3211,6 +3230,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 out:
 tran_finalize(tran, ret);
 /* child is unset on failure by bdrv_attach_child_common_abort() */
+bdrv_subtree_drained_end_unlocked(child_bs);
+bdrv_subtree_drained_end_unlocked(parent_bs);
+
 assert((ret < 0) == !child);
 
 bdrv_unref(child_bs);
@@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd,
 
 assert(qemu_in_main_thread());
 
+bdrv_subtree_drained_begin_unlocked(bs);
+if (backing_hd) {
+bdrv_subtree_drained_begin_unlocked(backing_hd);
+}
+
 ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
 if (ret < 0) {
 goto out;
@@ -3464,6 +3491,10 @@ int bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd,
 ret = bdrv_refresh_perms(bs, errp);
 out:
 tran_finalize(tran, ret);
+if (backing_hd) {
+bdrv_subtree_drained_end_unlocked(backing_hd);
+}
+bdrv_subtree_drained_end_unlocked(bs);
 
 return ret;
 }
@@ -5266,7 +5297,8 @@ static int bdrv_replace_node_common(BlockDriverState 
*from,
 
 assert(qemu_get_current_aio_context() == qemu_get_aio_context());
 assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
-bdrv_drained_begin(from);
+bdrv_subtree_drained_begin_unlocked(from);
+bdrv_subtree_drained_begin_unlocked(to);
 
 /*
  * Do the replacement without permission update.
@@ -5298,7 +5330,8 @@ static int bdrv_replace_node_common(BlockDriverState 
*from,
 out:
 tran_finalize(tran, ret);
 
-bdrv_drained_end(from);
+bdrv_subtree_drained_end_unlocked(to);
+bdrv_subtree_drained_end_unlocked(from);
 bdrv_unref(from);
 
 return ret;
@@ -5345,6 +5378,9 @@ int bdrv_append(BlockDriverState *bs_new,