Re: [PATCH 08/13] stream: Replace subtree drain with a single node drain

2022-11-14 Thread Hanna Reitz

On 08.11.22 13:37, Kevin Wolf wrote:

The subtree drain was introduced in commit b1e1af394d9 as a way to avoid
graph changes between finding the base node and changing the block graph
as necessary on completion of the image streaming job.

The block graph could change between these two points because
bdrv_set_backing_hd() first drains the parent node, which involved
polling and can do anything.

Subtree draining was an imperfect way to make this less likely (because
with it, fewer callbacks are called during this window). Everyone agreed
that it's not really the right solution, and it was only committed as a
stopgap solution.

This replaces the subtree drain with a solution that simply drains the
parent node before we try to find the base node, and then call a version
of bdrv_set_backing_hd() that doesn't drain, but just asserts that the
parent node is already drained.

This way, any graph changes caused by draining happen before we start
looking at the graph and things stay consistent between finding the base
node and changing the graph.

Signed-off-by: Kevin Wolf 
---
  include/block/block-global-state.h |  3 +++
  block.c| 17 ++---
  block/stream.c | 20 ++--
  3 files changed, 27 insertions(+), 13 deletions(-)


Reviewed-by: Hanna Reitz 




Re: [PATCH 08/13] stream: Replace subtree drain with a single node drain

2022-11-10 Thread Kevin Wolf
Am 10.11.2022 um 12:25 hat Vladimir Sementsov-Ogievskiy geschrieben:
> On 11/10/22 13:16, Kevin Wolf wrote:
> > Am 09.11.2022 um 17:52 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > On 11/8/22 15:37, Kevin Wolf wrote:
> > > > The subtree drain was introduced in commit b1e1af394d9 as a way to avoid
> > > > graph changes between finding the base node and changing the block graph
> > > > as necessary on completion of the image streaming job.
> > > > 
> > > > The block graph could change between these two points because
> > > > bdrv_set_backing_hd() first drains the parent node, which involved
> > > > polling and can do anything.
> > > > 
> > > > Subtree draining was an imperfect way to make this less likely (because
> > > > with it, fewer callbacks are called during this window). Everyone agreed
> > > > that it's not really the right solution, and it was only committed as a
> > > > stopgap solution.
> > > > 
> > > > This replaces the subtree drain with a solution that simply drains the
> > > > parent node before we try to find the base node, and then call a version
> > > > of bdrv_set_backing_hd() that doesn't drain, but just asserts that the
> > > > parent node is already drained.
> > > > 
> > > > This way, any graph changes caused by draining happen before we start
> > > > looking at the graph and things stay consistent between finding the base
> > > > node and changing the graph.
> > > > 
> > > > Signed-off-by: Kevin Wolf
> > > [..]
> > > 
> > > >base = bdrv_filter_or_cow_bs(s->above_base);
> > > > -if (base) {
> > > > -bdrv_ref(base);
> > > > -}
> > > > -
> > > >unfiltered_base = bdrv_skip_filters(base);
> > > >if (bdrv_cow_child(unfiltered_bs)) {
> > > > @@ -82,7 +85,7 @@ static int stream_prepare(Job *job)
> > > >}
> > > >}
> > > > -bdrv_set_backing_hd(unfiltered_bs, base, _err);
> > > > +bdrv_set_backing_hd_drained(unfiltered_bs, base, _err);
> > > >ret = bdrv_change_backing_file(unfiltered_bs, base_id, 
> > > > base_fmt, false);
> > > If we have yield points / polls during bdrv_set_backing_hd_drained()
> > > and bdrv_change_backing_file(), it's still bad and another
> > > graph-modifying operation may interleave. But b1e1af394d9 reports only
> > > polling in bdrv_set_backing_hd(), so I think it's OK to not care about
> > > other cases.
> > At this point in the series, bdrv_replace_child_noperm() can indeed
> > still poll. I'm not sure how bad it is, but at this point we're already
> > reconfiguring the graph with two specific nodes and somehow this poll
> > hasn't caused problems in the past. Anyway, at the end of the series,
> > there isn't be any polling left in bdrv_set_backing_hd_drained(), as far
> > as I can tell.
> > 
> > bdrv_change_backing_file() will certainly poll because it does I/O to
> > the image file. However, the change to the graph is completed at that
> > point, so I don't think it's a problem. Do you think it would be worth
> > putting a comment before bdrv_change_backing_file() that mentions that
> > the graph may change again from here on, but we've completed the graph
> > change?
> > 
> 
> Comment won't hurt. I think theoretically that's possible that we
> 
> 1. change the graph
> 2. yield in bdrv_change_backing_file
> 3. switch to another graph-modifying operation, change backing file and do 
> another bdrv_change_backing_file()
> 4. return to bdrv_change_backing_file() of [2] and write wrong backing file 
> to metadata
> 
> And the only solution for such things that I can imagine is a kind of
> global graph-modifying lock, which should be held around the whole
> graph modifying operation, including writing metadata.

Actually, I don't think this is the case. The problem that you get here
is just that we haven't really defined what happens when you get two
concurrent .bdrv_change_backing_file requests. To solve this, you don't
need to lock the whole graph, you just need to order the updates at the
block driver level instead of doing them in parallel, so that we know
that the last .bdrv_change_backing_file call wins. I think taking
s->lock in qcow2 would already achieve this (but still lock more than is
strictly necessary).

> Probably, we shouldn't care until we have real bug reports of it.
> Actually I hope that the only user who start stream and commit jobs in
> parallel on same backing-chain is our iotests :)

Yes, it sounds very theoretical. :-)

Kevin




Re: [PATCH 08/13] stream: Replace subtree drain with a single node drain

2022-11-10 Thread Vladimir Sementsov-Ogievskiy

On 11/10/22 13:16, Kevin Wolf wrote:

Am 09.11.2022 um 17:52 hat Vladimir Sementsov-Ogievskiy geschrieben:

On 11/8/22 15:37, Kevin Wolf wrote:

The subtree drain was introduced in commit b1e1af394d9 as a way to avoid
graph changes between finding the base node and changing the block graph
as necessary on completion of the image streaming job.

The block graph could change between these two points because
bdrv_set_backing_hd() first drains the parent node, which involved
polling and can do anything.

Subtree draining was an imperfect way to make this less likely (because
with it, fewer callbacks are called during this window). Everyone agreed
that it's not really the right solution, and it was only committed as a
stopgap solution.

This replaces the subtree drain with a solution that simply drains the
parent node before we try to find the base node, and then call a version
of bdrv_set_backing_hd() that doesn't drain, but just asserts that the
parent node is already drained.

This way, any graph changes caused by draining happen before we start
looking at the graph and things stay consistent between finding the base
node and changing the graph.

Signed-off-by: Kevin Wolf

[..]


   base = bdrv_filter_or_cow_bs(s->above_base);
-if (base) {
-bdrv_ref(base);
-}
-
   unfiltered_base = bdrv_skip_filters(base);
   if (bdrv_cow_child(unfiltered_bs)) {
@@ -82,7 +85,7 @@ static int stream_prepare(Job *job)
   }
   }
-bdrv_set_backing_hd(unfiltered_bs, base, _err);
+bdrv_set_backing_hd_drained(unfiltered_bs, base, _err);
   ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt, 
false);

If we have yield points / polls during bdrv_set_backing_hd_drained()
and bdrv_change_backing_file(), it's still bad and another
graph-modifying operation may interleave. But b1e1af394d9 reports only
polling in bdrv_set_backing_hd(), so I think it's OK to not care about
other cases.

At this point in the series, bdrv_replace_child_noperm() can indeed
still poll. I'm not sure how bad it is, but at this point we're already
reconfiguring the graph with two specific nodes and somehow this poll
hasn't caused problems in the past. Anyway, at the end of the series,
there isn't be any polling left in bdrv_set_backing_hd_drained(), as far
as I can tell.

bdrv_change_backing_file() will certainly poll because it does I/O to
the image file. However, the change to the graph is completed at that
point, so I don't think it's a problem. Do you think it would be worth
putting a comment before bdrv_change_backing_file() that mentions that
the graph may change again from here on, but we've completed the graph
change?



Comment won't hurt. I think theoretically that's possible that we

1. change the graph
2. yield in bdrv_change_backing_file
3. switch to another graph-modifying operation, change backing file and do 
another bdrv_change_backing_file()
4. return to bdrv_change_backing_file() of [2] and write wrong backing file to 
metadata

And the only solution for such things that I can imagine is a kind of global 
graph-modifying lock, which should be held around the whole graph modifying 
operation, including writing metadata. Probably, we shouldn't care until we 
have real bug reports of it. Actually I hope that the only user who start 
stream and commit jobs in parallel on same backing-chain is our iotests :)


--
Best regards,
Vladimir




Re: [PATCH 08/13] stream: Replace subtree drain with a single node drain

2022-11-10 Thread Kevin Wolf
Am 09.11.2022 um 17:52 hat Vladimir Sementsov-Ogievskiy geschrieben:
> On 11/8/22 15:37, Kevin Wolf wrote:
> > The subtree drain was introduced in commit b1e1af394d9 as a way to avoid
> > graph changes between finding the base node and changing the block graph
> > as necessary on completion of the image streaming job.
> > 
> > The block graph could change between these two points because
> > bdrv_set_backing_hd() first drains the parent node, which involved
> > polling and can do anything.
> > 
> > Subtree draining was an imperfect way to make this less likely (because
> > with it, fewer callbacks are called during this window). Everyone agreed
> > that it's not really the right solution, and it was only committed as a
> > stopgap solution.
> > 
> > This replaces the subtree drain with a solution that simply drains the
> > parent node before we try to find the base node, and then call a version
> > of bdrv_set_backing_hd() that doesn't drain, but just asserts that the
> > parent node is already drained.
> > 
> > This way, any graph changes caused by draining happen before we start
> > looking at the graph and things stay consistent between finding the base
> > node and changing the graph.
> > 
> > Signed-off-by: Kevin Wolf 
> 
> [..]
> 
> >   base = bdrv_filter_or_cow_bs(s->above_base);
> > -if (base) {
> > -bdrv_ref(base);
> > -}
> > -
> >   unfiltered_base = bdrv_skip_filters(base);
> >   if (bdrv_cow_child(unfiltered_bs)) {
> > @@ -82,7 +85,7 @@ static int stream_prepare(Job *job)
> >   }
> >   }
> > -bdrv_set_backing_hd(unfiltered_bs, base, _err);
> > +bdrv_set_backing_hd_drained(unfiltered_bs, base, _err);
> >   ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt, 
> > false);
> 
> If we have yield points / polls during bdrv_set_backing_hd_drained()
> and bdrv_change_backing_file(), it's still bad and another
> graph-modifying operation may interleave. But b1e1af394d9 reports only
> polling in bdrv_set_backing_hd(), so I think it's OK to not care about
> other cases.

At this point in the series, bdrv_replace_child_noperm() can indeed
still poll. I'm not sure how bad it is, but at this point we're already
reconfiguring the graph with two specific nodes and somehow this poll
hasn't caused problems in the past. Anyway, at the end of the series,
there isn't be any polling left in bdrv_set_backing_hd_drained(), as far
as I can tell.

bdrv_change_backing_file() will certainly poll because it does I/O to
the image file. However, the change to the graph is completed at that
point, so I don't think it's a problem. Do you think it would be worth
putting a comment before bdrv_change_backing_file() that mentions that
the graph may change again from here on, but we've completed the graph
change?

> >   if (local_err) {
> >   error_report_err(local_err);
> > @@ -92,10 +95,7 @@ static int stream_prepare(Job *job)
> >   }
> >   out:
> > -if (base) {
> > -bdrv_unref(base);
> > -}
> > -bdrv_subtree_drained_end(s->above_base);
> > +bdrv_drained_end(unfiltered_bs);
> >   return ret;
> >   }
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 

Thanks.

Kevin




Re: [PATCH 08/13] stream: Replace subtree drain with a single node drain

2022-11-09 Thread Vladimir Sementsov-Ogievskiy

On 11/8/22 15:37, Kevin Wolf wrote:

The subtree drain was introduced in commit b1e1af394d9 as a way to avoid
graph changes between finding the base node and changing the block graph
as necessary on completion of the image streaming job.

The block graph could change between these two points because
bdrv_set_backing_hd() first drains the parent node, which involved
polling and can do anything.

Subtree draining was an imperfect way to make this less likely (because
with it, fewer callbacks are called during this window). Everyone agreed
that it's not really the right solution, and it was only committed as a
stopgap solution.

This replaces the subtree drain with a solution that simply drains the
parent node before we try to find the base node, and then call a version
of bdrv_set_backing_hd() that doesn't drain, but just asserts that the
parent node is already drained.

This way, any graph changes caused by draining happen before we start
looking at the graph and things stay consistent between finding the base
node and changing the graph.

Signed-off-by: Kevin Wolf 


[..]

  
  base = bdrv_filter_or_cow_bs(s->above_base);

-if (base) {
-bdrv_ref(base);
-}
-
  unfiltered_base = bdrv_skip_filters(base);
  
  if (bdrv_cow_child(unfiltered_bs)) {

@@ -82,7 +85,7 @@ static int stream_prepare(Job *job)
  }
  }
  
-bdrv_set_backing_hd(unfiltered_bs, base, _err);

+bdrv_set_backing_hd_drained(unfiltered_bs, base, _err);
  ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt, 
false);


If we have yield points / polls during bdrv_set_backing_hd_drained() and 
bdrv_change_backing_file(), it's still bad and another graph-modifying 
operation may interleave. But b1e1af394d9 reports only polling in 
bdrv_set_backing_hd(), so I think it's OK to not care about other cases.


  if (local_err) {
  error_report_err(local_err);
@@ -92,10 +95,7 @@ static int stream_prepare(Job *job)
  }
  
  out:

-if (base) {
-bdrv_unref(base);
-}
-bdrv_subtree_drained_end(s->above_base);
+bdrv_drained_end(unfiltered_bs);
  return ret;
  }
  


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir




[PATCH 08/13] stream: Replace subtree drain with a single node drain

2022-11-08 Thread Kevin Wolf
The subtree drain was introduced in commit b1e1af394d9 as a way to avoid
graph changes between finding the base node and changing the block graph
as necessary on completion of the image streaming job.

The block graph could change between these two points because
bdrv_set_backing_hd() first drains the parent node, which involved
polling and can do anything.

Subtree draining was an imperfect way to make this less likely (because
with it, fewer callbacks are called during this window). Everyone agreed
that it's not really the right solution, and it was only committed as a
stopgap solution.

This replaces the subtree drain with a solution that simply drains the
parent node before we try to find the base node, and then call a version
of bdrv_set_backing_hd() that doesn't drain, but just asserts that the
parent node is already drained.

This way, any graph changes caused by draining happen before we start
looking at the graph and things stay consistent between finding the base
node and changing the graph.

Signed-off-by: Kevin Wolf 
---
 include/block/block-global-state.h |  3 +++
 block.c| 17 ++---
 block/stream.c | 20 ++--
 3 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/include/block/block-global-state.h 
b/include/block/block-global-state.h
index bb42ed9559..7923415d4e 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -82,6 +82,9 @@ int bdrv_open_file_child(const char *filename,
 BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp);
 int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
 Error **errp);
+int bdrv_set_backing_hd_drained(BlockDriverState *bs,
+BlockDriverState *backing_hd,
+Error **errp);
 int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
const char *bdref_key, Error **errp);
 BlockDriverState *bdrv_open(const char *filename, const char *reference,
diff --git a/block.c b/block.c
index 2f6b25875f..43b893dd6c 100644
--- a/block.c
+++ b/block.c
@@ -3395,14 +3395,15 @@ static int bdrv_set_backing_noperm(BlockDriverState *bs,
 return bdrv_set_file_or_backing_noperm(bs, backing_hd, true, tran, errp);
 }
 
-int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
-Error **errp)
+int bdrv_set_backing_hd_drained(BlockDriverState *bs,
+BlockDriverState *backing_hd,
+Error **errp)
 {
 int ret;
 Transaction *tran = tran_new();
 
 GLOBAL_STATE_CODE();
-bdrv_drained_begin(bs);
+assert(bs->quiesce_counter > 0);
 
 ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
 if (ret < 0) {
@@ -3412,7 +3413,17 @@ int bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd,
 ret = bdrv_refresh_perms(bs, errp);
 out:
 tran_finalize(tran, ret);
+return ret;
+}
 
+int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
+Error **errp)
+{
+int ret;
+GLOBAL_STATE_CODE();
+
+bdrv_drained_begin(bs);
+ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
 bdrv_drained_end(bs);
 
 return ret;
diff --git a/block/stream.c b/block/stream.c
index 694709bd25..81dcf5a417 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -64,13 +64,16 @@ static int stream_prepare(Job *job)
 bdrv_cor_filter_drop(s->cor_filter_bs);
 s->cor_filter_bs = NULL;
 
-bdrv_subtree_drained_begin(s->above_base);
+/*
+ * bdrv_set_backing_hd() requires that unfiltered_bs is drained. Drain
+ * already here and use bdrv_set_backing_hd_drained() instead because
+ * the polling during drained_begin() might change the graph, and if we do
+ * this only later, we may end up working with the wrong base node (or it
+ * might even have gone away by the time we want to use it).
+ */
+bdrv_drained_begin(unfiltered_bs);
 
 base = bdrv_filter_or_cow_bs(s->above_base);
-if (base) {
-bdrv_ref(base);
-}
-
 unfiltered_base = bdrv_skip_filters(base);
 
 if (bdrv_cow_child(unfiltered_bs)) {
@@ -82,7 +85,7 @@ static int stream_prepare(Job *job)
 }
 }
 
-bdrv_set_backing_hd(unfiltered_bs, base, _err);
+bdrv_set_backing_hd_drained(unfiltered_bs, base, _err);
 ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt, 
false);
 if (local_err) {
 error_report_err(local_err);
@@ -92,10 +95,7 @@ static int stream_prepare(Job *job)
 }
 
 out:
-if (base) {
-bdrv_unref(base);
-}
-bdrv_subtree_drained_end(s->above_base);
+bdrv_drained_end(unfiltered_bs);
 return ret;
 }
 
-- 
2.38.1