Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-25 Thread Emanuele Giuseppe Esposito



On 24/05/2022 at 14:10, Kevin Wolf wrote:
> On 18.05.2022 at 14:28, Emanuele Giuseppe Esposito wrote:
>> label: // read till the end to see why I wrote this here
>>
>> I was hoping someone from the "No" party would answer your question,
>> because, as you know, we reached this same conclusion together.
>>
>> We thought about dropping the drain for various reasons, the main one
>> (at least as far as I understood) is that we are not sure if something
>> can still happen in between drain_begin/end, and it is a little bit
>> confusing to use the same mechanism to block I/O and protect the graph.
>>
>> We then thought about implementing a rwlock. A rdlock would clarify what
>> we are protecting and who is using the lock. I had a rwlock draft
>> implementation sent in this thread, but this also led to additional
>> problems.
>> The main problem was that this new lock would introduce nested event
>> loops, which together with such locking would just create deadlocks.
>> If readers are in coroutines and writers are not (because graph
>> operations are not running in coroutines), we have a lot of deadlocks.
>> If a writer has to take the lock, it must wait for all other readers to
>> finish. And it does so by internally calling AIO_WAIT_WHILE, creating a
>> nested event loop. We don't know what could execute when polling for
>> events, and for example another writer could be resumed.
> 
> Why is this a problem? Your AIO_WAIT_WHILE() condition would be that
> there are neither readers nor writers, so you would just keep waiting
> until the other writer is done.

Yes, but by the time we get to the AIO_WAIT_WHILE() condition, the wrlock
has already been taken by the current writer.
I think that's what you also mean below.
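
For reference, the writer side of my draft had roughly this shape
(simplified and purely illustrative; the counters and function names here
are made up, not the actual patch):

    /* Illustrative sketch only: a writer that waits for readers with
     * AIO_WAIT_WHILE(), which is exactly where the nested event loop
     * (and thus the chance of another writer being resumed) comes from. */
    static int reader_count;    /* readers currently traversing the graph */
    static bool writer_active;  /* a writer holds the lock */

    void graph_wrlock(void)
    {
        GLOBAL_STATE_CODE();
        assert(!writer_active);
        writer_active = true;   /* block new readers from entering */

        /* Nested event loop: while polling here, any BH or coroutine can
         * run, including code that itself wants this lock. */
        AIO_WAIT_WHILE(qemu_get_aio_context(), reader_count > 0);
    }

    void graph_wrunlock(void)
    {
        GLOBAL_STATE_CODE();
        writer_active = false;
        aio_wait_kick();        /* wake anyone waiting in AIO_WAIT_WHILE() */
    }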

> 
>> Ideally, we want writers in coroutines too.
>>
>> Additionally, many readers are running in what we call "mixed"
>> functions: usually implemented automatically with the generated_co_wrapper
>> tag, they let a function (usually a bdrv callback) always run in a
>> coroutine, creating one if necessary. For example, bdrv_flush() makes
>> sure that bs->bdrv_co_flush() is always run in a coroutine.
>> Such mixed functions are used in other callbacks too, making it really
>> difficult to understand whether we are in a coroutine or not, and most
>> importantly making rwlock usage very difficult.
> 
> How do they make rwlock usage difficult?
> 
> *goes back to old IRC discussions*
> 
> Ah, the problem was not the AIO_WAIT_WHILE() while taking the lock, but
> taking the lock first and then running an AIO_WAIT_WHILE() inside the
> locked section. This is forbidden because the callbacks that run during
> AIO_WAIT_WHILE() may in turn wait for the lock that you own, causing a
> deadlock.
> 

Yes

> This is indeed a problem that running in coroutines would avoid because
> the inner waiter would just yield and the outer one could complete its
> job as soon as it's its turn.
> 
> My conclusion in the IRC discussion was that maybe we need to take the
> graph locks when we're entering coroutine context, i.e. the "mixed"
> functions would rdlock the graph when called from non-coroutine context
> and would assume that it's already locked when called from coroutine
> context.

Yes, and that's what I tried to do.
But the first step was to transform all callbacks into coroutines. I think
you also agree with this, correct?

And therefore the easiest step was to convert all callbacks into
generated_co_wrapper functions, so that afterwards we could split them
between coroutine and non-coroutine logic, as discussed on IRC.
Once split, we add the lock in the way you suggested.
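
For concreteness, what I had in mind for the split is roughly this (purely
illustrative; the graph lock API names and bdrv_foo/bdrv_co_foo are
hypothetical):

    /* Hypothetical sketch of a split "mixed" function: the graph rdlock
     * is only taken on the non-coroutine path; coroutine callers are
     * assumed to hold it already. */
    typedef struct FooData {
        BlockDriverState *bs;
        int ret;
        bool done;
    } FooData;

    static int coroutine_fn bdrv_co_foo(BlockDriverState *bs);  /* real work */

    static void coroutine_fn bdrv_foo_entry(void *opaque)
    {
        FooData *data = opaque;

        data->ret = bdrv_co_foo(data->bs);
        data->done = true;
        aio_wait_kick();
    }

    int bdrv_foo(BlockDriverState *bs)
    {
        if (qemu_in_coroutine()) {
            /* Coroutine callers already hold the graph rdlock. */
            return bdrv_co_foo(bs);
        } else {
            FooData data = { .bs = bs, .done = false };
            Coroutine *co;

            bdrv_graph_rdlock();    /* hypothetical graph lock API */
            co = qemu_coroutine_create(bdrv_foo_entry, &data);
            bdrv_coroutine_enter(bs, co);
            BDRV_POLL_WHILE(bs, !data.done);
            bdrv_graph_rdunlock();  /* hypothetical graph lock API */
            return data.ret;
        }
    }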

However, I didn't even get through that first step, because tests were
deadlocking after transforming just 2-3 callbacks.

See Paolo's reply in this thread for a nice explanation of why they
deadlock and why converting these callbacks is difficult.

> 
>> Which led us to step back once more and try to convert all
>> BlockDriverState callbacks into coroutines. This would greatly simplify
>> rwlock usage, because we could make the rwlock coroutine-friendly
>> (without any AIO_WAIT_WHILE, allowing a writer to wait for readers by
>> just yielding and queuing itself in coroutine queues).
>>
>> The first step was then to convert all callbacks into coroutines, using
>> generated_coroutine_wrapper (g_c_w).
>> A typical g_c_w is implemented in this way:
>>     if (qemu_in_coroutine()) {
>>         callback();
>>     } else { // much simplified
>>         co = qemu_coroutine_create(callback);
>>         bdrv_coroutine_enter(bs, co);
>>         BDRV_POLL_WHILE(bs, coroutine_in_progress);
>>     }
>> Once all callbacks are implemented using g_c_w, we can start splitting
>> the two sides of the if statement so that we only create a coroutine
>> when we are outside a bdrv callback.
>>
>> However, we immediately found a problem while starting to convert the
>> first callbacks: the AioContext lock is taken around some non-coroutine
>> callbacks! For example, bs->bdrv_open() is 

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-25 Thread Paolo Bonzini

On 5/24/22 12:36, Kevin Wolf wrote:

* IO_OR_GS_CODE() functions, when called from coroutine context, expect to
be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)


Am I right to say this is not inherently part of the definition of
IO_OR_GS_CODE(), but just a property that these functions have in
practice? In practice, the functions that are IO_OR_GS_CODE()
are those that call AIO_WAIT_WHILE().


Yes.


Using a different code path means that the restrictions from
AIO_WAIT_WHILE() don't really apply any more and these functions become
effectively IO_CODE() when called in a coroutine. (At least I'm not
aware of any other piece of code apart from AIO_WAIT_WHILE() that makes
a function IO_OR_GS_CODE().)


The second point is correct.

The first depends on the definition of the coroutine path.  For example, 
bdrv_co_yield_to_drain() expects to run with bdrv_get_aio_context(bs)'s 
lock taken.  An iothread cannot take another iothread's AioContext lock 
to avoid AB-BA deadlocks; therefore, bdrv_co_yield_to_drain() can only 
run in the main thread or in bs's home iothread.
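
(For completeness, the AB-BA scenario that this rule avoids is the classic
one:

    iothread 1                         iothread 2
    aio_context_acquire(ctx_A)         aio_context_acquire(ctx_B)
    aio_context_acquire(ctx_B)         aio_context_acquire(ctx_A)
      -> blocks, B is held by 2          -> blocks, A is held by 1

i.e. a deadlock; only the main thread is allowed to acquire another
thread's AioContext.)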



* to call these functions with the lock taken, the code has to run in the
BDS's home iothread.  Attempts to do otherwise result in deadlocks (the
main loop's AIO_WAIT_WHILEs expect progress from the iothread, which cannot
happen without releasing the AioContext lock)


This problem can't happen in the main thread itself, AIO_WAIT_WHILE() is
safe both in the home thread and the main thread (at least as long as
you lock only once) because it temporarily drops the lock. It has also
become the definition of IO_OR_GS_CODE(): This code has to run in the
home thread or the main thread.


It doesn't work to run bdrv_open_driver() in the home iothread because 
bdrv_open_driver can change the AioContext of a BDS (causing extreme 
confusion on what runs where and what AioContext locks have to be taken 
and released).


So in order to have bdrv_open_driver() run in the main thread, Emanuele 
added a variant of generated_co_wrapper that waits on the main thread. 
But it didn't work either, because then AIO_WAIT_WHILE() does not 
release the BDS's AioContext lock.


When ->bdrv_open() calls bdrv_replace_child_noperm() and it tries to 
drain, bdrv_co_yield_to_drain() schedules a bottom half on the iothread 
and yields; the bottom half can never run, because the AioContext lock 
is already taken elsewhere.


main thread                                  ctx home thread
-----------                                  ---------------
aio_context_acquire(ctx)
bdrv_open()
  drv->bdrv_co_open()
    bdrv_replace_child_noperm()
      bdrv_co_yield_to_drain()
        aio_context_release(ctx)
        aio_schedule_bh_oneshot()  ------->  (bottom half never runs: the
                                              AioContext lock is already
                                              taken elsewhere)

So, bdrv_open_driver() and bdrv_close() are un-coroutine-izable.  I guess
we could split the callback into two parts, one doing I/O and one doing
graph manipulation (it may even be a good idea---the ->bdrv_inactivate()
callback already exists in the case of bdrv_close()---but I'm not
sure it's always applicable).



Come to think of it, I believe that many of the functions we declared
IO_CODE() are actually just IO_OR_GS_CODE() (at best; for iothreads,
they certainly require running in the home thread, but the main thread
allowed by IO_OR_GS_CODE() might not work). We have all the coroutine
machinery so that the AioContext lock of the current thread is
automatically released and reacquired across yield. However, this is the
wrong AioContext when called from a different thread, so we need an
additional lock - which should be dropped during yield, too, but it
doesn't happen.


There's no need for additional locks.  Drivers have to be protected by 
individual locks, which are either QemuMutex and dropped during yield, 
or CoMutex.  A QemuMutex used to protect a CoQueue is also dropped 
safely during yield.
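
Roughly, the two safe patterns look like this (the driver struct and field
names are made up for illustration):

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "qemu/thread.h"

    /* Illustrative driver state; init with qemu_co_mutex_init(),
     * qemu_mutex_init() and qemu_co_queue_init(). */
    typedef struct FooDriverState {
        CoMutex lock;      /* coroutine-aware, may be held across yields */
        QemuMutex qlock;   /* protects inflight/waiters; never held across
                            * a yield except via qemu_co_queue_wait() */
        CoQueue waiters;
        int inflight;
    } FooDriverState;

    /* Pattern 1: CoMutex. Contending coroutines simply queue up and yield,
     * so holding it across a yield never blocks the thread. */
    static void coroutine_fn foo_co_update(FooDriverState *s)
    {
        qemu_co_mutex_lock(&s->lock);
        /* ... may yield here, e.g. for nested I/O ... */
        qemu_co_mutex_unlock(&s->lock);
    }

    /* Pattern 2: QemuMutex + CoQueue. qemu_co_queue_wait() drops the
     * QemuMutex while the coroutine sleeps and reacquires it on wakeup. */
    static void coroutine_fn foo_co_wait_for_slot(FooDriverState *s)
    {
        qemu_mutex_lock(&s->qlock);
        while (s->inflight >= 8) {
            qemu_co_queue_wait(&s->waiters, &s->qlock);
        }
        s->inflight++;
        qemu_mutex_unlock(&s->qlock);
    }

    static void coroutine_fn foo_co_slot_done(FooDriverState *s)
    {
        qemu_mutex_lock(&s->qlock);
        s->inflight--;
        qemu_co_queue_next(&s->waiters);   /* wake one waiter, if any */
        qemu_mutex_unlock(&s->qlock);
    }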


The IO_CODE() distinction is indeed mostly theoretical until the file 
I/O BlockDriver protocols are protected by the AioContext lock and 
therefore are actually IO_OR_GS_CODE().  But that's the least of our 
worries; if file-posix AIO is done on qemu_get_current_aio_context() 
instead of bdrv_get_aio_context(bs), the AioContext lock is not needed 
anymore for I/O.


Apart from file I/O, all drivers were thread-safe at some point (now it 
seems blklogwrites.c is not, but the code also seems not safe against 
power loss and I wonder if it should be just deleted).



Switching to drain for locking doesn't solve the problem, but only
possibly defer it. In order to complete the multiqueue work, we need
to make IO_CODE() functions actually conform to the definition of
IO_CODE(). Do we have a plan what this should look like in the final
state when all the multiqueue work is completed? Or will we only have
coroutine locks which don't have this problem?


The latter apart from the occasional QemuMutex, which is used only when 
it does not cause the problem.



* running the code in the BDS's home iothread is not possible for

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-24 Thread Paolo Bonzini

On 5/24/22 12:20, Stefan Hajnoczi wrote:

Maybe it's safe to run it without a lock because it runs after
virtio_set_status(vdev, 0) but I'd rather play it safe and protect s->rq
with a lock.


What does the lock protect?

A lock can prevent s->rq or req->vq corruption but it cannot prevent
request leaks. This loop's job is to free all requests so there is no
leak. If a lock is necessary then this code is already broken in a more
fundamental way because it can leak.


Yes, you're right.  This particular list is always accessed in the 
iothread (if any) and blk_drain() is enough.  virtio-blk should already 
not need aio_context_{acquire,release}.


It's worth a comment, though!
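
Perhaps something along these lines (wording is only a suggestion):

    /* s->rq is only ever touched from the iothread (if any) while the
     * device is running; by the time we get here, virtio_set_status(vdev, 0)
     * and blk_drain() have stopped request processing, so the list can be
     * walked from the main thread without extra locking. */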

Paolo



Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-24 Thread Kevin Wolf
On 18.05.2022 at 14:28, Emanuele Giuseppe Esposito wrote:
> label: // read till the end to see why I wrote this here
> 
> I was hoping someone from the "No" party would answer your question,
> because, as you know, we reached this same conclusion together.
> 
> We thought about dropping the drain for various reasons, the main one
> (at least as far as I understood) is that we are not sure if something
> can still happen in between drain_begin/end, and it is a little bit
> confusing to use the same mechanism to block I/O and protect the graph.
> 
> We then thought about implementing a rwlock. A rdlock would clarify what
> we are protecting and who is using the lock. I had a rwlock draft
> implementation sent in this thread, but this also led to additional
> problems.
> The main problem was that this new lock would introduce nested event
> loops, which together with such locking would just create deadlocks.
> If readers are in coroutines and writers are not (because graph
> operations are not running in coroutines), we have a lot of deadlocks.
> If a writer has to take the lock, it must wait for all other readers to
> finish. And it does so by internally calling AIO_WAIT_WHILE, creating a
> nested event loop. We don't know what could execute when polling for
> events, and for example another writer could be resumed.

Why is this a problem? Your AIO_WAIT_WHILE() condition would be that
there are neither readers nor writers, so you would just keep waiting
until the other writer is done.

> Ideally, we want writers in coroutines too.
>
> Additionally, many readers are running in what we call "mixed"
> functions: usually implemented automatically with the generated_co_wrapper
> tag, they let a function (usually a bdrv callback) always run in a
> coroutine, creating one if necessary. For example, bdrv_flush() makes
> sure that bs->bdrv_co_flush() is always run in a coroutine.
> Such mixed functions are used in other callbacks too, making it really
> difficult to understand whether we are in a coroutine or not, and most
> importantly making rwlock usage very difficult.

How do they make rwlock usage difficult?

*goes back to old IRC discussions*

Ah, the problem was not the AIO_WAIT_WHILE() while taking the lock, but
taking the lock first and then running an AIO_WAIT_WHILE() inside the
locked section. This is forbidden because the callbacks that run during
AIO_WAIT_WHILE() may in turn wait for the lock that you own, causing a
deadlock.

This is indeed a problem that running in coroutines would avoid because
the inner waiter would just yield and the outer one could complete its
job as soon as it's its turn.

My conclusion in the IRC discussion was that maybe we need to take the
graph locks when we're entering coroutine context, i.e. the "mixed"
functions would rdlock the graph when called from non-coroutine context
and would assume that it's already locked when called from coroutine
context.

> Which led us to step back once more and try to convert all
> BlockDriverState callbacks into coroutines. This would greatly simplify
> rwlock usage, because we could make the rwlock coroutine-friendly
> (without any AIO_WAIT_WHILE, allowing a writer to wait for readers by
> just yielding and queuing itself in coroutine queues).
> 
> The first step was then to convert all callbacks into coroutines, using
> generated_coroutine_wrapper (g_c_w).
> A typical g_c_w is implemented in this way:
>     if (qemu_in_coroutine()) {
>         callback();
>     } else { // much simplified
>         co = qemu_coroutine_create(callback);
>         bdrv_coroutine_enter(bs, co);
>         BDRV_POLL_WHILE(bs, coroutine_in_progress);
>     }
> Once all callbacks are implemented using g_c_w, we can start splitting
> the two sides of the if statement so that we only create a coroutine
> when we are outside a bdrv callback.
> 
> However, we immediately found a problem while starting to convert the
> first callbacks: the AioContext lock is taken around some non-coroutine
> callbacks! For example, bs->bdrv_open() is always called with the
> AioContext lock taken. In addition, callbacks like bdrv_open are
> graph-modifying functions, which is probably why we are taking the
> AioContext lock, and they do not like to run in coroutines.
> Anyway, the real problem comes when we create a coroutine in such
> places where the AioContext lock is taken and we have a graph-modifying
> function.
> 
> bdrv_coroutine_enter() calls aio_co_enter(), which in turn first checks
> whether the coroutine is entering a context different from the current
> one (which is not the case for open) and whether we are already in a
> coroutine (for sure not). Therefore it resorts to the following calls:
>     aio_context_acquire(ctx);
>     qemu_aio_coroutine_enter(ctx, co);
>     aio_context_release(ctx);
> This is clearly a problem, because we are taking the lock twice: once
> from the original caller of the callback, and once here due to the
> coroutine. This creates a lot of 

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-24 Thread Kevin Wolf
On 18.05.2022 at 14:43, Paolo Bonzini wrote:
> On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:
> > For example, all callers of bdrv_open() always take the AioContext lock.
> > Often it is taken very high in the call stack, but it's always taken.
> 
> I think it's actually not a problem of who takes the AioContext lock or
> where; the requirements are contradictory:

Okay, now that I have explained which challenges I see with the drain
solution, I'd also like to understand the problem that even motivates
you to go back to drains.

> * IO_OR_GS_CODE() functions, when called from coroutine context, expect to
> be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)

Am I right to say this is not inherently part of the definition of
IO_OR_GS_CODE(), but just a property that these functions have in
practice?

In practice, the functions that are IO_OR_GS_CODE() are those that call
AIO_WAIT_WHILE() - which drops the lock, so it must have been taken
first. Of course, when calling from coroutine context, AIO_WAIT_WHILE()
is wrong, so these functions all have a different code path for
coroutines (or they aren't suitable to be called in coroutines at all).
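
Schematically, that dual-path shape is something like this (illustrative
only; bdrv_do_something(), bdrv_co_do_something() and something_done() are
made-up names, only the macros and bdrv_get_aio_context() are real):

    void bdrv_do_something(BlockDriverState *bs)
    {
        IO_OR_GS_CODE();

        if (qemu_in_coroutine()) {
            /* Coroutine path: never calls AIO_WAIT_WHILE(); it just
             * yields until the condition is satisfied. */
            bdrv_co_do_something(bs);
            return;
        }

        /* Non-coroutine path: the caller must hold bs's AioContext lock
         * exactly once, because AIO_WAIT_WHILE() drops and reacquires it
         * while polling. */
        AIO_WAIT_WHILE(bdrv_get_aio_context(bs), !something_done(bs));
    }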

Using a different code path means that the restrictions from
AIO_WAIT_WHILE() don't really apply any more and these functions become
effectively IO_CODE() when called in a coroutine. (At least I'm not
aware of any other piece of code apart from AIO_WAIT_WHILE() that makes
a function IO_OR_GS_CODE().)

Surrounding IO_CODE() with aio_context_acquire/release() is in the
definition of IO_CODE(), so your assumption seems right. (Not sure if
it's actually necessary in all cases and whether all callers do this
correctly, but with this definition we have declared that exceptions to
this are in fact bugs.)

(You mention bdrv_co_yield_to_drain() as an example. I don't think it's
a good example. The function isn't annotated, but it seems to me that
the correct annotation would be IO_CODE() anyway, i.e. safe to call from
any thread, not just IO_OR_GS_CODE().)

> * to call these functions with the lock taken, the code has to run in the
> BDS's home iothread.  Attempts to do otherwise result in deadlocks (the
> main loop's AIO_WAIT_WHILEs expect progress from the iothread, which cannot
> happen without releasing the AioContext lock)

This problem can't happen in the main thread itself, AIO_WAIT_WHILE() is
safe both in the home thread and the main thread (at least as long as
you lock only once) because it temporarily drops the lock. It has also
become the definition of IO_OR_GS_CODE(): This code has to run in the
home thread or the main thread.


Of course, above I just concluded that when called from coroutines, in
practice IO_OR_GS_CODE() essentially turns into IO_CODE(). This is
supposed to allow much more:

 * I/O API functions. These functions are thread-safe, and therefore
 * can run in any thread as long as the thread has called
 * aio_context_acquire/release().

Come to think of it, I believe that many of the functions we declared
IO_CODE() are actually just IO_OR_GS_CODE() (at best; for iothreads,
they certainly require running in the home thread, but the main thread
allowed by IO_OR_GS_CODE() might not work). We have all the coroutine
machinery so that the AioContext lock of the current thread is
automatically released and reacquired across yield. However, this is the
wrong AioContext when called from a different thread, so we need an
additional lock - which should be dropped during yield, too, but it
doesn't happen.

Maybe this is really the scenario you mean with this point?

Switching to drain for locking doesn't solve the problem, but only
possibly defer it. In order to complete the multiqueue work, we need
to make IO_CODE() functions actually conform to the definition of
IO_CODE().

Do we have a plan what this should look like in the final state when all
the multiqueue work is completed? Will it require replacing the more or
less automatic AioContext lock handling that we currently have in
coroutines with explicit unlock/lock around all yield points? I assume
that some kind of lock will still have to be held and it wouldn't just
disappear with the removal of the AioContext lock? Or will we only have
coroutine locks which don't have this problem?

> * running the code in the BDS's home iothread is not possible for
> GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
> thread, but that cannot be guaranteed in general)

There is nothing that stops GLOBAL_STATE_CODE() from scheduling work in
the home iothread of a BDS and then waiting for it - or if it is a
coroutine, even to reschedule itself into the BDS home thread
temporarily.

This is essentially what all of the generated_co_wrapper functions do,
and the coroutine case is what bdrv_co_enter() does.
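
Schematically (illustrative only; the WorkData/do_work_bh/run_in_home_thread
names are made up, and the 2022-era AioContext locking rules are assumed):

    typedef struct {
        BlockDriverState *bs;
        bool done;
    } WorkData;

    static void do_work_bh(void *opaque)
    {
        WorkData *w = opaque;

        /* ... touch w->bs from its home AioContext ... */
        w->done = true;
        aio_wait_kick();
    }

    static void run_in_home_thread(BlockDriverState *bs)
    {
        WorkData w = { .bs = bs, .done = false };
        AioContext *ctx = bdrv_get_aio_context(bs);

        GLOBAL_STATE_CODE();
        aio_context_acquire(ctx);   /* AIO_WAIT_WHILE() wants it held once */
        aio_bh_schedule_oneshot(ctx, do_work_bh, &w);
        AIO_WAIT_WHILE(ctx, !w.done);
        aio_context_release(ctx);
    }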

So I'm not sure what you mean by this?

Kevin




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-24 Thread Stefan Hajnoczi
On Tue, May 24, 2022 at 11:17:06AM +0200, Paolo Bonzini wrote:
> On 5/24/22 10:08, Stefan Hajnoczi wrote:
> > On Tue, May 24, 2022 at 09:55:39AM +0200, Paolo Bonzini wrote:
> > > On 5/22/22 17:06, Stefan Hajnoczi wrote:
> > > > However, I hit on a problem that I think Emanuele and Paolo have already
> > > > pointed out: draining is GS & IO. This might have worked under the 1 
> > > > IOThread
> > > > model but it does not make sense for multi-queue. It is possible to 
> > > > submit I/O
> > > > requests in drained sections. How can multiple threads be in drained 
> > > > sections
> > > > simultaneously and possibly submit further I/O requests in their drained
> > > > sections? Those sections wouldn't be "drained" in any useful sense of 
> > > > the word.
> > > 
> > > Yeah, that works only if the drained sections are well-behaved.
> > > 
> > > "External" sources of I/O are fine; they are disabled using is_external, 
> > > and
> > > don't drain themselves I think.
> > 
> > I/O requests for a given BDS may be executing in multiple AioContexts,
> > so how do you call aio_disable_external() on all relevant AioContexts?
> 
> With multiqueue yeah, we have to replace aio_disable_external() with
> drained_begin/end() callbacks; but I'm not talking about that yet.
> 
> > > In parallel to the block layer discussions, it's possible to work on
> > > introducing a request queue lock in virtio-blk and virtio-scsi.  That's 
> > > the
> > > only thing that relies on the AioContext lock outside the block layer.
> > 
> > I'm not sure what the request queue lock protects in virtio-blk? In
> > virtio-scsi I guess a lock is needed to protect SCSI target emulation
> > state?
> 
> Yes, but even in virtio-blk there is this code that runs in the main thread
> and is currently protected by aio_context_acquire/release:
> 
> blk_drain(s->blk);
> 
> /* We drop queued requests after blk_drain() because blk_drain()
>  * itself can produce them. */
> while (s->rq) {
>     req = s->rq;
>     s->rq = req->next;
>     virtqueue_detach_element(req->vq, &req->elem, 0);
>     virtio_blk_free_request(req);
> }
> 
> Maybe it's safe to run it without a lock because it runs after
> virtio_set_status(vdev, 0) but I'd rather play it safe and protect s->rq
> with a lock.

What does the lock protect?

A lock can prevent s->rq or req->vq corruption but it cannot prevent
request leaks. This loop's job is to free all requests so there is no
leak. If a lock is necessary then this code is already broken in a more
fundamental way because it can leak.

Stefan




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-24 Thread Paolo Bonzini

On 5/24/22 10:08, Stefan Hajnoczi wrote:

On Tue, May 24, 2022 at 09:55:39AM +0200, Paolo Bonzini wrote:

On 5/22/22 17:06, Stefan Hajnoczi wrote:

However, I hit on a problem that I think Emanuele and Paolo have already
pointed out: draining is GS & IO. This might have worked under the 1 IOThread
model but it does not make sense for multi-queue. It is possible to submit I/O
requests in drained sections. How can multiple threads be in drained sections
simultaneously and possibly submit further I/O requests in their drained
sections? Those sections wouldn't be "drained" in any useful sense of the word.


Yeah, that works only if the drained sections are well-behaved.

"External" sources of I/O are fine; they are disabled using is_external, and
don't drain themselves I think.


I/O requests for a given BDS may be executing in multiple AioContexts,
so how do you call aio_disable_external() on all relevant AioContexts?


With multiqueue yeah, we have to replace aio_disable_external() with 
drained_begin/end() callbacks; but I'm not talking about that yet.



In parallel to the block layer discussions, it's possible to work on
introducing a request queue lock in virtio-blk and virtio-scsi.  That's the
only thing that relies on the AioContext lock outside the block layer.


I'm not sure what the request queue lock protects in virtio-blk? In
virtio-scsi I guess a lock is needed to protect SCSI target emulation
state?


Yes, but even in virtio-blk there is this code that runs in the main 
thread and is currently protected by aio_context_acquire/release:


blk_drain(s->blk);

/* We drop queued requests after blk_drain() because blk_drain()
 * itself can produce them. */
while (s->rq) {
    req = s->rq;
    s->rq = req->next;
    virtqueue_detach_element(req->vq, &req->elem, 0);
    virtio_blk_free_request(req);
}

Maybe it's safe to run it without a lock because it runs after 
virtio_set_status(vdev, 0) but I'd rather play it safe and protect s->rq 
with a lock.


Paolo



Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-24 Thread Stefan Hajnoczi
On Tue, May 24, 2022 at 09:55:39AM +0200, Paolo Bonzini wrote:
> On 5/22/22 17:06, Stefan Hajnoczi wrote:
> > However, I hit on a problem that I think Emanuele and Paolo have already
> > pointed out: draining is GS & IO. This might have worked under the 1 
> > IOThread
> > model but it does not make sense for multi-queue. It is possible to submit 
> > I/O
> > requests in drained sections. How can multiple threads be in drained 
> > sections
> > simultaneously and possibly submit further I/O requests in their drained
> > sections? Those sections wouldn't be "drained" in any useful sense of the 
> > word.
> 
> Yeah, that works only if the drained sections are well-behaved.
> 
> "External" sources of I/O are fine; they are disabled using is_external, and
> don't drain themselves I think.

I/O requests for a given BDS may be executing in multiple AioContexts,
so how do you call aio_disable_external() on all relevant AioContexts?

We covered this below but I wanted to reply here in case someone else
reads this part without reading the rest.

> Mirror is the only I/O user of drain, and it's fine because it never submits
> I/O to the drained BDS.
>
> Drained sections in the main thread can be special cased to allow I/O
> (wrlock in this series would also allow I/O).
> 
> So I think that the "cooperation from all relevant places" that Kevin
> mentioned is already there, except for coroutine commands in the monitor.
> Those are a bad idea in my opinion and I'd rather revert commit eb94b81a94
> ("block: Convert 'block_resize' to coroutine") until we have a clearer idea
> of how to handle them.
> 
> I agree that it's basically impossible to review the change.  On the other
> hand, there's already a substantial amount of faith involved in the
> correctness of the current code.
> 
> In particular the AioContext lock does absolutely nothing to protect
> coroutines in the main thread against graph changes---both from the monitor
> (including BHs as in "block: Fix BB.root changing across bdrv_next()") and
> from BDS coroutines.  The former are unprotected; the latter are protected
> by drain only: using drain to protect against graph writes would be a matter
> of extending *existing* faith to the multi-iothread case.
> 
> Once the deadlock is broken, we can proceed to remove the AioContext lock
> and then introduce actual coroutine-based locking.
> 
> > Possible steps for AioContext removal
> > -
> > I also wanted to share my assumptions about multi-queue and AioContext 
> > removal.
> > Please let me know if anything seems wrong or questionable:
> > 
> > - IO code can execute in any thread that has an AioContext.
> > - Multiple threads may execute IO code at the same time.
> > - GS code only executes under the BQL.
> > 
> > For AioContext removal this means:
> > 
> > - bdrv_get_aio_context() becomes mostly meaningless since there is no need 
> > for
> >a special "home" AioContext.
> 
> Correct.  bdrv_set_aio_context() remains useful as a way to set a home
> AioContext for sockets.
> 
> > - bdrv_coroutine_enter() becomes mostly meaningless because there is no 
> > need to
> >run a coroutine in the BDS's AioContext.
> > - aio_disable_external(bdrv_get_aio_context(bs)) no longer works because 
> > many
> >threads/AioContexts may submit new I/O requests. 
> > BlockDevOps.drained_begin()
> >may be used instead (e.g. to temporarily disable ioeventfds on a 
> > multi-queue
> >virtio-blk device).
> 
> This is a change that can be done independent of this work.
> 
> > - AIO_WAIT_WHILE() simplifies to
> > 
> >     while ((cond)) {
> >         aio_poll(qemu_get_current_aio_context(), true);
> >         ...
> >     }
> > 
> >and the distinction between home AioContext and non-home context is
> >eliminated. AioContext unlocking is dropped.
> 
> (I'll reply on this from elsewhere in the thread).
> 
> > Does this make sense? I haven't seen these things in recent patch series.
> 
> I agree, and yeah all these are blocked on protecting graph modifications.
> 
> In parallel to the block layer discussions, it's possible to work on
> introducing a request queue lock in virtio-blk and virtio-scsi.  That's the
> only thing that relies on the AioContext lock outside the block layer.

I'm not sure what the request queue lock protects in virtio-blk? In
virtio-scsi I guess a lock is needed to protect SCSI target emulation
state?

Stefan




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-24 Thread Paolo Bonzini

On 5/22/22 17:06, Stefan Hajnoczi wrote:

However, I hit on a problem that I think Emanuele and Paolo have already
pointed out: draining is GS & IO. This might have worked under the 1 IOThread
model but it does not make sense for multi-queue. It is possible to submit I/O
requests in drained sections. How can multiple threads be in drained sections
simultaneously and possibly submit further I/O requests in their drained
sections? Those sections wouldn't be "drained" in any useful sense of the word.


Yeah, that works only if the drained sections are well-behaved.

"External" sources of I/O are fine; they are disabled using is_external, 
and don't drain themselves I think.


Mirror is the only I/O user of drain, and it's fine because it never 
submits I/O to the drained BDS.


Drained sections in the main thread can be special cased to allow I/O 
(wrlock in this series would also allow I/O).


So I think that the "cooperation from all relevant places" that Kevin 
mentioned is already there, except for coroutine commands in the 
monitor.  Those are a bad idea in my opinion and I'd rather revert 
commit eb94b81a94 ("block: Convert 'block_resize' to coroutine") until 
we have a clearer idea of how to handle them.


I agree that it's basically impossible to review the change.  On the 
other hand, there's already a substantial amount of faith involved in 
the correctness of the current code.


In particular the AioContext lock does absolutely nothing to protect 
coroutines in the main thread against graph changes---both from the 
monitor (including BHs as in "block: Fix BB.root changing across 
bdrv_next()") and from BDS coroutines.  The former are unprotected; the 
latter are protected by drain only: using drain to protect against graph 
writes would be a matter of extending *existing* faith to the 
multi-iothread case.


Once the deadlock is broken, we can proceed to remove the AioContext 
lock and then introduce actual coroutine-based locking.



Possible steps for AioContext removal
-------------------------------------
I also wanted to share my assumptions about multi-queue and AioContext removal.
Please let me know if anything seems wrong or questionable:

- IO code can execute in any thread that has an AioContext.
- Multiple threads may execute IO code at the same time.
- GS code only executes under the BQL.

For AioContext removal this means:

- bdrv_get_aio_context() becomes mostly meaningless since there is no need for
   a special "home" AioContext.


Correct.  bdrv_set_aio_context() remains useful as a way to set a home 
AioContext for sockets.



- bdrv_coroutine_enter() becomes mostly meaningless because there is no need to
   run a coroutine in the BDS's AioContext.
- aio_disable_external(bdrv_get_aio_context(bs)) no longer works because many
   threads/AioContexts may submit new I/O requests. BlockDevOps.drained_begin()
   may be used instead (e.g. to temporarily disable ioeventfds on a multi-queue
   virtio-blk device).


This is a change that can be done independent of this work.
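
For concreteness, such callbacks could look roughly like this (the data
plane attach/detach helpers are hypothetical; only BlockDevOps and its
drained_begin/drained_end hooks are real):

    /* Illustrative only: a multi-queue device quiescing itself via
     * BlockDevOps instead of aio_disable_external(). */
    static void virtio_blk_drained_begin(void *opaque)
    {
        VirtIOBlock *s = opaque;

        /* Stop watching the guest notifiers so no new requests arrive. */
        virtio_blk_data_plane_detach(s);   /* hypothetical helper */
    }

    static void virtio_blk_drained_end(void *opaque)
    {
        VirtIOBlock *s = opaque;

        virtio_blk_data_plane_attach(s);   /* hypothetical helper */
    }

    static const BlockDevOps virtio_blk_dev_ops = {
        .drained_begin = virtio_blk_drained_begin,
        .drained_end   = virtio_blk_drained_end,
    };

    /* registered with blk_set_dev_ops(s->blk, &virtio_blk_dev_ops, s) */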


- AIO_WAIT_WHILE() simplifies to

     while ((cond)) {
         aio_poll(qemu_get_current_aio_context(), true);
         ...
     }

   and the distinction between home AioContext and non-home context is
   eliminated. AioContext unlocking is dropped.


(I'll reply on this from elsewhere in the thread).


Does this make sense? I haven't seen these things in recent patch series.


I agree, and yeah all these are blocked on protecting graph modifications.

In parallel to the block layer discussions, it's possible to work on 
introducing a request queue lock in virtio-blk and virtio-scsi.  That's 
the only thing that relies on the AioContext lock outside the block layer.


Paolo




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-23 Thread Stefan Hajnoczi
On Mon, May 23, 2022 at 06:04:55PM +0200, Kevin Wolf wrote:
> On 23.05.2022 at 17:13, Stefan Hajnoczi wrote:
> > On Mon, May 23, 2022 at 03:02:05PM +0200, Kevin Wolf wrote:
> > > On 22.05.2022 at 17:06, Stefan Hajnoczi wrote:
> > > > Hi,
> > > > Sorry for the long email. I've included three topics that may help us 
> > > > discuss
> > > > draining and AioContext removal further.
> > > > 
> > > > The shortcomings of drain
> > > > -------------------------
> > > > I wanted to explore the logical argument that making graph 
> > > > modifications within
> > > > a drained section is correct:
> > > > - Graph modifications and BB/BDS lookup are Global State (GS).
> > > > - Graph traversal from a BB/BDS pointer is IO.
> > > > - Only one thread executes GS code at a time.
> > > > - IO is quiesced within a drained section.
> > > 
> > > I think you're mixing two things here and calling both of them IO.
> > > 
> > > If a function is marked as IO, that means that it is safe to call from
> > > I/O requests, which may be running in any iothread (they currently only
> > > run in the home thread of the node's AioContext, but the function
> > > annotations already promise that any thread is safe).
> > > 
> > > However, if a function is marked as IO, this doesn't necessarily mean
> > > that it is always only called in the context of an I/O request. It can
> > > be called by any other code, too.
> > > 
> > > So while drain does quiesce all I/O requests, this doesn't mean that
> > > functions marked as IO won't run any more.
> > 
> > My model is that between bdrv_drained_begin() and bdrv_drained_end()
> > only the caller is allowed to invoke BB/BDS functions (including
> > functions marked IO).
> 
> Okay, this sounds better. It's not actually related to IO/GS, but to
> BB/BDS functions, including both IO and GS functions.
> 
> So graph traversal from a BB/BDS pointer makes it a BB/BDS function, and
> BB/BDS functions need to be quiesced within a drained section.
> 
> > The caller isn't strictly one thread and one or no coroutines. The
> > caller could use threads and coroutines, but the important thing is
> > that everything else that accesses the BB/BDS is suspended due to
> > .drained_begin() callbacks (and is_external).
> > 
> > So when you say a function marked IO can be called from outside an I/O
> > request, that "outside an I/O request" code must be quiesced too.
> > Otherwise drain is definitely not safe for graph modifications.
> 
> If you phrase it as a condition and as a goal to achieve, then I agree
> that this is required when you want to use drain for locking.
> 
> My impression was that you were using this to argue that the code is
> already doing this and is already safe in this scenario, and this isn't
> true. I think I misunderstood you there and you were really describing
> the future state that you're envisioning.

Sorry, I didn't present it clearly. I'm trying to think through the
model needed to make graph modifications under drain safe.

> > > > - Therefore a drained section in GS code suspends graph traversal, 
> > > > other graph
> > > >   modifications, and BB/BDS lookup.
> > > > - Therefore it is safe to modify the graph from a GS drained section.
> > > 
> > > So unfortunately, I don't think this conclusion is correct.
> > > 
> > > In order to make your assumption true, we would have to require that all
> > > callers of IO functions must have called bdrv_inc_in_flight(). We would
> > also have to find a way to enforce this preferably at compile time or
> > > with static analysis, or at least with runtime assertions, because it
> > > would be very easy to get wrong.
> > 
> > Or that they are quiesced by .drained_begin() callbacks or is_external.
> > 
> > Do you have a concrete example?
> 
> Yes, you're right, bdrv_inc_in_flight() is only one way to achieve this.
> They just need to make bdrv_drain_poll() return true as long as they are
> active so that bdrv_drained_begin() waits for them. .drained_poll() is
> another valid way to achieve this.
> 
> However, if we want to rely on static analysis to verify that everything
> is covered, always requiring bdrv_inc_in_flight() might make this
> easier. So possibly we want to require it even in cases where
> .drained_poll() or aio_disable_external() would be enough in theory.
> 
> > > 
> > > > However, I hit on a problem that I think Emanuele and Paolo have already
> > > > pointed out: draining is GS & IO. This might have worked under the 1 
> > > > IOThread
> > > > model but it does not make sense for multi-queue. It is possible to 
> > > > submit I/O
> > > > requests in drained sections. How can multiple threads be in drained 
> > > > sections
> > > > simultaneously and possibly submit further I/O requests in their drained
> > > > sections? Those sections wouldn't be "drained" in any useful sense of 
> > > > the word.
> > > 
> > > Drain asks other users not to touch the block node. Currently this only
> > 
> > BTW interesting language 

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-23 Thread Kevin Wolf
On 23.05.2022 at 17:13, Stefan Hajnoczi wrote:
> On Mon, May 23, 2022 at 03:02:05PM +0200, Kevin Wolf wrote:
> > On 22.05.2022 at 17:06, Stefan Hajnoczi wrote:
> > > Hi,
> > > Sorry for the long email. I've included three topics that may help us 
> > > discuss
> > > draining and AioContext removal further.
> > > 
> > > The shortcomings of drain
> > > -------------------------
> > > I wanted to explore the logical argument that making graph modifications 
> > > within
> > > a drained section is correct:
> > > - Graph modifications and BB/BDS lookup are Global State (GS).
> > > - Graph traversal from a BB/BDS pointer is IO.
> > > - Only one thread executes GS code at a time.
> > > - IO is quiesced within a drained section.
> > 
> > I think you're mixing two things here and calling both of them IO.
> > 
> > If a function is marked as IO, that means that it is safe to call from
> > I/O requests, which may be running in any iothread (they currently only
> > run in the home thread of the node's AioContext, but the function
> > annotations already promise that any thread is safe).
> > 
> > However, if a function is marked as IO, this doesn't necessarily mean
> > that it is always only called in the context of an I/O request. It can
> > be called by any other code, too.
> > 
> > So while drain does quiesce all I/O requests, this doesn't mean that
> > functions marked as IO won't run any more.
> 
> My model is that between bdrv_drained_begin() and bdrv_drained_end()
> only the caller is allowed to invoke BB/BDS functions (including
> functions marked IO).

Okay, this sounds better. It's not actually related to IO/GS, but to
BB/BDS functions, including both IO and GS functions.

So graph traversal from a BB/BDS pointer makes it a BB/BDS function, and
BB/BDS functions need to be quiesced within a drained section.

> The caller isn't strictly one thread and one or no coroutines. The
> caller could use threads and coroutines, but the important thing is
> that everything else that accesses the BB/BDS is suspended due to
> .drained_begin() callbacks (and is_external).
> 
> So when you say a function marked IO can be called from outside an I/O
> request, that "outside an I/O request" code must be quiesced too.
> Otherwise drain is definitely not safe for graph modifications.

If you phrase it as a condition and as a goal to achieve, then I agree
that this is required when you want to use drain for locking.

My impression was that you were using this to argue that the code is
already doing this and is already safe in this scenario, and this isn't
true. I think I misunderstood you there and you were really describing
the future state that you're envisioning.

> > > - Therefore a drained section in GS code suspends graph traversal, other 
> > > graph
> > >   modifications, and BB/BDS lookup.
> > > - Therefore it is safe to modify the graph from a GS drained section.
> > 
> > So unfortunately, I don't think this conclusion is correct.
> > 
> > In order to make your assumption true, we would have to require that all
> > callers of IO functions must have called bdrv_inc_in_flight(). We would
> also have to find a way to enforce this preferably at compile time or
> > with static analysis, or at least with runtime assertions, because it
> > would be very easy to get wrong.
> 
> Or that they are quiesced by .drained_begin() callbacks or is_external.
> 
> Do you have a concrete example?

Yes, you're right, bdrv_inc_in_flight() is only one way to achieve this.
They just need to make bdrv_drain_poll() return true as long as they are
active so that bdrv_drained_begin() waits for them. .drained_poll() is
another valid way to achieve this.

However, if we want to rely on static analysis to verify that everything
is covered, always requiring bdrv_inc_in_flight() might make this
easier. So possibly we want to require it even in cases where
.drained_poll() or aio_disable_external() would be enough in theory.
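
For illustration, the bdrv_inc_in_flight() pattern is simply (hypothetical
caller):

    /* Bumping the in-flight counter makes bdrv_drain_poll() return true,
     * so a concurrent bdrv_drained_begin() waits until we are done. */
    static void touch_node(BlockDriverState *bs)
    {
        bdrv_inc_in_flight(bs);
        /* ... access bs ... */
        bdrv_dec_in_flight(bs);   /* kicks anyone waiting in drain */
    }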

> > 
> > > However, I hit on a problem that I think Emanuele and Paolo have already
> > > pointed out: draining is GS & IO. This might have worked under the 1 
> > > IOThread
> > > model but it does not make sense for multi-queue. It is possible to 
> > > submit I/O
> > > requests in drained sections. How can multiple threads be in drained 
> > > sections
> > > simultaneously and possibly submit further I/O requests in their drained
> > > sections? Those sections wouldn't be "drained" in any useful sense of the 
> > > word.
> > 
Drain asks other users not to touch the block node. Currently this only
> 
> BTW interesting language choice here: you're using the more general
> definition of "other users" and "touch[ing] the block node" rather than
> the narrow definition of just "I/O requests". That's exactly how I think
> of drain but based on what you wrote above I'm not sure that's how you
> think of it too?

I hope that my reply above made it clearer: The broader definition is
what is needed if we want to use 

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-23 Thread Stefan Hajnoczi
On Mon, May 23, 2022 at 03:02:05PM +0200, Kevin Wolf wrote:
> On 22.05.2022 at 17:06, Stefan Hajnoczi wrote:
> > Hi,
> > Sorry for the long email. I've included three topics that may help us 
> > discuss
> > draining and AioContext removal further.
> > 
> > The shortcomings of drain
> > -------------------------
> > I wanted to explore the logical argument that making graph modifications 
> > within
> > a drained section is correct:
> > - Graph modifications and BB/BDS lookup are Global State (GS).
> > - Graph traversal from a BB/BDS pointer is IO.
> > - Only one thread executes GS code at a time.
> > - IO is quiesced within a drained section.
> 
> I think you're mixing two things here and calling both of them IO.
> 
> If a function is marked as IO, that means that it is safe to call from
> I/O requests, which may be running in any iothread (they currently only
> run in the home thread of the node's AioContext, but the function
> annotations already promise that any thread is safe).
> 
> However, if a function is marked as IO, this doesn't necessarily mean
> that it is always only called in the context of an I/O request. It can
> be called by any other code, too.
> 
> So while drain does quiesce all I/O requests, this doesn't mean that
> functions marked as IO won't run any more.

My model is that between bdrv_drained_begin() and bdrv_drained_end()
only the caller is allowed to invoke BB/BDS functions (including
functions marked IO).

The caller isn't strictly one thread and one or no coroutines. The
caller could use threads and coroutines, but the important thing is that
everything else that accesses the BB/BDS is suspended due to
.drained_begin() callbacks (and is_external).

So when you say a function marked IO can be called from outside an I/O
request, that "outside an I/O request" code must be quiesced too.
Otherwise drain is definitely not safe for graph modifications.

> > - Therefore a drained section in GS code suspends graph traversal, other 
> > graph
> >   modifications, and BB/BDS lookup.
> > - Therefore it is safe to modify the graph from a GS drained section.
> 
> So unfortunately, I don't think this conclusion is correct.
> 
> In order to make your assumption true, we would have to require that all
> callers of IO functions must have called bdrv_inc_in_flight(). We would
> also have to find a way to enforce this preferably at compile time or
> with static analysis, or at least with runtime assertions, because it
> would be very easy to get wrong.

Or that they are quiesced by .drained_begin() callbacks or is_external.

Do you have a concrete example?

> 
> > However, I hit on a problem that I think Emanuele and Paolo have already
> > pointed out: draining is GS & IO. This might have worked under the 1 
> > IOThread
> > model but it does not make sense for multi-queue. It is possible to submit 
> > I/O
> > requests in drained sections. How can multiple threads be in drained 
> > sections
> > simultaneously and possibly submit further I/O requests in their drained
> > sections? Those sections wouldn't be "drained" in any useful sense of the 
> > word.
> 
> Drain asks other users not to touch the block node. Currently this only

BTW interesting language choice here: you're using the more general
definition of "other users" and "touch[ing] the block node" rather than
the narrow definition of just "I/O requests". That's exactly how I think
of drain but based on what you wrote above I'm not sure that's how you
think of it too?

> includes, but the desire to use drain for locking means that we need to
> extend it to pretty much any operation on the node, which would include
> calling drain on that block node. So you should never have two callers
> in a drain section at the same time, it would be a bug in this model.

Yes! Drain in its current form doesn't make sense for multi-queue since
multiple threads could be in drained sections at the same time and they
would all be allowed to submit new I/O requests.

> Of course, we know that currently drain is not respected by all relevant
> callers (most importantly, the monitor). If you want to use drain as a
> form of locking, you need to solve this first.
> 
> > It is necessary to adjust how recursive drained sections work:
> > bdrv_drained_begin() must not return while there are deeper nested drained
> > sections.
> > 
> > This is allowed:
> > 
> >  Monitor command            Block job
> >  ---------------            ---------
> >  > bdrv_drained_begin()
> >  .                          > bdrv_drained_begin()
> >  .                          < bdrv_drained_begin()
> >  .                          .
> >  .                          .
> >  .                          > bdrv_drained_end()
> >  .                          < bdrv_drained_end()
> >  < bdrv_drained_begin()
> >  .
> >  .
> >  > bdrv_drained_end()
> >  < bdrv_drained_end()
> 
> Just to make sure I understand the scenario that you 

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-23 Thread Emanuele Giuseppe Esposito



On 23/05/2022 at 15:15, Stefan Hajnoczi wrote:
> On Mon, May 23, 2022 at 10:48:48AM +0200, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 22/05/2022 at 17:06, Stefan Hajnoczi wrote:
>>> On Wed, May 18, 2022 at 06:14:17PM +0200, Kevin Wolf wrote:
 Am 18.05.2022 um 14:43 hat Paolo Bonzini geschrieben:
> On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:
>> For example, all callers of bdrv_open() always take the AioContext lock.
>> Often it is taken very high in the call stack, but it's always taken.
>
> I think it's actually not a problem of who takes the AioContext lock or
> where; the requirements are contradictory:
>
> * IO_OR_GS_CODE() functions, when called from coroutine context, expect to
> be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)
>
> * to call these functions with the lock taken, the code has to run in the
> BDS's home iothread.  Attempts to do otherwise result in deadlocks (the
> main loop's AIO_WAIT_WHILEs expect progress from the iothread, which cannot
> happen without releasing the AioContext lock)
>
> * running the code in the BDS's home iothread is not possible for
> GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
> thread, but that cannot be guaranteed in general)
>
>> We might suppose that many callbacks are called under drain and in
>> GLOBAL_STATE, which should be enough, but from our experimentation in
>> the previous series we saw that currently not everything is under drain,
>> leaving some operations unprotected (remember assert_graph_writable
>> temporarily disabled, since drain coverage for bdrv_replace_child_noperm
>> was not 100%?).
>> Therefore we need to add more drains. But isn't drain what we decided to
>> drop at the beginning? Why isn't drain good?
>
> To sum up the patch ordering deadlock that we have right now:
>
> * in some cases, graph manipulations are protected by the AioContext lock
>
> * eliminating the AioContext lock is needed to move callbacks to coroutine
> contexts (see above for the deadlock scenario)
>
> * moving callbacks to coroutine context is needed by the graph rwlock
> implementation
>
> On one hand, we cannot protect the graph across manipulations with a graph
> rwlock without removing the AioContext lock; on the other hand, the
> AioContext lock is what _right now_ protects the graph.
>
> So I'd rather go back to Emanuele's draining approach.  It may not be
> beautiful, but it allows progress.  Once that is in place, we can remove 
> the
> AioContext lock (which mostly protects virtio-blk/virtio-scsi code right
> now) and reevaluate our next steps.

 If we want to use drain for locking, we need to make sure that drain
 actually does the job correctly. I see two major problems with it:

 The first one is that drain only covers I/O paths, but we need to
 protect against _anything_ touching block nodes. This might mean a
 massive audit and making sure that everything in QEMU that could
 possibly touch a block node is integrated with drain.

 I think Emanuele has argued before that because writes to the graph only
 happen in the main thread and we believe that currently only I/O
 requests are processed in iothreads, this is safe and we don't actually
 need to audit everything.

 This is true as long as the assumption holds true (how do we ensure that
 nobody ever introduces non-I/O code touching a block node in an
 iothread?) and as long as the graph writer never yields or polls. I
 think the latter condition is violated today, a random example is that
 adjusting drain counts in bdrv_replace_child_noperm() does poll. Without
 cooperation from all relevant places, the effectively locked code
 section ends right there, even if the drained section continues. Even if
 we can fix this, verifying that the conditions are met everywhere seems
 not trivial.

 And that's exactly my second major concern: Even if we manage to
 correctly implement things with drain, I don't see a way to meaningfully
 review it. I just don't know how to verify with some confidence that
 it's actually correct and covering everything that needs to be covered.

 Drain is not really a lock. But if you use it as one, the best it can
 provide is advisory locking (callers, inside and outside the block
 layer, need to explicitly support drain instead of having the lock
 applied somewhere in the block layer functions). And even if all
 relevant pieces actually make use of it, it still has an awkward
 interface for locking:

 /* Similar to rdlock(), but doesn't wait for writers to finish. It is
  * the caller's responsibility to make sure that there are no writers. */
 bdrv_inc_in_flight()


Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-23 Thread Stefan Hajnoczi
On Mon, May 23, 2022 at 10:48:48AM +0200, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 22/05/2022 at 17:06, Stefan Hajnoczi wrote:
> > On Wed, May 18, 2022 at 06:14:17PM +0200, Kevin Wolf wrote:
> >> On 18.05.2022 at 14:43, Paolo Bonzini wrote:
> >>> On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:
>  For example, all callers of bdrv_open() always take the AioContext lock.
>  Often it is taken very high in the call stack, but it's always taken.
> >>>
> >>> I think it's actually not a problem of who takes the AioContext lock or
> >>> where; the requirements are contradictory:
> >>>
> >>> * IO_OR_GS_CODE() functions, when called from coroutine context, expect to
> >>> be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)
> >>>
> >>> * to call these functions with the lock taken, the code has to run in the
> >>> BDS's home iothread.  Attempts to do otherwise result in deadlocks (the
> >>> main loop's AIO_WAIT_WHILEs expect progress from the iothread, which cannot
> >>> happen without releasing the AioContext lock)
> >>>
> >>> * running the code in the BDS's home iothread is not possible for
> >>> GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
> >>> thread, but that cannot be guaranteed in general)
> >>>
>  We might suppose that many callbacks are called under drain and in
>  GLOBAL_STATE, which should be enough, but from our experimentation in
>  the previous series we saw that currently not everything is under drain,
>  leaving some operations unprotected (remember assert_graph_writable
>  temporarily disabled, since drain coverage for bdrv_replace_child_noperm
>  was not 100%?).
>  Therefore we need to add more drains. But isn't drain what we decided to
>  drop at the beginning? Why isn't drain good?
> >>>
> >>> To sum up the patch ordering deadlock that we have right now:
> >>>
> >>> * in some cases, graph manipulations are protected by the AioContext lock
> >>>
> >>> * eliminating the AioContext lock is needed to move callbacks to coroutine
> >>> contexts (see above for the deadlock scenario)
> >>>
> >>> * moving callbacks to coroutine context is needed by the graph rwlock
> >>> implementation
> >>>
> >>> On one hand, we cannot protect the graph across manipulations with a graph
> >>> rwlock without removing the AioContext lock; on the other hand, the
> >>> AioContext lock is what _right now_ protects the graph.
> >>>
> >>> So I'd rather go back to Emanuele's draining approach.  It may not be
> >>> beautiful, but it allows progress.  Once that is in place, we can remove 
> >>> the
> >>> AioContext lock (which mostly protects virtio-blk/virtio-scsi code right
> >>> now) and reevaluate our next steps.
> >>
> >> If we want to use drain for locking, we need to make sure that drain
> >> actually does the job correctly. I see two major problems with it:
> >>
> >> The first one is that drain only covers I/O paths, but we need to
> >> protect against _anything_ touching block nodes. This might mean a
> >> massive audit and making sure that everything in QEMU that could
> >> possibly touch a block node is integrated with drain.
> >>
> >> I think Emanuele has argued before that because writes to the graph only
> >> happen in the main thread and we believe that currently only I/O
> >> requests are processed in iothreads, this is safe and we don't actually
> >> need to audit everything.
> >>
> >> This is true as long as the assumption holds true (how do we ensure that
> >> nobody ever introduces non-I/O code touching a block node in an
> >> iothread?) and as long as the graph writer never yields or polls. I
> >> think the latter condition is violated today, a random example is that
> >> adjusting drain counts in bdrv_replace_child_noperm() does poll. Without
> >> cooperation from all relevant places, the effectively locked code
> >> section ends right there, even if the drained section continues. Even if
> >> we can fix this, verifying that the conditions are met everywhere seems
> >> not trivial.
> >>
> >> And that's exactly my second major concern: Even if we manage to
> >> correctly implement things with drain, I don't see a way to meaningfully
> >> review it. I just don't know how to verify with some confidence that
> >> it's actually correct and covering everything that needs to be covered.
> >>
> >> Drain is not really a lock. But if you use it as one, the best it can
> >> provide is advisory locking (callers, inside and outside the block
> >> layer, need to explicitly support drain instead of having the lock
> >> applied somewhere in the block layer functions). And even if all
> >> relevant pieces actually make use of it, it still has an awkward
> >> interface for locking:
> >>
> >> /* Similar to rdlock(), but doesn't wait for writers to finish. It is
> >>  * the caller's responsibility to make sure that there are no writers. */
> >> bdrv_inc_in_flight()
> >>
> >> /* Similar to wrlock(). Waits for readers to finish. New readers are not
> >>  * prevented from starting after it returns. Third parties are politely
> >>  * asked not to touch the block node while it is drained. */
> >> bdrv_drained_begin()

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-23 Thread Kevin Wolf
Am 22.05.2022 um 17:06 hat Stefan Hajnoczi geschrieben:
> Hi,
> Sorry for the long email. I've included three topics that may help us discuss
> draining and AioContext removal further.
> 
> The shortcomings of drain
> -------------------------
> I wanted to explore the logical argument that making graph modifications within
> a drained section is correct:
> - Graph modifications and BB/BDS lookup are Global State (GS).
> - Graph traversal from a BB/BDS pointer is IO.
> - Only one thread executes GS code at a time.
> - IO is quiesced within a drained section.

I think you're mixing two things here and calling both of them IO.

If a function is marked as IO, that means that it is safe to call from
I/O requests, which may be running in any iothread (they currently only
run in the home thread of the node's AioContext, but the function
annotations already promise that any thread is safe).

However, if a function is marked as IO, this doesn't necessarily mean
that it is always only called in the context of an I/O request. It can
be called by any other code, too.

So while drain does quiesce all I/O requests, this doesn't mean that
functions marked as IO won't run any more.

> - Therefore a drained section in GS code suspends graph traversal, other graph
>   modifications, and BB/BDS lookup.
> - Therefore it is safe to modify the graph from a GS drained section.

So unfortunately, I don't think this conclusion is correct.

In order to make your assumption true, we would have to require that all
callers of IO functions must have called bdrv_inc_in_flight(). We would
also have to find a way to enforce this, preferably at compile time or
with static analysis, or at least with runtime assertions, because it
would be very easy to get wrong.
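
A runtime assertion along those lines could look roughly like this (the
helper is hypothetical, not existing QEMU code; bs->in_flight is the counter
behind bdrv_inc_in_flight()):

    /* A graph-traversing IO function could check that its caller has
     * announced itself, so that a graph writer is able to wait for it.
     * Note that this only proves that *somebody* incremented the counter,
     * not necessarily our caller. */
    static void assert_reader_announced(BlockDriverState *bs)
    {
        assert(qatomic_read(&bs->in_flight) > 0);
    }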

> However, I hit on a problem that I think Emanuele and Paolo have already
> pointed out: draining is GS & IO. This might have worked under the 1 IOThread
> model but it does not make sense for multi-queue. It is possible to submit I/O
> requests in drained sections. How can multiple threads be in drained sections
> simultaneously and possibly submit further I/O requests in their drained
> sections? Those sections wouldn't be "drained" in any useful sense of the word.

Drain asks other users not to touch the block node. Currently this only
includes I/O, but the desire to use drain for locking means that we need to
extend it to pretty much any operation on the node, which would include
calling drain on that block node. So you should never have two callers
in a drain section at the same time; that would be a bug in this model.

Of course, we know that currently drain is not respected by all relevant
callers (most importantly, the monitor). If you want to use drain as a
form of locking, you need to solve this first.

> It is necessary to adjust how recursive drained sections work:
> bdrv_drained_begin() must not return while there are deeper nested drained
> sections.
> 
> This is allowed:
> 
>  Monitor command                Block job
>  ---------------                ---------
>  > bdrv_drained_begin()
>  .                              > bdrv_drained_begin()
>  .                              < bdrv_drained_begin()
>  .                              .
>  .                              .
>  .                              > bdrv_drained_end()
>  .                              < bdrv_drained_end()
>  < bdrv_drained_begin()
>  .
>  .
>  > bdrv_drained_end()
>  < bdrv_drained_end()

Just to make sure I understand the scenario that you have in mind here:
The monitor command is not what causes the block job to do the draining
because this is inside bdrv_drained_begin(), so the block job just
randomly got a callback that caused it to do so (maybe completion)?

Before bdrv_drained_begin() returns, anything is still allowed to run,
so I agree that this is valid.

> This is not allowed:
> 
>  Monitor command                Block job
>  ---------------                ---------
>  > bdrv_drained_begin()
>  .                              > bdrv_drained_begin()
>  .                              < bdrv_drained_begin()
>  .                              .
>  .                              .
>  < bdrv_drained_begin()         .
>  .                              .
>  .                              > bdrv_drained_end()
>  .                              < bdrv_drained_end()
>  > bdrv_drained_end()
>  < bdrv_drained_end()
> 
> This restriction creates an ordering between bdrv_drained_begin() callers. In
> this example the block job must not depend on the monitor command completing
> first. Otherwise there would be a deadlock, just like with two threads waiting
> for each other while holding a mutex.

So essentially drain needs to increase bs->in_flight, so that the outer
drain has to wait for the inner drain section to end before its
bdrv_drained_begin() can return.
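
A conceptual sketch of that idea (not current QEMU behaviour; the existing
quiescing and polling is elided):

    /* Each drained section holds an in_flight reference for its whole
     * lifetime, so an outer bdrv_drained_begin() keeps polling until every
     * nested section has called bdrv_drained_end(). The reference is taken
     * only after our own polling has finished, so a drain never waits for
     * itself. */
    void bdrv_drained_begin(BlockDriverState *bs)
    {
        /* ... existing quiescing and BDRV_POLL_WHILE() ... */
        bdrv_inc_in_flight(bs);
    }

    void bdrv_drained_end(BlockDriverState *bs)
    {
        bdrv_dec_in_flight(bs);
        /* ... existing resume logic ... */
    }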

> For sanity I think draining should be GS-only. IO code should not invoke
> bdrv_drained_begin() to avoid 

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-23 Thread Emanuele Giuseppe Esposito



Am 22/05/2022 um 17:06 schrieb Stefan Hajnoczi:
> On Wed, May 18, 2022 at 06:14:17PM +0200, Kevin Wolf wrote:
>> Am 18.05.2022 um 14:43 hat Paolo Bonzini geschrieben:
>>> On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:
 For example, all callers of bdrv_open() always take the AioContext lock.
 Often it is taken very high in the call stack, but it's always taken.
>>>
>>> I think it's actually not a problem of who takes the AioContext lock or
>>> where; the requirements are contradictory:
>>>
>>> * IO_OR_GS_CODE() functions, when called from coroutine context, expect to
>>> be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)
>>>
>>> * to call these functions with the lock taken, the code has to run in the
>>> BDS's home iothread.  Attempts to do otherwise result in deadlocks (the
>>> main loop's AIO_WAIT_WHILEs expect progress from the iothread, that cannot
>>> happen without releasing the aiocontext lock)
>>>
>>> * running the code in the BDS's home iothread is not possible for
>>> GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
>>> thread, but that cannot be guaranteed in general)
>>>
 We might suppose that many callbacks are called under drain and in
 GLOBAL_STATE, which should be enough, but from our experimentation in
 the previous series we saw that currently not everything is under drain,
 leaving some operations unprotected (remember assert_graph_writable
 temporarily disabled, since drain coverage for bdrv_replace_child_noperm
 was not 100%?).
 Therefore we need to add more drains. But isn't drain what we decided to
 drop at the beginning? Why isn't drain good?
>>>
>>> To sum up the patch ordering deadlock that we have right now:
>>>
>>> * in some cases, graph manipulations are protected by the AioContext lock
>>>
>>> * eliminating the AioContext lock is needed to move callbacks to coroutine
>>> contexts (see above for the deadlock scenario)
>>>
>>> * moving callbacks to coroutine context is needed by the graph rwlock
>>> implementation
>>>
>>> On one hand, we cannot protect the graph across manipulations with a graph
>>> rwlock without removing the AioContext lock; on the other hand, the
>>> AioContext lock is what _right now_ protects the graph.
>>>
>>> So I'd rather go back to Emanuele's draining approach.  It may not be
>>> beautiful, but it allows progress.  Once that is in place, we can remove the
>>> AioContext lock (which mostly protects virtio-blk/virtio-scsi code right
>>> now) and reevaluate our next steps.
>>
>> If we want to use drain for locking, we need to make sure that drain
>> actually does the job correctly. I see two major problems with it:
>>
>> The first one is that drain only covers I/O paths, but we need to
>> protect against _anything_ touching block nodes. This might mean a
>> massive audit and making sure that everything in QEMU that could
>> possibly touch a block node is integrated with drain.
>>
>> I think Emanuele has argued before that because writes to the graph only
>> happen in the main thread and we believe that currently only I/O
>> requests are processed in iothreads, this is safe and we don't actually
>> need to audit everything.
>>
>> This is true as long as the assumption holds true (how do we ensure that
>> nobody ever introduces non-I/O code touching a block node in an
>> iothread?) and as long as the graph writer never yields or polls. I
>> think the latter condition is violated today, a random example is that
>> adjusting drain counts in bdrv_replace_child_noperm() does poll. Without
>> cooperation from all relevant places, the effectively locked code
>> section ends right there, even if the drained section continues. Even if
>> we can fix this, verifying that the conditions are met everywhere seems
>> not trivial.
>>
>> And that's exactly my second major concern: Even if we manage to
>> correctly implement things with drain, I don't see a way to meaningfully
>> review it. I just don't know how to verify with some confidence that
>> it's actually correct and covering everything that needs to be covered.
>>
>> Drain is not really a lock. But if you use it as one, the best it can
>> provide is advisory locking (callers, inside and outside the block
>> layer, need to explicitly support drain instead of having the lock
>> applied somewhere in the block layer functions). And even if all
>> relevant pieces actually make use of it, it still has an awkward
>> interface for locking:
>>
>> /* Similar to rdlock(), but doesn't wait for writers to finish. It is
>>  * the caller's responsibility to make sure that there are no writers. */
>> bdrv_inc_in_flight()
>>
>> /* Similar to wrlock(). Waits for readers to finish. New readers are not
>>  * prevented from starting after it returns. Third parties are politely
>>  * asked not to touch the block node while it is drained. */
>> bdrv_drained_begin()
>>
>> (I think the unlock counterparts actually behave as expected from a real
>> lock.)

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-22 Thread Stefan Hajnoczi
On Wed, May 18, 2022 at 06:14:17PM +0200, Kevin Wolf wrote:
> Am 18.05.2022 um 14:43 hat Paolo Bonzini geschrieben:
> > On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:
> > > For example, all callers of bdrv_open() always take the AioContext lock.
> > > Often it is taken very high in the call stack, but it's always taken.
> > 
> > I think it's actually not a problem of who takes the AioContext lock or
> > where; the requirements are contradictory:
> > 
> > * IO_OR_GS_CODE() functions, when called from coroutine context, expect to
> > be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)
> > 
> > * to call these functions with the lock taken, the code has to run in the
> > BDS's home iothread.  Attempts to do otherwise result in deadlocks (the
> > main loop's AIO_WAIT_WHILEs expect progress from the iothread, that cannot
> > happen without releasing the aiocontext lock)
> > 
> > * running the code in the BDS's home iothread is not possible for
> > GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
> > thread, but that cannot be guaranteed in general)
> > 
> > > We might suppose that many callbacks are called under drain and in
> > > GLOBAL_STATE, which should be enough, but from our experimentation in
> > > the previous series we saw that currently not everything is under drain,
> > > leaving some operations unprotected (remember assert_graph_writable
> > > temporarily disabled, since drain coverage for bdrv_replace_child_noperm
> > > was not 100%?).
> > > Therefore we need to add more drains. But isn't drain what we decided to
> > > drop at the beginning? Why isn't drain good?
> > 
> > To sum up the patch ordering deadlock that we have right now:
> > 
> > * in some cases, graph manipulations are protected by the AioContext lock
> > 
> > * eliminating the AioContext lock is needed to move callbacks to coroutine
> > contexts (see above for the deadlock scenario)
> > 
> > * moving callbacks to coroutine context is needed by the graph rwlock
> > implementation
> > 
> > On one hand, we cannot protect the graph across manipulations with a graph
> > rwlock without removing the AioContext lock; on the other hand, the
> > AioContext lock is what _right now_ protects the graph.
> > 
> > So I'd rather go back to Emanuele's draining approach.  It may not be
> > beautiful, but it allows progress.  Once that is in place, we can remove the
> > AioContext lock (which mostly protects virtio-blk/virtio-scsi code right
> > now) and reevaluate our next steps.
> 
> If we want to use drain for locking, we need to make sure that drain
> actually does the job correctly. I see two major problems with it:
> 
> The first one is that drain only covers I/O paths, but we need to
> protect against _anything_ touching block nodes. This might mean a
> massive audit and making sure that everything in QEMU that could
> possibly touch a block node is integrated with drain.
> 
> I think Emanuele has argued before that because writes to the graph only
> happen in the main thread and we believe that currently only I/O
> requests are processed in iothreads, this is safe and we don't actually
> need to audit everything.
> 
> This is true as long as the assumption holds true (how do we ensure that
> nobody ever introduces non-I/O code touching a block node in an
> iothread?) and as long as the graph writer never yields or polls. I
> think the latter condition is violated today, a random example is that
> adjusting drain counts in bdrv_replace_child_noperm() does poll. Without
> cooperation from all relevant places, the effectively locked code
> section ends right there, even if the drained section continues. Even if
> we can fix this, verifying that the conditions are met everywhere seems
> not trivial.
> 
> And that's exactly my second major concern: Even if we manage to
> correctly implement things with drain, I don't see a way to meaningfully
> review it. I just don't know how to verify with some confidence that
> it's actually correct and covering everything that needs to be covered.
> 
> Drain is not really a lock. But if you use it as one, the best it can
> provide is advisory locking (callers, inside and outside the block
> layer, need to explicitly support drain instead of having the lock
> applied somewhere in the block layer functions). And even if all
> relevant pieces actually make use of it, it still has an awkward
> interface for locking:
> 
> /* Similar to rdlock(), but doesn't wait for writers to finish. It is
>  * the caller's responsibility to make sure that there are no writers. */
> bdrv_inc_in_flight()
> 
> /* Similar to wrlock(). Waits for readers to finish. New readers are not
>  * prevented from starting after it returns. Third parties are politely
>  * asked not to touch the block node while it is drained. */
> bdrv_drained_begin()
> 
> (I think the unlock counterparts actually behave as expected from a real
> lock.)
> 
> Having an actual rdlock() (that waits for writers), and using static
> analysis to verify that all relevant places use it (so that wrlock()
> doesn't rely on politely asking third parties to leave the node alone)
> gave me some confidence that we could verify the result.

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-19 Thread Kevin Wolf
Am 19.05.2022 um 13:27 hat Stefan Hajnoczi geschrieben:
> On Wed, May 18, 2022 at 06:14:17PM +0200, Kevin Wolf wrote:
> > If we want to use drain for locking, we need to make sure that drain
> > actually does the job correctly. I see two major problems with it:
> > 
> > The first one is that drain only covers I/O paths, but we need to
> > protect against _anything_ touching block nodes. This might mean a
> > massive audit and making sure that everything in QEMU that could
> > possibly touch a block node is integrated with drain.
> 
> > I think Emanuele has argued before that because writes to the graph only
> > happen in the main thread and we believe that currently only I/O
> > requests are processed in iothreads, this is safe and we don't actually
> > need to audit everything.
> 
> I'm interested in the non-I/O code path cases you're thinking about:
> 
> Block jobs receive callbacks during drain. They are safe.

We've had bugs in the callbacks before, so while they are probably as
safe as they can get when each user has to actively support drain, I'm
a bit less than 100% confident.

> Exports:
> - The nbd export has code to deal with drain and looks safe.
> - vhost-user-blk uses aio_set_fd_handler(is_external=true) for virtqueue
>   kick fds but not for the vhost-user UNIX domain socket (not sure if
>   that's a problem).
> - FUSE uses aio_set_fd_handler(is_external=true) and looks safe.
> 
> The monitor runs with the BQL in the main loop and doesn't use
> coroutines. It should be safe.

The monitor does use coroutines (though I think this doesn't make a
difference; the more important point is that this coroutine runs in
qemu_aio_context while executing commands) and is not safe with respect
to drain and we're seeing bugs coming from it:

https://lists.gnu.org/archive/html/qemu-block/2022-03/msg00582.html

The broader point here is that even running with the BQL in the main
loop doesn't help you at all if the graph writer that you could interfere
with polls or is a coroutine that yields. You're only safe if _both_ sides
run with the BQL in the main thread and neither polls nor yields. This is
the condition I explained under which Emanuele's reasoning holds true.
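
For example (the writer below is made up, not actual QEMU code), any poll in
the middle of a graph update effectively ends the critical section that the
BQL was supposed to provide:

    /* GLOBAL_STATE_CODE() and the BQL only protect this writer up to the
     * first poll point: bdrv_drained_begin() polls, so monitor commands,
     * BHs or completions can run there and observe a half-updated graph. */
    static void example_graph_writer(BlockDriverState *bs,
                                     BlockDriverState *new_child_bs)
    {
        GLOBAL_STATE_CODE();
        /* ... first half of the graph update ... */
        bdrv_drained_begin(new_child_bs);   /* nested event loop runs here */
        /* ... second half of the graph update ... */
        bdrv_drained_end(new_child_bs);
    }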

So you can choose between verifying that the monitor actively respects
drain (it doesn't currently) or verifying that no graph writer can poll
or yield during its drain section so that holding the BQL is enough (not
true today and I'm not sure if it could be made true even in theory).

> Anything else?

How to figure this out is precisely my question to you. :-) Maybe
migration completion? Some random BHs? A guest enabling a virtio-blk
device so that the dataplane gets started and its backend is moved to a
different AioContext? I'm not sure if these cases are bad. Maybe they
aren't. But how do I tell without reading every code path?

Well, and the follow-up question: How do we make sure that the list of
"anything else" doesn't change over time when we rely on auditing every
item on it for the correctness of drain?

Kevin




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-19 Thread Stefan Hajnoczi
On Wed, May 18, 2022 at 06:14:17PM +0200, Kevin Wolf wrote:
> If we want to use drain for locking, we need to make sure that drain
> actually does the job correctly. I see two major problems with it:
> 
> The first one is that drain only covers I/O paths, but we need to
> protect against _anything_ touching block nodes. This might mean a
> massive audit and making sure that everything in QEMU that could
> possibly touch a block node is integrated with drain.

> I think Emanuele has argued before that because writes to the graph only
> happen in the main thread and we believe that currently only I/O
> requests are processed in iothreads, this is safe and we don't actually
> need to audit everything.

I'm interested in the non-I/O code path cases you're thinking about:

Block jobs receive callbacks during drain. They are safe.

Exports:
- The nbd export has code to deal with drain and looks safe.
- vhost-user-blk uses aio_set_fd_handler(is_external=true) for virtqueue
  kick fds but not for the vhost-user UNIX domain socket (not sure if
  that's a problem).
- FUSE uses aio_set_fd_handler(is_external=true) and looks safe.

The monitor runs with the BQL in the main loop and doesn't use
coroutines. It should be safe.

Anything else?

Stefan




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-18 Thread Kevin Wolf
Am 18.05.2022 um 14:43 hat Paolo Bonzini geschrieben:
> On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:
> > For example, all callers of bdrv_open() always take the AioContext lock.
> > Often it is taken very high in the call stack, but it's always taken.
> 
> I think it's actually not a problem of who takes the AioContext lock or
> where; the requirements are contradictory:
> 
> * IO_OR_GS_CODE() functions, when called from coroutine context, expect to
> be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)
> 
> * to call these functions with the lock taken, the code has to run in the
> BDS's home iothread.  Attempts to do otherwise result in deadlocks (the
> main loop's AIO_WAIT_WHILEs expect progress from the iothread, that cannot
> happen without releasing the aiocontext lock)
> 
> * running the code in the BDS's home iothread is not possible for
> GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
> thread, but that cannot be guaranteed in general)
> 
> > We might suppose that many callbacks are called under drain and in
> > GLOBAL_STATE, which should be enough, but from our experimentation in
> > the previous series we saw that currently not everything is under drain,
> > leaving some operations unprotected (remember assert_graph_writable
> > temporarily disabled, since drain coverage for bdrv_replace_child_noperm
> > was not 100%?).
> > Therefore we need to add more drains. But isn't drain what we decided to
> > drop at the beginning? Why isn't drain good?
> 
> To sum up the patch ordering deadlock that we have right now:
> 
> * in some cases, graph manipulations are protected by the AioContext lock
> 
> * eliminating the AioContext lock is needed to move callbacks to coroutine
> contexts (see above for the deadlock scenario)
> 
> * moving callbacks to coroutine context is needed by the graph rwlock
> implementation
> 
> On one hand, we cannot protect the graph across manipulations with a graph
> rwlock without removing the AioContext lock; on the other hand, the
> AioContext lock is what _right now_ protects the graph.
> 
> So I'd rather go back to Emanuele's draining approach.  It may not be
> beautiful, but it allows progress.  Once that is in place, we can remove the
> AioContext lock (which mostly protects virtio-blk/virtio-scsi code right
> now) and reevaluate our next steps.

If we want to use drain for locking, we need to make sure that drain
actually does the job correctly. I see two major problems with it:

The first one is that drain only covers I/O paths, but we need to
protect against _anything_ touching block nodes. This might mean a
massive audit and making sure that everything in QEMU that could
possibly touch a block node is integrated with drain.

I think Emanuele has argued before that because writes to the graph only
happen in the main thread and we believe that currently only I/O
requests are processed in iothreads, this is safe and we don't actually
need to audit everything.

This is true as long as the assumption holds true (how do we ensure that
nobody ever introduces non-I/O code touching a block node in an
iothread?) and as long as the graph writer never yields or polls. I
think the latter condition is violated today, a random example is that
adjusting drain counts in bdrv_replace_child_noperm() does poll. Without
cooperation from all relevant places, the effectively locked code
section ends right there, even if the drained section continues. Even if
we can fix this, verifying that the conditions are met everywhere seems
not trivial.

And that's exactly my second major concern: Even if we manage to
correctly implement things with drain, I don't see a way to meaningfully
review it. I just don't know how to verify with some confidence that
it's actually correct and covering everything that needs to be covered.

Drain is not really a lock. But if you use it as one, the best it can
provide is advisory locking (callers, inside and outside the block
layer, need to explicitly support drain instead of having the lock
applied somewhere in the block layer functions). And even if all
relevant pieces actually make use of it, it still has an awkward
interface for locking:

/* Similar to rdlock(), but doesn't wait for writers to finish. It is
 * the caller's responsibility to make sure that there are no writers. */
bdrv_inc_in_flight()

/* Similar to wrlock(). Waits for readers to finish. New readers are not
 * prevented from starting after it returns. Third parties are politely
 * asked not to touch the block node while it is drained. */
bdrv_drained_begin()

(I think the unlock counterparts actually behave as expected from a real
lock.)
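
Spelled out, the analogy corresponds to the following usage (all functions
named here exist today; whether this deserves to be called locking is exactly
the point):

    /* "rdlock": merely announces I/O-like activity; it neither waits for
     * nor excludes a writer. */
    static void reader_sketch(BlockDriverState *bs)
    {
        bdrv_inc_in_flight(bs);
        /* ... traverse bs->children / bs->parents ... */
        bdrv_dec_in_flight(bs);
    }

    /* "wrlock": waits for announced readers, but only politely asks
     * everyone else to stay away while the node is drained. */
    static void writer_sketch(BlockDriverState *bs)
    {
        bdrv_drained_begin(bs);
        /* ... modify the graph, e.g. bdrv_replace_child_noperm() ... */
        bdrv_drained_end(bs);
    }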

Having an actual rdlock() (that waits for writers), and using static
analysis to verify that all relevant places use it (so that wrlock()
doesn't rely on politely asking third parties to leave the node alone)
gave me some confidence that we could verify the result.

I'm not sure at all how to achieve the 

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 18, 2022 at 02:43:50PM +0200, Paolo Bonzini wrote:
> On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:
> > For example, all callers of bdrv_open() always take the AioContext lock.
> > Often it is taken very high in the call stack, but it's always taken.
> 
> I think it's actually not a problem of who takes the AioContext lock or
> where; the requirements are contradictory:
> 
> * IO_OR_GS_CODE() functions, when called from coroutine context, expect to
> be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)
> 
> * to call these functions with the lock taken, the code has to run in the
> BDS's home iothread.  Attempts to do otherwise result in deadlocks (the
> main loop's AIO_WAIT_WHILEs expect progress from the iothread, that cannot
> happen without releasing the aiocontext lock)
>
> 
> * running the code in the BDS's home iothread is not possible for
> GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
> thread, but that cannot be guaranteed in general)
> 
> > We might suppose that many callbacks are called under drain and in
> > GLOBAL_STATE, which should be enough, but from our experimentation in
> > the previous series we saw that currently not everything is under drain,
> > leaving some operations unprotected (remember assert_graph_writable
> > temporarily disabled, since drain coverage for bdrv_replace_child_noperm
> > was not 100%?).
> > Therefore we need to add more drains. But isn't drain what we decided to
> > drop at the beginning? Why isn't drain good?
> 
> To sum up the patch ordering deadlock that we have right now:
> 
> * in some cases, graph manipulations are protected by the AioContext lock
> 
> * eliminating the AioContext lock is needed to move callbacks to coroutine
> contexts (see above for the deadlock scenario)
> 
> * moving callbacks to coroutine context is needed by the graph rwlock
> implementation
> 
> On one hand, we cannot protect the graph across manipulations with a graph
> rwlock without removing the AioContext lock; on the other hand, the
> AioContext lock is what _right now_ protects the graph.
> 
> So I'd rather go back to Emanuele's draining approach.  It may not be
> beautiful, but it allows progress.  Once that is in place, we can remove the
> AioContext lock (which mostly protects virtio-blk/virtio-scsi code right
> now) and reevaluate our next steps.

Me too, I don't think the rwlock was particularly nice either.

Stefan




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 18, 2022 at 02:28:41PM +0200, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 17/05/2022 um 12:59 schrieb Stefan Hajnoczi:
> > On Wed, May 04, 2022 at 02:39:05PM +0100, Stefan Hajnoczi wrote:
> >> On Tue, Apr 26, 2022 at 04:51:06AM -0400, Emanuele Giuseppe Esposito wrote:
> >>> This is a new attempt to replace the need to take the AioContext lock to
> >>> protect graph modifications. In particular, we aim to remove
> >>> (or better, substitute) the AioContext around bdrv_replace_child_noperm,
> >>> since this function changes BlockDriverState's ->parents and ->children
> >>> lists.
> >>>
> >>> In the previous version, we decided to discard using subtree_drains to
> >>> protect the nodes, for different reasons: for those unfamiliar with it,
> >>> please see 
> >>> https://patchew.org/QEMU/20220301142113.163174-1-eespo...@redhat.com/
> >>
> >> I reread the thread and it's unclear to me why drain is the wrong
> >> mechanism for protecting graph modifications. We theorized a lot but
> >> ultimately is this new mechanism sufficiently different from
> >> bdrv_drained_begin()/end() to make it worth implementing?
> >>
> >> Instead of invoking .drained_begin() callbacks to stop further I/O,
> >> we're now queuing coroutines (without backpressure information that
> >> whoever is spawning I/O needs so they can stop). The writer still waits
> >> for in-flight I/O to finish, including I/O not associated with the bdrv
> >> graph we wish to modify (because rdlock is per-AioContext and unrelated
> >> to a specific graph). Is this really more lightweight than drain?
> >>
> >> If I understand correctly, the original goal was to avoid the need to
> >> hold the AioContext lock across bdrv_replace_child_noperm(). I would
> >> focus on that and use drain for now.
> >>
> >> Maybe I've missed an important point about why the new mechanism is
> >> needed?
> > 
> > Ping?
> 
> label: // read till the end to see why I wrote this here
> 
> I was hoping someone from the "No" party would answer to your question,
> because as you know we reached this same conclusion together.
> 
> We thought about dropping the drain for various reasons, the main one
> (at least as far as I understood) is that we are not sure if something
> can still happen in between drain_begin/end, and it is a little bit
> confusing to use the same mechanism to block I/O and protect the graph.

We had discussions about what could go wrong and there was a feeling
that maybe a separate mechanism is appropriate for graph modifications,
but there was no concrete reason why draining around graph modification
won't work.

If no one has a concrete reason then drain still seems like the most
promising approach to protecting graph modifications. The rwlock patch
wasn't sufficiently different from drain to have significant advantages
in my opinion.

> We then thought about implementing a rwlock. A rdlock would clarify what
> we are protecting and who is using the lock. I had a rwlock draft
> implementation sent in this thread, but this also led to additional
> problems.
> Main problem was that this new lock would introduce nested event loops,
> that together with such locking would just create deadlocks.
> If readers are in coroutines and writers are not (because graph
> operations are not running in coroutines), we have a lot of deadlocks.
> If a writer has to take the lock, it must wait for all other readers to
> finish. And it does it by internally calling AIO_WAIT_WHILE, creating
> nested event loop. We don't know what could execute when polling for
> events, and for example another writer could be resumed.
> Ideally, we want writers in coroutines too.

What is the deadlock? Do the readers depend on the writer somehow?

> Additionally, many readers are running in what we call "mixed"
> functions: usually implemented automatically with generated_co_wrapper
> tag, they let a function (usually bdrv callback) run always in a
> coroutine, creating one if necessary. For example, bdrv_flush() makes
> sure that bs->bdrv_co_flush() is always run in a coroutine.
> Such mixed functions are used in other callbacks too, making it really
> difficult to understand if we are in a coroutine or not, and, most
> importantly, make rwlock usage very difficult.
> 
> Which led us to step back once more and try to convert all
> BlockDriverState callbacks into coroutines. This would greatly simplify
> rwlock usage, because we could make the rwlock coroutine-friendly
> (without any AIO_WAIT_WHILE, allowing a writer to wait for readers by
> just yielding and queuing itself in coroutine queues).
> 
> The first step was then to convert all callbacks into coroutines, using
> generated_coroutine_wrapper (g_c_w).
> A typical g_c_w is implemented in this way:
>     if (qemu_in_coroutine()) {
>         callback();
>     } else { // much simplified
>         co = qemu_coroutine_create(callback);
>         bdrv_coroutine_enter(bs, co);
>         BDRV_POLL_WHILE(bs, coroutine_in_progress);
>     }

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-18 Thread Paolo Bonzini

On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:

For example, all callers of bdrv_open() always take the AioContext lock.
Often it is taken very high in the call stack, but it's always taken.


I think it's actually not a problem of who takes the AioContext lock or 
where; the requirements are contradictory:


* IO_OR_GS_CODE() functions, when called from coroutine context, expect 
to be called with the AioContext lock taken (example: 
bdrv_co_yield_to_drain)


* to call these functions with the lock taken, the code has to run in 
the BDS's home iothread.  Attempts to do otherwise result in deadlocks 
(the main loop's AIO_WAIT_WHILEs expect progress from the iothread, that 
cannot happen without releasing the aiocontext lock)


* running the code in the BDS's home iothread is not possible for 
GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main 
thread, but that cannot be guaranteed in general)



We might suppose that many callbacks are called under drain and in
GLOBAL_STATE, which should be enough, but from our experimentation in
the previous series we saw that currently not everything is under drain,
leaving some operations unprotected (remember assert_graph_writable
temporarily disabled, since drain coverage for bdrv_replace_child_noperm
was not 100%?).
Therefore we need to add more drains. But isn't drain what we decided to
drop at the beginning? Why isn't drain good?


To sum up the patch ordering deadlock that we have right now:

* in some cases, graph manipulations are protected by the AioContext lock

* eliminating the AioContext lock is needed to move callbacks to 
coroutine contexts (see above for the deadlock scenario)


* moving callbacks to coroutine context is needed by the graph rwlock 
implementation


On one hand, we cannot protect the graph across manipulations with a 
graph rwlock without removing the AioContext lock; on the other hand, 
the AioContext lock is what _right now_ protects the graph.


So I'd rather go back to Emanuele's draining approach.  It may not be 
beautiful, but it allows progress.  Once that is in place, we can remove 
the AioContext lock (which mostly protects virtio-blk/virtio-scsi code 
right now) and reevaluate our next steps.


Paolo



Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-18 Thread Emanuele Giuseppe Esposito



Am 17/05/2022 um 12:59 schrieb Stefan Hajnoczi:
> On Wed, May 04, 2022 at 02:39:05PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Apr 26, 2022 at 04:51:06AM -0400, Emanuele Giuseppe Esposito wrote:
>>> This is a new attempt to replace the need to take the AioContext lock to
>>> protect graph modifications. In particular, we aim to remove
>>> (or better, substitute) the AioContext around bdrv_replace_child_noperm,
>>> since this function changes BlockDriverState's ->parents and ->children
>>> lists.
>>>
>>> In the previous version, we decided to discard using subtree_drains to
>>> protect the nodes, for different reasons: for those unfamiliar with it,
>>> please see 
>>> https://patchew.org/QEMU/20220301142113.163174-1-eespo...@redhat.com/
>>
>> I reread the thread and it's unclear to me why drain is the wrong
>> mechanism for protecting graph modifications. We theorized a lot but
>> ultimately is this new mechanism sufficiently different from
>> bdrv_drained_begin()/end() to make it worth implementing?
>>
>> Instead of invoking .drained_begin() callbacks to stop further I/O,
>> we're now queuing coroutines (without backpressure information that
>> whoever is spawning I/O needs so they can stop). The writer still waits
>> for in-flight I/O to finish, including I/O not associated with the bdrv
>> graph we wish to modify (because rdlock is per-AioContext and unrelated
>> to a specific graph). Is this really more lightweight than drain?
>>
>> If I understand correctly, the original goal was to avoid the need to
>> hold the AioContext lock across bdrv_replace_child_noperm(). I would
>> focus on that and use drain for now.
>>
>> Maybe I've missed an important point about why the new mechanism is
>> needed?
> 
> Ping?

label: // read till the end to see why I wrote this here

I was hoping someone from the "No" party would answer to your question,
because as you know we reached this same conclusion together.

We thought about dropping the drain for various reasons, the main one
(at least as far as I understood) is that we are not sure if something
can still happen in between drain_begin/end, and it is a little bit
confusing to use the same mechanism to block I/O and protect the graph.

We then thought about implementing a rwlock. A rdlock would clarify what
we are protecting and who is using the lock. I had a rwlock draft
implementation sent in this thread, but this also led to additional
problems.
Main problem was that this new lock would introduce nested event loops,
that together with such locking would just create deadlocks.
If readers are in coroutines and writers are not (because graph
operations are not running in coroutines), we have a lot of deadlocks.
If a writer has to take the lock, it must wait for all other readers to
finish. And it does it by internally calling AIO_WAIT_WHILE, creating
nested event loop. We don't know what could execute when polling for
events, and for example another writer could be resumed.
Ideally, we want writers in coroutines too.
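
The problematic shape of such a non-coroutine writer, sketched with a
hypothetical lock structure (this is not the draft posted in this thread):

    typedef struct GraphLock {
        int readers;        /* active readers, typically coroutines */
        bool writer;
    } GraphLock;

    /* A writer that is not a coroutine can only wait by running a nested
     * event loop; anything dispatched by that loop (another graph writer,
     * or a reader that now waits for us) can deadlock. */
    static void graph_wrlock_sketch(GraphLock *gl)
    {
        gl->writer = true;
        AIO_WAIT_WHILE(NULL, gl->readers > 0);  /* nested event loop */
    }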

Additionally, many readers are running in what we call "mixed"
functions: usually implemented automatically with generated_co_wrapper
tag, they let a function (usually bdrv callback) run always in a
coroutine, creating one if necessary. For example, bdrv_flush() makes
sure that bs->bdrv_co_flush() is always run in a coroutine.
Such mixed functions are used in other callbacks too, making it really
difficult to understand if we are in a coroutine or not, and, most
importantly, make rwlock usage very difficult.

Which led us to step back once more and try to convert all
BlockDriverState callbacks into coroutines. This would greatly simplify
rwlock usage, because we could make the rwlock coroutine-friendly
(without any AIO_WAIT_WHILE, allowing a writer to wait for readers by
just yielding and queuing itself in coroutine queues).
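
A minimal sketch of such a coroutine-friendly rwlock (hypothetical names; it
assumes every user runs in a coroutine and that the lock state itself is
serialized, e.g. because everything runs in one AioContext):

    typedef struct CoGraphLock {
        int readers;
        bool writer;
        CoQueue queue;      /* qemu_co_queue_init() before first use */
    } CoGraphLock;

    static void coroutine_fn co_graph_rdlock(CoGraphLock *gl)
    {
        while (gl->writer) {
            qemu_co_queue_wait(&gl->queue, NULL);   /* yield, no nested loop */
        }
        gl->readers++;
    }

    static void coroutine_fn co_graph_rdunlock(CoGraphLock *gl)
    {
        if (--gl->readers == 0) {
            qemu_co_queue_restart_all(&gl->queue);
        }
    }

    static void coroutine_fn co_graph_wrlock(CoGraphLock *gl)
    {
        while (gl->writer || gl->readers > 0) {
            qemu_co_queue_wait(&gl->queue, NULL);
        }
        gl->writer = true;
    }

    static void coroutine_fn co_graph_wrunlock(CoGraphLock *gl)
    {
        gl->writer = false;
        qemu_co_queue_restart_all(&gl->queue);
    }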

The first step was then to convert all callbacks into coroutines, using
generated_coroutine_wrapper (g_c_w).
A typical g_c_w is implemented in this way:
    if (qemu_in_coroutine()) {
        callback();
    } else { // much simplified
        co = qemu_coroutine_create(callback);
        bdrv_coroutine_enter(bs, co);
        BDRV_POLL_WHILE(bs, coroutine_in_progress);
    }
Once all callbacks are implemented using g_c_w, we can start splitting
the two sides of the if statement so that we only create a coroutine when
we are outside of a bdrv callback.

However, we immediately found a problem while starting to convert the
first callbacks: the AioContext lock is taken around some non-coroutine
callbacks! For example, bs->bdrv_open() is always called with the
AioContext lock taken. In addition, callbacks like bdrv_open are
graph-modifying functions, which is probably why we are taking the
AioContext lock, and they do not like to run in coroutines.
Anyway, the real problem comes when we create a coroutine in such
places where the AioContext lock is taken and we have a graph-modifying
function.


Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-17 Thread Stefan Hajnoczi
On Wed, May 04, 2022 at 02:39:05PM +0100, Stefan Hajnoczi wrote:
> On Tue, Apr 26, 2022 at 04:51:06AM -0400, Emanuele Giuseppe Esposito wrote:
> > This is a new attempt to replace the need to take the AioContext lock to
> > protect graph modifications. In particular, we aim to remove
> > (or better, substitute) the AioContext around bdrv_replace_child_noperm,
> > since this function changes BlockDriverState's ->parents and ->children
> > lists.
> > 
> > In the previous version, we decided to discard using subtree_drains to
> > protect the nodes, for different reasons: for those unfamiliar with it,
> > please see 
> > https://patchew.org/QEMU/20220301142113.163174-1-eespo...@redhat.com/
> 
> I reread the thread and it's unclear to me why drain is the wrong
> mechanism for protecting graph modifications. We theorized a lot but
> ultimately is this new mechanism sufficiently different from
> bdrv_drained_begin()/end() to make it worth implementing?
> 
> Instead of invoking .drained_begin() callbacks to stop further I/O,
> we're now queuing coroutines (without backpressure information that
> whoever is spawning I/O needs so they can stop). The writer still waits
> for in-flight I/O to finish, including I/O not associated with the bdrv
> graph we wish to modify (because rdlock is per-AioContext and unrelated
> to a specific graph). Is this really more lightweight than drain?
> 
> If I understand correctly, the original goal was to avoid the need to
> hold the AioContext lock across bdrv_replace_child_noperm(). I would
> focus on that and use drain for now.
> 
> Maybe I've missed an important point about why the new mechanism is
> needed?

Ping?

Stefan




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-04 Thread Stefan Hajnoczi
On Tue, Apr 26, 2022 at 04:51:06AM -0400, Emanuele Giuseppe Esposito wrote:
> This is a new attempt to replace the need to take the AioContext lock to
> protect graph modifications. In particular, we aim to remove
> (or better, substitute) the AioContext around bdrv_replace_child_noperm,
> since this function changes BlockDriverState's ->parents and ->children
> lists.
> 
> In the previous version, we decided to discard using subtree_drains to
> protect the nodes, for different reasons: for those unfamiliar with it,
> please see 
> https://patchew.org/QEMU/20220301142113.163174-1-eespo...@redhat.com/

I reread the thread and it's unclear to me why drain is the wrong
mechanism for protecting graph modifications. We theorized a lot but
ultimately is this new mechanism sufficiently different from
bdrv_drained_begin()/end() to make it worth implementing?

Instead of invoking .drained_begin() callbacks to stop further I/O,
we're now queuing coroutines (without backpressure information that
whoever is spawning I/O needs so they can stop). The writer still waits
for in-flight I/O to finish, including I/O not associated with the bdrv
graph we wish to modify (because rdlock is per-AioContext and unrelated
to a specific graph). Is this really more lightweight than drain?

If I understand correctly, the original goal was to avoid the need to
hold the AioContext lock across bdrv_replace_child_noperm(). I would
focus on that and use drain for now.

Maybe I've missed an important point about why the new mechanism is
needed?

Stefan




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-03 Thread Stefan Hajnoczi
On Mon, May 02, 2022 at 10:02:14AM +0200, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 30/04/2022 um 07:17 schrieb Stefan Hajnoczi:
> > On Thu, Apr 28, 2022 at 11:56:09PM +0200, Emanuele Giuseppe Esposito wrote:
> >>
> >>
> >> Am 28/04/2022 um 12:45 schrieb Stefan Hajnoczi:
> >>> On Wed, Apr 27, 2022 at 08:55:35AM +0200, Emanuele Giuseppe Esposito 
> >>> wrote:
> 
> 
>  Am 26/04/2022 um 10:51 schrieb Emanuele Giuseppe Esposito:
> > Luckily, most of the cases where we recursively go through a graph are
> > the BlockDriverState callback functions in block_int-common.h
> > In order to understand what to protect, I categorized the callbacks in
> > block_int-common.h depending on the type of function that calls them:
> >
> > 1) If the caller is a generated_co_wrapper, this function must be
> >    protected by rdlock. The reason is that generated_co_wrapper creates
> >coroutines that run in the given bs AioContext, so it doesn't matter
> >if we are running in the main loop or not, the coroutine might run
> >in an iothread.
> > 2) If the caller calls it directly, and has the GLOBAL_STATE_CODE() 
> > macro,
> >then the function is safe. The main loop is the writer and thus won't
> >read and write at the same time.
> > 3) If the caller calls it directly, but has not the GLOBAL_STATE_CODE()
> >macro, then we need to check the callers and see case-by-case if the
> >caller is in the main loop, if it needs to take the lock, or delegate
> >this duty to its caller (to reduce the places where to take it).
> >
> > I used the vrc script (https://github.com/bonzini/vrc) to get help 
> > finding
> > all the callers of a callback. Using its filter function, I can
> > omit all functions protected by the added lock to avoid having 
> > duplicates
> > when querying for new callbacks.
> 
>  I was wondering, if a function is in category (3) and runs in an
>  Iothread but the function itself is not (currently) recursive, meaning
>  it doesn't really traverse the graph or calls someone that traverses it,
>  should I add the rdlock anyways or not?
> 
>  Example: bdrv_co_drain_end
> 
>  Pros:
> + Covers if in future a new recursive callback for a new/existing
>   BlockDriver is implemented.
> + Covers also the case where I or someone missed the recursive part.
> 
>  Cons:
> - Potentially introducing an unnecessary critical section.
> 
>  What do you think?
> >>>
> >>> ->bdrv_co_drain_end() is a callback function. Do you mean whether its
> >>> caller, bdrv_drain_invoke_entry(), should take the rdlock around
> >>> ->bdrv_co_drain_end()?
> >>
> >> Yes. The problem is that the coroutine is created in bs AioContext, so
> >> it might be in an iothread.
> >>
> >>>
> >>> Going up further in the call chain (and maybe switching threads),
> >>> bdrv_do_drained_end() has QLIST_FOREACH(child, &bs->children, next) so
> >>> it needs protection. If the caller of bdrv_do_drained_end() holds the
> >>> rdlock then I think none of the child functions (including
> >>> ->bdrv_co_drain_end()) need to take it explicitly.
> >>
> >> Regarding bdrv_do_drained_end and similar, they are either running in
> >> the main loop (or they will be, if coming from a coroutine) or in the
> >> iothread running the AioContext of the bs involved.
> >>
> >> I think that most of the drains except for mirror.c are coming from main
> >> loop. I protected mirror.c in patch 8, even though right now I am not
> >> really sure that what I did is necessary, since the bh will be scheduled
> >> in the main loop.
> >>
> >> Therefore we don't really need locks around drains.
> > 
> > Are you saying rdlock isn't necessary in the main loop because nothing
> > can take the wrlock while our code is executing in the main loop?
> 
> Yes, that's the idea.
> If I am not mistaken (and I hope I am not), only the main loop currently
> modifies/is allowed to modify the graph.
> 
> The only case where currently we need to take the rdlock in main loop is
> when we have the case
> 
> simplified_flush_callback(bs) {
>   for (child in bs)
>   bdrv_flush(child->bs);
> }
> 
> some_function() {
>   GLOBAL_STATE_CODE();
>   /* assume bdrv_get_aio_context(bs) != qemu_in_main_thread() */
> 
>   bdrv_flush(bs);
>   co = coroutine_create(bdrv_get_aio_context(bs))
>   qemu_coroutine_enter(co, simplified_flush_callback)
> }

Why does the main loop need to take rdlock in this case? Only the
coroutine that runs in the non-main loop AioContext iterates over
children.

Also, there are a bunch of other functions in build/block/block-gen.c
that seem to follow the same pattern, so I'm not sure why bdrv_flush()
would be the only case?

In general I'm having a hard time following this patch series and the
discussion because you aren't giving a chain of 

Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-03 Thread Kevin Wolf
Am 02.05.2022 um 10:02 hat Emanuele Giuseppe Esposito geschrieben:
> 
> 
> Am 30/04/2022 um 07:17 schrieb Stefan Hajnoczi:
> > On Thu, Apr 28, 2022 at 11:56:09PM +0200, Emanuele Giuseppe Esposito wrote:
> >>
> >>
> >> Am 28/04/2022 um 12:45 schrieb Stefan Hajnoczi:
> >>> On Wed, Apr 27, 2022 at 08:55:35AM +0200, Emanuele Giuseppe Esposito 
> >>> wrote:
> 
> 
>  Am 26/04/2022 um 10:51 schrieb Emanuele Giuseppe Esposito:
> > Luckly, most of the cases where we recursively go through a graph are
> > the BlockDriverState callback functions in block_int-common.h
> > In order to understand what to protect, I categorized the callbacks in
> > block_int-common.h depending on the type of function that calls them:
> >
> > 1) If the caller is a generated_co_wrapper, this function must be
> >protected by rdlock. The reason is that generated_co_wrapper create
> >coroutines that run in the given bs AioContext, so it doesn't matter
> >if we are running in the main loop or not, the coroutine might run
> >in an iothread.
> > 2) If the caller calls it directly, and has the GLOBAL_STATE_CODE() 
> > macro,
> >then the function is safe. The main loop is the writer and thus won't
> >read and write at the same time.
> > 3) If the caller calls it directly, but has not the GLOBAL_STATE_CODE()
> >macro, then we need to check the callers and see case-by-case if the
> >caller is in the main loop, if it needs to take the lock, or delegate
> >this duty to its caller (to reduce the places where to take it).
> >
> > I used the vrc script (https://github.com/bonzini/vrc) to get help 
> > finding
> > all the callers of a callback. Using its filter function, I can
> > omit all functions protected by the added lock to avoid having 
> > duplicates
> > when querying for new callbacks.
> 
>  I was wondering, if a function is in category (3) and runs in an
>  Iothread but the function itself is not (currently) recursive, meaning
>  it doesn't really traverse the graph or calls someone that traverses it,
>  should I add the rdlock anyways or not?
> 
>  Example: bdrv_co_drain_end
> 
>  Pros:
> + Covers if in future a new recursive callback for a new/existing
>   BlockDriver is implemented.
> + Covers also the case where I or someone missed the recursive part.
> 
>  Cons:
> - Potentially introducing an unnecessary critical section.
> 
>  What do you think?
> >>>
> >>> ->bdrv_co_drain_end() is a callback function. Do you mean whether its
> >>> caller, bdrv_drain_invoke_entry(), should take the rdlock around
> >>> ->bdrv_co_drain_end()?
> >>
> >> Yes. The problem is that the coroutine is created in bs AioContext, so
> >> it might be in an iothread.
> >>
> >>>
> >>> Going up further in the call chain (and maybe switching threads),
> >>> bdrv_do_drained_end() has QLIST_FOREACH(child, >children, next) so
> >>> it needs protection. If the caller of bdrv_do_drained_end() holds then
> >>> rdlock then I think none of the child functions (including
> >>> ->bdrv_co_drain_end()) need to take it explicitly.
> >>
> >> Regarding bdrv_do_drained_end and similar, they are either running in
> >> the main loop (or they will be, if coming from a coroutine) or in the
> >> iothread running the AioContext of the bs involved.
> >>
> >> I think that most of the drains except for mirror.c are coming from main
> >> loop. I protected mirror.c in patch 8, even though right now I am not
> >> really sure that what I did is necessary, since the bh will be scheduled
> >> in the main loop.
> >>
> >> Therefore we don't really need locks around drains.
> > 
> > Are you saying rdlock isn't necessary in the main loop because nothing
> > can take the wrlock while our code is executing in the main loop?
> 
> Yes, that's the idea.
> If I am not mistaken (and I hope I am not), only the main loop currently
> modifies/is allowed to modify the graph.

Aren't you completely ignoring coroutines in this? What would a
coroutine do that requires the graph not to change across a yield?

(It's not easily possible to protect this today, and I think this was
the source of some bugs in the past. But if we introduce some proper
locking, I would expect it to solve this.)

Kevin




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-02 Thread Paolo Bonzini

On 5/2/22 10:02, Emanuele Giuseppe Esposito wrote:

Are you saying rdlock isn't necessary in the main loop because nothing
can take the wrlock while our code is executing in the main loop?

Yes, that's the idea.
If I am not mistaken (and I hope I am not), only the main loop currently
modifies/is allowed to modify the graph.

The only case where currently we need to take the rdlock in main loop is
when we have the case

simplified_flush_callback(bs) {
for (child in bs)
bdrv_flush(child->bs);
}

some_function() {
GLOBAL_STATE_CODE();
/* assume bdrv_get_aio_context(bs) != qemu_in_main_thread() */

bdrv_flush(bs);
co = coroutine_create(bdrv_get_aio_context(bs))
qemu_coroutine_enter(co, simplified_flush_callback)
}


This is correct, but it is very unsafe as long as bdrv_flush(bs) is 
allowed to run both in coroutine context and outside.  So we go circling 
back to the same issue that was there in the stackless coroutine 
experiment, i.e. functions that can run both in coroutine context and 
outside.


This issue is fundamentally one of unclear invariants, and reminds me a 
lot of the problems with recursive mutexes.


Paolo




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-05-02 Thread Emanuele Giuseppe Esposito



Am 30/04/2022 um 07:17 schrieb Stefan Hajnoczi:
> On Thu, Apr 28, 2022 at 11:56:09PM +0200, Emanuele Giuseppe Esposito wrote:
>>
>>
>> Am 28/04/2022 um 12:45 schrieb Stefan Hajnoczi:
>>> On Wed, Apr 27, 2022 at 08:55:35AM +0200, Emanuele Giuseppe Esposito wrote:


 Am 26/04/2022 um 10:51 schrieb Emanuele Giuseppe Esposito:
> Luckily, most of the cases where we recursively go through a graph are
> the BlockDriverState callback functions in block_int-common.h
> In order to understand what to protect, I categorized the callbacks in
> block_int-common.h depending on the type of function that calls them:
>
> 1) If the caller is a generated_co_wrapper, this function must be
>    protected by rdlock. The reason is that generated_co_wrapper creates
>coroutines that run in the given bs AioContext, so it doesn't matter
>if we are running in the main loop or not, the coroutine might run
>in an iothread.
> 2) If the caller calls it directly, and has the GLOBAL_STATE_CODE() macro,
>then the function is safe. The main loop is the writer and thus won't
>read and write at the same time.
> 3) If the caller calls it directly, but has not the GLOBAL_STATE_CODE()
>macro, then we need to check the callers and see case-by-case if the
>caller is in the main loop, if it needs to take the lock, or delegate
>this duty to its caller (to reduce the places where to take it).
>
> I used the vrc script (https://github.com/bonzini/vrc) to get help finding
> all the callers of a callback. Using its filter function, I can
> omit all functions protected by the added lock to avoid having duplicates
> when querying for new callbacks.

 I was wondering: if a function is in category (3) and runs in an
 iothread, but the function itself is not (currently) recursive, meaning
 it doesn't really traverse the graph or call something that traverses it,
 should I add the rdlock anyway or not?

 Example: bdrv_co_drain_end

 Pros:
+ Covers the case where a new recursive callback for a new/existing
  BlockDriver is implemented in the future.
+ Also covers the case where I or someone else missed the recursive part.

 Cons:
- Potentially introducing an unnecessary critical section.

 What do you think?
>>>
>>> ->bdrv_co_drain_end() is a callback function. Do you mean whether its
>>> caller, bdrv_drain_invoke_entry(), should take the rdlock around
>>> ->bdrv_co_drain_end()?
>>
>> Yes. The problem is that the coroutine is created in bs AioContext, so
>> it might be in an iothread.
>>
>>>
>>> Going up further in the call chain (and maybe switching threads),
>>> bdrv_do_drained_end() has QLIST_FOREACH(child, &bs->children, next) so
>>> it needs protection. If the caller of bdrv_do_drained_end() holds the
>>> rdlock then I think none of the child functions (including
>>> ->bdrv_co_drain_end()) need to take it explicitly.
>>
>> Regarding bdrv_do_drained_end and similar, they are either running in
>> the main loop (or they will be, if coming from a coroutine) or in the
>> iothread running the AioContext of the bs involved.
>>
>> I think that most of the drains except for mirror.c are coming from main
>> loop. I protected mirror.c in patch 8, even though right now I am not
>> really sure that what I did is necessary, since the bh will be scheduled
>> in the main loop.
>>
>> Therefore we don't really need locks around drains.
> 
> Are you saying rdlock isn't necessary in the main loop because nothing
> can take the wrlock while our code is executing in the main loop?

Yes, that's the idea.
If I am not mistaken (and I hope I am not), only the main loop currently
modifies/is allowed to modify the graph.

The only case where currently we need to take the rdlock in main loop is
when we have the case

simplified_flush_callback(bs) {
for (child in bs)
bdrv_flush(child->bs);
}

some_function() {
GLOBAL_STATE_CODE();
/* assume bdrv_get_aio_context(bs) != qemu_in_main_thread() */

bdrv_flush(bs);
co = coroutine_create(bdrv_get_aio_context(bs))
qemu_coroutine_enter(co, simplified_flush_callback)
}
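
A minimal sketch of where the rdlock would end up in this case, assuming
rdlock/rdunlock primitives like the ones introduced in patch 3 (the names are
illustrative, not from the series):

static void coroutine_fn simplified_flush_callback(BlockDriverState *bs)
{
    BdrvChild *child;

    /* Reader: runs in bs's AioContext, which may be an iothread. */
    bdrv_graph_co_rdlock();
    QLIST_FOREACH(child, &bs->children, next) {
        bdrv_flush(child->bs);
    }
    bdrv_graph_co_rdunlock();
}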
> 
> Stefan
> 




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-04-29 Thread Stefan Hajnoczi
On Thu, Apr 28, 2022 at 11:56:09PM +0200, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 28/04/2022 at 12:45, Stefan Hajnoczi wrote:
> > On Wed, Apr 27, 2022 at 08:55:35AM +0200, Emanuele Giuseppe Esposito wrote:
> >>
> >>
> >> On 26/04/2022 at 10:51, Emanuele Giuseppe Esposito wrote:
> >>> Luckily, most of the cases where we recursively go through a graph are
> >>> the BlockDriverState callback functions in block_int-common.h.
> >>> In order to understand what to protect, I categorized the callbacks in
> >>> block_int-common.h depending on the type of function that calls them:
> >>>
> >>> 1) If the caller is a generated_co_wrapper, this function must be
> >>>protected by rdlock. The reason is that generated_co_wrapper creates
> >>>coroutines that run in the given bs AioContext, so it doesn't matter
> >>>if we are running in the main loop or not, the coroutine might run
> >>>in an iothread.
> >>> 2) If the caller calls it directly, and has the GLOBAL_STATE_CODE() macro,
> >>>then the function is safe. The main loop is the writer and thus won't
> >>>read and write at the same time.
> >>> 3) If the caller calls it directly, but does not have the GLOBAL_STATE_CODE()
> >>>macro, then we need to check the callers and see case-by-case if the
> >>>caller is in the main loop, if it needs to take the lock, or delegate
> >>>this duty to its caller (to reduce the places where to take it).
> >>>
> >>> I used the vrc script (https://github.com/bonzini/vrc) to get help finding
> >>> all the callers of a callback. Using its filter function, I can
> >>> omit all functions protected by the added lock to avoid having duplicates
> >>> when querying for new callbacks.
> >>
> >> I was wondering: if a function is in category (3) and runs in an
> >> iothread, but the function itself is not (currently) recursive, meaning
> >> it doesn't really traverse the graph or call something that traverses it,
> >> should I add the rdlock anyway or not?
> >>
> >> Example: bdrv_co_drain_end
> >>
> >> Pros:
> >>+ Covers the case where a new recursive callback for a new/existing
> >>  BlockDriver is implemented in the future.
> >>+ Also covers the case where I or someone else missed the recursive part.
> >>
> >> Cons:
> >>- Potentially introducing an unnecessary critical section.
> >>
> >> What do you think?
> > 
> > ->bdrv_co_drain_end() is a callback function. Do you mean whether its
> > caller, bdrv_drain_invoke_entry(), should take the rdlock around
> > ->bdrv_co_drain_end()?
> 
> Yes. The problem is that the coroutine is created in bs AioContext, so
> it might be in an iothread.
> 
> > 
> > Going up further in the call chain (and maybe switching threads),
> > bdrv_do_drained_end() has QLIST_FOREACH(child, &bs->children, next) so
> > it needs protection. If the caller of bdrv_do_drained_end() holds the
> > rdlock then I think none of the child functions (including
> > ->bdrv_co_drain_end()) need to take it explicitly.
> 
> Regarding bdrv_do_drained_end and similar, they are either running in
> the main loop (or they will be, if coming from a coroutine) or in the
> iothread running the AioContext of the bs involved.
> 
> I think that most of the drains except for mirror.c are coming from main
> loop. I protected mirror.c in patch 8, even though right now I am not
> really sure that what I did is necessary, since the bh will be scheduled
> in the main loop.
> 
> Therefore we don't really need locks around drains.

Are you saying rdlock isn't necessary in the main loop because nothing
can take the wrlock while our code is executing in the main loop?

Stefan




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-04-29 Thread Emanuele Giuseppe Esposito



On 28/04/2022 at 12:34, Stefan Hajnoczi wrote:
> On Tue, Apr 26, 2022 at 04:51:06AM -0400, Emanuele Giuseppe Esposito wrote:
>> Next step is the most complex one: figure out where to put the rdlocks.
> 
> I got a little lost at this step but will hopefully understand more when
> reading the patches. Earlier it said there is a counter for readers, so
> it seems trivial to make it a recursive lock. I don't understand why you
> want to avoid recursive locking - the complications you described stem
> from the fact that you want to avoid recursive locks?

I'll let Paolo further elaborate on this. I think it would be cleaner
if we at least try not to have recursive locks.

> 
> Secondly, can you describe the role of the lock? The text mentions the
> old AioContext lock that is being replaced but doesn't define this new
> lock in its own right. Is the idea to make ->parents and ->children list
> accesses thread-safe and all loops "safe" so they don't break when a BDS
> is removed?

Yes. Replace what AioContext lock is doing right now.

Emanuele
> 
> Stefan
> 




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-04-28 Thread Emanuele Giuseppe Esposito



On 28/04/2022 at 12:45, Stefan Hajnoczi wrote:
> On Wed, Apr 27, 2022 at 08:55:35AM +0200, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 26/04/2022 at 10:51, Emanuele Giuseppe Esposito wrote:
>>> Luckily, most of the cases where we recursively go through a graph are
>>> the BlockDriverState callback functions in block_int-common.h.
>>> In order to understand what to protect, I categorized the callbacks in
>>> block_int-common.h depending on the type of function that calls them:
>>>
>>> 1) If the caller is a generated_co_wrapper, this function must be
>>>protected by rdlock. The reason is that generated_co_wrapper creates
>>>coroutines that run in the given bs AioContext, so it doesn't matter
>>>if we are running in the main loop or not, the coroutine might run
>>>in an iothread.
>>> 2) If the caller calls it directly, and has the GLOBAL_STATE_CODE() macro,
>>>then the function is safe. The main loop is the writer and thus won't
>>>read and write at the same time.
>>> 3) If the caller calls it directly, but does not have the GLOBAL_STATE_CODE()
>>>macro, then we need to check the callers and see case-by-case if the
>>>caller is in the main loop, if it needs to take the lock, or delegate
>>>this duty to its caller (to reduce the places where to take it).
>>>
>>> I used the vrc script (https://github.com/bonzini/vrc) to get help finding
>>> all the callers of a callback. Using its filter function, I can
>>> omit all functions protected by the added lock to avoid having duplicates
>>> when querying for new callbacks.
>>
>> I was wondering: if a function is in category (3) and runs in an
>> iothread, but the function itself is not (currently) recursive, meaning
>> it doesn't really traverse the graph or call something that traverses it,
>> should I add the rdlock anyway or not?
>>
>> Example: bdrv_co_drain_end
>>
>> Pros:
>>+ Covers the case where a new recursive callback for a new/existing
>>  BlockDriver is implemented in the future.
>>+ Also covers the case where I or someone else missed the recursive part.
>>
>> Cons:
>>- Potentially introducing an unnecessary critical section.
>>
>> What do you think?
> 
> ->bdrv_co_drain_end() is a callback function. Do you mean whether its
> caller, bdrv_drain_invoke_entry(), should take the rdlock around
> ->bdrv_co_drain_end()?

Yes. The problem is that the coroutine is created in bs AioContext, so
it might be in an iothread.

> 
> Going up further in the call chain (and maybe switching threads),
> bdrv_do_drained_end() has QLIST_FOREACH(child, &bs->children, next) so
> it needs protection. If the caller of bdrv_do_drained_end() holds the
> rdlock then I think none of the child functions (including
> ->bdrv_co_drain_end()) need to take it explicitly.

Regarding bdrv_do_drained_end and similar, they are either running in
the main loop (or they will be, if coming from a coroutine) or in the
iothread running the AioContext of the bs involved.

I think that most of the drains except for mirror.c are coming from main
loop. I protected mirror.c in patch 8, even though right now I am not
really sure that what I did is necessary, since the bh will be scheduled
in the main loop.

Therefore we don't really need locks around drains.

Emanuele
> 
> Stefan
> 




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-04-28 Thread Stefan Hajnoczi
On Wed, Apr 27, 2022 at 08:55:35AM +0200, Emanuele Giuseppe Esposito wrote:
> 
> 
> > On 26/04/2022 at 10:51, Emanuele Giuseppe Esposito wrote:
> > Luckily, most of the cases where we recursively go through a graph are
> > the BlockDriverState callback functions in block_int-common.h.
> > In order to understand what to protect, I categorized the callbacks in
> > block_int-common.h depending on the type of function that calls them:
> > 
> > 1) If the caller is a generated_co_wrapper, this function must be
> >protected by rdlock. The reason is that generated_co_wrapper creates
> >coroutines that run in the given bs AioContext, so it doesn't matter
> >if we are running in the main loop or not, the coroutine might run
> >in an iothread.
> > 2) If the caller calls it directly, and has the GLOBAL_STATE_CODE() macro,
> >then the function is safe. The main loop is the writer and thus won't
> >read and write at the same time.
> > 3) If the caller calls it directly, but does not have the GLOBAL_STATE_CODE()
> >macro, then we need to check the callers and see case-by-case if the
> >caller is in the main loop, if it needs to take the lock, or delegate
> >this duty to its caller (to reduce the places where to take it).
> > 
> > I used the vrc script (https://github.com/bonzini/vrc) to get help finding
> > all the callers of a callback. Using its filter function, I can
> > omit all functions protected by the added lock to avoid having duplicates
> > when querying for new callbacks.
> 
> I was wondering: if a function is in category (3) and runs in an
> iothread, but the function itself is not (currently) recursive, meaning
> it doesn't really traverse the graph or call something that traverses it,
> should I add the rdlock anyway or not?
> 
> Example: bdrv_co_drain_end
> 
> Pros:
>+ Covers the case where a new recursive callback for a new/existing
>  BlockDriver is implemented in the future.
>+ Also covers the case where I or someone else missed the recursive part.
> 
> Cons:
>- Potentially introducing an unnecessary critical section.
> 
> What do you think?

->bdrv_co_drain_end() is a callback function. Do you mean whether its
caller, bdrv_drain_invoke_entry(), should take the rdlock around
->bdrv_co_drain_end()?

Going up further in the call chain (and maybe switching threads),
bdrv_do_drained_end() has QLIST_FOREACH(child, &bs->children, next) so
it needs protection. If the caller of bdrv_do_drained_end() holds the
rdlock then I think none of the child functions (including
->bdrv_co_drain_end()) need to take it explicitly.

Stefan




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-04-28 Thread Stefan Hajnoczi
On Tue, Apr 26, 2022 at 04:51:06AM -0400, Emanuele Giuseppe Esposito wrote:
> Next step is the most complex one: figure out where to put the rdlocks.

I got a little lost at this step but will hopefully understand more when
reading the patches. Earlier it said there is a counter for readers, so
it seems trivial to make it a recursive lock. I don't understand why you
want to avoid recursive locking - the complications you described stem
from the fact that you want to avoid recursive locks?

Secondly, can you describe the role of the lock? The text mentions the
old AioContext lock that is being replaced but doesn't define this new
lock in its own right. Is the idea to make ->parents and ->children list
accesses thread-safe and all loops "safe" so they don't break when a BDS
is removed?

Stefan




Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-04-27 Thread Emanuele Giuseppe Esposito



On 26/04/2022 at 10:51, Emanuele Giuseppe Esposito wrote:
> Luckily, most of the cases where we recursively go through a graph are
> the BlockDriverState callback functions in block_int-common.h.
> In order to understand what to protect, I categorized the callbacks in
> block_int-common.h depending on the type of function that calls them:
> 
> 1) If the caller is a generated_co_wrapper, this function must be
>protected by rdlock. The reason is that generated_co_wrapper creates
>coroutines that run in the given bs AioContext, so it doesn't matter
>if we are running in the main loop or not, the coroutine might run
>in an iothread.
> 2) If the caller calls it directly, and has the GLOBAL_STATE_CODE() macro,
>then the function is safe. The main loop is the writer and thus won't
>read and write at the same time.
> 3) If the caller calls it directly, but does not have the GLOBAL_STATE_CODE()
>macro, then we need to check the callers and see case-by-case if the
>caller is in the main loop, if it needs to take the lock, or delegate
>this duty to its caller (to reduce the places where to take it).
> 
> I used the vrc script (https://github.com/bonzini/vrc) to get help finding
> all the callers of a callback. Using its filter function, I can
> omit all functions protected by the added lock to avoid having duplicates
> when querying for new callbacks.

I was wondering: if a function is in category (3) and runs in an
iothread, but the function itself is not (currently) recursive, meaning
it doesn't really traverse the graph or call something that traverses it,
should I add the rdlock anyway or not?

Example: bdrv_co_drain_end

Pros:
   + Covers the case where a new recursive callback for a new/existing
     BlockDriver is implemented in the future.
   + Also covers the case where I or someone else missed the recursive part.

Cons:
   - Potentially introducing an unnecessary critical section.

What do you think?

Emanuele




[RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock

2022-04-26 Thread Emanuele Giuseppe Esposito
This is a new attempt to replace the need to take the AioContext lock to
protect graph modifications. In particular, we aim to remove
(or better, substitute) the AioContext lock around bdrv_replace_child_noperm,
since this function changes BlockDriverState's ->parents and ->children
lists.

In the previous version, we decided against using subtree_drains to
protect the nodes, for various reasons; for those unfamiliar with it,
please see https://patchew.org/QEMU/20220301142113.163174-1-eespo...@redhat.com/

In this new attempt, we introduce a custom rwlock implementation.
This is inspired by the cpus-common.c implementation, and aims to
reduce cacheline bouncing by keeping a per-AioContext counter of readers.
All details and the implementation of the lock are in patch 3.

We distinguish between the writer (the main loop, under BQL), which modifies
the graph, and the readers (all other coroutines running in various
AioContexts), which go through the graph edges, reading ->parents and
->children.
The writer (main loop) has exclusive access: it first waits for all
current readers to finish, and then prevents new ones from entering
while it holds that exclusive access.
The readers (coroutines in multiple AioContexts) are free to access the
graph as long as the writer is not modifying it. If it is, they go into
a CoQueue and sleep until the writer is done.
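
A minimal sketch of the intended usage pattern (the wrlock/wrunlock names are
assumed here; the reader-side guard macro is the one added in patch 7):

/* Writer: main loop, under BQL, around a graph modification. */
bdrv_graph_wrlock();     /* waits for all in-flight readers to finish */
/* ... modify the graph, e.g. bdrv_replace_child_noperm() ... */
bdrv_graph_wrunlock();   /* wakes up readers sleeping in the CoQueue */

/* Reader: any coroutine that traverses the graph edges. */
WITH_GRAPH_RDLOCK_GUARD() {
    BdrvChild *child;

    QLIST_FOREACH(child, &bs->children, next) {
        /* ->parents and ->children cannot change while we are here */
    }
}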

Patches 1 and 2 are in preparation for the lock, and fix bugs or missing
cases that popped up while implementing it.
Patches 3-7 implement and use the graph lock and its assertions.
Note that unfortunately assert_graph_readable is not very efficient,
because it iterates through the list of all AioContexts to get the total number 
of readers.
For now we will use it only for debugging purposes and to be sure all
places are properly protected, but eventually we might want to disable it.
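
Conceptually the assertion has to do something like the following (a simplified
sketch, not the actual patch 3 code; the list and field names are made up):

static uint32_t graph_total_readers(void)
{
    uint32_t readers = 0;
    AioContext *ctx;

    /*
     * Per-AioContext counters avoid cacheline bouncing on rdlock/rdunlock,
     * but the price is that the assertion has to sum all of them.
     */
    QTAILQ_FOREACH(ctx, &graph_lock_aio_contexts, graph_lock_next) {
        readers += qatomic_read(&ctx->graph_readers);
    }
    return readers;
}

void assert_graph_readable(void)
{
    /* The real check also has to account for the writer's exclusive access. */
    assert(graph_total_readers() > 0);
}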

Patch 8 is an example of usage: as you can see, it is not essential that
the right coroutine takes the rdlock, but just that it is taken
somewhere. Otherwise the writer is not aware of any readers, and can write
while others are reading.

Next step is the most complex one: figure out where to put the rdlocks.
While the wrlock is easy, the rdlock should ideally be:
- not recursive, otherwise it is not very intuitive
- not everywhere, preferably only at the top caller of recursion trees
- still able to protect ->parents and ->children reads.

Luckily, most of the cases where we recursively go through a graph are
the BlockDriverState callback functions in block_int-common.h.
In order to understand what to protect, I categorized the callbacks in
block_int-common.h depending on the type of function that calls them
(a sketch contrasting categories 1 and 2 follows the list):

1) If the caller is a generated_co_wrapper, this function must be
   protected by rdlock. The reason is that generated_co_wrapper creates
   coroutines that run in the given bs AioContext, so it doesn't matter
   if we are running in the main loop or not, the coroutine might run
   in an iothread.
2) If the caller calls it directly, and has the GLOBAL_STATE_CODE() macro,
   then the function is safe. The main loop is the writer and thus won't
   read and write at the same time.
3) If the caller calls it directly, but does not have the GLOBAL_STATE_CODE()
   macro, then we need to check the callers and see case-by-case if the
   caller is in the main loop, if it needs to take the lock, or delegate
   this duty to its caller (to reduce the places where to take it).
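
As an example of the difference between categories 1 and 2 (a sketch only; the
traversal helper is invented, GRAPH_RDLOCK_GUARD is the macro from patch 7):

/*
 * Category 2: called directly from the main loop. No rdlock needed,
 * because the main loop is the only writer and cannot race with itself.
 */
static void example_category2(BlockDriverState *bs)
{
    BdrvChild *child;

    GLOBAL_STATE_CODE();
    QLIST_FOREACH(child, &bs->children, next) {
        /* read-only use of child->bs here */
    }
}

/*
 * Category 1: reached through a generated_co_wrapper, so the coroutine
 * may run in an iothread and must hold the rdlock itself.
 */
static void coroutine_fn example_category1(BlockDriverState *bs)
{
    GRAPH_RDLOCK_GUARD();
    example_traverse_children(bs);      /* invented helper */
}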

I used the vrc script (https://github.com/bonzini/vrc) to get help finding
all the callers of a callback. Using its filter function, I can
omit all functions protected by the added lock to avoid having duplicates
when querying for new callbacks.


Emanuele Giuseppe Esposito (8):
  aio_wait_kick: add missing memory barrier
  coroutine-lock: release lock when restarting all coroutines
  block: introduce a lock to protect graph operations
  async: register/unregister aiocontext in graph lock list
  block.c: wrlock in bdrv_replace_child_noperm
  block: assert that graph read and writes are performed correctly
  graph-lock: implement WITH_GRAPH_RDLOCK_GUARD and GRAPH_RDLOCK_GUARD macros
  mirror: protect drains in coroutine with rdlock

 block.c                                 |  39 -
 block/block-backend.c                   |   6 +-
 block/graph-lock.c                      | 227 +
 block/meson.build                       |   1 +
 block/mirror.c                          |   6 +
 include/block/aio.h                     |   9 +
 include/block/block_int-common.h        |   8 +-
 include/block/block_int-global-state.h  |  17 --
 include/block/block_int.h               |   1 +
 include/block/graph-lock.h              | 101 +++
 include/qemu/coroutine.h                |  10 ++
 util/aio-wait.c                         |   3 +-
 util/async.c                            |   4 +
 util/meson.build                        |   1 +
 util/qemu-coroutine-lock.c              |  26 ++-
 15 files changed, 413 insertions(+), 46 deletions(-)
 create