On Mon, Apr 17, 2017 at 04:27:19PM +0800, Fam Zheng wrote: > On Fri, 04/14 09:51, Stefan Hajnoczi wrote: > > On Fri, Apr 14, 2017 at 9:02 AM, Fam Zheng <[email protected]> wrote: > > > @@ -398,11 +399,15 @@ void bdrv_drain_all(void); > > > */ \ > > > assert(!bs_->wakeup); \ > > > bs_->wakeup = true; \ > > > - while ((cond)) { \ > > > - aio_context_release(ctx_); \ > > > - aio_poll(qemu_get_aio_context(), true); \ > > > - aio_context_acquire(ctx_); \ > > > - waited_ = true; \ > > > + while (busy_) { \ > > > + if ((cond)) { \ > > > + waited_ = busy_ = true; \ > > > + aio_context_release(ctx_); \ > > > + aio_poll(qemu_get_aio_context(), true); \ > > > + aio_context_acquire(ctx_); \ > > > + } else { \ > > > + busy_ = aio_poll(ctx_, false); \ > > > + } \ > > > > Wait, I'm confused. The current thread is not in the BDS AioContext. > > We're not allowed to call aio_poll(ctx_, false). > > > > Did you mean aio_poll(qemu_get_aio_context(), false) in order to > > process defer to main loop BHs? > > At this point it's even unclear to me what should be the plan for 2.9. v1 IMO > was the least intrusive, but didn't cover bdrv_drain_all_begin. v2 has this > controversial "aio_poll(ctx_, false)", however its alternative, > "aio_poll(qemu_get_aio_context(), false)", "introduces" another crash that is > not seen otherwise. > > What should we do now? > > Fam >
I think the priority is fixing the regression, which v1 does. Is there a regression lurking in the bdrv_drain_all() path? I've not been able to find one. If there is no regressive path through bdrv_drain_all(), my vote would be the simplest, least intrusive patch that fixes the current regression, and I think that is v1. Then we can fix everything correctly, and more comprehensively, for 2.10. -Jeff
