flush bdrv queue while RR is working

Pavel Dovgalyuk Fri, 30 Nov 2018 02:51:51 -0800

> From: Kevin Wolf [mailto:kw...@redhat.com]
> Am 30.11.2018 um 08:55 hat Pavel Dovgalyuk geschrieben:
> > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > Am 10.10.2018 um 15:33 hat Pavel Dovgalyuk geschrieben:
> > > > In record/replay mode bdrv queue is controlled by replay mechanism.
> > > > It does not allow saving or loading the snapshots
> > > > when bdrv queue is not empty. Stopping the VM is not blocked by nonempty
> > > > queue, but flushing the queue is still impossible there,
> > > > because it may cause deadlocks in replay mode.
> > > > This patch disables bdrv_drain_all and bdrv_flush_all in
> > > > record/replay mode.
> > > >
> > > > Signed-off-by: Pavel Dovgalyuk <pavel.dovga...@ispras.ru>
> > >
> > > Disabling bdrv_flush_all() shouldn't make a big difference in replay
> > > mode, so I agree with that.
> > >
> > > But does the bdrv_drain_all() change mean that bdrv_drain_all_begin()
> > > can return while there are still requests in flight? This sounds like
> > > something that could break some existing code, because the meaning of
> > > the function is that if it returns success, no requests are in flight
> > > any more.
> > >
> > > What would move the request forward in the replay case once it has been
> > > created? Does this also involve some user interaction? Or may we process
> > > the next event in a drain callback of blkreplay or something?
> >
> > You are right - there are some operation that can't be performed at any
> > time during the replay, because of unfinished block requests.
> >
> > One of these is creating the VM snapshot. I've added a check in another
> > patch, which prevents snapshotting while the event queue (which holds
> > the block requests) is not empty.
> >
> > There could also be other things, that will be fixed when we try to use it.
> >
> > Moving replay forward until the queue is empty is not a good idea, because
> > step-by-step debugging should be available and work without skipping the 
> > instructions.
> 
> So if the idea is that in replay mode, bdrv_drain_all_begin() is only
> called when there are no requests in flight anyway (because callers
> catch this situation earler), can we make this an assertion that the
> block device is already quiesced?


Maybe it's behavior changed in the latest version?
We observed that bdrv_drain_all hangs, when there is a block request in
the queue. Therefore we decided that stopping with non-empty queue
is better than skipping some steps.

> Skipping drain without knowing that there are no requests in flight
> feels simply wrong, and I'm almost sure it will result in bugs. On the
> other hand, if we know that no requests are in flight, there's no real
> point in skipping.

What bugs can happen?

Pavel Dovgalyuk

Re: [Qemu-devel] [PATCH v7 04/19] replay: don't drain/flush bdrv queue while RR is working

Reply via email to