> From: Kevin Wolf [mailto:kw...@redhat.com] > Am 30.11.2018 um 08:55 hat Pavel Dovgalyuk geschrieben: > > > From: Kevin Wolf [mailto:kw...@redhat.com] > > > Am 10.10.2018 um 15:33 hat Pavel Dovgalyuk geschrieben: > > > > In record/replay mode bdrv queue is controlled by replay mechanism. > > > > It does not allow saving or loading the snapshots > > > > when bdrv queue is not empty. Stopping the VM is not blocked by nonempty > > > > queue, but flushing the queue is still impossible there, > > > > because it may cause deadlocks in replay mode. > > > > This patch disables bdrv_drain_all and bdrv_flush_all in > > > > record/replay mode. > > > > > > > > Signed-off-by: Pavel Dovgalyuk <pavel.dovga...@ispras.ru> > > > > > > Disabling bdrv_flush_all() shouldn't make a big difference in replay > > > mode, so I agree with that. > > > > > > But does the bdrv_drain_all() change mean that bdrv_drain_all_begin() > > > can return while there are still requests in flight? This sounds like > > > something that could break some existing code, because the meaning of > > > the function is that if it returns success, no requests are in flight > > > any more. > > > > > > What would move the request forward in the replay case once it has been > > > created? Does this also involve some user interaction? Or may we process > > > the next event in a drain callback of blkreplay or something? > > > > You are right - there are some operation that can't be performed at any > > time during the replay, because of unfinished block requests. > > > > One of these is creating the VM snapshot. I've added a check in another > > patch, which prevents snapshotting while the event queue (which holds > > the block requests) is not empty. > > > > There could also be other things, that will be fixed when we try to use it. > > > > Moving replay forward until the queue is empty is not a good idea, because > > step-by-step debugging should be available and work without skipping the > > instructions. > > So if the idea is that in replay mode, bdrv_drain_all_begin() is only > called when there are no requests in flight anyway (because callers > catch this situation earler), can we make this an assertion that the > block device is already quiesced?
Maybe it's behavior changed in the latest version? We observed that bdrv_drain_all hangs, when there is a block request in the queue. Therefore we decided that stopping with non-empty queue is better than skipping some steps. > Skipping drain without knowing that there are no requests in flight > feels simply wrong, and I'm almost sure it will result in bugs. On the > other hand, if we know that no requests are in flight, there's no real > point in skipping. What bugs can happen? Pavel Dovgalyuk