1. Debug bdrv_drain_all() and find out whether there are any I/O
    requests remaining.

I believe this is what happens:

Context 1:
- commit_one_iteration makes a write request (req A)
- request A is handed to the I/O thread, qemu_coroutine_yield() is called

Context 2:
- the VM makes a write request (req B)
- request B is inserted into bs->tracked_requests
- request B is handed to the I/O thread, qemu_coroutine_yield() is called

- request A completes; the bdrv_co_io_em notification is called and control jumps into context 1

- meanwhile, request B completes while the main thread is still executing context 1

Context 1:
- calls bdrv_drain_all
- calls bdrv_requests_pending_all, which returns true because bs->tracked_requests is not empty (it still contains req B)
- calls aio_poll, which hangs: req B has already completed, but its notification has not been delivered yet. (I'm not sure about this part, but it hangs forever for some reason...)
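
To make the suspected sequence concrete, here is a toy model in plain C. It is only a sketch and not QEMU code: ToyReq, ToyBDS, toy_requests_pending(), toy_poll() and toy_drain() are hypothetical names, and the notify_blocked flag is an assumption standing in for "the completion notification cannot be delivered", whose real cause I have not confirmed. The real aio_poll would block rather than spin, so the loop here is bounded to make the stuck case observable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not the QEMU block layer. */
enum { MAX_REQS = 4 };

typedef struct {
    bool tracked;        /* still on the tracked-request list */
    bool io_done;        /* the I/O thread has finished the request */
    bool notify_blocked; /* ASSUMPTION: completion callback cannot run */
} ToyReq;

typedef struct {
    ToyReq reqs[MAX_REQS];
    size_t nreqs;
} ToyBDS;

/* Plays the role of bdrv_requests_pending_all(): any tracked request left? */
static bool toy_requests_pending(const ToyBDS *bs)
{
    for (size_t i = 0; i < bs->nreqs; i++) {
        if (bs->reqs[i].tracked) {
            return true;
        }
    }
    return false;
}

/* Plays the role of the poll step: run every completion callback that can
 * be dispatched, which untracks the request. Returns true on progress. */
static bool toy_poll(ToyBDS *bs)
{
    bool progress = false;
    for (size_t i = 0; i < bs->nreqs; i++) {
        ToyReq *r = &bs->reqs[i];
        if (r->tracked && r->io_done && !r->notify_blocked) {
            r->tracked = false;
            progress = true;
        }
    }
    return progress;
}

/* Bounded drain loop. Returns true once nothing is tracked any more,
 * false if max_iterations polls did not empty the list. */
static bool toy_drain(ToyBDS *bs, int max_iterations)
{
    for (int i = 0; i < max_iterations; i++) {
        if (!toy_requests_pending(bs)) {
            return true;
        }
        toy_poll(bs);
    }
    return !toy_requests_pending(bs);
}
```

With a single request that has tracked, io_done and notify_blocked all set (req B in the sequence above), toy_drain() never empties the list and returns false; with notify_blocked cleared it returns true after one poll.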

This is based on traces and debug prints I collected.

I've made a patch that moves bdrv_drop_intermediate() into a separate bottom half, and I couldn't reproduce the hang after this. But it probably affects mirror_run as well, so I don't know if this is an acceptable solution for you.
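
For illustration only (this is not the patch itself), the deferral idea can be modelled in plain C as follows. ToyMainLoop, ToyJob, toy_bh_schedule(), toy_main_loop_iterate(), toy_drop_intermediate() and toy_req_a_complete() are all hypothetical names; the sketch assumes that the point of the bottom half is to run the drop step from the main loop instead of from inside request A's completion callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a hypothetical bottom-half queue, not QEMU's QEMUBH. */
typedef void ToyBHFunc(void *opaque);

typedef struct {
    ToyBHFunc *cb;
    void *opaque;
} ToyBH;

enum { MAX_BH = 8 };

typedef struct {
    ToyBH q[MAX_BH];
    size_t n;
} ToyMainLoop;

typedef struct {
    bool in_completion_cb; /* currently inside a completion callback */
    bool dropped;          /* the toy drop step has run */
    bool ran_nested;       /* the drop step ran inside a callback */
} ToyJob;

/* Queue a callback for the next main loop iteration. */
static bool toy_bh_schedule(ToyMainLoop *loop, ToyBHFunc *cb, void *opaque)
{
    if (loop->n == MAX_BH) {
        return false;
    }
    loop->q[loop->n++] = (ToyBH){ cb, opaque };
    return true;
}

/* Run every callback queued so far; returns how many were run. */
static size_t toy_main_loop_iterate(ToyMainLoop *loop)
{
    ToyBH batch[MAX_BH];
    size_t n = loop->n;

    for (size_t i = 0; i < n; i++) {
        batch[i] = loop->q[i];
    }
    loop->n = 0; /* callbacks may schedule new bottom halves */
    for (size_t i = 0; i < n; i++) {
        batch[i].cb(batch[i].opaque);
    }
    return n;
}

/* Stand-in for the step that needs a full drain. */
static void toy_drop_intermediate(void *opaque)
{
    ToyJob *job = opaque;
    job->ran_nested = job->in_completion_cb;
    job->dropped = true;
}

/* Completion of req A: schedule the drop step instead of calling it. */
static void toy_req_a_complete(ToyMainLoop *loop, ToyJob *job)
{
    job->in_completion_cb = true;
    toy_bh_schedule(loop, toy_drop_intermediate, job);
    job->in_completion_cb = false;
}
```

After toy_req_a_complete() returns, the drop step has not run yet; it runs on the next toy_main_loop_iterate(), outside the completion callback, so ran_nested stays false.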

--
mg
