On 14.06.19 11:22, Vladimir Sementsov-Ogievskiy wrote:
> 13.06.2019 19:03, Max Reitz wrote:
>> [re-adding the original CCs, why not]
>>
>> On 13.06.19 16:30, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.06.2019 17:21, Max Reitz wrote:
>>>> On 13.06.19 16:19, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 13.06.2019 1:08, Max Reitz wrote:
>>>>>> If the main loop cancels all block jobs while the block layer is not
>>>>>> drained, this cancelling may not happen instantaneously.  We can start
>>>>>> a drained section before vm_shutdown(), which entails another
>>>>>> bdrv_drain_all(); this nested bdrv_drain_all() will thus be a no-op,
>>>>>> basically.
>>>>>>
>>>>>> We do not have to end the drained section, because we actually do not
>>>>>> want any requests to happen from this point on.
>>>>>>
>>>>>> Signed-off-by: Max Reitz <[email protected]>
>>>>>> ---
>>>>>> I don't know whether it actually makes sense to never end this drained
>>>>>> section.  It makes sense to me.  Please correct me if I'm wrong.
>>>>>> ---
>>>>>>  vl.c | 11 +++++++++++
>>>>>>  1 file changed, 11 insertions(+)
>>>>>>
>>>>>> diff --git a/vl.c b/vl.c
>>>>>> index cd1fbc4cdc..3f8b3f74f5 100644
>>>>>> --- a/vl.c
>>>>>> +++ b/vl.c
>>>>>> @@ -4538,6 +4538,17 @@ int main(int argc, char **argv, char **envp)
>>>>>>       */
>>>>>>      migration_shutdown();
>>>>>>
>>>>>> +    /*
>>>>>> +     * We must cancel all block jobs while the block layer is drained,
>>>>>> +     * or cancelling will be affected by throttling and thus may block
>>>>>> +     * for an extended period of time.
>>>>>> +     * vm_shutdown() will bdrv_drain_all(), so we may as well include
>>>>>> +     * it in the drained section.
>>>>>> +     * We do not need to end this section, because we do not want any
>>>>>> +     * requests happening from here on anyway.
>>>>>> +     */
>>>>>> +    bdrv_drain_all_begin();
>>>>>> +
>>>>>>      /* No more vcpu or device emulation activity beyond this point */
>>>>>>      vm_shutdown();
>>>>>>
>>>>>>
>>>>>
>>>>> So, actually, the problem is that we may wait for job requests twice:
>>>>> on drain and then on cancel.
>>>>
>>>> We don’t wait on drain.  When the throttle node is drained, it will
>>>> ignore throttling (as noted in the cover letter).
>>>>
>>>> We do wait when cancelling a job while the throttle node isn’t drained,
>>>> though.  That’s the problem.
>>>
>>> Ah, understand now.
>>>
>>> Is it safe to drain_begin before stopping cpus?  We may then finish up
>>> with some I/O requests still queued somewhere..
>>
>> Hm...  Aren’t guest devices prohibited from issuing requests to the
>> block layer while their respective block device is drained?
>
> It's at least a buggy place; I remember Denis Plotnikov sent a patch to
> fix it and had a huge discussion with Kevin.
> And here it is:
> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00732.html

Ah, I even have that in my inbox...

The latest reply I see came in April, where Kevin asked for an RFC patch
in the current state:
https://lists.nongnu.org/archive/html/qemu-block/2019-04/msg00243.html

I’m not sure whether I should work around a potential bug here, when we
haven’t even agreed that it is a bug, and when it isn’t clear whether
this place would actually be affected.

Max
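
For reference, here is a compact sketch of the shutdown ordering discussed
above, using only the function names that appear in the quoted diff
(an illustrative recap of the intent, not a verbatim excerpt from vl.c):

    migration_shutdown();

    /*
     * Start a global drained section first.  While a throttle node is
     * drained it ignores its limits, so the block job cancellation done
     * later during shutdown cannot block on throttled requests.
     */
    bdrv_drain_all_begin();

    /*
     * vm_shutdown() calls bdrv_drain_all() itself; inside the drained
     * section that nested drain is basically a no-op.
     * No more vcpu or device emulation activity beyond this point.
     */
    vm_shutdown();

    /*
     * The drained section is deliberately never ended: no further block
     * layer requests are wanted for the rest of the shutdown path.
     */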
