Hi,

See patch 1 for a detailed explanation of the problem.

The gist is: Draining a READY job makes it transition to STANDBY, and
jobs on STANDBY cannot be completed. Ending the drained section will
schedule the job (so it is then resumed), but does not wait until it is
actually running again.

Therefore, block-job-complete can fail when it is issued right after
some draining operation.

I tried to come up with an iotest reproducer, but in the end I only got
something that reproduced the issue about 2 out of 10 times, and it
required heavy I/O, so it is nothing I would like to have as part of the
iotests. Instead, I opted for a unit test, which allows me to cheat a
bit (specifically, locking the job IO thread before ending the drained
section).


Max Reitz (3):
  job: Add job_wait_unpaused() for block-job-complete
  test-blockjob: Test job_wait_unpaused()
  iotests/041: block-job-complete on user-paused job

 include/qemu/job.h         |  15 ++++
 blockdev.c                 |   3 +
 job.c                      |  42 +++++++++++
 tests/unit/test-blockjob.c | 140 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/041     |  13 +++-
 5 files changed, 212 insertions(+), 1 deletion(-)

-- 
2.29.2