From: Fiona Ebner <[email protected]>

Note that this issue seems to already be fixed in current master as a
consequence of the large io_uring rework with 047dabef97
("block/io_uring: use aio_add_sqe()"), so this is purely for the QEMU
stable branches.

At the end of ioq_submit(), there is an opportunistic call to
luring_process_completions(). This is the single caller of
luring_process_completions() that doesn't use the
luring_process_completions_and_submit() wrapper.

Other callers use the wrapper, because luring_process_completions()
might require a subsequent call to ioq_submit() after resubmitting a
request. As noted for luring_resubmit():

> Resubmit a request by appending it to submit_queue.  The caller must ensure
> that ioq_submit() is called later so that submit_queue requests are started.
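
For reference, the luring_process_completions_and_submit() wrapper on
the stable branches boils down to something like the following
(paraphrased sketch, not a verbatim quote):

> static void luring_process_completions_and_submit(LuringState *s)
> {
>     /* Reap available completions; this may queue resubmissions. */
>     luring_process_completions(s);
>
>     /* Honor the contract: start anything that was (re)queued. */
>     if (s->io_q.in_queue > 0) {
>         ioq_submit(s);
>     }
> }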

So the caller at the end of ioq_submit() violates the contract and can
in fact be problematic if no other requests come in later. In such a
case, the request intended to be resubmitted will never actually be
submitted via io_uring_submit().
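
For context, the pre-patch tail of ioq_submit() (also visible in the
diff below) does the opportunistic processing without any follow-up
submission:

>     if (s->io_q.in_flight) {
>         /*
>          * We can try to complete something just right away if there are
>          * still requests in-flight.
>          */
>         luring_process_completions(s);
>     }
>     return ret;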

A reproducer exposing this issue is [0], which is based on user
reports from [1]. Another reproducer is iotest 109 with '-i io_uring'.

I had the most success triggering the issue with [0] when using BTRFS
RAID 1 storage. With tmpfs, it can take quite a few iterations, but it
eventually triggers on my machine as well. With iotest 109 and '-i
io_uring', the issue triggers reliably on my ext4 file system.

Have ioq_submit() submit any resubmitted requests after calling
luring_process_completions(). The return value from io_uring_submit()
is checked to be non-negative before opportunistically processing
completions and entering the new resubmit logic, to ensure that a
failure of io_uring_submit() is not missed. Also note that the return
value was already not necessarily the total number of submissions,
since the loop might have been iterated more than once even before
this change.

Only trigger the resubmission logic when it is actually necessary, to
avoid changing behavior more than needed. For example, iotest 109
would produce additional 'mirror ready' events if resubmission always
happened after luring_process_completions() at the end of
ioq_submit().

Note that iotest 109 still does not pass as-is when run with '-i
io_uring', because the offset values of two BLOCK_JOB_COMPLETED events
are zero instead of non-zero as in the expected output. The two
affected test cases are expected failures and still fail, so they
merely fail "faster". These test cases do not actually trigger the
resubmit logic, so the reason seems to be a different ordering of
requests and completions in the current aio=io_uring implementation
compared to aio=threads.

[0]:

> #!/bin/bash -e
> #file=/mnt/btrfs/disk.raw
> file=/tmp/disk.raw
> filesize=256
> readsize=512
> rm -f $file
> truncate -s $filesize $file
> ./qemu-system-x86_64 --trace '*uring*' --qmp stdio \
> --blockdev raw,node-name=node0,file.driver=file,file.cache.direct=off,file.filename=$file,file.aio=io_uring \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "human-monitor-command", "arguments": { "command-line": "qemu-io 
> node0 \"read 0 $readsize \"" }}
> {"execute": "quit"}
> EOF

[1]: https://forum.proxmox.com/threads/170045/

Cc: [email protected]
Signed-off-by: Fiona Ebner <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

diff --git a/block/io_uring.c b/block/io_uring.c
index dd4f304910..5dbafc8f7b 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -120,11 +120,14 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
  * event loop.  When there are no events left  to complete the BH is being
  * canceled.
  *
+ * Returns whether ioq_submit() must be called again afterwards since requests
+ * were resubmitted via luring_resubmit().
  */
-static void luring_process_completions(LuringState *s)
+static bool luring_process_completions(LuringState *s)
 {
     struct io_uring_cqe *cqes;
     int total_bytes;
+    bool resubmit = false;
 
     defer_call_begin();
 
@@ -182,6 +185,7 @@ static void luring_process_completions(LuringState *s)
              */
             if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
+                resubmit = true;
                 continue;
             }
         } else if (!luringcb->qiov) {
@@ -194,6 +198,7 @@ static void luring_process_completions(LuringState *s)
             if (luringcb->is_read) {
                 if (ret > 0) {
                     luring_resubmit_short_read(s, luringcb, ret);
+                    resubmit = true;
                     continue;
                 } else {
                     /* Pad with zeroes */
@@ -224,6 +229,8 @@ end:
     qemu_bh_cancel(s->completion_bh);
 
     defer_call_end();
+
+    return resubmit;
 }
 
 static int ioq_submit(LuringState *s)
@@ -231,6 +238,7 @@ static int ioq_submit(LuringState *s)
     int ret = 0;
     LuringAIOCB *luringcb, *luringcb_next;
 
+resubmit:
     while (s->io_q.in_queue > 0) {
         /*
          * Try to fetch sqes from the ring for requests waiting in
@@ -260,12 +268,14 @@ static int ioq_submit(LuringState *s)
     }
     s->io_q.blocked = (s->io_q.in_queue > 0);
 
-    if (s->io_q.in_flight) {
+    if (ret >= 0 && s->io_q.in_flight) {
         /*
          * We can try to complete something just right away if there are
          * still requests in-flight.
          */
-        luring_process_completions(s);
+        if (luring_process_completions(s)) {
+            goto resubmit;
+        }
     }
     return ret;
 }
-- 
2.47.3

