On Mon, Aug 02, 2021 at 02:40:36PM +0200, Kevin Wolf wrote:
On 29.07.2021 at 11:10, Fabian Ebner wrote:
Linux SCSI can throw spurious -EAGAIN in some corner cases in its
completion path, which will end up being the result in the completed
io_uring request.

Resubmitting such requests should allow block jobs to complete, even
if such spurious errors are encountered.

Co-authored-by: Stefan Hajnoczi <stefa...@gmail.com>
Reviewed-by: Stefano Garzarella <sgarz...@redhat.com>
Signed-off-by: Fabian Ebner <f.eb...@proxmox.com>
---

Changes from v1:
    * Focus on what's relevant for the patch itself in the commit
      message.
    * Add Stefan's comment.
    * Add Stefano's R-b tag (I hope that's fine, since there was no
      change code-wise).

 block/io_uring.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index 00a3ee9fb8..dfa475cc87 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -165,7 +165,21 @@ static void luring_process_completions(LuringState *s)
         total_bytes = ret + luringcb->total_read;

         if (ret < 0) {
-            if (ret == -EINTR) {
+            /*
+             * Only writev/readv/fsync requests on regular files or host block
+             * devices are submitted. Therefore -EAGAIN is not expected but it's
+             * known to happen sometimes with Linux SCSI. Submit again and hope
+             * the request completes successfully.
+             *
+             * For more information, see:
+             * https://lore.kernel.org/io-uring/20210727165811.284510-3-ax...@kernel.dk/T/#u
+             *
+             * If the code is changed to submit other types of requests in the
+             * future, then this workaround may need to be extended to deal with
+             * genuine -EAGAIN results that should not be resubmitted
+             * immediately.
+             */
+            if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
                 continue;
             }

Reviewed-by: Kevin Wolf <kw...@redhat.com>

Question about the preexisting code, though: luring_resubmit() requires
that the caller calls ioq_submit() later so that the request doesn't
just sit in a queue without getting any attention, but actually gets
submitted to the kernel.

In the call chain ioq_submit() -> luring_process_completions() ->
luring_resubmit(), who takes care of that?

Mmm, good point.
The same problem should also exist in the call chain ioq_submit() -> luring_process_completions() -> luring_resubmit_short_read() -> luring_resubmit().
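To make the ordering problem concrete, here is a minimal C model of it. This is not QEMU code: `ModelQueue`, `model_resubmit()`, `model_flush()`, `model_process_completions()` and `model_submit()` are hypothetical stand-ins that loosely follow the description above, namely that luring_resubmit() only queues a request and a later ioq_submit() is what hands it to the kernel. Requests are plain integers, and "submitting" just moves them from a pending count to a submitted count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model, not QEMU code: requests are plain ints. */
#define MODEL_QUEUE_MAX 16

typedef struct {
    int pending[MODEL_QUEUE_MAX]; /* queued, not yet handed to the "kernel" */
    size_t n_pending;
    size_t n_submitted;           /* total requests handed to the "kernel" */
} ModelQueue;

/* Like luring_resubmit() as described in the thread: it only enqueues. */
static void model_resubmit(ModelQueue *q, int req)
{
    assert(q->n_pending < MODEL_QUEUE_MAX);
    q->pending[q->n_pending++] = req;
}

/* Stand-in for the submission step: hands over everything pending. */
static void model_flush(ModelQueue *q)
{
    q->n_submitted += q->n_pending;
    q->n_pending = 0;
}

/* Requeues every completion that failed with -EINTR or -EAGAIN. */
static void model_process_completions(ModelQueue *q, const int *results,
                                      size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (results[i] == -EINTR || results[i] == -EAGAIN) {
            model_resubmit(q, (int)i);
        }
    }
}

/* Mirrors the call chain's ordering: submit first, then reap completions. */
static void model_submit(ModelQueue *q, const int *results, size_t n)
{
    model_flush(q);
    model_process_completions(q, results, n);
    /* Nothing flushes again here, so requeued requests stay pending. */
}
```

With a fresh queue, calling `model_submit()` with completion results `{0, -EAGAIN, -EINTR}` leaves two requests pending: they were requeued during the completion step, after the only flush had already run.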

Should we, for example, schedule a BH in luring_resubmit() to make sure that ioq_submit() is invoked after a resubmission?
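One way to read the BH suggestion, sketched as a self-contained C model rather than QEMU code: resubmission sets a "flush scheduled" flag, and a deferred callback, run later by an event loop, flushes the queue. `DeferredQueue`, `dq_resubmit()`, `dq_run_deferred()` and the other names are hypothetical. The flag only stands in for scheduling a bottom half, and the event loop that would call `dq_run_deferred()` is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not QEMU code. */
#define DQ_MAX 16

typedef struct {
    int pending[DQ_MAX];
    size_t n_pending;
    size_t n_submitted;
    bool flush_scheduled; /* stands in for a scheduled one-shot BH */
} DeferredQueue;

static void dq_flush(DeferredQueue *q)
{
    q->n_submitted += q->n_pending;
    q->n_pending = 0;
}

/* Enqueue, and guarantee that a later flush will happen. */
static void dq_resubmit(DeferredQueue *q, int req)
{
    assert(q->n_pending < DQ_MAX);
    q->pending[q->n_pending++] = req;
    q->flush_scheduled = true; /* idempotent: scheduling twice is harmless */
}

static void dq_process_completions(DeferredQueue *q, const int *results,
                                   size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (results[i] == -EINTR || results[i] == -EAGAIN) {
            dq_resubmit(q, (int)i);
        }
    }
}

/* Same ordering as before: submit first, then reap completions. */
static void dq_submit(DeferredQueue *q, const int *results, size_t n)
{
    dq_flush(q);
    dq_process_completions(q, results, n);
}

/* Assumed to be called by an event loop after the current callback returns. */
static void dq_run_deferred(DeferredQueue *q)
{
    if (q->flush_scheduled) {
        q->flush_scheduled = false;
        dq_flush(q);
    }
}
```

Because setting the flag is idempotent, several resubmissions in one completion pass cost a single deferred flush. In a real implementation the deferred callback would presumably run the whole submit path, which may reap further completions and schedule itself again; the model keeps only the flush.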

Thanks,
Stefano

