Hi, On 2025-03-25 13:18:50 -0700, Noah Misch wrote: > On Tue, Mar 25, 2025 at 04:07:35PM -0400, Andres Freund wrote: > > On 2025-03-25 12:39:56 -0700, Noah Misch wrote: > > > On Tue, Mar 25, 2025 at 02:58:37PM -0400, Andres Freund wrote: > > > > There are 2 1/2 ways around this: > > > > > > > > 1) Stop using IOSQE_ASYNC heuristic > > > > 2a) Wait for all in-flight IOs when any FD gets closed > > > > 2b) Wait for all in-flight IOs using FD when it gets closed > > > > > > > > Given that we have clear evidence that io_uring doesn't completely > > > > support > > > > closing FDs while IOs are in flight, be it a bug or intentional, it > > > > seems > > > > clearly better to go for 2a or 2b. > > > > > > Agreed. If a workload spends significant time on fd.c closing files, I > > > suspect that workload already won't have impressive benchmark numbers. > > > Performance-seeking workloads will already want to tune FD usage high > > > enough > > > to keep FDs long-lived. So (1) clearly loses, and neither (2a) nor (2b) > > > clearly beats the other. I'd try (2b) first but, if complicated, quickly > > > abandon it in favor of (2a). What other considerations could be > > > important? > > > > The only other consideration I can think of is whether this should happen > > for > > all io_methods or not. > > Either way is fine, I think.
Here's a draft incremental patch (attached as a .fixup to avoid triggering cfbot) implementing 2b). > > I'm inclined to do it via a bool in IoMethodOps, but I guess one could argue > > it's a bit weird to have a bool in a struct called *Ops. > > That wouldn't bother me. IndexAmRoutine has many bools, and "Ops" is > basically a synonym of "Routine". Cool. Done that way. The repeated-iteration approach taken in pgaio_closing_fd() isn't the prettiest, but it's hard to to imagine that ever being a noticeable. This survives a testrun where I use your torture patch and where I force all IOs to use ASYNC. Previously that did not get very far. I also did verify that, if I allow a small number of FDs, we do not wrongly wait for all IOs. Greetings, Andres Freund
commit 290de30c0e4a87ed3bd6cdef478376d39a922912 Author: Andres Freund <and...@anarazel.de> Date: 2025-03-25 16:50:27 -0400 wip: aio: io_uring wait for in-flight IOs Author: Reviewed-by: Discussion: https://postgr.es/m/ Backpatch: diff --git a/src/include/storage/aio_internal.h b/src/include/storage/aio_internal.h index a46d4393ebc..352610f60a8 100644 --- a/src/include/storage/aio_internal.h +++ b/src/include/storage/aio_internal.h @@ -287,6 +287,15 @@ typedef struct PgAioCtl */ typedef struct IoMethodOps { + /* properties */ + + /* + * If an FD is about to be closed, do we need to wait for all in-flight + * IOs referencing that FD? + */ + bool wait_on_fd_before_close; + + /* global initialization */ /* @@ -368,6 +377,7 @@ extern PgAioResult pgaio_io_call_complete_local(PgAioHandle *ioh); /* aio_io.c */ extern void pgaio_io_perform_synchronously(PgAioHandle *ioh); extern const char *pgaio_io_get_op_name(PgAioHandle *ioh); +extern bool pgaio_io_uses_fd(PgAioHandle *ioh, int fd); /* aio_target.c */ extern bool pgaio_io_can_reopen(PgAioHandle *ioh); diff --git a/src/backend/storage/aio/aio.c b/src/backend/storage/aio/aio.c index c512495dce3..8816ccfa0d7 100644 --- a/src/backend/storage/aio/aio.c +++ b/src/backend/storage/aio/aio.c @@ -1303,6 +1303,41 @@ pgaio_closing_fd(int fd) * it's probably not worth it. */ pgaio_submit_staged(); + + /* + * If requested by the IO method, wait for all IOs that use the + * to-be-closed FD. + */ + if (pgaio_method_ops->wait_on_fd_before_close) + { + /* + * As waiting for one IO to complete may complete multiple IOs, we + * can't just use a mutable list iterator. The maximum number of + * in-flight IOs is fairly small, so just restart the loop after + * waiting for an IO. + */ + while (!dclist_is_empty(&pgaio_my_backend->in_flight_ios)) + { + dlist_iter iter; + PgAioHandle *ioh = NULL; + + dclist_foreach(iter, &pgaio_my_backend->in_flight_ios) + { + ioh = dclist_container(PgAioHandle, node, iter.cur); + + if (pgaio_io_uses_fd(ioh, fd)) + break; + else + ioh = NULL; + } + + if (!ioh) + break; + + /* see comment in pgaio_io_wait_for_free() about raciness */ + pgaio_io_wait(ioh, ioh->generation); + } + } } /* diff --git a/src/backend/storage/aio/aio_io.c b/src/backend/storage/aio/aio_io.c index 97930d687c2..108021fa377 100644 --- a/src/backend/storage/aio/aio_io.c +++ b/src/backend/storage/aio/aio_io.c @@ -186,3 +186,25 @@ pgaio_io_get_op_name(PgAioHandle *ioh) return NULL; /* silence compiler */ } + +/* + * Used to determine if an IO needs to be waited upon before the file + * descriptor can be closed. + */ +bool +pgaio_io_uses_fd(PgAioHandle *ioh, int fd) +{ + Assert(ioh->state >= PGAIO_HS_DEFINED); + + switch (ioh->op) + { + case PGAIO_OP_READV: + return ioh->op_data.read.fd == fd; + case PGAIO_OP_WRITEV: + return ioh->op_data.write.fd == fd; + case PGAIO_OP_INVALID: + return false; + } + + return false; /* silence compiler */ +} diff --git a/src/backend/storage/aio/method_io_uring.c b/src/backend/storage/aio/method_io_uring.c index aab09793684..3b299dcf388 100644 --- a/src/backend/storage/aio/method_io_uring.c +++ b/src/backend/storage/aio/method_io_uring.c @@ -58,6 +58,15 @@ static void pgaio_uring_sq_from_io(PgAioHandle *ioh, struct io_uring_sqe *sqe); const IoMethodOps pgaio_uring_ops = { + /* + * While io_uring mostly is OK with FDs getting closed while the IO is in + * flight, that is not true for IOs submitted with IOSQE_ASYNC. + * + * See + * https://postgr.es/m/5ons2rtmwarqqhhexb3dnqulw5rjgwgoct57vpdau4rujlrffj%403fls6d2mkiwc + */ + .wait_on_fd_before_close = true, + .shmem_size = pgaio_uring_shmem_size, .shmem_init = pgaio_uring_shmem_init, .init_backend = pgaio_uring_init_backend,