Hi,

On 2025-03-25 08:58:08 -0700, Noah Misch wrote:
> While having nagging thoughts that we might be releasing FDs before io_uring
> gets them into kernel custody, I tried this hack to maximize FD turnover:
>
> static void
> ReleaseLruFiles(void)
> {
> #if 0
> 	while (nfile + numAllocatedDescs + numExternalFDs >= max_safe_fds)
> 	{
> 		if (!ReleaseLruFile())
> 			break;
> 	}
> #else
> 	while (ReleaseLruFile())
> 		;
> #endif
> }
>
> "make check" with default settings (io_method=worker) passes, but
> io_method=io_uring in the TEMP_CONFIG file got different diffs in each of two
> runs.  s/#if 0/#if 1/ (restore normal FD turnover) removes the failures.
> Here's the richer of the two diffs:

Yikes. That's a very good catch.

I spent a bit of time debugging this. I think I see what's going on: it turns
out that the kernel does *not* open the FDs during io_uring_enter() if
IOSQE_ASYNC is specified [1]. We do add that flag heuristically, in an attempt
to avoid a small but measurable slowdown for sequential scans that are fully
buffered (cf. pgaio_uring_submit()). If I disable that heuristic, your patch
above passes all tests here.

I don't know if that's an intentional or unintentional behavioral difference.

There are 2 1/2 ways around this:

1) Stop using the IOSQE_ASYNC heuristic
2a) Wait for all in-flight IOs when any FD gets closed
2b) Wait for all in-flight IOs using an FD when that FD gets closed

Given that we have clear evidence that io_uring doesn't completely support
closing FDs while IOs are in flight, be it a bug or intentional, it seems
clearly better to go for 2a or 2b.

Greetings,

Andres Freund

[1] Instead, files are opened when the queue entry is being worked on.
    Interestingly, that only happens when the IO is *explicitly* requested to
    be executed in the workqueue with IOSQE_ASYNC, not when it's put there
    because it couldn't be done in a non-blocking way.
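
To make the difference between 2a and 2b concrete, here is a toy model of the
bookkeeping. It is only a sketch under stated assumptions: it does not touch
io_uring or fd.c, every sketch_* name is hypothetical, and "waiting" is
simulated by marking the IO as completed where real code would block until the
completion has been reaped.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_INFLIGHT 16

/* Hypothetical record of one submitted IO whose completion is outstanding. */
typedef struct SketchIO
{
	bool		in_flight;
	int			fd;				/* descriptor the IO refers to */
} SketchIO;

static SketchIO sketch_ios[SKETCH_MAX_INFLIGHT];

/* Remember a submitted IO; returns its slot, or -1 if the table is full. */
static int
sketch_submit(int fd)
{
	for (int i = 0; i < SKETCH_MAX_INFLIGHT; i++)
	{
		if (!sketch_ios[i].in_flight)
		{
			sketch_ios[i].in_flight = true;
			sketch_ios[i].fd = fd;
			return i;
		}
	}
	return -1;
}

/* Stand-in for blocking until this IO's completion has been reaped. */
static void
sketch_wait_one(int slot)
{
	sketch_ios[slot].in_flight = false;
}

/* Number of IOs still outstanding. */
static int
sketch_count_inflight(void)
{
	int			n = 0;

	for (int i = 0; i < SKETCH_MAX_INFLIGHT; i++)
		if (sketch_ios[i].in_flight)
			n++;
	return n;
}

/* Option 2a: drain everything, regardless of which FD is being closed. */
static int
sketch_wait_for_all(void)
{
	int			waited = 0;

	for (int i = 0; i < SKETCH_MAX_INFLIGHT; i++)
	{
		if (sketch_ios[i].in_flight)
		{
			sketch_wait_one(i);
			waited++;
		}
	}
	return waited;
}

/* Option 2b: drain only the IOs that reference the FD being closed. */
static int
sketch_wait_for_fd(int fd)
{
	int			waited = 0;

	for (int i = 0; i < SKETCH_MAX_INFLIGHT; i++)
	{
		if (sketch_ios[i].in_flight && sketch_ios[i].fd == fd)
		{
			sketch_wait_one(i);
			waited++;
		}
	}
	return waited;
}

/* Close path for 2b: wait first; the actual close() call is omitted here. */
static int
sketch_close_fd(int fd)
{
	return sketch_wait_for_fd(fd);
}
```

In this model 2b leaves IOs on unrelated FDs in flight, at the cost of having
to know which FD each IO uses; 2a needs no per-FD tracking but stalls on every
outstanding IO.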