Hi,

On 2025-06-18 10:32:08 +0300, Konstantin Knizhnik wrote:
> On 17/06/2025 6:08 pm, Andres Freund wrote:
> >
> > I don't think it can - this must be an independent bug from the one that Tom
> > and I were encountering.
>
> I see... It's a pity.

Indeed.

Konstantin, Alexander, can you share what commit you're testing and what
precise changes have been applied to the source? I've now tested this on a
significant number of Apple machines for many, many days without being able to
reproduce this a single time, despite using various compiler [versions].

Something has to be different on the two systems you're testing on.


> By the way, I have a questions concerning using interrupts in AIO.
> The comments say:
>
> pgaio_io_release(PgAioHandle *ioh)
>             /*
>              * Note that no interrupts are processed between
>              * pgaio_io_was_recycled() and this check - that's important
>              * as otherwise an interrupt could have already reclaimed the
>              * handle.
>              */
>
> pgaio_io_update_state(PgAioHandle *ioh, PgAioHandleState new_state)
>     /*
>      * All callers need to have held interrupts in some form, otherwise
>      * interrupt processing could wait for the IO to complete, while in an
>      * intermediary state.
>      */
> ...
>
> But I failed to understand how handle can be reclaimed by interrupt or how
> any other AIO processing activity can be done in interrupt handlers,
> `IoWorkerMain` is not registering some IO specific interrupts. Can you
> explain please how interrupts can affect AIO, because I suspect that
> interrupts may be the only possible explanation of such behavior?

The most problematic interrupt is ProcessBarrierSmgrRelease(). To prevent
problems with closing file descriptors of in-progress IOs (with io_uring), we
may need to wait for the IO on a file descriptor to complete before closing
that file descriptor.

If we started to wait for IO while in the middle of updating the state of an
IO, we'd re-enter the AIO code from the interrupt processing, which would lead
to confusion.

Note that this isn't specific to IO workers, it applies to nearly all process
types (I guess not to the logger process, but ...).

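A minimal sketch of that re-entrancy hazard, in case it helps to see it in isolation. Every name here (toy_hold_interrupts, toy_update_state, ...) is a hypothetical stand-in invented for illustration, not PostgreSQL's actual code; it only models "an interrupt check happens while the state is intermediary", with and without interrupts held.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not PostgreSQL functions. */
static int  holdoff_count = 0;         /* > 0: interrupts must not be processed */
static bool interrupt_pending = false; /* an interrupt arrived and awaits processing */
static bool in_state_update = false;   /* true while the state is intermediary */
static int  reentries_seen = 0;        /* handlers that ran in the middle of an update */

/* Stand-in for a handler that may wait for IO, i.e. re-enter the AIO code. */
static void toy_process_interrupt(void)
{
    if (in_state_update)
        reentries_seen++;
    interrupt_pending = false;
}

/* Processes a pending interrupt only if none are currently held off. */
static void toy_check_for_interrupts(void)
{
    if (interrupt_pending && holdoff_count == 0)
        toy_process_interrupt();
}

static void toy_hold_interrupts(void)   { holdoff_count++; }
static void toy_resume_interrupts(void) { assert(holdoff_count > 0); holdoff_count--; }

/*
 * Two-step state update with an interrupt check in between (standing in for
 * a check hidden inside some called function). Returns the total number of
 * mid-update handler runs observed so far.
 */
int toy_update_state(int *state, int new_state, bool hold)
{
    if (hold)
        toy_hold_interrupts();
    in_state_update = true;
    *state = -1;                 /* intermediary state */
    interrupt_pending = true;    /* simulate an interrupt arriving right now */
    toy_check_for_interrupts();  /* deferred when held, processed otherwise */
    *state = new_state;
    in_state_update = false;
    if (hold)
        toy_resume_interrupts();
    toy_check_for_interrupts();  /* safe point: handle whatever was deferred */
    return reentries_seen;
}
```

With hold set, the handler only runs at the final safe point; without it, the handler runs while *state is still -1, which is the situation the quoted comments warn about.
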
> Also I tried to write small test reproducing AIO data flow:
>
> #include <assert.h>
> #include <pthread.h>
>
> #define read_barrier() __atomic_thread_fence(__ATOMIC_ACQUIRE)
> #define write_barrier() __atomic_thread_fence(__ATOMIC_RELEASE)
>
> typedef struct {
>     int state:8;
>     int target:8;
>     int op:8;
>     int result;
> } Handle;
>
> enum State { IDLE, GO, DONE };
> enum Operation { NOP, READ };
>
> void* io_thread_proc(void* arg)
> {
>     Handle* h = (Handle*)arg;
>     while (1)
>     {
>         if (h->state == GO)

Strictly speaking I don't think the compiler is actually forced to reload
h->state from memory here...


>         {
>             assert(h->op == READ);
>             h->result += 1;
>             write_barrier();
>             h->state = DONE;
>         }
>     }
>     return 0;
> }
>
> void* client_thread_proc(void* arg)
> {
>     Handle* h = (Handle*)arg;
>     int expected_result = 0;
>     while (1)
>     {
>         assert(h->op == NOP);
>         assert(h->state == IDLE);
>         h->op = READ;
>         write_barrier();
>         h->state = GO;
>         while (h->state != DONE);

Same here.


> Do you think that this test is doing something similar as Postgres AIO or
> something should be changed (certainly AIO is not doing busy loop like this
> test, but unlikely it is important for reproducing the problem).

It's reasonably similar. As you say, with pg's aio there's no such busy
looping - I guess it's possible that this only happens if there are scheduler
interactions in the loops.

Greetings,

Andres Freund
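
To illustrate the reload point raised against the two busy loops in the quoted test: a minimal sketch, assuming C11 <stdatomic.h> and POSIX threads are available. It is not the quoted test and not PostgreSQL code; run_handshake and the STOP state are hypothetical additions so that the program terminates. Since state is an _Atomic object read with explicit loads, the compiler may not treat it as loop-invariant, and the release/acquire pairing on state is what publishes the plain op and result fields.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

enum State { IDLE, GO, DONE, STOP };   /* STOP is a hypothetical addition */
enum Operation { NOP, READ };

typedef struct {
    _Atomic int state;  /* atomic: each spin-loop iteration performs a real load */
    int op;             /* plain fields, ordered by release/acquire on state */
    int result;
} Handle;

static void *io_thread_proc(void *arg)
{
    Handle *h = arg;
    for (;;) {
        int s = atomic_load_explicit(&h->state, memory_order_acquire);
        if (s == STOP)
            break;
        if (s == GO) {
            assert(h->op == READ);   /* visible thanks to the acquire load above */
            h->result += 1;
            atomic_store_explicit(&h->state, DONE, memory_order_release);
        }
    }
    return NULL;
}

/*
 * Runs `rounds` GO/DONE handshakes against one IO thread and returns the
 * final result counter, or -1 if the thread could not be created.
 */
int run_handshake(int rounds)
{
    Handle h;
    pthread_t io;

    atomic_init(&h.state, IDLE);
    h.op = NOP;
    h.result = 0;
    if (pthread_create(&io, NULL, io_thread_proc, &h) != 0)
        return -1;

    for (int i = 0; i < rounds; i++) {
        h.op = READ;
        atomic_store_explicit(&h.state, GO, memory_order_release);
        while (atomic_load_explicit(&h.state, memory_order_acquire) != DONE)
            ;                        /* busy-wait, as in the quoted test */
        assert(h.result == i + 1);
        h.op = NOP;
        atomic_store_explicit(&h.state, IDLE, memory_order_relaxed);
    }

    atomic_store_explicit(&h.state, STOP, memory_order_release);
    pthread_join(io, NULL);
    return h.result;
}
```

Built with something like cc -O2 -pthread, run_handshake(1000) is expected to return 1000. Like the quoted test it busy-loops, so it says nothing about behaviour when the scheduler gets involved.
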