On 01/01/2025 06:03, Andres Freund wrote:
Hi,
Attached is a new version of the AIO patchset.
I haven't gone through it all yet, but some comments below.
The biggest changes are:
- The README has been extended with an overview of the API. I think it gives a
good overview of how the API fits together. I'd be very good to get
feedback from folks that aren't as familiar with AIO, I can't really see
what's easy/hard anymore.
Thanks, the README is super helpful! I was overwhelmed by all the new
concepts before, now it all makes much more sense.
Now that it's all laid out more clearly, I see how many different
concepts and states there really are:
- For a single IO, there is an "IO handle", "IO references", and an "IO
return". You first allocate an IO handle (PgAioHandle), and then you get
a reference (PgAioHandleRef) and an "IO return" (PgAioReturn) struct for it.
- An IO handle has eight different states (PgAioHandleState).
I'm sure all those concepts exist for a reason. But still I wonder: can
we simplify?
pgaio_io_get() and pgaio_io_release() are a bit asymmetric, I'd suggest
pgaio_io_acquire() or similar. "get" also feels very innocent, even
though it may wait for previous IO to finish. Especially when
pgaio_io_get_ref() actually is innocent.
typedef enum PgAioHandleState
{
/* not in use */
AHS_IDLE = 0,
/* returned by pgaio_io_get() */
AHS_HANDED_OUT,
/* pgaio_io_start_*() has been called, but IO hasn't been submitted yet
*/
AHS_DEFINED,
/* subject's prepare() callback has been called */
AHS_PREPARED,
/* IO has been submitted and is being executed */
AHS_IN_FLIGHT,
/* IO finished, but result has not yet been processed */
AHS_REAPED,
/* IO completed, shared completion has been called */
AHS_COMPLETED_SHARED,
/* IO completed, local completion has been called */
AHS_COMPLETED_LOCAL,
} PgAioHandleState;
Do we need to distinguish between DEFINED and PREPARED? At quick glance,
those states are treated the same. (The comment refers to
pgaio_io_start_*() functions, but there's no such thing)
I didn't quite understand the point of the prepare callbacks. For
example, when AsyncReadBuffers() calls smgrstartreadv(), the
shared_buffer_readv_prepare() callback will be called. Why doesn't
AsyncReadBuffers() do the "prepare" work itself directly; why does it
need to be in a callback? I assume it's somehow related to error
handling, but I didn't quite get it. Perhaps an "abort" callback that'd
be called on error, instead of a "prepare" callback, would be better?
There are some synonyms used in the code: I think "in-flight" and
"submitted" mean the same thing. And "prepared" and "staged". I'd
suggest picking just one term for each concept.
I didn't understand the COMPLETED_SHARED and COMPLETED_LOCAL states.
does a single IO go through both states, or are the mutually exclusive?
At quick glance, I don't actually see any code that would set the
COMPLETED_LOCAL state; is it dead code?
REAPED feels like a bad name. It sounds like a later stage than
COMPLETED, but it's actually vice versa.
I'm a little surprised that the term "IO request" isn't used anywhere. I
have no concrete suggestion, but perhaps that would be a useful term.
- Retries for partial IOs (i.e. short reads) are now implemented. Turned out
to take all of three lines and adding one missing variable initialization.
:-)
- There's no obvious way to tell "internal" function operating on an IO handle
apart from functions that are expected to be called by the issuer of an IO.
One way to deal with this would be to introduce a distinct "issuer IO
reference" type. I think that might be a good idea, it would also make it
clearer that a good number of the functions can only be called by the
issuer, before the IO is submitted.
This would also make it easier to order functions more sensibly in aio.c, as
all the issuer functions would be together.
The functions on AIO handles that everyone can call already have a distinct
type (PgAioHandleRef vs PgAioHandle*).
Hmm, yeah I think you might be onto something here.
Could pgaio_io_get() return an PgAioHandleRef directly, so that the
issuer would never see a raw PgAioHandle ?
Finally, attached are a couple of typos and other trivial suggestions.
--
Heikki Linnakangas
Neon (https://neon.tech)
diff --git a/src/backend/storage/aio/README.md b/src/backend/storage/aio/README.md
index 0076ea4aa10..db3257c2705 100644
--- a/src/backend/storage/aio/README.md
+++ b/src/backend/storage/aio/README.md
@@ -15,7 +15,7 @@ In this example, a buffer will be read into shared buffers.
PgAioReturn ioret;
/*
- * Acquire AIO Handle, ioret will get result upon completion.
+ * Acquire an AIO Handle, ioret will get the result upon completion.
*/
PgAioHandle *ioh = pgaio_io_get(CurrentResourceOwner, &ioret);
@@ -46,15 +46,15 @@ pgaio_io_add_shared_cb(ioh, ASC_SHARED_BUFFER_READ);
pgaio_io_set_io_data_32(ioh, (uint32 *) buffer, 1);
/*
- * Hand AIO handle to lower-level function. When operating on the level of
+ * Pass the AIO handle to lower-level function. When operating on the level of
* buffers, we don't know how exactly the IO is performed, that is the
* responsibility of the storage manager implementation.
*
* E.g. md.c needs to translate block numbers into offsets in segments.
*
- * Once the IO handle has been handed of, it may not further be used, as the
- * IO may immediately get executed below smgrstartreadv() and the handle reused
- * for another IO.
+ * Once the IO handle has been handed off to smgrstartreadv(), it may not
+ * further be used, as the IO may immediately get executed in smgrstartreadv()
+ * and the handle reused for another IO.
*/
smgrstartreadv(ioh, operation->smgr, forknum, blkno,
BufferGetBlock(buffer), 1);
@@ -167,7 +167,7 @@ The main reason *not* to use Direct IO are:
explicit prefetching.
- In situations where shared_buffers cannot be set appropriately large,
e.g. because there are many different postgres instances hosted on shared
- hardware, performance will often be worse then when using buffered IO.
+ hardware, performance will often be worse than when using buffered IO.
### Deadlock and Starvation Dangers due to AIO
diff --git a/src/backend/storage/aio/aio.c b/src/backend/storage/aio/aio.c
index 261a752fb80..1cef6ef556b 100644
--- a/src/backend/storage/aio/aio.c
+++ b/src/backend/storage/aio/aio.c
@@ -123,10 +123,10 @@ static PgAioHandle *inj_cur_handle;
*
* If a handle was acquired but then does not turn out to be needed,
* e.g. because pgaio_io_get() is called before starting an IO in a critical
- * section, the handle needs to be be released with pgaio_io_release().
+ * section, the handle needs to be released with pgaio_io_release().
*
*
- * To react to the completion of the IO as soon as it is know to have
+ * To react to the completion of the IO as soon as it is known to have
* completed, callbacks can be registered with pgaio_io_add_shared_cb().
*
* To actually execute IO using the returned handle, the pgaio_io_prep_*()
diff --git a/src/backend/storage/aio/aio_io.c b/src/backend/storage/aio/aio_io.c
index 3c255775833..9e111c04b7e 100644
--- a/src/backend/storage/aio/aio_io.c
+++ b/src/backend/storage/aio/aio_io.c
@@ -31,7 +31,7 @@ static void pgaio_io_before_prep(PgAioHandle *ioh);
/* --------------------------------------------------------------------------------
* "Preparation" routines for individual IO types
*
- * These are called by place the place actually initiating an IO, to associate
+ * These are called by XXX place the place actually initiating an IO, to associate
* the IO specific data with an AIO handle.
*
* Each of the preparation routines first needs to call
diff --git a/src/include/storage/aio_internal.h b/src/include/storage/aio_internal.h
index f4c57438dd4..7a81e211d48 100644
--- a/src/include/storage/aio_internal.h
+++ b/src/include/storage/aio_internal.h
@@ -38,12 +38,13 @@ typedef enum PgAioHandleState
AHS_HANDED_OUT,
/* pgaio_io_start_*() has been called, but IO hasn't been submitted yet */
+ /* XXX: there are no pgaio_io_start_*() functions */
AHS_DEFINED,
- /* subjects prepare() callback has been called */
+ /* subject's prepare() callback has been called */
AHS_PREPARED,
- /* IO is being executed */
+ /* IO has been submitted and is being executed */
AHS_IN_FLIGHT,
/* IO finished, but result has not yet been processed */