Re: AIO v2.5

Andres Freund Thu, 10 Jul 2025 12:29:59 -0700

Hi,

On 2025-07-10 21:00:21 +0200, Matthias van de Meent wrote:
> On Wed, 9 Jul 2025 at 16:59, Andres Freund <and...@anarazel.de> wrote:
> > > 3. I noticed that there is AIO code for writev-related operations
> > > (specifically, pgaio_io_start_writev is exposed, as is
> > > PGAIO_OP_WRITEV), but no practical way to exercise that code: it's
> > > not called from anywhere in the project, and there is no way for
> > > extensions to register the relevant callbacks required to make writev
> > > work well on buffered contents. Is that intentional?
> >
> > Yes.  We obviously do want to support writes eventually, and it didn't seem
> > useful to not have the most basic code for writes in the AIO infrastructure.
> >
> > You could still use it to e.g. write out temporary file data or such.
>
> Yes, though IIUC that would require an implementation of at least
> PgAioTargetInfo for such a use case (it's definitely not a SMGR
> target), which currently isn't available and can't be registered
> dynamically by an extension. Or did I maybe miss something?


I can see some hacky ways around that, but they're just that, hacky...



> (PS. I'm not quite 100% sure that it is impossible to use, just that
> there are rather few handles available for using this part of the new
> tool, and it seems completely untested in the PG18 branch)

I'm not saying it's 100% ready to use without modifying core code, but for
something that's like 30 lines of code, as part of a considerably larger
subsystem, I just don't see a problem with writev not yet being covered.  It's
just incremental development.


> -----
>
> Something else I've just noticed is the use of int32 in
> PgAIOHandle->result. In sync and worker mode, pg_preadv and pg_pwritev
> return ssize_t, which most modern systems can't fit in int32 (the
> output was int before, then size_t, then ssize_t: [0]).

I don't think there's anything that can actually do IO that's large enough to
be problematic. What's the potential scenario where you'd want to read/write
more than 2GB of data within one syscall? That just doesn't seem to make
sense.


> While not directly an issue in default PG18 due to the use of 1GB relation
> segments capping the max IO size for SMGR-managed IOs (and various other
> code-level constraints), this may have more issues when an extension starts
> bulk-reading data on a system compiled with RELSEG_SIZE >= 2GB; I can't find
> any protective checks against overflows in downcasting the IO result.

I don't think the relation size is the relevant piece here, it's just that it
doesn't make sense (and likely isn't possible) to read that much data at once.
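
For illustration only, the kind of protective check being asked about could
look roughly like the sketch below. io_result_fits_int32() is a hypothetical
helper, not an existing PostgreSQL function, and the sketch assumes the result
field is a plain int32 that receives the raw ssize_t returned by
pg_preadv()/pg_pwritev().

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>			/* ssize_t */

/*
 * Hypothetical helper (an assumption for illustration, not core code):
 * narrow a preadv()/pwritev()-style return value to int32, refusing to
 * do so if the value would not survive the downcast.
 *
 * Returns true and sets *result if rc fits, false otherwise (in which
 * case *result is left untouched).
 */
static bool
io_result_fits_int32(ssize_t rc, int32_t *result)
{
	/* On platforms with a 32-bit ssize_t this test can never fail. */
	if (rc > INT32_MAX || rc < INT32_MIN)
		return false;

	*result = (int32_t) rc;
	return true;
}
```

A caller would treat a false return as an error instead of storing a
truncated byte count. Note that on Linux a single read()/write()-family call
transfers at most 0x7ffff000 bytes, so there such a check would not be
expected to fire in practice.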


Greetings,

Andres Freund
