Hi,

On 2025-07-10 21:00:21 +0200, Matthias van de Meent wrote:
> On Wed, 9 Jul 2025 at 16:59, Andres Freund <and...@anarazel.de> wrote:
> > > 3. I noticed that there is AIO code for writev-related operations
> > > (specifically, pgaio_io_start_writev is exposed, as is
> > > PGAIO_OP_WRITEV), but no practical way to exercise that code: it's
> > > not called from anywhere in the project, and there is no way for
> > > extensions to register the relevant callbacks required to make writev
> > > work well on buffered contents. Is that intentional?
> >
> > Yes.  We obviously do want to support writes eventually, and it didn't seem
> > useful to not have the most basic code for writes in the AIO infrastructure.
> >
> > You could still use it to e.g. write out temporary file data or such.
>
> Yes, though IIUC that would require an implementation of at least
> PgAioTargetInfo for such a use case (it's definitely not a SMGR
> target), which currently isn't available and can't be registered
> dynamically by an extension. Or did I miss something?

I can see some hacky ways around that, but they're just that, hacky...



> (PS. I'm not quite 100% sure that it is impossible to use, just that
> there are rather few handles available for using this part of the new
> tool, and it seems completely untested in the PG18 branch)

I'm not saying it's 100% ready to use without modifying core code, but for
something that's like 30 lines of code, as part of a considerably larger
subsystem, I just don't see a problem with writev not yet being covered.  It's
just incremental development.


> -----
>
> Something else I've just noticed is the use of int32 in
> PgAIOHandle->result. In sync and worker mode, pg_preadv and pg_pwritev
> return ssize_t, which on most modern (64-bit) systems doesn't fit in int32 (the
> output was int before, then size_t, then ssize_t: [0]).

I don't think there's anything that can actually do IO that's large enough to
be problematic. What's the potential scenario where you'd want to read/write
more than 3GB of data within one syscall? That just doesn't seem to make
sense.
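To make the int32/ssize_t mismatch discussed above concrete, here is a minimal sketch of a checked narrowing, assuming a 64-bit ssize_t. `narrow_io_result` is a hypothetical name, not an existing PostgreSQL function, and where such a check would belong in the AIO code is left open; it simply rejects results that would not survive the conversion instead of truncating them silently.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>          /* ssize_t */

/*
 * Hypothetical helper (not part of PostgreSQL): narrow the ssize_t returned
 * by a preadv/pwritev-style call into an int32-sized result field.  Returns
 * false, leaving *result untouched, if the value is outside int32's range.
 */
static bool
narrow_io_result(ssize_t rc, int32_t *result)
{
    if (rc > INT32_MAX || rc < INT32_MIN)
        return false;           /* would not fit in an int32 result */
    *result = (int32_t) rc;     /* safe: value is within int32 range */
    return true;
}
```

For example, a result of 8192 or -1 passes through unchanged, while a hypothetical 2^31-byte transfer would be refused rather than wrapping to a negative value.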


> While not directly an issue in default PG18 due to the use of 1GB relation
> segments capping the max IO size for SMGR-managed IOs (and various other
> code-level constraints), this may have more issues when an extension starts
> bulk-reading data on a system compiled with RELSEG_SIZE >= 2GB; I can't find
> any protective checks against overflows in downcasting the IO result.

I don't think the relation size is the relevant piece here; it's just that it
doesn't make sense (and likely isn't possible) to read that much data at once.
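The numbers behind both points can be checked quickly. This is a minimal sketch assuming PostgreSQL's default build settings (BLCKSZ of 8192 bytes, RELSEG_SIZE of 131072 blocks) and a Linux host. The macro names and `segment_bytes` are hypothetical, for illustration only. A default segment is exactly 1 GiB; doubling RELSEG_SIZE gives 2^31 bytes, one more than INT32_MAX. The per-call transfer limit documented in the Linux read(2) man page (0x7ffff000 bytes for read() and similar system calls) already stays below INT32_MAX.

```c
#include <stdint.h>

/* Assumed PostgreSQL defaults: 8 kB blocks, 131072 blocks per segment. */
#define DEFAULT_BLCKSZ        8192
#define DEFAULT_RELSEG_SIZE   131072

/*
 * Per the Linux read(2) man page, read() and similar system calls transfer
 * at most 0x7ffff000 bytes per call, returning a short count otherwise.
 */
#define LINUX_MAX_RW_BYTES    0x7ffff000L

/* Hypothetical helper: bytes in one relation segment. */
static int64_t
segment_bytes(int64_t blcksz, int64_t relseg_blocks)
{
    return blcksz * relseg_blocks;
}
```

With the defaults, `segment_bytes(DEFAULT_BLCKSZ, DEFAULT_RELSEG_SIZE)` is 1073741824 (1 GiB), comfortably within int32; `segment_bytes(DEFAULT_BLCKSZ, 2 * DEFAULT_RELSEG_SIZE)` is 2147483648, which no longer fits.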


Greetings,

Andres Freund

