Hi,

On 2025-07-10 21:00:21 +0200, Matthias van de Meent wrote:
> On Wed, 9 Jul 2025 at 16:59, Andres Freund <and...@anarazel.de> wrote:
> > > 3. I noticed that there is AIO code for writev-related operations
> > > (specifically, pgaio_io_start_writev is exposed, as is
> > > PGAIO_OP_WRITEV), but no practical way to exercise that code: it's
> > > not called from anywhere in the project, and there is no way for
> > > extensions to register the relevant callbacks required to make writev
> > > work well on buffered contents. Is that intentional?
> >
> > Yes. We obviously do want to support writes eventually, and it didn't seem
> > useful to not have the most basic code for writes in the AIO infrastructure.
> >
> > You could still use it to e.g. write out temporary file data or such.
>
> Yes, though IIUC that would require an implementation of at least
> PgAioTargetInfo for such a use case (it's definitely not a SMGR
> target), which currently isn't available and can't be registered
> dynamically by an extension. Or maybe did I miss something?

I can see some hacky ways around that, but they're just that, hacky...

> (PS. I'm not quite 100% sure that it is impossible to use, just that
> there are rather few handles available for using this part of the new
> tool, and it seems completely untested in the PG18 branch)

I'm not saying it's 100% ready to use without modifying core code, but for
something that's like 30 lines of code, as part of a considerably larger
subsystem, I just don't see a problem with writev not yet being covered. It's
just incremental development.

> -----
>
> Something else I've just noticed is the use of int32 in
> PgAIOHandle->result. In sync and worker mode, pg_preadv and pg_pwritev
> return ssize_t, which most modern systems can't fit in int32 (the
> output was int before, then size_t, then ssize_t: [0]).

I don't think there's anything that can actually do IO that's large enough to
be problematic. What's the potential scenario where you'd want to read/write
more than 3GB of data within one syscall? That just doesn't seem to make
sense.

> While not directly an issue in default PG18 due to the use of 1GB relation
> segments capping the max IO size for SMGR-managed IOs (and various other
> code-level constraints), this may have more issues when an extension starts
> bulk-reading data on a system compiled with RELSEG_SIZE >= 2GB; I can't find
> any protective checks against overflows in downcasting the IO result.

I don't think the relation size is the relevant piece here, it's just that it
doesn't make sense (and likely isn't possible) to read that much data at once.

Greetings,

Andres Freund
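
PS: For reference, the kind of protective check discussed above could look
roughly like the sketch below. This is only an illustration, not code from
the PostgreSQL tree: narrow_io_result() is a hypothetical helper name, and
the sketch assumes the value being narrowed is the ssize_t returned by a
pg_preadv()/pg_pwritev()-style call that is about to be stored in an int32
field.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>			/* ssize_t */

/*
 * Hypothetical helper (not part of PostgreSQL): narrow an ssize_t IO result
 * to int32 without silently truncating it.
 *
 * Returns true and stores the value in *out if it fits in int32; returns
 * false and leaves *out untouched otherwise.
 */
static bool
narrow_io_result(ssize_t raw, int32_t *out)
{
	/* On platforms with a 32-bit ssize_t this check can never fail. */
	if (raw > (ssize_t) INT32_MAX || raw < (ssize_t) INT32_MIN)
		return false;

	*out = (int32_t) raw;
	return true;
}
```

A caller would treat a false return as an error (or an assertion failure)
rather than storing a truncated byte count.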