On Wed, Mar 13, 2024 at 9:57 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> Let's bite the bullet and merge the smgrwrite and smgrextend functions
> at the smgr level too. I propose the following signature:
>
> #define SWF_SKIP_FSYNC 0x01
> #define SWF_EXTEND 0x02
> #define SWF_ZERO 0x04
>
> void smgrwritev(SMgrRelation reln, ForkNumber forknum,
>                 BlockNumber blocknum,
>                 const void **buffer, BlockNumber nblocks,
>                 int flags);
>
> This would replace smgwrite, smgrextend, and smgrzeroextend. The

That sounds pretty good to me.

> > Here also is a first attempt at improving the memory allocation and
> > memory layout.
> > ...
> > +typedef union BufferSlot
> > +{
> > +    PGIOAlignedBlock buffer;
> > +    dlist_node freelist_node;
> > +} BufferSlot;
> > +
>
> If you allocated the buffers in one large contiguous chunk, you could
> often do one large write() instead of a gathered writev() of multiple
> blocks. That should be even better, although I don't know much of a
> difference it makes. The above layout wastes a fair amount memory too,
> because 'buffer' is I/O aligned.

The patch I posted has an array of buffers with the properties you
describe, so you get a pwrite() (no 'v') sometimes, and a pwritev()
with a small iovcnt when it wraps around:

pwrite(...) = 131072 (0x20000)
pwritev(...,3,...) = 131072 (0x20000)
pwrite(...) = 131072 (0x20000)
pwritev(...,3,...) = 131072 (0x20000)
pwrite(...) = 131072 (0x20000)

Hmm, I expected pwrite() alternating with pwritev(iovcnt=2), the
latter for when it wraps around the buffer array, so I'm not sure why
it's 3. I guess the btree code isn't writing them strictly
monotonically or something...

I don't believe it wastes any memory on padding (except a few bytes
wasted by palloc_aligned() before BulkWriteState):

(gdb) p &bulkstate->buffer_slots[0]
$4 = (BufferSlot *) 0x15c731cb4000
(gdb) p &bulkstate->buffer_slots[1]
$5 = (BufferSlot *) 0x15c731cb6000
(gdb) p sizeof(bulkstate->buffer_slots[0])
$6 = 8192
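
To make the "pwrite() alternating with pwritev(iovcnt=2)" expectation
concrete, here is a minimal standalone sketch of mapping a run of
slots in a contiguous ring onto iovecs. It is not code from the patch:
DemoSlot, NSLOTS and ring_to_iov() are hypothetical names, and the ring
size of 16 is assumed only because 16 * 8192 matches the 131072-byte
writes above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define BLCKSZ 8192
#define NSLOTS 16				/* assumed ring size: 16 * 8192 = 131072 */

/* Hypothetical stand-in for the patch's BufferSlot union. */
typedef union DemoSlot
{
	char		data[BLCKSZ];
	void	   *freelist_node[2];	/* stand-in for dlist_node */
} DemoSlot;

/* Slots are packed back to back only if the union adds no padding. */
_Static_assert(sizeof(DemoSlot) == BLCKSZ, "slot must be exactly one block");

/*
 * Describe 'count' consecutive slots, starting at index 'start' in a ring
 * of NSLOTS contiguous slots, with as few iovecs as possible.  Returns the
 * iovcnt: 1 if the run is contiguous in memory, 2 if it wraps around.
 */
static int
ring_to_iov(DemoSlot *slots, int start, int count, struct iovec iov[2])
{
	int			first = count;

	assert(start >= 0 && start < NSLOTS);
	assert(count > 0 && count <= NSLOTS);

	/* Clamp the first segment at the end of the array. */
	if (start + count > NSLOTS)
		first = NSLOTS - start;

	iov[0].iov_base = slots[start].data;
	iov[0].iov_len = (size_t) first * BLCKSZ;
	if (first == count)
		return 1;				/* no wrap: a plain pwrite() would do */

	/* The remainder continues from the start of the array. */
	iov[1].iov_base = slots[0].data;
	iov[1].iov_len = (size_t) (count - first) * BLCKSZ;
	return 2;
}
```

With a run that starts at slot 0 this yields one iovec covering all
131072 bytes; a run starting at, say, slot 10 yields two iovecs of 6
and 10 blocks. A third iovec would only appear if the blocks were not
handed over in strictly sequential slot order, which this sketch does
not model.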