On Fri, Jul 04, 2025 at 02:38:34PM +0300, Nikita Malakhov wrote:
> Hannu, we'd already made an attempt to extract the TOAST functionality as
> API and make it extensible and usable by other AMs in [1], the patch
> set was met calmly but we still have some hopes on it.

Yeah, it's one of those I have studied, and I found it
overcomplicated, to the point of preventing us from moving on with a
simpler proposal.  I care about two things first:
- More compression methods, with more meta-data, but let's just add
more vartag_external for that once/if they're really required.
- Enlarge optionally to 8-byte values.
So I really want to stress these two points, and nothing else for
now, echoing the feedback from 2022 and the fact that all the
proposals made after that lacked a simple approach.

IMO, *if* being able to plug in a TOAST engine makes sense, we would
live fine enough by just limiting ourselves to an external interface.
We could allow backends to load their own vartag_external with their
own set of callbacks like the ones I am proposing here, so that we can
translate between a Datum in heap (or a different table AM) and an
external source, with the backend able to understand what this
external source should be.  The key is to define a structure good
enough for the backend (toast_external_data in the patch).  So to
answer your and Hannu's question: I had the case of different table
AMs in mind with an interface able to plug into it, yes.  And yes, I
have designed the patch set with this in scope.  Now there's also a
data type component to that: this assumes that a table AM would want
to rely on a varlena to store this data externally, somewhere else
that may not be exactly TOAST.  Still, we want an OID and a value to
be able to retrieve this external value, and we want to store this
external OID and this value (+ extras like a compression method and
sizes) in a Datum of the main relation file.
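
To make the shape of such an interface a bit more concrete, here is a
minimal, self-contained sketch.  Every name in it (sketch_external_data,
sketch_external_info, the SKETCH_VARTAG_* tags, the byte layouts) is
hypothetical and chosen for illustration only; it is not the definition
of toast_external_data used in the patch, and it does not reproduce the
on-disk format of existing TOAST pointers.  It only shows one in-memory
structure plus per-vartag callbacks translating from/to on-disk bytes,
with a 4-byte and an 8-byte value variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of an external pointer (illustration only). */
typedef struct sketch_external_data
{
    uint32_t rawsize;     /* original data size */
    uint32_t extsize;     /* size as stored externally */
    uint32_t compression; /* compression method, 0 = none */
    uint32_t toastrelid;  /* OID of the relation holding the data */
    uint64_t value;       /* value ID, 4 or 8 bytes once on disk */
} sketch_external_data;

/* Hypothetical callbacks attached to one vartag. */
typedef struct sketch_external_info
{
    size_t disk_size;
    void (*to_disk)(const sketch_external_data *data, uint8_t *out);
    void (*from_disk)(const uint8_t *in, sketch_external_data *data);
} sketch_external_info;

typedef enum sketch_vartag
{
    SKETCH_VARTAG_OID = 0,  /* 4-byte value */
    SKETCH_VARTAG_INT8 = 1  /* 8-byte value */
} sketch_vartag;

/* Write the four fields shared by both layouts; returns the next byte. */
static uint8_t *put_common(const sketch_external_data *d, uint8_t *p)
{
    memcpy(p, &d->rawsize, 4);
    memcpy(p + 4, &d->extsize, 4);
    memcpy(p + 8, &d->compression, 4);
    memcpy(p + 12, &d->toastrelid, 4);
    return p + 16;
}

/* Read the four shared fields; returns the next byte. */
static const uint8_t *get_common(const uint8_t *p, sketch_external_data *d)
{
    memcpy(&d->rawsize, p, 4);
    memcpy(&d->extsize, p + 4, 4);
    memcpy(&d->compression, p + 8, 4);
    memcpy(&d->toastrelid, p + 12, 4);
    return p + 16;
}

static void oid_to_disk(const sketch_external_data *d, uint8_t *out)
{
    uint32_t v = (uint32_t) d->value;

    assert(d->value <= UINT32_MAX); /* must fit the 4-byte layout */
    memcpy(put_common(d, out), &v, 4);
}

static void oid_from_disk(const uint8_t *in, sketch_external_data *d)
{
    uint32_t v;

    memcpy(&v, get_common(in, d), 4);
    d->value = v;
}

static void int8_to_disk(const sketch_external_data *d, uint8_t *out)
{
    memcpy(put_common(d, out), &d->value, 8);
}

static void int8_from_disk(const uint8_t *in, sketch_external_data *d)
{
    memcpy(&d->value, get_common(in, d), 8);
}

/* One entry per vartag; callers only ever see sketch_external_data. */
static const sketch_external_info sketch_infos[] = {
    [SKETCH_VARTAG_OID] = {20, oid_to_disk, oid_from_disk},
    [SKETCH_VARTAG_INT8] = {24, int8_to_disk, int8_from_disk},
};

const sketch_external_info *sketch_get_info(sketch_vartag tag)
{
    return &sketch_infos[tag];
}
```

In this sketch, code that stores or fetches a value would call
sketch_get_info() with the tag found in the Datum and go through the
callbacks, so adding another layout would mean adding one more entry
to the table.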

FYI, the patch set posted on this thread is not the latest one.  I
have a v2, posted on this branch, where I have reordered things:
https://github.com/michaelpq/postgres/tree/toast_64bit_v2

The refactoring to the new toast_external_data with its callbacks is
done first, and the new vartag_external with 8-byte value support is
added on top of that.  There were still two things I wanted to do but
could not get to, because I've spent the last week or so working on
others' stuff:
- Evaluate the cost of the transfer layer to toast_external_data.  The
worst case I was planning to work with is non-compressed data stored
in TOAST, then check profiles of the detoasting path by grabbing
slices of the data with pgbench and a read-only query.  The
write/insert path is not going to matter, the detoast is.  The
reordering is actually for this reason: I want to see the effect of
the new interface first, and this needs to happen before we even
consider the option of adding 8-byte values.
- Add a callback for the value ID assignment.  I was hesitating to add
that when I first hacked on the patch but I think that's the correct
design moving forward, with extra logic to be able to check if an
8-byte value is already in use in a relation, as we do for OID
assignment, but applied to the TOAST generator added to the patch.
The backend should decide if a new value is required; we should not
decide the rewrite cases in the callback.
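
As a rough illustration of that second item, here is a minimal sketch
of a value assignment loop.  All the names (sketch_value_generator,
sketch_assign_value, sketch_value_for_store, the array-backed "in use"
check) are hypothetical and not taken from the patch; the real check
would look at the relation rather than an array.  It assumes that 0 is
treated as an invalid value, and it keeps the decision of whether a new
value is needed in the caller, outside of the callback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: is this value already used in the relation? */
typedef bool (*sketch_value_in_use_fn)(uint64_t value, void *arg);

/* Hypothetical 8-byte value generator: a plain counter here. */
typedef struct sketch_value_generator
{
    uint64_t next;
} sketch_value_generator;

/* Stand-in for a relation lookup: a fixed set of values already used. */
typedef struct sketch_used_set
{
    const uint64_t *values;
    size_t count;
} sketch_used_set;

bool sketch_used_set_contains(uint64_t value, void *arg)
{
    const sketch_used_set *set = arg;

    for (size_t i = 0; i < set->count; i++)
    {
        if (set->values[i] == value)
            return true;
    }
    return false;
}

/*
 * Draw values from the generator until one is found that is neither
 * invalid (0, an assumption of this sketch) nor already in use.
 */
uint64_t sketch_assign_value(sketch_value_generator *gen,
                             sketch_value_in_use_fn in_use, void *arg)
{
    for (;;)
    {
        uint64_t candidate = gen->next++;

        if (candidate == 0)
            continue;
        if (!in_use(candidate, arg))
            return candidate;
    }
}

/*
 * Caller-side decision: a nonzero reuse_value (say, a rewrite keeping
 * the old value) is returned as-is; only otherwise is a new value
 * assigned.  The callback never decides this.
 */
uint64_t sketch_value_for_store(uint64_t reuse_value,
                                sketch_value_generator *gen,
                                sketch_value_in_use_fn in_use, void *arg)
{
    if (reuse_value != 0)
        return reuse_value;
    return sketch_assign_value(gen, in_use, arg);
}
```

With values 1, 2, 3 and 5 already in use and a generator starting at
0, successive assignments in this sketch would skip the used values and
hand out 4, then 6.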

There is a second branch that I use for development, force-pushing to
it periodically, as well:
https://github.com/michaelpq/postgres/tree/toast_64bit
That's much dirtier, always WIP, just one of my playgrounds.
--
Michael
