Thanks Michael, After looking more closely at your “8‑byte TOAST values / infinite loop” thread and patch series, I see this is very much the same direction you outlined there: introduce a normalized in-memory representation for external pointers (toast_external_data) and keep most call sites from having to reason about vartag_external/va_extinfo details directly [1].
For this refactor patch I kept the decoder local to detoast.c to minimize scope and avoid committing to a broader API boundary too early. But if the consensus heads toward a shared interface closer to the format definitions (as in your toast_external approach), I’m happy to respin/rework this patch to align with that direction, rather than working on parallel abstractions. It should also be straightforward to mold this refactor in the direction of the 8‑byte value-id work without changing the overall detoast structure. On HAS_TCINFO flag: the intent is to make payload layout explicit. In the current code, “external is compressed” effectively implies “payload begins with tcinfo”, which is wired into fetch/slice logic. For a vartag-based follow-up (e.g., zstd), we may want compressed payloads without a tcinfo prefix, so having an explicit flag keeps detoast paths uniform and avoids method-specific shortcuts. Let me know what you’d prefer for next steps: keep this patch as a detoast-local refactor, or respin it to align more directly with a shared decoded external-pointer interface in the direction of the 8‑byte work. [1] https://www.postgresql.org/message-id/flat/CAN-LCVNsE4x0k11ZRWvU4ySTbe98fwA16qzV7p8dxogWnD5Jng%40mail.gmail.com#8253d63beab18b73706e3555ce19a0f4 Thanks, Dharin On Tue, Dec 30, 2025 at 12:46 AM Michael Paquier <[email protected]> wrote: > On Mon, Dec 29, 2025 at 02:45:27PM +0100, Dharin Shah wrote: > > The goal is to centralize “how do we interpret an external datum?” so > that > > detoast code paths don’t have to reason directly about va_extinfo > encoding > > vs payload layout details. This is intended as groundwork for a follow-up > > patch adding a new vartag-based method (e.g., zstd) without scattering > > special cases in detoast paths. > > +static bool > +decode_external_toast_pointer(const struct varlena *attr, > + DecodedExternalToast > *decoded) > [...] > +typedef enum ToastDecompressMethod > +{ > + TOAST_DECOMP_NONE = 0, > + TOAST_DECOMP_PGLZ = 1, > + TOAST_DECOMP_LZ4 = 2 > +} ToastDecompressMethod; > + > +typedef struct DecodedExternalToast > +{ > + Oid toastrelid; > + Oid valueid; > + uint32 rawsize; /* Decompressed size; for > future methods without tcinfo */ > + uint32 extsize; /* On-disk payload size */ > + ToastDecompressMethod method; > + uint8 flags; > +} DecodedExternalToast; > > Yeah, honestly this is a layer I have been thinking about as well as > one option, but contrary to you I have been focusing on putting that > into varatt.h, with the exception of the value being an Oid8. I think > that you have an interesting point in focusing your implementation to > be stored in the detoast part, though. I'd need to spend a bit more > time to see the result this would lead at with the larger 8-byte issue > in mind, but this is something that would come at no real cost as it > has no function pointer redirection compared to what I was first > envisioning on the other thread. That's especially true if it makes > the CompressionId business easier to mold around when adding a new > vartag. > > > Why HAS_TCINFO? > > - Previously, “is compressed?” was used as a proxy for whether the > external > > payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO > > captures payload layout, which is distinct from whether the value is > > compressed. This separation is needed for future methods that may store > > external compressed payloads without tcinfo. > > It is possible to model the on-memory data as we want. This > suggestion would be OK with some flags. > -- > Michael >
