Hello Michael, Following up on the discussion about avoiding method-specific shortcuts in detoast paths, this patch is a refactor-only step: it introduces a small decoded/in-memory representation of an on-disk external TOAST pointer, and refactors detoast_attr() and detoast_attr_slice() to use it.
The goal is to centralize “how do we interpret an external datum?” so that detoast code paths don’t have to reason directly about va_extinfo encoding vs payload layout details. This is intended as groundwork for a follow-up patch adding a new vartag-based method (e.g., zstd) without scattering special cases in detoast paths. Key changes - Introduces DecodedExternalToast + ToastDecompressMethod + TOAST_EXT_HAS_TCINFO in toast_internals.h. - Adds a small static decoder in detoast.c (decode_external_toast_pointer()) - Refactors detoast_attr() and detoast_attr_slice() to use a decode -> fetch -> decompress dispatch pattern - No on-disk format changes; existing behavior preserved (including error behavior for unsupported compression builds). Why HAS_TCINFO? - Previously, “is compressed?” was used as a proxy for whether the external payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO captures payload layout, which is distinct from whether the value is compressed. This separation is needed for future methods that may store external compressed payloads without tcinfo. Testing: Core regression suites pass Performance: I ran a small detoast-focused benchmark that forces external storage; results were within run-to-run variance, with no measurable regression. (Benchmark script attached: benchmark_toast_detoast.sql for reproduction) Thanks, Dharin On Thu, Dec 25, 2025 at 1:54 AM Dharin Shah <[email protected]> wrote: > Thanks Michael & Robert, > > Agreed — I don’t think it’s realistic or practical to talk about > deprecating or replacing pglz (or lz4) given on-disk compatibility > requirements. > > > Note that I am not on board with simply reusing varatt_external for > > zstd-compressed entries, neither do I think that this is the best move > > ever. It makes the core patch simpler, but it makes things like > > ToastCompressionId more complicated to think about. If anything, I'd > > consider a rename of varatt_external as the best way to go with an > > intermediate "translation" structure only used in memory as I am > > proposing on the other thread (something that others seem meh enough > > about but I am not seeing alternate proposals floating around, > > either). This would make things like detoast_external_attr() less > > confusing, I think, as the latest patch posted on this thread is > > actually proving with its shortcut for toast_fetch_datum as one > > example of something I'd rather not do.. > > On the design: I understand & share the same concerns that we’d end up > with multiple “sources of truth” for external compression method > identification (pglz/lz4 via va_extinfo bits, zstd via vartag), and that > this pushes method-specific shortcuts into detoast paths. > > Would you be OK if I split this into two steps? > > 1.First, a refactor-only patch introducing a small decoded/in-memory > representation of an external TOAST pointer, so detoast/toast code paths > don’t have to reason directly about tcinfo vs vartag vs va_extinfo. This > would be a cleanup with no on-disk format change and no behavioral change > for existing methods. Is this the same “translation structure” approach you > mentioned in the other thread? If you can point me to it, I’ll align with > that proposal. > > 2. Then, a follow-up patch adding zstd using VARTAG_ONDISK_ZSTD, > implemented on top of that abstraction to keep zstd handling centralized > and minimize special-casing in detoast. > If that direction matches what you had in mind, I can first post the > proposed translation structure/API for feedback before respinning the zstd > patch. > > Thanks, > Dharin > > > On Thu, Dec 25, 2025 at 1:25 AM Michael Paquier <[email protected]> > wrote: > >> On Wed, Dec 24, 2025 at 11:50:48AM -0500, Robert Treat wrote: >> > Agreed that I can't see pglz being removed any time soon, if ever. >> > Thinking through what a conversion process would look like seems >> > unwieldy at best, so I think we definitely need it for backwards >> > compatibility, plus I think it is useful to have a self-contained >> > option. I'd almost suggest we should look at replacing lz4, but I >> > don't think that is significantly easier, it just has a smaller, more >> > invested, blast radius. >> >> Backward-compatibility requirements make a replacement of LZ4 >> basically impossible to me, for the same reasons as pglz. We could >> not replace the bit used in the va_extinfo to track if LZ4 compression >> is used, either. One thing that I do wonder is if it would make >> things simpler in the long-run if we introduced a new separated vartag >> for LZ4-compressed external TOAST pointers as well. At least we'd >> have a leaner design: it means that we have to keep the >> varatt_external available on read, but we could update to the new >> format when writing entries. Or perhaps that's not worth the >> complication based on the last sentence you are writing.. >> >> > That said, I do suspect ztsd could quickly >> > become a popular recommendation and/or default among users / >> > consultants / service providers. >> >> .. Because I strongly suspect that this is going to be true, and that >> zstd would just be a better replacement over lz4. That's a trend that >> I see is already going on for wal_compression. >> >> Note that I am not on board with simply reusing varatt_external for >> zstd-compressed entries, neither do I think that this is the best move >> ever. It makes the core patch simpler, but it makes things like >> ToastCompressionId more complicated to think about. If anything, I'd >> consider a rename of varatt_external as the best way to go with an >> intermediate "translation" structure only used in memory as I am >> proposing on the other thread (something that others seem meh enough >> about but I am not seeing alternate proposals floating around, >> either). This would make things like detoast_external_attr() less >> confusing, I think, as the latest patch posted on this thread is >> actually proving with its shortcut for toast_fetch_datum as one >> example of something I'd rather not do.. >> -- >> Michael >> >
benchmark_toast_detoast.sql
Description: Binary data
0001-refactor-detoast-to-use-decoded-external-toast-abstr.patch
Description: Binary data
