Hi Michael (and Peter), Thanks for the detailed feedback — this is really helpful.
I want to make sure I understand your main point: you're OK with a new `vartag_external`, but prefer we avoid increasing the heap TOAST pointer from 16 -> 20 bytes since every zstd-toasted value would pay +4 bytes in the main heap tuple. I also realize the "compatibility" of the extended header doesn't buy us much — we'll need to support the existing 16-byte varatt_external forever for backward compatibility. Adding a 20-byte structure just means two formats to maintain indefinitely. A couple clarifying questions if we go with new vartag (e.g., `VARTAG_ONDISK_ZSTD`), same 16-byte `varatt_external` payload, vartag as discriminator 1. How should we handle future methods beyond zstd? One tag per method, or store a method id elsewhere (e.g., in TOAST chunk header)? 2. And re: "as long as the TOAST value is 32 bits" — are you referring to the 30-bit extsize field in va_extinfo (i.e., avoid stealing bits from extsize for method encoding)? Test Rows Uncompressed PGLZ LZ4 ZSTD PGLZ/ZSTD LZ4/ZSTD T1: Large JSON (~18KB/row) 500 ~9,000 KB 1496 KB 1528 KB 976 KB 1.53x 1.57x T2: Repetitive Text (~246KB/row) 500 ~123,000 KB 1672 KB 648 KB 248 KB 6.74x 2.61x T3: MD5 Hash Data (~16KB/row) 500 ~8,000 KB 8288 KB 8232 KB 4256 KB 1.95x 1.93x T4: Server Logs (~3.5KB/row) 1000 ~3,500 KB 400 KB 352 KB 456 KB 0.88x 0.77x *Key findings (i guess well known at this point):* - ZSTD excels for repetitive/pattern-heavy data (6.7x better than PGLZ) - For low-redundancy data (MD5 hashes), ZSTD still achieves ~2x better - The T4 result showing zstd as "worse" is not about compression quality - it's about missing inline storage support. ZSTD actually compresses better, but pays unnecessary TOAST overhead. I'll share the detailed benchmark script with the next patch revision. But also a potential path forward could be that we could just fully replace pglz (can bring it up later in different thread) *On Testing and Patch Structure* Agreed on both points: - I'll use `compression_zstd.sql` following the `compression_lz4.sql` pattern (removing the test_toast_ext module) - I'll split the GUC refactoring into a separate preparatory patch Once you confirm which representation you're advocating, I'll respin accordingly. Thanks, Dharin On Thu, Dec 18, 2025 at 7:35 AM Michael Paquier <[email protected]> wrote: > On Wed, Dec 17, 2025 at 04:11:38PM +0100, Peter Eisentraut wrote: > > On 16.12.25 11:51, Dharin Shah wrote: > > > - Zstd only applies to external TOAST, not inline compression. The > 2-bit > > > limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work > fine > > > anyway. Zstd's wins show up on larger values. > > > > This is a very complicated patch. To motivate it, you should show some > > detailed performance measurements that show these wins. > > Yes, this is expected for any patch posted. Zstd is an improved > version of lz4, acting as a sort of industry standard these days, and > any byte sequences I have looked at points that zstd leads kind of > always to a better compression ratio for less or equivalent CPU cost > compared to LZ4. Not saying that numbers are not required, they are. > But I strongly suspect numbers among these lines. > > FWIW, it's not a complicated patch, it is a large mechanical patch > that enforces a bunch of TOAST code paths to do what it wants. If we > are going to do something about that and agree on something, I think > that we should just use a new vartag_external for this matter > (spoiler: I think we should use a new vartag_external value), but > keep the toast structure at 16 bytes all the time, leaving alone the > extra bit in the existing varatt_external structure so as there is no > impact for heap relations if zstd is used, as long as the TOAST value > is 32 bits. The patch introduces a new vartag_external with > VARTAG_ONDISK_EXTENDED, so while it leads to a better compatibility, > it also means that all zstd entries have to pay an extra amount of > space in the main relation as an effect of a different > default_toast_compression. The difficulty is not in the > implementation, it would be on agreeing on what folks would be OK > with in terms if vartag and varatt structures, and that's one of the > oldest areas of the PG code, that has complications and assumptions of > its own. > > The test implementation looks wrong to me. Why is there any need for > an extra test module test_toast_ext? You could just reuse the same > structure as compression_lz4.sql, but adapted to zstd. That's why a > split with compression.sql has been done in 74a3fc36f314, FYI. > > You should also aim at splitting the patch more to make it easier to > review: one of the sticky point of this area of the code is to untie > completely DefaultCompressionId, its GUC and the TOAST code. The GUC > default_toast_compression accepts by design only 4 values. This needs > to go further, and should be refactored as a piece of its own. > > And also, I would prefer if the 32-bit value issue is tackled first, > but that's a digression here, for a different thread. :) > -- > Michael >
