On 22/01/2024 19:23, Robert Haas wrote:
In the case of this particular patch, I think the problem is that
there's no consensus on the design. There's not a ton of debate on
this thread, but thread [1] linked in the original post contains a lot
of vigorous debate about what the right thing to do is here and I
don't believe we reached any meeting of the minds.

Yeah, so it seems.

It looks like I never replied to
https://www.postgresql.org/message-id/20221019192130.ebjbycpw6bzjry4v%40awork3.anarazel.de
but, FWIW, I agree with Andres that applying the same technique to
multiple fields that are stored together (DB OID, TS OID, rel #, block
#) is unlikely in practice to produce many cases that regress. But the
question for this thread is really more about whether we're OK with
using ad-hoc bit swizzling to reduce the size of xlog records or
whether we want to insist on the use of a uniform varint encoding.
Heikki and Andres both seem to favor the latter. IIRC, I was initially
more optimistic about ad-hoc bit swizzling being a potentially
acceptable technique, but I'm not convinced enough about it to argue
against two very smart committers both of whom know more about
micro-optimizing performance than I do, and nobody else seems to be
making this argument on this thread either, so I just don't really see
how this patch is ever going to go anywhere in its current form.

I don't have a clear idea of how to proceed with this either. Some thoughts I have:

Using varint encoding makes sense for length fields. The common values are small, and if a length of anything is large, then the size of the length field itself is insignificant compared to the actual data.
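
To make the length-field case concrete, here is a minimal sketch of one common varint scheme (LEB128-style: 7 data bits per byte, high bit set when more bytes follow). The function names are hypothetical and this is not the encoding from the patch or any existing PostgreSQL API; it only illustrates why small lengths cost one byte while large ones cost at most five.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a PostgreSQL function: encode a 32-bit length
 * as a LEB128-style varint. buf must have room for at least 5 bytes.
 * Returns the number of bytes written (1..5).
 */
static size_t
varint_encode_u32(uint32_t value, uint8_t *buf)
{
    size_t n = 0;

    assert(buf != NULL);
    while (value >= 0x80)
    {
        /* low 7 bits of the value, with the continuation bit set */
        buf[n++] = (uint8_t) (value | 0x80);
        value >>= 7;
    }
    buf[n++] = (uint8_t) value;     /* last byte, continuation bit clear */
    return n;
}

/*
 * Hypothetical helper: decode a varint from buf[0..len). Returns the number
 * of bytes consumed, or 0 if the input is truncated or longer than 5 bytes.
 */
static size_t
varint_decode_u32(const uint8_t *buf, size_t len, uint32_t *value)
{
    uint32_t result = 0;
    size_t   n = 0;
    int      shift = 0;

    assert(buf != NULL && value != NULL);
    while (n < len && shift < 35)
    {
        uint8_t b = buf[n++];

        result |= (uint32_t) (b & 0x7F) << shift;
        if ((b & 0x80) == 0)
        {
            *value = result;
            return n;
        }
        shift += 7;
    }
    return 0;
}
```

With this scheme any length below 128 takes a single byte, and a length large enough to need all five bytes is attached to at least 256 MB of payload, so the extra byte is noise.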

I don't like using varint encoding for OIDs. They might be small in common cases, but it feels wrong to rely on that. They're just arbitrary numbers. We could pick them randomly; it's just an implementation detail that we use a counter to choose the next one. I really dislike the idea that someone would do a pg_dump + restore just to get smaller OIDs and smaller WAL as a result.

It does make sense to have a fast-path (small-path?) for 0 OIDs though.

To shrink OID fields, you could refer to earlier WAL records. A special value for "same relation as in previous record", or something like that. Now we're just re-inventing LZ-style compression though. Might as well use LZ4 or Snappy or something to compress the whole WAL stream. It's a bit tricky to get the crash-safety right, but shouldn't be impossible.
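
As a rough illustration of the "same relation as in previous record" idea, the sketch below encodes a relation reference as a single flag byte when it matches the previous one, and as a flag byte plus three raw 32-bit fields otherwise. Every name here (RelRef, RelRefCodec, the flag values, the 13-byte layout) is made up for the example; it is not the WAL record format and not something proposed in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relation reference: DB OID, tablespace OID, relation number. */
typedef struct RelRef
{
    uint32_t db_oid;
    uint32_t ts_oid;
    uint32_t rel_number;
} RelRef;

/* Hypothetical codec state: the last relation written or read in full. */
typedef struct RelRefCodec
{
    RelRef prev;
    int    have_prev;
} RelRefCodec;

#define RELREF_FULL          0x00   /* 12 bytes of fields follow */
#define RELREF_SAME_AS_PREV  0x01   /* reuse the previous relation */
#define RELREF_FULL_SIZE     13

/* Encode rel into buf (at least RELREF_FULL_SIZE bytes); returns bytes written. */
static size_t
relref_encode(RelRefCodec *st, const RelRef *rel, uint8_t *buf)
{
    assert(st != NULL && rel != NULL && buf != NULL);
    if (st->have_prev &&
        st->prev.db_oid == rel->db_oid &&
        st->prev.ts_oid == rel->ts_oid &&
        st->prev.rel_number == rel->rel_number)
    {
        buf[0] = RELREF_SAME_AS_PREV;
        return 1;
    }
    buf[0] = RELREF_FULL;
    memcpy(buf + 1, &rel->db_oid, 4);
    memcpy(buf + 5, &rel->ts_oid, 4);
    memcpy(buf + 9, &rel->rel_number, 4);
    st->prev = *rel;
    st->have_prev = 1;
    return RELREF_FULL_SIZE;
}

/* Decode one reference; returns bytes consumed, or 0 on malformed input. */
static size_t
relref_decode(RelRefCodec *st, const uint8_t *buf, size_t len, RelRef *rel)
{
    assert(st != NULL && buf != NULL && rel != NULL);
    if (len < 1)
        return 0;
    if (buf[0] == RELREF_SAME_AS_PREV)
    {
        if (!st->have_prev)
            return 0;           /* nothing to refer back to */
        *rel = st->prev;
        return 1;
    }
    if (buf[0] != RELREF_FULL || len < RELREF_FULL_SIZE)
        return 0;
    memcpy(&rel->db_oid, buf + 1, 4);
    memcpy(&rel->ts_oid, buf + 5, 4);
    memcpy(&rel->rel_number, buf + 9, 4);
    st->prev = *rel;
    st->have_prev = 1;
    return RELREF_FULL_SIZE;
}
```

Note that in this sketch the decoder only works if it has seen the same earlier references as the encoder, so a reader cannot start at an arbitrary record. That dependency on earlier data is the same property a general-purpose stream compressor has.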

Has anyone seriously considered implementing wholesale compression of WAL?

--
Heikki Linnakangas
Neon (https://neon.tech)


