On Wed, Sep 28, 2022 at 6:48 AM Robert Haas <robertmh...@gmail.com> wrote:
> On second thought, I'm going to revert the whole thing. There's a
> bigger mess here than can be cleaned up on the fly. The
> alignment-related mess in ParseCommitRecord is maybe something for
> which I could just hack a quick fix, but what I've also just now
> realized is that this makes a huge number of WAL records larger by 4
> bytes, since most WAL records will contain a block reference.

It would be useful if there were generic tests that caught issues like this. There are various subtle effects related to how struct layout can impact WAL record size that might easily be missed. It's not like there are a huge number of truly critical WAL records to have tests for.

The example that comes to mind is the XLOG_BTREE_INSERT_POST record type, which is used for B-Tree index tuple inserts with a posting list split. These records carry only an extra 2 bytes of payload compared to conventional XLOG_BTREE_INSERT_LEAF records, but we nevertheless tend to see a final record size that is consistently a full 8 bytes larger in many important cases, despite not needing to store the IndexTuple with alignment padding. I believe that this is a consequence of the record header itself needing to be MAXALIGN()'d.

Another important factor in this scenario is the general tendency for index tuple sizes to leave the final XLOG_BTREE_INSERT_LEAF record size at 64 bytes. It wouldn't have been okay if the deduplication work made that size jump up to 72 bytes for many kinds of indexes across the board, even when there was no accompanying posting list split (i.e. the vast majority of the time). Maybe it would have been okay if nbtree leaf page insert records were naturally rare, but that isn't the case at all, obviously.

That's why we have two different record types here in the first place. Earlier versions of the deduplication patch just added an OffsetNumber field to XLOG_BTREE_INSERT_LEAF which could be set to InvalidOffsetNumber, resulting in a surprisingly large amount of waste in terms of WAL size, because of the presence of 3 different factors.

We don't bother doing this with the split records, which can also have accompanying posting list splits, because it makes hardly any difference at all (split records are much rarer than any kind of leaf insert record, and are far larger when considered individually).

--
Peter Geoghegan
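
To illustrate the arithmetic described above (a 64-byte leaf insert record growing to 72 bytes once 2 bytes of payload are added), here is a minimal sketch. It assumes 8-byte maximum alignment, as on typical 64-bit platforms, and assumes the space a record consumes is its length rounded up to that boundary. Every `sketch_`/`SKETCH_` name is hypothetical and none of this is PostgreSQL source; the sizes are simply the figures quoted in this message. The static assertion at the end hints at the kind of check a generic size test could make.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 8-byte maximum alignment, typical of 64-bit platforms. */
#define SKETCH_MAXIMUM_ALIGNOF 8

/* Round len up to the next multiple of SKETCH_MAXIMUM_ALIGNOF. */
static size_t
sketch_maxalign(size_t len)
{
    return (len + (SKETCH_MAXIMUM_ALIGNOF - 1)) &
        ~((size_t) (SKETCH_MAXIMUM_ALIGNOF - 1));
}

/* Hypothetical sizes, taken from the figures quoted in this message. */
enum
{
    SKETCH_LEAF_INSERT_SIZE = 64,    /* common leaf insert record size */
    SKETCH_POSTING_OFFSET_SIZE = 2   /* extra payload for a posting list split */
};

/* Space consumed by a leaf insert record, with or without the extra field. */
static size_t
sketch_record_space(int has_posting_offset)
{
    size_t len = SKETCH_LEAF_INSERT_SIZE;

    if (has_posting_offset)
        len += SKETCH_POSTING_OFFSET_SIZE;

    return sketch_maxalign(len);
}

/*
 * A compile-time size check of the sort a generic test might perform:
 * the common record size is expected to sit exactly on an alignment
 * boundary, so any added payload byte costs a full alignment unit.
 */
static_assert(SKETCH_LEAF_INSERT_SIZE % SKETCH_MAXIMUM_ALIGNOF == 0,
              "leaf insert record size should be a multiple of the alignment");
```

With these numbers, `sketch_record_space(0)` is 64 and `sketch_record_space(1)` is 72: two bytes of payload cost eight bytes of WAL, which is why paying that cost on every leaf insert would add up.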