On Sun, Apr 17, 2022 at 11:22 PM Noah Misch <n...@leadboat.com> wrote: > > Yes, but it could be false positives in some cases. For instance, the > > column {oid, bool, XLogRecPtr} should be okay on ALIGNOF_DOUBLE == 4 > > and 8 platforms but the new test fails. > > I'm happy with that, because the affected author should look for padding-free > layouts before settling on your example layout. If the padding-free layouts > are all unacceptable, the author should update the expected sanity_check.out > to show the one row where the test "fails".
I realize that it was necessary to get something committed quickly here to unbreak the buildfarm, but this is really a mess. As I understand it, the problem here is that typalign='d' is either 4 bytes or 8 depending on how the 'double' type is aligned on that platform, but we use that typalign value also for some other data types that may not be aligned in the same way as 'double'. Consequently, it's possible to have a situation where the behavior of the C compiler diverges from the behavior of heap_form_tuple(). To avoid that, we need every catalog column that uses typalign=='d' to begin on an 8-byte boundary. We also want all such columns to occur before the first NameData column in the catalog, to guard against the possibility that NAMEDATALEN has been redefined to an odd value. I think this set of constraints is a nuisance and that it's mostly good luck we haven't run into any really awkward problems here so far. In many of our catalogs, the first member is an OID and the second member of the struct is of type NameData: pg_namespace, pg_class, pg_proc, etc. That common design pattern is in direct contradiction to the desires of this test case. As soon as someone wants to add a typalign='d' member to any of those system catalogs, the struct layout is going to have to get shuffled around -- and then it will look different from all the other ones. Or else we'd have to rearrange them all to move all the NameData columns to the end. I feel like it's weird to introduce a test case that so obviously flies in the face of how catalog layout has been done up to this point, especially for the sake of a hypothetical user who want to set NAMEDATALEN to an odd number. I doubt such scenarios have been thoroughly tested, or ever will be. Perhaps instead we ought to legislate that NAMEDATALEN must be a multiple of 8, or some such thing. The other constraint, that typalign='d' fields must always fall on an 8 byte boundary, is probably less annoying in practice, but it's easy to imagine a future catalog running into trouble. Let's say we want to introduce a new catalog that has only an Oid column and a float8 column. Perhaps with 0-3 bool or uint8 columns as well, or with any number of NameData columns as well. Well, the only way to satisfy this constraint is to put the float8 column first and the Oid column after it, which immediately makes it look different from every other catalog we have. It's hard to feel like that would be a good solution here. I think we ought to try to engineer a solution where heap_form_tuple() is going to do the same thing as the C compiler without the sorts of extra rules that this test case enforces. AFAICS, we could do that by: 1. De-supporting platforms that have this problem, or 2. Introducing new typalign values, as Noah proposed back on April 2, or 3. Somehow forcing values that are sometimes 4-byte aligned and sometimes 8-byte aligned to be 8-byte alignment on all platforms I also don't like the fact that the test case doesn't even catch exactly the problematic set of cases, but rather a superset, leaving it up to future patch authors to make a correct judgment about whether a certain new column can be listed as an expected output of the test case or whether the catalog representation must be changed. The idea that we'll reliably get that right might be optimistic. Again, I don't mean to say that this is the fault of this test case since, without the test case, we'd have no idea that there was even a potential problem, which would not be better. But it feels to me like we're hacking around the real problem instead of fixing it, and it seems to me that we should try to do better. -- Robert Haas EDB: http://www.enterprisedb.com