Re: mcvstats serialization code is still shy of a load

Tom Lane Wed, 26 Jun 2019 08:26:44 -0700

Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
> Attached is a patch that should (hopefully) fix this. It essentially
> treats the item as (char *) and does all pointer arithmetics without any
> additional casts. So there are no intermediate casts.


This passes the eyeball test, and it also allows my OpenBSD/hppa
installation to get through the core regression tests, so I think
it's good as far as it goes.  Please push.

However ... nosing around in mcv.c, I noticed that the next macro:

/*
 * Used to compute size of serialized MCV list representation.
 */
#define MinSizeOfMCVList                \
        (VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))

#define SizeOfMCVList(ndims,nitems)     \
        (MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
         MAXALIGN((ndims) * sizeof(DimensionInfo)) + \
         MAXALIGN((nitems) * ITEM_SIZE(ndims)))

is both woefully underdocumented and completely at variance with
reality.  It doesn't seem to be accounting for the actual data values.
No doubt this is why it's not used in the places where it'd matter;
the tests that do use it are testing much weaker conditions than they
should.

> The fix keeps the binary format as is, so the serialized MCV items are
> max-aligned. That means we can access the uint16 indexes directly, but we
> need to copy the rest of the fields (because those may not be aligned). In
> hindsight that seems a bit silly, we might as well copy everything, not
> care about the alignment and maybe save a few more bytes.

I think that part of the problem here is that the way this code is
written, "maxaligned" is no such thing.  What you're actually maxaligning
seems to be the offset from the start of the data area of a varlena value,
which is generally going to be a maxaligned palloc result plus 4 bytes.
So "aligned" double values are actually guaranteed to be on odd word
boundaries not even ones.

What's more, it's difficult to convince oneself that the maxaligns done
in different parts of the code are all enforcing the same choices about
which substructures get pseudo-maxaligned and which don't, because the
logic doesn't line up very well.

If we do need another catversion bump before v12, I'd vote for ripping
out Every Single One of the "maxalign" operations in this code, just
on the grounds of code simplicity and bug reduction.

                        regards, tom lane

Re: mcvstats serialization code is still shy of a load

Reply via email to