Re: [HACKERS] Fixed length data types issue

Tom Lane Sun, 10 Sep 2006 18:17:42 -0700

Gregory Stark <[EMAIL PROTECTED]> writes:
> I'm a bit confused by this and how it would be handled in your sketch. I
> assumed we needed a bit pattern dedicated to 4-byte length headers because
> even though it would never occur on disk it would be necessary for the
> uncompressed and/or detoasted data.


> In your scheme what would PG_GETARG_TEXT() give you if the data was detoasted
> to larger than 16k?

I'm imagining that it would give you the same old uncompressed in-memory
representation as it does now, i.e., a 4-byte length word and uncompressed
data.

The weak spot of the scheme is that it assumes different, incompatible
in-memory and on-disk representations.  This seems to require either
(a) coercing values to in-memory form before they ever get handed to any
datatype manipulation function, or (b) thinking of some magic way to
pass out-of-band info about the contents of the datum.  (b) is the same
stumbling block we have in connection with making typmod available to
datatype manipulation functions.  I don't want to reject (b) entirely,
but it seems to require some pretty major structural changes.

OTOH (a) is not very pleasant either, and so what would be nice is if
we could tell by inspection of the Datum alone which format it's in.

After further thought I have an alternate proposal that does that,
but it's got its own disadvantage: it requires storing uncompressed
4-byte length words in big-endian byte order everywhere.  This might
be a showstopper (does anyone know the cost of ntohl() on modern
Intel CPUs?), but if it's not then I see things working like this:

* If high order bit of datum's first byte is 0, then it's an
uncompressed datum in what's essentially the same as our current
in-memory format except that the 4-byte length word must be big-endian
(to ensure that the leading bit can be kept zero).  In particular this
format will be aligned on 4- or 8-byte boundary as called for by the
datatype definition.

* If high order bit of first byte is 1, then it's some compressed
variant.  I'd propose divvying up the code space like this:

        * 0xxxxxxx  uncompressed 4-byte length word as stated above
        * 10xxxxxx  1-byte length word, up to 62 bytes of data
        * 110xxxxx  2-byte length word, uncompressed inline data
        * 1110xxxx  2-byte length word, compressed inline data
        * 1111xxxx  1-byte length word, out-of-line TOAST pointer
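To make the code space concrete, here is a minimal C sketch of telling the five forms apart from the first byte alone. The enum and function names are hypothetical (nothing like them is named in the proposal); only the bit patterns come from the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the bit patterns are taken from the proposal. */
typedef enum
{
    HDR_4BYTE_PLAIN,        /* 0xxxxxxx: aligned, big-endian 4-byte length */
    HDR_1BYTE_SHORT,        /* 10xxxxxx: 1-byte length word */
    HDR_2BYTE_PLAIN,        /* 110xxxxx: 2-byte length, uncompressed inline */
    HDR_2BYTE_COMPRESSED,   /* 1110xxxx: 2-byte length, compressed inline */
    HDR_TOAST_POINTER       /* 1111xxxx: out-of-line TOAST pointer */
} HeaderKind;

/* Classify a datum by inspecting only its first byte. */
HeaderKind
classify_header(const unsigned char *datum)
{
    unsigned char b;

    assert(datum != NULL);
    b = datum[0];

    if ((b & 0x80) == 0x00)
        return HDR_4BYTE_PLAIN;
    if ((b & 0xC0) == 0x80)
        return HDR_1BYTE_SHORT;
    if ((b & 0xE0) == 0xC0)
        return HDR_2BYTE_PLAIN;
    if ((b & 0xF0) == 0xE0)
        return HDR_2BYTE_COMPRESSED;
    return HDR_TOAST_POINTER;   /* only 1111xxxx remains */
}
```

The tests run from the shortest prefix to the longest, so each mask only has to rule out the cases already handled.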

This limits us to 8K uncompressed or 4K compressed inline data without
toasting (13 and 12 length bits respectively), which is slightly annoying
but probably still an insignificant limitation.  It also means more
distinct cases for the heap_deform_tuple inner loop to think about, which
might be a problem.
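The limits fall out of how many bits are left for the length once the tag bits are taken. The following C sketch extracts the raw length field for each inline form; the function names are hypothetical, and whether the stored value counts the header bytes themselves is not settled above, so the sketch returns the field as stored. Reading the 4-byte word with shifts, rather than ntohl(), is just one portable way to handle the big-endian requirement and says nothing about its cost.

```c
#include <assert.h>
#include <stdint.h>

/* 0xxxxxxx + 3 bytes: big-endian word, top bit zero, so 31 usable bits. */
uint32_t
length_4byte(const unsigned char *p)
{
    assert((p[0] & 0x80) == 0x00);
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* 10xxxxxx: 6 length bits in a single byte. */
uint32_t
length_1byte(const unsigned char *p)
{
    assert((p[0] & 0xC0) == 0x80);
    return (uint32_t) (p[0] & 0x3F);
}

/* 110xxxxx + 1 byte: 5 + 8 = 13 length bits (values below 8K). */
uint32_t
length_2byte_plain(const unsigned char *p)
{
    assert((p[0] & 0xE0) == 0xC0);
    return ((uint32_t) (p[0] & 0x1F) << 8) | (uint32_t) p[1];
}

/* 1110xxxx + 1 byte: 4 + 8 = 12 length bits (values below 4K). */
uint32_t
length_2byte_compressed(const unsigned char *p)
{
    assert((p[0] & 0xF0) == 0xE0);
    return ((uint32_t) (p[0] & 0x0F) << 8) | (uint32_t) p[1];
}
```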

Since the compressed forms would not be aligned to any boundary,
there's an important special case here: how can heap_deform_tuple tell
whether the next field is compressed or not?  The answer is that we'll
have to require pad bytes between fields to be zero.  (They already are
zeroed by heap_form_tuple, but now it'd be a requirement.)  So the
algorithm for decoding a non-null field is:

        * if looking at a byte with high bit 0, then we are either
        on the start of an uncompressed field, or on a pad byte before
        such a field.  Advance to the declared alignment boundary for
        the datatype, read a 4-byte length word, and proceed.

        * if looking at a byte with high bit 1, then we are at the
        start of a compressed field (which will never have any preceding
        pad bytes).  Decode length as per rules above.
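As a rough illustration of the two cases above, this C sketch finds where the next non-null field starts. The function name and signature are hypothetical, and it assumes the data area itself starts on a maximally aligned address, so that aligning the offset is equivalent to aligning the pointer. It relies on the requirement that pad bytes be zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given the tuple data area, the offset just past the
 * previous field, and the declared alignment of the next field's datatype
 * (a power of two, e.g. 4 or 8), return the offset where the field starts.
 */
size_t
next_field_start(const unsigned char *data, size_t off, size_t align)
{
    /* alignment must be a nonzero power of two for the mask trick below */
    assert(align != 0 && (align & (align - 1)) == 0);

    if (data[off] & 0x80)
        return off;         /* compressed form: never preceded by padding */

    /*
     * High bit 0: either a zero pad byte or the first byte of an aligned
     * 4-byte length word.  Rounding up covers both, since an offset that
     * is already aligned stays where it is.
     */
    return (off + align - 1) & ~(align - 1);
}
```

After this step the caller would decode the length according to the form found at the returned offset and advance past the field.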

The good thing about this approach is that it requires zero changes to
fundamental system structure.  The pack/unpack rules in heap_form_tuple
and heap_deform_tuple change a bit, and the mechanics of
PG_DETOAST_DATUM change, but a Datum is still just a pointer and you
can always tell what you've got by examining the pointed-to data.

                        regards, tom lane
