Tom Lane <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
>> I'm a bit confused by this and how it would be handled in your sketch. I
>> assumed we needed a bit pattern dedicated to 4-byte length headers because
>> even though it would never occur on disk it would be necessary for the
>> uncompressed and/or detoasted data.
>
>> In your scheme what would PG_GETARG_TEXT() give you if the data was detoasted
>> to larger than 16k?
>
> I'm imagining that it would give you the same old uncompressed in-memory
> representation as it does now, ie, 4-byte length word and uncompressed
> data.
Sure, but how would you know? Sometimes you would get a pointer to a varlena
starting with a byte with a leading 00, indicating a 1-byte varlena header,
and sometimes you would get a pointer to a varlena with the old uncompressed
representation and a 4-byte length header, which may well start with a 00.

> * If high order bit of first byte is 1, then it's some compressed
> variant. I'd propose divvying up the code space like this:
>
> * 0xxxxxxx uncompressed 4-byte length word as stated above
> * 10xxxxxx 1-byte length word, up to 62 bytes of data
> * 110xxxxx 2-byte length word, uncompressed inline data
> * 1110xxxx 2-byte length word, compressed inline data
> * 1111xxxx 1-byte length word, out-of-line TOAST pointer

I'm unclear how you're using the remaining bits. Are you saying you would
have a 4-byte length word following this bit-flag byte? Or are you saying we
would use 31 bits for the 4-byte length word, 13 bits for the 2-byte
uncompressed length word, and 12 bits for the compressed length word?

Also, Heikki points out here that it would be nice to allow for a 0-byte
header. So, for example, if the leading bit is 0 then the remaining 7 bits
are available for the datum itself. This would actually vacate much of my
argument for a fixed-length char(n) data type; the most frequent use case is
for things like CHAR(1) fields containing 'Y' or 'N'.

In any case it seems a bit backwards to me. Wouldn't it be better to
preserve bits in the case of short length words, where they're precious,
rather than long ones? If we make 0xxxxxxx the 1-byte case it means limiting
our maximum datum size to something like .5G, but if you're working with .5G
data wouldn't you be using an API that lets you access it by chunks anyway?

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com