Manfred Koizar <[EMAIL PROTECTED]> writes:
> I.e. drop the flags for very short strings.

I don't think the interaction of this idea with TOAST has been thought through very carefully. Some points to consider:

* If the 'e' (external) bit is set, the actual length is 20 or so bytes. Always. There is no need for an 'e' bit in the variants that store a longer length.

* I agree with Manfred that you could blow off the case of compressing strings shorter than 128 bytes, so you really only need the 'c' bit for the longer variants. Hence, just one flag bit is needed in all cases.

* The proposal to reduce the alignment to char breaks TOAST pointers, which contain fields that need to be word-aligned (at least on machines that are picky about alignment). Maybe we could just live with that, by copying the in-datum data to and from properly aligned local storage in the routines that need to access the pointer fields. It's an extra cost though, as well as a probable source of bugs that will only be noticed on non-Intel hardware.

* In the 'c' case you need to store *two* lengths, the physical length and the decompressed length. Is it worth playing the same kind of game with the decompressed length? Again, you can't just do nothing because the proposal breaks word-alignment of the decompressed length.

Another thing that's bothering me about this is that it'll lead to weird new coding rules that will certainly break a lot of code; notably "you can't access VARDATA(x) until you've assigned the correct value to VARSIZE(x)." This will be a particular pain in the neck for code that likes to fill in the data before it makes a final determination of the result size.

So I'm feeling a good deal of sales resistance to the idea that this ought to replace our existing varlena representation. At least for internal use.

Maybe the right way to think about this is that a compressed length word is a form of TOASTing, which we apply during toast compression and undo during PG_DETOAST_DATUM.
If we did that, the changes would be vastly more localized than if we try to make all the datatype manipulation routines cope. Not sure about the speed tradeoffs involved, though.

regards, tom lane
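
For illustration only: the alignment workaround mentioned in the third point above (copying the in-datum data to and from properly aligned local storage) might look roughly like the sketch below. The struct layout and the function names are hypothetical, invented for this example; they are not PostgreSQL's actual TOAST pointer definition or API.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a TOAST pointer. The field names and sizes
 * are assumptions made for this sketch, not the real on-disk layout.
 */
typedef struct DemoToastPointer
{
    int32_t  rawsize;     /* decompressed length */
    int32_t  extsize;     /* physical (stored) length */
    uint32_t valueid;     /* identifier of the out-of-line value */
    uint32_t toastrelid;  /* identifier of the table holding it */
} DemoToastPointer;

/*
 * Read the pointer fields from a datum body that may start at any byte
 * offset. memcpy makes no alignment assumptions, so the fields land in
 * properly aligned local storage before anything dereferences them.
 */
static DemoToastPointer
demo_fetch_toast_pointer(const unsigned char *datum_body)
{
    DemoToastPointer ptr;

    memcpy(&ptr, datum_body, sizeof(ptr));
    return ptr;
}

/* Write the fields back into a possibly unaligned datum body. */
static void
demo_store_toast_pointer(unsigned char *datum_body,
                         const DemoToastPointer *ptr)
{
    memcpy(datum_body, ptr, sizeof(*ptr));
}
```

Every routine that touches the pointer fields would have to go through copies like these rather than casting the datum address to a struct pointer, which is the extra cost and the bug risk described above: a direct cast would appear to work on hardware that tolerates unaligned access and fail only on stricter machines.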