"Tom Lane" <[EMAIL PROTECTED]> writes: > The point I'm trying to get across here is to do things one small step > at a time; if you insist on a "big bang" patch then it'll never get > done. You might want to go back and review the CVS history for some > other big changes like TOAST and the version-1 function-call protocol > to see our previous uses of this approach.
Believe me, I'm not looking for a "big bang" approach. It's just that
I've found problems with each of the incremental approaches. I suppose
it's inevitable that "big bang" approaches always seem like the most
perfect since they automatically involve fixing anything that could be a
problem.

You keep suggesting things that I've previously considered and rejected
-- perhaps prematurely. Off the top of my head I recall the following
four options from our discussions. It looks like we're circling around
option 4.

As long as we're willing to live with the palloc/memcpy overhead, at
least for now, given that we can reduce it by whittling away at the
sites that use the old macros, that seems like a good compromise and the
shortest development path, except perhaps option 1.

And I take it you're not worried about sites that might not detoast a
datum, or might detoast one in the wrong memory context, where
previously they were guaranteed it wouldn't generate a copy? In
particular I'm worried about btree code and plpgsql row/record handling.

I'm not sure what to do about the alignment issue. We could just never
align 1-byte headers. That would work just fine as long as the data
types that need alignment don't get ported to the new macros. It seems
silly to waste space on disk in order to save a cpu memcpy that we
aren't even going to be saving for now anyway.

Option 1)

We detect cases where the typmod guarantees either a fixed size or a
maximum size < 256 bytes. In which case, instead of copying the typlen
from pg_type into tupledesc and the table's attlen, we use the implied
attlen or store -3 indicating single byte headers everywhere.

Cons: This implies pallocing and memcpying the datum in
      heap_deform_tuple.

      It doesn't help text or varchar(xxxx) at all even if we mostly
      store small data in them.

Pros: It buys us 0-byte headers for things like numeric(1,0) and even
      char(n) on single-byte encodings. It buys us 1-byte headers for
      most numerics and char(n) or varchar(n).

Option 2)

We have heap_form_tuple/heap_deform_tuple compress and uncompress the
headers. The rule is that headers are always compressed in a tuple and
never compressed when at the end of a datum.

Cons: This implies pallocing and memcpying the datum in
      heap_deform_tuple.

      It requires restricting the range of 1-byte headers to 127 or 63
      bytes and always uses 1 byte even for fixed size data. (We could
      get 0-byte headers for a small range (ascii characters and numeric
      integers up to 127) but then 1-byte headers would be down to 31
      byte data.)

      It implies a different varlena format for on-disk and in-memory.

Pros: It works for text/varchar as long as we store small data.

      It lets us use network byte order or some other format for the
      on-disk headers without changing the macros to access in-memory
      headers.

Option 3)

We have the toaster (or heap_form_tuple as a shortcut) compress the
headers but delay decompression until pg_detoast_datum. Tuples only
contain compressed headers, but Datums sometimes point to compressed
headers and sometimes uncompressed headers, just as they sometimes point
to toasted data and sometimes detoasted data.

Cons: This still implies pallocing and memcpying the datum, at least for
      now.

      There could be cases where code expects to deform_tuple and be
      guaranteed a non-toasted pointer into the tuple.

      Requires replacing VARATT_SIZEP with SET_VARLENA_LEN().

Pros: It allows for future macros to examine the compressed datum
      without decompressing it.

Option 4)

We have compressed data constructed on the fly everywhere possible.

Cons: Requires replacing VARATT_SIZEP and also requires hacking VARDATA
      to find the data in the appropriate place.

      Might need an additional pair of macros for backwards
      compatibility in code that really needs to construct a 4-byte
      headered varlena.

      Fragility, with risk of VARSIZE / VARDATA being filled in out of
      order.

      Requires changing header to be in network byte order all the time.

Pros: One consistent representation for varlena everywhere.

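To make the header trade-off in options 2 through 4 a bit more concrete,
here is a rough standalone sketch of one possible encoding. Everything
in it is an assumption for illustration: pack_varlena and unpack_varlena
are hypothetical names, not existing backend functions, and the layout
(high bit of the first byte flags a 1-byte header, 7-bit total length,
otherwise a 4-byte length in network byte order) is just one of the
ranges mentioned above, not something we've settled on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, for illustration only (not the PostgreSQL API). */
#define SHORT_HDR_FLAG  0x80u   /* high bit of byte 0 => 1-byte header */
#define SHORT_MAX_DATA  126u    /* 7-bit total length minus header byte */

/* Write header plus data into dst; returns total bytes written. */
static size_t
pack_varlena(uint8_t *dst, const uint8_t *data, uint32_t datalen)
{
    if (datalen <= SHORT_MAX_DATA)
    {
        /* 1-byte header: flag bit plus total length (header included) */
        dst[0] = (uint8_t) (SHORT_HDR_FLAG | (datalen + 1));
        memcpy(dst + 1, data, datalen);
        return (size_t) datalen + 1;
    }
    else
    {
        uint32_t    total;

        /* total must stay below 2^31 so the flag bit remains clear */
        assert(datalen < 0x7FFFFFFCu);
        total = datalen + 4;

        /* 4-byte header, most significant byte first (network order) */
        dst[0] = (uint8_t) (total >> 24);
        dst[1] = (uint8_t) (total >> 16);
        dst[2] = (uint8_t) (total >> 8);
        dst[3] = (uint8_t) total;
        memcpy(dst + 4, data, datalen);
        return (size_t) total;
    }
}

/* Return a pointer to the data inside src and store its length. */
static const uint8_t *
unpack_varlena(const uint8_t *src, uint32_t *datalen)
{
    uint32_t    total;

    if (src[0] & SHORT_HDR_FLAG)
    {
        *datalen = (uint32_t) (src[0] & 0x7Fu) - 1;
        return src + 1;
    }

    total = ((uint32_t) src[0] << 24) | ((uint32_t) src[1] << 16) |
            ((uint32_t) src[2] << 8) | (uint32_t) src[3];
    *datalen = total - 4;
    return src + 4;
}
```

Note that the reader never needs the 4-byte header to be aligned, since
it is read a byte at a time. That is roughly the "never align 1-byte
headers" idea, at the cost of some extra shifting on every access.
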
-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com