Re: [HACKERS] Variable length varlena headers redux

Gregory Stark Tue, 13 Feb 2007 07:51:24 -0800

Tom Lane <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
> > I don't really see a way around it though. Places that fill in VARDATA
> > before the size (formatting.c seems to be the worst case) will just have
> > to be changed and it'll be a fairly fragile point.
> 
> No, we're not going there: it'd break too much code now and it'd be a
> continuing source of bugs for the foreseeable future.  The sane way to
> design this is that
> 
> (1) code written to existing practice will always generate 4-byte
> headers.  (Hence, VARDATA() acts the same as now.)  That's the format
> that generally gets passed around in memory.


So then we don't need to replace VARSIZE with SET_VARLENA_LEN at all.

> (2) creation of a short header is handled by the TOAST code just before
> the tuple goes to disk.
> 
> (3) replacement of a short header with a 4-byte header is considered
> part of de-TOASTing.

So (nigh) every tuple will get deformed and reformed once before it goes to
disk? Currently the toast code doesn't even look at a tuple if it's small
enough, but in this case we would want it to fire even on very narrow rows.
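
To make the cost concrete, here is a rough sketch of the pack/unpack steps
that (2) and (3) imply. Everything in it is an assumption made for
illustration: the 1-byte format (high bit set, low 7 bits holding the total
size), the big-endian 4-byte length word, and the names pack_datum and
unpack_datum are all hypothetical, and malloc stands in for palloc. The point
is only that each datum gets copied once on the way to disk and once on the
way back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LONG_HDRSZ  4u
#define SHORT_HDRSZ 1u
#define SHORT_FLAG  0x80u  /* hypothetical: high bit marks a 1-byte header */
#define SHORT_MAX   0x7Fu  /* largest total size a 1-byte header can hold */

/* 4-byte length word kept in network byte order, so its first byte has
 * the high bit clear for any sane length (an assumption of this sketch). */
static uint32_t long_total_size(const uint8_t *d)
{
    return ((uint32_t)d[0] << 24) | ((uint32_t)d[1] << 16) |
           ((uint32_t)d[2] << 8) | (uint32_t)d[3];
}

static void set_long_total_size(uint8_t *d, uint32_t total)
{
    d[0] = (uint8_t)(total >> 24);
    d[1] = (uint8_t)(total >> 16);
    d[2] = (uint8_t)(total >> 8);
    d[3] = (uint8_t)total;
}

/* Step (2): shrink the header just before the datum goes to disk.
 * Writes the packed form into out and returns the bytes written. */
static size_t pack_datum(const uint8_t *in, uint8_t *out)
{
    uint32_t total = long_total_size(in);
    assert(total >= LONG_HDRSZ);
    uint32_t payload = total - LONG_HDRSZ;

    if (payload + SHORT_HDRSZ <= SHORT_MAX) {
        out[0] = (uint8_t)(SHORT_FLAG | (payload + SHORT_HDRSZ));
        memcpy(out + SHORT_HDRSZ, in + LONG_HDRSZ, payload);
        return payload + SHORT_HDRSZ;
    }
    memcpy(out, in, total);  /* too big: keep the 4-byte header */
    return total;
}

/* Step (3): restore a 4-byte header while "detoasting".
 * Always returns a fresh allocation that the caller must free. */
static uint8_t *unpack_datum(const uint8_t *in)
{
    uint32_t payload;
    const uint8_t *data;

    if (in[0] & SHORT_FLAG) {
        assert((in[0] & SHORT_MAX) >= SHORT_HDRSZ);
        payload = (in[0] & SHORT_MAX) - SHORT_HDRSZ;
        data = in + SHORT_HDRSZ;
    } else {
        payload = long_total_size(in) - LONG_HDRSZ;
        data = in + LONG_HDRSZ;
    }

    uint8_t *out = malloc(LONG_HDRSZ + payload);
    if (out == NULL)
        return NULL;
    set_long_total_size(out, LONG_HDRSZ + payload);
    memcpy(out + LONG_HDRSZ, data, payload);
    return out;
}
```

With this layout a 3-byte payload takes 4 bytes on disk instead of 7, at the
price of a memcpy in each direction.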

One design possibility I considered was doing this in heap_deform_tuple and
heap_form_tuple. Basically skipping the extra deform/form_tuple cycle in the
toast code. I had considered having heap_deform_tuple palloc copies of these
data before returning them. But that has the same problems.

The other problem is that there may be places in the code that receive a datum
from someplace where they have every right to expect it not to be toasted. For
example, plpgsql deforming a tuple it just formed, or even a function's return
value. They might be quite surprised to receive a toasted tuple.

Note also that that's going to force us to palloc and memcpy these data. Are
there going to be circumstances where this changes the memory context lifetime
of some data in existing code? For example, something like the inet code might
know its arguments can never be large enough to get toasted, and so not do a
FREE_IF_COPY on its btree operator arguments.
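
A small sketch of that lifetime concern, under stated assumptions: the 1-byte
header format (high bit set, low 7 bits giving the total size), the helper
names expand_if_short and free_if_copy, and the live_copies counter are all
made up for illustration, and malloc/free stand in for palloc/pfree. Once
expansion can hand back a copy, any caller that assumed it never would is
left holding memory it does not free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SHORT_FLAG 0x80u  /* hypothetical marker for a 1-byte header */

static int live_copies = 0;  /* copies currently allocated and not yet freed */

/* Total size from a 4-byte header stored in network byte order. */
static uint32_t total_size(const uint8_t *d)
{
    return ((uint32_t)d[0] << 24) | ((uint32_t)d[1] << 16) |
           ((uint32_t)d[2] << 8) | (uint32_t)d[3];
}

/* Returns d itself if it already has a 4-byte header, otherwise a
 * freshly allocated copy with a 4-byte header. */
static const uint8_t *expand_if_short(const uint8_t *d)
{
    if (!(d[0] & SHORT_FLAG))
        return d;

    assert((d[0] & 0x7Fu) >= 1);
    size_t payload = (size_t)(d[0] & 0x7Fu) - 1;
    uint8_t *copy = malloc(4 + payload);
    assert(copy != NULL);
    copy[0] = 0;
    copy[1] = 0;
    copy[2] = 0;
    copy[3] = (uint8_t)(4 + payload);  /* fits: payload is at most 126 */
    memcpy(copy + 4, d + 1, payload);
    live_copies++;
    return copy;
}

/* Same idea as FREE_IF_COPY: release only if expansion made a copy. */
static void free_if_copy(const uint8_t *expanded, const uint8_t *original)
{
    if (expanded != original) {
        free((void *)expanded);
        live_copies--;
    }
}

/* A comparison that cleans up after itself. A version written on the
 * assumption that its arguments never need expanding would skip the two
 * free_if_copy calls and leave live_copies above zero. */
static int datum_equal(const uint8_t *a, const uint8_t *b)
{
    const uint8_t *ea = expand_if_short(a);
    const uint8_t *eb = expand_if_short(b);
    int eq = total_size(ea) == total_size(eb) &&
             memcmp(ea, eb, total_size(ea)) == 0;

    free_if_copy(ea, a);
    free_if_copy(eb, b);
    return eq;
}
```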


> After we have that working, we can work on offering alternative macros
> that let specific functions avoid the overhead of conversion between
> 4-byte headers and short ones, in much the same way that there are TOAST
> macros now that let specific functions get down-and-dirty with the
> out-of-line TOAST representation.  But first we have to get to the point
> where 4-byte-header datums can be distinguished from short-header datums
> by inspection; and that requires either network byte order in the 4-byte
> length word or some other change in its representation.
> 
> > Actually I think neither htonl nor bitshifting the entire 4-byte word is
> > going to really work here. Both will require 4-byte alignment.
> 
> And your point is what?  The 4-byte form can continue to require
> alignment, and *will* require it in any case, since many of the affected
> datatypes expect alignment of the data within the varlena.  The trick is
> that when we are examining a non-aligned address within a tuple, we have
> to be able to tell whether we are looking at the first byte of a
> short-header datum (not aligned) or a pad byte.  This is easily done,
> for instance by decreeing that pad bytes must be zeroes.
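
A minimal sketch of the zero-pad-byte rule described in the quoted text, with
the details assumed rather than taken from any patch: a short header is one
byte with the high bit set and the total size in the low 7 bits, the 4-byte
length word is in network byte order so its first byte has the high bit
clear, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHORT_FLAG 0x80u  /* hypothetical marker for a 1-byte header */

/* Starting at off, skip zero pad bytes until either a nonzero byte
 * (a short header) or a 4-byte-aligned offset is reached. */
static size_t next_datum_offset(const uint8_t *buf, size_t len, size_t off)
{
    while (off < len && (off % 4) != 0 && buf[off] == 0)
        off++;  /* zero byte at an unaligned address: padding */
    return off;
}

static int is_short_header(const uint8_t *buf, size_t off)
{
    return (buf[off] & SHORT_FLAG) != 0;
}

/* Total size of the datum at off, header included. */
static size_t datum_total_size(const uint8_t *buf, size_t off)
{
    if (is_short_header(buf, off))
        return buf[off] & 0x7Fu;

    /* Anything without the flag must be an aligned 4-byte header. */
    assert(off % 4 == 0);
    return ((size_t)buf[off] << 24) | ((size_t)buf[off + 1] << 16) |
           ((size_t)buf[off + 2] << 8) | (size_t)buf[off + 3];
}
```

At an unaligned offset a zero byte can only be padding and a byte with the
flag set can only be a short header, so one byte of inspection decides it. At
an aligned offset the flag bit alone separates the two header forms.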

Well if we're doing it in toast then the alignment of the payload really
doesn't matter at all. It'll be realigned after detoasting anyways.

What I had had in mind was to prohibit using smaller headers than the
alignment of the data type. But that was on the assumption we would continue
to use the compressed header in memory and not copy it.
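
As a rough sketch of that rule: pick the narrowest header that is not smaller
than the type's alignment and can still hold the length. The candidate widths
(1, 2, and 4 bytes), their capacities, and the name choose_header_width are
assumptions for illustration only; nothing in this message fixes them.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the header width in bytes for a datum with the given payload
 * length and type alignment. Widths and capacities are hypothetical. */
static size_t choose_header_width(size_t payload_len, size_t typalign)
{
    static const size_t widths[] = {1, 2};
    static const size_t max_total[] = {0x7F, 0x3FFF};

    for (size_t i = 0; i < 2; i++) {
        if (widths[i] < typalign)
            continue;  /* header narrower than the alignment: prohibited */
        if (payload_len + widths[i] <= max_total[i])
            return widths[i];
    }
    return 4;  /* fall back to the existing 4-byte header */
}
```

Under these assumptions a char-aligned type gets a 1-byte header for small
values, while an int-aligned type always keeps the 4-byte header, so the
payload never ends up less aligned than the type expects.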

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


