Gregory Stark <[EMAIL PROTECTED]> writes:
> I don't really see a way around it though. Places that fill in VARDATA before
> the size (formatting.c seems to be the worst case) will just have to be
> changed and it'll be a fairly fragile point.

No, we're not going there: it'd break too much code now and it'd be a
continuing source of bugs for the foreseeable future. The sane way to
design this is that

(1) code written to existing practice will always generate 4-byte
headers. (Hence, VARDATA() acts the same as now.) That's the format
that generally gets passed around in memory.

(2) creation of a short header is handled by the TOAST code just before
the tuple goes to disk.

(3) replacement of a short header with a 4-byte header is considered
part of de-TOASTing.

After we have that working, we can work on offering alternative macros
that let specific functions avoid the overhead of conversion between
4-byte headers and short ones, in much the same way that there are TOAST
macros now that let specific functions get down-and-dirty with the
out-of-line TOAST representation. But first we have to get to the point
where 4-byte-header datums can be distinguished from short-header datums
by inspection; and that requires either network byte order in the 4-byte
length word or some other change in its representation.

> Actually I think neither htonl nor bitshifting the entire 4-byte word is going
> to really work here. Both will require 4-byte alignment.

And your point is what? The 4-byte form can continue to require
alignment, and *will* require it in any case, since many of the affected
datatypes expect alignment of the data within the varlena. The trick is
that when we are examining a non-aligned address within a tuple, we have
to be able to tell whether we are looking at the first byte of a
short-header datum (not aligned) or a pad byte. This is easily done,
for instance by decreeing that pad bytes must be zeroes.

I think we should probably consider making use of different alignment
codes for different varlena datatypes. For instance the geometry types
probably will still need align 'd' since they contain doubles; this may
mean that we should just punt on any short-header optimization for them.
But text and friends could have align 'c' showing that they need no
padding and would be perfectly happy with a nonaligned VARDATA pointer.
(Actually, maybe we should only do this whole thing for 'c'-alignable
data types? But NUMERIC is a bit of a problem, it'd like 's'-alignment.
OTOH we could just switch NUMERIC to an all-two-byte format that's
independent of TOAST per se.)

                        regards, tom lane
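
The inspection rule described above (pad bytes are zero, so a nonzero
byte at an unaligned address must be the first byte of a short-header
datum) can be sketched in C roughly as follows. This is only an
illustration under assumed conventions, not a proposed on-disk format:
the 0x80 flag bit, the 7-bit total length, the network-byte-order 4-byte
length word, and the names DatumInfo, locate_varlena, SHORT_HDR_FLAG and
ALIGN4 are all hypothetical. Offsets are assumed to be relative to a
4-byte-aligned tuple data area.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding, chosen only for this sketch:
 *   short header : 1 byte, high bit set, low 7 bits = total length
 *                  (header included); never zero, never aligned
 *   4-byte header: total length in network byte order, high bit of the
 *                  first byte clear, always 4-byte aligned
 *   pad byte     : always zero
 */
#define SHORT_HDR_FLAG 0x80u
#define ALIGN4(off)    (((off) + 3u) & ~(size_t) 3u)

typedef struct DatumInfo
{
    size_t data_off;   /* offset of the first payload byte */
    size_t data_len;   /* payload length in bytes */
    size_t next_off;   /* offset just past this datum */
    int    is_short;   /* 1 if the datum has a short header */
} DatumInfo;

/* Locate the varlena datum that starts at or after 'off'.
 * No validation of corrupt input is attempted. */
DatumInfo
locate_varlena(const uint8_t *tuple, size_t off)
{
    DatumInfo d;
    size_t    total;

    /* A zero byte is either padding or the first byte of an aligned
     * 4-byte header; in both cases rounding up to the alignment
     * boundary lands on the header. */
    if (tuple[off] == 0)
        off = ALIGN4(off);

    if (tuple[off] & SHORT_HDR_FLAG)
    {
        total = tuple[off] & 0x7Fu;
        d.is_short = 1;
        d.data_off = off + 1;
        d.data_len = total - 1;
    }
    else
    {
        /* Assemble the big-endian length byte by byte, so the sketch
         * itself never performs an unaligned 4-byte load. */
        total = ((size_t) tuple[off] << 24) |
                ((size_t) tuple[off + 1] << 16) |
                ((size_t) tuple[off + 2] << 8) |
                 (size_t) tuple[off + 3];
        d.is_short = 0;
        d.data_off = off + 4;
        d.data_len = total - 4;
    }
    d.next_off = off + total;
    return d;
}
```

Walking the varlena columns of a tuple is then a matter of calling
locate_varlena() repeatedly, passing the previous result's next_off.
Under this sketch a datatype that keeps align 'd' would presumably just
never be given a short header.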