Re: [HACKERS] Plan for compressed varlena headers
On Thu, Feb 15, 2007 at 10:42:49AM -0500, Tom Lane wrote: > > #define VARDATA_4B(PTR) ((PTR)->va_4byte.va_data) > > #define VARDATA_2B(PTR) ((PTR)->va_2byte.va_data) > > #define VARDATA_1B(PTR) ((PTR)->va_1byte.va_data) > I thought we had abandoned the 2-byte-header variant? Maybe you need to > start a separate thread about exactly which of the bit-level proposals > you want to implement. There were quite a few tradeoffs discussed in > the previous thread IIRC. I agreed with Tom in the last thread. The 2 byte case doesn't seem like good value for the return. Simpler analysis results in easier to optimize code for the compiler, and less complexity stored on disk. Please remove 2B. :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Plan for compressed varlena headers
"Tom Lane" <[EMAIL PROTECTED]> writes: >> #define VARDATA_4B(PTR) ((PTR)->va_4byte.va_data) >> #define VARDATA_2B(PTR) ((PTR)->va_2byte.va_data) >> #define VARDATA_1B(PTR) ((PTR)->va_1byte.va_data) > > I thought we had abandoned the 2-byte-header variant? Hm, I don't remember anyone saying that. I had actually considered doing that but concluded we couldn't or there wouldn't enough bits to indicate inline compression. Now that I think about it we could just do it but I don't see much of a case to do it. > Maybe you need to start a separate thread about exactly which of the > bit-level proposals you want to implement. There were quite a few tradeoffs > discussed in the previous thread IIRC. If I get the macro api right we can flip around what things indicate easily enough. Currently I'm just doing the second of the two you posted. That's the one with the hard coded external toast datum size but able to handle 1-byte headers for data up to 127 bytes long. The other one you posted had one fewer cases for deform_tuple to consider but only handled datums up to 63 bytes long in single byte headers. Both of those were uniformly better than the previous alternatives. >> Incidentally I profiled htonl against a right shift on my machine (an intel >> 2Ghz core duo). htonl is four times slower but that's 3.2ns versus 0.8ns. > > Yeah, but what about machines that have stupider compilers, or maybe > htonl isn't inlined at all? With a shift you pretty much know what > you're getting, with htonl I'm not so sure. I'm not against doing the shifting, I'm just going to get this working first and then we can always add a separate set of equivalent macros for little-endian machines. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Plan for compressed varlena headers
Gregory Stark <[EMAIL PROTECTED]> writes: > There's also VARATT_CDATA which I suppose I should rename to VARCDATA. I > may not even need it once I hit tuptoaster.c since that file works directly > with the structure members anyways. > I supposed we also rename VARATT_IS_{COMPRESSED,EXTERNAL,EXTENDED} ? > Is VAR_IS_* ok or does that sound too generic? I think the VARATT_xxx names are OK for stuff that is intended to be private to the TOAST-related code (or at least not intended for widespread use). Maybe part of the problem is just that postgres.h fails to document which macros are preferred for widespread use and which are semi-private. > For the internal macros for each specific size I have: > #define VARDATA_4B(PTR) ((PTR)->va_4byte.va_data) > #define VARDATA_2B(PTR) ((PTR)->va_2byte.va_data) > #define VARDATA_1B(PTR) ((PTR)->va_1byte.va_data) I thought we had abandoned the 2-byte-header variant? Maybe you need to start a separate thread about exactly which of the bit-level proposals you want to implement. There were quite a few tradeoffs discussed in the previous thread IIRC. > Incidentally I profiled htonl against a right shift on my machine (an intel > 2Ghz core duo). htonl is four times slower but that's 3.2ns versus 0.8ns. Yeah, but what about machines that have stupider compilers, or maybe htonl isn't inlined at all? With a shift you pretty much know what you're getting, with htonl I'm not so sure. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Plan for compressed varlena headers
"Tom Lane" <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: >> 1) Replace the VARATT_SIZEP macro with SET_VARLENA_LEN. > > If we're going to do this then it's time to play the name game; Least...fun...game...evar... > A first-cut proposal: > > VARHDRSZsame as now, ie, size of 4-byte header > VARSIZE(x) for *reading* a 4-byte-header length word > VARDATA(x) same as now, ie, ptr + 4 bytes > SET_VARSIZE(x, len) for *writing* a 4-byte-header length word There's also VARATT_CDATA which I suppose I should rename to VARCDATA. I may not even need it once I hit tuptoaster.c since that file works directly with the structure members anyways. I supposed we also rename VARATT_IS_{COMPRESSED,EXTERNAL,EXTENDED} ? Is VAR_IS_* ok or does that sound too generic? > We'll also need names for the macros that can read the length and find > the data of a datum in either-1-or-4-byte-header format. These should > probably be named as variants of VARSIZE and VARDATA, but I'm not sure > what exactly; any thoughts? I can't think of any good names for the "automatic" macros. Right now I have VARSIZE_ANY(ptr) but that doesn't seem particularly pleasing. For the internal macros for each specific size I have: #define VARDATA_4B(PTR) ((PTR)->va_4byte.va_data) #define VARDATA_2B(PTR) ((PTR)->va_2byte.va_data) #define VARDATA_1B(PTR) ((PTR)->va_1byte.va_data) #define VARSIZE_IS_4B(PTR) ((PTR)->va_1byte.va_header & ~0x3F == 0x00) #define VARSIZE_IS_2B(PTR) ((PTR)->va_1byte.va_header & ~0x1F == 0x20) #define VARSIZE_IS_1B(PTR) ((PTR)->va_1byte.va_header & ~0x7F == 0x80) #define VARSIZE_4B(PTR) (ntohl((PTR)->va_4byte.va_header) & 0x3FFF) #define VARSIZE_2B(PTR) (ntohs((PTR)->va_2byte.va_header) & 0x1FFF) #define VARSIZE_1B(PTR) ( ((PTR)->va_1byte.va_header) & 0x7F) #define SET_VARSIZE_4B(PTR,len) ((PTR)->va_4byte.va_header = htonl(len)) #define SET_VARSIZE_2B(PTR,len) ((PTR)->va_2byte.va_header = htons((len) | 0x2000)) #define SET_VARSIZE_1B(PTR,len) ((PTR)->va_1byte.va_header = (len) | 0x80) I had a separate version for little-endian but it was driving me nuts having two versions to keep tweaking. I also had the magic constants as #defines but it really didn't enhance readability at all so I took them out when I rewrote this just now. Incidentally I profiled htonl against a right shift on my machine (an intel 2Ghz core duo). htonl is four times slower but that's 3.2ns versus 0.8ns. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Plan for compressed varlena headers
Gregory Stark <[EMAIL PROTECTED]> writes: > "Tom Lane" <[EMAIL PROTECTED]> writes: >> Plus you can't do it without multiple evaluation of the macro argument, > I don't see any way to do VARSIZE without multiply evaluating its argument. Some variant of #define VARSIZE(x) (ntohl((x)->vl_len) & 0x3fff) #define VARSIZE(x) ((x)->vl_len >> 2) The 1-or-4-byte version is a lot harder, but also will be used in a lot fewer places, all of which will get looked at when it gets installed. I'm prepared to put up with multiple eval for that. I *don't* want to assume that existing code can tolerate multiple eval in a macro that has existed forever and never did it before. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Plan for compressed varlena headers
"Tom Lane" <[EMAIL PROTECTED]> writes: > Plus you can't do it without multiple evaluation of the macro argument, > which is something I'd really prefer we not introduce into such a > widely-used macro. I don't see any way to do VARSIZE without multiply evaluating its argument. It's got to mask out the relevant bits then take the appropriate number of bytes and shift the appropriate number of bits to the right. If we wanted to require GCC we could use temporary variables in macros. Or we could use a global variable and declare that you can't use VARSIZE inside the argument to VARSIZE. (Actually I can't construct a scenario where it would break, perhaps it's safe.) But I just did a find-grep and the most complicated expression I can find is VARATT_SIZE(DatumGetPointer(values[i])). And that's in code I think I'll have to touch anyways. I can't find any instances of anything that would be dangerous or noticeably inefficient. It would be pretty strange code that wanted to know the size of a datum but didn't care enough about the actual contents of the datum to store it in a temporary variable. The only circumstances I could see it happening is if someone wrote code like: len = VARSIZE(datum = DirectFunctionCall()) There are no instances that I can find of that form that I can find. > The only argument in favor is to save a couple bytes of alignment padding, > but since this is only going to happen for wide data values, the > significance of that is minimal. Yeah I realized the same thing earlier. At least in the case of four-byte headers padding is logical on all fronts since nul bytes will have the right bitpattern. And it's a big cpu win on four-byte headers since it affects reading non-compressed varlenas being passed around in the executor. I'm not sure the same logic holds for two-byte headers. They're a) not so expensive in the first place, b) not so large in the first place and c) would require padding with a special pad byte. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Plan for compressed varlena headers
Bruce Momjian <[EMAIL PROTECTED]> writes: > Gregory Stark wrote: >> If we want to allow storing >1 headers unaligned which I think would be >> good then I still think we have to read them using bytewise lookups -- ie >> by casting to (char*). That means network byte order or using the low order >> bits is equally efficient. > I think the plan was to have the macro code conditional on big-little > endian. We can deal with doing >1 headers unaligned at some future > date if we feel we need it, and the macros will make it transparent. Forcing bytewise access does not sound like a good plan to me --- you're very much at the mercy of the compiler whether you get good code for that. Plus you can't do it without multiple evaluation of the macro argument, which is something I'd really prefer we not introduce into such a widely-used macro. The only argument in favor is to save a couple bytes of alignment padding, but since this is only going to happen for wide data values, the significance of that is minimal. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Plan for compressed varlena headers
Gregory Stark wrote: > 2) Replace VARATT* macros to store and retrieve the toast bits in a manner >that will work for variable length headers. This either means storing the >bits at the least-significant position or using network byte order. > >If we want to allow storing >1 headers unaligned which I think would be >good then I still think we have to read them using bytewise lookups -- ie >by casting to (char*). That means network byte order or using the low order >bits is equally efficient. I think the plan was to have the macro code conditional on big-little endian. We can deal with doing >1 headers unaligned at some future date if we feel we need it, and the macros will make it transparent. -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Plan for compressed varlena headers
Gregory Stark <[EMAIL PROTECTED]> writes: > 1) Replace the VARATT_SIZEP macro with SET_VARLENA_LEN. If we're going to do this then it's time to play the name game; that is, pick names we actually like and that work well together. The original TOAST patch made things a bit messy in this area. We should try to improve the situation now, not make it worse. We have the following macro names that are commonly used for varlena struct access: VARHDRSZ VARATT_SIZEP(x) VARATT_SIZE(x) VARATT_DATA(x) VARSIZE(x) equivalent to VARATT_SIZE VARDATA(x) equivalent to VARATT_DATA My recollection is that VARHDRSZ, VARSIZE, VARDATA were Berkeley-era and the other three were added by the TOAST patch. The lack of consistency is a bit glaring, and having two ways to do the exact same thing doesn't seem like an improvement. A first-cut proposal: VARHDRSZsame as now, ie, size of 4-byte header VARSIZE(x) for *reading* a 4-byte-header length word VARDATA(x) same as now, ie, ptr + 4 bytes SET_VARSIZE(x, len) for *writing* a 4-byte-header length word We have to remove VARATT_SIZEP anyway to catch unmodified code, and I'd propose removing VARATT_SIZE and VARATT_DATA too, as they never caught on. We'll also need names for the macros that can read the length and find the data of a datum in either-1-or-4-byte-header format. These should probably be named as variants of VARSIZE and VARDATA, but I'm not sure what exactly; any thoughts? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Plan for compressed varlena headers
So to implement the agreed upon plan I see the following items. 1) Replace the VARATT_SIZEP macro with SET_VARLENA_LEN. I intend to keep this patch separate as it will bitrot quickly and would be best if it could be applied as soon as possible even before the main patch is committed. I just sent a patch to do this with some notes to pgsql-patches. 2) Replace VARATT* macros to store and retrieve the toast bits in a manner that will work for variable length headers. This either means storing the bits at the least-significant position or using network byte order. If we want to allow storing >1 headers unaligned which I think would be good then I still think we have to read them using bytewise lookups -- ie by casting to (char*). That means network byte order or using the low order bits is equally efficient. 3) Have VARSIZE and VARATT_SIZE recognize short headers and produce accurate values. 4) Change heap_deform*tuple, heap_getattr and any other functions and macros in heapam.c that step through tuples to recognize the new headers. Actually mostly these should just work because att_addlength uses VARSIZE but there may be additional changes. Other places that use att_addlength and need to be checked are heaptuple.c, indextuple.c, arrayfuncs.c, datum.c, varlena.c, and execQual.c, and flatfiles.c. 5) Change pg_detoast_datum to recognize the new header types and decompress them. 5) Change heap_form_tuple to compress headers where possible. 6) Fix the toaster to generate new-style toasted data Did I miss anything? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org