On Tue, Oct 19, 2010 at 6:56 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Greg Stark <gsst...@mit.edu> writes:
>> The elephant in the room is if the binary encoded form is smaller then
>> it occupies less ram and disk bandwidth to copy it around.
>
> It seems equally likely that a binary-encoded form could be larger
> than the text form (that's often true for our other datatypes).
> Again, this is an argument that would require experimental evidence
> to back it up.
That's exactly what I was thinking when I read Greg's email. I designed something vaguely (very vaguely) like this many years ago, and the binary format I worked so hard to create was enormous compared to the text format, mostly because the data I was serializing contained a lot of small integers, and as it turns out, representing {0,1,2} in fewer than 7 bytes is not easy. It can certainly be done if you set out to optimize for precisely those kinds of cases, but I ended up with something awful like:

<4 byte type = list>
<4 byte list length = 3>
<4 byte type = integer> <4 byte integer = 0>
<4 byte type = integer> <4 byte integer = 1>
<4 byte type = integer> <4 byte integer = 2>

= 32 bytes. Even if you were a little smarter than I was and used 2-byte integers (with some escape hatch allowing larger numbers to be represented), the result is still more than twice the size of the text representation; even with 1-byte integers it's still bigger. To get it down to something smaller than the text, you've got to do something like make the high nibble of each byte a type field and the low nibble the first 4 payload bits. You can certainly do all of this, but you could also just store it as text and let the TOAST compression algorithm worry about making it smaller.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
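[Editor's note: the size arithmetic above can be checked with a short sketch. The tag values and layout below are hypothetical, invented for illustration; they don't correspond to any real on-disk format.]

```python
import struct

# Hypothetical type tags for a naive tagged binary encoding (illustration only).
TYPE_LIST = 1
TYPE_INTEGER = 2

def encode_list_of_ints(values):
    """Naive encoding: a 4-byte type tag and 4-byte length for the list,
    then a 4-byte type tag and 4-byte value for each element."""
    out = struct.pack("<ii", TYPE_LIST, len(values))
    for v in values:
        out += struct.pack("<ii", TYPE_INTEGER, v)
    return out

binary = encode_list_of_ints([0, 1, 2])
text = "{0,1,2}"
print(len(binary), len(text.encode()))  # 32 bytes binary vs 7 bytes text
```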