After some significant prior discussion:

Here is what I've found:

Doing the simple thing is extremely wasteful. Let's take TSRANGE, for
instance:
   4 bytes type oid
   1 flag byte
   8 bytes lower bound
   8 bytes upper bound

But when constructing the value itself, it starts off with VARHDRSZ
bytes. It may later be compacted to a short (1 byte) header, but it
starts off as 4. So that means:

  4 bytes VARHDRSZ
  4 bytes type oid
  1 flag byte
  7 pad bytes to get back on a 'd' align boundary
  8 bytes lower bound
  8 bytes upper bound

Total: 32 bytes. When compacted into the tuple, it might be 29. We can't
skip those pad bytes, because we need to honor the subtype's alignment.

If we move the flag byte to the end, the representation works out much
better:

  4 bytes VARHDRSZ
  4 bytes type oid
  8 bytes lower bound
  8 bytes upper bound
  1 flag byte

Total: 25 bytes, turns into about 22 bytes when compacted into the
tuple. It's a little awkward to read that way, but the savings are worth
it. The flag byte is necessary to know whether there are lower and/or
upper bounds, so we need to peek ahead to length - 1, and then continue
scanning forward through the attributes.

So, I'll implement this approach. 22 bytes represents 37.5% overhead
above the good ol' PERIOD data type (a lean 16 bytes), but we can make
up some of that if using unbounded ranges. For instance, a half-open
range like "[5, INF)" would only take 14 bytes.

Regards,
        Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to